Rolling hash functions in Java

Overview

Rolling hash functions in Java

License: Apache 2.0

What is this?

This is a set of Java classes implementing various recursive n-gram hashing techniques, also called rolling hashing (http://en.wikipedia.org/wiki/Rolling_hash), including:

  • Randomized Karp-Rabin (sometimes called Rabin-Karp)
  • Hashing by Cyclic Polynomials (also known as Buzhash)

Code sample

String s = "some string";
int n = 3; //hash all sequences of 3 characters
CyclicHash ch = new CyclicHash(n);
int k = 0;
for(; k<n;++k) {
  ch.eat(s.charAt(k));
}
int rollinghash = ch.eat(s.charAt(k)); // the first or last 32-(n-1) bits are strongly universal
// do something with the hash value
for(;k<s.length();++k) {
  rollinghash = ch.update(s.charAt(k-n), s.charAt(k));
  // do something with the hash value
}

Reference

Daniel Lemire, Owen Kaser: Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language, Volume 24, Issue 4, October 2010, Pages 698-710 http://arxiv.org/abs/0705.4676

You might also like...

External-Memory Sorting in Java

Externalsortinginjava External-Memory Sorting in Java: useful to sort very large files using multiple cores and an external-memory algorithm. The vers

Dec 29, 2022

A Java library for quickly and efficiently parsing and writing UUIDs

fast-uuid fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compar

Jan 1, 2023

Geohash utitlies in java

geo Java utility methods for geohashing. Status: production, available on Maven Central Maven site reports are here including javadoc. Add this to you

Jan 1, 2023

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Dec 25, 2022

High Performance Primitive Collections for Java

HPPC: High Performance Primitive Collections Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.

Dec 28, 2022

Java library for the HyperLogLog algorithm

java-hll A Java implementation of HyperLogLog whose goal is to be storage-compatible with other similar offerings from Aggregate Knowledge. NOTE: This

Dec 30, 2022

A simple integer compression library in Java

JavaFastPFOR: A simple integer compression library in Java License This code is released under the Apache License Version 2.0 http://www.apache.org/li

Dec 30, 2022

Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

Nov 14, 2022
Owner
Daniel Lemire
Daniel Lemire is a computer science professor. His research is focused on software performance and indexing.
Daniel Lemire
An embedded database implemented in pure java based on bitcask which is a log-structured hash table for K/V Data.

Baka Db An embedded database implemented in pure java based on bitcask which is a log-structured hash table for K/V Data. Usage import cn.ryoii.baka.B

ryoii 3 Dec 20, 2021
Google Hash Code '22 Question

Answer for - Mentorship and Teamwork Google Hash Code '22 Question Credit goes to Google LLC - Hash Code '22 Work is so much more fun when we are part

Dilshan Karunarathne 4 Apr 12, 2022
High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

Real Logic 2.5k Jan 5, 2023
Bloofi: A java implementation of multidimensional Bloom filters

Bloofi: A java implementation of multidimensional Bloom filters Bloom filters are probabilistic data structures commonly used for approximate membersh

Daniel Lemire 71 Nov 2, 2022
A high performance caching library for Java

Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release. C

Ben Manes 13k Jan 5, 2023
Chronicle Bytes has a similar purpose to Java NIO's ByteBuffer with many extensions

Chronicle-Bytes Chronicle-Bytes Chronicle Bytes contains all the low level memory access wrappers. It is built on Chronicle Core’s direct memory and O

Chronicle Software : Open Source 334 Jan 1, 2023
High performance Java implementation of a Cuckoo filter - Apache Licensed

Cuckoo Filter For Java This library offers a similar interface to Guava's Bloom filters. In most cases it can be used interchangeably and has addition

Mark Gunlogson 161 Dec 30, 2022
An advanced, but easy to use, platform for writing functional applications in Java 8.

Getting Cyclops X (10) The latest version is cyclops:10.4.0 Stackoverflow tag cyclops-react Documentation (work in progress for Cyclops X) Integration

AOL 1.3k Dec 29, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022