High Performance Primitive Collections for Java

Last update: Dec 28, 2022

Related tags

Overview

HPPC: High Performance Primitive Collections

Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.util.collections.* compatibility).

See the following for more information:

Wiki: https://github.com/carrotsearch/hppc/wiki
Bugs (github): https://github.com/carrotsearch/hppc/issues
Bugs (old issues only): http://issues.carrot2.org/browse/HPPC/

See ALTERNATIVES.txt if you're just shopping around.

See LICENSE.txt to make your company's lawyer happy.

See CHANGES.txt for API changes and updates.

Comments

deterministic iteration vs. performance degradation

We have intergrated hppc and made good success after setting a constant HashOrderMixing as we really need iteration to be deterministic. Now I think we had a chat where you recommended against such a solution, unfortunately I cannot find it anymore.

But we found some more explanation here.

There are still a few things that I do not understand and maybe you can help us :) ?

Why is this (partial) copying into the same implementation such a big problem that the design of the collection is heavily influenced? It seems you even plan to introduce an iterationSeed in 0.9?

Is the deterministic iteration really the problem and not something "good"? Wouldn't it be better to solve the problem when adding many elements e.g. a special "addAll" method that takes the collection and the element count or some kind of an "Acceptor" or "Filter"? Then it can internally handle the cluster problem when copying into other collections and you can still maintain deterministic iteration.

We probably have too seldom uses where we (partially) copy maps into other maps, but when exactly are those clusters created? Because as soon as the map has a different size the hashing should be different and the clustering should be less of a problem (?)

opened by karussell 14
WormMap template and test template that fits WormMap
WormMap implements the same interfaces as HashMap. But not all of the methods function in the same way.

Currently, there is no key distribution visualization, so visualizeKeyDistribution() throws an exception if called.

WormMap doesn't have a constant load factor, this is why the ensureCapacity() method is not 100% accurate.

Some of the methods that don't exist in hppc HashMap:

remove(KType key, VType value)

removeValue(VType value)

containsValue(VType value)

trimToSize()

getLoad()
opened by Svistoplyaz 13
New SortedIterationKTypeVTypeHashMap based on IndirectSort
A read-only view with sorted-iteration on a delegate KTypeVTypeHashMap.

The template generation works. I added Intrinsics.compare.

The assertion to verify the delegate map didn't change is minimal.

Minimal set of tests. Will put more tests after the first discussion on this sorted-iteration view.

I refactored a bit KTypeVTypeXyzMapTest by introducing AbstractKTypeVTypeTest to move duplicated helper methods there.
opened by bruno-roustant 10
Make most map operations thread safe.
I don't expect this PR to be merged as such, since I'm pretty sure it's not in the direct goals of this library. However, I would really love some pointers and possibly a review of some sort if possible. Thanks!

The goal is to make the hash map thread safe, for most use cases. So far, I've been able to work through the puts, gets, removals, updates, and other point based operations.

One caveat that I can think of is the iteration. My goal was to provide a lock free iteration, which may or may not have the latest values. The idea is to capture the state as of the moment that the iteration starts (keys, values, and mask). Doing so allows me to allow for all other point based map operations to work without waiting for the iteration to complete. Apart from having stale entries, there is one other and potentially more concerning:

When the remove operation invokes shiftConflictingKeys, the position of entries would jump, making the iteration less reliable

Apart from that caveat mentioned above, are there any other worth considering?

My larger goal is to make the iteration "eventually consistent".

Thank you!
opened by judepereira 5
Add in-place QuickSort and use it in SortedIterationKTypeVTypeHashMap.
In the case of SortedIterationKTypeVTypeHashMap, we want to sort the key set of a delegate hash map. We know there are no duplicate keys, and they are randomly distributed, and we don't need to have a stable sort. Hence it is a good case to use an in-place Quick sort instead of IndirectSort merge sort (which needs a copy of the source array).

So with this change, SortedIterationKTypeVTypeHashMap sorts with in-place quick sort. It is faster and does not require a clone of the indices array anymore.

This new QuickSort implementation, with 3-way partitioning, works on a provided comparator and also a provided swapper.

The comparator is different than the one in IndirectSort because it receives the indices of the elements to compare (not the value of the indirect indices-array). So it can be used for both direct and indirect sorting. (It is used for indirect sorting in SortedIterationKTypeVTypeHashMap, and for direct sorting in SortTest)

The swapper interface has two benefits. It allows the sort implementation to not depend on the array type anymore. And it also allows the caller to provide a custom swapper, able for example to swap the elements of two arrays at the same time. (Useful when we have two simple arrays for keys and values and we sort the keys while the values follow, then we can do a binary search on the keys, very compact)

I extracted the sort certification code from IndirectSortTest to move it to a new SortTest, which now tests both IndirectSort and QuickSort.
opened by bruno-roustant 4
SortedHashMap?

There are cases where we want a HashMap with iteration in sorted order. In those cases we can use the JDK TreeMap, but it uses more memory and provides slower O(log(N)) #get as well as slower iteration (2x slower than JDK HashMap in my benchmark). Often the TreeMap is built once and is not modified afterward, in this case TreeMap is not really advantageous.

There could be an opportunity to have a SortedHashMap based on HPPC HashMap (open-addressing). It would require to extend HPPC HashMap and have a int[] orderArray sorted using HPPC IndirectSort (it could even sort on values). If the orderArray is built once and reused (for example discarded only if the map is modified, which does not happen in the use-case I describe), then such a map requires much less memory and provides O(1) #get and fast iteration.

But such a map has a fixed iteration order! So we get into big perf issue if we iterate it to add to another HPPC HashMap.

So here is my question: Would it fit into HPPC with a warning that it must not be used to copy to another map (the same warning as ScatterMap had), or is it too dangerous or too exotic to fit?

opened by bruno-roustant 4
Added Identity Short Circuit to equals methods
Summary

This is a micro optimisation for when equals is called with a reference to the same object

This is to avoid having consumers of equals performing this check themselves

Outstanding Concerns

Not sure how sensitive this code path will be to the extra branch. I would guess that it's fairly branch predictor friendly.
improvement
opened by software-is-art 4
Robin Hood Hashing

I implemented robin hood hashing using the existing open hash map and the algorithm discussed here: http://sebastiansylvan.com/2013/05/08/robin-hood-hashing-should-be-your-default-hash-table-implementation/.

Removes are still done via the shiftConflicingKeys method. As discussed here http://sebastiansylvan.com/2013/08/05/more-on-robin-hood-hashing-2/ doing tombstones leads to long probe lengths when there are many removes and inserts of the same key. This shows up pretty well in the test suite. The existing shift scheme seems to be correct so I simply reused that.

Would like to do some performance testing and some comparison of probe length variance between this and the open hash map implementation.

opened by moonpolysoft 4
Test Smell: it is not a good test practice to use the loop in the test
Hi!

We notice that you use the loop structure in your test cases. For example, testPutWithExpansions() in KTypeVTypeWormMapTest.java

However, using the loop in test cases is not a good test practice. We analyzed the relevant Stack Overflow posts and summarized four potential negatives it brings:

Loops make the test case more complex.

In most cases, a loop can be replaced with a data-driven test that is more readable.

Loops break the assert-for-one-thing thumb rule. I don't mean a single assert statement.

When a test fails, knowing the reason is more complicated. Solution: To avoid using the loop in the test, JUnit provides an annotation (i.e., @ParameteredTest), enabling a test case to run multiple times with different parameters. We provide a usage example here:
opened by TestSmell 3
Support indexRemove in KTypeVTypeHashMap,KTypeHashSet,KTypeWormSet.
New API method KTypeVTypeMap.indexRemove() + implementation in KTypeVTypeHashMap and KTypeVTypeWormMap.

New public method indexRemove() in KTypeHashSet and KTypeWormSet.

tests
opened by bruno-roustant 3
Make WormMap KeysContainer public.

To add a test, I modified HppcExample_004_IteratingOverMaps to also use a WormMap. I don't know if it is too specific for a simple high-level example. Maybe there is another test to verify the public visibility?

opened by bruno-roustant 2
Use FunctionalInterface where appropriate
This would make the API nicer, as it allow lambda notation, as well as method handles. Feel free to squash this pull into one commit, but I edited the files in github, and that committed one file at a time.

E.g. instead of

list.forEach(new IntProcedure() { @Override public void apply(int value) { blackhole.consume(value); } });

one could then write

list.forEach(blackhole::consume);

or

list.forEach(x -> blackhole.consume(x));
opened by kno10 1

Releases(0.9.1)

0.9.1(Dec 15, 2021)
This release brings a few nice improvements from Bruno: SortedIterationKTypeVTypeHashMap view that allows you to traverse maps in sorted key order, an efficient QuickSort implementation for faster sorting and a few minor additions to the API.

Resolved issues: https://github.com/carrotsearch/hppc/milestone/3?closed=1

JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.1/api/

New features and API changes

GH-31: Added QuickSort and used it in SortedIterationKTypeVTypeHashMap. (Bruno Roustant) QuickSort can be used with custom element comparator and swapper.

GH-28: Added SortedIterationKTypeVTypeHashMap: a sorted-iteration order view over another key-value map. (Bruno Roustant)

Improvements

GH-26: Moved putIfAbsent to interface KTypeVTypeMap. (Dawid Weiss)

GH-25: Added addAll(KTypeContainer) on KTypeSet. (Erich Schubert, Dawid Weiss).

GH-27: Added identity short circuit to existing equals methods. (Callum Galbreath).

Source code(tar.gz)
Source code(zip)
0.9.0(Jun 8, 2021)
This is the API-breaking release of HPPC in a longer while. A new interesting associative container has been added (see worm hashing issue below) and some changes have been made to get rid of the Scatter*-type containers. You can now safely use either worm or regular version of hash maps and sets.

Resolved issues: https://github.com/carrotsearch/hppc/milestone/1?closed=1

JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.0/api/

New features and API changes

Java 11 is now required to compile HPPC. The resulting binary JAR is Java 1.8 compatible (Dawid Weiss).

GH-24: Support indexRemove in KTypeVTypeHashMap,KTypeHashSet,KTypeWormSet (Bruno Roustant).

GH-20: KeysContainer of WormMap is not public (Haoyu Zhai, Bruno Roustant).

GH-13: Add automatic module name to the generated JAR's manifest ("com.carrotsearch.hppc").

HPPC-179: Update java template parser to support Java 8.

HPPC-186: A different strategy has been implemented for collision avalanche avoidance. This results in removal of Scatter* maps and sets and their unification with their Hash* counterparts. This change should not affect any existing code unless it relied on static, specific ordering of keys. A side effect of this change is that key/value enumerators will return a different ordering of their container's values on each invocation. If your code relies on the order of values in associative arrays, it must order them after they are retrieved. (Bruno Roustant).

HPPC-176: A new set of associative containers implementing Worm Hashing has been added. This strategy is appropriate for a medium sized maps and sets (less than 2M entries). It takes more time to put entries in the map because it maintains chains of entries having the same hash. Then the lookup speed is fast even if the map is heavy loaded or hashes are clustered. On average it takes slightly less memory than KTypeVTypeHashMap: even though it allocates more data structures, the reasonable load factor is higher (it varies around 80%) so containers enlarge later. (Bruno Roustant, Aleksandr Danilin).

Improvements

HPPC-191: Improve Accountable implementation (Haoyu Zhai)

HPPC-183: Simplify IndirectSort comparator to use IntBinaryOperator.

HPPC-177: Modernize the build system to gradle and make it work with IntelliJ.

HPPC-184: Use closures where possible to make the resulting JAR smaller.

Bugs

HPPC-187: ObjectIdentityHashSet redistributes keys according to key.hashCode rather than object identity's hash code. (Bruno Roustant).

Source code(tar.gz)
Source code(zip)
0.9.0.RC2(Dec 14, 2020)
Release candidate preview for 0.9.0.

Resolved issues: https://github.com/carrotsearch/hppc/milestone/2?closed=1

JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.0.RC2/api/

New features and API changes

GH-13: Add automatic module name to the generated JAR's manifest ("com.carrotsearch.hppc")

Improvements

HPPC-191: Improve Accountable implementation (Haoyu Zhai)

Source code(tar.gz)
Source code(zip)
0.9.0.RC1(Sep 8, 2020)
Release candidate preview for 0.9.0.

Resolved issues: https://issues.carrot2.org/secure/ReleaseNote.jspa?projectId=10070&version=15325 https://github.com/carrotsearch/hppc/milestone/1?closed=1

JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.0.RC1/api/

New features and API changes

HPPC-179: Update java template parser to support Java 8.

HPPC-186: A different strategy has been implemented for collision avalanche avoidance. This results in removal of Scatter* maps and sets and their unification with their Hash* counterparts. This change should not affect any existing code unless it relied on static, specific ordering of keys. A side effect of this change is that key/value enumerators will return a different ordering of their container's values on each invocation. If your code relies on the order of values in associative arrays, it must order them after they are retrieved. (Bruno Roustant).

HPPC-176: A new set of associative containers implementing Worm Hashing has been added. This strategy is appropriate for a medium sized maps and sets (less than 2M entries). It takes more time to put entries in the map because it maintains chains of entries having the same hash. Then the lookup speed is fast even if the map is heavy loaded or hashes are clustered. On average it takes slightly less memory than KTypeVTypeHashMap: even though it allocates more data structures, the reasonable load factor is higher (it varies around 80%) so containers enlarge later. (Bruno Roustant, Aleksandr Danilin).

Improvements

HPPC-183: Simplify IndirectSort comparator to use IntBinaryOperator.

HPPC-177: Modernize the build system to gradle and make it work with IntelliJ.

HPPC-184: Use closures where possible to make the resulting JAR smaller.

Bugs

HPPC-187: ObjectIdentityHashSet redistributes keys according to key.hashCode rather than object identity's hash code. (Bruno Roustant).

Source code(tar.gz)
Source code(zip)
0.8.2(Jun 1, 2020)
This release adds utility methods for estimating allocated and used memory.

HPPC-175: Method estimating memory usage (Haoyu Zhai)

Resolved issues: https://issues.carrot2.org/projects/HPPC/versions/13522 JavaDoc: http://carrotsearch.github.io/hppc/releases/0.8.2/api
Source code(tar.gz)
Source code(zip)
0.8.1(May 24, 2018)

A bug fix release removing Intrisics class that somehow got included in the 0.8.0.

Resolved issues: https://issues.carrot2.org/projects/HPPC/versions/13521 JavaDoc: http://carrotsearch.github.io/hppc/releases/0.8.1/api
Source code(tar.gz)
Source code(zip)
0.8.0(May 24, 2018)

This is a maintenance release that drops the esoteric JARA completely, updates and modernizes project build process and brings a contractual improvement to map.get (return of default values is now guaranteed).

Resolved issues: https://issues.carrot2.org/projects/HPPC/versions/12628 JavaDoc: http://carrotsearch.github.io/hppc/releases/0.8.0/api
Source code(tar.gz)
Source code(zip)
0.7.3(Oct 22, 2017)

This is a bugfix release improving performance and documentation.

Check out the changelog.
Source code(tar.gz)
Source code(zip)
0.7.2(Oct 25, 2016)

This is a bugfix release providing OSGi support (http://issues.carrot2.org/secure/ReleaseNote.jspa?projectId=10070&version=12632).

Check out the changelog, virtually all the code remains identical to 0.7.1.
Source code(tar.gz)
Source code(zip)
hppc-0.7.2.zip(1.38 MB)
0.7.1(May 7, 2015)

This is a bugfix release fixing a couple rough edges.

Check out the changelog and JavaDoc API.
Source code(tar.gz)
Source code(zip)
hppc-0.7.1.zip(1.39 MB)
0.7.0(May 6, 2015)

This release fixes a number of issues and provides (backward incompatible) API cleanups. Check out the changelog and JavaDoc API.

In particular, note the introduction and difference between scatter and hash containers (alternative associative container implementations). There is also an example that explains the rationale for scatter and hash sets in detail.
Source code(tar.gz)
Source code(zip)
hppc-0.7.0.zip(1.35 MB)

Owner

Carrot Search

GitHub http://labs.carrotsearch.com

A Primitive Collection library that reduces memory usage and improves performance

Primitive-Collections This is a Simple Primitive Collections Library i started as a hobby Project. It is based on Java's Collection Library and FastUt

26 Dec 25, 2022

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

2.1k Dec 29, 2022

Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

967 Nov 14, 2022

High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

2.5k Jan 5, 2023

A high performance caching library for Java

Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release. C

13k Jan 5, 2023

High performance Java implementation of a Cuckoo filter - Apache Licensed

Cuckoo Filter For Java This library offers a similar interface to Guava's Bloom filters. In most cases it can be used interchangeably and has addition

161 Dec 30, 2022

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

1.1k Dec 25, 2022

Java port of a concurrent trie hash map implementation from the Scala collections library

About This is a Java port of a concurrent trie hash map implementation from the Scala collections library. It is almost a line-by-line conversion from

147 Oct 31, 2022

A Persistent Java Collections Library

PCollections A Persistent Java Collections Library Overview PCollections serves as a persistent and immutable analogue of the Java Collections Framewo

708 Dec 28, 2022

A beginner's guide to Learn Java Collections Framework

Collections-Framework A beginner's guide to Learn Java Collections Framework Topics included: Collection Interface ArrayList Iterator Stack Queue and

97 Dec 30, 2022

Persistent (immutable) collections for Java and Kotlin

What are Dexx Collections? Dexx Collections are a port of Scala's immutable, persistent collection classes to pure Java. Persistent in the context of

208 Sep 30, 2022

The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Homework #14 Table of Contents General Info Technologies Used Project Status Contact General Information Homework contains topics: Sorting an ArrayLis

1 Feb 12, 2022

Simple Binary Encoding (SBE) - High Performance Message Codec

Simple Binary Encoding (SBE) SBE is an OSI layer 6 presentation for encoding and decoding binary application messages for low-latency financial applic

2.8k Dec 28, 2022

Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

Table-Computing Welcome to the Table-Computing GitHub. Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low la

34 Oct 14, 2022

High Performance Primitive Collections for Java

Related tags

Overview

HPPC: High Performance Primitive Collections

Comments

Summary

Outstanding Concerns

Releases(0.9.1)

0.9.1(Dec 15, 2021)

New features and API changes

Improvements

0.9.0(Jun 8, 2021)

New features and API changes

Improvements

Bugs

0.9.0.RC2(Dec 14, 2020)

0.9.0.RC1(Sep 8, 2020)

0.8.2(Jun 1, 2020)

0.8.1(May 24, 2018)

0.8.0(May 24, 2018)

0.7.3(Oct 22, 2017)

0.7.2(Oct 25, 2016)

0.7.1(May 7, 2015)

0.7.0(May 6, 2015)