High Performance Primitive Collections for Java

Overview

HPPC: High Performance Primitive Collections

Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.util.collections.* compatibility).

See the following for more information:

See ALTERNATIVES.txt if you're just shopping around.

See LICENSE.txt to make your company's lawyer happy.

See CHANGES.txt for API changes and updates.

(c) Carrot Search s.c., http://carrotsearch.com/

Build Status

Comments
  • deterministic iteration vs. performance degradation

    deterministic iteration vs. performance degradation

    We have intergrated hppc and made good success after setting a constant HashOrderMixing as we really need iteration to be deterministic. Now I think we had a chat where you recommended against such a solution, unfortunately I cannot find it anymore.

    But we found some more explanation here.

    There are still a few things that I do not understand and maybe you can help us :) ?

    Why is this (partial) copying into the same implementation such a big problem that the design of the collection is heavily influenced? It seems you even plan to introduce an iterationSeed in 0.9?

    Is the deterministic iteration really the problem and not something "good"? Wouldn't it be better to solve the problem when adding many elements e.g. a special "addAll" method that takes the collection and the element count or some kind of an "Acceptor" or "Filter"? Then it can internally handle the cluster problem when copying into other collections and you can still maintain deterministic iteration.

    We probably have too seldom uses where we (partially) copy maps into other maps, but when exactly are those clusters created? Because as soon as the map has a different size the hashing should be different and the clustering should be less of a problem (?)

    opened by karussell 14
  • WormMap template and test template that fits WormMap

    WormMap template and test template that fits WormMap

    WormMap implements the same interfaces as HashMap. But not all of the methods function in the same way.

    1. Currently, there is no key distribution visualization, so visualizeKeyDistribution() throws an exception if called.
    2. WormMap doesn't have a constant load factor, this is why the ensureCapacity() method is not 100% accurate.

    Some of the methods that don't exist in hppc HashMap:

    1. remove(KType key, VType value)
    2. removeValue(VType value)
    3. containsValue(VType value)
    4. trimToSize()
    5. getLoad()
    opened by Svistoplyaz 13
  • New SortedIterationKTypeVTypeHashMap based on IndirectSort

    New SortedIterationKTypeVTypeHashMap based on IndirectSort

    A read-only view with sorted-iteration on a delegate KTypeVTypeHashMap.

    • The template generation works. I added Intrinsics.compare.
    • The assertion to verify the delegate map didn't change is minimal.
    • Minimal set of tests. Will put more tests after the first discussion on this sorted-iteration view.
    • I refactored a bit KTypeVTypeXyzMapTest by introducing AbstractKTypeVTypeTest to move duplicated helper methods there.
    opened by bruno-roustant 10
  • Make most map operations thread safe.

    Make most map operations thread safe.

    I don't expect this PR to be merged as such, since I'm pretty sure it's not in the direct goals of this library. However, I would really love some pointers and possibly a review of some sort if possible. Thanks!

    The goal is to make the hash map thread safe, for most use cases. So far, I've been able to work through the puts, gets, removals, updates, and other point based operations.

    One caveat that I can think of is the iteration. My goal was to provide a lock free iteration, which may or may not have the latest values. The idea is to capture the state as of the moment that the iteration starts (keys, values, and mask). Doing so allows me to allow for all other point based map operations to work without waiting for the iteration to complete. Apart from having stale entries, there is one other and potentially more concerning:

    • When the remove operation invokes shiftConflictingKeys, the position of entries would jump, making the iteration less reliable

    Apart from that caveat mentioned above, are there any other worth considering?

    My larger goal is to make the iteration "eventually consistent".

    Thank you!

    opened by judepereira 5
  • Add in-place QuickSort and use it in SortedIterationKTypeVTypeHashMap.

    Add in-place QuickSort and use it in SortedIterationKTypeVTypeHashMap.

    In the case of SortedIterationKTypeVTypeHashMap, we want to sort the key set of a delegate hash map. We know there are no duplicate keys, and they are randomly distributed, and we don't need to have a stable sort. Hence it is a good case to use an in-place Quick sort instead of IndirectSort merge sort (which needs a copy of the source array).

    So with this change, SortedIterationKTypeVTypeHashMap sorts with in-place quick sort. It is faster and does not require a clone of the indices array anymore.

    This new QuickSort implementation, with 3-way partitioning, works on a provided comparator and also a provided swapper.

    • The comparator is different than the one in IndirectSort because it receives the indices of the elements to compare (not the value of the indirect indices-array). So it can be used for both direct and indirect sorting. (It is used for indirect sorting in SortedIterationKTypeVTypeHashMap, and for direct sorting in SortTest)
    • The swapper interface has two benefits. It allows the sort implementation to not depend on the array type anymore. And it also allows the caller to provide a custom swapper, able for example to swap the elements of two arrays at the same time. (Useful when we have two simple arrays for keys and values and we sort the keys while the values follow, then we can do a binary search on the keys, very compact)

    I extracted the sort certification code from IndirectSortTest to move it to a new SortTest, which now tests both IndirectSort and QuickSort.

    opened by bruno-roustant 4
  • SortedHashMap?

    SortedHashMap?

    There are cases where we want a HashMap with iteration in sorted order. In those cases we can use the JDK TreeMap, but it uses more memory and provides slower O(log(N)) #get as well as slower iteration (2x slower than JDK HashMap in my benchmark). Often the TreeMap is built once and is not modified afterward, in this case TreeMap is not really advantageous.

    There could be an opportunity to have a SortedHashMap based on HPPC HashMap (open-addressing). It would require to extend HPPC HashMap and have a int[] orderArray sorted using HPPC IndirectSort (it could even sort on values). If the orderArray is built once and reused (for example discarded only if the map is modified, which does not happen in the use-case I describe), then such a map requires much less memory and provides O(1) #get and fast iteration.

    But such a map has a fixed iteration order! So we get into big perf issue if we iterate it to add to another HPPC HashMap.

    So here is my question: Would it fit into HPPC with a warning that it must not be used to copy to another map (the same warning as ScatterMap had), or is it too dangerous or too exotic to fit?

    opened by bruno-roustant 4
  • Added Identity Short Circuit to equals methods

    Added Identity Short Circuit to equals methods

    Summary

    • This is a micro optimisation for when equals is called with a reference to the same object
    • This is to avoid having consumers of equals performing this check themselves

    Outstanding Concerns

    Not sure how sensitive this code path will be to the extra branch. I would guess that it's fairly branch predictor friendly.

    improvement 
    opened by software-is-art 4
  • Robin Hood Hashing

    Robin Hood Hashing

    I implemented robin hood hashing using the existing open hash map and the algorithm discussed here: http://sebastiansylvan.com/2013/05/08/robin-hood-hashing-should-be-your-default-hash-table-implementation/.

    Removes are still done via the shiftConflicingKeys method. As discussed here http://sebastiansylvan.com/2013/08/05/more-on-robin-hood-hashing-2/ doing tombstones leads to long probe lengths when there are many removes and inserts of the same key. This shows up pretty well in the test suite. The existing shift scheme seems to be correct so I simply reused that.

    Would like to do some performance testing and some comparison of probe length variance between this and the open hash map implementation.

    opened by moonpolysoft 4
  • Test Smell: it is not a good test practice to use the loop in the test

    Test Smell: it is not a good test practice to use the loop in the test

    Hi!

    We notice that you use the loop structure in your test cases. For example, testPutWithExpansions() in KTypeVTypeWormMapTest.java 截屏2022-08-17 下午5 49 10

    However, using the loop in test cases is not a good test practice. We analyzed the relevant Stack Overflow posts and summarized four potential negatives it brings:

    1. Loops make the test case more complex.
    2. In most cases, a loop can be replaced with a data-driven test that is more readable.
    3. Loops break the assert-for-one-thing thumb rule. I don't mean a single assert statement.
    4. When a test fails, knowing the reason is more complicated. Solution: To avoid using the loop in the test, JUnit provides an annotation (i.e., @ParameteredTest), enabling a test case to run multiple times with different parameters. We provide a usage example here:
    截屏2022-08-17 下午5 46 03
    opened by TestSmell 3
  • Support indexRemove in KTypeVTypeHashMap,KTypeHashSet,KTypeWormSet.

    Support indexRemove in KTypeVTypeHashMap,KTypeHashSet,KTypeWormSet.

    New API method KTypeVTypeMap.indexRemove() + implementation in KTypeVTypeHashMap and KTypeVTypeWormMap.

    New public method indexRemove() in KTypeHashSet and KTypeWormSet.

    • tests
    opened by bruno-roustant 3
  • Make WormMap KeysContainer public.

    Make WormMap KeysContainer public.

    To add a test, I modified HppcExample_004_IteratingOverMaps to also use a WormMap. I don't know if it is too specific for a simple high-level example. Maybe there is another test to verify the public visibility?

    opened by bruno-roustant 2
  • Use FunctionalInterface where appropriate

    Use FunctionalInterface where appropriate

    This would make the API nicer, as it allow lambda notation, as well as method handles. Feel free to squash this pull into one commit, but I edited the files in github, and that committed one file at a time.

    E.g. instead of

    list.forEach(new IntProcedure() {
    	@Override
    	public void apply(int value) {
    		blackhole.consume(value);
    	}
    });
    

    one could then write

    list.forEach(blackhole::consume);
    

    or

    list.forEach(x -> blackhole.consume(x));
    
    opened by kno10 1
Releases(0.9.1)
  • 0.9.1(Dec 15, 2021)

    This release brings a few nice improvements from Bruno: SortedIterationKTypeVTypeHashMap view that allows you to traverse maps in sorted key order, an efficient QuickSort implementation for faster sorting and a few minor additions to the API.

    Resolved issues: https://github.com/carrotsearch/hppc/milestone/3?closed=1

    JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.1/api/

    New features and API changes

    • GH-31: Added QuickSort and used it in SortedIterationKTypeVTypeHashMap. (Bruno Roustant) QuickSort can be used with custom element comparator and swapper.

    • GH-28: Added SortedIterationKTypeVTypeHashMap: a sorted-iteration order view over another key-value map. (Bruno Roustant)

    Improvements

    • GH-26: Moved putIfAbsent to interface KTypeVTypeMap. (Dawid Weiss)

    • GH-25: Added addAll(KTypeContainer) on KTypeSet. (Erich Schubert, Dawid Weiss).

    • GH-27: Added identity short circuit to existing equals methods. (Callum Galbreath).

    Source code(tar.gz)
    Source code(zip)
  • 0.9.0(Jun 8, 2021)

    This is the API-breaking release of HPPC in a longer while. A new interesting associative container has been added (see worm hashing issue below) and some changes have been made to get rid of the Scatter*-type containers. You can now safely use either worm or regular version of hash maps and sets.

    Resolved issues: https://github.com/carrotsearch/hppc/milestone/1?closed=1

    JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.0/api/

    New features and API changes

    • Java 11 is now required to compile HPPC. The resulting binary JAR is Java 1.8 compatible (Dawid Weiss).

    • GH-24: Support indexRemove in KTypeVTypeHashMap,KTypeHashSet,KTypeWormSet (Bruno Roustant).

    • GH-20: KeysContainer of WormMap is not public (Haoyu Zhai, Bruno Roustant).

    • GH-13: Add automatic module name to the generated JAR's manifest ("com.carrotsearch.hppc").

    • HPPC-179: Update java template parser to support Java 8.

    • HPPC-186: A different strategy has been implemented for collision avalanche avoidance. This results in removal of Scatter* maps and sets and their unification with their Hash* counterparts. This change should not affect any existing code unless it relied on static, specific ordering of keys. A side effect of this change is that key/value enumerators will return a different ordering of their container's values on each invocation. If your code relies on the order of values in associative arrays, it must order them after they are retrieved. (Bruno Roustant).

    • HPPC-176: A new set of associative containers implementing Worm Hashing has been added. This strategy is appropriate for a medium sized maps and sets (less than 2M entries). It takes more time to put entries in the map because it maintains chains of entries having the same hash. Then the lookup speed is fast even if the map is heavy loaded or hashes are clustered. On average it takes slightly less memory than KTypeVTypeHashMap: even though it allocates more data structures, the reasonable load factor is higher (it varies around 80%) so containers enlarge later. (Bruno Roustant, Aleksandr Danilin).

    Improvements

    • HPPC-191: Improve Accountable implementation (Haoyu Zhai)

    • HPPC-183: Simplify IndirectSort comparator to use IntBinaryOperator.

    • HPPC-177: Modernize the build system to gradle and make it work with IntelliJ.

    • HPPC-184: Use closures where possible to make the resulting JAR smaller.

    Bugs

    • HPPC-187: ObjectIdentityHashSet redistributes keys according to key.hashCode rather than object identity's hash code. (Bruno Roustant).
    Source code(tar.gz)
    Source code(zip)
  • 0.9.0.RC2(Dec 14, 2020)

    Release candidate preview for 0.9.0.

    Resolved issues: https://github.com/carrotsearch/hppc/milestone/2?closed=1

    JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.0.RC2/api/

    New features and API changes

    • GH-13: Add automatic module name to the generated JAR's manifest ("com.carrotsearch.hppc")

    Improvements

    • HPPC-191: Improve Accountable implementation (Haoyu Zhai)
    Source code(tar.gz)
    Source code(zip)
  • 0.9.0.RC1(Sep 8, 2020)

    Release candidate preview for 0.9.0.

    Resolved issues: https://issues.carrot2.org/secure/ReleaseNote.jspa?projectId=10070&version=15325 https://github.com/carrotsearch/hppc/milestone/1?closed=1

    JavaDoc: http://carrotsearch.github.io/hppc/releases/0.9.0.RC1/api/

    New features and API changes

    • HPPC-179: Update java template parser to support Java 8.

    • HPPC-186: A different strategy has been implemented for collision avalanche avoidance. This results in removal of Scatter* maps and sets and their unification with their Hash* counterparts. This change should not affect any existing code unless it relied on static, specific ordering of keys. A side effect of this change is that key/value enumerators will return a different ordering of their container's values on each invocation. If your code relies on the order of values in associative arrays, it must order them after they are retrieved. (Bruno Roustant).

    • HPPC-176: A new set of associative containers implementing Worm Hashing has been added. This strategy is appropriate for a medium sized maps and sets (less than 2M entries). It takes more time to put entries in the map because it maintains chains of entries having the same hash. Then the lookup speed is fast even if the map is heavy loaded or hashes are clustered. On average it takes slightly less memory than KTypeVTypeHashMap: even though it allocates more data structures, the reasonable load factor is higher (it varies around 80%) so containers enlarge later. (Bruno Roustant, Aleksandr Danilin).

    Improvements

    • HPPC-183: Simplify IndirectSort comparator to use IntBinaryOperator.

    • HPPC-177: Modernize the build system to gradle and make it work with IntelliJ.

    • HPPC-184: Use closures where possible to make the resulting JAR smaller.

    Bugs

    • HPPC-187: ObjectIdentityHashSet redistributes keys according to key.hashCode rather than object identity's hash code. (Bruno Roustant).
    Source code(tar.gz)
    Source code(zip)
  • 0.8.2(Jun 1, 2020)

    This release adds utility methods for estimating allocated and used memory.

    • HPPC-175: Method estimating memory usage (Haoyu Zhai)

    Resolved issues: https://issues.carrot2.org/projects/HPPC/versions/13522 JavaDoc: http://carrotsearch.github.io/hppc/releases/0.8.2/api

    Source code(tar.gz)
    Source code(zip)
  • 0.8.1(May 24, 2018)

    A bug fix release removing Intrisics class that somehow got included in the 0.8.0.

    Resolved issues: https://issues.carrot2.org/projects/HPPC/versions/13521 JavaDoc: http://carrotsearch.github.io/hppc/releases/0.8.1/api

    Source code(tar.gz)
    Source code(zip)
  • 0.8.0(May 24, 2018)

    This is a maintenance release that drops the esoteric JARA completely, updates and modernizes project build process and brings a contractual improvement to map.get (return of default values is now guaranteed).

    Resolved issues: https://issues.carrot2.org/projects/HPPC/versions/12628 JavaDoc: http://carrotsearch.github.io/hppc/releases/0.8.0/api

    Source code(tar.gz)
    Source code(zip)
  • 0.7.3(Oct 22, 2017)

  • 0.7.2(Oct 25, 2016)

  • 0.7.1(May 7, 2015)

  • 0.7.0(May 6, 2015)

A Primitive Collection library that reduces memory usage and improves performance

Primitive-Collections This is a Simple Primitive Collections Library i started as a hobby Project. It is based on Java's Collection Library and FastUt

Speiger 26 Dec 25, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022
Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

Roman Leventov 967 Nov 14, 2022
High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

Real Logic 2.5k Jan 5, 2023
A high performance caching library for Java

Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release. C

Ben Manes 13k Jan 5, 2023
High performance Java implementation of a Cuckoo filter - Apache Licensed

Cuckoo Filter For Java This library offers a similar interface to Guava's Bloom filters. In most cases it can be used interchangeably and has addition

Mark Gunlogson 161 Dec 30, 2022
Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Netflix, Inc. 1.1k Dec 25, 2022
Java port of a concurrent trie hash map implementation from the Scala collections library

About This is a Java port of a concurrent trie hash map implementation from the Scala collections library. It is almost a line-by-line conversion from

null 147 Oct 31, 2022
A Persistent Java Collections Library

PCollections A Persistent Java Collections Library Overview PCollections serves as a persistent and immutable analogue of the Java Collections Framewo

harold cooper 708 Dec 28, 2022
A beginner's guide to Learn Java Collections Framework

Collections-Framework A beginner's guide to Learn Java Collections Framework Topics included: Collection Interface ArrayList Iterator Stack Queue and

Anuj Kumar Sharma 97 Dec 30, 2022
Persistent (immutable) collections for Java and Kotlin

What are Dexx Collections? Dexx Collections are a port of Scala's immutable, persistent collection classes to pure Java. Persistent in the context of

Andrew O'Malley 208 Sep 30, 2022
The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Homework #14 Table of Contents General Info Technologies Used Project Status Contact General Information Homework contains topics: Sorting an ArrayLis

Mykhailo 1 Feb 12, 2022
Simple Binary Encoding (SBE) - High Performance Message Codec

Simple Binary Encoding (SBE) SBE is an OSI layer 6 presentation for encoding and decoding binary application messages for low-latency financial applic

Real Logic 2.8k Dec 28, 2022
Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

Table-Computing Welcome to the Table-Computing GitHub. Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low la

Alibaba 34 Oct 14, 2022
fasttuple - Collections that are laid out adjacently in both on- and off-heap memory.

FastTuple Introduction There are lots of good things about working on the JVM, like a world class JIT, operating system threads, and a world class gar

BMC TrueSight Pulse (formerly Boundary) 137 Sep 30, 2022
A fork of Cliff Click's High Scale Library. Improved with bug fixes and a real build system.

High Scale Lib This is Boundary's fork of Cliff Click's high scale lib. We will be maintaining this fork with bug fixes, improvements and versioned bu

BMC TrueSight Pulse (formerly Boundary) 402 Jan 2, 2023
Replicate your Key Value Store across your network, with consistency, persistance and performance.

Chronicle Map Version Overview Chronicle Map is a super-fast, in-memory, non-blocking, key-value store, designed for low-latency, and/or multi-process

Chronicle Software : Open Source 2.5k Dec 29, 2022
Bloofi: A java implementation of multidimensional Bloom filters

Bloofi: A java implementation of multidimensional Bloom filters Bloom filters are probabilistic data structures commonly used for approximate membersh

Daniel Lemire 71 Nov 2, 2022
Chronicle Bytes has a similar purpose to Java NIO's ByteBuffer with many extensions

Chronicle-Bytes Chronicle-Bytes Chronicle Bytes contains all the low level memory access wrappers. It is built on Chronicle Core’s direct memory and O

Chronicle Software : Open Source 334 Jan 1, 2023