A java.util.HashMap compatible map that won't stall puts or gets when resizing

Overview
/*
 * Written by Gil Tene, based on Apache Harmony version of java.util.HashMap.
 */

PauselessHashMap: A java.util.HashMap compatible Map implementation that
performs background resizing for inserts, avoiding the common "resize/rehash"
outlier experienced by normal HashMap.

get() and put() operations are "pauseless" in the sense that they do not
block during resizing of the map. Other operations, like remove(), putAll(),
clear(), and the derivation of keysets and such *will* block for pending
resize operations.

Like HashMap, PauselessHashMap provides no synchronization or thread-safe
behaviors on it's own, and MUST be externally synchronized if used by multiple
threads. The background resizing mechanism relies on the calling program
enforcing serialized access to all methods, and behavior is undefined if
concurrent access (for modification or otherwise) is allowed.

And like HashMap, PauselessHashMap is an implementation of Map. All optional
operations (adding and removing) are supported. Keys and values can be any objects.


-----------------------------

Here is some more background and commentary I included in my posting
on the mechanical sympathy group on the subject:
https://groups.google.com/forum/#!topic/mechanical-sympathy/DY8vysxdmj4

Pauseless HashMap

Some background: As those of you who have read my various rants may have
noticed, I spend a lot of my time thinking about the behavior of
latency/response-time/reaction-time. In addition to trying to understand
and teach about the behavior better (with monitoring and measurement tools
like HdrHistogram, LatencyUtils, and jHiccup), I actually work on things
that try to improve bad behavior (for some definitions of "improve" and
"bad"). Eliminating pausing behavior in GC was the lowest hanging fruit,
but more recent work has focused on eliminating pauses due to things like
at-market-open deoptimzations, or lock deflation, or de-basing, or all
sorts of TTSP (time to safepoint) issues. I've also learned a lot about
how to bring down Linux's contribution to latency spikes.

But the JVM and the OS are not the only things that cause latency spikes.
Sometimes your code is just "spiky". In my day job, I keep running into in
actual, real-world low latency system code that is typically super-fast,
but occasionally spikes in actual work latency due to some rare but huge
piece of work that needs to be done, most often due to some state
accumulation. After we eliminate GC pauses (which tend to dominate latency
spikes, and which simply disappear immediately when Zing is applied), we
often see this nice pattern of growing latency spikes at growing intervals,
with a near-perfect doubling in both magnitude and interval between the
spikes. This happens so often that we've studied the common causes, and
(by far) the most common culprits are HashMaps used to accumulate something
during the trading day. And it's all about resizing work.

I've had "build a Pauseless HashMap" on my weekend project list for over a
year now, but finally got around to actually building it (at the request of
someone on this list). There are probably at least 17 ways to skin a HashMap
so it won't stall puts and gets when it resizes, but this is my simple take
on it:

https://github.com/giltene/PauselessHashMap

Keep in mind that this is a "probably-working draft" that's gone through
some bench testing, but is not yet battle hardened (scrutiny is welcome).

I intentionally based this version on Apache Harmony's version of HashMap,
and not on OpenJDK's, in order to make it available without GPLv2 license
restrictions (for those who may want to include it in non-GPL products).
The background resizing concept itself is simple, and can be applied just
as easily to the OpenJDK version (e.g. if some future Java SE version
wants to use it). You can use 
(https://svn.apache.org/repos/asf/harmony/enhanced/java/trunk/classlib/modules/luni/src/main/java/java/util/HashMap.java)
as a baseline comparison for the code I started with.

This is also a classic example of how GC makes this sort of concurrent
programming thing both fast and simple. This is a classic case of an
asymmetric speed need between two concurrent actors that share mutable
state. I worked hard to make the fast path get() and put() cheap, and
managed (I think) to not even use volatiles in the fast path. In doing
this, I shifted all the cost I could think of to the background work,
where latency doesn't matter nearly as much. This sort of trick would
be much harder (or slower) to do if GC wasn't there to safely pick up
the junk behind us, as it would (invariably, I think) add a need for
additional synchronizing work in the fast path.

You might also like...

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Dec 25, 2022

High Performance Primitive Collections for Java

HPPC: High Performance Primitive Collections Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.

Dec 28, 2022

Java library for the HyperLogLog algorithm

java-hll A Java implementation of HyperLogLog whose goal is to be storage-compatible with other similar offerings from Aggregate Knowledge. NOTE: This

Dec 30, 2022

A simple integer compression library in Java

JavaFastPFOR: A simple integer compression library in Java License This code is released under the Apache License Version 2.0 http://www.apache.org/li

Dec 30, 2022

Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

Nov 14, 2022

Port of LevelDB to Java

LevelDB in Java This is a rewrite (port) of LevelDB in Java. This goal is to have a feature complete implementation that is within 10% of the performa

Dec 30, 2022

LMDB for Java

LMDB JNI LMDB JNI provide a Java API to LMDB which is an ultra-fast, ultra-compact key-value embedded data store developed by Symas for the OpenLDAP P

Apr 6, 2022

Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store

Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store

LMDB for Java LMDB offers: Transactions (full ACID semantics) Ordered keys (enabling very fast cursor-based iteration) Memory-mapped files (enabling o

Dec 23, 2022
Comments
  • There should be a way to safely stop the bacground threads

    There should be a way to safely stop the bacground threads

    Hi Gil.

    I tried to play with this Pauseless HashMap and its barking at me (. It looks like there is no way to cleanly stop those 2 threads running in the background and maybe JMH does not like that very well ???

    thread dump http://pastebin.com/CWSqZpfX jmh based benchmark http://pastebin.com/NUN4vDet

    If i replace line 34 with java.util.HashMap everything works fine.

    p.s. I sent this to the list since this is not necessarily the hashMap. Even if i kill the test the forked java process by jmh is still running. However there should be a way to safely stop the bacground threads.

    Thanks.

    opened by ymolists 2
  • [Snyk] Security upgrade junit:junit from 4.10 to 4.13.1

    [Snyk] Security upgrade junit:junit from 4.10 to 4.13.1

    This PR was automatically created by Snyk using the credentials of a real user.


    Snyk has created this PR to fix one or more vulnerable packages in the `maven` dependencies of this project.

    Changes included in this PR

    • Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
      • pom.xml

    Vulnerabilities that will be fixed

    With an upgrade:

    Severity | Priority Score (*) | Issue | Upgrade | Breaking Change | Exploit Maturity :-------------------------:|-------------------------|:-------------------------|:-------------------------|:-------------------------|:------------------------- low severity | 466/1000
    Why? Proof of Concept exploit, Has a fix available, CVSS 2.9 | Information Exposure
    SNYK-JAVA-JUNIT-1017047 | junit:junit:
    4.10 -> 4.13.1
    | No | Proof of Concept

    (*) Note that the real score may have changed since the PR was raised.

    Check the changes in this PR to ensure they won't cause issues with your project.


    Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.

    For more information: 🧐 View latest project report

    🛠 Adjust project settings

    📚 Read more about Snyk's upgrade and patch logic


    Learn how to fix vulnerabilities with free interactive lessons:

    🦉 Learn about vulnerability in an interactive lesson of Snyk Learn.

    opened by giltene 0
  • [Snyk] Upgrade junit:junit from 4.10 to 4.13.1

    [Snyk] Upgrade junit:junit from 4.10 to 4.13.1

    Snyk has created this PR to upgrade junit:junit from 4.10 to 4.13.1.

    :sparkles: Snyk has automatically assigned this pull request, [set who gets assigned](https://app.snyk.io/org/giltene/project/035e8b7c-1c03-4339-8702-2e4b1dec0bf9/settings/integration?utm_source=github&utm_medium=upgrade-pr/settings/integration).
    

    :information_source: Keep your dependencies up-to-date. This makes it easier to fix existing vulnerabilities and to more quickly identify and fix newly disclosed vulnerabilities when they affect your project.


    • The recommended version is 13 versions ahead of your current version.
    • The recommended version was released 3 months ago, on 2020-10-11.

    The recommended version fixes:

    Severity | Issue | PriorityScore (*) | Exploit Maturity | :-------------------------:|:-------------------------|-------------------------|:------------------------- | Information Exposure
    SNYK-JAVA-JUNIT-1017047 | 370/1000
    Why? Has a fix available, CVSS 2.9 | No Known Exploit

    (*) Note that the real score may have changed since the PR was raised.


    Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open upgrade PRs.

    For more information:

    🧐 View latest project report

    👩‍💻 Set who automatically gets assigned

    🛠 Adjust upgrade PR settings

    🔕 Ignore this dependency or unsubscribe from future upgrade PRs

    opened by snyk-bot 0
Owner
Gil Tene
I spend time in front of computer screens, and some behind them
Gil Tene
High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

Real Logic 2.5k Jan 5, 2023
Bloofi: A java implementation of multidimensional Bloom filters

Bloofi: A java implementation of multidimensional Bloom filters Bloom filters are probabilistic data structures commonly used for approximate membersh

Daniel Lemire 71 Nov 2, 2022
A high performance caching library for Java

Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release. C

Ben Manes 13k Jan 5, 2023
Chronicle Bytes has a similar purpose to Java NIO's ByteBuffer with many extensions

Chronicle-Bytes Chronicle-Bytes Chronicle Bytes contains all the low level memory access wrappers. It is built on Chronicle Core’s direct memory and O

Chronicle Software : Open Source 334 Jan 1, 2023
High performance Java implementation of a Cuckoo filter - Apache Licensed

Cuckoo Filter For Java This library offers a similar interface to Guava's Bloom filters. In most cases it can be used interchangeably and has addition

Mark Gunlogson 161 Dec 30, 2022
An advanced, but easy to use, platform for writing functional applications in Java 8.

Getting Cyclops X (10) The latest version is cyclops:10.4.0 Stackoverflow tag cyclops-react Documentation (work in progress for Cyclops X) Integration

AOL 1.3k Dec 29, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022
External-Memory Sorting in Java

Externalsortinginjava External-Memory Sorting in Java: useful to sort very large files using multiple cores and an external-memory algorithm. The vers

Daniel Lemire 235 Dec 29, 2022
A Java library for quickly and efficiently parsing and writing UUIDs

fast-uuid fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compar

Jon Chambers 142 Jan 1, 2023
Geohash utitlies in java

geo Java utility methods for geohashing. Status: production, available on Maven Central Maven site reports are here including javadoc. Add this to you

Dave Moten 386 Jan 1, 2023