Library for creating in-memory circular buffers that use direct ByteBuffers to minimize GC overhead

Overview

This project aims to provide a simple, efficient building block for "Big Data" libraries, applications and frameworks: something that can be used as an in-memory, bounded queue with opaque values (sequences of JDK primitive values), supporting insertions at the tail, removals from the head, and single-entry peeks, all with minimal garbage collection overhead. Insertions and removals operate on individual entries, which are sub-sequences of the full buffer.

GC overhead is minimized by using direct ByteBuffers (memory allocated outside the GC-managed heap); the bounded nature comes from only supporting storage of simple primitive value (byte, long) sequences whose size is explicitly known.

Conceptually, memory buffers are just simple circular buffers (ring buffers) that hold a sequence of primitive values, a bit like arrays, but in a way that allows dynamic, automatic resizing of the underlying storage. The library supports efficient reuse and sharing of underlying segments between sets of buffers, although for many use cases a single buffer suffices.
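As a rough illustration of the concept (a toy sketch using only JDK classes, not the library's actual implementation), a byte ring buffer over a single direct ByteBuffer looks like this:

```java
import java.nio.ByteBuffer;

// Toy ring buffer backed by one direct (off-heap) ByteBuffer; illustrative only.
public class ToyRingBuffer {
    private final ByteBuffer buf;
    private int head = 0, tail = 0, count = 0;

    public ToyRingBuffer(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity); // storage outside the GC heap
    }

    public boolean offer(byte b) {
        if (count == buf.capacity()) return false; // full
        buf.put(tail, b);
        tail = (tail + 1) % buf.capacity(); // wrap around
        count++;
        return true;
    }

    public int poll() {
        if (count == 0) return -1; // empty
        byte b = buf.get(head);
        head = (head + 1) % buf.capacity();
        count--;
        return b & 0xFF;
    }
}
```

The library's real buffers add segment-based growth and shrinking on top of this basic head/tail wrap-around idea.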

Buffers vary in two dimensions:

  1. Type of primitive value contained: currently byte and long variants are implemented, but others (like int or char) will be easy to add as needed
  2. Whether sequences are "chunky" -- each sequence consists of 'chunks' created by distinct appendEntry() calls (and retrieved in exactly the same-sized chunks with getNextEntry()) -- or "streamy", meaning that values are coalesced and form a logical stream (so multiple appendEntry() calls may be coalesced into just one entry returned by getNextEntry()).
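The difference can be illustrated with plain JDK types standing in for the two buffer flavors (a conceptual sketch, not the library API):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;

// Illustrates chunky vs streamy semantics with JDK types; not the library's code.
public class ChunkyVsStreamy {
    // "Chunky": each appended entry comes back as a chunk of the same size
    static final ArrayDeque<byte[]> chunky = new ArrayDeque<>();
    // "Streamy": appended values coalesce into one logical stream
    static final ByteArrayOutputStream streamy = new ByteArrayOutputStream();

    public static void append(byte[] entry) {
        chunky.add(entry.clone());               // boundaries preserved
        streamy.write(entry, 0, entry.length);   // boundaries erased
    }
}
```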

Since Java has no support for "generic primitives", there are separate classes for all combinations. This means that there are currently 4 flavors of buffers:

  • for byte (using MemBuffersForBytes):
    • ChunkyBytesMemBuffer
    • StreamyBytesMemBuffer
  • for long (using MemBuffersForLongs):
    • ChunkyLongsMemBuffer
    • StreamyLongsMemBuffer

Another thing that can vary is the way underlying segments are allocated; default is to use native ("direct") ByteBuffers. But more on this later on.

Licensing

Standard Apache 2.0 license.

Fancier stuff: multiple buffers

Although individual buffers are useful as-is, this is just the beginning. Conceptually the library supports "buffer groups": sets of similarly-valued buffer instances owned by a single factory (like MemBuffersForBytes) that share the same segment allocator (com.fasterxml.util.membuf.SegmentAllocator). This makes it possible to share a set of reusable underlying ByteBuffer instances among buffers in the same group.

This ability to share underlying segments between buffers, combined with strict memory bounds, makes it possible to use the library as a basic buffer manager; for example, to buffer input and/or output of a web server (byte-based "streamy" buffers), or as simplistic event queues (usually using "chunky" buffers).

To have multiple buffer groups, simply construct multiple factory instances.

Thread-safety

All pieces are designed to be used by multiple threads (often just 2, producer/consumer), so all access is properly synchronized.

In addition, locking is done on buffer instances themselves, so it may occasionally make sense to synchronize on a buffer instance explicitly; this allows you to create atomic sequences of operations, like so:

MemBuffersForBytes factory = new MemBuffersForBytes(...);
ChunkyBytesMemBuffer buffer = factory.createChunkyBuffer(...);
synchronized (buffer) {
  // read latest, add right back:
  byte[] msg = buffer.getNextEntry();
  buffer.appendEntry(msg);
}

or similarly if you need to read a sequence of entries as an atomic unit.
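The same locking pattern can be sketched with plain JDK types (an ArrayDeque stand-in, not the membuf API), showing why holding the monitor across both calls makes the read-then-re-append atomic:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Stand-in buffer whose compound read-then-append is made atomic by
// synchronizing on the buffer object itself (the pattern described above).
public class RequeueDemo {
    private final Queue<byte[]> buffer = new ArrayDeque<>();

    public void append(byte[] msg) {
        synchronized (buffer) { buffer.offer(msg); }
    }

    public byte[] readAndRequeue() {
        synchronized (buffer) {
            // no other thread can interleave between these two calls
            byte[] msg = buffer.poll();
            buffer.offer(msg);
            return msg;
        }
    }

    public int size() {
        synchronized (buffer) { return buffer.size(); }
    }
}
```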

Status

Project has been used by multiple production systems (by multiple companies) since 2012, and by now has proven stable and performant for expected use cases. As such it is considered production ready: the official 1.0 version was released in October 2013.

The first publicly accessible project that uses it is Arecibo, a metrics collection, aggregation and visualization system.

Companies that use this library for production systems include:

Usage

Getting it

To use with Maven, add:

<dependency>
  <groupId>com.fasterxml.util</groupId>
  <artifactId>low-gc-membuffers</artifactId>
  <version>1.1.1</version>
</dependency>

For downloadables and javadocs, check out the Wiki.

Start with a factory

The exact factory to use depends on the value type; here we assume you are looking for byte-based buffers. If so, you will use MemBuffersForBytes. This object can be viewed as a container and factory of actual buffers (ChunkyBytesMemBuffer or StreamyBytesMemBuffer). To construct one, you need to specify the amount of memory to use, as well as how that memory should be sliced; for example:

MemBuffersForBytes factory = new MemBuffersForBytes(30 * 1024, 2, 11);

would create an instance that allocates at least 2 (and at most 11) segments (which wrap direct ByteBuffer instances), each 30 kB in size: that is, total memory usage between 60 and 330 kilobytes. The segments are then used by actual buffer instances (more on this in a bit).

So how do you choose the parameters? The smaller the segments, the more granular memory allocation is, which can mean more efficient memory use (since overhead is bounded to at most one segment-full per active buffer). But it also increases the number of segment instances, possibly increasing fragmentation and adding overhead.

Note that you can create multiple instances of MemBuffers[Type] factories if you want more control over how the pool of segments is allocated amongst individual buffers.

Detour: allocating underlying storage segments

By default, segments are allocated as direct ByteBuffers (or typed sub-types for longs and so on). But this behavior can be changed by passing an alternate SegmentAllocator instance.

For example, if you instead wanted to use in-heap segments stored as basic byte arrays (byte[]), you could do this by:

MemBuffersForBytes factory = new MemBuffersForBytes(
  ArrayBytesSegment.allocator(30 * 1024, 2, 11));

or to use non-direct ByteBuffers:

MemBuffersForBytes factory = new MemBuffersForBytes(
  ByteBufferBytesSegment.allocator(30 * 1024, 2, 11, false));

Note that SegmentAllocator instances are implemented as inner classes of matching segment type, that is as ArrayBytesSegment.Allocator and ByteBufferBytesSegment.Allocator.
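The allocator flavors differ mainly in where the backing memory lives; with plain JDK calls, the three kinds of storage look like this (illustrative only, not the library's allocator code):

```java
import java.nio.ByteBuffer;

// The three kinds of backing storage the allocator flavors correspond to.
public class SegmentStorageKinds {
    public static ByteBuffer direct(int size)    { return ByteBuffer.allocateDirect(size); } // off-heap
    public static ByteBuffer heapNio(int size)   { return ByteBuffer.allocate(size); }       // on-heap ByteBuffer
    public static byte[]     heapArray(int size) { return new byte[size]; }                  // plain byte[]
}
```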

Also note that neither Allocators nor MemBuffers keep track of underlying segments. What this means is that buffers MUST be closed (explicitly, or indirectly by using wrappers) to make sure segments are released for reuse.

Create individual buffers, MemBuffer

Actual buffers are then allocated using

ChunkyBytesMemBuffer items = factory.createChunkyBuffer(2, 5);

which would indicate that this buffer will hold on to at least 2 segments (i.e. about 60 kB of raw storage) and use at most 5 (so a maximum of 150 kB). Due to the circular-buffer style of allocation, at least 'segments - 1' worth of memory will be available for the actual queue (i.e. guaranteed space of 120 kB); that is, up to one segment may be temporarily unavailable depending on the pattern of append/remove operations.
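The arithmetic behind these numbers can be spelled out explicitly (plain arithmetic, not a library call; the 30 kB segment size is taken from the earlier factory example):

```java
// Memory bounds for a buffer using 30 kB segments, min 2 / max 5 segments.
public class BufferBounds {
    static final int SEGMENT_SIZE = 30 * 1024;

    public static int minBytes(int minSegments) { return minSegments * SEGMENT_SIZE; }
    public static int maxBytes(int maxSegments) { return maxSegments * SEGMENT_SIZE; }

    // Up to one segment may be temporarily unusable, so guaranteed queue
    // space is (maxSegments - 1) segments' worth.
    public static int guaranteedBytes(int maxSegments) {
        return (maxSegments - 1) * SEGMENT_SIZE;
    }
}
```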

And start buffering/unbuffering

To append entries, you use:

byte[] dataEntry = ...; // serialize from, say, JSON
items.appendEntry(dataEntry);

or, if you don't want an exception if there is no more room:

if (!items.tryAppendEntry(dataEntry)) {
   // recover? Drop entry? Log?
}

and to pop entries:

byte[] next = items.getNextEntry(); // blocks if nothing available
// or:
next = items.getNextEntryIfAvailable();
if (next == null) { // nothing yet available
    //...
}
// or:
next = items.getNextEntry(1000L); // block for at most 1 second before giving up
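These three read modes mirror the familiar java.util.concurrent BlockingQueue idioms; here is a stdlib analogy (not the membuf API) showing the same non-blocking and timed-read semantics:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// JDK BlockingQueue analogy for the read styles: non-blocking poll,
// timed poll, and (not shown) the indefinitely blocking take().
public class ReadModes {
    private final BlockingQueue<byte[]> q = new ArrayBlockingQueue<>(16);

    public boolean append(byte[] entry) { return q.offer(entry); }

    public byte[] getIfAvailable() { return q.poll(); } // null when empty

    public byte[] getWithTimeout(long millis) {
        try {
            return q.poll(millis, TimeUnit.MILLISECONDS); // wait up to 'millis'
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```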

And make sure that...

You **always close buffers** when you are done with them -- otherwise underlying segments may be leaked. This is because buffers are the only objects that keep track of segments; nothing keeps track of the MemBuffer instances created -- this is intentional, as the synchronization otherwise needed would be very expensive from a concurrency perspective.

Note that version 0.9.1 and later allow use of MemBufferDecorator instances, which make it possible to build wrappers that implement simple auto-closing of buffers.

Statistics, anyone?

Finally, you can also obtain various statistics of buffer instances:

int entries = items.getEntryCount(); // how many available for getting?
int segmentsInUse = items.getSegmentCount(); // nr of internal segments
long maxFree = items.getMaximumAvailableSpace(); // approximate free space
long payload = items.getTotalPayloadLength(); // how much used by data?

Download

Check out Wiki for downloads, Javadocs etc.

Known/potential problems

The default buffer implementation uses direct ByteBuffers, and the amount of memory that can be allocated this way is limited by the JVM option -XX:MaxDirectMemorySize, which defaults to a relatively low 64 MB. To increase the limit, add a setting like:

-XX:MaxDirectMemorySize=512m

otherwise you are likely to hit an OutOfMemoryError when using larger buffers.
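As a quick sanity check, direct allocation can be probed at runtime with plain JDK calls; this hypothetical helper simply reports failure instead of crashing:

```java
import java.nio.ByteBuffer;

// Direct buffers count against -XX:MaxDirectMemorySize, not the Java heap.
// Hypothetical helper, not part of the library.
public class DirectAllocDemo {
    public static ByteBuffer tryAllocate(int bytes) {
        try {
            return ByteBuffer.allocateDirect(bytes);
        } catch (OutOfMemoryError e) {
            // raise -XX:MaxDirectMemorySize if this happens for large buffers
            return null;
        }
    }
}
```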

Future ideas

Here are some improvement ideas:

  • "Slab" allocation (issue #14): allow initial allocation of a longer off-heap memory segment, with size of N segments: this "slab" will be fixed and NOT dynamically allocated or freed; segments will be sub-allocated as needed.
    • Main benefit is reduced need for actual memory management (no per-operation malloc or free)
    • Adds fixed overhead: slab size needs to be balanced with costs
    • Segments from slabs allocated before dynamic segments, as they do not incur additional allocation or memory usage cost (due to fixed default overhead)
  • Expose streamy byte buffers as InputStream (issue #19).
    • Would need to choose what happens at end-of-input: snapshot (expose current content and then EOF) vs blocking (works like a pipe)
Comments
  • Return no longer used MemBuffer to MemBuffers

    MemBuffer is created from MemBuffers like this:

    MemBuffer items = bufs.createBuffer(2, 5);

    Is there a way to return MemBuffer to MemBuffers when I do not need to use it any more? I want to implement a memory cache, so after the cache is expired I want to return MemBuffer to MemBuffers so MemBuffers can reuse the memory.

    opened by ngocdaothanh 6
  • Add optional "auto-close" wrapper(s) for buffers

    Currently buffer instances must be explicitly closed; otherwise underlying chunks are lost, causing de facto leakage.

    Given that different use cases have different limitations, it would make sense to allow additional, optional wrapping, such that one could rely on auto-closing. There are multiple mechanisms to consider; for example:

    1. Use of finalize(), to make it occur as part of GC
    2. Use soft references, queue (and possibly background thread?)

    It would probably be simplest to do (1) as optional wrapper first. (2) would most likely need either background thread for purging, or calls from outside to trigger cleanup.

    Although, since this should mostly become an issue only when chunks are needed, perhaps (2) could be wired to be triggered from chunk allocation.

    opened by cowtowncoder 4
  • Support more than just bytes

    I've got a similar internal library that does the ring-buffer thing. For my use case though I am using long[] and long direct byte buffers. Have you considered making multiple types of buffers that expose other numeric types?

    opened by spullara 3
  • peekNextEntry

    I am considering using low-gc-membuffers to implement a distributed memory cache based on Akka. I think cache entries will be stored in MemBuffer.

    But the getNextEntry method of MemBuffer removes the available entry from buffer. Is there a way to get an entry without removing it? May be peekNextEntry?

    opened by ngocdaothanh 3
  • The doc (README?) should say something about -XX:MaxDirectMemorySize

    The default max memory for direct ByteBuffers is 64 MB: http://stackoverflow.com/questions/3773775/default-for-xxmaxdirectmemorysize

    I think the doc (README?) should say something about -XX:MaxDirectMemorySize, like this: http://www.hazelcast.com/docs/2.0/manual/single_html/index.html#ElasticMemory

    opened by ngocdaothanh 2
  • Add convenience class for UTF-8 encoding/decoding, under util

    Similar to #21, another commonly needed conversion is between java Strings, and UTF-8 encoded byte streams of the same. Since I have code, and since it is commonly used with this package, let's include it.

    opened by cowtowncoder 1
  • Add alternative Segment impl that uses simple byte arrays

    Although ByteBuffers are optimal from minimal garbage collection perspective, there are cases where using small number of large byte arrays works pretty well. And since ByteBuffers can have their downsides (wrt clean up, limitations, platform-specific issues), it seems reasonable to allow alternate implementation. Such implementation is likely to work best when entries are short and total buffer length moderate.

    opened by cowtowncoder 1
  • README: com.fasterxml.membuf should be com.fasterxml.util.membuf

    Currently, all classes are inside the package com.fasterxml.util.membuf, but the README says they are inside the package com.fasterxml.membuf.

    Moreover, the project name and the package name do not match.

    opened by ngocdaothanh 1
  • Add getNextEntry() method that takes in buffer from caller

    To reduce allocation of temporary byte arrays it would be nice to allow caller to pass in buffer to use: combined with 'getNextEntryLength()' (in sync block) this could allow efficient access with caller-provided recycled buffers. If next segment fits, method should either return length of segment returned in buffer, or pointer right after contents of segment; but if segment did not fit, a marker value (-1) should be returned to indicate that no segment was read.

    opened by cowtowncoder 1
  • Add 'close()' to buffer instances (make Closeable as well)

    It would make sense to be able to discard buffer instances, to allow for shorter-term buffers (not just ones that last for the lifetime of app or service). While there may be more descriptive names (discard?), maybe best way would be to add close(), since then we could also make buffers Closeable to allow for general handling.

    opened by cowtowncoder 1
  • Make default MemBuffer.append() block, add tryAppend()

    Initial append implementation just assumed that call either succeeds or fails (throwing exception or returns false). But for most use cases a blocking variant would make most sense.

    While not trivial to test, this should obviously be tested to ensure that Object.notifyAll() is called in cases where an append() may be blocked (will probably need to add an indicator bit, since it's impossible to know about this otherwise, given arbitrary lengths of entries to add).

    opened by cowtowncoder 1
  • Add `appendEntry()` variant that blocks until it succeeds

    Currently only read methods can block to wait for more data. But although implementing similar blocking for writing (appending) is more difficult, it should be doable. And when implementing queuing with multiple producers, ability to block (for limited amount of time) is a useful feature.

    opened by cowtowncoder 0
  • Add convenience `InputStream` implementations, to read from streamy byte buffer

    To improve interoperability, it'd be good to add InputStream implementations. Since it is byte-oriented, it is only applicable to byte-based buffers; and probably only streamy ones.

    One variation is that of what to do when all current content has been read: so most likely two impls should be provided:

    1. "Snapshot" variant which considers end of content to be end-of-input
    2. Blocking variant that blocks when it runs out of content, waking up if new content is added -- this works a bit like a pipe.
    opened by cowtowncoder 0
  • Implement hybrid slab/chunk allocation scheme, to reduce # of native chunks

    (inspired by Hal H, Alex F, on discussion for use at SFDC)


    Current chunk allocation scheme has the benefit of allowing much of the memory to be returned to the OS when not needed, depending on constraints defined for individual queues (i.e. whether they can grow/shrink). But it has potentially non-trivial overhead, as chunks need a bit of JVM overhead for tracking.

    On the other hand, allocating bigger "slabs" (piece of memory that is multiple of chunk) would make it possible to reduce such overhead. But it will be next to impossible to free such chunks before closing down individual "MemBuffers" instances, because likelihood of at least one chunk of a slab being in use is high. So in practice such slabs would be static; allocated when needed (or eagerly), and not freed for lifetime of individual Buffers instance (factory of group of individual queues).

    One simple and seemingly efficient way to make use of this would be to allow specifying amount of "static" memory (as number of chunks) that may be allocated, and use this for allocation slabs. Chunks from slabs would then be the primary source of allocation; and dynamic (individual) chunks would only be allocated if slabs are full. Conversely, slab-based chunks would always be reusable; whereas reuse rate for dynamic chunks could be more closely limited.

    opened by cowtowncoder 0
  • Add "multi-get" accessor

    Sometimes it makes sense to get more than one entry, up to specified maximum, with one call. Main benefit is that synchronization is only done once, and the obvious downside is that multiple byte result byte arrays need to be allocated.

    Although variants could be added to allow timed waits and such it seems best to start with a simple call that does NOT block (if none available, return empty array), and which takes single count argument that specifies maximum number of entries to return.

    opened by cowtowncoder 0
  • Improve deallocation of direct ByteBuffers using sun.misc.Unsafe?

    Apparently deallocation of direct-allocated ByteBuffers is bit tricky, since underlying off-heap buffers are NOT necessarily freed in timely manner, partly since there is no mechanism to force deallocation. As such they might only be collected when GC kicks in.

    However there is apparently a way to force freeing using sun.misc.Unsafe. If so, we should probably try to use that mechanism to reduce likelihood of OOMEs caused by delayed deallocation.

    opened by cowtowncoder 0