fasttuple - Collections that are laid out adjacently in both on- and off-heap memory.

Overview

FastTuple

Introduction

There are lots of good things about working on the JVM, like a world class JIT, operating system threads, and a world class garbage collector. However, one limiting factor can often be the interaction between primitive types and referenced types in Java. Primitive types are the built in types that represent integral numbers, floating point numbers, and boolean yes/no values. Primitives are also quite fast and memory efficient: they get allocated either on the stack if they're being used in a method, or inlined in an object when they're declared as field members. And they wind up being fast because the JIT can often optimize their access down to a single CPU instruction. This works really well when you know what types a class will hold as its state beforehand. If, on the other hand, you don't know what an object or array will hold at compile time the JVM forces you to box primitives. Boxing means that the primitives get wrapped in a heap allocated object, and their container will hold a reference to them. That type of overhead winds up being inefficient both in access time and memory space. Access time suffers because this type of layout breaks locality of reference, which is the principle that memory frequently accessed together should be stored adjacently. All modern memory hierarchies optimize for locality of reference in their caching implementations. The extra allocations and garbage generated also put pressure on the JVM's garbage collector, which can often be a cause of long pause times.

We wrote FastTuple to try and help solve this problem. FastTuple generates heterogeneous collections of primitive values and ensures as best it can that they will be laid out adjacently in memory. The individual values in the tuple can either be accessed from a statically bound interface, via an indexed accessor, or via reflective or other dynamic invocation techniques. FastTuple is designed to deal with a large number of tuples therefore it will also attempt to pool tuples such that they do not add significantly to the GC load of a system. FastTuple is also capable of allocating the tuple value storage entirely off-heap, using Java's direct memory capabilities.

FastTuple pulls off its trick via runtime bytecode generation. The user supplies it with a schema of field names and types. That schema is then built into a Java class definition which will contain accessor methods and either field definitions or the memory address for an off heap allocation, depending on which storage method was requested. The resulting Java class gets compiled into bytecode and then loaded as a reflective Class object. This Class object can then be used to create instances of the new class.

Building

This package depends on Janino 2.7.4, which is not currently getting pushed to maven central. As soon as that happens (we have a ticket in with the maintainer) we will begin pushing FastTuple to maven central as well. In the meantime you can build FastTuple like so:

wget http://janino.net/download/janino-2.7.4.zip
unzip janino-2.7.4.zip
mvn install:install-file -Dfile=janino-2.7.4/janino.jar -Dsources=janino-2.7.4/janino-src.zip -DgroupId=org.codehaus.janino -DartifactId=janino -Dversion=2.7.4 -Dpackaging=jar
mvn install:install-file -Dfile=janino-2.7.4/commons-compiler.jar -Dsources=janino-2.7.4/commons-compiler-src.zip -DgroupId=org.codehaus.janino -DartifactId=commons-compiler -Dversion=2.7.4 -Dpackaging=jar
mvn install

Note on GPG

You'll also need to ensure that you have GPG available on the command line. Mac OS X and Windows users may need to install via (https://gpgtools.org/) or cygwin respectively.

Usage

Interaction with FastTuple primarily takes place via the TupleSchema class. Each instance of TupleSchema describes a separate type of FastTuple both from the perspective of the FastTuple library and the JVM. At this time, allowable field types are the primitive classes for: long, int, short, char, byte, float, double. Support for String is planned for a later release. Some examples:

Heap Allocated Tuples

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		heapMemory().
		build();

	//creates a new tuple allocated on the JVM heap
	FastTuple tuple = schema.createTuple();

Direct Allocated Tuples

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		directMemory().
		build();

	//creates a new tuple, allocating memory off heap
	FastTuple tuple = schema.createTuple();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't destroy the tuple you are leaking memory
	schema.destroy(tuple);

Aligning Direct Allocated Tuples

Direct allocated tuples can be aligned such that they do not share cache lines. This is useful for situations where adjacent tuples are manipulated concurrently by different threads: an adequately padded FastTuple can eliminate false sharing in the CPU cache architecture. Veriifying that property, as expected, will require extensive benchmarking inside of the target system.

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		directMemory().
		padToWordSize(64).
		build();

	//creates a new tuple, allocating memory off heap
	FastTuple tuple = schema.createTuple();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't destroy the tuple you are leaking memory
	schema.destroy(tuple);

Utilizing Tuple Pools

Each schema will allocate a tuple pool per accessing thread if a poolSize is set.

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		poolOfSize(1024).
		//allocates an extra poolOfSize records when the pool is empty
		.expandingPool().
		build();

	//checks a tuple from the pool
	FastTuple tuple = schema.pool().checkout();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't check the tuple back in you either leak memory or objects. bad dog.
	schema.pool().release(tuple);

Performance

One of the main goals of this library is performance. Toward that end it has a full suite of microbenchmarks to test the various supported means of accessing and manipulating tuples for both tuning the library and showing the tradeoffs in overhead for things like pooling and allocating tuples on demand. Here's what a full run looks like on a late 2013 macbook pro 2.6ghz with Java 8 1.8.0_05-b13.

Benchmark                                                                              Mode   Samples         Mean   Mean error    Units
c.b.t.AccessMethodBenchmark.testAllocateSetAndDeallocate                              thrpt        20     7069.954      181.689   ops/ms
c.b.t.AccessMethodBenchmark.testClass                                                 thrpt        20  1628847.332    21756.564   ops/ms
c.b.t.AccessMethodBenchmark.testFastPool                                              thrpt        20    30715.791      570.086   ops/ms
c.b.t.AccessMethodBenchmark.testFastTuplePreAllocIndexed                              thrpt        20   160173.234     2498.380   ops/ms
c.b.t.AccessMethodBenchmark.testFastTuplePreAllocIndexedBoxing                        thrpt        20    59117.984      630.813   ops/ms
c.b.t.AccessMethodBenchmark.testFastTupleStaticBinding                                thrpt        20   157928.877     2411.094   ops/ms
c.b.t.AccessMethodBenchmark.testInvokeDynamic                                         thrpt        20    26085.594     1158.292   ops/ms
c.b.t.AccessMethodBenchmark.testLongArray                                             thrpt        20  1721846.666    23422.990   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapAllocateAndSet                                 thrpt        20     7512.737      167.141   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapDirectSet                                      thrpt        20   898743.426    37659.166   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapSchemaSet                                      thrpt        20   367841.110     4282.620   ops/ms
c.b.t.AccessMethodBenchmark.testPooledObject                                          thrpt        20    12572.464      155.458   ops/ms
c.b.t.AccessMethodBenchmark.testQueuedObject                                          thrpt        20    25092.751      242.771   ops/ms
c.b.t.AccessMethodBenchmark.testReflectField                                          thrpt        20    28850.478      275.462   ops/ms
c.b.t.AccessMethodBenchmark.testStormTuple                                            thrpt        20    79693.517      888.956   ops/ms
c.b.t.AccessMethodBenchmark.testTuplePool                                             thrpt        20    70790.084      950.775   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaAllocate                thrpt        20     7214.300       98.387   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaDeque                   thrpt        20   153937.210     2534.986   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolEval                thrpt        20    53563.512      836.586   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIface               thrpt        20    57669.480      716.962   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIndexed             thrpt        20    57633.447     1025.338   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIndexedBoxed        thrpt        20    38984.939      823.182   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocEval            thrpt        20   332142.591     6426.566   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIface           thrpt        20   370593.271     5369.162   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIndexed         thrpt        20   377811.881     7077.471   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIndexedBoxed    thrpt        20   115699.056     2402.657   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaAllocate                    thrpt        20   990788.476    12295.764   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaDeque                       thrpt        20   173527.417     3619.733   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolEval                    thrpt        20    59377.121     1071.014   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIface                   thrpt        20    62391.209      765.821   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIndexed                 thrpt        20    62120.412     1105.981   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIndexedBoxed            thrpt        20    42187.554      737.124   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocEval                thrpt        20   397337.746     5274.807   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocEvalField           thrpt        20   579136.768     8818.827   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIface               thrpt        20   554449.236     6740.073   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIndexed             thrpt        20   536258.735    10417.522   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIndexedBoxed        thrpt        20   202220.576     3806.694   ops/ms

To run the benchmarks:

mvn package
java -jar fasttuple-bench/target/microbenchmarks.jar
You might also like...

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api

rtree In-memory immutable 2D R-tree implementation in java using RxJava Observables for reactive processing of search results. Status: released to Mav

Dec 20, 2022

A Primitive Collection library that reduces memory usage and improves performance

Primitive-Collections This is a Simple Primitive Collections Library i started as a hobby Project. It is based on Java's Collection Library and FastUt

Dec 25, 2022

External-Memory Sorting in Java

Externalsortinginjava External-Memory Sorting in Java: useful to sort very large files using multiple cores and an external-memory algorithm. The vers

Dec 29, 2022

Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store

Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store

LMDB for Java LMDB offers: Transactions (full ACID semantics) Ordered keys (enabling very fast cursor-based iteration) Memory-mapped files (enabling o

Dec 23, 2022

Library for creating In-memory circular buffers that use direct ByteBuffers to minimize GC overhead

Overview This project aims at creating a simple efficient building block for "Big Data" libraries, applications and frameworks; thing that can be used

Jul 28, 2022

jproblemgenerator creates scenarios in which Java programs leak memory or crash the JVM

jproblemgenerator creates scenarios in which Java programs leak memory or crash the JVM. It is intended to train the use of debugging tools

Jan 6, 2022

Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

Table-Computing Welcome to the Table-Computing GitHub. Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low la

Oct 14, 2022

A Java library for quickly and efficiently parsing and writing UUIDs

fast-uuid fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compar

Jan 1, 2023
Comments
  • Where can I find org.codehaus.janino?

    Where can I find org.codehaus.janino?

    Trying to build with maven I get:

    [WARNING] The POM for org.codehaus.janino:janino:jar:2.7.4 is missing, no dependency information available
    [WARNING] The POM for org.codehaus.janino:commons-compiler:jar:2.7.4 is missing, no dependency information available
    
    opened by eric 2
  • Updated to latest Janino and added boolean expression support

    Updated to latest Janino and added boolean expression support

    Not sure if this project is still being maintained or not, but I've made a few tweaks to integrate it with another project I'm working on.

    Awesome work! Great package, excellent and easy to follow code.

    I've added support for evaluating boolean expressions and updated the code to the latest versions of Guava and Janino, passes all the unit tests.

    Would love to get this up on Maven Central, if possible, happy to help in any way.

    Cheers!

    opened by nickrobison 0
Owner
BMC TrueSight Pulse (formerly Boundary)
BMC TrueSight Pulse (formerly Boundary)
Java large off heap cache

OHC - An off-heap-cache Features asynchronous cache loader support optional per entry or default TTL/expireAt entry eviction and expiration without a

Robert Stupp 801 Dec 31, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022
Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

Roman Leventov 967 Nov 14, 2022
The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Homework #14 Table of Contents General Info Technologies Used Project Status Contact General Information Homework contains topics: Sorting an ArrayLis

Mykhailo 1 Feb 12, 2022
Persistent (immutable) collections for Java and Kotlin

What are Dexx Collections? Dexx Collections are a port of Scala's immutable, persistent collection classes to pure Java. Persistent in the context of

Andrew O'Malley 208 Sep 30, 2022
High Performance Primitive Collections for Java

HPPC: High Performance Primitive Collections Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.

Carrot Search 890 Dec 28, 2022
Java port of a concurrent trie hash map implementation from the Scala collections library

About This is a Java port of a concurrent trie hash map implementation from the Scala collections library. It is almost a line-by-line conversion from

null 147 Oct 31, 2022
A Persistent Java Collections Library

PCollections A Persistent Java Collections Library Overview PCollections serves as a persistent and immutable analogue of the Java Collections Framewo

harold cooper 708 Dec 28, 2022
A beginner's guide to Learn Java Collections Framework

Collections-Framework A beginner's guide to Learn Java Collections Framework Topics included: Collection Interface ArrayList Iterator Stack Queue and

Anuj Kumar Sharma 97 Dec 30, 2022
Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Netflix, Inc. 1.1k Dec 25, 2022