FastTuple

Introduction

There are lots of good things about working on the JVM, like a world class JIT, operating system threads, and a world class garbage collector. However, one limiting factor can often be the interaction between primitive types and referenced types in Java. Primitive types are the built in types that represent integral numbers, floating point numbers, and boolean yes/no values. Primitives are also quite fast and memory efficient: they get allocated either on the stack if they're being used in a method, or inlined in an object when they're declared as field members. And they wind up being fast because the JIT can often optimize their access down to a single CPU instruction. This works really well when you know what types a class will hold as its state beforehand. If, on the other hand, you don't know what an object or array will hold at compile time the JVM forces you to box primitives. Boxing means that the primitives get wrapped in a heap allocated object, and their container will hold a reference to them. That type of overhead winds up being inefficient both in access time and memory space. Access time suffers because this type of layout breaks locality of reference, which is the principle that memory frequently accessed together should be stored adjacently. All modern memory hierarchies optimize for locality of reference in their caching implementations. The extra allocations and garbage generated also put pressure on the JVM's garbage collector, which can often be a cause of long pause times.

We wrote FastTuple to try and help solve this problem. FastTuple generates heterogeneous collections of primitive values and ensures as best it can that they will be laid out adjacently in memory. The individual values in the tuple can either be accessed from a statically bound interface, via an indexed accessor, or via reflective or other dynamic invocation techniques. FastTuple is designed to deal with a large number of tuples therefore it will also attempt to pool tuples such that they do not add significantly to the GC load of a system. FastTuple is also capable of allocating the tuple value storage entirely off-heap, using Java's direct memory capabilities.

FastTuple pulls off its trick via runtime bytecode generation. The user supplies it with a schema of field names and types. That schema is then built into a Java class definition which will contain accessor methods and either field definitions or the memory address for an off heap allocation, depending on which storage method was requested. The resulting Java class gets compiled into bytecode and then loaded as a reflective Class object. This Class object can then be used to create instances of the new class.

Building

This package depends on Janino 2.7.4, which is not currently getting pushed to maven central. As soon as that happens (we have a ticket in with the maintainer) we will begin pushing FastTuple to maven central as well. In the meantime you can build FastTuple like so:

wget http://janino.net/download/janino-2.7.4.zip
unzip janino-2.7.4.zip
mvn install:install-file -Dfile=janino-2.7.4/janino.jar -Dsources=janino-2.7.4/janino-src.zip -DgroupId=org.codehaus.janino -DartifactId=janino -Dversion=2.7.4 -Dpackaging=jar
mvn install:install-file -Dfile=janino-2.7.4/commons-compiler.jar -Dsources=janino-2.7.4/commons-compiler-src.zip -DgroupId=org.codehaus.janino -DartifactId=commons-compiler -Dversion=2.7.4 -Dpackaging=jar
mvn install

Note on GPG

You'll also need to ensure that you have GPG available on the command line. Mac OS X and Windows users may need to install via (https://gpgtools.org/) or cygwin respectively.

Usage

Interaction with FastTuple primarily takes place via the TupleSchema class. Each instance of TupleSchema describes a separate type of FastTuple both from the perspective of the FastTuple library and the JVM. At this time, allowable field types are the primitive classes for: long, int, short, char, byte, float, double. Support for String is planned for a later release. Some examples:

Heap Allocated Tuples

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		heapMemory().
		build();

	//creates a new tuple allocated on the JVM heap
	FastTuple tuple = schema.createTuple();

Direct Allocated Tuples

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		directMemory().
		build();

	//creates a new tuple, allocating memory off heap
	FastTuple tuple = schema.createTuple();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't destroy the tuple you are leaking memory
	schema.destroy(tuple);

Aligning Direct Allocated Tuples

Direct allocated tuples can be aligned such that they do not share cache lines. This is useful for situations where adjacent tuples are manipulated concurrently by different threads: an adequately padded FastTuple can eliminate false sharing in the CPU cache architecture. Veriifying that property, as expected, will require extensive benchmarking inside of the target system.

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		directMemory().
		padToWordSize(64).
		build();

	//creates a new tuple, allocating memory off heap
	FastTuple tuple = schema.createTuple();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't destroy the tuple you are leaking memory
	schema.destroy(tuple);

Utilizing Tuple Pools

Each schema will allocate a tuple pool per accessing thread if a poolSize is set.

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		poolOfSize(1024).
		//allocates an extra poolOfSize records when the pool is empty
		.expandingPool().
		build();

	//checks a tuple from the pool
	FastTuple tuple = schema.pool().checkout();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't check the tuple back in you either leak memory or objects. bad dog.
	schema.pool().release(tuple);

Performance

One of the main goals of this library is performance. Toward that end it has a full suite of microbenchmarks to test the various supported means of accessing and manipulating tuples for both tuning the library and showing the tradeoffs in overhead for things like pooling and allocating tuples on demand. Here's what a full run looks like on a late 2013 macbook pro 2.6ghz with Java 8 1.8.0_05-b13.

Benchmark                                                                              Mode   Samples         Mean   Mean error    Units
c.b.t.AccessMethodBenchmark.testAllocateSetAndDeallocate                              thrpt        20     7069.954      181.689   ops/ms
c.b.t.AccessMethodBenchmark.testClass                                                 thrpt        20  1628847.332    21756.564   ops/ms
c.b.t.AccessMethodBenchmark.testFastPool                                              thrpt        20    30715.791      570.086   ops/ms
c.b.t.AccessMethodBenchmark.testFastTuplePreAllocIndexed                              thrpt        20   160173.234     2498.380   ops/ms
c.b.t.AccessMethodBenchmark.testFastTuplePreAllocIndexedBoxing                        thrpt        20    59117.984      630.813   ops/ms
c.b.t.AccessMethodBenchmark.testFastTupleStaticBinding                                thrpt        20   157928.877     2411.094   ops/ms
c.b.t.AccessMethodBenchmark.testInvokeDynamic                                         thrpt        20    26085.594     1158.292   ops/ms
c.b.t.AccessMethodBenchmark.testLongArray                                             thrpt        20  1721846.666    23422.990   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapAllocateAndSet                                 thrpt        20     7512.737      167.141   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapDirectSet                                      thrpt        20   898743.426    37659.166   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapSchemaSet                                      thrpt        20   367841.110     4282.620   ops/ms
c.b.t.AccessMethodBenchmark.testPooledObject                                          thrpt        20    12572.464      155.458   ops/ms
c.b.t.AccessMethodBenchmark.testQueuedObject                                          thrpt        20    25092.751      242.771   ops/ms
c.b.t.AccessMethodBenchmark.testReflectField                                          thrpt        20    28850.478      275.462   ops/ms
c.b.t.AccessMethodBenchmark.testStormTuple                                            thrpt        20    79693.517      888.956   ops/ms
c.b.t.AccessMethodBenchmark.testTuplePool                                             thrpt        20    70790.084      950.775   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaAllocate                thrpt        20     7214.300       98.387   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaDeque                   thrpt        20   153937.210     2534.986   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolEval                thrpt        20    53563.512      836.586   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIface               thrpt        20    57669.480      716.962   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIndexed             thrpt        20    57633.447     1025.338   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIndexedBoxed        thrpt        20    38984.939      823.182   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocEval            thrpt        20   332142.591     6426.566   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIface           thrpt        20   370593.271     5369.162   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIndexed         thrpt        20   377811.881     7077.471   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIndexedBoxed    thrpt        20   115699.056     2402.657   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaAllocate                    thrpt        20   990788.476    12295.764   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaDeque                       thrpt        20   173527.417     3619.733   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolEval                    thrpt        20    59377.121     1071.014   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIface                   thrpt        20    62391.209      765.821   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIndexed                 thrpt        20    62120.412     1105.981   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIndexedBoxed            thrpt        20    42187.554      737.124   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocEval                thrpt        20   397337.746     5274.807   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocEvalField           thrpt        20   579136.768     8818.827   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIface               thrpt        20   554449.236     6740.073   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIndexed             thrpt        20   536258.735    10417.522   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIndexedBoxed        thrpt        20   202220.576     3806.694   ops/ms

To run the benchmarks:

mvn package
java -jar fasttuple-bench/target/microbenchmarks.jar

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api

rtree In-memory immutable 2D R-tree implementation in java using RxJava Observables for reactive processing of search results. Status: released to Mav

Dec 20, 2022

A Primitive Collection library that reduces memory usage and improves performance

Primitive-Collections This is a Simple Primitive Collections Library i started as a hobby Project. It is based on Java's Collection Library and FastUt

Dec 25, 2022

External-Memory Sorting in Java

Externalsortinginjava External-Memory Sorting in Java: useful to sort very large files using multiple cores and an external-memory algorithm. The vers

Dec 29, 2022

Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store

LMDB for Java LMDB offers: Transactions (full ACID semantics) Ordered keys (enabling very fast cursor-based iteration) Memory-mapped files (enabling o

Dec 23, 2022

Library for creating In-memory circular buffers that use direct ByteBuffers to minimize GC overhead

Overview This project aims at creating a simple efficient building block for "Big Data" libraries, applications and frameworks; thing that can be used

Jul 28, 2022

jproblemgenerator creates scenarios in which Java programs leak memory or crash the JVM

jproblemgenerator creates scenarios in which Java programs leak memory or crash the JVM. It is intended to train the use of debugging tools

Jan 6, 2022

jOOλ - The Missing Parts in Java 8 jOOλ improves the JDK libraries in areas where the Expert Group's focus was elsewhere. It adds tuple support, function support, and a lot of additional functionality around sequential Streams. The JDK 8's main efforts (default methods, lambdas, and the Stream API) were focused around maintaining backwards compatibility and implementing a functional API for parallelism.

jOOλ jOOλ is part of the jOOQ series (along with jOOQ, jOOX, jOOR, jOOU) providing some useful extensions to Java 8 lambdas. It contains these classes

Dec 31, 2022

Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

Table-Computing Welcome to the Table-Computing GitHub. Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low la

Oct 14, 2022

A Java library for quickly and efficiently parsing and writing UUIDs

fast-uuid fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compar

Jan 1, 2023

Where can I find org.codehaus.janino?

Trying to build with maven I get:

[WARNING] The POM for org.codehaus.janino:janino:jar:2.7.4 is missing, no dependency information available
[WARNING] The POM for org.codehaus.janino:commons-compiler:jar:2.7.4 is missing, no dependency information available

opened by eric 2

Updated to latest Janino and added boolean expression support

Not sure if this project is still being maintained or not, but I've made a few tweaks to integrate it with another project I'm working on.

Awesome work! Great package, excellent and easy to follow code.

I've added support for evaluating boolean expressions and updated the code to the latest versions of Guava and Janino, passes all the unit tests.

Would love to get this up on Maven Central, if possible, happy to help in any way.

Cheers!

opened by nickrobison 0

fasttuple - Collections that are laid out adjacently in both on- and off-heap memory.

Related tags

Overview

FastTuple

Introduction

Building

Note on GPG

Usage

Heap Allocated Tuples

Direct Allocated Tuples

Aligning Direct Allocated Tuples

Utilizing Tuple Pools

Performance

You might also like...

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api

A Primitive Collection library that reduces memory usage and improves performance

External-Memory Sorting in Java

Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store

Library for creating In-memory circular buffers that use direct ByteBuffers to minimize GC overhead

jproblemgenerator creates scenarios in which Java programs leak memory or crash the JVM

Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

A Java library for quickly and efficiently parsing and writing UUIDs

Comments

Where can I find org.codehaus.janino?

Updated to latest Janino and added boolean expression support

Owner

BMC TrueSight Pulse (formerly Boundary)

Java large off heap cache

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

Java Collections till the last breadcrumb of memory and performance

The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Persistent (immutable) collections for Java and Kotlin

High Performance Primitive Collections for Java

Java port of a concurrent trie hash map implementation from the Scala collections library

A Persistent Java Collections Library

A beginner's guide to Learn Java Collections Framework

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.