binary serialization format

Overview

Colfer

Colfer is a binary serialization format optimized for speed and size.

The project's compiler colf(1) generates source code from schema definitions to marshal and unmarshal data structures.

This is free and unencumbered software released into the public domain. The format is inspired by Protocol Buffers.

Language Support

  • C, ISO/IEC 9899:2011 compliant a.k.a. C11, C++ compatible
  • Go, a.k.a. golang
  • Java, Android compatible
  • JavaScript, a.k.a. ECMAScript, NodeJS compatible

Features

  • Simple and straightforward in use
  • No dependencies other than the core library
  • Both faster and smaller than the competition
  • Robust against malicious input
  • Maximum of 127 fields per data structure
  • No support for enumerations
  • Framed; suitable for concatenation/streaming

TODOs

  • Rust and Python support
  • Protocol revision

Use

Download a prebuilt compiler or run go get -u github.com/pascaldekloe/colfer/cmd/colf to make one yourself. Homebrew users can also brew install colfer.

The command prints its own manual when invoked without arguments.

NAME
	colf — compile Colfer schemas

SYNOPSIS
	colf [-h]
	colf [-vf] [-b directory] [-p package] \
		[-s expression] [-l expression] C [file ...]
	colf [-vf] [-b directory] [-p package] [-t files] \
		[-s expression] [-l expression] Go [file ...]
	colf [-vf] [-b directory] [-p package] [-t files] \
		[-x class] [-i interfaces] [-c file] \
		[-s expression] [-l expression] Java [file ...]
	colf [-vf] [-b directory] [-p package] \
		[-s expression] [-l expression] JavaScript [file ...]

DESCRIPTION
	The output is source code for either C, Go, Java or JavaScript.

	For each operand that names a file of a type other than
	directory, colf reads the content as schema input. For each
	named directory, colf reads all files with a .colf extension
	within that directory. If no operands are given, the contents of
	the current directory are used.

	A package definition may be spread over several schema files.
	The directory hierarchy of the input is not relevant to the
	generated code.

OPTIONS
  -b directory
    	Use a base directory for the generated code. (default ".")
  -c file
    	Insert a code snippet from a file.
  -f	Normalize the format of all schema input on the fly.
  -h	Prints the manual to standard error.
  -i interfaces
    	Make all generated classes implement one or more interfaces.
    	Use commas as a list separator.
  -l expression
    	Set the default upper limit for the number of elements in a
    	list. The expression is applied to the target language under
    	the name ColferListMax. (default "64 * 1024")
  -p package
    	Compile to a package prefix.
  -s expression
    	Set the default upper limit for serial byte sizes. The
    	expression is applied to the target language under the name
    	ColferSizeMax. (default "16 * 1024 * 1024")
  -t files
    	Supply custom tags with one or more files. Use commas as a list
    	separator. See the TAGS section for details.
  -v	Enable verbose reporting to standard error.
  -x class
    	Make all generated classes extend a super class.

TAGS
	Tags, a.k.a. annotations, are source code additions for structs
	and/or fields. Input for the compiler can be specified with the
	-t option. The data format is line-oriented.

		<line> :≡ <qual> <space> <code> ;
		<qual> :≡ <package> '.' <dest> ;
		<dest> :≡ <struct> | <struct> '.' <field> ;

	Lines starting with a '#' are ignored (as comments). Java output
	can take multiple tag lines for the same struct or field. Each
	code line is applied in order of appearance.

EXIT STATUS
	The command exits 0 on success, 1 on error and 2 when invoked
	without arguments.

EXAMPLES
	Compile ./io.colf with compact limits as C:

		colf -b src -s 2048 -l 96 C io.colf

	Compile ./*.colf with a common parent as Java:

		colf -p com.example.model -x com.example.io.IOBean Java

BUGS
	Report bugs at <https://github.com/pascaldekloe/colfer/issues>.

	Text validation is not part of the marshalling and unmarshalling
	process. C and Go just pass any malformed UTF-8 characters. Java
	and JavaScript replace unmappable content with the '?' character
	(ASCII 63).

SEE ALSO
	protoc(1), flatc(1)
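
The TAGS grammar in the manual above can be illustrated with a small parser. The following Go sketch is not taken from colf. The Tag type, the ParseTags function and the sample lines are hypothetical, and it assumes that package names contain no dots, that a single space separates the qualifier from the code, and that blank lines may be skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Tag is one parsed tag line. The type and its field names are
// hypothetical; they are not part of colf.
type Tag struct {
	Package string // schema package name
	Struct  string // struct name
	Field   string // field name; empty when the tag targets the struct
	Code    string // source code addition, passed through verbatim
}

// ParseTags reads line-oriented input of the form <qual> <space> <code>,
// where <qual> is <package>.<struct> or <package>.<struct>.<field>.
// Lines starting with '#' are ignored as comments.
func ParseTags(input string) ([]Tag, error) {
	var tags []Tag
	sc := bufio.NewScanner(strings.NewReader(input))
	for n := 1; sc.Scan(); n++ {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line (an assumption) or comment
		}
		// Split the qualifier from the code at the first space.
		fields := strings.SplitN(line, " ", 2)
		if len(fields) != 2 || fields[1] == "" {
			return nil, fmt.Errorf("line %d: want <qual> <space> <code>", n)
		}
		names := strings.Split(fields[0], ".")
		if len(names) < 2 || len(names) > 3 {
			return nil, fmt.Errorf("line %d: malformed qualifier %q", n, fields[0])
		}
		for _, name := range names {
			if name == "" {
				return nil, fmt.Errorf("line %d: empty name in qualifier %q", n, fields[0])
			}
		}
		t := Tag{Package: names[0], Struct: names[1], Code: fields[1]}
		if len(names) == 3 {
			t.Field = names[2]
		}
		tags = append(tags, t)
	}
	return tags, sc.Err()
}

func main() {
	// Sample input; the tag contents are made up for illustration.
	input := "# tags for the demo schema\n" +
		"demo.course.ID json:\"id\"\n" +
		"demo.hole @Deprecated\n"
	tags, err := ParseTags(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tags {
		fmt.Printf("%+v\n", t)
	}
}
```

The parser keeps the code part untouched, since the manual describes it as a source code addition that is applied as-is.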

It is recommended to commit the generated source code into the respective version control to preserve build consistency and minimise the need for compiler installations. Alternatively, you may use the Maven plugin.

<plugin>
	<groupId>net.quies.colfer</groupId>
	<artifactId>colfer-maven-plugin</artifactId>
	<version>1.11.2</version>
	<configuration>
		<packagePrefix>com/example</packagePrefix>
	</configuration>
</plugin>

Schema

Data structures are defined in .colf files. The format is quite self-explanatory.

// Package demo offers a demonstration.
// These comment lines will end up in the generated code.
package demo

// Course is the grounds where the game of golf is played.
type course struct {
	ID    uint64
	name  text
	holes []hole
	image binary
	tags  []text
}

type hole struct {
	// Lat is the latitude of the cup.
	lat float64
	// Lon is the longitude of the cup.
	lon float64
	// Par is the difficulty index.
	par uint8
	// Water marks the presence of water.
	water bool
	// Sand marks the presence of sand.
	sand bool
}

See what the generated code looks like in C, Go, Java or JavaScript.

The following table shows how Colfer data types are applied per language.

Colfer     C                     Go            Java          JavaScript
bool       char                  bool          boolean       Boolean
uint8      uint8_t               uint8         byte †        Number
uint16     uint16_t              uint16        short †       Number
uint32     uint32_t              uint32        int †         Number
uint64     uint64_t              uint64        long †        Number ‡
int32      int32_t               int32         int           Number
int64      int64_t               int64         long          Number ‡
float32    float                 float32       float         Number
float64    double                float64       double        Number
timestamp  timespec              time.Time ††  time.Instant  Date + Number
text       const char* + size_t  string        String        String
binary     uint8_t* + size_t     []byte        byte[]        Uint8Array
list       * + size_t            slice         array         Array

  • † signed representation of unsigned data, i.e. may overflow to negative.
  • ‡ range limited to [1 - 2⁵³, 2⁵³ - 1]
  • †† timezone not preserved
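
The † and ‡ footnotes are easy to check with a few lines of arithmetic. The Go sketch below is only an illustration and is not generated by colf. AsSigned8 and InNumberRange are hypothetical helper names, and Go's int8 and float64 stand in for a Java byte and a JavaScript Number.

```go
package main

import "fmt"

// AsSigned8 reinterprets the bits of an unsigned 8-bit value as signed,
// the way a Java byte presents a Colfer uint8 (footnote †).
func AsSigned8(v uint8) int8 {
	return int8(v)
}

// InNumberRange reports whether v lies within [1 - 2^53, 2^53 - 1],
// the range in which a 64-bit float holds every integer exactly
// (footnote ‡).
func InNumberRange(v int64) bool {
	const limit = 1<<53 - 1
	return v >= -limit && v <= limit
}

func main() {
	// Values above 127 show up as negative in a signed byte.
	fmt.Println(AsSigned8(200))

	// The bounds of the ‡ range, and the first value past it.
	fmt.Println(InNumberRange(1<<53 - 1))
	fmt.Println(InNumberRange(1 << 53))

	// Beyond the range, a float64 no longer preserves the integer.
	big := int64(1)<<53 + 1
	fmt.Println(int64(float64(big)) == big)
}
```

Applications that rely on the full 64-bit range should therefore treat the Java and JavaScript representations with care.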

Lists may contain floating points, text, binaries or data structures.

Security

Colfer is suited for untrusted data sources such as network I/O or bulk streams. Marshalling and unmarshalling come with built-in size protection to ensure predictable memory consumption. The format prevents memory bombs by design.

The marshaller may not produce malformed output, regardless of the data input. In no event may the unmarshaller read outside the boundaries of a serial. Fuzz testing has not revealed any vulnerabilities yet. Computing power is welcome.
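
The idea behind the size protection described above can be sketched with a generic length-prefixed decoder. This is not Colfer's wire format or its generated code. SizeMax is a hypothetical stand-in for the ColferSizeMax limit named in the manual, and ReadBlob and the error values are made up for the example. The point is that the declared size is checked against the limit and against the input before any memory is allocated.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// SizeMax is a hypothetical upper limit for declared sizes, set to the
// default the manual lists for ColferSizeMax.
var SizeMax = 16 * 1024 * 1024

var (
	ErrMalformed = errors.New("malformed size prefix")
	ErrTooLarge  = errors.New("declared size exceeds limit")
	ErrShort     = errors.New("input ends before declared size")
)

// ReadBlob decodes a varint size followed by that many bytes.
// It rejects oversized or truncated input before allocating.
func ReadBlob(data []byte) ([]byte, error) {
	size, n := binary.Uvarint(data)
	if n <= 0 {
		return nil, ErrMalformed // empty input or varint overflow
	}
	if size > uint64(SizeMax) {
		return nil, ErrTooLarge // refuse before touching memory
	}
	rest := data[n:]
	if uint64(len(rest)) < size {
		return nil, ErrShort // never read past the input boundary
	}
	out := make([]byte, size)
	copy(out, rest[:size])
	return out, nil
}

func main() {
	// A five-byte input that claims a 1 GiB payload.
	claim := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(claim, 1<<30)
	_, err := ReadBlob(claim[:n])
	fmt.Println(err)
}
```

With this order of checks, a few bytes of hostile input cannot trigger a large allocation, which is the kind of memory bomb the section refers to.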

Compatibility

Name changes do not affect the serialization format. Deprecated fields should be renamed to clearly discourage their use. For backwards compatibility, new fields must be added to the end of Colfer structs. Thus the number of fields can be seen as the schema version.
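
One of the issue reports listed under Comments quotes the specification: each field definition starts with an 8-bit header whose 7 least significant bits hold the 0-based field index, with 127 reserved for the termination byte. Under that description, the Go sketch below shows why the field count can act as a schema version. SplitHeader and BeyondSchema are hypothetical names, the meaning of the top bit depends on the field type and is not modelled, and none of this is Colfer's generated code.

```go
package main

import "fmt"

// Terminator is the index value reserved for the end of a struct,
// as quoted from the specification in the issue list.
const Terminator = 0x7f

// SplitHeader separates a header byte into the 7-bit field index and
// the remaining top bit.
func SplitHeader(h byte) (index int, flag bool) {
	return int(h & 0x7f), h&0x80 != 0
}

// BeyondSchema reports whether a field index lies past the fields known
// to a reader whose schema revision has fieldCount fields.
func BeyondSchema(index, fieldCount int) bool {
	return index != Terminator && index >= fieldCount
}

func main() {
	// A reader compiled against a schema with 5 fields (indices 0 to 4).
	const known = 5
	for _, h := range []byte{0x02, 0x84, 0x05, 0x7f} {
		index, flag := SplitHeader(h)
		fmt.Println(index, flag, BeyondSchema(index, known))
	}
}
```

Because indices follow schema position, appending a field leaves every existing index untouched, while inserting or removing one would shift the indices that follow it.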

Performance

Colfer aims to be the fastest and the smallest format without compromising on reliability. See the benchmark wiki for a comparison. Suboptimal performance is treated like a bug.

Comments
  • No JavaScript Benchmarks?

    Would love to see how the built-in JSON and Protobuf fare against Colfer. It's kinda unfair to see C, Go, and Java make it to the benchmark games while JS stays at home. Could it be that the JS version couldn't live up to the Colfer performance promise?

    enhancement 
    opened by ohenepee 17
  • Add superclass to generated Java classes

    Adds a common interface to the generated Java - I'd like to write code that interacts with any Colfer object, and this makes it quite a bit simpler.

    I'm not much of a Go programmer, so please let me know if something needs correction.

    enhancement 
    opened by kbarrette 15
  • Segmentation fault in a unique case

    Hi, I hope all are great.

    Scenario: I have the following mapping in my chappee.colf file.

    package chappee

    type mappee {
    	maptype   uint32
    	msgtype   uint32
    	callmsg   text
    	debugleve text
    	minseed   uint64
    	maxseed   uint64
    }

    During packing, if I set the last two values, the packing is successful. Now, if during packing I pack first 4 (or any of them) and don't pack the last 2, I get a segmentation fault. I think it would be great that Colfer itself should take care of the values which are not set.

    It can be taken as a bug, or it can be taken as a feature request.

    Cheers, infoginx.com

    question 
    opened by ks228 13
  • Document UTF-8 Conversion in Spec + Simplify marshall/unmarshall Implementations.

    After looking at this I thought we were doing something special, but it turns out some characters weren't encoded right. Single character string: '𐍈' is an example

    We should switch to utf8.js since it makes the code easier to debug. If we wanted to save memory allocations, we could make it re-use a fixed buffer and return the buffer, length pair. And then we can rewrite the code for subsequent for adjusting the size in.

    The Java version can be simplified with new String(bytes, utf8CharSet); and s.getBytes(utf8CharSet); As of Java 8, they decided to move away from UTF-16 as a default encoding, and previously were into UTF16. So we should just use the native implementation as much as possible than invent ours.

    @pascaldekloe would like to hear your thoughts. :)

    enhancement invalid 
    opened by guilt 9
  • Comments in generated code

    Hi,

    Colfer places the name of the schema file in the comment it generates via the Maven plugin. I am using a Windows machine, and Colfer does not escape the "\" in the file path, so the Java compiler then complains about 'illegal escape character'. Is there a way to disable this comment, or to let Colfer escape it before putting it in the comment?

    Thx

    bug 
    opened by mehmetsalgar 9
  • BufferOverflowException in writeObject

    Maybe I missed something, but it looks like a bug: In Java, in generated class there is writeObject(ObjectOutputStream out) method which uses buf array of size 1024 by default. In case of java.nio.BufferUnderflowException size of it is enlarged 4 times. But in case when buf array is too small to handle marshalled object, java.nio.BufferOverflowException is thrown by marshal(byte[] buf, int offset) method, not java.nio.BufferUnderflowException, so buf is not enlarged and writeObject ends with exception.

    bug 
    opened by JanWichniarek 8
  • why not making colf java based on annotation processing

    @Colfer could be used to identify pojo classes. Then the annotation processor would scan the class and generate the code for serialization / deserialization.

    This is the most popular java way to do it. Annotation processors are a standard part of the java compiler.

    Colf has a great potential on Android. And annotation processors are very common in Android builds.

    enhancement wontfix 
    opened by stephanenicolas 8
  • Document limitation: Only 126 fields, ever, no removing

    I wish it had been documented that a colf type cannot have more than 126 fields, ever, and this includes "deleted" fields.

    (Why 126? "Each definition starts with an 8-bit header. The 7 least significant bits identify the field by its (0-based position) index in the schema." and 127 is reserved for the termination byte.)

    Related, I wish https://github.com/pascaldekloe/colfer/#compatibility had explicitly said "Fields can never be removed."

    enhancement 
    opened by tv42 7
  • Timezone offset for Go

    It would be great if this could be supported. Right now I have to serialise extra fields to rematerialize it correctly.

    I think, unless it is very hard, that I can give it a stab if you point me in the right direction.

    enhancement wontfix 
    opened by dahankzter 7
  • Is this used in production anywhere?

    Hi, I work for a large mobile app company that is interested in replacing JSON with something faster. The one disappointing thing is that colfer is written in Go and JS it looks like, which we can't use for iOS or Android (without a fair amount of finagling at least), so are there any plans for it to get ported to C/C++? Also, are there any companies that use it? It would be great to see a list of some companies that use or even their experiences, as that would really sell the format. How much security auditing has occurred?

    The numbers on https://github.com/eishay/jvm-serializers/wiki are tantalizing, so I'm curious to hear more :)

    question 
    opened by michaeleiselsc 7
  • Document uint16 better.

    Why is uint16's compressed and uncompressed flipping the behavior of (index | 0x80)? This flag sort of is the anti-pattern of what uint32 and uint64 do. It's a bit confusing.

    enhancement 
    opened by guilt 6
  • Big Decimal Support

    Hi! Thanks for this excellent serializer!

    Please add support for big.Int and decimal.Big!

    My main need is the following golang types: (biginteger, native) math/big.Int and (bigdecimal) github.com/ericlagergren/decimal.Big

    For other languages I don't care, but I would suggest:

    • Java: (biginteger, native) https://docs.oracle.com/javase/10/docs/api/java/math/BigInteger.html, (bigdecimal, native) https://docs.oracle.com/javase/10/docs/api/java/math/BigDecimal.html
    • JavaScript: (biginteger, native) https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt, or (biginteger) https://github.com/MikeMcl/decimal.js, (bigdecimal) https://github.com/MikeMcl/decimal.js
    • C: (biginteger) https://github.com/libtom/libtommath, (bigdecimal) https://github.com/libtom/libtomfloat

    Thanks Very Very Very Much!

    Best Wishes, Dani.

    enhancement 
    opened by danieagle 6
  • Adding Dart

    For https://github.com/pascaldekloe/colfer/issues/71

    • Added Dart module, with tests
      • MinInt64 as uint64 is wired directly because of the overflow to negative, because that's the only branch that would have needed counting bytes. I added another test with MinInt64+1 to prove that the rest works as expected.
      • I didn't add EOF as it's in the Go code, mainly because using the library is very similar in these two cases. Dart error handling is through try-catch with errors and exceptions. By default, errors are used for programmatic errors, exceptions for user input errors, so it's not totally idiomatic, looks a little hacky. But in the end catching RangeError instead of a defined EOF exception are very similar to me, tested it as well
    • updated Readme
      • updated footnote characters, because in theory it's possible to have multiple of them in any field. I took the rest from here: https://en.wikipedia.org/wiki/Note_(typography)

    what's missing:

    • benchmark
    • I did my best, and I'm pretty sure that it's working as-is, but a review from a more experienced Dart dev would be good
    enhancement 
    opened by vendelin8 17
  • Dart support

    Hi,

    Awesome lib, thanks! I'm planning to add a Dart implementation. Writing it here so that if anyone else thinks about adding it, we don't do it multiple times.

    enhancement 
    opened by vendelin8 4
  • Rust support

    Hello,

    I wanted to ask about rust support.

    Considering how such code is usually generated with macros, should we just create a crate that just implements the protocol/ser/deser logic? Also, conforming with Serde to get automatic ser/deser of rust struct would be a very nice addition as well.

    Cheers,

    Mathieu

    enhancement help wanted 
    opened by OtaK 11
Releases(v1.8.1)
Owner
Pascal S. de Kloe
freelance engineer