A Java library for quickly and efficiently parsing and writing UUIDs

Overview

Maven Central

fast-uuid

fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compared to Java 8 and older; in benchmarks, it's a little more than fourteen times faster at parsing UUIDs and six times faster at writing UUIDs than the stock JDK implementation. Compared to Java 9 and newer, it's about six times faster when it comes to parsing UUIDs and offers no benefits for writing UUIDs.

This library is intended for applications that work with large quantities of UUIDs or that work with UUIDs in performance-sensitive code, and probably won't be helpful in applications that work with UUIDs infrequently.

Usage

Using fast-uuid is simple. To parse UUIDs:

UUID uuid = FastUUID.parseUUID(uuidStringOrCharacterSequence);

To convert UUIDs to strings:

String uuidString = FastUUID.toString(uuid);

Getting fast-uuid

For users of Maven (or Maven-compaitble build tools like Gradle), fast-uuid is available via Maven Central. You can add it to your project with the following Maven dependency declaration:

<dependency>
    <groupId>com.eatthepath</groupId>
    <artifactId>fast-uuid</artifactId>
    <version>0.1</version>
</dependency>

For users managing their own dependencies, you can add fast-uuid to your project by adding the fast-uuid jar file from the latest release to your classpath. fast-uuid has no additional dependencies.

How it works

Parsing UUIDs

Let's take a look at the OpenJDK implementation of UUID#fromString(String) for Java 8 and older:

public static UUID fromString(String name) {
    String[] components = name.split("-");
    if (components.length != 5)
        throw new IllegalArgumentException("Invalid UUID string: "+name);
    for (int i=0; i<5; i++)
        components[i] = "0x"+components[i];

    long mostSigBits = Long.decode(components[0]).longValue();
    mostSigBits <<= 16;
    mostSigBits |= Long.decode(components[1]).longValue();
    mostSigBits <<= 16;
    mostSigBits |= Long.decode(components[2]).longValue();

    long leastSigBits = Long.decode(components[3]).longValue();
    leastSigBits <<= 48;
    leastSigBits |= Long.decode(components[4]).longValue();

    return new UUID(mostSigBits, leastSigBits);
}

If you're just dealing with UUIDs every now and then, this is just fine. If you're doing a lot of UUID parsing, though, there are a few things we might be concerned about here:

  1. This implementation starts off by creating an array of (presumably) five sub-strings. This can be a bit slow in its own right, but beyond that, it also creates five new strings that will need to be garbage-collected eventually.
  2. For each of those substrings, this implementation performs a string concatenation, which requires still more string allocation and eventual garbage collection.
  3. Eventually, this implementation needs to convert hexadecimal strings into numbers. It does so with Long#decode(String) rather than using Long#parseLong(String, int), which means somebody else needs to do the work of figuring out which radix to use when parsing the strings. This seems unnecessary since we know for sure that we're dealing with hexadecimal strings.

It turns out a lot of these issues are interrelated, and we can untangle them to get a significant performance boost. By recognizing that we're always dealing with hexadecimal strings, for example, we can immediately resolve the third issue. Once we've done that, we don't need to concatenate strings to prepend "0x" to the beginning of each of our substrings. That alone speeds things up by about 50% and cuts the number of string allocations (and presumably garbage collection pressure) in half.

That leaves the first problem: can we find a way to parse a UUID without breaking it into substrings first? It turns out we can! Here we have to move away from the handy parsing tools that the JDK provides us, though, and write some of our own. We can even go further and, because we know for sure that we're dealing with hexadecimal strings of a fixed length, we can write a parser that drops a lot of error-checking and flexibility and picks up a lot of speed in return. That's exactly what FastUUID provides, and the result is that it can parse UUIDs a little more than four times faster than the default JDK implementation and, aside from the finished UUID, doesn't create anything on the heap that will need to get garbage-collected later.

Here are some benchmark results under Java 8:

Benchmark Throughput Margin of error
UUID#fromString(String) 1,402,810 UUIDs/second ± 47,330 UUIDs/second
FastUUID#parseUUID(String) 19,736,169 UUIDs/second ± 247,028 UUIDs/second

The Java 9 implementation (some comments have been removed in the interest of brevity) improves the situation significantly:

public static UUID fromString(String name) {
    int len = name.length();
    if (len > 36) {
        throw new IllegalArgumentException("UUID string too large");
    }

    int dash1 = name.indexOf('-', 0);
    int dash2 = name.indexOf('-', dash1 + 1);
    int dash3 = name.indexOf('-', dash2 + 1);
    int dash4 = name.indexOf('-', dash3 + 1);
    int dash5 = name.indexOf('-', dash4 + 1);

    if (dash4 < 0 || dash5 >= 0) {
        throw new IllegalArgumentException("Invalid UUID string: " + name);
    }

    long mostSigBits = Long.parseLong(name, 0, dash1, 16) & 0xffffffffL;
    mostSigBits <<= 16;
    mostSigBits |= Long.parseLong(name, dash1 + 1, dash2, 16) & 0xffffL;
    mostSigBits <<= 16;
    mostSigBits |= Long.parseLong(name, dash2 + 1, dash3, 16) & 0xffffL;
    long leastSigBits = Long.parseLong(name, dash3 + 1, dash4, 16) & 0xffffL;
    leastSigBits <<= 48;
    leastSigBits |= Long.parseLong(name, dash4 + 1, len, 16) & 0xffffffffffffL;

    return new UUID(mostSigBits, leastSigBits);
}

This implementation does away with the string concatenation (and resulting string allocation) entirely. The obvious gains are gone, but we might still be able to improve the situation by using a more application-specific parser than Long#parseLong(String, int, int, int). As it turns out, using an application-specific parser makes a surprisingly significant difference. In benchmarks, a specialized parser is about six times faster than the Java 9 implementation of UUID#fromString(String).

Benchmark Throughput Margin of error
UUID#fromString(String) 2,613,730 UUIDs/second ± 25,278 UUIDs/second
FastUUID#parseUUID(String) 16,796,302 UUIDs/second ± 191,695 UUIDs/second

UUIDs to strings

We've shown that we can significantly improve upon the stock UUID#fromString(String) implementation. Can we achieve similar gains in going from a UUID to a String? Let's take a look at the stock implementation of UUID#toString() from Java 8:

public String toString() {
    return (digits(mostSigBits >> 32, 8) + "-" +
            digits(mostSigBits >> 16, 4) + "-" +
            digits(mostSigBits, 4) + "-" +
            digits(leastSigBits >> 48, 4) + "-" +
            digits(leastSigBits, 12));
}

private static String digits(long val, int digits) {
    long hi = 1L << (digits * 4);
    return Long.toHexString(hi | (val & (hi - 1))).substring(1);
}

As before, we might notice a few areas of concern:

  1. We're performing a lot of string concatenations. Each of those requires allocating space for a new string and ultimately garbage-collecting the intermediate strings. If we can find a way to do less concatenation, we might see some performance gains.
  2. Furthermore, every call to digits produces two new strings (via the calls to toHexString and substring) that ultimately get discarded.

As before, we know some things about UUIDs that help us avoid some general-case error checking and trade some flexibility for performance. For example, we know that the string representation of a UUID will always be exactly 36 characters long (32 hexadecimal digits and four dashes). That means we can pre-allocate space by way of (for example) a StringBuilder. That alone will save us quite a few string allocations and yield significant performance improvements.

As with UUID parsing, we can go further and write our own "to hexadecimal" method that uses our knowledge about the size and structure of UUID strings to place digits in exactly the right place in the finished string, reducing the need to get substrings and perform concatenations. In the end, this lets us convert UUIDs to strings more than six times faster (and, again, with much less garbage-collection pressure) than the stock implementation.

Some benchmark results under Java 8:

Benchmark Throughput Margin of error
UUID#toString() 2,620,932 UUIDs/second ± 21,128 UUIDs/second
FastUUID#toString(UUID) 17,449,401 UUIDs/second ± 221,382 UUIDs/second

Java 9 uses a native method to convert UUIDs to strings, and our optimized implementation is actually a bit slower than the native approach. As a result, we just pass calls to FastUUID#toString(UUID) through to UUID#toString() under Java 9 and newer.

Benchmarking

Because fast-uuid is a performance-oriented project, it includes jmh benchmarks to compare its performance against the stock JDK implementation. To run the the benchmarks:

cd fast-uuid
mvn clean install

cd benchmark
mvn clean install -U

java -jar target/benchmarks.jar

License

fast-uuid is published under the MIT license.

Comments
  • x4 improvement on parsing + Fix the benchmarks

    x4 improvement on parsing + Fix the benchmarks

    You can improve the parsing by 4 times by using a lookup table instead of an if/else branching to compute the hex value of a char.

        private static int[] HEX_VALUE = new int[128];
        static {
            HEX_VALUE['0'] = 0;
            HEX_VALUE['1'] = 1;
            HEX_VALUE['2'] = 2;
            HEX_VALUE['3'] = 3;
            HEX_VALUE['4'] = 4;
            HEX_VALUE['5'] = 5;
            HEX_VALUE['6'] = 6;
            HEX_VALUE['7'] = 7;
            HEX_VALUE['8'] = 8;
            HEX_VALUE['9'] = 9;
            HEX_VALUE['a'] = 10;
            HEX_VALUE['A'] = 10;
            HEX_VALUE['b'] = 11;
            HEX_VALUE['B'] = 11;
            HEX_VALUE['c'] = 12;
            HEX_VALUE['C'] = 12;
            HEX_VALUE['d'] = 13;
            HEX_VALUE['D'] = 13;
            HEX_VALUE['e'] = 14;
            HEX_VALUE['E'] = 14;
            HEX_VALUE['f'] = 15;
            HEX_VALUE['F'] = 15;
        }
    

    Then just use HEX_VALUE[uuidSequence.charAt(X)]; instead of getHexValueForChar(uuidSequence.charAt(X)).

    Benchmark pom project was fixed because it pointed to a fast-uuid-parser which is named fast-uuid. I also removed some tests since the getHexValueForChar is now deleted. Readme has been updated in order to explain how to run the benchmarks (for those who don't know jmh).

    Benchmarks on my machine:

    Before

    Benchmark                                   Mode  Cnt         Score        Error  Units
    UUIDBenchmark.benchmarkFastParserToString  thrpt   40  18850285,268 ± 292260,574  ops/s
    UUIDBenchmark.benchmarkUUIDFromFastParser  thrpt   40   5430137,326 ± 354801,493  ops/s
    UUIDBenchmark.benchmarkUUIDFromString      thrpt   40   1319052,901 ± 101011,389  ops/s
    UUIDBenchmark.benchmarkUUIDToString        thrpt   40   2581791,612 ± 114785,630  ops/s
    

    After

    Benchmark                                   Mode  Cnt         Score        Error  Units
    UUIDBenchmark.benchmarkFastParserToString  thrpt   40  18154539,206 ± 617354,677  ops/s
    UUIDBenchmark.benchmarkUUIDFromFastParser  thrpt   40  23199948,837 ± 638756,033  ops/s
    UUIDBenchmark.benchmarkUUIDFromString      thrpt   40   1359176,406 ±  44637,824  ops/s
    UUIDBenchmark.benchmarkUUIDToString        thrpt   40   2468160,559 ± 158349,184  ops/s
    

    Let's find some ways to make it even faster ;-)

    opened by agrison 7
  • Any plans to implement 'String randomUUID()' without creating intermediate UUID object?

    Any plans to implement 'String randomUUID()' without creating intermediate UUID object?

    First of all - thanks for this library. I always had in my head idea to do the same, but now... you did it :+1:.

    Do you have any plans for implementing

    String randomUUID() instead of JDK UUID randomUUID()

    that will not allocate additional UUID object?

    wontfix 
    opened by doom369 5
  • CVE-2020-23921 vulnerability reported on fast-uuid

    CVE-2020-23921 vulnerability reported on fast-uuid

    OWASP vulnerability scan has reported CVE-2020-23921 https://nvd.nist.gov/vuln/detail/CVE-2020-23921 on fast-uuid jar. Can we check if this is a real vulnerability for this library and if yes what can be done to mitigate the same

    opened by babuv2 4
  • `toString` without the dashes

    `toString` without the dashes

    I realize a proper UUID has dash-delimiters. For my project specifically, we want those gone (to make UUID copyable from the commandline easily).

    Any chance you'd accept a PR for toStringWithoutDashes?

    opened by markusthoemmes 2
  • Publish to Maven Central

    Publish to Maven Central

    I'm opening this issue in the interest of transparency. I fully intend to publish this to Maven Central, but am traveling this week. I'll get it published as soon as I'm back home this weekend and can get my hands on my signing keys.

    opened by jchambers 1
  • Fold benchmarks into the main project

    Fold benchmarks into the main project

    Rather than having benchmarks floating in space as a weird semi-detached project, this brings them into the main project as "tests." Note that this has the side effect of bumping the minimum supported JDK up to Java 7.

    opened by jchambers 0
  • Bump junit from 4.12 to 4.13.1

    Bump junit from 4.12 to 4.13.1

    Bumps junit from 4.12 to 4.13.1.

    Release notes

    Sourced from junit's releases.

    JUnit 4.13.1

    Please refer to the release notes for details.

    JUnit 4.13

    Please refer to the release notes for details.

    JUnit 4.13 RC 2

    Please refer to the release notes for details.

    JUnit 4.13 RC 1

    Please refer to the release notes for details.

    JUnit 4.13 Beta 3

    Please refer to the release notes for details.

    JUnit 4.13 Beta 2

    Please refer to the release notes for details.

    JUnit 4.13 Beta 1

    Please refer to the release notes for details.

    Commits
    • 1b683f4 [maven-release-plugin] prepare release r4.13.1
    • ce6ce3a Draft 4.13.1 release notes
    • c29dd82 Change version to 4.13.1-SNAPSHOT
    • 1d17486 Add a link to assertThrows in exception testing
    • 543905d Use separate line for annotation in Javadoc
    • 510e906 Add sub headlines to class Javadoc
    • 610155b Merge pull request from GHSA-269g-pwp5-87pp
    • b6cfd1e Explicitly wrap float parameter for consistency (#1671)
    • a5d205c Fix GitHub link in FAQ (#1672)
    • 3a5c6b4 Deprecated since jdk9 replacing constructor instance of Double and Float (#1660)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Look up hex values in an array instead calculating them repeatedly

    Look up hex values in an array instead calculating them repeatedly

    This implements almost exactly the same approach as #3, but includes a little more error checking. In benchmarks, this approach is about 5% slower than @agrison's, but it's still roughly a 4x improvement over master (which was itself a 4x improvement over the stock implementation).

    opened by jchambers 0
  • Create `toStringBytes` method that returns a `byte[]` instead of a String

    Create `toStringBytes` method that returns a `byte[]` instead of a String

    For even better performance, it could even take an existing byte[] array, fill it with values and not create any garbage. This would be great especially on Java 9+, where char[] aren't the internal representations of UUID Strings anymore. It would be cool if it would a get a byte array for source as input (see https://github.com/jchambers/fast-uuid/issues/9), maybe even with an offset as length, like System.arraycopy or Arrays.equals does.

    opened by paplorinc 1
  • Add method to create hash String from byte[] and long/long

    Add method to create hash String from byte[] and long/long

    Hey, thanks for sharing this little library!

    Since UUIDs are sometimes the result of hashing calculations (e.g. MessageDigest.getInstance("md5")#digest), it may make sense to extend the lib with FastUUID.toString(byte[] uuid) and FastUUID.toString(long mostSignificantBits, long leastSignificantBits).

    I would gladly contribute this change, if you think it makes sense.

    opened by paplorinc 3
Releases(fast-uuid-0.2.0)
Owner
Jon Chambers
Jon Chambers
Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Netflix, Inc. 1.1k Dec 25, 2022
LWJGL is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL, Vulkan), audio (OpenAL), parallel computing (OpenCL, CUDA) and XR (OpenVR, LibOVR) applications.

LWJGL - Lightweight Java Game Library 3 LWJGL (https://www.lwjgl.org) is a Java library that enables cross-platform access to popular native APIs usef

Lightweight Java Game Library 4k Dec 29, 2022
A modern I/O library for Android, Kotlin, and Java.

Okio See the project website for documentation and APIs. Okio is a library that complements java.io and java.nio to make it much easier to access, sto

Square 8.2k Dec 31, 2022
RxJava – Reactive Extensions for the JVM – a library for composing asynchronous and event-based programs using observable sequences for the Java VM.

RxJava: Reactive Extensions for the JVM RxJava is a Java VM implementation of Reactive Extensions: a library for composing asynchronous and event-base

ReactiveX 46.7k Dec 30, 2022
Jalgorithm is an open-source Java library which has implemented various algorithms and data structure

We loved Java and algorithms, so We made Jalgorithm ❤ Jalgorithm is an open-source Java library which has implemented various algorithms and data stru

Muhammad Karbalaee 35 Dec 15, 2022
A high performance caching library for Java

Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release. C

Ben Manes 13k Jan 5, 2023
Java port of a concurrent trie hash map implementation from the Scala collections library

About This is a Java port of a concurrent trie hash map implementation from the Scala collections library. It is almost a line-by-line conversion from

null 147 Oct 31, 2022
Java library for the HyperLogLog algorithm

java-hll A Java implementation of HyperLogLog whose goal is to be storage-compatible with other similar offerings from Aggregate Knowledge. NOTE: This

Aggregate Knowledge (a Neustar service) 296 Dec 30, 2022
A simple integer compression library in Java

JavaFastPFOR: A simple integer compression library in Java License This code is released under the Apache License Version 2.0 http://www.apache.org/li

Daniel Lemire 487 Dec 30, 2022
A Persistent Java Collections Library

PCollections A Persistent Java Collections Library Overview PCollections serves as a persistent and immutable analogue of the Java Collections Framewo

harold cooper 708 Dec 28, 2022
A fork of Cliff Click's High Scale Library. Improved with bug fixes and a real build system.

High Scale Lib This is Boundary's fork of Cliff Click's high scale lib. We will be maintaining this fork with bug fixes, improvements and versioned bu

BMC TrueSight Pulse (formerly Boundary) 402 Jan 2, 2023
A Primitive Collection library that reduces memory usage and improves performance

Primitive-Collections This is a Simple Primitive Collections Library i started as a hobby Project. It is based on Java's Collection Library and FastUt

Speiger 26 Dec 25, 2022
Library for creating In-memory circular buffers that use direct ByteBuffers to minimize GC overhead

Overview This project aims at creating a simple efficient building block for "Big Data" libraries, applications and frameworks; thing that can be used

Tatu Saloranta 132 Jul 28, 2022
Reactive Streams Utilities - Future standard utilities library for Reactive Streams.

Reactive Streams Utilities This is an exploration of what a utilities library for Reactive Streams in the JDK might look like. Glossary: A short gloss

Lightbend 61 May 27, 2021
Zero-dependency Reactive Streams publishers library

⚡️ Mutiny Zero: a zero-dependency Reactive Streams publishers library for Java Mutiny Zero is a minimal API for creating reactive-streams compliant pu

SmallRye 14 Dec 14, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022
gRPC and protocol buffers for Android, Kotlin, and Java.

Wire “A man got to have a code!” - Omar Little See the project website for documentation and APIs. As our teams and programs grow, the variety and vol

Square 3.9k Jan 5, 2023