A straightforward C++ to Java port of Daniel Lemire's fast_double_parser

Overview

FastDoubleParser

A straightforward C++ to Java port of Daniel Lemire's fast_double_parser.

https://github.com/lemire/fast_double_parser

Usage:

import ch.randelshofer.fastdoubleparser.FastDoubleParser;

double d = FastDoubleParser.parseDouble("1.2345");

Note: The method parseDouble takes a CharSequence as its argument. So, if your text is in a StringBuffer, you do not need to convert it to a String, because StringBuffer implements CharSequence.
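A minimal sketch of what the note means in practice. So that it runs without the library on the classpath, the hypothetical `parse` helper below stands in for `FastDoubleParser.parseDouble` and delegates to the JDK parser; only its `CharSequence` parameter type is the point here.

```java
class CharSequenceDemo {
    // Hypothetical stand-in for FastDoubleParser.parseDouble(CharSequence).
    // It delegates to the JDK so that the sketch is self-contained; the
    // toString() copy is an artifact of the stand-in only.
    static double parse(CharSequence text) {
        return Double.parseDouble(text.toString());
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("1.2345");
        StringBuilder builder = new StringBuilder("6.5e3");
        String string = "42";

        // All three types implement CharSequence, so no conversion is
        // needed at the call site.
        System.out.println(parse(buffer));
        System.out.println(parse(builder));
        System.out.println(parse(string));
    }
}
```

With the real library, the body of `parse` would be a direct call to `FastDoubleParser.parseDouble(text)`.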

The test directory contains some functional tests, and some performance tests.

How to run the performance test on a Mac:

  1. Install Java JDK 8 or higher, for example OpenJDK.
  2. Install the Xcode command line tools from Apple.
  3. Open the Terminal and execute the following commands:

Command:

 git clone https://github.com/wrandelshofer/FastDoubleParser.git
 cd FastDoubleParser 
 javac -d out -encoding utf8 -sourcepath src/main/java test/main/java/org/fastdoubleparser/parser/FastDoubleParserBenchmark.java 
 java -classpath out ch.randelshofer.fastdoubleparser.FastDoubleParserBenchmark 
 java -classpath out ch.randelshofer.fastdoubleparser.FastDoubleParserBenchmark data/canada.txt

On my Mac mini (2018) I get the following results:

Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
OpenJDK 64-Bit Server VM, Oracle Corporation, 16+36-2231

parsing random numbers in the range [0,1)
=== number of trials 32 =====
FastDoubleParser.parseDouble  MB/s avg: 341.838343, min: 311.14, max: 367.31
Double.parseDouble            MB/s avg: 82.909605, min: 75.72, max: 89.24
Speedup FastDoubleParser vs Double: 4.123025

parsing numbers in file data/canada.txt
read 111126 lines
=== number of trials 32 =====
FastDoubleParser.parseDouble  MB/s avg: 319.496176, min: 216.37, max: 365.83
Double.parseDouble            MB/s avg: 71.851737, min: 46.78, max: 82.76
Speedup FastDoubleParser vs Double: 4.446603

FastDoubleParser also speeds up parsing of hexadecimal float literals:

Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
OpenJDK 64-Bit Server VM, Oracle Corporation, 16+36-2231

parsing numbers in file data/0to1_hexfloats.txt
read 100000 lines
=== number of trials 32 =====
FastDoubleParser.parseDouble  MB/s avg: 256.321683, min: 204.29, max: 294.19
Double.parseDouble            MB/s avg: 44.747641, min: 27.28, max: 53.00
Speedup FastDoubleParser vs Double: 5.728161

parsing numbers in file data/canada_hexfloats.txt
read 111126 lines
=== number of trials 32 =====
FastDoubleParser.parseDouble  MB/s avg: 255.023209, min: 210.57, max: 286.97
Double.parseDouble            MB/s avg: 45.355415, min: 26.25, max: 52.34
Speedup FastDoubleParser vs Double: 5.622773

Please note that the performance gains depend a lot on the shape of the input data. Below are two test sets that are less favorable for the current implementation of the code:

Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
OpenJDK 64-Bit Server VM, Oracle Corporation, 16+36-2231

parsing numbers in data/shorts.txt
read 100000 lines
=== number of trials 32 =====
FastDoubleParser.parseDouble  MB/s avg: 124.894498, min: 69.98, max: 167.59
Double.parseDouble            MB/s avg: 88.799489, min: 51.79, max: 122.16
Speedup FastDoubleParser vs Double: 1.406478


parsing numbers in file data/FastDoubleParser_errorcases.txt
read 26916 lines
=== number of trials 32 =====
FastDoubleParser.parseDouble  MB/s avg: 73.687863, min: 26.99, max: 97.97
Double.parseDouble            MB/s avg: 81.740633, min: 35.64, max: 109.20
Speedup FastDoubleParser vs Double: 0.901484
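
The MB/s and speedup figures in the listings above can be reproduced in spirit with a small harness. The sketch below is not the project's FastDoubleParserBenchmark: it times only Double.parseDouble on random numbers in [0,1), assumes that 1 MB is 1,000,000 bytes, and uses made-up class and method names. It also shows that the reported speedup is consistent with the ratio of the two average throughputs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

class ThroughputSketch {
    // Throughput in MB/s, taking 1 MB as 1,000,000 bytes (an assumption of this sketch).
    static double megabytesPerSecond(long bytes, long nanos) {
        return (bytes / 1e6) / (nanos / 1e9);
    }

    // Speedup as the ratio of two average throughputs.
    static double speedup(double candidateMBs, double baselineMBs) {
        return candidateMBs / baselineMBs;
    }

    public static void main(String[] args) {
        // Random numbers in [0,1), similar to the first data set above.
        Random random = new Random(0);
        String[] lines = new String[100_000];
        long bytes = 0;
        for (int i = 0; i < lines.length; i++) {
            lines[i] = Double.toString(random.nextDouble());
            bytes += lines[i].getBytes(StandardCharsets.UTF_8).length;
        }

        // Sum the results so that the JIT compiler cannot drop the loop.
        double checksum = 0;
        long start = System.nanoTime();
        for (String line : lines) {
            checksum += Double.parseDouble(line);
        }
        long elapsed = System.nanoTime() - start;

        System.out.println("Double.parseDouble MB/s: " + megabytesPerSecond(bytes, elapsed));
        System.out.println("checksum: " + checksum);
        // Averages taken from the first listing above.
        System.out.println("speedup: " + speedup(341.838343, 82.909605));
    }
}
```

A single timed pass like this is noisy; a serious measurement needs warm-up iterations and several trials, as the listings above do with 32 trials.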

Comments
  • BigDecimal parser

    Thanks for all the hard work on the double and float parsers. Would there be any chance that you could consider adding support for BigDecimal parsing? A lot of the low level parser could be reused.

    opened by pjfanning 12
  • FastDoubleParser doesn't support all input formats as the default OpenJDK Float/Double parsers

    The FastDoubleParser, which was recently introduced in Jackson through this issue https://github.com/FasterXML/jackson-core/issues/577, is 3-4 times faster than the parser implemented in OpenJDK. This is fantastic news, since many numerical processing workloads would benefit from this.

    However, the OpenJDK Double/Float parsers support a variety of input formats that the FastDoubleParser fails on, so it can cause unexpected regressions when used.

    For example, the FastDoubleParser will fail with a NumberFormatException on these example patterns (there are more to be found in the OpenJDK Double/Float tests):

    1.1e-23f
    0x.003p12f
    0x1.17742db862a4P-1d

    I think apart from the first one in this list, the rest are all hexadecimal if I'm not mistaken.
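
A small check against the JDK parser only, as a sketch: it shows that Double.parseDouble accepts a trailing type suffix (f or d) as well as hexadecimal significands with a binary exponent. The class name is made up for illustration.

```java
class JdkFloatSyntaxDemo {
    // The three patterns quoted in the issue above.
    static final String[] INPUTS = {"1.1e-23f", "0x.003p12f", "0x1.17742db862a4P-1d"};

    public static void main(String[] args) {
        for (String input : INPUTS) {
            // Double.parseDouble throws NumberFormatException on illegal input,
            // so reaching the println means the pattern was accepted.
            System.out.println(input + " -> " + Double.parseDouble(input));
        }
    }
}
```

The second pattern is 0x.003 (that is, 3/4096) scaled by 2^12, so it parses to 3.0.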

    opened by grcevski 10
  • The parser throws StringIndexOutOfBoundsException/ArrayIndexOutOfBoundsException for some inputs

    The parser throws StringIndexOutOfBoundsException/ArrayIndexOutOfBoundsException for some inputs. For example with the following input: "0x".

    This issue has been discovered in https://github.com/FasterXML/jackson-core/issues/809

    The only exceptions that the parser may throw are:

    • NumberFormatException when the input is illegal
    • OutOfMemoryError when the JVM fails to allocate memory for objects created by the parser
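
One way a caller could enforce this contract while a fix is pending is sketched below. The wrapper and the deliberately buggy toy parser are hypothetical and are not part of the library; the sketch relies only on the fact that StringIndexOutOfBoundsException and ArrayIndexOutOfBoundsException both extend IndexOutOfBoundsException.

```java
import java.util.function.ToDoubleFunction;

class StrictExceptions {
    // Hypothetical wrapper: runs any parser and reports index errors
    // as NumberFormatException, keeping the original as the cause.
    static double parseOrNfe(ToDoubleFunction<CharSequence> parser, CharSequence input) {
        try {
            return parser.applyAsDouble(input);
        } catch (IndexOutOfBoundsException e) {
            NumberFormatException nfe = new NumberFormatException("illegal input: " + input);
            nfe.initCause(e);
            throw nfe;
        }
    }

    public static void main(String[] args) {
        // A deliberately buggy toy parser that reads past the end of "0x".
        ToDoubleFunction<CharSequence> buggy = s -> s.charAt(2);
        try {
            parseOrNfe(buggy, "0x");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```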
    opened by wrandelshofer 6
  • make it allocation free on happy path

    https://github.com/wrandelshofer/FastDoubleParser/blob/8df68712f159304a828b4ddfc11a14a54ec6c973/src/main/java/ch/randelshofer/fastdoubleparser/FastDoubleParser.java#L530

    why allocate here?? by that time you know it's not a NaN for sure... so instead of returning null you can just return Double.NaN or whatever special constant.

    lack of jmh tests is also troubling :(

    opened by nevgeniev 6
  • Introduce constants for parsers

    The *FromByteArray, *FromCharArray and *FromCharSequence classes are stateless and their references in the *Parser classes can be made constants.

    I was working on a patch for Jackson to make double parsing work on char[] directly without allocating a String. When profiling, the allocation rate did not go down as much as expected. The reason was that I got a lot of JavaDoubleBitsFromCharArray allocations in JavaDoubleParser. In theory, escape analysis should catch these and scalar replacement should eliminate them. It does not. I haven't looked into it; maybe the method is already too large.

    [Images: allocations-before, gcs-before]

    After the changes the allocations and GCs are gone.

    [Images: allocations-after, gcs-after]

    I'm seeing a small performance improvement but it's hard to be sure as the deviation is still quite large.
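
The change proposed at the start of this issue can be sketched as follows. The class names are hypothetical, and the toy helper handles only unsigned decimal integers; the point is that a helper without fields can be shared through a constant instead of being allocated on every call.

```java
class StatelessParserHolder {
    // Hypothetical stateless helper, standing in for classes such as
    // JavaDoubleBitsFromCharArray. It has no fields, so one instance
    // can be shared by all callers and threads.
    static final class DigitsFromCharArray {
        double parse(char[] chars, int offset, int length) {
            if (length <= 0) {
                throw new NumberFormatException("empty input");
            }
            double value = 0;
            for (int i = offset; i < offset + length; i++) {
                char c = chars[i];
                if (c < '0' || c > '9') {
                    throw new NumberFormatException("illegal digit: " + c);
                }
                value = value * 10 + (c - '0');
            }
            return value;
        }
    }

    // Created once, instead of "new DigitsFromCharArray()" per call.
    private static final DigitsFromCharArray CHAR_ARRAY_PARSER = new DigitsFromCharArray();

    static double parseDouble(char[] chars, int offset, int length) {
        return CHAR_ARRAY_PARSER.parse(chars, offset, length);
    }

    public static void main(String[] args) {
        char[] chars = "12345".toCharArray();
        System.out.println(parseDouble(chars, 0, chars.length));
    }
}
```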

    opened by marschall 5
  • Large number of incorrect parsing

    For https://github.com/fastfloat/fast_float, we have extensive tests. I have run them on FastDoubleParser and found many failures, which I have collected in this gist:

    https://gist.github.com/lemire/641a34589c36747f6d24ed6d29ac75f0

    The algorithm at https://github.com/fastfloat/fast_float handles all of these cases correctly.

    You may refer to https://arxiv.org/abs/2101.11408 or to the C# port at https://github.com/CarlVerret/csFastFloat

    opened by lemire 5
  • SWAR routines accept invalid non-digit chars/bytes

    https://github.com/wrandelshofer/FastDoubleParser/blob/0903817a765b25e654f02a5a9d4f1476c98a80c9/src/main/java/ch.randelshofer.fastdoubleparser/ch/randelshofer/fastdoubleparser/FastDoubleSimd.java#L117

    Use 0x76 instead of 0x46 byte for detection of invalid digits in "numbers" like 1X345678.
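
For readers unfamiliar with SWAR (SIMD within a register), the sketch below checks eight ASCII bytes at once. It uses a well-known nibble-based formulation of the test, not the routine in FastDoubleSimd and not the 0x46/0x76 constants discussed in this issue; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class SwarDigits {
    // Packs 8 bytes starting at offset into a long. The byte order does
    // not matter for the per-byte digit test below.
    static long readLong(byte[] a, int offset) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v |= (a[offset + i] & 0xFFL) << (8 * i);
        }
        return v;
    }

    // True if every byte of val is an ASCII digit '0'..'9' (0x30..0x39).
    static boolean isEightDigits(long val) {
        // The high nibble of every byte must be 3.
        long highNibbles = val & 0xF0F0F0F0F0F0F0F0L;
        // After adding 6, the high nibble must still be 3, which rules
        // out 0x3A..0x3F. It is moved into the low nibble position.
        long afterAdd = ((val + 0x0606060606060606L) & 0xF0F0F0F0F0F0F0F0L) >>> 4;
        return (highNibbles | afterAdd) == 0x3333333333333333L;
    }

    static boolean isEightDigits(String s) {
        return isEightDigits(readLong(s.getBytes(StandardCharsets.ISO_8859_1), 0));
    }

    public static void main(String[] args) {
        System.out.println(isEightDigits("12345678"));
        System.out.println(isEightDigits("1X345678"));
    }
}
```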

    opened by plokhotnyuk 4
  • float parser

    Hi - thanks for all the great work on the double parser. I've been experimenting with it for possible inclusion in jackson-core.

    Parsing floats using the double parser is also much faster than using Float.parseFloat, but unfortunately casting doubles to floats can often give you a different result from plain Float.parseFloat.

    Would it be possible to consider also supporting a dedicated float parser?

    An example is 7.006492321624086e-46 which Float.parseFloat returns as 1.4E-45 but using FastDoubleParser:

            double dbl = FastDoubleParser.parseDouble("7.006492321624086e-46");
            System.out.println("double=" + dbl); //7.006492321624085E-46
            System.out.println("float=" + (float)dbl); //0.0
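
The same effect can be reproduced with the JDK parsers alone, which suggests that it comes from rounding twice rather than from FastDoubleParser itself. In the sketch below (class name made up), the input lies just above half of Float.MIN_VALUE. Parsed directly as a float it rounds up to Float.MIN_VALUE; parsed as a double it first rounds to exactly half of Float.MIN_VALUE, and the cast then resolves that tie to zero.

```java
class DoubleRoundingDemo {
    static final String INPUT = "7.006492321624086e-46";

    public static void main(String[] args) {
        // One rounding step: decimal text to the nearest float.
        float direct = Float.parseFloat(INPUT);
        // Two rounding steps: decimal text to double, then double to float.
        float viaDouble = (float) Double.parseDouble(INPUT);

        System.out.println("Float.parseFloat: " + direct);
        System.out.println("(float) Double.parseDouble: " + viaDouble);
    }
}
```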
    
    opened by pjfanning 4
  • Double.parseDouble(...) != FastDoubleParser.parseDouble(...)

    I have found another input string for which the return values of Double.parseDouble and FastDoubleParser.parseDouble differ. This one is less important than #6 though as it implies only a very minor loss in precision:

    Double.parseDouble("-2.2222222222223e-322"): -2.2E-322
    FastDoubleParser.parseDouble("-2.2222222222223e-322"): 0.0
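
The input in this report falls into the subnormal range of double, where values are whole multiples of Double.MIN_VALUE. A sketch using only the JDK parser (class name made up) makes the expected result visible as such a multiple:

```java
class SubnormalDemo {
    static final String INPUT = "-2.2222222222223e-322";

    public static void main(String[] args) {
        double d = Double.parseDouble(INPUT);
        System.out.println(d);
        // For a positive subnormal double, the raw bits are the number
        // of Double.MIN_VALUE steps above zero.
        System.out.println(Double.doubleToLongBits(Math.abs(d)));
    }
}
```

A parser that flushes such inputs to zero loses the whole value, not just its last digits.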
    

    Both this issue and #6 have been found with the open-source JVM fuzzer Jazzer. If you are interested in these kinds of findings, I could add the fuzzer to the project as a PR.

    opened by fmeum 4
  • Jackson OSS Fuzz issue

    Relates to https://github.com/FasterXML/jackson-core/issues/809

    This change is based on the stacktrace in that issue. I don't yet have the value that caused the failure.

    opened by pjfanning 3
  • Publish a multi-release JAR

    "We use your java8 code in jackson-core. If you publish a jar with your java8 branch code that would be great - we would change our build to use your published jars and that shades the class packages to include them in jackson-core jar.

    One solution would be to append '-java8' to the artifact name (and '-java17' for the java17 jar). Or maven supports 'classifiers' which basically lead to a similar result."

    Originally posted by @pjfanning in https://github.com/wrandelshofer/FastDoubleParser/issues/22#issuecomment-1318507290

    opened by wrandelshofer 2
  • possible performance issue with very big doubles

    JavaDoubleParser seems to be slower than Double.parseDouble for very large numbers (thousands of digits). Malicious actors often create input files with large numbers to try to cause denial of service issues.

    I have a jmh benchmark at https://github.com/pjfanning/jackson-number-parse-bench

    ./gradlew jmh

    It's worth checking the build.gradle file as I have a param that controls which benchmark to run.

    jmh {
        includes = ['org.example.jackson.bench.DoubleParserBench']
    }
    

    I'm wondering if it would be possible to disregard the least significant digits. If there are 1000 digits, only the first 30 or 40 digits should really impact the double value. Even if you were conservative and limited it to 100 or 200 digits, this would limit the risk vector.
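
A simpler guard that a caller can apply today is sketched below. Note that it swaps in a different technique: it rejects overlong input instead of truncating it, because dropping low-order digits can change the correctly rounded result in halfway cases. The method name and the limit of 200 characters are hypothetical, and the JDK parser stands in for the library.

```java
class BoundedParse {
    // Hypothetical limit on the input length, in characters.
    static final int MAX_INPUT_LENGTH = 200;

    // Rejects overlong input before any parsing work is done.
    static double parseBounded(CharSequence text, int maxLength) {
        if (text.length() > maxLength) {
            throw new NumberFormatException("input too long: " + text.length() + " characters");
        }
        return Double.parseDouble(text.toString());
    }

    public static void main(String[] args) {
        System.out.println(parseBounded("1.5", MAX_INPUT_LENGTH));
        String huge = new String(new char[1000]).replace('\0', '9');
        try {
            parseBounded(huge, MAX_INPUT_LENGTH);
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```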

    opened by pjfanning 24
  • BigInteger parser

    @wrandelshofer I'm using v0.5.2 and have found that `JavaBigIntegerParser.parseBigInteger(CharSequence str)` accepts hex values like "AAAA" but `new BigInteger(String)` throws a NumberFormatException with "AAAA".
    

    Would it be possible to support being able to disable hex support?
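
For reference, the JDK behaviour that the request compares against can be checked with a short sketch (class and method names made up): the one-argument BigInteger constructor reads decimal digits only, and hexadecimal input has to be requested through an explicit radix.

```java
import java.math.BigInteger;

class BigIntegerRadixDemo {
    // True if the JDK accepts the text as a decimal BigInteger.
    static boolean isDecimalBigInteger(String text) {
        try {
            new BigInteger(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDecimalBigInteger("AAAA"));
        // With an explicit radix of 16 the same text is accepted.
        System.out.println(new BigInteger("AAAA", 16));
    }
}
```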

    Originally posted by @pjfanning in https://github.com/wrandelshofer/FastDoubleParser/issues/24#issuecomment-1332929009

    opened by wrandelshofer 6
Releases(v0.5.4)