Comparison between Java and Common Lisp solutions to a phone-encoding problem described by Prechelt

Overview

Prechelt Phone Number Encoding

This project implements the phone number encoding described by Lutz Prechelt in his article for the COMMUNICATIONS OF THE ACM (October 1999/Vol. 42, No. 10), Comparing Java vs. C/C++ Efficiency Differences to Interpersonal Differences, later re-used for a more extended paper, An empirical comparisonofC, C++, Java,Perl, Python, Rexx, and Tclfor asearch/string-processing program from March, 2000.

The same problem was later used by Ron Garret (aka Erann Gat) on Lisp as an alternative to Java, also from the year 2000.

The instructions given to participants of the latter study (which was slightly adapted from the original paper, and used to implement my own solution in this repository) can be found at flownet.com.

Peter Norvig, after finding out about the Lisp paper, wrote his own Lisp program to solve the problem.

His solution is included in this repository at src/lisp/main.lisp as a baseline for performance measurements (see benchmark.sh).

Structure of this repository

The solution I first came up with is in src/java/Main.java.

Later, I ported Norvig's Lisp solution to Java (src/java/Main2.java) and Rust (src/rust/phone_encoder/src/main.rs).

In order to be able to benchmark the programs, I created a few utilities in Java and Rust:

To run the benchmarks, a shell script, benchmark.sh, is used.

Compiling and running

Requirements

The benchmarks currently require the following tools:

  • Java 16+.
  • Rust and Cargo (tested with rustc 1.53.0 (53cb7b09b 2021-06-17)).
  • SBCL, Steel Bank Common Lisp compiler.

Java

Run this command to compile the Java solutions:

$ javac src/java/*.java -d build

To compile the utilities, run:

$ javac src/java/util/*.java -d build

Rust

Compile the Rust solution:

$ cd src/rust/phone_encoder
$ cargo build --release

Compile the Benchmark runner:

$ cd src/rust/benchmark_runner
$ cargo build --release

Lisp

No compilation is required.

Running

Java

$ java -cp build Main dictionary.txt input.txt

For all programs, you can use the smaller dictionary at tests/words.txt and input file tests/numbers.txt to print just a few results.

Rust

$ cp src/rust/phone_encoder/target/release/phone_encoder .
$ ./phone_encoder dictionary.txt input.txt

Lisp

$ sbcl --script src/lisp/main.lisp dictionary.txt input.txt

Testing

To verify that your program works correctly execute it with arguments dictionary.txt input.txt, piping its output to a file, say prog_out.txt, then run:

$ diff -q <(sort prog_out.txt) <(sort output.txt)

If diff exits with 0 and does not print anything, your program is good.

The benchmarks generate input files of different sizes by running my Java utility. If you want to try your solution with larger inputs, try this, for example, to generate 50,000 random phone numbers:

$ java -cp build/util util.GeneratePhoneNumbers 50000 > phones_50_000.txt

To partially check your solution works with the larger inputs (the checker cannot guarantee that the digit-substitution algorithm is perfectly correct), run my Java utility as follows:

$ java -cp build/util util.OutputChecker dictionary.txt program_output.txt

Benchmarks

I've published the benchmarks results on a Google Docs Spreadsheet.

To run it yourself, checkout this repository and run:

See the requirements above first.

./benchmark.sh

Adding a new program to the benchmarks

I don't currently intend to accept pull requests to this repository because I used the programs in this repository on an article I wrote on my website and I don't want it to become stale.

However, feel free to fork it and modify it as you see fit!

  1. Add your program to the appropriate folder under src/lang/ (e.g. src/lisp/main.lisp).
  2. If it requires a new compiler, update README.md's Requirements section.
  3. Edit benchmark.sh as follows:
    • if it requires compilation, add the compilation command after the echo "Compiling Rust sources" block.
    • add the command to run your program to the COMMANDS array.

Solutions in other languages

I've found the following other solutions to Prechelt's phone number encoding so far:

This repository includes also Dart and Julia solutions in the dart and julia branches, respectively.

Comments
  • Faster Java and added C

    Faster Java and added C

    Hi. I have written a new Java implementation that runs faster than the current ones in most cases. I have also written the same solution in C. These are the benchmark results (running on WSL) against the new Java versions and the old Rust an Lisp implementations:

    ===> java -cp build/java Faster
    Proc,Memory(bytes),Time(ms)
    java,0,288
    Proc,Memory(bytes),Time(ms)
    java,0,311
    Proc,Memory(bytes),Time(ms)
    java,0,515
    Proc,Memory(bytes),Time(ms)
    java,0,607
    ===> java -cp build/java Main
    Proc,Memory(bytes),Time(ms)
    java,0,332
    Proc,Memory(bytes),Time(ms)
    java,0,383
    Proc,Memory(bytes),Time(ms)
    java,0,546
    Proc,Memory(bytes),Time(ms)
    java,0,801
    ===> java -cp build/java Main2
    Proc,Memory(bytes),Time(ms)
    java,0,256
    Proc,Memory(bytes),Time(ms)
    java,0,409
    Proc,Memory(bytes),Time(ms)
    java,0,951
    Proc,Memory(bytes),Time(ms)
    java,0,1335
    ===> sbcl --script src/lisp/main.lisp
    Proc,Memory(bytes),Time(ms)
    sbcl,0,189
    Proc,Memory(bytes),Time(ms)
    sbcl,0,298
    Proc,Memory(bytes),Time(ms)
    sbcl,0,761
    Proc,Memory(bytes),Time(ms)
    sbcl,0,1390
    ===> ./phone_encoder
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,0,161
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,0,274
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,0,789
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,0,1457
    ===> ./phone_encoder_c
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder_c,0,104
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder_c,0,119
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder_c,0,202
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder_c,0,316
    

    The implementation is almost the same as Main.java, but I group the partial solution words by length. In Main.java completeSolution is called once for each word in the Trie:

    // each word in this trie may provide a new solution
    for ( Item word : trie.values ) {
        wordFound = true;
        var nextSolution = append( solution, new Node( word ) );
        // accept solution if we're at the end
        if ( atEndOfInput ) {
            onSolution.accept( nextSolution );
        } else {
            root.completeSolution( nextSolution, chars, index + 1, true, onSolution );
        }
    }
    

    In my implementation the equivalent method is only called once for each word length. E. g. if 123 translates to both "foo" and "bar" and the number to encode is 12345, I group "foo" and "bar", and then encode 45 only one time, instead of having to do it twice. Because of this, the partial solution becomes an irregular matrix of words instead of a list of words.

    opened by davidaf3 8
  • 156 more solutions?

    156 more solutions?

    Hi, I tried to code my solution to this study using c++17 (using 4h) following the instructions in flownet.com link.

    Running with your dictionary.txt and input.txt my program produces all the solutions in your output, but in addition outputs 156 more solutions.

    I tried to manually check two of these extra 'solutions', which seem to be correct:

    /78698/37395746288: bo"ig 8 Sud gab Tirol
                        78 69 8 373 957 46288
    657205-851: im 7 wem Pan
                65 7 205 851
    

    ???

    opened by jan100099 3
  • Low hanging fruit Julia optimizations

    Low hanging fruit Julia optimizations

    • Use an array instead of a function to get the mapping from letter to digit
    • Avoid using global variables (these are terrible because Julia can't do type inference on them)
    • Remove types from function signatures. They don't actually help much in optimizing.
    • Put all the code in a module. When using a bare script Julia has to compile everything everytime. When using a module, at least some "pre-compilation" hints are cached. This doesn't remove compilation entirely.

    time julia src/julia/phone_encoder.jl dictionary.txt input.txt goes from 2s to 1.5s on my machine. These are only improvements for the dictionary, so for large inputs there isn't much improvement.

    opened by savq 3
  • Added Go and updates

    Added Go and updates

    Hi!

    I have added a Go implementation and updated my old Java and C solutions. I ran (on WSL) the benchmark an these are the results I got:

    Proc,Run,Memory(bytes),Time(ms)
    ===> java -Xms20M -Xmx100M -cp build/java Main
    java-Main,0,439
    java-Main,0,1189
    java-Main,0,1870
    java-Main,0,3220
    java-Main,0,9520
    java-Main,0,13760
    ===> java -Xms20M -Xmx100M -cp build/java MainFaster
    java-MainFaster,0,371
    java-MainFaster,0,742
    java-MainFaster,0,1775
    java-MainFaster,0,3197
    java-MainFaster,0,1007
    java-MainFaster,0,1541
    ===> ./lisp-phone-encoder
    ./lisp-phone-encoder,0,1488
    ./lisp-phone-encoder,0,2707
    ./lisp-phone-encoder,0,7547
    ./lisp-phone-encoder,0,13030
    ./lisp-phone-encoder,0,27022
    ./lisp-phone-encoder,0,44292
    ===> ./rust
    ./rust,0,104
    ./rust,0,419
    ./rust,0,1693
    ./rust,0,3203
    ./rust,0,8665
    ./rust,0,13972
    ===> ./phone_encoder_c
    ./phone_encoder_c,0,130
    ./phone_encoder_c,0,254
    ./phone_encoder_c,0,1053
    ./phone_encoder_c,0,1551
    ./phone_encoder_c,0,490
    ./phone_encoder_c,0,908
    ===> ./phone_encoder_go
    ./phone_encoder_go,0,248
    ./phone_encoder_go,0,512
    ./phone_encoder_go,0,1549
    ./phone_encoder_go,0,2813
    ./phone_encoder_go,0,970
    ./phone_encoder_go,0,1755
    

    benchmark-result

    opened by davidaf3 6
  • Any interest in adding Julia to the mix?

    Any interest in adding Julia to the mix?

    Here is a ditto julia port of Norvig's code - https://gist.github.com/srikumarks/920fc8e61e06d0c5aa9162fbb9e606cd I don't have sbcl installed, but if the input sizes are large enough, julia trumps both the (current) rust implementation and Main2.java (openjdk17) on my machine. It uses more memory than rust and less memory than java.

    Note: Julia is infamous for adding a chunk of compile time initially which makes benchmarking hard. So the differences start showing up for n = 200000 or more. I tried 400000 and it was 4x as fast as Main2.java and 7x as fast as the (current) rust impl.

    opened by srikumarks 7
  • Add Nim

    Add Nim

    Here's a Nim version. I didn't want to install Java on my macOS and couldn't get benchmark.sh to run on a Linux machine, but maybe you're willing to install Nim to see it work?

    Installation instructions for Nim: https://nim-lang.org/install.html

    Here are my times. Nim:

    time ./nim_phone_encoder dictionary.txt phones_50_000.txt
    ...
    real	0m0.903s
    user	0m0.877s
    sys	0m0.019s
    

    Rust:

    time ./phone_encoder dictionary.txt phones_50_000.txt
    real	0m1.150s
    user	0m1.121s
    sys	0m0.024s
    
    opened by iffy 2
  • Optimized version of rust phone_encoder with ibig (around ~x2 faster for me)

    Optimized version of rust phone_encoder with ibig (around ~x2 faster for me)

    I swapped num-bigint with ibig, which explicitly focuses on performance. Also did some refactoring for it to look more Rusty and enabled Profile Guided Optimization. That said, ibig yielded most gains.

    Before:

    ===> java -cp build/java Main Proc,Memory(bytes),Time(ms) java,0,321 Proc,Memory(bytes),Time(ms) java,0,438 Proc,Memory(bytes),Time(ms) java,0,603 Proc,Memory(bytes),Time(ms) java,0,718 ===> java -cp build/java Main2 Proc,Memory(bytes),Time(ms) java,0,305 Proc,Memory(bytes),Time(ms) java,0,473 Proc,Memory(bytes),Time(ms) java,0,1013 Proc,Memory(bytes),Time(ms) java,0,1619 ===> sbcl --script src/lisp/main.lisp Proc,Memory(bytes),Time(ms) sbcl,0,92 Proc,Memory(bytes),Time(ms) sbcl,0,137 Proc,Memory(bytes),Time(ms) sbcl,0,480 Proc,Memory(bytes),Time(ms) sbcl,0,922 ===> ./phone_encoder Proc,Memory(bytes),Time(ms) ./phone_encoder,0,97 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,194 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,649 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,1223 Cleaning up

    After:

    ===> java -cp build/java Main Proc,Memory(bytes),Time(ms) java,0,325 Proc,Memory(bytes),Time(ms) java,0,422 Proc,Memory(bytes),Time(ms) java,0,619 Proc,Memory(bytes),Time(ms) java,0,858 ===> java -cp build/java Main2 Proc,Memory(bytes),Time(ms) java,0,294 Proc,Memory(bytes),Time(ms) java,0,524 Proc,Memory(bytes),Time(ms) java,0,1244 Proc,Memory(bytes),Time(ms) java,0,1702 ===> sbcl --script src/lisp/main.lisp Proc,Memory(bytes),Time(ms) sbcl,0,86 Proc,Memory(bytes),Time(ms) sbcl,0,147 Proc,Memory(bytes),Time(ms) sbcl,0,513 Proc,Memory(bytes),Time(ms) sbcl,0,875 ===> ./phone_encoder Proc,Memory(bytes),Time(ms) ./phone_encoder,0,45 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,102 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,315 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,616 Cleaning up

    My setup: Archlinux with AMD Ryzen 7 4800H

    opened by alianse777 4
  • Optimised Common Lisp code

    Optimised Common Lisp code

    My tiny take on it.

    Just some code refactoring, minor optimizations in a few places, some inlining, and additional type and optimization declarations here and there.

    I've set block compilation on resulting in better speed so please compile the file before use and change xxx.lisp to xxx.fasl in benchmark.sh.

    It is more than 2x faster than the original Common Lisp version.

    My setup: M1 MacBook Air. SBCL-2.1.7 Java Zulu16.32+15-CA (build 16.0.2+7) rustc 1.53.0

    opened by bpecsek 0
Owner
Renato Athaydes
Software developer specializing in the JVM (Java, Groovy, Kotlin), with a passion for writing great applications. I also like Dart, Go, security and protocols.
Renato Athaydes
Aix-bench, the Java benchmark for code synthesis problem.

AiXcoder NL2Code Evaluation Benchmark (aix-bench) 简体中文 Paper available: https://arxiv.org/abs/2206.13179 Introduction This is a method-level benchmark

null 28 Dec 12, 2022
Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

java-string-similarity A library implementing different string similarity and distance measures. A dozen of algorithms (including Levenshtein edit dis

Thibault Debatty 2.5k Dec 29, 2022
Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

Table-Computing Welcome to the Table-Computing GitHub. Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low la

Alibaba 34 Oct 14, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022
A Java library for quickly and efficiently parsing and writing UUIDs

fast-uuid fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compar

Jon Chambers 142 Jan 1, 2023
gRPC and protocol buffers for Android, Kotlin, and Java.

Wire “A man got to have a code!” - Omar Little See the project website for documentation and APIs. As our teams and programs grow, the variety and vol

Square 3.9k Jan 5, 2023
The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Homework #14 Table of Contents General Info Technologies Used Project Status Contact General Information Homework contains topics: Sorting an ArrayLis

Mykhailo 1 Feb 12, 2022
Immutable key/value store with efficient space utilization and fast reads. They are ideal for the use-case of tables built by batch processes and shipped to multiple servers.

Minimal Perfect Hash Tables About Minimal Perfect Hash Tables are an immutable key/value store with efficient space utilization and fast reads. They a

Indeed Engineering 92 Nov 22, 2022
High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

Real Logic 2.5k Jan 5, 2023
Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Netflix, Inc. 1.1k Dec 25, 2022
Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

Roman Leventov 967 Nov 14, 2022
LWJGL is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL, Vulkan), audio (OpenAL), parallel computing (OpenCL, CUDA) and XR (OpenVR, LibOVR) applications.

LWJGL - Lightweight Java Game Library 3 LWJGL (https://www.lwjgl.org) is a Java library that enables cross-platform access to popular native APIs usef

Lightweight Java Game Library 4k Dec 29, 2022
A modern I/O library for Android, Kotlin, and Java.

Okio See the project website for documentation and APIs. Okio is a library that complements java.io and java.nio to make it much easier to access, sto

Square 8.2k Dec 31, 2022
Immutable in-memory R-tree and R*-tree implementations in Java with reactive api

rtree In-memory immutable 2D R-tree implementation in java using RxJava Observables for reactive processing of search results. Status: released to Mav

Dave Moten 999 Dec 20, 2022
RxJava – Reactive Extensions for the JVM – a library for composing asynchronous and event-based programs using observable sequences for the Java VM.

RxJava: Reactive Extensions for the JVM RxJava is a Java VM implementation of Reactive Extensions: a library for composing asynchronous and event-base

ReactiveX 46.7k Dec 30, 2022
Lightweight threads for Java, with message passing, nio, http and scheduling support.

Kilim: Continuations, Fibers, Actors and message passing for the JVM

Sriram Srinivasan 1.7k Jan 3, 2023
Jalgorithm is an open-source Java library which has implemented various algorithms and data structure

We loved Java and algorithms, so We made Jalgorithm ❤ Jalgorithm is an open-source Java library which has implemented various algorithms and data stru

Muhammad Karbalaee 35 Dec 15, 2022