Comparison between Java and Common Lisp solutions to a phone-encoding problem described by Prechelt

Last update: Nov 30, 2021

Overview

Prechelt Phone Number Encoding

This project implements the phone number encoding described by Lutz Prechelt in his article for the COMMUNICATIONS OF THE ACM (October 1999/Vol. 42, No. 10), Comparing Java vs. C/C++ Efficiency Differences to Interpersonal Differences, later re-used for a more extended paper, An empirical comparisonofC, C++, Java,Perl, Python, Rexx, and Tclfor asearch/string-processing program from March, 2000.

The same problem was later used by Ron Garret (aka Erann Gat) on Lisp as an alternative to Java, also from the year 2000.

The instructions given to participants of the latter study (which was slightly adapted from the original paper, and used to implement my own solution in this repository) can be found at flownet.com.

Peter Norvig, after finding out about the Lisp paper, wrote his own Lisp program to solve the problem.

His solution is included in this repository at src/lisp/main.lisp as a baseline for performance measurements (see benchmark.sh).

Structure of this repository

The solution I first came up with is in src/java/Main.java.

Later, I ported Norvig's Lisp solution to Java (src/java/Main2.java) and Rust (src/rust/phone_encoder/src/main.rs).

In order to be able to benchmark the programs, I created a few utilities in Java and Rust:

src/java/util/GeneratePhoneNumbers.java generates input files with random phone numbers.
src/java/util/LineCount.java counts lines of code (LOC) on given files.
src/java/util/OutputChecker.java can check the output of a program for correctness.
src/rust/benchmark_runner is a Rust project that compiles a binary that can run programs and benchmark them.

To run the benchmarks, a shell script, benchmark.sh, is used.

Compiling and running

Requirements

The benchmarks currently require the following tools:

Java 16+.
Rust and Cargo (tested with rustc 1.53.0 (53cb7b09b 2021-06-17)).
SBCL, Steel Bank Common Lisp compiler.

Java

Run this command to compile the Java solutions:

$ javac src/java/*.java -d build

To compile the utilities, run:

$ javac src/java/util/*.java -d build

Rust

Compile the Rust solution:

$ cd src/rust/phone_encoder
$ cargo build --release

Compile the Benchmark runner:

$ cd src/rust/benchmark_runner
$ cargo build --release

Lisp

No compilation is required.

Running

Java

$ java -cp build Main dictionary.txt input.txt

For all programs, you can use the smaller dictionary at tests/words.txt and input file tests/numbers.txt to print just a few results.

Rust

$ cp src/rust/phone_encoder/target/release/phone_encoder .
$ ./phone_encoder dictionary.txt input.txt

Lisp

$ sbcl --script src/lisp/main.lisp dictionary.txt input.txt

Testing

To verify that your program works correctly execute it with arguments dictionary.txt input.txt, piping its output to a file, say prog_out.txt, then run:

$ diff -q <(sort prog_out.txt) <(sort output.txt)

If diff exits with 0 and does not print anything, your program is good.

The benchmarks generate input files of different sizes by running my Java utility. If you want to try your solution with larger inputs, try this, for example, to generate 50,000 random phone numbers:

$ java -cp build/util util.GeneratePhoneNumbers 50000 > phones_50_000.txt

To partially check your solution works with the larger inputs (the checker cannot guarantee that the digit-substitution algorithm is perfectly correct), run my Java utility as follows:

$ java -cp build/util util.OutputChecker dictionary.txt program_output.txt

Benchmarks

I've published the benchmarks results on a Google Docs Spreadsheet.

To run it yourself, checkout this repository and run:

See the requirements above first.

./benchmark.sh

Adding a new program to the benchmarks

I don't currently intend to accept pull requests to this repository because I used the programs in this repository on an article I wrote on my website and I don't want it to become stale.

However, feel free to fork it and modify it as you see fit!

Add your program to the appropriate folder under src/lang/ (e.g. src/lisp/main.lisp).
If it requires a new compiler, update README.md's Requirements section.
Edit benchmark.sh as follows:
- if it requires compilation, add the compilation command after the echo "Compiling Rust sources" block.
- add the command to run your program to the COMMANDS array.

Solutions in other languages

I've found the following other solutions to Prechelt's phone number encoding so far:

This repository includes also Dart and Julia solutions in the dart and julia branches, respectively.

Comments

Faster Java and added C

Hi. I have written a new Java implementation that runs faster than the current ones in most cases. I have also written the same solution in C. These are the benchmark results (running on WSL) against the new Java versions and the old Rust an Lisp implementations:

===> java -cp build/java Faster
Proc,Memory(bytes),Time(ms)
java,0,288
Proc,Memory(bytes),Time(ms)
java,0,311
Proc,Memory(bytes),Time(ms)
java,0,515
Proc,Memory(bytes),Time(ms)
java,0,607
===> java -cp build/java Main
Proc,Memory(bytes),Time(ms)
java,0,332
Proc,Memory(bytes),Time(ms)
java,0,383
Proc,Memory(bytes),Time(ms)
java,0,546
Proc,Memory(bytes),Time(ms)
java,0,801
===> java -cp build/java Main2
Proc,Memory(bytes),Time(ms)
java,0,256
Proc,Memory(bytes),Time(ms)
java,0,409
Proc,Memory(bytes),Time(ms)
java,0,951
Proc,Memory(bytes),Time(ms)
java,0,1335
===> sbcl --script src/lisp/main.lisp
Proc,Memory(bytes),Time(ms)
sbcl,0,189
Proc,Memory(bytes),Time(ms)
sbcl,0,298
Proc,Memory(bytes),Time(ms)
sbcl,0,761
Proc,Memory(bytes),Time(ms)
sbcl,0,1390
===> ./phone_encoder
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,161
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,274
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,789
Proc,Memory(bytes),Time(ms)
./phone_encoder,0,1457
===> ./phone_encoder_c
Proc,Memory(bytes),Time(ms)
./phone_encoder_c,0,104
Proc,Memory(bytes),Time(ms)
./phone_encoder_c,0,119
Proc,Memory(bytes),Time(ms)
./phone_encoder_c,0,202
Proc,Memory(bytes),Time(ms)
./phone_encoder_c,0,316

The implementation is almost the same as Main.java, but I group the partial solution words by length. In Main.java completeSolution is called once for each word in the Trie:

// each word in this trie may provide a new solution
for ( Item word : trie.values ) {
    wordFound = true;
    var nextSolution = append( solution, new Node( word ) );
    // accept solution if we're at the end
    if ( atEndOfInput ) {
        onSolution.accept( nextSolution );
    } else {
        root.completeSolution( nextSolution, chars, index + 1, true, onSolution );
    }
}

In my implementation the equivalent method is only called once for each word length. E. g. if 123 translates to both "foo" and "bar" and the number to encode is 12345, I group "foo" and "bar", and then encode 45 only one time, instead of having to do it twice. Because of this, the partial solution becomes an irregular matrix of words instead of a list of words.

opened by davidaf3 8

156 more solutions?
Hi, I tried to code my solution to this study using c++17 (using 4h) following the instructions in flownet.com link.

Running with your dictionary.txt and input.txt my program produces all the solutions in your output, but in addition outputs 156 more solutions.

I tried to manually check two of these extra 'solutions', which seem to be correct:

/78698/37395746288: bo"ig 8 Sud gab Tirol 78 69 8 373 957 46288 657205-851: im 7 wem Pan 65 7 205 851

???
opened by jan100099 3
Low hanging fruit Julia optimizations
Use an array instead of a function to get the mapping from letter to digit

Avoid using global variables (these are terrible because Julia can't do type inference on them)

Remove types from function signatures. They don't actually help much in optimizing.

Put all the code in a module. When using a bare script Julia has to compile everything everytime. When using a module, at least some "pre-compilation" hints are cached. This doesn't remove compilation entirely.

time julia src/julia/phone_encoder.jl dictionary.txt input.txt goes from 2s to 1.5s on my machine. These are only improvements for the dictionary, so for large inputs there isn't much improvement.
opened by savq 3

Added Go and updates

Hi!

I have added a Go implementation and updated my old Java and C solutions. I ran (on WSL) the benchmark an these are the results I got:

Proc,Run,Memory(bytes),Time(ms)
===> java -Xms20M -Xmx100M -cp build/java Main
java-Main,0,439
java-Main,0,1189
java-Main,0,1870
java-Main,0,3220
java-Main,0,9520
java-Main,0,13760
===> java -Xms20M -Xmx100M -cp build/java MainFaster
java-MainFaster,0,371
java-MainFaster,0,742
java-MainFaster,0,1775
java-MainFaster,0,3197
java-MainFaster,0,1007
java-MainFaster,0,1541
===> ./lisp-phone-encoder
./lisp-phone-encoder,0,1488
./lisp-phone-encoder,0,2707
./lisp-phone-encoder,0,7547
./lisp-phone-encoder,0,13030
./lisp-phone-encoder,0,27022
./lisp-phone-encoder,0,44292
===> ./rust
./rust,0,104
./rust,0,419
./rust,0,1693
./rust,0,3203
./rust,0,8665
./rust,0,13972
===> ./phone_encoder_c
./phone_encoder_c,0,130
./phone_encoder_c,0,254
./phone_encoder_c,0,1053
./phone_encoder_c,0,1551
./phone_encoder_c,0,490
./phone_encoder_c,0,908
===> ./phone_encoder_go
./phone_encoder_go,0,248
./phone_encoder_go,0,512
./phone_encoder_go,0,1549
./phone_encoder_go,0,2813
./phone_encoder_go,0,970
./phone_encoder_go,0,1755

benchmark-result

opened by davidaf3 6

Any interest in adding Julia to the mix?

Here is a ditto julia port of Norvig's code - https://gist.github.com/srikumarks/920fc8e61e06d0c5aa9162fbb9e606cd I don't have sbcl installed, but if the input sizes are large enough, julia trumps both the (current) rust implementation and Main2.java (openjdk17) on my machine. It uses more memory than rust and less memory than java.

Note: Julia is infamous for adding a chunk of compile time initially which makes benchmarking hard. So the differences start showing up for n = 200000 or more. I tried 400000 and it was 4x as fast as Main2.java and 7x as fast as the (current) rust impl.

opened by srikumarks 7
Add Nim
Here's a Nim version. I didn't want to install Java on my macOS and couldn't get benchmark.sh to run on a Linux machine, but maybe you're willing to install Nim to see it work?

Installation instructions for Nim: https://nim-lang.org/install.html

Here are my times. Nim:

time ./nim_phone_encoder dictionary.txt phones_50_000.txt ... real 0m0.903s user 0m0.877s sys 0m0.019s

Rust:

time ./phone_encoder dictionary.txt phones_50_000.txt real 0m1.150s user 0m1.121s sys 0m0.024s
opened by iffy 2
Optimized version of rust phone_encoder with ibig (around ~x2 faster for me)

I swapped num-bigint with ibig, which explicitly focuses on performance. Also did some refactoring for it to look more Rusty and enabled Profile Guided Optimization. That said, ibig yielded most gains.

Before:

===> java -cp build/java Main Proc,Memory(bytes),Time(ms) java,0,321 Proc,Memory(bytes),Time(ms) java,0,438 Proc,Memory(bytes),Time(ms) java,0,603 Proc,Memory(bytes),Time(ms) java,0,718 ===> java -cp build/java Main2 Proc,Memory(bytes),Time(ms) java,0,305 Proc,Memory(bytes),Time(ms) java,0,473 Proc,Memory(bytes),Time(ms) java,0,1013 Proc,Memory(bytes),Time(ms) java,0,1619 ===> sbcl --script src/lisp/main.lisp Proc,Memory(bytes),Time(ms) sbcl,0,92 Proc,Memory(bytes),Time(ms) sbcl,0,137 Proc,Memory(bytes),Time(ms) sbcl,0,480 Proc,Memory(bytes),Time(ms) sbcl,0,922 ===> ./phone_encoder Proc,Memory(bytes),Time(ms) ./phone_encoder,0,97 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,194 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,649 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,1223 Cleaning up

After:

===> java -cp build/java Main Proc,Memory(bytes),Time(ms) java,0,325 Proc,Memory(bytes),Time(ms) java,0,422 Proc,Memory(bytes),Time(ms) java,0,619 Proc,Memory(bytes),Time(ms) java,0,858 ===> java -cp build/java Main2 Proc,Memory(bytes),Time(ms) java,0,294 Proc,Memory(bytes),Time(ms) java,0,524 Proc,Memory(bytes),Time(ms) java,0,1244 Proc,Memory(bytes),Time(ms) java,0,1702 ===> sbcl --script src/lisp/main.lisp Proc,Memory(bytes),Time(ms) sbcl,0,86 Proc,Memory(bytes),Time(ms) sbcl,0,147 Proc,Memory(bytes),Time(ms) sbcl,0,513 Proc,Memory(bytes),Time(ms) sbcl,0,875 ===> ./phone_encoder Proc,Memory(bytes),Time(ms) ./phone_encoder,0,45 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,102 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,315 Proc,Memory(bytes),Time(ms) ./phone_encoder,0,616 Cleaning up

My setup: Archlinux with AMD Ryzen 7 4800H

opened by alianse777 4
Optimised Common Lisp code

My tiny take on it.

Just some code refactoring, minor optimizations in a few places, some inlining, and additional type and optimization declarations here and there.

I've set block compilation on resulting in better speed so please compile the file before use and change xxx.lisp to xxx.fasl in benchmark.sh.

It is more than 2x faster than the original Common Lisp version.

My setup: M1 MacBook Air. SBCL-2.1.7 Java Zulu16.32+15-CA (build 16.0.2+7) rustc 1.53.0

opened by bpecsek 0

Comparison between Java and Common Lisp solutions to a phone-encoding problem described by Prechelt

Related tags

Overview

Prechelt Phone Number Encoding

Structure of this repository

Compiling and running

Requirements

Java

Rust

Lisp

Running

Java

Rust

Lisp

Testing

Benchmarks

Adding a new program to the benchmarks

Solutions in other languages

Comments

Owner

Renato Athaydes

Aix-bench, the Java benchmark for code synthesis problem.

Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...

Table-Computing (Simplified as TC) is a distributed light weighted, high performance and low latency stream processing and data analysis framework. Milliseconds latency and 10+ times faster than Flink for complicated use cases.

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

A Java library for quickly and efficiently parsing and writing UUIDs

gRPC and protocol buffers for Android, Kotlin, and Java.

The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Immutable key/value store with efficient space utilization and fast reads. They are ideal for the use-case of tables built by batch processes and shipped to multiple servers.

High Performance data structures and utility methods for Java

Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Java Collections till the last breadcrumb of memory and performance

LWJGL is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL, Vulkan), audio (OpenAL), parallel computing (OpenCL, CUDA) and XR (OpenVR, LibOVR) applications.

A modern I/O library for Android, Kotlin, and Java.

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api

RxJava – Reactive Extensions for the JVM – a library for composing asynchronous and event-based programs using observable sequences for the Java VM.

Lightweight threads for Java, with message passing, nio, http and scheduling support.

Jalgorithm is an open-source Java library which has implemented various algorithms and data structure