Echo client-server components to evaluate Project Loom virtual threads.

Overview

Project Loom is the OpenJDK initiative to introduce user-mode threads in Java.

The purpose of this repository is to compare Project Loom virtual threads to two existing alternatives in the context of network applications: classic OS threads and non-blocking NIO.

Methodology

Virtual threads are compared to the alternatives using an echo client-server protocol.

The server is designed to sustain a large number of concurrent network connections, and it supports a configurable latency that delays each echo response by a fixed amount. From the client's perspective, this introduced latency is indistinguishable from network latency.

In the results described below, a latency of 1 second was used.

Three echo clients are compared:

  1. Client built atop non-blocking NIO framework
  2. Client built atop classic OS threads
  3. Client built atop Project Loom virtual threads

Clients are configured to run with a target concurrency level in terms of number of connections and configured to run for a fixed duration.

The various test clients behave the same way:

  1. Establish persistent connections to echo server
  2. Wait for all connections to be successfully established before proceeding with echo requests
  3. Over each connection, send an echo request and receive the echo response in a loop; when a response is received, immediately send the next request
  4. When the target duration is reached, allow trailing echo transactions to complete
  5. Measure throughput as the total number of echo transactions divided by the observed time elapsed
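
The five steps above could be sketched with one blocking thread per connection as follows. This is a minimal illustration, not the repository's code: the class name EchoLoopSketch, the run method, and the use of a CountDownLatch as the barrier are assumptions made for the example. Error handling is omitted, so a failed connect would leave the other workers waiting at the barrier.

```java
import java.io.DataInputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the client steps; all names are illustrative only.
final class EchoLoopSketch {

    /** Runs steps 1-5 and returns throughput in messages per second. */
    static double run(String host, int port, int connections, int contentLength,
                      long durationMillis, ExecutorService executor) throws Exception {
        CountDownLatch allConnected = new CountDownLatch(connections);
        LongAdder transactions = new LongAdder();
        long durationNanos = TimeUnit.MILLISECONDS.toNanos(durationMillis);

        List<Future<Void>> workers = new ArrayList<>();
        for (int i = 0; i < connections; i++) {
            workers.add(executor.submit(() -> {
                // Step 1: establish a persistent connection.
                try (Socket socket = new Socket(host, port)) {
                    OutputStream out = socket.getOutputStream();
                    DataInputStream in = new DataInputStream(socket.getInputStream());
                    byte[] request = new byte[contentLength];
                    byte[] response = new byte[contentLength];

                    // Step 2: wait until every connection is established.
                    allConnected.countDown();
                    allConnected.await();

                    // Steps 3 and 4: the deadline is checked only before sending,
                    // so a transaction already in flight is allowed to complete.
                    long deadline = System.nanoTime() + durationNanos;
                    while (System.nanoTime() - deadline < 0) {
                        out.write(request);
                        in.readFully(response);
                        transactions.increment();
                    }
                }
                return null;
            }));
        }

        // Step 5: total transactions divided by the observed elapsed time.
        allConnected.await();
        long start = System.nanoTime();
        for (Future<Void> worker : workers) {
            worker.get();
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return transactions.sum() / elapsedSeconds;
    }
}
```

The executor must be able to run all connections at once, otherwise the barrier never opens; Executors.newCachedThreadPool() would do for classic threads. On a JDK with virtual threads (21, or 19 with --enable-preview), Executors.newVirtualThreadPerTaskExecutor() could be passed instead, leaving the loop unchanged.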

[Figure: Echo client diagram]

Components

NioEchoServer

The NioEchoServer class is a long-running echo server application. It supports the following command-line arguments:

  1. Host - hostname used for passive server socket binding
  2. Port - port number used for socket binding
  3. Buffer size - the size in bytes of the buffer used for receiving and replaying echo messages
  4. Latency - the delay to introduce between receiving an echo request and sending the response
  5. Resolution - the select polling timeout, which determines how often delayed echo responses are checked for release
  6. Accept queue length - queue length parameter set on passive socket bind
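
One way the Latency and Resolution arguments could work together is sketched below: each received request is queued with a due time, and the event loop releases due responses each time its select call returns, which happens at least once per resolution interval. This is an assumption about the design made for illustration, not the actual NioEchoServer code; DelayedEchoQueue and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: holds echo payloads until their fixed delay has elapsed.
final class DelayedEchoQueue {

    private static final class Pending {
        final long dueNanos;
        final byte[] payload;

        Pending(long dueNanos, byte[] payload) {
            this.dueNanos = dueNanos;
            this.payload = payload;
        }
    }

    // The delay is the same for every request, so arrival order is also due order.
    private final ArrayDeque<Pending> pending = new ArrayDeque<>();
    private final long latencyNanos;

    DelayedEchoQueue(long latencyMillis) {
        this.latencyNanos = latencyMillis * 1_000_000L;
    }

    /** Called when a request is read; nowNanos would come from System.nanoTime(). */
    void onRequest(byte[] payload, long nowNanos) {
        pending.addLast(new Pending(nowNanos + latencyNanos, payload));
    }

    /** Called after each select wake-up; returns the payloads that are ready to send. */
    List<byte[]> drainDue(long nowNanos) {
        List<byte[]> due = new ArrayList<>();
        // Subtraction keeps the comparison correct even if nanoTime wraps around.
        while (!pending.isEmpty() && pending.peekFirst().dueNanos - nowNanos <= 0) {
            due.add(pending.pollFirst().payload);
        }
        return due;
    }
}
```

Under this scheme a response would leave no earlier than the configured latency and up to roughly one resolution interval later. A smaller resolution would tighten that window at the cost of more frequent wake-ups.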

NioEchoClient

The NioEchoClient class is a single-threaded, non-blocking, NIO-based test driver application that runs for a fixed duration at a target concurrent connection level. It supports the following command-line arguments:

  1. Host - hostname of the destination echo server
  2. Port - port number of the destination echo server
  3. Number of connections - the number of concurrent connections to use
  4. Content length - the size in bytes of each echo request message
  5. Duration - the target test duration in milliseconds

ThreadedEchoClient

The ThreadedEchoClient class is a multi-threaded test driver application that runs for a fixed duration at a target concurrent connection level. It supports the following command-line arguments:

  1. Host - hostname of the destination echo server
  2. Port - port number of the destination echo server
  3. Number of connections - the number of concurrent connections to use
  4. Content length - the size in bytes of each echo request message
  5. Duration - the target test duration in milliseconds
  6. Loom - boolean flag denoting whether to use virtual threads

Experiment

The following experiment was conducted with two EC2 instances in AWS, one running the server and another running the client.

  • Region: us-west-2
  • Instance type: c5.2xlarge (compute optimized, 8 vCPUs, 16 GB of memory)
  • OS: Amazon Linux 2 with Linux Kernel 5.10, AMI ami-00f7e5c52c0f43726

To facilitate the rapid creation of 50,000 connections, the following sysctl kernel parameter changes were applied on both hosts before the start of the experiment:

sysctl net.ipv4.ip_local_port_range="2000 64000"
sysctl net.ipv4.tcp_fin_timeout=30
sysctl net.core.somaxconn=8192
sysctl net.core.netdev_max_backlog=8000
sysctl net.ipv4.tcp_max_syn_backlog=8192
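
The port range change matters because every connection from the client host to the same server address and port needs its own local port. The sketch below does the arithmetic. PortRangeCheck is a hypothetical helper, and the comparison range of 32768 to 60999 is the usual Linux default, which is an assumption about what these hosts used before the change.

```java
// Hypothetical helper: counts the local ports available for outgoing connections.
final class PortRangeCheck {

    /** Number of ports in an inclusive net.ipv4.ip_local_port_range. */
    static int portsInRange(int low, int high) {
        return high - low + 1;
    }

    public static void main(String[] args) {
        int target = 50_000;                               // largest connection count tested
        int configured = portsInRange(2000, 64000);        // range set by the sysctl above
        int typicalDefault = portsInRange(32768, 60999);   // assumed Linux default range

        System.out.println("configured range: " + configured
                + " ports, covers " + target + " connections: " + (configured >= target));
        System.out.println("assumed default range: " + typicalDefault
                + " ports, covers " + target + " connections: " + (typicalDefault >= target));
    }
}
```

The configured range yields 62,001 ports, while the assumed default yields 28,232, which would not cover the larger connection counts from a single client address.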

Java classes were compiled and executed with an OpenJDK 19 Loom early access build: openjdk-19-loom+1-11_linux-x64

https://download.java.net/java/early_access/loom/1/openjdk-19-loom+1-11_linux-x64_bin.tar.gz

jdk-19/bin/javac --enable-preview --release 19 loomtest/*.java

The server instance used was started as follows:

jdk-19/bin/java loomtest.NioEchoServer <ip> 9000 32 1000 25 8192

The buffer size of 32 bytes was chosen to align with client configuration. The small magnitude was intended to minimize overhead associated with packet processing and copying. The goal of the tests is to surface concurrency scaling, not packet processing.

Similarly, the 1,000 millisecond latency was chosen to minimize the processing cost of each connection and to leave the focus on the costs associated with concurrency scaling.

Client instances were started as follows:

jdk-19/bin/java --enable-preview loomtest.NioEchoClient <ip> 9000 <connections> 32 60000
jdk-19/bin/java --enable-preview loomtest.ThreadedEchoClient <ip> 9000 <connections> 32 60000 <loom>

The content size is set to a very low value of 32 bytes as noted above. Each test run had a target duration of 60,000 milliseconds.

The following range of values was used for number of connections: 100, 1000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000

Command for running every echo client:

for i in $(echo "100 1000 5000 10000 15000 20000 25000 30000 35000 40000 45000 50000"); do
  jdk-19/bin/java --enable-preview loomtest.NioEchoClient 10.39.196.180 9000 $i 32 60000
  sleep 120
done
for i in $(echo "100 1000 5000 10000 15000 20000 25000 30000 35000 40000 45000 50000"); do
  jdk-19/bin/java --enable-preview loomtest.ThreadedEchoClient 10.39.196.180 9000 $i 32 60000 true
  sleep 120
done
for i in $(echo "100 1000 5000 10000 15000 20000 25000 30000 35000 40000 45000 50000"); do
  jdk-19/bin/java --enable-preview loomtest.ThreadedEchoClient 10.39.196.180 9000 $i 32 60000 false
  sleep 120
done

Snippet of output from an execution of one client:

Args[host=10.39.196.180, port=9000, numConnections=50000, contentLength=32, duration=60000]
barrier opened!
duration: 61006 ms, throughput: 47446.906862 msg/sec
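
As a rough sanity check on this output: each connection has at most one request in flight, so, assuming the server never replies sooner than the configured 1,000 millisecond latency, no connection can complete more than one transaction per second. The sketch below computes that ceiling alongside the throughput formula from the methodology. ThroughputCheck is a hypothetical helper, and the transaction count of 2,894,546 is back-computed from the reported duration and throughput; it is not a value printed by the client.

```java
// Hypothetical helper: compares observed throughput with the latency-imposed ceiling.
final class ThroughputCheck {

    /** Total echo transactions divided by elapsed time, in messages per second. */
    static double throughput(long transactions, long elapsedMillis) {
        return transactions / (elapsedMillis / 1000.0);
    }

    /** Upper bound when each connection completes at most one transaction per latency period. */
    static double ceiling(int connections, long latencyMillis) {
        return connections / (latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        double observed = throughput(2_894_546L, 61_006L); // count is back-computed, see above
        double max = ceiling(50_000, 1_000L);
        System.out.printf("observed %.1f msg/sec, ceiling %.1f msg/sec, ratio %.3f%n",
                observed, max, observed / max);
    }
}
```

With these inputs the observed rate comes to roughly 95% of the 50,000 msg/sec ceiling.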

Metrics were gathered on the client host using the ps command:

while true; do ps -C java -o args:100,pcpu,c,cp,bsdtime,cputime,pid,pmem,rss,drs,trs,vsz; echo ""; sleep 1; done | tee ps.txt

Snippet of output from an invocation of ps:

COMMAND                                                                                              %CPU  C  CP   TIME     TIME   PID %MEM   RSS   DRS  TRS    VSZ
jdk-19/bin/java --enable-preview loomtest.NioEchoClient 10.39.196.180 9000 50000 32 60000            76.5 76 765   1:36 00:01:36 10078  1.4 237488 7430752 3 7430756

Results

The following diagram contains throughput data points reported by echo client executions and metrics reported by the ps command.

[Figure: Results plot]

Throughput values are taken from the echo client standard output.

Cumulative CPU time values are taken from the cputime column of ps command output. The cputime in the last row associated with a process is used as the representative value.

Physical memory size values are taken from the rss column of ps command output. The maximum value among all rows associated with a process is used as the representative value.

Virtual memory size values are taken from the vsz column of ps command output. The maximum value among all rows associated with a process is used as the representative value.
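
The selection rules above (last cputime, maximum rss, maximum vsz per process) could be applied to the captured ps.txt lines along the following lines. This is a sketch, not the tooling used for the plot: PsSummary and its members are hypothetical names. It relies on the column order of the ps command shown earlier and reads fields from the end of each line because the args column contains spaces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reduces repeated ps samples to one representative row per PID.
final class PsSummary {

    static final class Stats {
        long lastCpuSeconds; // cputime from the last sample seen
        long maxRssKiB;      // largest rss across samples (ps reports KiB)
        long maxVszKiB;      // largest vsz across samples (ps reports KiB)
    }

    static Map<String, Stats> summarize(List<String> lines) {
        Map<String, Stats> byPid = new LinkedHashMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            // Skip blank separator lines and the repeated header row.
            if (t.length < 12 || t[0].equals("COMMAND")) {
                continue;
            }
            int n = t.length;
            // Counting from the end: vsz, trs, drs, rss, pmem, pid, cputime.
            Stats s = byPid.computeIfAbsent(t[n - 6], pid -> new Stats());
            s.lastCpuSeconds = parseCpuTime(t[n - 7]);
            s.maxRssKiB = Math.max(s.maxRssKiB, Long.parseLong(t[n - 4]));
            s.maxVszKiB = Math.max(s.maxVszKiB, Long.parseLong(t[n - 1]));
        }
        return byPid;
    }

    /** Parses the ps cputime format [DD-]hh:mm:ss into seconds. */
    static long parseCpuTime(String value) {
        long days = 0;
        int dash = value.indexOf('-');
        if (dash >= 0) {
            days = Long.parseLong(value.substring(0, dash));
            value = value.substring(dash + 1);
        }
        String[] p = value.split(":");
        return ((days * 24 + Long.parseLong(p[0])) * 60 + Long.parseLong(p[1])) * 60
                + Long.parseLong(p[2]);
    }
}
```

The lines could be loaded with Files.readAllLines on ps.txt; a command line longer than the 100-character args column would be truncated by ps, which does not affect the fields read here.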

Echo client tests using classic OS threads failed to complete successfully at 35,000 connections. The failure occurred while creating thread 32,309, when pthread_create returned EAGAIN. This appears to be due to memory resource constraints on the host rather than limits placed on the system user: the /proc/[pid]/limits values do not suggest any problematic limits.

Args[host=10.39.196.180, port=9000, numConnections=35000, contentLength=32, duration=60000, loom=false]
[11.235s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[11.235s][warning][os,thread] Failed to start the native thread for java.lang.Thread "pool-1-thread-32309"
Exception in thread "main" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
	at java.base/java.lang.Thread.start0(Native Method)
	at java.base/java.lang.Thread.start(Thread.java:1466)
	at java.base/java.lang.System$2.start(System.java:2510)
	at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:144)
	at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:952)
	at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1363)
	at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:145)
	at loomtest.ThreadedEchoClient.lambda$main$1(ThreadedEchoClient.java:94)
	at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
	at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
	at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:711)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575)
	at java.base/java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
	at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616)
	at java.base/java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622)
	at java.base/java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627)
	at loomtest.ThreadedEchoClient.main(ThreadedEchoClient.java:95)

The plot above was created with Python and matplotlib.

x = [100, 1000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000]

nio = {
    'label': 'NIO',
    'throughput': [...],
    'cputime': [...],
    'rss': [...],
    'vsz': [...]
}

virtual = {
    ...
}

classic = {
    ...
}

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)

ax11, ax21, ax12, ax22 = axes.ravel()

for d in [nio, virtual, classic]:
    size = len(d['throughput'])
    ax11.plot(x[0:size], d['throughput'], label=d['label'])
ax11.set_xlabel('Connections')
ax11.set_ylabel('Messages per second')
ax11.set_title('Throughput')
ax11.grid()
ax11.legend()

...

Conclusions

  • NIO, classic OS threads, and virtual threads are all very effective concurrency tools. Even classic OS threads scaled to a concurrency level of 30,000 without issue.
  • The overhead of virtual threads is remarkably low. The resource utilization of virtual threads was not significantly more than with NIO.
  • The overhead of classic OS threads is indeed significant, particularly with respect to the process memory footprint. The cumulative CPU time of echo clients using classic OS threads is only marginally higher than that of the other echo clients.
  • The slogan "codes like sync, scaled like async" is accurate. Virtual threads are an excellent alternative to asynchronous programming with non-blocking NIO.

Other Notes

In an attempt to verify that the NioEchoServer implementation wasn't a bottleneck or causing undue slowness, I ran echo clients against a C echo server implementation built atop epoll.

This efficient and minimal echo server was almost a drop-in replacement. Performance metrics were consistent with the Java implementation in this repository.

Owner
Elliot Barlas