A scale demo of Neo4j Fabric spanning up to 1129 machines/shards running a 100TB (LDBC) dataset with 1.2tn nodes and relationships.

Neo4j

Last update: Nov 23, 2022

Demo application instructions

Overview

This repository contains the code necessary to reproduce the results for the Trillion Entity demonstration that was part of the NODES 2021 Keynote presentation. It contains the store generation code we used, the orchestration scripts for the AWS instances that are needed to run the setup, the queries we executed, and the client that performs the latency measurements. Please read this README in its entirety before proceeding, to make sure you have an understanding of the necessary steps.

More Information

Blog post with more behind-the-scenes information: "Behind the Scenes of Creating the World’s Biggest Graph Database".

The NODES 2021 Keynote recording showing the Trillion Graph Demo live:

A Twitter thread summary of the demo:

How To

What you'll need:

An AWS account with sufficient capacity for the number and type of EC2 instances you'll create, including access to S3. AWS is the default provider this application uses; it should be possible to modify it to use the cloud provider of your choice.
Access to Neo4j Enterprise. Fabric is a Neo4j Enterprise feature, and Neo4j Enterprise is distributed under a different license. It needs to be installed in your local Maven repository; detailed instructions are in the Neo4j Documentation.

The directory structure is as follows:

cypher contains the individual Cypher queries that were used in the demo
server contains the data generation code and the instance orchestration
client contains the client for the latency measurements
guide contains a Neo4j Browser guide which explains the LDBC schema and queries

Outline

Here we'll describe the basic steps you'll need to take. Detailed instructions are provided further down.

Familiarize yourself with the code.

The code provided should be straightforward to understand. Take some time to familiarize yourself with it, since you'll need to provide information specific to your environment. The two main files to look at are FabricDataGenerator and AmazonController, both under the server directory. The first creates the stores, both locally and remotely, and the second orchestrates the AWS Neo4j instances. They are structured as scripts, so you can modify them as you like. You will need to edit the code to execute the various steps and to configure the setup to your requirements.

Create the stores

You should first create the Person and Template databases. The first is the full Person shard and the second is the basis for the Forum shards. Typically, you will create these two locally, upload them to S3, and then orchestrate EC2 instances with the AmazonController to generate the Forum shards en masse. With minimal changes, you can instead do everything locally, in one step, and then move the databases to the Fabric shards however you prefer.

Instantiate the Shards

The AmazonController class can be used to install and configure Neo4j and the shards. You will need to modify the code to execute the appropriate commands for your setup, but the basic AWS orchestration steps will be the same as for the store generation.

Build, install and run the application

The last step is to locally build and run the UI for the demo. With that, you'll be able to take latency measurements and explore the schema you built.
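
As a rough illustration of what a latency measurement involves, the sketch below times a placeholder task repeatedly and reports nearest-rank percentiles. It is not the client in this repository: the class name LatencySketch, the methods measure and percentile, and the sleeping placeholder workload are all assumptions made for illustration. A real measurement would submit a Cypher query to the Fabric endpoint in place of the placeholder.

```java
import java.util.Arrays;

public class LatencySketch {

    /** Runs the task `runs` times and returns each wall-clock duration in nanoseconds. */
    static long[] measure(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Nearest-rank percentile of the samples, for p between 0 and 100. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a query round trip; replace with a real call.
        Runnable placeholderQuery = () -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        long[] samples = measure(placeholderQuery, 50);
        System.out.printf("p50 = %.2f ms, p99 = %.2f ms%n",
                percentile(samples, 50) / 1e6,
                percentile(samples, 99) / 1e6);
    }
}
```

Reporting percentiles rather than a single average keeps occasional slow runs visible, which matters when a query fans out to many shards.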

Generating the stores

AWS Instance Orchestration

Install, build and run the client

Comments

Separate components for cypher queries and Browser Guide
The client included a Neo4j Browser guide and the full cypher queries as individual source files. Two new top-level components have been created for both:

guide for the Neo4j Browser guide and related media (images)

cypher for the cypher queries and setup :param commands
opened by akollegger
