A scale demo of Neo4j Fabric spanning up to 1129 machines/shards, running a 100 TB LDBC dataset with 1.2 trillion nodes and relationships.

Overview

This repository contains the code necessary to reproduce the results for the Trillion Entity demonstration that was part of the NODES 2021 Keynote presentation. It contains the store generation code we used, the orchestration scripts for the AWS instances that are needed to run the setup, the queries we executed, and the client that performs the latency measurements. Please read this README in its entirety before proceeding, to make sure you have an understanding of the necessary steps.

More Information

Blog post with more behind-the-scenes information: Behind the Scenes of Creating the World's Biggest Graph Database.

The NODES 2021 Keynote recording showing the Trillion Graph Demo live:

A twitter thread summary of the demo:

How To

What you'll need:

  1. An AWS account with sufficient capacity for the number and type of EC2 instances you'll create, including access to S3. AWS is the default provider this application uses; it should be possible to modify it to use the cloud provider of your choice.
  2. Access to Neo4j Enterprise. Fabric is a Neo4j Enterprise feature, which is distributed under a different license. Neo4j Enterprise must be installed in your local Maven repository; detailed instructions are available in the Neo4j Documentation.

The directory structure is as follows:

  1. cypher contains the individual Cypher queries that were used in the demo
  2. server contains the data generation code and the instance orchestration code
  3. client contains the client for the latency measurements
  4. guide contains a Neo4j Browser guide that explains the LDBC schema and queries

Outline

Here we'll describe the basic steps you'll need to take. Detailed instructions are provided further down.

Familiarize yourself with the code.

The code provided should be straightforward to understand. Take some time to familiarize yourself with it, since you'll need to provide information specific to your environment. The two main files to look at are FabricDataGenerator and AmazonController, both under the server directory. The first creates the stores, both locally and remotely, and the second orchestrates the AWS Neo4j instances. They are structured as scripts, so you can modify them as you like. You will need to edit the code to execute the various steps and to configure the setup to your requirements.

Create the stores

You should first create the Person and Template databases. The former is the full Person shard and the latter is the basis for the Forum shards. Typically, you will create these two locally, upload them to S3, and then orchestrate EC2 instances with the AmazonController to generate the Forum shards en masse. Of course, with minimal changes, you can do everything locally, in one step, and then move the databases to the Fabric shards however you prefer.

Instantiate the Shards

The AmazonController class can be used to install and configure Neo4j and the shards. You will need to modify the code to execute the appropriate commands for your setup, but the basic AWS orchestration steps will be the same as for the store generation.
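
To make the shard configuration step more concrete, here is a minimal sketch of how the Fabric proxy settings for many shards could be generated. It is not taken from AmazonController. It assumes Neo4j 4.x style Fabric settings (fabric.database.name, fabric.graph.<ID>.uri, fabric.graph.<ID>.database) and the default Bolt port 7687. The class name, the host names, and the single shared shard database name are hypothetical and only serve the illustration.

```java
import java.util.List;

public class FabricConfigSketch {

    /**
     * Renders Fabric settings for the given shard hosts, one graph ID per host.
     * Assumption: every shard serves the same database name, which may not
     * match the Person/Forum layout used in the actual demo.
     */
    public static String render(String fabricDbName, String shardDbName, List<String> shardHosts) {
        StringBuilder conf = new StringBuilder();
        conf.append("fabric.database.name=").append(fabricDbName).append('\n');
        for (int id = 0; id < shardHosts.size(); id++) {
            String prefix = "fabric.graph." + id + ".";
            // Each shard is addressed by its Bolt routing URI and database name.
            conf.append(prefix).append("uri=neo4j://").append(shardHosts.get(id)).append(":7687\n");
            conf.append(prefix).append("database=").append(shardDbName).append('\n');
        }
        return conf.toString();
    }

    public static void main(String[] args) {
        // Hypothetical host names; replace with the addresses of your EC2 instances.
        List<String> hosts = List.of("shard-0.example.internal", "shard-1.example.internal");
        System.out.print(render("fabric", "neo4j", hosts));
    }
}
```

The output can be appended to the neo4j.conf of the Fabric proxy instance. With hundreds of shards, generating these entries from the list of instance addresses is less error-prone than writing them by hand.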

Build, install and run the application

The last step is to locally build and run the UI for the demo. With that, you'll be able to take latency measurements and explore the schema you built.
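
As an illustration of what a latency measurement involves, the following sketch times repeated executions of a task and reports nearest-rank percentiles. It is not the client shipped in this repository. The class and method names are hypothetical, and the Runnable is a stand-in for running one Cypher query against the Fabric proxy.

```java
import java.util.Arrays;

public class LatencySketch {

    /** Runs the task the given number of times and returns each duration in nanoseconds. */
    public static long[] measure(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Nearest-rank percentile of the samples, for p in (0, 100]. The input array is not modified. */
    public static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Placeholder workload; a real client would execute a query here.
        Runnable fakeQuery = () -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long[] samples = measure(fakeQuery, 50);
        System.out.printf("p50=%.2f ms, p99=%.2f ms%n",
                percentile(samples, 50) / 1e6, percentile(samples, 99) / 1e6);
    }
}
```

Reporting percentiles rather than a mean keeps occasional slow runs visible, which matters when a single query fans out to many shards.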

Generating the stores

AWS Instance Orchestration

Install, build and run the client

Owner: Neo4j