18 Java Large-dataset Libraries

18 Repositories

Aix-bench, a Java benchmark for code synthesis problems.

AiXcoder NL2Code Evaluation Benchmark (aix-bench). Paper available: https://arxiv.org/abs/2206.13179. Introduction: This is a method-level benchmark

Dec 12, 2022

Code4Me provides automatic intelligent code completion based on large pre-trained language models

Code4Me Code4Me provides automatic intelligent code completion based on large pre-trained language models. Code4Me predicts statement (line) completio

Dec 5, 2022

DatasetCreator is a lightweight RESTful client implementation of the Salesforce CRM Analytics External Data API.

DatasetCreator is a lightweight RESTful client implementation of the Salesforce CRM Analytics External Data API. It has been deliberately developed with no third-party JARs, with the goal of being a lean, reliable and scalable solution.

Dec 16, 2022

esProc SPL is a scripting language for data processing with a rich, well-designed function library and powerful syntax. It can be executed from a Java program through a JDBC interface, or it can compute independently.

esProc esProc is the unique name for esProc SPL package. esProc SPL is an open-source programming language for data processing, which can perform comp

Dec 27, 2022

Hudi manages the storage of large analytical datasets on DFS

Apache Hudi Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets

Dec 30, 2022

A program that finds the average number of words per comment in a large data set, using Hadoop MapReduce to process the data efficiently in parallel.

Finding average number of words in all the comments in a data set 📝 Mapper Function In the mapper function we first tokenize entire data and then fin

Aug 23, 2021
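
As a rough illustration of the idea in the entry above (not code from that repository), the map-then-reduce computation of an average word count can be sketched in plain Java without Hadoop. The class and method names here are hypothetical, and splitting on whitespace is an assumption about how comments are tokenized.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: simulates the map and reduce phases in memory.
public class AverageWordsSketch {

    // "Map" phase: emit one (commentCount = 1, wordCount) pair per comment.
    // Assumption: words are separated by whitespace.
    static Map.Entry<Long, Long> map(String comment) {
        String trimmed = comment.trim();
        long words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return new SimpleEntry<>(1L, words);
    }

    // "Reduce" phase: sum the counts from all pairs, then divide.
    static double reduce(List<Map.Entry<Long, Long>> pairs) {
        long comments = 0;
        long words = 0;
        for (Map.Entry<Long, Long> pair : pairs) {
            comments += pair.getKey();
            words += pair.getValue();
        }
        return comments == 0 ? 0.0 : (double) words / comments;
    }

    // Runs both phases over an in-memory list of comments.
    static double averageWords(List<String> comments) {
        List<Map.Entry<Long, Long>> pairs = new ArrayList<>();
        for (String comment : comments) {
            pairs.add(map(comment));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> comments = List.of("great post", "I disagree with this", "ok");
        System.out.println(averageWords(comments));
    }
}
```

Emitting a (count, sum) pair rather than a per-comment average keeps the reduce step associative, which is what lets partial results from parallel workers be combined safely.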

A scale demo of Neo4j Fabric spanning up to 1129 machines/shards, running a 100 TB (LDBC) dataset with 1.2 trillion nodes and relationships.

Demo application instructions Overview This repository contains the code necessary to reproduce the results for the Trillion Entity demonstration that

Nov 23, 2022

APM (Application Performance Management) tool for large-scale distributed systems.

Visit our official website for more information and the latest updates on Pinpoint. Latest Release (2020/01/21) We're happy to announce the release of Pi

Jan 6, 2023

A scientific charting library focused on performance-optimised real-time data visualisation at 25 Hz update rates, for data sets ranging from a few tens of thousands up to 5 million data points.

ChartFx ChartFx is a scientific charting library developed at GSI for FAIR with focus on performance optimised real-time data visualisation at 25 Hz u

Dec 30, 2022

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

Dec 28, 2022

Hadoop library for large-scale data processing, now an Apache Incubator project

Apache DataFu Follow @apachedatafu Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by

Apr 1, 2022

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l

Mar 12, 2021

Large off-heap cache for Java

OHC - An off-heap-cache. Features: asynchronous cache loader support; optional per-entry or default TTL/expireAt; entry eviction and expiration without a

Dec 31, 2022

APM (Application Performance Management) tool for large-scale distributed systems.

Visit our official website for more information and the latest updates on Pinpoint. Latest Release (2020/01/21) We're happy to announce the release of Pi

Dec 29, 2022

A scientific charting library focused on performance-optimised real-time data visualisation at 25 Hz update rates, for data sets ranging from a few tens of thousands up to 5 million data points.

ChartFx ChartFx is a scientific charting library developed at GSI for FAIR with focus on performance optimised real-time data visualisation at 25 Hz u

Jan 2, 2023

Java Large-dataset Resources

Apache Spark - A unified analytics engine for large-scale data processing
