18 Repositories
Java large-dataset Libraries
Aix-bench, a Java benchmark for code synthesis problems.
AiXcoder NL2Code Evaluation Benchmark (aix-bench). Paper available: https://arxiv.org/abs/2206.13179 Introduction: This is a method-level benchmark
Code4Me provides automatic intelligent code completion based on large pre-trained language models
Code4Me Code4Me provides automatic intelligent code completion based on large pre-trained language models. Code4Me predicts statement (line) completio
DatasetCreator is a lightweight RESTful client implementation of the Salesforce CRM Analytics External Data API.
DatasetCreator is a lightweight RESTful client implementation of the Salesforce CRM Analytics External Data API. It has been deliberately developed with no third-party JARs, with the goal of being a lean, reliable, and scalable solution.
esProc SPL is a scripting language for data processing with a rich, well-designed function library and powerful syntax. It can be executed from a Java program through the JDBC interface or run independently.
esProc esProc is the unique name for the esProc SPL package. esProc SPL is an open-source programming language for data processing, which can perform comp
Hudi manages the storage of large analytical datasets on DFS
Apache Hudi Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets
Finds the average number of words per comment in a large data set, using Hadoop MapReduce to process the data in parallel.
Finding the average number of words in all the comments in a data set. Mapper Function: In the mapper function we first tokenize the entire data and then fin
A scale demo of Neo4j Fabric spanning up to 1129 machines/shards running a 100TB (LDBC) dataset with 1.2tn nodes and relationships.
Demo application instructions Overview This repository contains the code necessary to reproduce the results for the Trillion Entity demonstration that
APM (Application Performance Management) tool for large-scale distributed systems.
Visit our official web site for more information and the latest updates on Pinpoint. Latest Release (2020/01/21) We're happy to announce the release of Pi
A scientific charting library focused on performance-optimised real-time data visualisation at 25 Hz update rates for data sets ranging from a few tens of thousands up to 5 million data points.
ChartFx ChartFx is a scientific charting library developed at GSI for FAIR with focus on performance optimised real-time data visualisation at 25 Hz u
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large scale machine l
Hadoop library for large-scale data processing, now an Apache Incubator project
Apache DataFu Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by
Java large off-heap cache
OHC - An off-heap-cache. Features: asynchronous cache loader support; optional per entry or default TTL/expireAt; entry eviction and expiration without a
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op