292 Java Data-analytics Libraries

292 Repositories

Java data-analytics Libraries

Generate Java types from JSON or JSON Schema and annotates those types for data-binding with Jackson, Gson, etc

jsonschema2pojo jsonschema2pojo generates Java types from JSON Schema (or example JSON) and can annotate those types for data-binding with Jackson 2.x

Jan 5, 2023

Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

Apache Camel Apache Camel is a powerful, open-source integration framework based on prevalent Enterprise Integration Patterns with powerful bean integ

Jan 8, 2023

Spring，SpringBoot 2.0，SpringMVC，Mybatis，mybatis-plus，motan/dubbo分布式，Redis缓存，Shiro权限管理，Spring-Session单点登录，Quartz分布式集群调度，Restful服务，QQ/微信登录，App token登录，微信/支付宝支付；日期转换、数据类型转换、序列化、汉字转拼音、身份证号码验证、数字转人民币、发送短信、发送邮件、加密解密、图片处理、excel导入导出、FTP/SFTP/fastDFS上传下载、二维码、XML读写、高精度计算、系统配置工具类等等。

iBase4J项目简介 iBase4J是Java语言的分布式系统架构。使用Spring整合开源框架。使用Maven对项目进行模块化管理，提高项目的易开发性、扩展性。系统包括4个子系统：系统管理Service、系统管理Web、业务Service、业务Web。系统管理：包括用户管理、权限管理、数

Dec 21, 2022

:herb: 基于springboot的快速学习示例,整合自己遇到的开源框架,如：rabbitmq(延迟队列)、Kafka、jpa、redies、oauth2、swagger、jsp、docker、spring-batch、异常处理、日志输出、多模块开发、多环境打包、缓存cache、爬虫、jwt、GraphQL、dubbo、zookeeper和Async等等:pushpin:

欢迎大家留言和PR~ Tip: 技术更新换代太快，本仓库仅做参考，自己的项目具体使用哪个版本还需谨慎思考~(不推荐使用最新的版本，推荐使用(最新-1|2)的版本，会比较稳定) spring-boot-quick 前言　　自己很早就想搞一个总的仓库就是将自己平时遇到的和学习到的东西整合在一起，方便后

Jan 2, 2023

Not only Spring Boot but also important knowledge of Spring（不只是SpringBoot还有Spring重要知识点）

在线阅读 : https://snailclimb.gitee.io/springboot-guide （上面的地址访问速度缓慢的建议使用这个路径访问）重要知识点基础 Spring Boot 介绍第一个 Hello World 第一个 RestFul Web 服务 Spring 如何优雅读取配

Jan 3, 2023

Spring Boot基础教程，Spring Boot 2.x版本连载中！！！

Spring Boot基础教程本项目内容为《Spring Boot基础教程》的程序样例。专题目标：打造全网内容最全，比收费教程更好的Spring Boot免费教程！加入社群：如果你正在学习Spring Boot，不妨加入我们的Spring技术交流群，一起成长！如何支持：关注我的公众号”程序

Jan 6, 2023

about learning Spring Boot via examples. Spring Boot 教程、技术栈示例代码，快速简单上手教程。

Spring Boot 学习示例 Spring Boot 使用的各种示例，以最简单、最实用为标准，此开源项目中的每个示例都以最小依赖，最简单为标准，帮助初学者快速掌握 Spring Boot 各组件的使用。 Spring Boot 中文索引 | Spring Cloud学习示例代码 | Spring

Jan 1, 2023

云收藏 Spring Boot 2.X 开源项目

云收藏 - 让收藏更简单云收藏是一个使用 Spring Boot 构建的开源网站，可以让用户在线随时随地收藏的一个网站，在网站上分类整理收藏的网站或者文章，可以作为稍后阅读的一个临时存放。作为一个开放开源的软件，可以让用户从浏览器将收藏夹内容导入到云收藏，也支持随时将云收藏收集的文章导出去做备份。

Jan 9, 2023

A 3D chart library for Java applications (JavaFX, Swing or server-side).

Sep 27, 2022

Kubed - A port of the popular Javascript library D3.js to Kotlin/JavaFX.

Kubed: A Kotlin DSL for data visualization Kubed is a data visualization DSL embedded within the Kotlin programming language. Kubed facilitates the cr

Dec 28, 2022

A scientific charting library focused on performance optimised real-time data visualisation at 25 Hz update rates for data sets with a few 10 thousand up to 5 million data points.

ChartFx ChartFx is a scientific charting library developed at GSI for FAIR with focus on performance optimised real-time data visualisation at 25 Hz u

Dec 30, 2022

A platform for visualization and real-time monitoring of data workflows

Status This project is no longer maintained. Ambrose Twitter Ambrose is a platform for visualization and real-time monitoring of MapReduce data workfl

Dec 31, 2022

The official home of the Presto distributed SQL query engine for big data

Presto Presto is a distributed SQL query engine for big data. See the User Manual for deployment instructions and end user documentation. Requirements

Jan 5, 2023

:elephant: Elasticsearch real-time search and analytics natively integrated with Hadoop

Elasticsearch Hadoop Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Apache Hive, Apache Pig, Apach

Dec 22, 2022

Apache Hive

Apache Hive (TM) The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storag

Dec 28, 2022

Apache Druid: a high performance real-time analytics database.

Jan 9, 2023

SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams.

SAMOA: Scalable Advanced Massive Online Analysis. This repository is discontinued. The development of SAMOA has moved over to the Apache Software Foun

Dec 28, 2022

Apache Flink

Apache Flink Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Learn more about Flin

Jan 5, 2023

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

Apache Gobblin Apache Gobblin is a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems. Ca

Jan 4, 2023

Hadoop library for large-scale data processing, now an Apache Incubator project

Apache DataFu Follow @apachedatafu Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by

Apr 1, 2022

Mirror of Apache Storm

Master Branch: Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processi

Jan 3, 2023

Netflix's distributed Data Pipeline

Suro: Netflix's Data Pipeline Suro is a data pipeline service for collecting, aggregating, and dispatching large volume of application events includin

Dec 9, 2022

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Apache Zeppelin Documentation: User Guide Mailing Lists: User and Dev mailing list Continuous Integration: Contributing: Contribution Guide Issue Trac

Jan 8, 2023

OpenRefine is a free, open source power tool for working with messy data and improving it

OpenRefine OpenRefine is a Java-based power tool that allows you to load data, understand it, clean it up, reconcile it, and augment it with data comi

Jan 1, 2023

Serverless proxy for Spark cluster

Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f

Dec 1, 2022

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl

Jan 2, 2023

statistics, data mining and machine learning toolbox

Disambiguation (Italian dictionary) Field of turnips. It is also a place where there is confusion, where tricks and sims are plotted. (Computer scienc

Jun 11, 2022

Encog java core Apache 2 Encog java core Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Genetic Programming, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. License: Apache 2 , .

Encog Machine Learning Framework Encog is a pure-Java/C# machine learning framework that I created back in 2008 to support genetic programming, NEAT/H

Dec 17, 2022

Sends stacktrace-level performance data from a JVM process to Riemann.

Riemann JVM Profiler riemann-jvm-profiler is a JVM agent that you can inject into any JVM process--one written in Clojure, Java, Scala, Groovy, etc.--

Sep 21, 2022

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence in code deployments, configuration changes and infrastructure changes.

https://goreplay.org/ GoReplay is an open-source network monitoring tool which can record your live traffic, and use it for shadowing, load testing, m

Jan 4, 2023

Java 8 annotation processor and framework for deriving algebraic data types constructors, pattern-matching, folds, optics and typeclasses.

Derive4J: Java 8 annotation processor for deriving algebraic data types constructors, pattern matching and more! tl;dr Show me how to write, say, the

Nov 23, 2022

NoSQL data store using the seastar framework, compatible with Apache Cassandra

Scylla What is Scylla? Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a s

Dec 27, 2022

Apache Druid: a high performance real-time analytics database.

Jan 2, 2023

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

Dec 29, 2022

An advanced, but easy to use, platform for writing functional applications in Java 8.

Getting Cyclops X (10) The latest version is cyclops:10.4.0 Stackoverflow tag cyclops-react Documentation (work in progress for Cyclops X) Integration

Dec 29, 2022

binary serialization format

Colfer Colfer is a binary serialization format optimized for speed and size. The project's compiler colf(1) generates source code from schema definiti

Dec 25, 2022

Clojure's data structures modified for use outside of Clojure

This library has been extracted from the master branch of Clojure (http://clojure.org) version 1.5.1 (as of October 2013) http://github.com/richhick

Oct 6, 2022

High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

Jan 5, 2023

jdbi is designed to provide convenient tabular data access in Java; including templated SQL, parameterized and strongly typed queries, and Streams integration

The Jdbi library provides convenient, idiomatic access to relational databases in Java. Jdbi is built on top of JDBC. If your database has a JDBC driv

Dec 27, 2022

A distributed in-memory data store for the cloud

EVCache EVCache is a memcached & spymemcached based caching solution that is mainly used for AWS EC2 infrastructure for caching frequently used data.

Jan 2, 2023

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time.

About CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time. CrateDB offers the

Jan 2, 2023

Apache Hive

Apache Hive (TM) The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storag

Dec 28, 2022

The official home of the Presto distributed SQL query engine for big data

Presto Presto is a distributed SQL query engine for big data. See the User Manual for deployment instructions and end user documentation. Requirements

Dec 30, 2022

Distributed Stream and Batch Processing

What is Jet Jet is an open-source, in-memory, distributed batch and stream processing engine. You can use it to process large volumes of real-time eve

Dec 31, 2022

Mirror of Apache Storm

Master Branch: Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processi

Dec 26, 2022

Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...

Jan 3, 2023

Utility for developers and QAs what helps minimize time wasting on writing the same data for testing over and over again. Made by Stfalcon

Stfalcon Fixturer A Utility for developers and QAs which helps minimize time wasting on writing the same data for testing over and over again. You can

Nov 29, 2021

Gephi - The Open Graph Viz Platform

Gephi - The Open Graph Viz Platform Gephi is an award-winning open-source platform for visualizing and manipulating large graphs. It runs on Windows,

Dec 30, 2022

Embulk: Pluggable Bulk Data Loader.

What's Embulk? Embulk is a parallel bulk data loader that helps data transfer between various storages, databases, NoSQL and cloud services. Embulk su

Jan 6, 2023

Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.

Dex Dex : The data explorer is a powerful tool for data science. It is written in Groovy and Java on top of JavaFX and offers the ability to: Read in

Jan 8, 2023

Java fake data generator

jFairy by Devskiller Java fake data generator. Based on Wikipedia: Fairyland, in folklore, is the fabulous land or abode of fairies or fays. Try jFair

Dec 10, 2022

A library for setting up Java objects as test data.

Beanmother Beanmother helps to create various objects, simple and complex, super easily with fixtures for testing. It encourages developers to write m

Nov 7, 2022

Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.

Themis provides strong, usable cryptography for busy people General purpose cryptographic library for storage and messaging for iOS (Swift, Obj-C), An

Dec 29, 2022

XChart is a light-weight Java library for plotting data.

XChart XChart is a light weight Java library for plotting data. Description XChart is a light-weight and convenient library for plotting data designed

Dec 26, 2022

Java dataframe and visualization library

Tablesaw Overview Tablesaw is Java for data science. It includes a dataframe and a visualization library, as well as utilities for loading, transformi

Jan 7, 2023

A 3D chart library for Java applications (JavaFX, Swing or server-side).

Sep 27, 2022

The foundational library of the Morpheus data science framework

Introduction The Morpheus library is designed to facilitate the development of high performance analytical software involving large datasets for both

Dec 20, 2022

Master repository for the JGraphT project

Dec 30, 2022

A scientific charting library focused on performance optimised real-time data visualisation at 25 Hz update rates for data sets with a few 10 thousand up to 5 million data points.

ChartFx ChartFx is a scientific charting library developed at GSI for FAIR with focus on performance optimised real-time data visualisation at 25 Hz u

Jan 2, 2023

Library for OpenAPI 3 with spring-boot

Full documentation Acknowledgements springdoc-openapi is made possible thanks to all of its contributors. Introduction The springdoc-openapi Java libr

Jan 5, 2023

CogComp's Natural Language Processing libraries and Demos:

CogCompNLP This project collects a number of core libraries for Natural Language Processing (NLP) developed by Cognitive Computation Group. How to use

Dec 20, 2022

An extensible Java framework for building XML and non-XML streaming applications

Smooks Framework This is the Git source code repository for the Smooks Project. Build Status Building Pre-requisites JDK 8 Apache Maven 3.2.x Maven gi

Dec 1, 2022

Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

Apache Camel Apache Camel is a powerful, open-source integration framework based on prevalent Enterprise Integration Patterns with powerful bean integ

Dec 31, 2022

A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues

Nakadi Event Broker Nakadi is a distributed event bus broker that implements a RESTful API abstraction on top of Kafka-like queues, which can be used

Dec 21, 2022

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Datumbox Machine Learning Framework The Datumbox Machine Learning Framework is an open-source framework written in Java which allows the rapid develop

Dec 9, 2022

Apache Flink

Apache Flink Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. Learn more about Flin

Jan 5, 2023

Apache Spark - A unified analytics engine for large-scale data processing

Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op

Jan 2, 2023

Statistical Machine Intelligence & Learning Engine

Smile Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpola

Jan 1, 2023

Stream Processing and Complex Event Processing Engine

Siddhi Core Libraries Siddhi is a cloud native Streaming and Complex Event Processing engine that understands Streaming SQL queries in order to captur

Jan 6, 2023

Logstash - transport and process your logs, events, or other data

Logstash Logstash is part of the Elastic Stack along with Beats, Elasticsearch and Kibana. Logstash is a server-side data processing pipeline that ing

Jan 5, 2023

P6Spy is a framework that enables database data to be seamlessly intercepted and logged with no code changes to the application.

p6spy P6Spy is a framework that enables database data to be seamlessly intercepted and logged with no code changes to existing application. The P6Spy

Dec 27, 2022

JCTools - Concurrency tools currently missing from the JDK.

JCTools Java Concurrency Tools for the JVM. This project aims to offer some concurrent data structures currently missing from the JDK: SPSC/MPSC/SPMC/

Dec 28, 2022

Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

Jan 5, 2023

High Performance data structures and utility methods for Java

Agrona Agrona provides a library of data structures and utility methods that are a common need when building high-performance applications in Java. Ma

Dec 31, 2022

Functional patterns for Java

λ Functional patterns for Java Table of Contents Background Installation Examples Semigroups Monoids Functors Bifunctors Profunctors Applicatives Mona

Dec 29, 2022

Java 8 annotation processor and framework for deriving algebraic data types constructors, pattern-matching, folds, optics and typeclasses.

Derive4J: Java 8 annotation processor for deriving algebraic data types constructors, pattern matching and more! tl;dr Show me how to write, say, the

Nov 23, 2022

An advanced, but easy to use, platform for writing functional applications in Java 8.

Getting Cyclops X (10) The latest version is cyclops:10.4.0 Stackoverflow tag cyclops-react Documentation (work in progress for Cyclops X) Integration

Dec 29, 2022

:fire: Seata is an easy-to-use, high-performance, open source distributed transaction solution.

Seata: Simple Extensible Autonomous Transaction Architecture What is Seata? A distributed transaction solution with high performance and ease of use f

Jan 2, 2023

Open Source In-Memory Data Grid

Hazelcast Hazelcast is an open-source distributed in-memory data store and computation platform. It provides a wide variety of distributed data struct

Dec 31, 2022

A reactive Java framework for building fault-tolerant distributed systems

Atomix Website | Javadoc | Slack | Google Group A reactive Java framework for building fault-tolerant distributed systems Please see the website for f

Dec 29, 2022

A Java library for parsing and building iCalendar data models

iCal4j - iCalendar parser and object model Table of Contents Introduction - What is iCal4j? Setup - Download and installation of iCal4j System require

Jan 5, 2023

Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...

Jan 5, 2023

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Trino is a fast distributed SQL query engine for big data analytics. See the User Manual for deployment instructions and end user documentation. Devel

Dec 31, 2022

Apache Druid: a high performance real-time analytics database.

Jan 1, 2023

Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Copyright Debezium Authors. Licensed under the Apache License, Version 2.0. The Antlr grammars within the debezium-ddl-parser module are licensed unde

Dec 26, 2022

Apache Calcite

Apache Calcite Apache Calcite is a dynamic data management framework. It contains many of the pieces that comprise a typical database management syste

Dec 31, 2022

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Apache ORC ORC is a self-describing type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with

Jan 2, 2023

Protocol Buffers - Google's data interchange format

Jan 1, 2023

adt4j - Algebraic Data Types for Java

adt4j - Algebraic Data Types for Java This library implements Algebraic Data Types for Java. ADT4J provides annotation processor for @GenerateValueCla

Aug 25, 2022

Infinispan is an open source data grid platform and highly scalable NoSQL cloud data store.

The Infinispan project Infinispan is an open source (under the Apache License, v2.0) data grid platform. For more information on Infinispan, including

Dec 31, 2022

cglib - Byte Code Generation Library is high level API to generate and transform Java byte code. It is used by AOP, testing, data access frameworks to generate dynamic proxy objects and intercept field access.

cglib Byte Code Generation Library is high level API to generate and transform JAVA byte code. It is used by AOP, testing, data access frameworks to g

Jan 8, 2023

Dozer is a Java Bean to Java Bean mapper that recursively copies data from one object to another.

Dozer Active Contributors We are always looking for more help. The below is the current active list: Core @garethahealy @orange-buffalo ?? Protobuf @j

Jan 5, 2023

Java Data-analytics Resources

Java data-analytics Libraries

Related tags