statistics, data mining and machine learning toolbox

Overview

Disambiguation

  1. (Italian dictionary) Field of turnips. It is also a place where there is confusion, where tricks and scams are plotted.

  2. (Computer science) Statistics, data mining and machine learning library written in Java.

Try it online

  • Launch rapaio with IJava binder jupyter
  • Launch rapaio with IJava binder jupyter lab

Documentation

Rapaio is a rich collection of data mining, statistics and machine learning tools written entirely in Java. Documentation for this library is hosted on GitHub Pages. Most of the documentation is written as Jupyter notebooks and hosted in the rapaio-notebooks GitHub repository. The notebooks repository can also be spun up through Binder.

Installation

The latest release published on Maven Central is 2.4.0:

   <dependency>
     <groupId>io.github.padreati</groupId>
     <artifactId>rapaio</artifactId>
     <version>2.4.0</version>
   </dependency>

The best way to explore the library is through Jupyter / JupyterLab notebooks. They work well both for interactive experiments and for documenting the ideas you are working on.

You have to install Jupyter / JupyterLab and the IJava kernel. For more information, follow the instructions from IJava. The following notation is specific to the IJava kernel:

%maven io.github.padreati:rapaio:2.4.0  

Acknowledgements

Many thanks to JetBrains who provided open source licenses for their brilliant IDE a

Many thanks to SpencerPark for creating the IJava Jupyter kernel.

Comments
  • NumberFormatException in  XWilkinson.getList()

    The issue occurs when using non-US locale.

    The bug is in:

    // XWilkinson.getList()
    list.add(Double.valueOf(String.format("%." + Math.abs(digits) + "f", i)));
    

    Double.valueOf(..) accepts only numbers that use a dot as the decimal separator. With a Russian or French locale, String.format() returns '0,123'.

    I suggest using NumberFormat.getInstance().parse(String.format("%.4f", 0.123445)).doubleValue() to get consistent results. Alternatively, replace String.format(..) with a Formatter constructed with an explicit locale such as Locale.ROOT, so that the output does not depend on the default locale.

    To reproduce, set the locale to a non-US one, e.g. add the VM options

    -Duser.language=ru
    -Duser.country=RU
    

    and run any code which invokes this method, e.g.

    // kotlin
    XWilkinson.base10(XWilkinson.DEEFAULT_EPS).searchBounded(0.0, 1.0, 5).list
    

    P.S. The same issue may exist in other methods as well.

    bug 
    opened by iromeo 6
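The locale problem reported above can be reproduced and avoided with the standard library alone. The following is a minimal sketch, not the fix that was applied in rapaio: the class and method names (`LocaleSafeRounding`, `roundViaString`, `failsWithLocale`) are hypothetical, and it assumes the goal is simply to round a value through a string without depending on the default locale.

```java
import java.util.Locale;

// Hypothetical helper class, written only to illustrate the report above.
final class LocaleSafeRounding {

    // Rounds a value to the given number of decimals through a string round trip.
    // Locale.ROOT always formats with '.' as the decimal separator, so
    // Double.parseDouble never receives a string such as "0,123".
    static double roundViaString(double value, int digits) {
        String text = String.format(Locale.ROOT, "%." + Math.abs(digits) + "f", value);
        return Double.parseDouble(text);
    }

    // Mimics the failing pattern: format with the given locale, then parse with
    // Double.valueOf, which only understands '.' as the decimal separator.
    static boolean failsWithLocale(Locale locale, double value) {
        try {
            Double.valueOf(String.format(locale, "%.3f", value));
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```

Passing the locale explicitly makes the behaviour independent of the `-Duser.language` and `-Duser.country` settings, so `failsWithLocale(Locale.FRANCE, 0.123)` reports a failure while `roundViaString` works under any default locale.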
  • Bump notebook from 5.7.8 to 6.0.1 in /notebooks

    Bumps notebook from 5.7.8 to 6.0.1.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 5
  • Plot markers ignore number format settings defined in user locale.

    Plot markers ignore the number format settings defined in the user locale. For example, with the RU or FR locale I get "0.7" instead of "0,7".

    To reproduce, set the locale to a non-US one, e.g. add the VM options -Duser.language=ru -Duser.country=RU

    opened by iromeo 4
  • Rapio branch 1

    Added Ridge regression. It allows the user to train and predict using Ridge regression for a given regularization strength. Although I've only changed a few files, I seem to have committed all the files. Sorry about that; I am new to GitHub, hope you understand.

    opened by VishwaasHegde 4
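For readers unfamiliar with the technique named in this pull request, ridge regression with regularization strength lambda solves (XᵀX + λI) w = Xᵀy. The sketch below is a generic illustration under stated assumptions, not the code contributed in the pull request and not rapaio's API: the class `RidgeSketch` and its methods are hypothetical, there is no intercept term, and the system is solved with plain Gaussian elimination.

```java
// Hypothetical, self-contained illustration of ridge regression (no intercept).
final class RidgeSketch {

    // Solves (X^T X + lambda * I) w = X^T y for the weight vector w.
    static double[] fit(double[][] x, double[] y, double lambda) {
        int n = x.length;
        int p = x[0].length;
        // Augmented matrix [X^T X + lambda * I | X^T y].
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                for (int r = 0; r < n; r++) {
                    a[i][j] += x[r][i] * x[r][j];
                }
            }
            a[i][i] += lambda;
            for (int r = 0; r < n; r++) {
                a[i][p] += x[r][i] * y[r];
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] w = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double sum = a[i][p];
            for (int j = i + 1; j < p; j++) {
                sum -= a[i][j] * w[j];
            }
            w[i] = sum / a[i][i];
        }
        return w;
    }

    // Predicts the target for one row as the dot product of weights and features.
    static double predict(double[] w, double[] row) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * row[i];
        }
        return sum;
    }
}
```

With a single feature the solution reduces to Σxy / (Σx² + λ), which makes the shrinking effect of a larger λ easy to check by hand.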
  • Add methods to do TTestTwoSamples with given samples mean and deviation

    So, currently, TTestTwoSamples only allows testing with the two whole samples.

    I would like to be able to use TTestTwoSamples when I already know the mean and standard deviation of each sample.

    This is actually available in ZTestTwoSamples but not in TTestTwoSamples

    Hope you consider adding this feature. Thanks in advance :)

    opened by jahirfiquitiva 3
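The quantities requested above can be computed from summary statistics alone. This is a minimal sketch of the textbook formulas, assuming sample standard deviations and sample sizes are known; the class `TTestFromSummary` and its methods are hypothetical and are not part of rapaio's TTestTwoSamples.

```java
// Hypothetical helper showing a two-sample t statistic from summary statistics.
final class TTestFromSummary {

    // Welch's t statistic (unequal variances): (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2).
    static double welchT(double mean1, double sd1, int n1,
                         double mean2, double sd2, int n2) {
        double standardError = Math.sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
        return (mean1 - mean2) / standardError;
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double welchDf(double sd1, int n1, double sd2, int n2) {
        double v1 = sd1 * sd1 / n1;
        double v2 = sd2 * sd2 / n2;
        return (v1 + v2) * (v1 + v2)
                / (v1 * v1 / (n1 - 1) + v2 * v2 / (n2 - 1));
    }

    // Pooled-variance t statistic (equal variances assumed), with n1 + n2 - 2
    // degrees of freedom.
    static double pooledT(double mean1, double sd1, int n1,
                          double mean2, double sd2, int n2) {
        double pooledVariance =
                ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2) / (n1 + n2 - 2);
        double standardError = Math.sqrt(pooledVariance * (1.0 / n1 + 1.0 / n2));
        return (mean1 - mean2) / standardError;
    }
}
```

Turning the statistic into a p-value additionally needs the CDF of the Student t distribution, which is left out of this sketch.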
  • Get F distribution value given alpha, numerator and denominator degrees of variance

    Hello, I am trying to use your library for a project. Everything is working really well, so thanks for creating this. Anyway, I need to get a value from the F distribution table given the alpha, the numerator degrees of variance, and the denominator degrees of variance.

    Something like: F0.025,9,14

    Where: α = 0.025 num = 9 den = 14

    The output value here should be 3.2093 according to this table

    I am not sure if this is already possible and I just couldn't find the right class and method to call. If that's the case, would you mind explaining to me what to use to achieve this?

    If this isn't available yet, could you please consider this as a feature request?

    Thanks again for such amazing work.

    opened by jahirfiquitiva 3
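The table lookup described in this request is the upper-tail quantile of the F distribution, with the two "degrees of variance" being the numerator and denominator degrees of freedom. The sketch below computes it from scratch and is not rapaio's API: the class `FQuantileSketch` and all its methods are hypothetical, and it assumes the standard identity CDF(x) = I_z(d1/2, d2/2) with z = d1·x / (d1·x + d2), where I is the regularized incomplete beta function, followed by bisection.

```java
// Hypothetical, dependency-free sketch of an F distribution critical value.
final class FQuantileSketch {

    // log Gamma(x) for x > 0: shift x up to at least 10 using
    // log Gamma(x) = log Gamma(x + 1) - log(x), then apply the Stirling series.
    static double logGamma(double x) {
        double shift = 0.0;
        while (x < 10.0) {
            shift -= Math.log(x);
            x += 1.0;
        }
        double inv = 1.0 / x;
        double inv2 = inv * inv;
        double series = inv * (1.0 / 12.0 - inv2 * (1.0 / 360.0 - inv2 / 1260.0));
        return shift + (x - 0.5) * Math.log(x) - x + 0.5 * Math.log(2.0 * Math.PI) + series;
    }

    // Continued fraction of the incomplete beta function (modified Lentz method).
    static double betaContinuedFraction(double a, double b, double x) {
        final double tiny = 1e-300;
        double c = 1.0;
        double d = 1.0 - (a + b) * x / (a + 1.0);
        if (Math.abs(d) < tiny) d = tiny;
        d = 1.0 / d;
        double h = d;
        for (int m = 1; m <= 500; m++) {
            int m2 = 2 * m;
            double even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2));
            d = 1.0 + even * d;
            if (Math.abs(d) < tiny) d = tiny;
            c = 1.0 + even / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            h *= d * c;
            double odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2));
            d = 1.0 + odd * d;
            if (Math.abs(d) < tiny) d = tiny;
            c = 1.0 + odd / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-14) break;
        }
        return h;
    }

    // Regularized incomplete beta function I_x(a, b).
    static double regularizedBeta(double a, double b, double x) {
        if (x <= 0.0) return 0.0;
        if (x >= 1.0) return 1.0;
        double front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
                + a * Math.log(x) + b * Math.log(1.0 - x));
        if (x < (a + 1.0) / (a + b + 2.0)) {
            return front * betaContinuedFraction(a, b, x) / a;
        }
        return 1.0 - front * betaContinuedFraction(b, a, 1.0 - x) / b;
    }

    // CDF of the F distribution with d1 and d2 degrees of freedom.
    static double cdf(double x, double d1, double d2) {
        if (x <= 0.0) return 0.0;
        return regularizedBeta(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2));
    }

    // Upper-tail critical value: the x with P(F > x) = alpha, found by bisection.
    static double upperCritical(double alpha, double d1, double d2) {
        double target = 1.0 - alpha;
        double lo = 0.0;
        double hi = 1.0;
        while (cdf(hi, d1, d2) < target) hi *= 2.0;
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, d1, d2) < target) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```

A call such as `FQuantileSketch.upperCritical(0.025, 9, 14)` should land close to the table value 3.2093 quoted in the request.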
  • Refactor - delete duplicated code & extract class (NaiveBayes.java)

    1. In NaiveBayes.java: removed the duplicated code in the coreFit() method by extracting it into a BuildSumLog() method.

    2. Extracted classes: NaiveBayesData, NumericData, NominalData, BinaryData

    opened by holinder4s 3
  • Refactoring - day1(BoundFrame.java)

    1. Extract 3 methods in byRows() method (BoundFrame.java)

      • nameLengthComp()
      • nameValueComp()
      • columnExistsCheck()
    2. Renaming 2 methods in BoundFrame.java

      • rowCount() to getrowCount()
      • varCount() to getvarCount()
    opened by holinder4s 3
  • implement resizable vectors

    Sometimes it is useful to be able to grow the size of a vector dynamically, similar to how java.util.ArrayList behaves. However, when one knows the appropriate initial allocation size, it should be possible to use it.

    enhancement 
    opened by padreati 3
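The behaviour asked for here, growth on demand plus an optional initial capacity, can be sketched with a plain array that is reallocated when full. This is only an illustration of the idea: `GrowableDoubleVector` is a hypothetical class, not one of rapaio's variable types, and the growth factor of 1.5 is an assumption borrowed from how ArrayList is commonly implemented.

```java
import java.util.Arrays;

// Hypothetical growable vector of doubles with a caller-chosen initial capacity.
final class GrowableDoubleVector {

    private double[] data;
    private int size;

    GrowableDoubleVector(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("initialCapacity must be >= 0");
        }
        data = new double[initialCapacity];
    }

    // Appends a value, growing the backing array by about 1.5x when it is full.
    void add(double value) {
        if (size == data.length) {
            int newCapacity = Math.max(data.length + (data.length >> 1), size + 1);
            data = Arrays.copyOf(data, newCapacity);
        }
        data[size++] = value;
    }

    // Returns the value at the given index, checked against the logical size.
    double get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    // Number of values stored so far.
    int size() {
        return size;
    }

    // Current length of the backing array.
    int capacity() {
        return data.length;
    }
}
```

When the caller knows the final size in advance, passing it as the initial capacity avoids every reallocation; otherwise the geometric growth keeps appends amortized constant time.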
  • Using TCK Tested JDK builds of OpenJDK

    AdoptOpenJDK was discontinued in July 2021. When using Zulu you get the latest updated (TCK tested) builds for all versions of OpenJDK. Added the upcoming LTS release, JDK 17.

    opened by carldea 2
  • Bump notebook from 5.7.6 to 5.7.8 in /notebooks

    Bumps notebook from 5.7.6 to 5.7.8.

    dependencies 
    opened by dependabot[bot] 2
  • Introduce class priors for classifiers

    Introduce class priors for classification. Currently class priors are either simulated through individual weights (which works only for learning) or implemented as a custom feature in some of the classifiers. Class priors should be a generic feature used in scoring methods, so we should implement them in a more generic way.

    enhancement 
    opened by padreati 0
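One generic way to use class priors at scoring time is Bayes' rule: multiply each class likelihood by its prior and renormalize. The sketch below shows only that step and makes no claim about how rapaio implements or plans to implement priors; `PriorAdjustedScores` and its methods are hypothetical names.

```java
// Hypothetical scoring helper that combines class likelihoods with class priors.
final class PriorAdjustedScores {

    // Returns posteriors proportional to likelihood[c] * prior[c], normalized to sum to 1.
    static double[] posterior(double[] likelihoods, double[] priors) {
        if (likelihoods.length != priors.length) {
            throw new IllegalArgumentException("likelihoods and priors differ in length");
        }
        double[] result = new double[likelihoods.length];
        double total = 0.0;
        for (int c = 0; c < result.length; c++) {
            result[c] = likelihoods[c] * priors[c];
            total += result[c];
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("all weighted scores are zero");
        }
        for (int c = 0; c < result.length; c++) {
            result[c] /= total;
        }
        return result;
    }

    // Index of the largest score, i.e. the predicted class.
    static int argmax(double[] scores) {
        int best = 0;
        for (int c = 1; c < scores.length; c++) {
            if (scores[c] > scores[best]) {
                best = c;
            }
        }
        return best;
    }
}
```

With likelihoods {0.2, 0.6}, uniform priors predict the second class, while priors {0.9, 0.1} flip the prediction to the first, which is the effect a generic prior mechanism is meant to expose.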
  • [Feature][Graphics] Top,bottom,left,right axis display

    Currently the graphics package allows one to draw axis elements on the left (for the vertical axis) and at the bottom (for the horizontal axis). This is enough for simple plots, but further customization will allow one to draw multiple plots on the same grid layer cell and also nicer drawings in general.

    enhancement UI 
    opened by padreati 0
  • [Feature][Graphics] Multiple axis plotting

    The last big refactoring of the graphical components allows one to have multiple plots superimposed on the same grid layer cell, each plot with a different pair of axes. Technically this is possible; it depends, however, on the ability to draw the axes on the left, right, top and bottom. We also need a nice API and a decent default behavior for simple situations.

    enhancement UI 
    opened by padreati 0
  • [Feature][Graphics] Violin plots

    A violin plot is similar to a kernel density estimator (actually it is a kernel density estimator), but the densities are displayed side by side, as in a boxplot, rather than superimposed, which is already possible by drawing multiple kernel density estimator lines. The violin chart is like a boxplot, but more flexible, and it carries information about the density.

    enhancement UI 
    opened by padreati 0
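The density that gives a violin its shape can be sketched with a Gaussian kernel density estimate evaluated on a grid; a violin is then drawn by mirroring that profile around a vertical axis, one violin per group. The code below is an illustration under these assumptions only: `GaussianKdeSketch` is a hypothetical class, the bandwidth is supplied by the caller, and no drawing or rapaio graphics API is involved.

```java
// Hypothetical Gaussian kernel density estimator producing a violin profile.
final class GaussianKdeSketch {

    // Density estimate at x: the average of Gaussian kernels centred on each sample value.
    static double density(double[] sample, double bandwidth, double x) {
        if (sample.length == 0 || bandwidth <= 0.0) {
            throw new IllegalArgumentException("need a non-empty sample and bandwidth > 0");
        }
        double sum = 0.0;
        for (double value : sample) {
            double u = (x - value) / bandwidth;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (sample.length * bandwidth * Math.sqrt(2.0 * Math.PI));
    }

    // Densities on an evenly spaced grid from min to max (inclusive); each entry is
    // the half-width of the violin at that position along the value axis.
    static double[] profile(double[] sample, double bandwidth,
                            double min, double max, int points) {
        if (points < 2) {
            throw new IllegalArgumentException("points must be >= 2");
        }
        double[] widths = new double[points];
        double step = (max - min) / (points - 1);
        for (int i = 0; i < points; i++) {
            widths[i] = density(sample, bandwidth, min + i * step);
        }
        return widths;
    }
}
```

Computing one profile per group and drawing the groups next to each other gives the side-by-side layout that distinguishes a violin plot from superimposed density lines.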
Releases(v4.0.0)
Owner
Aurelian Tutuianu