Java dataframe and visualization library

Overview

Tablesaw

License: Apache 2.0

Tablesaw is Java for data science. It includes a dataframe and a visualization library, as well as utilities for loading, transforming, filtering, and summarizing data. It's fast and careful with memory. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and integrates well with the Smile machine learning library.

Tablesaw features

Data processing & transformation

  • Import data from RDBMS, Excel, CSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.)
  • Export data to CSV, JSON, HTML or Fixed Width files.
  • Combine tables by appending or joining
  • Add and remove columns or rows
  • Sort, Group, Query
  • Map/Reduce operations
  • Handle missing values

Visualization

Tablesaw supports data visualization by providing a wrapper for the Plot.ly JavaScript plotting library. Here are a few examples of the new library in action.

[Image gallery: twelve example plots, each labeled "Tornadoes"]

Statistics

  • Descriptive stats: mean, min, max, median, sum, product, standard deviation, variance, percentiles, geometric mean, skewness, kurtosis, etc.

Getting started

Add tablesaw-core to your project. You can find the version number for the latest release in the release notes:

<dependency>
    <groupId>tech.tablesaw</groupId>
    <artifactId>tablesaw-core</artifactId>
    <version>VERSION_NUMBER_GOES_HERE</version>
</dependency>

You may also add supporting projects:

  • tablesaw-beakerx - for using Tablesaw inside BeakerX
  • tablesaw-excel - for using Excel workbooks
  • tablesaw-html - for using HTML
  • tablesaw-json - for using JSON
  • tablesaw-jsplot - for creating charts

Documentation and support

Feel free to ask questions or make suggestions on the issues tab.

Integrations

Comments
  • Implement transpose #696

    Thanks for contributing.

    Description

    Added an implementation of Transpose. It has the restriction that columns must be of the same type. This is an initial version for feedback

    Testing

    Added a unit test for the feature

    opened by jackie-h 31
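
To make the same-type restriction mentioned above concrete, here is a minimal sketch of a transpose over plain Java arrays rather than Tablesaw columns. The class and method names are hypothetical, and this is not the code from the pull request; it only illustrates why a single element type keeps the operation simple, since every output row can be stored in an array of that one type.

```java
// Hypothetical helper, not part of Tablesaw: transposes column-major data.
final class TransposeSketch {

    /** Builds rows from equally sized columns that all hold the same type (double). */
    static double[][] transpose(double[][] columns) {
        int columnCount = columns.length;
        int rowCount = columnCount == 0 ? 0 : columns[0].length;
        double[][] rows = new double[rowCount][columnCount];
        for (int c = 0; c < columnCount; c++) {
            // Ragged input cannot be transposed into a rectangular result.
            if (columns[c].length != rowCount) {
                throw new IllegalArgumentException("column " + c + " has a different length");
            }
            for (int r = 0; r < rowCount; r++) {
                rows[r][c] = columns[c][r];
            }
        }
        return rows;
    }
}
```

Calling `TransposeSketch.transpose` on two columns of three values yields three rows of two values. Mixed column types would need a common representation (for example strings) before a step like this could apply.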
  • Column-wise DataFrame-like operations

    Hi, I am new to Tablesaw. I am exploring options for recreating some Python DataFrame operations in Tablesaw. For example, I have a DataFrame object called data1, and I use existing columns of this data frame to create new ones (and update existing ones). Here are a couple of lines of Python code:

    data1['days'] = data1['buyDate'].apply(lambda x: (today - x).days)
    ...
    data1['CAGR'] = ((data1['curValue'] / data1['bookValue']) ** (1.0 / data1['nYears']) - 1.0) * 100

    Is there a way to implement something similar using Tablesaw?

    Thanks a lot in advance.

    enhancement core 
    opened by imfaisalmb 31
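
As a rough illustration of what the two pandas lines in the question compute, the sketch below performs the same element-wise arithmetic on plain Java arrays. It deliberately avoids any Tablesaw API, so the class and method names are hypothetical; it shows the calculation, not how Tablesaw would express it.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper working on plain arrays; it does not call any Tablesaw API.
final class ColumnMathSketch {

    /** Days elapsed between each buy date and today, like (today - x).days in pandas. */
    static long[] daysHeld(LocalDate[] buyDates, LocalDate today) {
        long[] days = new long[buyDates.length];
        for (int i = 0; i < buyDates.length; i++) {
            days[i] = ChronoUnit.DAYS.between(buyDates[i], today);
        }
        return days;
    }

    /** Element-wise CAGR in percent: ((cur / book) ^ (1 / years) - 1) * 100. */
    static double[] cagrPercent(double[] curValue, double[] bookValue, double[] nYears) {
        double[] cagr = new double[curValue.length];
        for (int i = 0; i < curValue.length; i++) {
            cagr[i] = (Math.pow(curValue[i] / bookValue[i], 1.0 / nYears[i]) - 1.0) * 100.0;
        }
        return cagr;
    }
}
```

Each method returns a new array, which mirrors the idea of deriving a new column from existing ones instead of mutating the inputs.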
  • Readonly data: Any equivalent in tablesaw to pandas' view vs copy?

    I've finally got my use case using tablesaw to an initial build and tried a few runs, and it is rather slow compared to a python version I wrote using pandas previously. I migrated to Java to get better concurrency in the hope of making it go faster.

    Profiling it I see why -- it is spending 75% of its time in tech.tablesaw.table.Rows.copy(). This is because of some Table.where() calls that are designed to filter down the input data.

    Briefly, the background is my input data is price history for certain financial assets. Having established a price at a certain time (a simulated trade entry point) I then want to see if the market went up or down by a certain amount from that point. I am currently doing so using a filter like:

    useData.where(useData.numberColumn("High").isGreaterThanOrEqualTo(target).or(useData.numberColumn("Low").isLessThanOrEqualTo(stop)))

    (method names from memory so may have slight inaccuracies)

    That gives me a filtered table, and then selecting the first row from this gives me the first instance that fulfilled the criteria. I'm only ever interested in the first row matching the criteria but can't think of a way to get tablesaw to stop once that row is found, so instead get the entire table and take the first row (or rather first value of each column) as needed. This kind of approach is applied repeatedly to further filter price data depending on what is found at each stage. The result is a lot of calls to Table.where().

    As I say the code ends up spending a lot of time copying rows, presumably from the original data to the return value of the where() method. Is there a way to get tablesaw to take a "view" approach as pandas would in this situation, that is not actually copy the data but simply copy references to the data, which would be faster? This obviously comes along with issues if one later tries to modify the filtered result represented by the view, because it would be impossible to do so without also modifying the original; pandas handles that by making attempts to alter a view an error, forcing the user to use an operation that would force a copy when that is what they want to do. For my particular use case, since the data is only read (in this particular case I don't even summarise it) I don't mind not being able to modify.

    Does tablesaw have anything analogous to this view concept from pandas? Again, the goal is to avoid needlessly copying a lot of data.

    opened by mark27q1 29
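
The "stop at the first matching row" idea from the question can be sketched with a lazy scan over row indices. This is plain Java over arrays, not a Tablesaw feature, and the names are hypothetical. The point is that the predicate is evaluated row by row and the scan ends at the first hit, so no filtered copy of the data is ever built.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

// Hypothetical helper: finds the first row where the price hits the target or the stop.
final class FirstMatchSketch {

    /** Returns the index of the first row with high >= target or low <= stop, if any. */
    static OptionalInt firstHit(double[] high, double[] low, double target, double stop) {
        return IntStream.range(0, high.length)
                .filter(i -> high[i] >= target || low[i] <= stop)
                .findFirst(); // short-circuits: later rows are never examined
    }
}
```

The returned index acts like a read-only reference into the original arrays, which is close in spirit to the "view" behaviour the question asks about.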
  • Add support to join tables on multiple columns.

    Thanks for contributing.

    Description

    Enhanced API to accept multiple names of columns to join on. Supported for all joins: inner, leftOuter, rightOuter, and fullOuter. Added javadoc where missing for public methods.

    Testing

    Added dozens of JUnit tests to cover all changed APIs. Ran a coverage tool in Eclipse to ensure that all changes were exercised. Reached 100% coverage for all but one method. Aiming for 100% coverage exposed a failure to parse LONG types from CSV input, so missing support for that was added.

    opened by gregorco 22
  • toString() should be more fault tolerant

    I'm debugging some code where columns in a table can be of different sizes. To help diagnose the problem I'm printing the table. However, Relation's toString() method, which uses a DataFramePrinter, throws an IndexOutOfBoundsException because frame.size() is based on the first column's size, while other columns are shorter. A toString() method should be very careful not to throw exceptions, since it's typically used during debugging. I suggest making DataFramePrinter more fault tolerant when fetching column values, e.g. in line 192: data[i][j] = frame.getString(i, j);

    opened by hallvard 21
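
One way to read the suggestion above is that a printer should size its output by the longest column and substitute a marker for cells that do not exist. The sketch below shows that idea on plain lists; the class name and the "<missing>" marker are assumptions made for illustration, and this is not the DataFramePrinter code.

```java
import java.util.List;

// Hypothetical printer that tolerates columns of different lengths.
final class RaggedPrinterSketch {

    /** Renders one line per row; cells past the end of a short column become "<missing>". */
    static String render(List<List<String>> columns) {
        // Use the longest column, so no row is silently dropped.
        int rowCount = columns.stream().mapToInt(List::size).max().orElse(0);
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < rowCount; r++) {
            for (int c = 0; c < columns.size(); c++) {
                List<String> column = columns.get(c);
                if (c > 0) {
                    out.append(" | ");
                }
                // Bounds check instead of letting get() throw.
                out.append(r < column.size() ? column.get(r) : "<missing>");
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

A visible marker has the side benefit of pointing at the inconsistent column, which is usually what the person debugging wants to find.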
  • Circular dependencies in Columns/ColumnTypes can lead to unpredictable behavior

    A null value is sometimes returned by a ColumnType constant. Whether or not the value is null depends on the order in which other code is executed.

    To recreate, I built an array of ColumnTypes and printed the array. Sometimes the values are all initialized correctly, sometimes not:

    Correct:

    [STRING, LOCAL_DATE_TIME, INTEGER]
    

    Incorrect:

    [STRING, null, INTEGER]
    

    The different results can be had by adding or removing a line of code just before printing. The correct result is printed when the line is removed.

    Here is the main class with the offending line included:

    class DummyClass {
    
        public static void main(String[] args) {
            long missing = DateTimeColumn.MISSING_VALUE;
            new DummyPrinter().printColumnTypes();
        }
    } 
    
    

    When I comment out line 1 in main, the code works as expected. Also, the code that implements printing must be in a separate class for the error to occur. Here's that class:

    class DummyPrinter {
    
        void printColumnTypes() {
            System.out.println(Arrays.toString(columnTypes));
        }
    
        private static final ColumnType[] columnTypes = {
            STRING,
            LOCAL_DATE_TIME,
            INTEGER,
        };
    }
    
    

    The array itself is a literal constant. The values inserted into the array are also constants. They are declared in the ColumnType interface in the line shown below:

    DateTimeColumnType LOCAL_DATE_TIME = DateTimeColumnType.INSTANCE;
    

    In that line, the values are provided by a constant (INSTANCE) declared in DateTimeColumnType.

    Here is the code from that class where INSTANCE is created:

    public static final DateTimeColumnType INSTANCE =
            new DateTimeColumnType(BYTE_SIZE, "LOCAL_DATE_TIME", "DateTime");
    
    

    The line of code that turns the error on and off causes the class DateTimeColumnType to load in a different order when it's present than it does when absent.

    opened by lwhite1 21
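
The effect described in this issue can be reproduced with two small classes that refer to each other during static initialization. The sketch below is a simplified stand-in with hypothetical names (Registry plays the role of ColumnType, Kind the role of DateTimeColumnType); it assumes the cycle runs through the constructor, which may differ from the actual Tablesaw code.

```java
// Registry copies Kind.INSTANCE into its own constant during initialization.
final class Registry {
    static final Kind DATE_TIME = Kind.INSTANCE;

    static void touch() {
        // Calling any static method forces Registry to initialize.
    }
}

// Kind's constructor reaches back into Registry, which closes the cycle.
final class Kind {
    static final Kind INSTANCE = new Kind();

    private Kind() {
        Registry.touch();
    }
}

final class InitOrderSketch {
    public static void main(String[] args) {
        // Touching Kind first starts its initialization; Registry then initializes
        // while Kind.INSTANCE is still unassigned, so it captures null for good.
        Kind first = Kind.INSTANCE;
        System.out.println(first != null);      // prints true
        System.out.println(Registry.DATE_TIME); // prints null
    }
}
```

If Registry.DATE_TIME is read first instead, Kind finishes initializing before the assignment completes and the constant is non-null, which matches the order-dependent behaviour reported above.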
  • Implement a pie chart

    Using xChart or JavaFX Charts create an interface that enables easy rendering of the plot from a tablesaw table. See the implementation of Bar Plots (which use JavaFX) for a very similar example.

    https://github.com/jtablesaw/tablesaw/blob/master/plot/src/main/java/tech/tablesaw/api/plot/Bar.java

    enhancement help wanted 
    opened by lwhite1 19
  • Added support for reading and writing Apache ORC file format

    Thanks for contributing.

    Description

    1. Added support for reading and writing the Apache ORC file format. Fixes #620.

    Testing

    Unit Test cases added

    opened by murtuza-ranapur 17
  • Major problems trying out plots; everything breaks.

    Maybe I'm looking in the wrong place, but it doesn't seem like there is sufficient information for a newcomer to get plotting working—and this is the main reason I'm looking at this library.

    1. First of all, a suggestion: the page https://jtablesaw.github.io/tablesaw/userguide/introduction says nothing at all about the dependency needed for plotting. One has to know to look back at https://github.com/jtablesaw/tablesaw to figure this out. But this is a minor issue.

    2. Much bigger is that the first code at https://jtablesaw.github.io/tablesaw/userguide/Introduction_to_Plotting does not work. Not even close. It won't even compile. It doesn't even have balanced parentheses. Fixing the parentheses still won't get it to compile, and certainly won't show the plot that was promised.

    Figure fig = BubblePlot.create("Average retail price for champagnes by year and rating",
                    champagne,					// table name
                    "highest pro score",		// x variable column name
                    "year",						// y variable column name
                    "Mean Retail"				// bubble size
                   ));
    
    3. So I skip the "introduction" page and go straight to the "real" code at https://jtablesaw.github.io/tablesaw/userguide/BarsAndPies .

    First we load the Tornado dataset:

    Table tornadoes = Table.read().csv("Tornadoes.csv");
    

    What is "the Tornado dataset" and where can I find it? An earlier paragraph mentioned, "we’ll use a Tornado dataset from NOAA", so I went to the NOAA site and downloaded the most promising looking data file named 1950-2017_all_tornadoes.csv. I tried loading that in Tablesaw:

    …
    Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 10000 out of bounds for length 10000
    	at com.univocity.parsers.common.ParserOutput.valueParsed(ParserOutput.java:327)
    	at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:176)
    	at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:560)
    	... 7 more
    

    (So you are using the uniVocity parsers. I'm familiar with them. But they aren't so "plug-and-play" as they make them out to be, as you can see here.)

    I searched the web for "Tornadoes.csv". One unrelated site implied that 2018_torn_prelim.csv might be more promising, but that didn't work. https://gist.github.com/darrenjaworski/5874227 mentioned a tornadoes.csv, and it was some Google spreadsheet, which I downloaded to a CSV file, but that gave me:

    Exception in thread "main" java.lang.IllegalArgumentException: Cannot add column with duplicate name Short column: state number to table tornadoes.csv
    	at tech.tablesaw.api.Table.validateColumn(Table.java:161)
    	at tech.tablesaw.api.Table.addColumns(Table.java:144)
    	at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:147)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:34)
    	…
    

    I finally realized that this GitHub repository has some "tornado" CSV files. I downloaded and renamed tornadoes 1950-2014.csv, but no go:

    Exception in thread "main" tech.tablesaw.io.csv.AddCellToColumnException: Error while adding cell from row 41658 and column Crop Loss(position:10): Error adding value to column Crop Loss: For input string: "0.4"
    	at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:244)
    	at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:156)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
    	at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:34)
    	…
    Caused by: java.lang.NumberFormatException: Error adding value to column Crop Loss: For input string: "0.4"
    	at tech.tablesaw.api.ShortColumn.appendCell(ShortColumn.java:353)
    	at tech.tablesaw.api.ShortColumn.appendCell(ShortColumn.java:27)
    	at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:242)
    	... 5 more
    

    This is frustrating. This is what a new user is faced with.

    If this turns out to be a good library, believe me I'll pitch in and help with the documentation and probably even the code. But I'm stuck just in the first lines! Could someone help me?

    opened by garretwilson 17
  • How to set data type to whole Table.

    Hello Tablesaw team. I have a question about setting the data type for a Table when I don't know the number of columns. I have a 30_000x2000 feature CSV file containing 0.0 and various other Double values. If I parse the CSV via:

    CsvReadOptions options = CsvReadOptions.builder(csv)
                        .header(false)
                        .maxNumberOfColumns(50_000).build();
    
    Table t = Table.read().csv(options);
    

    I get a NumberFormatException, because all the 0.0 values are treated as Short 0. So when the reader gets to real numbers like 13.5, it throws an NFE.

    But if I add sample(false) to the reader options, it takes about 2:40 to parse the file.

    How can I set the data type for the whole Table? As far as I can see, the only way is to set columnType in the parser options, but that won't work because I don't know the number of columns in the CSV file.

    P.S. I used com.univocity.parsers.csv.CsvParser separately to read the same file: it takes 2:20 with parser.parseAll and 1:20 when parsing the file row by row.

    opened by Ebalaitung 17
  • Joins are too slow

    Hi guys! I'm trying to migrate from Python + pandas to Kotlin + Tablesaw. Some parts of my code already run fast (CSV parsing, for example, is 2x faster than in pandas). But I've also noticed that the inner join operation is pretty slow (about 2x slower than in pandas).

    Then I tried to optimize my code by using an isIn Selection instead of a simple join. Unfortunately, it uses strings.toArray(new String[0]) under the hood for the input collection. It would make more sense to use a HashSet for quicker lookup. So I wrote my own predicate:

    val customersSet = customers.toHashSet()    // for faster lookup
    val idColumn = transactions.stringColumn("customer_id")
    idColumn.eval { it in customersSet }
    

    This is 15x faster than the original inner join, at least on my huge dataset. Of course, it is much simpler than a join, since I'm not appending columns, etc. But the difference is still huge. I haven't investigated the join code yet, but I hope there is room for improvement there. My key point is: Kotlin + JVM should be at least no slower than Python + pandas. What do you guys think?

    P.S. Do you use hash indexing on table columns?

    opened by mykola-dev 17
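
For readers following along in Java, the hash-based membership test from the Kotlin snippet above might look like the sketch below. It works on a plain String array rather than a Tablesaw column, and the class and method names are hypothetical. Each lookup in the HashSet takes roughly constant time, whereas scanning an array of wanted values costs time proportional to its length for every row.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

// Hypothetical helper: selects row indices whose key appears in a set of wanted values.
final class SetLookupSketch {

    /** Returns the indices of rows whose value is contained in the wanted collection. */
    static int[] rowsIn(String[] column, Collection<String> wanted) {
        Set<String> lookup = new HashSet<>(wanted); // built once, reused for every row
        return IntStream.range(0, column.length)
                .filter(i -> lookup.contains(column[i]))
                .toArray();
    }
}
```

As the issue notes, this only selects matching rows; a real join also has to combine columns from both tables, so the speedup is not directly comparable.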
  • tablename.getstring(rowno,columnname) trim spaces

    Below is my table:

    RowNo | RCDTYP | SNDARLIN | RCVARLIN | FILTYP | DELSEQNUM | CRTDAT | FILR1 |

     1  |       0  |        AV  |        IB  |  HANDBACK  |     000002  |  20220426  |         |
    

    tablename.getstring(rowno,columnname)

    returns only "IB"; I want it as " IB".

    Can somebody please help?

    opened by Devika123456788999 3
  • Add possibility to skip column type from SQL ResultSet

    It would be useful to be able to use the following: SqlResultSetReader.mapJdbcTypeToColumnType(java.sql.Types.BINARY, ColumnType.SKIP)

    Unfortunately, currently, this results in

    java.lang.UnsupportedOperationException: Column type SKIP doesn't support column creation
            at tech.tablesaw.columns.SkipColumnType.create(SkipColumnType.java:29)
            at tech.tablesaw.io.jdbc.SqlResultSetReader.read(SqlResultSetReader.java:105)
            at tech.tablesaw.io.DataFrameReader.db(DataFrameReader.java:160)
    
    Alternatively, it would be useful to be able to skip a column by name in the SqlResultSetReader, but I found no such option. It seems that ReadOptions does not currently support reading from SQL.
    opened by mbs-janbe 0
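  • Wrong values after converting a StringColumn to a DoubleColumn

    Hi. Tablesaw is great, but I ran into a problem while using it and hope you can help. Problem description: because a DoubleColumn cannot be used with splitOn, I converted the DoubleColumn to a StringColumn, did the grouping and slicing, and finally converted the StringColumn back to a DoubleColumn, at which point the values came out wrong. Code: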

    @Test
    public void test000() {
        DoubleColumn doubleColumn = DoubleColumn.create("year", 2022.0, 2022.0);
        StringColumn stringColumn = doubleColumn.asStringColumn();
        System.out.println(stringColumn.print());
        DoubleColumn asDoubleColumn = stringColumn.asDoubleColumn();
        System.out.println(asDoubleColumn.print());
    }
    

    Output: Column: year strings 2022.0 2022.0

    Column: year strings 0 0

    Thanks again 🙏

    bug 
    opened by PanYangyi 0
  • Automatic parse issue in jsaw table

    final Table tbl2 = Table.read()
        .csv(Joiner.on(System.lineSeparator())
                .join("abc,cds,eee", "005,001,003"),
            "Table2");

    content1 and content2 contain two comma-separated lists. I want to treat every column as string values; e.g., if the content is 005, I want it added as 005, not 5. Is there an additional method to treat the values as strings?

    opened by Devika123456788999 0
  • Improvement suggestion: Table.print() function should also include the shape in the output

    Hi

    After any kind of table manipulation, e.g. adding columns or rows or applying where clauses, you almost always want to see what the table looks like via the print/toString methods. Unfortunately, the output doesn't include the shape, so most of these calls are followed by a call to shape.

    The shape is important when you have a larger dataset, because it lets you see the changes in the number of rows and columns.

    P.S. The pandas equivalent prints the shape when displaying table contents.

    opened by minhster99 0
  • Table.summary() missing output is misleading

    Hi

    I read in a file with the following structure

     Index  |   Column Name    |  Column Type  |
    --------------------------------------------
         0  |              id  |      INTEGER  |
         1  |            date  |   LOCAL_DATE  |
         2  |            time  |       STRING  |
         3  |    country_name  |       STRING  |
         4  |  state/province  |       STRING  |
         5  |      population  |      INTEGER  |
         6  |  landslide_type  |       STRING  |
         7  |         trigger  |       STRING  |
         8  |      fatalities  |      INTEGER  |
    

    when I do a summary, I get the following

      Summary   |         id          |     date     |  time  |  country_name   |  state/province  |      population      |  landslide_type  |  trigger   |  fatalities  |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
         Count  |               1693  |        1693  |  1693  |           1693  |            1693  |                1693  |            1693  |      1693  |        1693  |
           sum  |            7017532  |              |        |                 |                  |           158226757  |                  |            |              |
          Mean  |  4145.027761370351  |              |        |                 |                  |   93459.39574719437  |                  |            |              |
           Min  |                 34  |              |        |                 |                  |                   0  |                  |            |           0  |
           Max  |               7541  |              |        |                 |                  |            12294193  |                  |            |         280  |
         Range  |               7507  |              |        |                 |                  |            12294193  |                  |            |         280  |
      Variance  |  5003014.595564535  |              |        |                 |                  |  273112413878.66046  |                  |            |              |
      Std. Dev  |  2236.741959986564  |              |        |                 |                  |  522601.58235376637  |                  |            |              |
       Missing  |                     |           3  |        |                 |                  |                      |                  |            |              |
      Earliest  |                     |  2007-03-02  |        |                 |                  |                      |                  |            |              |
        Latest  |                     |  2016-03-02  |        |                 |                  |                      |                  |            |              |
        Unique  |                     |              |   159  |             28  |             227  |                      |              15  |        17  |              |
           Top  |                     |              |        |  United States  |        Kentucky  |                      |       Landslide  |  Downpour  |              |
     Top Freq.  |                     |              |  1065  |            986  |             124  |                      |             866  |       866  |              |
    

    The missing value is only shown for date and it does work, I can verify there were 3 missing date values. However I also can see missing values for fatalities but it does not appear here. If I do the following

    table.intColumn("fatalities").isMissing().size()  // returns > 0
    

    I suspect that missing is not implemented on the summary() call for IntColumn types because if I simply select that single column and do a summary() on it, I get the following with no missing statistic

    Column: fatalities  
     Measure   |  Value  |
    ----------------------
        Count  |   1690  |
          sum  |         |
         Mean  |         |
          Min  |      0  |
          Max  |    280  |
        Range  |    280  |
     Variance  |         |
     Std. Dev  |         |
    

    Could we get that statistic filled in and for those statistics that aren't supported by the column type, could we add something like 'N/A' so that it is clear? Thanks.

    ps this is a great lib. Really appreciate what you've done here!

    opened by minhster99 0
Releases(v0.43.1)
  • v0.43.1(Apr 3, 2022)

    This is a very minor release as a prelude to branching for post-java-8 development.

    What's Changed

    • Update pom.xml to upgrade apache poi-ooxml by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1085
    • fixed remaining javadoc warnings by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1086

    Full Changelog: https://github.com/jtablesaw/tablesaw/compare/v0.43.0...v0.43.1

  • v0.43.0(Mar 30, 2022)

    What's Changed

    Security vulnerabilities addressed

    • Bump h2 from 1.4.200 to 2.1.210 in /core by @dependabot in https://github.com/jtablesaw/tablesaw/pull/1045
    • Bump Jackson version by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1070

    Bug fixes

    • fix uncaught exception in DoubleParser.canParse. (#1043) by @is in https://github.com/jtablesaw/tablesaw/pull/1044
    • fix for issue #1047 : setPrintFormatter causes NPE, plus test by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1072
    • ensured that all copy() and emptyCopy() implementations copy print fo… by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1073

    Performance-Related Enhancements

    • replaced the implementation of Table method dropDuplicateRows() with … by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1058
    • reintroduce parallel sorting of table indices where it is safe to do so by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1065
    • eliminated the auto boxing of table values and row numbers by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1066

    Other Enhancements

    • improved error message; some automated code simplification by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1019
    • Cleanup on aggregate functions by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1020
    • Wrap io exception on reads by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1021
    • added set(Selection, byte) to BooleanColumn, plus Doc cleanup by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1027
    • made saw tests run faster, no loss of coverage by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1029
    • Removed IOException from write interfaces by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1030
    • Provides better error message on column type detection index out of b… by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1032
    • Simplify adding new column to table by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1034
    • upgrade roaring bitmaps to 0.9.25 by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1063
    • Add displayLogo option to jsplot Config by @gbouquet in https://github.com/jtablesaw/tablesaw/pull/1080
    • update the snapshot version as this was apparently done incorrectly e… by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1082

    Documentation Enhancements

    • Update moneyball tutorial by @jbsooter in https://github.com/jtablesaw/tablesaw/pull/1052
    • Java doc2 by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1018
    • Update documentation readme to include all project javadoc links by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1022
    • Complete Javadoc for the Table package by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1024
    • Javadoc interpolation by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1025
    • Update README.md by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1077
    • Update README.md again by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1079

    New Contributors

    • @jbsooter made their first contribution in https://github.com/jtablesaw/tablesaw/pull/1052
    • @is made their first contribution in https://github.com/jtablesaw/tablesaw/pull/1044
    • @gbouquet made their first contribution in https://github.com/jtablesaw/tablesaw/pull/1080

    Full Changelog: https://github.com/jtablesaw/tablesaw/compare/v0.41.0...v0.43.0

  • v0.42.0(Oct 23, 2021)

    Documentation:

    • Many JavaDoc additions and extensions.
    • Update documentation readme to include all project javadoc links (#1022)

    Enhancements

    • Wrap IOException for file reads (#1021). IOException is caught and re-thrown wrapped in a runtime exception, RuntimeIOException. For interactive work, this greatly reduces the number of exceptions that need to be caught. Writes will be handled in the next release.
    • Cleanup on aggregate function names (#1020). Abstract AggregateFunctions were given a consistent naming structure. All class names now have the form: [columnType][returnType]AggregateFunction (e.g. BooleanIntAggregateFunction). If the column and return type are the same, it is not repeated (e.g. StringAggregateFunction).
    • Some aggregate function classes were made public so library users can subclass
    • A few methods were added to Column subclasses. Notably, an asSet() method was added to columns where it was not already present.
    • improved error message for Column append methods (#1019)
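
The wrapping pattern described in the first enhancement above can be sketched with the standard library alone. The example uses java.io.UncheckedIOException as a stand-in for Tablesaw's RuntimeIOException, whose exact API is not shown in these notes, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical reader that hides the checked IOException from its callers.
final class UncheckedReadSketch {

    /** Returns the first line of the source, or null if the source is empty. */
    static String firstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        } catch (IOException e) {
            // Re-throw as an unchecked exception; the original cause is preserved.
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers no longer need a try/catch or a throws clause, which is convenient for interactive work, and code that does want to handle the failure can still catch the unchecked exception and inspect its cause.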
  • v0.41.0(Oct 17, 2021)

    This is a documentation only release focused on improving JavaDoc coverage.

    Documentation

    The following are now fully documented for public methods.

    In package tech.tablesaw.tables

    • Relation

    In package tech.tablesaw.api

    • Table
    • Row
    • ColumnType

    In package tech.tablesaw.columns and sub-packages

    • Column
    • AbstractColumn
    • AbstractStringColumn
    • SkipColumnType

    All classes and interfaces in the following packages:

    • tech.tablesaw.indexing
    • tech.tablesaw.selection
    • tech.tablesaw.joining
    • tech.tablesaw.aggregation
    • tech.tablesaw.sorting (and comparator subpackage)
  • v0.40.0(Oct 17, 2021)

    This release focused on minor enhancements that eliminate gaps in functionality.

    Note that the change to method Table:shape() modified the String that is returned, changing the functionality of the method slightly.

    Enhancements

    @lwhite1 Minor extensions (#999) Added methods:

    • DoubleColumn:asDoubleArray()

    • FloatColumn:asFloatArray()

    • IntColumn:asIntArray()

    • ShortColumn:asShortArray()

    • StringFilters:isIn()

    • StringFilters:isNotIn()

    • IntColumn:isNotIn()

    Other enhancement:

    • Made Table:append() accept any Relation as its argument, not just another table.

    Made Table:removeColumns() return Table rather than Relation (#1003) …

    Added method Date:isNotEqualTo(LocalDate) (#1004) …

    Made shape() return the name of the table, along with the shape (#1005)

    Standardized names for methods, added missing methods (#1010)

    • Deprecated addRow(Row) and added appendRow(Row) to make the name more consistent with append(Table).
    • Added methods selectColumns() and rejectColumns() to provide variations to removeColumns() and retainColumns() that return new tables rather than modify the table in-place.
    • Added method Relation:containsColumn(String name).
    • Other minor enhancements to code and documentation.

    Made Table:countBy() take varargs so the counts can group on more than one column
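
To make concrete what counting by more than one column means, here is a minimal sketch in plain Java. It does not use Tablesaw and is not how Table:countBy() is implemented; the class name, the `countBy` helper, and the row layout (one `String[]` per row) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CountBySketch {

    // Hypothetical helper (not Tablesaw's API): counts rows per distinct
    // combination of the values found at the given column indexes.
    static Map<List<String>, Long> countBy(List<String[]> rows, int... keyColumns) {
        return rows.stream()
            .collect(Collectors.groupingBy(
                // The grouping key is the list of values in the chosen columns.
                row -> Arrays.stream(keyColumns)
                    .mapToObj(i -> row[i])
                    .collect(Collectors.toList()),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"TX", "F1"},
            new String[] {"TX", "F1"},
            new String[] {"TX", "F2"},
            new String[] {"OK", "F1"});
        // Groups on both columns, so ("TX", "F1") and ("TX", "F2") are counted separately.
        System.out.println(countBy(rows, 0, 1));
    }
}
```

Passing a single index groups on one column only; passing several yields one count per distinct combination, which is the behavior the varargs form makes possible.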

  • v0.38.5(Sep 7, 2021)

    Small release with one important bug fix. There is also a documentation enhancement.

    Bug fixes

    @lwhite1 SliceGroup TextColumn handling revision (#990). Fixes issue where splitting a large file on a TextColumn (as when using groups in aggregations) could cause a major increase in memory.

    Enhancements

    @dependabot Bump jsoup from 1.12.1 to 1.14.2 in /html (#977)

    @lwhite1 Allow TextColumn to append StringColumns, and vice-versa (#983)

    @lwhite1 Made all data fields protected (#991)

    Documentation

    @lwhite1 Update gettingstarted.md

  • v0.38.4(Aug 21, 2021)

    This is a relatively small release with a few nice enhancements and several bug fixes. There is also a documentation enhancement publicizing @ccleva's Parquet integration project.

    Bug fixes

    Fix bug where missing values in numeric columns could not be formatted. This enables arbitrary missing value indicators (e.g. "N/A") to be used in printing and saving to files. @lwhite1

    Replace parallelQuickSort with mergeSort (#968), to avoid incorrect sorting caused by race conditions when a custom sort object is used. @lwhite1

    Fix issue #963 (#967), where Relation.structure() failed for a TableSlice with a ClassCastException @lwhite1

    Enhancements

    Aggregate by non-primitive column type that extends Number (#973), making it possible to add a column type for BigDecimal @daugasauron @kallur

    plotly - added range slider to Axis (#953) … @smpawlowski

    Support annotations in Plot.ly JavaScript. (#944) … @xcjusuih

    Documentation

    Added link to the tablesaw-parquet project in README (#966) @ccleva

  • v0.38.3(Jul 22, 2021)

    Features

    • Improved print formatting (#914)
    • Improve innerJoin performance (#903) - Thanks, @DanielMao1
    • Allow reading of malformed CSV (#901) - Thanks, @ChangzhenZhang
    • Fixes #822 and #815, providing more extensive column type options - Thanks, @lujop
    • Support multiple custom missing value options in Readers
    • Allow default parsing to be overridden per column. (#928) - Thanks, @jglogan
    • Add contour plot (#938) - Thanks, @ArslanaWu
    • Add support for violin plot (#936) - Thanks, @LUUUAN
    • Add option to keep join keys in both tables when applying an outer join - Thanks, @DanielMao1
    • Assign FixedWidthWriterSettings.fieldLengths (#943) - Thanks, @Kerwinooooo
    • Support percentage by parsing percentage to double (#906) - Thanks, @Kerwinooooo
    • Open default local browser on an arbitrary HTML page #860 (#949)
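
As a rough illustration of what parsing a percentage to a double (#906) involves, the sketch below converts text such as "12.5%" into a fraction. It is plain Java written for this note, not Tablesaw's parser; the class name, the `parsePercent` method, and the choice to reject input without a trailing `%` are assumptions.

```java
class PercentParseSketch {

    // Hypothetical helper (not part of Tablesaw): turns "12.5%" into 0.125.
    static double parsePercent(String text) {
        String trimmed = text.trim();
        if (!trimmed.endsWith("%")) {
            throw new NumberFormatException("Not a percentage: " + text);
        }
        // Drop the trailing percent sign, then scale the number to a fraction.
        String number = trimmed.substring(0, trimmed.length() - 1).trim();
        return Double.parseDouble(number) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(parsePercent("50%"));      // prints 0.5
        System.out.println(parsePercent(" 12.5 % ")); // prints 0.125
    }
}
```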

    Bug Fixes

    • Fix bug of leftOuter join when using multi-tables (#905) Thanks @Carl-Rabbit
    • Fix bug in appendCell that caused custom parser to be ignored (#912)
    • Corrected surefire plugin argLine (#915) Thanks @ccleva
    • Fix CI on Windows (#904) @lujop
    • Fix implementation of append(String) in TextColumn (#921)
    • Fix rightOuter join on multiple tables (#922) Thanks (again) @Carl-Rabbit
    • Fix XlsxReader not respecting the calculated tableArea for header column names #887 (#926) - Thanks @lujop
    • Remove print statements in tests writing to system.out (#933)
    • Fix Column Type detection #751 and Integer handling in XlsxReader #882 (#931) - Thanks, @lujop
    • Fix(table): method 'where' applied the selection function twice (#962) - Thanks, @zhiwen95
    • Support for not closing an output stream opened by the user (#941) Thanks, @Kerwinooooo, @ChangzhenZhang

    Documentation

    • Update README.md (#917)

    Misc

    • Bump guava from 28.2-jre to 29.0-jre in /core (#895)
    • Bump guava version again for security improvements (#932)
  • v0.38.1(May 9, 2020)

    Features

    • More options for creating a bubble plot (https://github.com/jtablesaw/tablesaw/pull/781) - thanks @rayeaster

    Bug Fixes

    • Fix support for java.sql.Time (https://github.com/jtablesaw/tablesaw/pull/791) - thanks @brainbytes42
    • Allow empty slices when aggregating (https://github.com/jtablesaw/tablesaw/pull/795) - thanks @emillynge
    • Fix NPE in ColumnType.compare (https://github.com/jtablesaw/tablesaw/pull/799)
    • Fix NPE in set (https://github.com/jtablesaw/tablesaw/pull/800)
  • v0.38.0(Apr 13, 2020)

    Features

    • ignoreZeroDecimal option when reading data (https://github.com/jtablesaw/tablesaw/pull/748) - Thanks @larshelge
    • indexOf method (https://github.com/jtablesaw/tablesaw/pull/787) - Thanks @islaterm
    • Ability to add quotes to CSV even if not strictly required (https://github.com/jtablesaw/tablesaw/pull/767)
    • Ability to set layout and config for plots (https://github.com/jtablesaw/tablesaw/pull/690)
    • Pie chart subplots (https://github.com/jtablesaw/tablesaw/pull/777)
    • Plotting of Instant data (https://github.com/jtablesaw/tablesaw/pull/765)
    • Include sheet name when reading from Excel (https://github.com/jtablesaw/tablesaw/pull/758) - Thanks @R1j1t

    Bug Fixes

    • Joining an empty table (https://github.com/jtablesaw/tablesaw/pull/783) - Thanks @vanderzee-anl-gov
    • Use same options for reading and writing a CSV by default (https://github.com/jtablesaw/tablesaw/pull/772)
    • Reading of binary data from database
    • Make DoubleColumn.create work on wider range of input
    • Fix column sorting (https://github.com/jtablesaw/tablesaw/pull/778)
    • Fixed equals method on BooleanColumn (https://github.com/jtablesaw/tablesaw/pull/766)
    • Fixed 3D scatter plot (https://github.com/jtablesaw/tablesaw/pull/764)
    • Fixed BoxBuilder (https://github.com/jtablesaw/tablesaw/pull/763)
    • Make Component.engine non-static (https://github.com/jtablesaw/tablesaw/pull/762)
    • Fixed shaded jar
    • Improved handling of missing values when calling get on a column

    Documentation

    • Fix broken link to data import docs (https://github.com/jtablesaw/tablesaw/pull/773) - Thanks @bantu
    • Add docs for reading from Excel (https://github.com/jtablesaw/tablesaw/pull/759) - Thanks @R1j1t
    • Fixed CSV reading docs (https://github.com/jtablesaw/tablesaw/commit/6fc6a4d013e4cb92b5a8dc13d2d5e2fc62ec1460) - Thanks @salticus
  • v0.37.2(Jan 24, 2020)

  • v0.37.1(Jan 24, 2020)

    Breaking Changes

    • Table.summary now returns a Table instead of a String - Thanks @jackie-h

    Features

    • Table transpose https://github.com/jtablesaw/tablesaw/commit/1b01eaf5c94c8a51d09be7fe2c080a78dc9a03e1 - Thanks @jackie-h
    • Added ability to sample rows while reading a CSV - Thanks @aecio
    • Additional Column and Table create methods

    Cleanup

    • Fixed a bunch of SonarCloud warnings
    • Improved exception message for duplicate Table columns
    • Validation for Table joins
  • v0.37.0(Jan 8, 2020)

    Features

    • Upgraded to Smile 2.0 (https://github.com/jtablesaw/tablesaw/pull/735)
    • Autocorrelation (https://github.com/jtablesaw/tablesaw/pull/726)
    • InstantColumn min and max (https://github.com/jtablesaw/tablesaw/pull/719)
    • Enhancements to histogram (https://github.com/jtablesaw/tablesaw/pull/700)
    • New Column.map method (https://github.com/jtablesaw/tablesaw/pull/705)
    • Expose two FileReader methods (https://github.com/jtablesaw/tablesaw/pull/701)
    • New Plotly config argument (https://github.com/jtablesaw/tablesaw/pull/691)
    • Read specific Excel sheet (https://github.com/jtablesaw/tablesaw/pull/683)
    • Read JSON subtree (https://github.com/jtablesaw/tablesaw/pull/684)
    • Read specific HTML table (https://github.com/jtablesaw/tablesaw/pull/682)

    Bug Fixes

    • Only set LayoutBuilder.autosize if necessary (https://github.com/jtablesaw/tablesaw/pull/713)
  • v0.36.0(Sep 29, 2019)

    Breaking changes

    • Table.numberColumn now returns NumericColumn instead of NumberColumn (https://github.com/jtablesaw/tablesaw/pull/669)

    Features

    • Interpolation of missing cells (https://github.com/jtablesaw/tablesaw/pull/664)
    • File encoding detection (https://github.com/jtablesaw/tablesaw/pull/654)
    • stdDev for rolling columns (https://github.com/jtablesaw/tablesaw/pull/666)
    • Column UI widget in BeakerX (https://github.com/jtablesaw/tablesaw/pull/668)
    • Additional replaceColumn method (https://github.com/jtablesaw/tablesaw/pull/673)

    Bug Fixes

    • Fix reading CSV files with space at edge of column name (https://github.com/jtablesaw/tablesaw/pull/659)
    • Fix ignoreLeadingWhitespace (https://github.com/jtablesaw/tablesaw/commit/fb207104725eb20a5038b29e7c8828b754d4f36d)
    • Fix handling of boolean columns in SawWriter (https://github.com/jtablesaw/tablesaw/pull/661)
  • v0.35.0(Sep 3, 2019)

    Deprecations and breaking changes

    • Deprecated data() methods (https://github.com/jtablesaw/tablesaw/pull/649)
    • Renamed isMissingValue to valueIsMissing (https://github.com/jtablesaw/tablesaw/pull/643)
    • Removed mapToType added in last release (https://github.com/jtablesaw/tablesaw/pull/583)

    Features

    • Analytic Query functions (https://github.com/jtablesaw/tablesaw/pull/606 and https://github.com/jtablesaw/tablesaw/pull/621)
    • Deferred execution queries (https://github.com/jtablesaw/tablesaw/pull/574)
    • Saw file format persistence (https://github.com/jtablesaw/tablesaw/pull/642)
    • Column creation from streams (https://github.com/jtablesaw/tablesaw/pull/634)
    • Improved reading from URL (https://github.com/jtablesaw/tablesaw/pull/650)
    • remainder, capitalize, repeat, and concatenate functions (https://github.com/jtablesaw/tablesaw/pull/635)
    • Figure.builder (https://github.com/jtablesaw/tablesaw/pull/608)
    • Option to ignore whitespace in csv writer (https://github.com/jtablesaw/tablesaw/pull/605 - thanks @sd1998)

    Performance

    • Speed up joins (https://github.com/jtablesaw/tablesaw/pull/562)
    • Speed up TextColumn's isIn method (https://github.com/jtablesaw/tablesaw/pull/613)

    Bug fixes

    • Fix NPE when reading incomplete JSON rows (https://github.com/jtablesaw/tablesaw/pull/591)
    • Make empty columns be of type string (https://github.com/jtablesaw/tablesaw/pull/626)
    • Include missing values in unique (https://github.com/jtablesaw/tablesaw/pull/595)
    • Fix conversion of missing values in IntColumn.toDoubleColumn (https://github.com/jtablesaw/tablesaw/issues/577)
    • Fixed splitOn for TextColumn (https://github.com/jtablesaw/tablesaw/issues/554)
    • Handling of null values in SqlResultSetReader (https://github.com/jtablesaw/tablesaw/pull/563)

    Documentation

    • Began compiling code samples in docs (https://github.com/jtablesaw/tablesaw/pull/637, https://github.com/jtablesaw/tablesaw/pull/639, and https://github.com/jtablesaw/tablesaw/pull/641)

    Development

    • Automatically format code (https://github.com/jtablesaw/tablesaw/pull/570 and https://github.com/jtablesaw/tablesaw/pull/568)
  • v0.34.2(Aug 2, 2019)

    Features

    • Add table.stream (https://github.com/jtablesaw/tablesaw/pull/540)
    • Add fillWith(double) (https://github.com/jtablesaw/tablesaw/pull/539)
    • Add mapToType (https://github.com/jtablesaw/tablesaw/pull/545) - Thanks @ryancerf
    • Add appendRow (https://github.com/jtablesaw/tablesaw/commit/6f98623d81d0e57d0cc5e9ab622b518165f7a74d)
    • Subplots (https://github.com/jtablesaw/tablesaw/pull/548) - Thanks @kiamesdavies
    • QQ plots and related improvements
    • plotly events (https://github.com/jtablesaw/tablesaw/pull/512) - Thanks @tmrn411

    Bug Fixes

    • Data export to Smile (https://github.com/jtablesaw/tablesaw/pull/528) - Thanks @kiamesdavies
    • Unit tests on Windows (https://github.com/jtablesaw/tablesaw/pull/546) - Thanks @paulk-asert
    • Calculation of unique values in string columns (https://github.com/jtablesaw/tablesaw/pull/544) - Thanks @ccleva
    • Ensure tests are run (https://github.com/jtablesaw/tablesaw/pull/551) - Thanks @ccleva
    • asObjectArray in numeric columns (https://github.com/jtablesaw/tablesaw/commit/6f9086897b6e85c482f5f4de3bcb71c4ae53295a)
    • Possible exception in toString (https://github.com/jtablesaw/tablesaw/pull/497) - Thanks @hallvard
  • v0.34.1(Jun 17, 2019)

    Features

    • Improved RollingColumn support
    • Option for CSV quote character (https://github.com/jtablesaw/tablesaw/pull/536)
    • New dropRange and inRange methods (#534)
    • Improved NumberPredicates (#532)

    Bug Fixes

    • Fix DoubleColumn.map (https://github.com/jtablesaw/tablesaw/pull/533)
  • v0.34.0(Jun 5, 2019)

    Breaking changes

    • Renamed join to joinOn so that it will work with Groovy (https://github.com/jtablesaw/tablesaw/pull/531)

    Features

    • Added set with predicate method (https://github.com/jtablesaw/tablesaw/pull/530)
  • v0.33.5(Jun 4, 2019)

  • v0.33.4(Jun 4, 2019)

  • v0.33.3(Jun 4, 2019)

  • v0.33.2(Jun 4, 2019)

  • v0.33.1(Jun 4, 2019)

  • v0.33.0(Jun 4, 2019)

    Features

    • Add InstantColumn (https://github.com/jtablesaw/tablesaw/pull/518)
    • More configurable column type detection (https://github.com/jtablesaw/tablesaw/pull/521)
    • Added option for turning off html escaping in html table output
    • Additional date parsing capabilities (https://github.com/jtablesaw/tablesaw/issues/506)

    Fixes

    • Fix for precision of 0 in JdbcResultSet (https://github.com/jtablesaw/tablesaw/pull/523)

    Cleanup

    • Remove circular dependency between reader packages and core package
    • Remove unused epoch conversion methods (https://github.com/jtablesaw/tablesaw/pull/513)
  • v0.32.7(Mar 31, 2019)

    • Implemented maxCharsPerColumn CSV parser setting
    • Fix DateTimeParser issue
    • Switch from reflections to classgraph
    • Updated pebble version in jsplot
  • v0.32.6(Mar 24, 2019)

  • v0.32.5(Mar 24, 2019)

  • v0.32.4(Mar 24, 2019)

  • v0.32.3(Mar 13, 2019)

  • v0.32.2(Mar 13, 2019)
