Java dataframe and visualization library

Last update: Jan 7, 2023

Overview

Tablesaw

Tablesaw is Java for data science. It includes a dataframe and a visualization library, as well as utilities for loading, transforming, filtering, and summarizing data. It's fast and careful with memory. If you work with data in Java, it may save you time and effort. Tablesaw also supports descriptive statistics and integrates well with the Smile machine learning library.

Tablesaw features

Data processing & transformation

Import data from RDBMS, Excel, CSV, JSON, HTML, or Fixed Width text files, whether they are local or remote (http, S3, etc.)
Export data to CSV, JSON, HTML or Fixed Width files.
Combine tables by appending or joining
Add and remove columns or rows
Sort, Group, Query
Map/Reduce operations
Handle missing values

Visualization

Tablesaw supports data visualization by providing a wrapper for the Plot.ly JavaScript plotting library.

Statistics

Descriptive stats: mean, min, max, median, sum, product, standard deviation, variance, percentiles, geometric mean, skewness, kurtosis, etc.

Getting started

Add tablesaw-core to your project. You can find the version number for the latest release in the release notes:

<dependency>
    <groupId>tech.tablesaw</groupId>
    <artifactId>tablesaw-core</artifactId>
    <version>VERSION_NUMBER_GOES_HERE</version>
</dependency>

You may also add supporting projects:

tablesaw-beakerx - for using Tablesaw inside BeakerX
tablesaw-excel - for using Excel workbooks
tablesaw-html - for using HTML
tablesaw-json - for using JSON
tablesaw-jsplot - for creating charts

Documentation and support

Start here: https://jtablesaw.github.io/tablesaw/gettingstarted
Then see our documentation page: https://jtablesaw.github.io/tablesaw/ and the Tablesaw User Guide.

And always feel free to ask questions or make suggestions here on the issues tab.

Integrations

We recommend trying Tablesaw inside Jupyter notebooks, which let you experiment with Tablesaw more interactively. Get started by installing BeakerX and trying the sample Tablesaw notebook.
You can use Tablesaw with many machine learning libraries. To see an example of using Tablesaw with Smile, check out the sample Tablesaw Jupyter notebook.
You can use quandl4j-tablesaw to load financial and economic data from Quandl into Tablesaw. This is also demonstrated in the sample Tablesaw notebook.

Comments

Implement transpose #696
Thanks for contributing.

[x] Tick to sign-off your agreement to the Developer Certificate of Origin (DCO) 1.1

Description

Added an implementation of Transpose. It has the restriction that columns must be of the same type. This is an initial version for feedback

Testing

Added a unit test for the feature
opened by jackie-h 31
Column-wise DataFrame-like operations

Hi, I am new to Tablesaw. I am exploring options for recreating some Python DataFrame operations in Tablesaw. For example, I have a DataFrame object called data1, and I use existing columns of this data frame to create new ones (and update existing ones). Here are a couple of lines of Python code:

data1['days'] = data1['buyDate'].apply(lambda x: (today - x).days)
...
data1['CAGR'] = ((data1['curValue'] / data1['bookValue']) ** (1.0 / data1['nYears']) - 1.0) * 100

Is there a way to implement something similar using Tablesaw?

Thanks a lot in advance.
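
For reference, the arithmetic in the two Python lines above can be sketched in plain Java over arrays. This is only an illustration of the column-wise computation, not Tablesaw API: the class and method names (ColumnMath, daysHeld, cagrPercent) are hypothetical, and each array stands in for one column.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Plain-Java sketch of the two pandas expressions; hypothetical names, not Tablesaw API. */
final class ColumnMath {

    /** days[i] = number of days from buyDates[i] to today. */
    static long[] daysHeld(LocalDate[] buyDates, LocalDate today) {
        long[] days = new long[buyDates.length];
        for (int i = 0; i < buyDates.length; i++) {
            days[i] = ChronoUnit.DAYS.between(buyDates[i], today);
        }
        return days;
    }

    /** cagr[i] = ((curValue[i] / bookValue[i]) ^ (1 / nYears[i]) - 1) * 100. */
    static double[] cagrPercent(double[] curValue, double[] bookValue, double[] nYears) {
        double[] cagr = new double[curValue.length];
        for (int i = 0; i < curValue.length; i++) {
            double growth = curValue[i] / bookValue[i];
            cagr[i] = (Math.pow(growth, 1.0 / nYears[i]) - 1.0) * 100.0;
        }
        return cagr;
    }
}
```

A value that doubles in one year, or quadruples in two, gives a CAGR of 100 in this sketch.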
enhancement core

opened by imfaisalmb 31
Readonly data: Any equivalent in tablesaw to pandas' view vs copy?

I've finally got my use case using tablesaw to an initial build and tried a few runs, and it is rather slow compared to a python version I wrote using pandas previously. I migrated to Java to get better concurrency in the hope of making it go faster.

Profiling it I see why -- it is spending 75% of its time in tech.tablesaw.table.Rows.copy(). This is because of some Table.where() calls that are designed to filter down the input data.

Briefly, the background is my input data is price history for certain financial assets. Having established a price at a certain time (a simulated trade entry point) I then want to see if the market went up or down by a certain amount from that point. I am currently doing so using a filter like:

useData.where(useData.numberColumn("High").isGreaterThanOrEqualTo(target).or(useData.numberColumn("Low").isLessThanOrEqualTo(stop)))

(method names from memory so may have slight inaccuracies)

That gives me a filtered table, and then selecting the first row from this gives me the first instance that fulfilled the criteria. I'm only ever interested in the first row matching the criteria but can't think of a way to get tablesaw to stop once that row is found, so instead get the entire table and take the first row (or rather first value of each column) as needed. This kind of approach is applied repeatedly to further filter price data depending on what is found at each stage. The result is a lot of calls to Table.where().

As I say the code ends up spending a lot of time copying rows, presumably from the original data to the return value of the where() method. Is there a way to get tablesaw to take a "view" approach as pandas would in this situation, that is not actually copy the data but simply copy references to the data, which would be faster? This obviously comes along with issues if one later tries to modify the filtered result represented by the view, because it would be impossible to do so without also modifying the original; pandas handles that by making attempts to alter a view an error, forcing the user to use an operation that would force a copy when that is what they want to do. For my particular use case, since the data is only read (in this particular case I don't even summarise it) I don't mind not being able to modify.

Does tablesaw have anything analogous to this view concept from pandas? Again, the goal is to avoid needlessly copying a lot of data.
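
The "stop at the first matching row" idea described above can be sketched in plain Java without copying any rows. This is a minimal illustration over primitive arrays, not a Tablesaw feature: FirstMatch, firstExit, and the fromRow parameter are hypothetical names, and the arrays stand in for the "High" and "Low" columns.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

/** Hypothetical helper: finds the first row meeting the exit condition, copying nothing. */
final class FirstMatch {

    /**
     * Returns the index of the first row at or after fromRow where
     * high >= target or low <= stop, or an empty result if no row matches.
     */
    static OptionalInt firstExit(double[] high, double[] low, int fromRow,
                                 double target, double stop) {
        return IntStream.range(fromRow, high.length)
                .filter(i -> high[i] >= target || low[i] <= stop)
                .findFirst(); // a sequential stream stops scanning at the first hit
    }
}
```

Because only an index is returned, repeated searches from later entry points reuse the same arrays instead of building a filtered table each time.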

opened by mark27q1 29
Add support to join tables on multiple columns.
Thanks for contributing.

[x] Tick to sign-off your agreement to the Developer Certificate of Origin (DCO) 1.1

Description

Enhanced API to accept multiple names of columns to join on. Supported for all joins: inner, leftOuter, rightOuter, and fullOuter. Added javadoc where missing for public methods.

Testing

Added dozens of JUnit tests to cover all changed APIs. Ran the coverage tool in Eclipse to ensure that all changes were being exercised. Reached 100% coverage for all but one method. The goal of reaching 100% coverage exposed a failure to parse LONG types from CSV input, so added the missing support for that.
opened by gregorco 22
toString() should be more fault tolerant

I'm debugging some code where columns in a table can be of different sizes. To help diagnose the problem I'm printing the table. However, Relation's toString() method, that uses a DataFramePrinter, throws an IndexOutOfBoundsException because the frame.size() is based on the first column's size, while others are shorter. A toString-method should be very careful to not throw exceptions, since it's typically used during debugging. I suggest making DataFramePrinter more fault tolerant when fetching column values, e.g. in line 192: data[i][j] = frame.getString(i, j);
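
One way to picture the suggested fault tolerance is a printer that sizes the output by the longest column and substitutes a placeholder for cells that do not exist. The sketch below is plain Java and does not reproduce DataFramePrinter; RaggedPrinter, render, and the "?" placeholder are assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical printer that tolerates columns of different lengths. */
final class RaggedPrinter {

    /** Renders a header line and one line per row; absent cells print as "?" instead of throwing. */
    static String render(List<String> names, List<List<String>> columns) {
        int rows = 0;
        for (List<String> col : columns) {
            rows = Math.max(rows, col.size()); // use the longest column, not the first
        }
        StringBuilder sb = new StringBuilder(String.join(" | ", names)).append('\n');
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns.size(); c++) {
                List<String> col = columns.get(c);
                if (c > 0) {
                    sb.append(" | ");
                }
                sb.append(r < col.size() ? col.get(r) : "?");
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```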

opened by hallvard 21
Circular dependencies in Columns/ColumnTypes can lead to unpredictable behavior
A null value is sometimes returned by a ColumnType constant. Whether or not the value is null depends on the order in which other code is executed.

To recreate, I built an array of ColumnTypes and printed the array. Sometimes the values are all initialized correctly, sometimes not:

Correct:

[STRING, LOCAL_DATE_TIME, INTEGER]

Incorrect:

[STRING, null, INTEGER]

The different results can be had by adding or removing a line of code just before printing. The correct result is printed when the line is removed.

Here is the main class with the offending line included:

class DummyClass {
    public static void main(String[] args) {
        long missing = DateTimeColumn.MISSING_VALUE;
        new DummyPrinter().printColumnTypes();
    }
}

When I comment out line 1 in main, the code works as expected. Also, the code that implements printing must be in a separate class for the error to occur. Here's that class:

class DummyPrinter {
    void printColumnTypes() {
        System.out.println(Arrays.toString(columnTypes));
    }

    private static final ColumnType[] columnTypes = {
        STRING, LOCAL_DATE_TIME, INTEGER,
    };
}

The array itself is a literal constant. The values inserted into the array are also constants. They are declared in the ColumnType interface in the line shown below:

DateTimeColumnType LOCAL_DATE_TIME = DateTimeColumnType.INSTANCE;

In that line, the values are provided by a constant (INSTANCE) declared in DateTimeColumnType.

Here is the code from that class where INSTANCE is created:

public static final DateTimeColumnType INSTANCE = new DateTimeColumnType(BYTE_SIZE, "LOCAL_DATE_TIME", "DateTime");

The line of code that turns the error on and off causes the class DateTimeColumnType to load in a different order when it's present than it does when absent.
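
The general mechanism, a static initializer cycle that leaves a constant null depending on which class loads first, can be reproduced with a small self-contained sketch. The classes below (TypeConstants, DateTimeKind, InitOrderDemo) are hypothetical stand-ins and are not Tablesaw's actual classes; they only show how the order of first access decides the outcome.

```java
/** Hypothetical constants holder, standing in for a type-constants interface. */
final class TypeConstants {
    // Reads DateTimeKind.INSTANCE while TypeConstants is being initialized.
    static final DateTimeKind LOCAL_DATE_TIME = DateTimeKind.INSTANCE;
}

/** Hypothetical concrete type class that refers back to the constants holder. */
final class DateTimeKind {
    // Reading TypeConstants here starts its initialization before INSTANCE is assigned.
    static final Object DEFAULT = TypeConstants.LOCAL_DATE_TIME;
    static final DateTimeKind INSTANCE = new DateTimeKind();
}

final class InitOrderDemo {
    /** Touches DateTimeKind first, then reports whether the shared constant ended up null. */
    static boolean constantIsNull() {
        Object touchFirst = DateTimeKind.INSTANCE; // forces DateTimeKind to initialize first
        return TypeConstants.LOCAL_DATE_TIME == null;
    }
}
```

When DateTimeKind is touched first, TypeConstants is initialized in the middle of DateTimeKind's own initialization and captures the not-yet-assigned INSTANCE. If TypeConstants were touched first instead, the constant would be non-null.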
opened by lwhite1 21
Implement a pie chart

Using xChart or JavaFX Charts create an interface that enables easy rendering of the plot from a tablesaw table. See the implementation of Bar Plots (which use JavaFX) for a very similar example.

https://github.com/jtablesaw/tablesaw/blob/master/plot/src/main/java/tech/tablesaw/api/plot/Bar.java
enhancement help wanted

opened by lwhite1 19
Added support for reading and writing Apache ORC file format
Thanks for contributing.

[x] Tick to sign-off your agreement to the Developer Certificate of Origin (DCO) 1.1

Description

Added support for reading and writing Apache ORC file format Fixes #620

Testing

Unit Test cases added
opened by murtuza-ranapur 17
Major problems trying out plots; everything breaks.
Maybe I'm looking in the wrong place, but it doesn't seem like there is sufficient information for a newcomer to get plotting working—and this is the main reason I'm looking at this library.

First of all, a suggestion: the page https://jtablesaw.github.io/tablesaw/userguide/introduction says nothing at all about the dependency needed for plotting. One has to know to look back at https://github.com/jtablesaw/tablesaw to figure this out. But this is a minor issue.

Much bigger is that the first code at https://jtablesaw.github.io/tablesaw/userguide/Introduction_to_Plotting does not work. Not even close. It won't even compile. It doesn't even have balanced parentheses. Fixing the parentheses still won't get it to compile, and certainly won't show the plot that was promised.

Figure fig = BubblePlot.create("Average retail price for champagnes by year and rating",
    champagne,           // table name
    "highest pro score", // x variable column name
    "year",              // y variable column name
    "Mean Retail"        // bubble size
    ));

So I skip the "introduction" page and go straight to the "real" code at https://jtablesaw.github.io/tablesaw/userguide/BarsAndPies .

First we load the Tornado dataset:

Table tornadoes = Table.read().csv("Tornadoes.csv");

What is "the Tornado dataset" and where can I find it? An earlier paragraph mentioned, "we’ll use a Tornado dataset from NOAA", so I went to the NOAA site and downloaded the most promising looking data file named 1950-2017_all_tornadoes.csv. I tried loading that in Tablesaw:

…
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 10000 out of bounds for length 10000
    at com.univocity.parsers.common.ParserOutput.valueParsed(ParserOutput.java:327)
    at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:176)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:560)
    ... 7 more

(So you are using the uniVocity parsers. I'm familiar with them. But they aren't so "plug-and-play" as they make them out to be, as you can see here.)

I searched the web for "Tornadoes.csv". One unrelated site implied that 2018_torn_prelim.csv might be more promising, but that didn't work. https://gist.github.com/darrenjaworski/5874227 mentioned a tornadoes.csv, and it was some Google spreadsheet, which I downloaded to a CSV file, but that gave me:

Exception in thread "main" java.lang.IllegalArgumentException: Cannot add column with duplicate name Short column: state number to table tornadoes.csv
    at tech.tablesaw.api.Table.validateColumn(Table.java:161)
    at tech.tablesaw.api.Table.addColumns(Table.java:144)
    at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:147)
    at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
    at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
    at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:34)
    …

I finally realized that this GitHub repository has some "tornado" CSV files. I downloaded and renamed tornadoes 1950-2014.csv, but no go:

Exception in thread "main" tech.tablesaw.io.csv.AddCellToColumnException: Error while adding cell from row 41658 and column Crop Loss(position:10): Error adding value to column Crop Loss: For input string: "0.4"
    at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:244)
    at tech.tablesaw.io.csv.CsvReader.read(CsvReader.java:156)
    at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:62)
    at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:58)
    at tech.tablesaw.io.DataFrameReader.csv(DataFrameReader.java:34)
    …
Caused by: java.lang.NumberFormatException: Error adding value to column Crop Loss: For input string: "0.4"
    at tech.tablesaw.api.ShortColumn.appendCell(ShortColumn.java:353)
    at tech.tablesaw.api.ShortColumn.appendCell(ShortColumn.java:27)
    at tech.tablesaw.io.csv.CsvReader.addRows(CsvReader.java:242)
    ... 5 more

This is frustrating. This is what a new user is faced with.

If this turns out to be a good library, believe me I'll pitch in and help with the documentation and probably even the code. But I'm stuck just in the first lines! Could someone help me?
opened by garretwilson 17
How to set data type to whole Table.
Hello Tablesaw team. I have a question about setting the data type for a Table when I don't know the number of columns. I have a 30_000x2000 feature CSV file containing 0.0 and some other Double values. If I parse the CSV via:

CsvReadOptions options = CsvReadOptions.builder(csv)
    .header(false)
    .maxNumberOfColumns(50_000)
    .build();
Table t = Table.read().csv(options);

I get a NumberFormatException, because all the 0.0 values are treated as Short 0. So when the reader gets to real numbers like 13.5, it throws an NFE.

But if I add sample(false) to the reader options, it takes about 2:40 to parse the file.

How can I set the data type for the whole Table? As far as I can see, the only way is to set columnType in the parser options, but that won't work because I don't know the number of columns in the CSV file.

P.S. I used com.univocity.parsers.csv.CsvParser separately to read the same file: it takes 2:20 with parser.parseAll and 1:20 when parsing the file row by row.
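
The failure mode described above, a type guessed from a sample that later rows contradict, can be illustrated with a toy detector. This is a hypothetical sketch and does not reproduce Tablesaw's detection logic: SampledTypeGuess, guess, and the "INTEGER"/"DOUBLE" labels are assumptions, and the detector is deliberately written to accept "0.0" as a whole number.

```java
import java.util.List;

/** Hypothetical type detector that inspects only the first sampleSize values. */
final class SampledTypeGuess {

    /** Returns "INTEGER" if every inspected value is a whole number, otherwise "DOUBLE". */
    static String guess(List<String> values, int sampleSize) {
        int limit = Math.min(sampleSize, values.size());
        for (int i = 0; i < limit; i++) {
            double v = Double.parseDouble(values.get(i));
            if (v != Math.rint(v)) {
                return "DOUBLE"; // a fractional value was seen inside the sample
            }
        }
        return "INTEGER"; // nothing in the sample ruled out an integer type
    }
}
```

With the values 0.0, 0.0, 13.5, a sample of two guesses an integer type and a later parse of 13.5 would fail, while inspecting all three values guesses a double at the cost of a full scan.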
opened by Ebalaitung 17
Joins are too slow
Hi guys! I'm trying to migrate from Python + pandas to Kotlin + Tablesaw. Some parts of my code are already fast (CSV parsing, for example, is 2x faster than in pandas). But I've also noticed that the inner join operation is pretty slow (about 2 times slower than in pandas).

Then I tried to optimize my code by using an isIn Selection instead of a simple join. Unfortunately, it uses strings.toArray(new String[0]) under the hood for the input collection. It would make more sense to use a HashSet for quicker lookup. So I wrote my own predicate:

val customersSet = customers.toHashSet() // for faster lookup
val idColumn = transactions.stringColumn("customer_id")
idColumn.eval { it in customersSet }

This is 15x faster than the original inner join, at least on my huge dataset. Of course, this is much simpler than a join, since I don't append columns, etc. But the difference is still huge. I haven't investigated the join code yet, but I hope there is room for improvement there. My key point is: Kotlin + JVM should be at least no slower than Python + pandas. What do you guys think?

ps: do you use hash indexing on table columns?
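
The set-based predicate above translates directly to Java. The sketch below is plain Java over a String array rather than a Tablesaw column; MembershipFilter and rowsIn are hypothetical names. The point is that the wanted values are copied into a HashSet once, so each row costs one hash lookup instead of a scan of the whole collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical membership filter backed by a HashSet. */
final class MembershipFilter {

    /** Returns the indexes of the rows whose id is contained in wanted. */
    static List<Integer> rowsIn(String[] ids, Collection<String> wanted) {
        Set<String> lookup = new HashSet<>(wanted); // built once, before the row loop
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (lookup.contains(ids[i])) {
                rows.add(i);
            }
        }
        return rows;
    }
}
```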
opened by mykola-dev 17
tablename.getstring(rowno,columnname) trim spaces
Below is my table:

RowNo | RCDTYP | SNDARLIN | RCVARLIN | FILTYP | DELSEQNUM | CRTDAT | FILR1 |

1 | 0 | AV | IB | HANDBACK | 000002 | 20220426 | |

tablename.getstring(rowno,columnname)

returns "IB" only; I want it as " IB".

Can somebody please help?
opened by Devika123456788999 3

Add possibility to skip column type from SQL ResultSet

It would be useful to be able to use the following: SqlResultSetReader.mapJdbcTypeToColumnType(java.sql.Types.BINARY, ColumnType.SKIP)

Unfortunately, currently, this results in

java.lang.UnsupportedOperationException: Column type SKIP doesn't support column creation
        at tech.tablesaw.columns.SkipColumnType.create(SkipColumnType.java:29)
        at tech.tablesaw.io.jdbc.SqlResultSetReader.read(SqlResultSetReader.java:105)
        at tech.tablesaw.io.DataFrameReader.db(DataFrameReader.java:160)

Alternatively, it would be useful to be able to skip a column by name for the SqlResultSetReader, but I found no such option. It seems the ReadOptions don't currently support reading from SQL.

opened by mbs-janbe 0

Wrong values after converting a StringColumn to a DoubleColumn
Hi. Tablesaw is great, but I ran into a problem while using it and hope you can help. Problem description: because a DoubleColumn can't be used with splitOn, I converted the DoubleColumn to a StringColumn (asStringColumn) and then grouped and sliced it. When I finally converted the StringColumn back to a DoubleColumn, the values were wrong. Code:

@Test
public void test000() {
    DoubleColumn doubleColumn = DoubleColumn.create("year", 2022.0, 2022.0);
    StringColumn stringColumn = doubleColumn.asStringColumn();
    System.out.println(stringColumn.print());
    DoubleColumn asDoubleColumn = stringColumn.asDoubleColumn();
    System.out.println(asDoubleColumn.print());
}

Output: Column: year strings 2022.0 2022.0

Column: year strings 0 0

Thanks again 🙏
bug
opened by PanYangyi 0
Automatic parse issue in jsaw table

final Table tbl2 = Table.read()
    .csv(
        Joiner.on(System.lineSeparator())
            .join("abc,cds,eee", "005,001,003"),
        "Table2");

content1 and content2 contain two comma-separated lists. I want to treat every column as string values; e.g., if the content is 005, I want it added as 005, not 5. Is there an additional method to treat the values as strings?

opened by Devika123456788999 0
Improvement suggestion: Table.print() function should also include the shape in the output

Hi

After executing any kind of table manipulation, e.g. adding columns/rows or where clauses, you almost always want to see what the table looks like via the print/toString methods. Unfortunately, it doesn't also display the shape, so most of these calls are followed by a call to shape.

The shape is important when you have a larger dataset and you can see the changes in the number of rows/columns

ps Pandas equivalent function prints the shape when displaying table contents

opened by minhster99 0

Table.summary() missing output is misleading

I read in a file with the following structure

 Index  |   Column Name    |  Column Type  |
--------------------------------------------
     0  |              id  |      INTEGER  |
     1  |            date  |   LOCAL_DATE  |
     2  |            time  |       STRING  |
     3  |    country_name  |       STRING  |
     4  |  state/province  |       STRING  |
     5  |      population  |      INTEGER  |
     6  |  landslide_type  |       STRING  |
     7  |         trigger  |       STRING  |
     8  |      fatalities  |      INTEGER  |

when I do a summary, I get the following

  Summary   |         id          |     date     |  time  |  country_name   |  state/province  |      population      |  landslide_type  |  trigger   |  fatalities  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Count  |               1693  |        1693  |  1693  |           1693  |            1693  |                1693  |            1693  |      1693  |        1693  |
       sum  |            7017532  |              |        |                 |                  |           158226757  |                  |            |              |
      Mean  |  4145.027761370351  |              |        |                 |                  |   93459.39574719437  |                  |            |              |
       Min  |                 34  |              |        |                 |                  |                   0  |                  |            |           0  |
       Max  |               7541  |              |        |                 |                  |            12294193  |                  |            |         280  |
     Range  |               7507  |              |        |                 |                  |            12294193  |                  |            |         280  |
  Variance  |  5003014.595564535  |              |        |                 |                  |  273112413878.66046  |                  |            |              |
  Std. Dev  |  2236.741959986564  |              |        |                 |                  |  522601.58235376637  |                  |            |              |
   Missing  |                     |           3  |        |                 |                  |                      |                  |            |              |
  Earliest  |                     |  2007-03-02  |        |                 |                  |                      |                  |            |              |
    Latest  |                     |  2016-03-02  |        |                 |                  |                      |                  |            |              |
    Unique  |                     |              |   159  |             28  |             227  |                      |              15  |        17  |              |
       Top  |                     |              |        |  United States  |        Kentucky  |                      |       Landslide  |  Downpour  |              |
 Top Freq.  |                     |              |  1065  |            986  |             124  |                      |             866  |       866  |              |

The missing value is only shown for date and it does work, I can verify there were 3 missing date values. However I also can see missing values for fatalities but it does not appear here. If I do the following

table.intColumn("fatalities").isMissing().size()  // returns > 0

I suspect that missing is not implemented on the summary() call for IntColumn types because if I simply select that single column and do a summary() on it, I get the following with no missing statistic

Column: fatalities  
 Measure   |  Value  |
----------------------
    Count  |   1690  |
      sum  |         |
     Mean  |         |
      Min  |      0  |
      Max  |    280  |
    Range  |    280  |
 Variance  |         |
 Std. Dev  |         |

Could we get that statistic filled in and for those statistics that aren't supported by the column type, could we add something like 'N/A' so that it is clear? Thanks.

ps this is a great lib. Really appreciate what you've done here!
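
The request above, a missing count for integer columns plus an explicit 'N/A' for statistics that do not apply, could look roughly like the following. This is a plain-Java sketch and not Tablesaw's summary() implementation: IntSummary, summarize, and the use of Integer.MIN_VALUE as the missing-value sentinel are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical summary for an int column that always reports a missing count. */
final class IntSummary {

    // Assumed sentinel meaning "missing"; chosen for this sketch only.
    static final int MISSING = Integer.MIN_VALUE;

    /** Returns statistics in display order; unsupported or undefined ones are "N/A". */
    static Map<String, String> summarize(int[] values) {
        int count = 0;
        int missing = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            if (v == MISSING) {
                missing++;
                continue; // missing cells do not affect count, min, or max
            }
            count++;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        Map<String, String> out = new LinkedHashMap<>();
        out.put("Count", String.valueOf(count));
        out.put("Missing", String.valueOf(missing));
        out.put("Min", count == 0 ? "N/A" : String.valueOf(min));
        out.put("Max", count == 0 ? "N/A" : String.valueOf(max));
        out.put("Earliest", "N/A"); // date-only statistic, not applicable to integers
        return out;
    }
}
```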

opened by minhster99 0

Releases(v0.43.1)

v0.43.1(Apr 3, 2022)
This is a very minor release as a prelude to branching for post-java-8 development.

What's Changed

Update pom.xml to upgrade apache poi-ooxml by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1085

fixed remaining javadoc warnings by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1086

Full Changelog: https://github.com/jtablesaw/tablesaw/compare/v0.43.0...v0.43.1
v0.43.0(Mar 30, 2022)
What's Changed

Security vulnerabilities addressed

Bump h2 from 1.4.200 to 2.1.210 in /core by @dependabot in https://github.com/jtablesaw/tablesaw/pull/1045

Bump Jackson version by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1070

Bug fixes

fix uncaught exception in DoubleParser.canParse. (#1043) by @is in https://github.com/jtablesaw/tablesaw/pull/1044

fix for issue #1047 : setPrintFormatter causes NPE, plus test by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1072

ensured that all copy() and emptyCopy() implementations copy print fo… by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1073

Performance-Related Enhancements

replaced the implementation of Table method dropDuplicateRows() with … by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1058

reintroduce parallel sorting of table indices where it is safe to do so by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1065

eliminated the auto boxing of table values and row numbers by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1066

Other Enhancements

improved error message; some automated code simplification by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1019

Cleanup on aggregate functions by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1020

Wrap io exception on reads by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1021

added set(Selection, byte) to BooleanColumn, plus Doc cleanup by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1027

made saw tests run faster, no loss of coverage by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1029

Removed IOException from write interfaces by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1030

Provides better error message on column type detection index out of b… by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1032

Simplify adding new column to table by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1034

upgrade roaring bitmaps to 0.9.25 by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1063

Add displayLogo option to jsplot Config by @gbouquet in https://github.com/jtablesaw/tablesaw/pull/1080

update the snapshot version as this was apparently done incorrectly e… by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1082

Documentation Enhancements

Update moneyball tutorial by @jbsooter in https://github.com/jtablesaw/tablesaw/pull/1052

Java doc2 by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1018

Update documentation readme to include all project javadoc links by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1022

Complete Javadoc for the Table package by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1024

Javadoc interpolation by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1025

Update README.md by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1077

Update README.md again by @lwhite1 in https://github.com/jtablesaw/tablesaw/pull/1079

New Contributors

@jbsooter made their first contribution in https://github.com/jtablesaw/tablesaw/pull/1052

@is made their first contribution in https://github.com/jtablesaw/tablesaw/pull/1044

@gbouquet made their first contribution in https://github.com/jtablesaw/tablesaw/pull/1080

Full Changelog: https://github.com/jtablesaw/tablesaw/compare/v0.41.0...v0.43.0
v0.42.0(Oct 23, 2021)
Documentation:

Many JavaDoc additions and extensions.

Update documentation readme to include all project javadoc links (#1022)

Enhancements

Wrap IOException for file reads (#1021). IOException is caught and re-thrown wrapped in a runtime exception, RuntimeIOException. For interactive work, this greatly reduces the number of exceptions that need to be caught. Writes will be handled in the next release.

Cleanup on aggregate function names (#1020). Abstract AggregateFunctions were given a consistent naming structure. All class names now have the form: [columnType][returnType]AggregateFunction (e.g. BooleanIntAggregateFunction). If the column and return type are the same, it is not repeated (e.g. StringAggregateFunction).

Some aggregate function classes were made public so library users can subclass

A few methods were added to Column subclasses. Notably, an asSet() method was added to columns where it was not already present.

improved error message for Column append methods (#1019)
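
The wrapping pattern described for file reads (#1021) in this release can be sketched with the JDK alone. The example uses java.io.UncheckedIOException as a stand-in for the library's RuntimeIOException, and UncheckedReads and readLines are hypothetical names; it shows the pattern, not Tablesaw's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical reader that converts checked IOExceptions into unchecked ones. */
final class UncheckedReads {

    /** Reads all lines of a file; callers are not forced to catch IOException. */
    static List<String> readLines(Path path) {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // the original exception is kept as the cause
        }
    }
}
```

Callers that still want to handle the failure can catch the unchecked exception and inspect getCause().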

v0.41.0(Oct 17, 2021)
This is a documentation only release focused on improving JavaDoc coverage.

Documentation

The following are now fully documented for public methods.

In package tech.tablesaw.tables

Relation

In package tech.tablesaw.api

Table

Row

ColumnType

In package tech.tablesaw.columns and sub-packages

Column

AbstractColumn

AbstractStringColumn

SkipColumnType

All classes and interfaces in the following packages:

tech.tablesaw.indexing

tech.tablesaw.selection

tech.tablesaw.joining

tech.tablesaw.aggregation

tech.tablesaw.sorting (and comparator subpackage)

v0.40.0(Oct 17, 2021)
This release focused on minor enhancements that eliminate gaps in functionality.

Note that the change to method Table:shape() modified the String that is returned, changing the functionality of the method slightly.

Enhancements

@lwhite1 Minor extensions (#999) Added methods:

DoubleColumn:asDoubleArray()

FloatColumn:asFloatArray()

IntColumn:asIntArray()

ShortColumn:asShortArray()

StringFilters:isIn()

StringFilters:isNotIn()

IntColumn:isNotIn()

Other enhancement:

Made Table:append() accept any Relation as its argument, not just another table.

Made Table:removeColumns() return Table rather than Relation (#1003) …

Added method Date:isNotEqualTo(LocalDate) (#1004) …

Made shape() return the name of the table, along with the shape (#1005)

Standardized names for methods, added missing methods (#1010)

Deprecated addRow(Row) and added appendRow(Row) to make the name more consistent with append(Table).

Added methods selectColumns() and rejectColumns() to provide variations to removeColumns() and retainColumns() that return new tables rather than modify the table in-place.

Added method Relation:containsColumn(String name);

other minor enhancements to code and documentation

Made Table:countBy() take varargs so the counts can group on more than one column
v0.38.5(Sep 7, 2021)

Small release with one important bug fix. There is also a documentation enhancement.

Bug fixes

@lwhite1 SliceGroup TextColumn handling revision (#990). Fixes issue where splitting a large file on a TextColumn (as when using groups in aggregations) could cause a major increase in memory.

Enhancements

@dependabot Bump jsoup from 1.12.1 to 1.14.2 in /html (#977)

@lwhite1 Allow TextColumn to append StringColumns, and vice-versa (#983)

@lwhite1 made all data fields protected (#991)

Documentation

@lwhite1 Update gettingstarted.md
v0.38.4(Aug 21, 2021)

This is a relatively small release with a few nice enhancements and several bug fixes. There is also a documentation enhancement publicizing @ccleva's Parquet integration project.

Bug fixes

Fix bug where missing values in numeric columns could not be formatted. This enables arbitrary missing value indicators (e.g. "N/A") to be used in printing and saving to files. @lwhite1

Replace parallelQuickSort with mergeSort (#968), to avoid incorrect sorting caused by race conditions when a custom sort object is used. @lwhite1

fix issue #963 (#967) Relation.structure() fails for TableSlice with ClassCastException @lwhite1

Enhancements

Aggregate by non-primitive column type that extends Number (#973), making it possible to add a column type for BigDecimal @daugasauron @kallur

plotly - added range slider to Axis (#953) … @smpawlowski

To support annotation in plot.ly javascript. (#944) … @xcjusuih

Documentation

Added link to the tablesaw-parquet project in README (#966) @ccleva
v0.38.3(Jul 22, 2021)
Features

Improved Printformatting (#914)

Improve innerJoin performance (#903) - Thanks, @DanielMao1

Allow reading of malformed CSV (#901) - Thanks, @ChangzhenZhang

Fixes #822 and #815 providing more extensive columntype options - Thanks, @lujop

Support multiple custom missing value options in Readers

Allow default parsing to be overridden per column. (#928) - Thanks, @jglogan

Add contour plot (#938) - Thanks, @ArslanaWu

Add support for violin plot (#936) - Thanks, @LUUUAN

Add option to keep join keys in both tables when applying an outer join - Thanks, @DanielMao1

Assign FixedWidthWriterSettings.fieldLengths (#943) - Thanks, @Kerwinooooo

Support percentage by parsing percentage to double (#906) - Thanks, @Kerwinooooo

Open default local browser on an arbitrary HTML page #860 (#949)
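
As an illustration of the percentage feature listed above (#906), here is a minimal plain-Java sketch of turning text such as "12.5%" into a double. It assumes the value is stored as a fraction (0.125); whether Tablesaw's reader does exactly this is not stated in these notes. `PercentParser` and `parsePercent` are hypothetical names.

```java
/** Illustrative only: converts percentage text into a fractional double. */
class PercentParser {

    /**
     * Parses text such as "12.5%" into 0.125.
     * Throws NumberFormatException if the text does not end with '%'
     * or the part before it is not a valid number.
     */
    static double parsePercent(String text) {
        String trimmed = text.trim();
        if (!trimmed.endsWith("%")) {
            throw new NumberFormatException("Not a percentage: " + text);
        }
        // Drop the trailing '%' and parse what remains as an ordinary number
        String number = trimmed.substring(0, trimmed.length() - 1).trim();
        return Double.parseDouble(number) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(parsePercent("12.5%"));
        System.out.println(parsePercent(" 50 % "));
    }
}
```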

Bug Fixes

Fix leftOuter join bug when joining multiple tables (#905) Thanks @Carl-Rabbit

Fix bug in appendCell that caused custom parser to be ignored (#912)

Corrected surefire plugin argLine (#915) Thanks @ccleva

Fix CI on Windows (#904) @lujop

Fix implementation of append(String) in TextColumn (#921)

Fix rightOuter join on multiple tables (#922) Thanks (again) @Carl-Rabbit

Fix XlsxReader not respecting calculated tableArea for header column names #887 (#926) - Thanks @lujop

Remove print statements in tests writing to System.out (#933)

Fix Column Type detection #751 and Integer handling in XlsxReader #882 (#931) - Thanks, @lujop

Fix(table): method 'where' applied the selection function twice (#962) - Thanks, @zhiwen95

Support for not closing the output stream opened by user (#941) Thanks, @Kerwinooooo, @ChangzhenZhang

Documentation

Update README.md (#917)

Misc

Bump guava from 28.2-jre to 29.0-jre in /core (#895)

Bump guava version again for security improvements (#932)

v0.38.1(May 9, 2020)
Features

More options for creating a bubble plot (https://github.com/jtablesaw/tablesaw/pull/781) - thanks @rayeaster

Bug Fixes

Fix support for java.sql.Time (https://github.com/jtablesaw/tablesaw/pull/791) - thanks @brainbytes42

Allow empty slices when aggregating (https://github.com/jtablesaw/tablesaw/pull/795) - thanks @emillynge

Fix NPE in ColumnType.compare (https://github.com/jtablesaw/tablesaw/pull/799)

Fix NPE in set (https://github.com/jtablesaw/tablesaw/pull/800)

v0.38.0(Apr 13, 2020)
Features

ignoreZeroDecimal option when reading data (https://github.com/jtablesaw/tablesaw/pull/748) - Thanks @larshelge

indexOf method (https://github.com/jtablesaw/tablesaw/pull/787) - Thanks @islaterm

Ability to add quotes to CSV even if not strictly required (https://github.com/jtablesaw/tablesaw/pull/767)

Ability to set layout and config for plots (https://github.com/jtablesaw/tablesaw/pull/690)

Pie chart subplots (https://github.com/jtablesaw/tablesaw/pull/777)

Plotting of Instant data (https://github.com/jtablesaw/tablesaw/pull/765)

Include sheet name when reading from Excel (https://github.com/jtablesaw/tablesaw/pull/758) - Thanks @R1j1t
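
The CSV quoting feature in the list above distinguishes between quoting only when required and quoting every field. The sketch below shows that difference in plain Java, following the usual CSV convention of doubling embedded quotes. It is not Tablesaw's writer; `CsvQuoting` and its methods are hypothetical names.

```java
/** Illustrative only: contrasts "quote if needed" with "quote every field" when writing CSV. */
class CsvQuoting {

    /** Wraps the field in double quotes and doubles any embedded quote characters. */
    static String quoteAlways(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Quotes the field only when it contains a separator, a quote, or a line break. */
    static String quoteIfNeeded(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? quoteAlways(field) : field;
    }

    /** Joins the fields into one CSV line using the chosen quoting policy. */
    static String toLine(String[] fields, boolean quoteAll) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(quoteAll ? quoteAlways(fields[i]) : quoteIfNeeded(fields[i]));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"a", "b,c"};
        System.out.println(toLine(fields, false));
        System.out.println(toLine(fields, true));
    }
}
```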

Bug Fixes

Joining an empty table (https://github.com/jtablesaw/tablesaw/pull/783) - Thanks @vanderzee-anl-gov

Use same options for reading and writing a CSV by default (https://github.com/jtablesaw/tablesaw/pull/772)

Reading of binary data from database

Make DoubleColumn.create work on wider range of input

Fix column sorting (https://github.com/jtablesaw/tablesaw/pull/778)

Fixed equals method on BooleanColumn (https://github.com/jtablesaw/tablesaw/pull/766)

Fixed 3D scatter plot (https://github.com/jtablesaw/tablesaw/pull/764)

Fixed BoxBuilder (https://github.com/jtablesaw/tablesaw/pull/763)

Make Component.engine non-static (https://github.com/jtablesaw/tablesaw/pull/762)

Fixed shaded jar

Improved handling of missing values when calling get on a column

Documentation

Fix broken link to data import docs (https://github.com/jtablesaw/tablesaw/pull/773) - Thanks @bantu

Add docs for reading from Excel (https://github.com/jtablesaw/tablesaw/pull/759) - Thanks @R1j1t

Fixed CSV reading docs (https://github.com/jtablesaw/tablesaw/commit/6fc6a4d013e4cb92b5a8dc13d2d5e2fc62ec1460) - Thanks @salticus

v0.37.2(Jan 24, 2020)
Add cumMin and cumMax
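
For readers unfamiliar with cumulative minimum and maximum, the plain-Java sketch below shows what such operations compute over an array of doubles: each output cell holds the smallest (or largest) value seen so far. It is not Tablesaw's code, and it does not treat NaN as a missing value; `CumulativeExtremes` is a hypothetical name.

```java
import java.util.Arrays;

/** Illustrative only: running minimum and maximum over an array. */
class CumulativeExtremes {

    /** out[i] is the smallest value among values[0..i]. */
    static double[] cumMin(double[] values) {
        double[] out = new double[values.length];
        double running = Double.POSITIVE_INFINITY;
        for (int i = 0; i < values.length; i++) {
            running = Math.min(running, values[i]);
            out[i] = running;
        }
        return out;
    }

    /** out[i] is the largest value among values[0..i]. */
    static double[] cumMax(double[] values) {
        double[] out = new double[values.length];
        double running = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < values.length; i++) {
            running = Math.max(running, values[i]);
            out[i] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {3, 1, 2, 0};
        System.out.println(Arrays.toString(cumMin(values)));
        System.out.println(Arrays.toString(cumMax(values)));
    }
}
```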

v0.37.1(Jan 24, 2020)
Breaking Changes

Table.summary now returns a Table instead of a String - Thanks @jackie-h

Features

Table transpose https://github.com/jtablesaw/tablesaw/commit/1b01eaf5c94c8a51d09be7fe2c080a78dc9a03e1 - Thanks @jackie-h

Added ability to sample rows while reading a CSV - Thanks @aecio

Additional Column and Table create methods

Cleanup

Fixed a bunch of SonarCloud warnings

Improved exception message for duplicate Table columns

Validation for Table joins

v0.37.0(Jan 8, 2020)
Features

Upgraded to Smile 2.0 (https://github.com/jtablesaw/tablesaw/pull/735)

Autocorrelation (https://github.com/jtablesaw/tablesaw/pull/726)

InstantColumn min and max (https://github.com/jtablesaw/tablesaw/pull/719)

Enhancements to histogram (https://github.com/jtablesaw/tablesaw/pull/700)

New Column.map method (https://github.com/jtablesaw/tablesaw/pull/705)

Expose two FileReader methods (https://github.com/jtablesaw/tablesaw/pull/701)

New Plotly config argument (https://github.com/jtablesaw/tablesaw/pull/691)

Read specific Excel sheet (https://github.com/jtablesaw/tablesaw/pull/683)

Read JSON subtree (https://github.com/jtablesaw/tablesaw/pull/684)

Read specific HTML table (https://github.com/jtablesaw/tablesaw/pull/682)

Bug Fixes

Only set LayoutBuilder.autosize if necessary (https://github.com/jtablesaw/tablesaw/pull/713)

v0.36.0(Sep 29, 2019)
Breaking changes

Table.numberColumn now returns NumericColumn instead of NumberColumn (https://github.com/jtablesaw/tablesaw/pull/669)

Features

Interpolation of missing cells (https://github.com/jtablesaw/tablesaw/pull/664)

File encoding detection (https://github.com/jtablesaw/tablesaw/pull/654)

stdDev for rolling columns (https://github.com/jtablesaw/tablesaw/pull/666)

Column UI widget in BeakerX (https://github.com/jtablesaw/tablesaw/pull/668)

Additional replaceColumn method (https://github.com/jtablesaw/tablesaw/pull/673)
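
The interpolation feature in the list above fills in missing cells from their neighbours. One common strategy is linear interpolation, sketched below in plain Java with NaN standing in for a missing value. Which strategies the release actually provides is not stated in these notes, so treat this as one possible approach; `LinearInterpolation` and `interpolate` are hypothetical names.

```java
import java.util.Arrays;

/** Illustrative only: fills gaps (NaN) by drawing a straight line between known neighbours. */
class LinearInterpolation {

    /**
     * Returns a copy of values in which each run of NaN cells that lies between
     * two known values is replaced by evenly spaced values. Leading and trailing
     * NaN cells have no neighbour on one side and are left unchanged.
     */
    static double[] interpolate(double[] values) {
        double[] out = values.clone();
        int prev = -1; // index of the most recent non-missing value
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                continue;
            }
            if (prev >= 0 && i - prev > 1) {
                double step = (out[i] - out[prev]) / (i - prev);
                for (int j = prev + 1; j < i; j++) {
                    out[j] = out[prev] + step * (j - prev);
                }
            }
            prev = i;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] values = {1, Double.NaN, Double.NaN, 4};
        System.out.println(Arrays.toString(interpolate(values)));
    }
}
```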

Bug Fixes

Fix reading CSV files with space at edge of column name (https://github.com/jtablesaw/tablesaw/pull/659)

Fix ignoreLeadingWhitespace (https://github.com/jtablesaw/tablesaw/commit/fb207104725eb20a5038b29e7c8828b754d4f36d)

Fix handling of boolean columns in SawWriter (https://github.com/jtablesaw/tablesaw/pull/661)

v0.35.0(Sep 3, 2019)
Deprecations and breaking changes

Deprecated data() methods (https://github.com/jtablesaw/tablesaw/pull/649)

Renamed isMissingValue to valueIsMissing (https://github.com/jtablesaw/tablesaw/pull/643)

Removed mapToType added in last release (https://github.com/jtablesaw/tablesaw/pull/583)

Features

Analytic Query functions (https://github.com/jtablesaw/tablesaw/pull/606 and https://github.com/jtablesaw/tablesaw/pull/621)

Deferred execution queries (https://github.com/jtablesaw/tablesaw/pull/574)

Saw file format persistence (https://github.com/jtablesaw/tablesaw/pull/642)

Column creation from streams (https://github.com/jtablesaw/tablesaw/pull/634)

Improved reading from URL (https://github.com/jtablesaw/tablesaw/pull/650)

remainder, capitalize, repeat, and concatenate functions (https://github.com/jtablesaw/tablesaw/pull/635)

Figure.builder (https://github.com/jtablesaw/tablesaw/pull/608)

Option to ignore whitespace in csv writer (https://github.com/jtablesaw/tablesaw/pull/605) - Thanks @sd1998

Performance

Speed up joins (https://github.com/jtablesaw/tablesaw/pull/562)

Speed up TextColumn's isIn method (https://github.com/jtablesaw/tablesaw/pull/613)

Bug fixes

Fix NPE when reading incomplete JSON rows (https://github.com/jtablesaw/tablesaw/pull/591)

Make empty columns be of type string (https://github.com/jtablesaw/tablesaw/pull/626)

Include missing values in unique (https://github.com/jtablesaw/tablesaw/pull/595)

Fix conversion of missing values in IntColumn.toDoubleColumn (https://github.com/jtablesaw/tablesaw/issues/577)

Fixed splitOn for TextColumn (https://github.com/jtablesaw/tablesaw/issues/554)

Handling of null values in SqlResultSetReader (https://github.com/jtablesaw/tablesaw/pull/563)

Documentation

Began compiling code samples in docs (https://github.com/jtablesaw/tablesaw/pull/637, https://github.com/jtablesaw/tablesaw/pull/639, and https://github.com/jtablesaw/tablesaw/pull/641)

Development

Automatically format code (https://github.com/jtablesaw/tablesaw/pull/570 and https://github.com/jtablesaw/tablesaw/pull/568)

v0.34.2(Aug 2, 2019)
Features

Add table.stream (https://github.com/jtablesaw/tablesaw/pull/540)

Add fillWith(double) (https://github.com/jtablesaw/tablesaw/pull/539)

Add mapToType (https://github.com/jtablesaw/tablesaw/pull/545) - Thanks @ryancerf

Add appendRow (https://github.com/jtablesaw/tablesaw/commit/6f98623d81d0e57d0cc5e9ab622b518165f7a74d)

Subplots (https://github.com/jtablesaw/tablesaw/pull/548) - Thanks @kiamesdavies

QQ plots and related improvements

plotly events (https://github.com/jtablesaw/tablesaw/pull/512) - Thanks @tmrn411

Bug Fixes

Data export to Smile (https://github.com/jtablesaw/tablesaw/pull/528) - Thanks @kiamesdavies

Unit tests on Windows (https://github.com/jtablesaw/tablesaw/pull/546) - Thanks @paulk-asert

Calculation of unique values in string columns (https://github.com/jtablesaw/tablesaw/pull/544) - Thanks @ccleva

Ensure tests are run (https://github.com/jtablesaw/tablesaw/pull/551) - Thanks @ccleva

asObjectArray in numeric columns (https://github.com/jtablesaw/tablesaw/commit/6f9086897b6e85c482f5f4de3bcb71c4ae53295a)

Possible exception in toString (https://github.com/jtablesaw/tablesaw/pull/497) - Thanks @hallvard

v0.34.1(Jun 17, 2019)
Features

Improved RollingColumn support

Option for CSV quote character (https://github.com/jtablesaw/tablesaw/pull/536)

New dropRange and inRange methods (#534)

Improved NumberPredicates (#532)

Bug Fixes

Fix DoubleColumn.map (https://github.com/jtablesaw/tablesaw/pull/533)

v0.34.0(Jun 5, 2019)
Breaking changes

Renamed join to joinOn so that it will work with Groovy (https://github.com/jtablesaw/tablesaw/pull/531)

Features

Added set with predicate method (https://github.com/jtablesaw/tablesaw/pull/530)

v0.33.5(Jun 4, 2019)
Make PackedInstant.toString parsable by Instant.parse
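
The property this change aims for is a text round trip: the string form of an instant can be fed back to `Instant.parse` and yields the same instant. The sketch below shows that contract using only the JDK's `java.time.Instant`; it does not touch Tablesaw's `PackedInstant`, and `InstantRoundTrip` is a hypothetical name.

```java
import java.time.Instant;

/** Illustrative only: an Instant survives conversion to ISO-8601 text and back. */
class InstantRoundTrip {

    /** Formats the instant as ISO-8601 text, then parses that text again. */
    static Instant roundTrip(Instant value) {
        String text = value.toString();
        return Instant.parse(text);
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2019-06-04T12:30:00Z");
        Instant restored = roundTrip(original);
        System.out.println(original.equals(restored));
    }
}
```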

v0.33.4(Jun 4, 2019)
Implement InstantParser

v0.33.3(Jun 4, 2019)
Fix InstantColumnType.create

Bump jackson-databind to pull in security fix

v0.33.2(Jun 4, 2019)
InstantColumn support in Row and DataFrameJoiner

Additional DataFrameReader parameter validation

v0.33.1(Jun 4, 2019)
Add InstantColumn support to Relation

v0.33.0(Jun 4, 2019)
Features

Add InstantColumn (https://github.com/jtablesaw/tablesaw/pull/518)

More configurable column type detection (https://github.com/jtablesaw/tablesaw/pull/521)

Added option for turning off html escaping in html table output

Additional date parsing capabilities (https://github.com/jtablesaw/tablesaw/issues/506)

Fixes

Fix for precision of 0 in JdbcResultSet (https://github.com/jtablesaw/tablesaw/pull/523)

Cleanup

Remove circular dependency between reader packages and core package

Remove unused epoch conversion methods (https://github.com/jtablesaw/tablesaw/pull/513)

v0.32.7(Mar 31, 2019)
Implemented maxCharsPerColumn CSV parser setting

Fix DateTimeParser issue

Switch from reflections to classgraph

Updated pebble version in jsplot

v0.32.6(Mar 24, 2019)
Use header option in HtmlReader

v0.32.5(Mar 24, 2019)
Fix HTMLReader when using InputStream

Fix reading from URL when charset is specified

v0.32.4(Mar 24, 2019)
Major improvements to HtmlWriter

Fixed print(1)

Added marker support for bar and histogram

v0.32.3(Mar 13, 2019)
Improved interface for DataWriter

v0.32.2(Mar 13, 2019)
Optional DataReader modules are now working
