High performance CSV reader and writer for Java.


FastCSV


🚀 FastCSV 2.0 has landed, with major improvements in performance and usability!

FastCSV is an ultra-fast and dependency-free RFC 4180 compliant CSV library for Java.

Actively developed and maintained since 2015, its primary intended use cases are:

  • big data applications to read and write data on a massive scale
  • small data applications with the need for a lightweight library

Benchmark

A selected benchmark from the Java CSV library benchmark suite project, comparing FastCSV with other popular, dependency-free, and small (< 100 KB) libraries.


Features

API

  • Ultra fast
  • Small footprint
  • Zero runtime dependencies
  • Null-free

CSV specific

  • RFC 4180 compliant – including:
    • Newline and field separator characters in fields
    • Quote escaping
  • Configurable field separator
  • Support for CRLF (Windows), CR (classic Mac OS), and LF (Unix) line endings
  • Unicode support

Reader specific

  • Support reading of some non-compliant (real world) data (see comparison with other libraries)
  • Preserving line break character(s) within fields
  • Preserving the original line number (even with skipped and multi line records) – helpful for error messages
  • Auto detection of line delimiters (can also be mixed)
  • Configurable data validation
  • Support for (optional) header lines (get field based on column name)
  • Support for skipping empty rows
  • Support for commented lines (skipping & reading) and configurable comment character

Writer specific

  • Support for multiple quote strategies to differentiate between empty and null
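A brief sketch of how this could look, assuming QuoteStrategy.EMPTY quotes empty strings while leaving null fields unquoted (verify against the Javadoc of your FastCSV version):

```java
// Hedged sketch: with QuoteStrategy.EMPTY, an empty string is written quoted ("")
// while a null field stays completely bare, so consumers can tell them apart.
CsvWriter.builder()
    .quoteStrategy(QuoteStrategy.EMPTY)
    .build(new PrintWriter(System.out, true))
    .writeRow("1", "", null);
```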

Requirements

  • Java 8

💡 Android is not Java and is not officially supported. However, some basic checks are included in the continuous integration pipeline to verify that the library works with Android 8.0 (API level 26).

CsvReader Examples

Iterative reading of some CSV data from a string

CsvReader.builder().build("foo1,bar1\r\nfoo2,bar2")
    .forEach(System.out::println);

Iterative reading of some CSV data with a header

NamedCsvReader.builder().build("header 1,header 2\nfield 1,field 2")
    .forEach(row -> row.getField("header 2"));

Iterative reading of a CSV file

try (CsvReader csv = CsvReader.builder().build(path, charset)) {
    csv.forEach(System.out::println);
}

Custom settings

CsvReader.builder()
    .fieldSeparator(';')
    .quoteCharacter('"')
    .commentStrategy(CommentStrategy.SKIP)
    .commentCharacter('#')
    .skipEmptyRows(true)
    .errorOnDifferentFieldCount(false);

For more examples, see CsvReaderExample.java

CsvWriter Examples

Iterative writing of some data to a writer

CsvWriter.builder().build(new PrintWriter(System.out, true))
    .writeRow("header1", "header2")
    .writeRow("value1", "value2");

Iterative writing of a CSV file

try (CsvWriter csv = CsvWriter.builder().build(path, charset)) {
    csv
        .writeRow("header1", "header2")
        .writeRow("value1", "value2");
}

Custom settings

CsvWriter.builder()
    .fieldSeparator(',')
    .quoteCharacter('"')
    .quoteStrategy(QuoteStrategy.REQUIRED)
    .lineDelimiter(LineDelimiter.LF);

For more examples, see CsvWriterExample.java

Upgrading from version 1.x

Please see UPGRADING.md for an overview of the main functionality of 1.x and how to upgrade to version 2.

Comments
  • Question about parsing random strings

    I want to use FastCSV to implement a large-file CSV editor in JavaFX. I plan to parse only individual rows, at render time, which means I need the parser to handle a single String line at a time. Is it possible to have something like this?

    CSVParser parser = new CSVParser();
    parser.parse(line98);
    parser.reset();
    parser.parse(line99);
    

    I mean I parse a line, then I may parse another, and so on. For optimal memory usage, I would instantiate the CSVParser only once.
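    With the FastCSV 2.x API, something close to this is possible without a reset() method, since the builder can parse a String directly; a sketch (assuming each line is handed to a fresh, lightweight reader):

    ```java
    // Sketch: parse one CSV line at a time; in 2.x, CsvReader is cheap to
    // construct and implements Iterable<CsvRow>.
    for (CsvRow row : CsvReader.builder().build(line98)) {
        System.out.println(row.getFields());
    }
    ```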

    question 
    opened by wise-coders 18
  • Performance regression with 2.1.0

    Thank you for your work on this great product! Its proven performance has significantly improved the performance of our application.

    We had initially been using version 1.0.4, which greatly improved the performance of our CSV parsing (over commons-csv which we had been previously using). We recently tried upgrading to version 2.1.0. I like the new API, however, we noticed that there was a significant performance degradation over 1.0.4. We have a little bit of a unique data format that we deal with, which involves an embedded CSV list within a CSV column. This is how our data looks:

    NAME,NUMBER,WIDGETS_LIST
    john doe,123456,"""thequickbrownfoxjumpedoverthelazydog"""
    john smith,7890123,"""thequickbrownfoxjumpedoverthelazydog1"",""thequickbrownfoxjumpedoverthelazydog2"""
    

    The WIDGETS_LIST column is a variable length list that is formatted as an embedded csv string. Each item in the list is usually around 200 characters long.

    With fastcsv 1.0.4 we would parse the data with code like this:

    class Parser {

      List<Client> parseCsv(Path file) throws IOException {
        List<Client> clients = new ArrayList<>();
        CsvReader csvReader = new CsvReader();
        try (var parser = csvReader.parse(file, StandardCharsets.UTF_8)) {
          CsvRow row;
          while ((row = parser.nextRow()) != null) {
            String name = row.getField(0);
            String number = row.getField(1);
            List<String> widgets = parseWidgets(row.getField(2));
            clients.add(new Client(name, number, widgets));
          }
        }
        return clients;
      }

      List<String> parseWidgets(String data) throws IOException {
        CsvReader csvReader = new CsvReader();
        CsvParser parser = csvReader.parse(new StringReader(data));
        CsvRow row = parser.nextRow();
        return row != null ? List.copyOf(row.getFields()) : List.of();
      }
    }
    

    With fastcsv 2.1.0 we parse with code like this:

    class Parser {

      List<Client> parseCsv(Path file) throws IOException {
        try (var parser = CsvReader.builder().build(file)) {
          return parser.stream()
             .map(row -> {
                 String name = row.getField(0);
                 String number = row.getField(1);
                 List<String> widgets = parseWidgets(row.getField(2));
                 return new Client(name, number, widgets);
             })
             .toList();
        }
      }

      List<String> parseWidgets(String data) {
        return CsvReader.builder().build(data)
            .stream().flatMap(r -> r.getFields().stream())
            .toList();
      }
    }
    

    Very surprisingly, the fastcsv 2.1.0 code takes around twice as long to parse the CSV data as version 1.0.4. It seems to be related to the embedded CSV string, since for other data without the embedded CSV, 2.1.0 is actually faster than 1.0.4. However, I cannot figure out why the embedded CSV causes such a significant slowdown. To get meaningful performance results we benchmarked with a CSV file containing about 1 million rows, and processed the same file 10 times per run.

    Additional context: Java distribution and version used (output of java -version):

    openjdk version "17.0.2" 2022-01-18
    OpenJDK Runtime Environment Temurin-17.0.2+8 (build 17.0.2+8)
    OpenJDK 64-bit Server VM Temurin-17.0.2+8 (build 17.0.2+8, mixed mode, sharing)
    
    opened by shollander 14
  • Per-field quoting

    I need a way to enforce per-field quoting in order to generate CSV for a PostgreSQL COPY statement, because a quoted empty string is treated as NULL, while a totally empty field is treated as an empty string:

    1,,3 ---> ""
    1,"",3 ---> NULL
    

    The current CsvAppender API doesn't support such behavior. Possible solutions:

    • add an additional flag to appendField
    • add a new method appendDelimitedField
    • make the alwaysDelimitText field mutable so that the consumer can turn it on/off before appending a specific field
    enhancement 
    opened by metametadata 7
  • Don't force the use of FastBufferedWriter

    Many Writer implementations are already buffered and fast enough. Also, many use cases start with an already existing BufferedWriter or similar; adding an extra layer only adds another temporary buffer copy. For example, you may have code like this working for different use cases:

    GZIPOutputStream zout = new GZIPOutputStream(
        new CipherOutputStream(new FileOutputStream(file), c));

    return new BufferedWriter(new OutputStreamWriter(zout, "utf-8"));
    

    Assuming you can't change the code above or you don't want to be forced to do this:

    if (usingFastCsv)
        return new OutputStreamWriter(zout, "utf-8");
    else
        return new BufferedWriter(new OutputStreamWriter(zout, "utf-8"));
    

    Could you add some way to construct an appender without wrapping the writer? Otherwise, I will try a pull request...

    opened by qtxo 7
  • Quoted fields at end of a row are silently dropped (fixes #19).

    If a field at the end of a row was quoted but empty, it was silently dropped. For example, take this CSV:

    "foo",""
    

    This would only end up having 1 field in the resulting row instead of the expected 2. This PR fixes this bug.

    opened by nathankleyn 5
  • CsvWriter "No such method" error

    I think I found a bug in the CsvWriter class:

    "Caused by: java.lang.NoSuchMethodError: No virtual method toPath()Ljava/nio/file/Path; in class Ljava/io/File; or its super classes (declaration of 'java.io.File' appears in /system/framework/core-oj.jar)
                                                                                 at de.siegmar.fastcsv.writer.CsvWriter.append(CsvWriter.java:148)"
    

    It doesn't work on Android 7.0 but works on Android 8.0, on the same phone. I already tried forcing Android Studio to use Java VERSION_1_8 and VERSION_1_7, but the result is the same.

    help wanted 
    opened by justmaaarco 5
  • How to append a row to an existing file?

    Maybe I'm missing something obvious... I want to simply add a row of data to a CSV file that already exists on disk.

    Using CsvWriter and writeRow does not append a row at the bottom of the file. I noticed things changed in the version overhaul, and the CsvAppender and appendLine API is gone.

    So, using CsvWriter, how can you open an existing CSV file and add a row of new data?
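    One possible approach, assuming the builder's build(Path, ...) overload accepts java.nio.file.OpenOption varargs (worth verifying against the Javadoc of your FastCSV version):

    ```java
    // Hedged sketch: open the existing file in append mode via OpenOption varargs.
    try (CsvWriter csv = CsvWriter.builder()
            .build(path, StandardCharsets.UTF_8,
                   StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
        csv.writeRow("new value 1", "new value 2");
    }
    ```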

    question 
    opened by villain-bryan 4
  • Caused by: java.io.IOException: Maximum buffer size 8388608 is not enough to read data

    Describe the bug: I am trying to read a file of approximately 340 MB. After I reach line number 809, I get this error: Caused by: java.io.IOException: Maximum buffer size 8388608 is not enough to read data

    To Reproduce: Try reading a CSV with 800k rows.

    Code:

    CsvReader reader = CsvReader.builder()
                .fieldSeparator('\t')
                .quoteCharacter('"')
                .commentStrategy(CommentStrategy.NONE)
                .skipEmptyRows(true)
                .errorOnDifferentFieldCount(true)
                .build(path, charset);
    
    reader.forEach(System.out::println);
    

    Additional context java version "1.8.0_201"

    opened by manishlogan 4
  • Use client passed Writer without wrapping it

    Is your feature request related to a problem? Please describe.

    Basically, CsvWriter violates the Single Responsibility Principle by tackling both high-level CSV formatting and low-level IO buffering, creating problems for non-trivial uses.
    Some use cases rely on a previously obtained Writer which is used to write more data than just a single CSV file. For example, the same stream can contain multiple CSVs or some other data at the end.
    The user cannot do a writer.flush() because the writer is internally wrapped with CachingWriter, so the last bytes may never get written until a csvWriter.close() is issued, which closes the writer as well.

    Once a Writer is passed, its state is unknown until a call to CsvWriter.close() is made!

    Describe the solution you'd like

    Allow constructing a CsvWriter without touching the passed writer in any way. Let CsvWriter deal with building the CSV data structure, and let the passed Writer deal with the low-level stuff.

    Describe alternatives you've considered

    No alternative possible except a source code modification.

    RFC 4180 compliance: Would this feature comply with RFC 4180?

    Yes, but the important part is that the code will be more correct, because it won't mess with external code passed to it.

    opened by qtxo 4
  • GC overhead limit exceeded because of temporary objects

    Hi,

    I am trying to read from a CSV file containing a bit more than 2 million rows, then do a simple mapping to something I can use to finally insert into a database. However, I am getting an error: "GC overhead limit exceeded", as it creates a lot of temporary objects.

    I read the other issue regarding temporary objects, but as far as I understand, it concerns writing to a CSV file, whereas I am getting this error while reading from a CSV file.

    invalid 
    opened by extstmtrifork 4
  • CSV appender for writing big files

    I want to create a big CSV file using the CsvAppender. I'm using this code:

    for (int i = 0; i < writeBuffer.size(); i++) {
        String[] array = writeBuffer.get(i).toArray(new String[0]);
        csvAppender.appendLine(array);
    }
    

    where writeBuffer is a List<List<String>>. This buffer can have more than 500 lines. When I finish the processing, the resulting file only has 148 lines and the last one is incomplete.

    I have also tried flushing every 100 lines, but then the next lines are not written.

    Maybe I am using the library in an incorrect way?

    Thanks in advance.
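    A likely cause is buffering: the appender only flushes remaining data when it is closed. A sketch using try-with-resources, assuming the 1.x CsvWriter#append(File, Charset) returns a Closeable CsvAppender:

    ```java
    // Hedged sketch (1.x-style API): closing the appender flushes the buffer.
    try (CsvAppender csvAppender = new CsvWriter().append(file, StandardCharsets.UTF_8)) {
        for (List<String> row : writeBuffer) {
            csvAppender.appendLine(row.toArray(new String[0]));
        }
    }
    ```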

    question 
    opened by josevi86 4
  • Support for efficiently reading files via random access

    The feature added with #57 had to be removed by #59 because it lacked Unicode support. Think of a better concept and implement it while keeping the performance characteristics of reading regular files.

    enhancement 
    opened by osiegmar 0
Releases (latest: v2.2.1)
  • v2.2.1(Nov 9, 2022)

  • v2.2.0(Jun 20, 2022)

  • v2.1.0(Oct 17, 2021)

    [2.1.0] - 2021-10-17

    Added

    • Builder methods for standard encoding (UTF-8)
    • Comment support for writer
    • toString() method to CsvWriter and CsvWriterBuilder
    • Support for random access file operations

    Changed

    • Improved error message when the buffer is exceeded (because of invalid CSV data) #52
    • Defined 'de.siegmar.fastcsv' as the Automatic-Module-Name (JPMS module name)
  • v2.0.0(Jan 1, 2021)

    [2.0.0] - 2021-01-01

    Added

    • Support for commented lines #31
    • Support for multiple quoting strategies #39

    Changed

    • Completely re-engineered the API for better usability
    • Improved performance
    • Make use of Java 8 features (like Streams and Optionals)
    • Replaced TestNG with JUnit 5
    • Changed license from Apache 2.0 to MIT

    Removed

    • CsvContainer concept – use Stream.collect() as a replacement
    • java.io.File API – use java.nio.file.Path instead
  • v1.0.4(Nov 29, 2020)

Owner
Oliver Siegmar
Expert on Microservices, Cloud Computing (esp. Amazon Web Services), Continuous Delivery, Docker, Ansible, Linux, Java, Spring and Scrum