uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

Overview


Welcome to univocity-parsers


We have finally updated the tutorial; please go to our website:

https://www.univocity.com/pages/parsers-tutorial

Bugs, contributions & support

If you find a bug, please report it on GitHub or send us an email at [email protected].

We try our best to eliminate all bugs as soon as possible, and you'll rarely see a bug stay open for more than 24 hours after it's reported. We do our best to answer all questions. Enhancements and suggestions are implemented on a best-effort basis.

Feel free to submit your contributions via pull requests. Every little bit is appreciated, from improvements to the documentation to a full-blown rewrite from scratch.

For commercial support, customizations or anything in between, please contact [email protected].

Thank you for using our parsers!

Please consider sponsoring our project, donating any amount via PayPal, or sending Bitcoin to the following address:

  • 3BcmUPTPfLDuYWWSBxGKkChkq5WMzC94J6

Thank you!

The univocity team.

Comments
  • Writing multi-schema / master-detail files

    Writing multi-schema / master-detail files

    I have seen the documentation about reading master-detail style files (https://github.com/uniVocity/univocity-parsers#reading-master-detail-style-files) and found https://github.com/uniVocity/univocity-parsers/issues/17 "creation of multiple types of Java beans". Is there something similar for writing files that I have missed so far?

    Here is the use case I'm evaluating:

    # Header; line; of; master; record
    # Header; line for; detailrecord1
    # Header; line; for; detail; record2
    MASTER; some; data; for; master1
    DETAIL1; first other; data
    DETAIL1; second other; data
    DETAIL2; data; for; detail; record2
    DETAIL2; data2; for; detail; record2
    DETAIL2; data3; for; detail; record2
    MASTER; some; data; for; master2
    DETAIL2; data; for; detail; record2
    ...
    

    This style might be too complex even for reading, I fear, as it has more than one kind of detail record (actually I even have to write more than two kinds of detail records :worried:).

    My thoughts on solving this so far: the headers can be declared as comments, so they would not be a problem. The master rows could be written as usual. For fixed-width output, the detail rows could be written using fixedWidthWriter.writeRow(string), where string is collected from a second, third, ... FixedWidthWriter using fww2.processRecordToString(). In the case of CSV (the use case above) this seems a bit more difficult to me...!?

    Did I miss something which makes this style of writing files easier?

    bug enhancement 
    opened by blackfoxoh 20
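A plain-Java sketch of the interleaving idea described in the question above (this does not use the univocity API; the writeRow/processRecordToString approach mentioned there would render each line through the library instead). Rows are tagged with their type in the first column and values are simply joined with the "; " separator from the sample:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of interleaving master and detail rows in one output.
// Each row is tagged with its type in the first column.
class MultiSchemaSketch {

    // renders one record: the row-type tag followed by its values
    static String renderRow(String type, String... values) {
        StringBuilder sb = new StringBuilder(type);
        for (String v : values) {
            sb.append("; ").append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        out.add(renderRow("MASTER", "some", "data", "for", "master1"));
        out.add(renderRow("DETAIL1", "first other", "data"));
        out.add(renderRow("DETAIL2", "data", "for", "detail", "record2"));
        for (String line : out) {
            System.out.println(line);
        }
    }
}
```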
  • null at the end of the record

    null at the end of the record

    For the fixed-width parser, if I set setSkipTrailingCharsUntilNewline=true I get a null value at the end of each line.

    Please see the example below for more details.

    Sample file:

    YearMake_Model___________________________________Description_____________________________Price___ 123456_78
    1997Ford_E350____________________________________ac, abs, moon___________________________3000.00_ 123456789
    1997Ford_E350____________________________________ac, abs, moon___________________________3000.00_ 23455889

    Output:

    Year|Make|Model|Description|Price|null
    1997|Ford|E350|ac, abs, moon|3000.00|null
    1997|Ford|E350|ac, abs, moon|3000.00|null

    bug 
    opened by suyogparlikar 15
  • CSVParser appends whitespace at the beginning of each column

    CSVParser appends whitespace at the beginning of each column

    I am new to this parser and I have a concern regarding CSV reading. When I read a CSV file, the parser returns each column value with whitespace at the beginning, which I don't want. Is this the default parsing behaviour that can't be changed, or is there a method to handle this case?

    Sample output:

    3, "Gunnar Nielsen Aaby", 24 34 5656, NA, NA, Denmark, DEN, 1920 Summer, 1920, Summer, Antwerpen, Football, Football Men's Football, NA

    invalid 
    opened by HMazharHameed 13
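For illustration, a plain-Java sketch of the behaviour discussed above: a naive split on the delimiter keeps the space that follows each comma, while trimming each value removes it. (In univocity-parsers, whitespace handling is governed by parser settings; this sketch does not use the library.)

```java
import java.util.Arrays;

// Plain-Java sketch: a naive split on ',' keeps the space following each
// comma, while trimming each value removes it.
class LeadingWhitespaceSketch {

    static String[] splitAndTrim(String line, boolean trim) {
        String[] values = line.split(",");
        if (trim) {
            for (int i = 0; i < values.length; i++) {
                values[i] = values[i].trim();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "3, Gunnar Nielsen Aaby, Denmark";
        System.out.println(Arrays.toString(splitAndTrim(line, false))); // values keep the leading space
        System.out.println(Arrays.toString(splitAndTrim(line, true)));  // [3, Gunnar Nielsen Aaby, Denmark]
    }
}
```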
  • Investigate crash building with JDK 9

    Investigate crash building with JDK 9

    Running a simple mvn clean install with JDK 9 results in the JVM crashing:

    [jbax@linux-pc univocity-parsers]$ mvn clean install
    [INFO] Scanning for projects...
    [INFO] Inspecting build with total of 1 modules...
    [INFO] Installing Nexus Staging features:
    [INFO]   ... total of 1 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
    [INFO]                                                                         
    [INFO] ------------------------------------------------------------------------
    [INFO] Building univocity-parsers 2.5.6
    [INFO] ------------------------------------------------------------------------
    [INFO] 
    [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ univocity-parsers ---
    [INFO] Deleting /home/jbax/dev/repository/univocity-parsers/target
    [INFO] 
    [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ univocity-parsers ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory /home/jbax/dev/repository/univocity-parsers/src/main/resources
    [INFO] 
    [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ univocity-parsers ---
    [INFO] Changes detected - recompiling the module!
    [INFO] Compiling 201 source files to /home/jbax/dev/repository/univocity-parsers/target/classes
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f46cf0802f0, pid=15241, tid=15282
    #
    # JRE version: Java(TM) SE Runtime Environment (9.0+181) (build 9+181)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (9+181, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V  [libjvm.so+0x9292f0]  JVMCIGlobals::check_jvmci_flags_are_consistent()+0x120
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" (or dumping to /home/jbax/dev/repository/univocity-parsers/core.15241)
    #
    # An error report file with more information is saved as:
    # /home/jbax/dev/repository/univocity-parsers/hs_err_pid15241.log
    [thread 15281 also had an error]
    #
    # Compiler replay data is saved as:
    # /home/jbax/dev/repository/univocity-parsers/replay_pid15241.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    #
    Aborted (core dumped)
    
    

    Attached the hs_err files: jdk_9_crash.zip

    It doesn't always fail with a crash; in some cases the enforcer plugin errors out instead. Removing the maven-enforcer-plugin from the pom.xml makes the crash happen 100% of the time.

    opened by jbax 13
  • Additional test cases for csv setCharToEscapeQuoteEscaping

    Additional test cases for csv setCharToEscapeQuoteEscaping

    Based on the documentation here, many users will set charToEscapeQuoteEscaping to \.

    However, this setting does not always work well. Here are new test cases: https://github.com/apache/spark/pull/17177#issuecomment-284607257

    IMHO,

    • The default setting should be charToEscapeQuoteEscaping = quoteChar
      • provided that quoteChar != quoteEscapeChar
    • The documentation should be updated

    What do you think about this?

    bug 
    opened by ep1804 13
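To make the discussion concrete, here is an illustrative plain-Java sketch (not univocity code) of how the three characters interact when writing a quoted value: the quote escape precedes a literal quote, and charToEscapeQuoteEscaping precedes a literal escape character so a reader can tell the two apart.

```java
// Illustrative sketch (not univocity code) of how quoteChar, quoteEscapeChar
// and charToEscapeQuoteEscaping interact when writing a quoted value.
class QuoteEscapeSketch {

    static String quoteField(String value, char quote, char quoteEscape, char escapeOfEscape) {
        StringBuilder sb = new StringBuilder();
        sb.append(quote);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == quoteEscape && quoteEscape != quote) {
                sb.append(escapeOfEscape); // escape a literal escape character
            } else if (c == quote) {
                sb.append(quoteEscape);    // escape a literal quote
            }
            sb.append(c);
        }
        sb.append(quote);
        return sb.toString();
    }

    public static void main(String[] args) {
        // quoteEscape = '\' and charToEscapeQuoteEscaping = '\':
        System.out.println(quoteField("a\"b\\c", '"', '\\', '\\')); // "a\"b\\c"
        // default CSV style, where the quote escapes itself (doubling):
        System.out.println(quoteField("a\"b", '"', '"', '"'));      // "a""b"
    }
}
```

With quoteChar == quoteEscapeChar (standard CSV doubling), no separate escape-of-escape is needed, which is why the suggestion above defaults charToEscapeQuoteEscaping to the quote character only when the two differ.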
  • Can we have a functionality so that all errors in a row can be collected and finally row is skipped?

    Can we have a functionality so that all errors in a row can be collected and finally row is skipped?

    Is there any way to collect all the errors in a row and then skip the row at the end? I am using a RetryableErrorHandler which, on getting a DataValidationException, reports the error and skips the row. I want to collect all the errors first and only then skip the row. The approach I am using to achieve this is to parse the CSV once to collect all the errors, using setDefaultValue() and keepRecord(), and then a second parse actually gives me the valid records. Is there a better way to achieve this?

    waiting for more details 
    opened by rahulbagad 12
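The idea of gathering every error before skipping a row can be sketched in plain Java (the names below are illustrative, not part of the univocity API): run every field validator, collect all failure messages, and let the caller skip the row only at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of collecting every error in a row before skipping it:
// all validators run, all failure messages are gathered, and the caller
// decides to skip the row only after seeing the full list.
class CollectAllErrorsSketch {

    interface FieldValidator {
        String validate(String value); // null means OK, otherwise an error message
    }

    static List<String> validateRow(String[] row, FieldValidator[] validators) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < row.length && i < validators.length; i++) {
            String error = validators[i].validate(row[i]);
            if (error != null) {
                errors.add("column " + i + ": " + error);
            }
        }
        return errors; // skip the row if this list is non-empty
    }

    public static void main(String[] args) {
        FieldValidator notBlank = v -> (v == null || v.isEmpty()) ? "blank" : null;
        FieldValidator numeric = v -> (v != null && v.matches("\\d+")) ? null : "not a number";
        List<String> errors = validateRow(new String[]{"", "abc"},
                new FieldValidator[]{notBlank, numeric});
        System.out.println(errors); // both problems are reported before the row is skipped
    }
}
```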
  • AutomaticConfiguration do not work with MultiBeanListProcessor

    AutomaticConfiguration do not work with MultiBeanListProcessor

    When I instantiate a FixedWidthParser with a MultiBeanListProcessor, CommonParserSettings#configureFromAnnotations(beanClass) is not called, because the processor is not an instance of AbstractBeanProcessor. Shouldn't the method be called for each AbstractBeanProcessor in the MultiBeanListProcessor?

    The example code:

    FixedWidthParserSettings settings = new FixedWidthParserSettings();
    settings.setAutoConfigurationEnabled(true);
    settings.setHeaderExtractionEnabled(false);
    settings.getFormat().setLineSeparator("\n");
    
    MultiBeanListProcessor processor = new MultiBeanListProcessor(FileHeader.class, ...);   // FileHeader has an @Headers and fields with @Parsed
    settings.setProcessor(processor);
    
    FixedWidthParser parser = new FixedWidthParser(settings);     // configureFromAnnotations should be called here
    
    try (Reader reader = getReader("/positional-file")) {
    	parser.parse(reader);   // the exception is thrown here
    } catch (IOException e) {
    	e.printStackTrace();
    }
    
    

    The exception:

    com.univocity.parsers.common.DataProcessingException: Could not find fields [bankCode, bankName, batchCode] in input. Please enable header extraction in the parser settings in order to match field names.
    Internal state when error was thrown: line=0, column=0, record=1, charIndex=240
    	at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapFieldIndexes(BeanConversionProcessor.java:360)
    	at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapValuesToFields(BeanConversionProcessor.java:289)
    	at com.univocity.parsers.common.processor.core.BeanConversionProcessor.createBean(BeanConversionProcessor.java:457)
    	at com.univocity.parsers.common.processor.core.AbstractBeanProcessor.rowProcessed(AbstractBeanProcessor.java:51)
    	at com.univocity.parsers.common.processor.core.AbstractMultiBeanProcessor.rowProcessed(AbstractMultiBeanProcessor.java:101)
    	at com.univocity.parsers.common.Internal.process(Internal.java:21)
    	at com.univocity.parsers.common.AbstractParser.rowProcessed(AbstractParser.java:596)
    	at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:132)
    

    The value of context.headers() at BeanConversionProcessor.mapFieldIndexes is null.

    Is there any other way to use MultiBeanListProcessor with AutoConfiguration from @Headers?

    bug 
    opened by rbatista 12
  • Parsing of quoted text with quotes inside

    Parsing of quoted text with quotes inside

    I tried different parser settings for unescaped quote handling (UnescapedQuoteHandling.STOP_AT_DELIMITER or UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE), but the outcome is still wrong:

    CsvParserSettings parserSettings = new CsvParserSettings();
    parserSettings.setLineSeparatorDetectionEnabled(true);
    parserSettings.setHeaderExtractionEnabled(true);
    parserSettings.setDelimiterDetectionEnabled(true);
    parserSettings.setQuoteDetectionEnabled(true);
    parserSettings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
    parserSettings.setReadInputOnSeparateThread(true);
    parserSettings.setNumberOfRowsToSkip(rowsToSkip);
    parserSettings.trimValues(true);
    ColumnProcessor rowProcessor = new ColumnProcessor();
    parserSettings.setProcessor(rowProcessor);
    CsvParser lineParser = new CsvParser(parserSettings);

    Here's the input CSV file for parsing:

    "name"|"description"|"digit"|"other"
    "test one"|"test description with ""|"1"|"other one"
    "test two"|"test description without a quote"|"2"|"other two"

    Here's the output after parsing:

    ["name", "description", "digit", "other"]
    ["test one", "test description with "|"1", "other one"]
    ["test two", "test description without a quote", "2", "other two"]

    As you can see, "test description with "|"1" is grouped as one element

    duplicate 
    opened by JoyW3000 11
  • Parsing of quoted text with quotes inside

    Parsing of quoted text with quotes inside

    Hi, is it possible to parse this correctly with some settings?

    example1;"example with " inside";example3

    Or do I have to quote it?

    example1;"example with ""inside";example3

    Thanks for your help!

    bug question 
    opened by larsniemann1983 11
  • Null value set for missing fields

    Null value set for missing fields

    I have fields in my code which are missing from the CSV I am trying to parse. In my parser settings I have set the default null value to an empty string. However, for the missing fields, the value being set is null.

    I would like to know if this is the desired behaviour, or am I doing something wrong?

    bug 
    opened by raphkr 10
  • performance of csvwriter.processRecord() with conversions

    performance of csvwriter.processRecord() with conversions

    Hi, I've finished an implementation where I use CsvWriter (the brand new final version 2.0 of your great lib) with an OutputValueSwitch, and then write out rows by providing them as a Map to the method processRecord(). One row type of the switch has 57 columns, and 3 of them apply a TrimConversion. The Map I'm passing in has 41 entries, so some columns remain empty. processRecord() takes approximately 1700ms to write one row, which makes it very long-running. A second row type of the switch has 16 columns without any conversions; the input Map is nearly the same as for the former row type (so there are many more values than needed for this row type). This second row type takes at most one or two milliseconds. If I comment out the conversions for the first row type, it is ultra-fast as well. Therefore I think the problem is not specific to processing a Map, nor to using an OutputValueSwitch, but I described the whole use case just in case it matters...

    If you need any further details just let me know

    enhancement invalid 
    opened by blackfoxoh 10
Releases(v2.9.1)
  • v2.9.1(Jan 18, 2021)

    BUGs fixed

    • Quote escape configured to double quote (quote value) character if escape not detected #414

    • Delimiter detection returns first candidate delimiter even if it does not exist in the file #415

    • context.getSelectedHeaders() in RowProcessor processStarted() can return invalid results #416

    • DefaultNullRead of @Parsed does not work with enums #427

    • Missing fields not initialized if nested beans present #432

    • Possible race condition #424

    • implicit limitation on max column name length? #438

    Enhancements

    • Delimiter detection returns first candidate delimiter even if it does not exist in the file #415

    • Custom CsvFormatDetector #434

    • Detects "whitespace" as delimiter instead of "comma" #420

    Source code(tar.gz)
    Source code(zip)
  • v2.9.0(Aug 15, 2020)

    BUGs fixed

    • CSV auto-detection assigning line ending as quote escape (#409)

    • FixedWidthFields.keepPadding not working (#405)

    • Multi-char delimiter incorrectly detected inside quoted string (#404)

    • Fixed the repeatable conversions initialization in the DefaultConversionProcessor (#399)

    • Fix NPE in EnumConversion (#391)

    • fixed quoted parser when using non-printable char as delimiter (#378)

    Enhancements

    • Make the maxRowSample parameter for CSV auto-detection publicly configurable (#408)

    • settings.excludeFields() no longer throws errors for non-existing fields. (#383)

    • Expose InputAnalysisProcess implementations publicly (#381)

    • add "com.googlecode.openbeans" as an optional OSGi dependency (#411)

    Source code(tar.gz)
    Source code(zip)
  • v2.8.4(Dec 9, 2019)

    BUGS FIXED:

    • Value which contains line separator is NOT enclosed in quotes if line separator is 2 characters (#357)

    • Headers in Record context are changed if parser instance is reused (#350)

    • Record.getString(column) fails if type of column is declared as int and content in file is null (#355)

    ENHANCEMENTS:

    • Rows starting with comment char should be automatically quoted (#362)
    Source code(tar.gz)
    Source code(zip)
  • v2.8.3(Aug 8, 2019)

    BUGS:

    • #345 parseLine() is throwing ClassCastException when using lookahead

    • #337 Inconsistent parsing behavior when max. characters per column is set to -1 (unlimited)

    • #343 FixedWidthParser drops first char of next record if last field of current record is empty

    • #336 Single column, empty row CSV files result in empty rows and truncation of the last row

    • #328 Auto detect format with upper and lower thom is not working

    • Fixed an issue writing rows with a selection of fields where the number of columns is larger than the number of selected/available fields.

    ENHANCEMENTS:

    • On CSV format auto-detection, use order of allowed delimiters as order of preference

    • Performance improvement parsing values when maxCharsPerColumn = -1 (i.e. unlimited)

    • added support for writing instances of Record directly.

    Source code(tar.gz)
    Source code(zip)
  • v2.8.2(May 14, 2019)

    BUGS:

    • Headers being extracted when not expected, and this is leading to memory leak #326

    • Unclear interactions between iterate() methods and context/header parsing #314

    • containsColumn throws ArrayIndexOutOfBoundsException, in combination with selectFields #316

    • The result of the method getRecordNumber in TextParsingException is wrong. #324

    ENHANCEMENTS:

    • On CSV format auto-detection, resolve space as column separator #310

    • Updated default excel serialization settings to ALWAYS escape either \r or \n as \r\n. #315

    Source code(tar.gz)
    Source code(zip)
  • v2.8.1(Feb 6, 2019)

    This release fixes the bug reported in issue #309: headers parsed from one input were used again when another input was parsed using the same parser instance.

    Source code(tar.gz)
    Source code(zip)
  • v2.8.0(Feb 1, 2019)

    Enhancements:

    • Map column name to attribute #287: This is a big change. Basically you can now skip annotations altogether and manually define mappings from parsed columns to an object, including nested attributes.

    Now you can call getColumnMapper() from:

    1. any *Routines class
    2. any *Bean*Processor class (including BeanWriterProcessor) for writing
        mapper.attributeToIndex("name", 0); // name goes to the first column
        mapper.attributeToColumnName("name", "client_name"); // same thing, but the first column has header "client_name"
    

    Nested classes are also supported; you can do things such as the following, where name is buried in an object structure (assume a class Purchase with a Buyer attribute, which in turn has a Contact attribute, where the name is located):

        mapper.methodToIndex("buyer.contact.name", 0); 
       
        // use methods too. This assumes Contact has a `fullName(java.lang.String)` setter method:
        mapper.methodToColumnName("buyer.contact.fullName", String.class, "client_name");
    
        // this maps getter and setter methods named `fullName` to column `client_name`. The getters or setters are used depending if you are writing to a file or reading from it.
        mapper.methodToColumnName("buyer.contact.fullName", "client_name");
    

    You can also just give it a map:

        Map<String, Integer> mappings = new HashMap<String, Integer>();
        ... fill your map
        mapper.attributesToIndexes(mappings);
    
    

    Contrived unit tests: https://github.com/uniVocity/univocity-parsers/blob/master/src/test/java/com/univocity/parsers/issues/github/Github_287.java

    Other enhancements:

    • Support CSV delimiters of more than one character #209: instead of a single character, delimiters can now be any sequence of characters (e.g. ##, :-p, etc.).

    • Support creation of immutable objects #280: create instances of java beans with private constructors and no setter methods.

    • Enable case-sensitive column header matching #283: You can now target files with columns such as 'A', 'a', ' A ' and ' a '. Existing code should not be affected. For example, if you use parserSettings.selectFields("a") and the parsed header is ' A ', then ' A ' will be selected as usual. This update allows you to use parserSettings.selectFields(" A ", "a") when headers such as ' A ' and 'a' are both present (you can go wild here and target many different columns named aaa with different numbers of surrounding spaces or different character case combinations).

    • CsvParser beginParsing closes the stream #303: Introduced flag to prevent the parser from automatically closing the input: parserSettings.setAutoClosingEnabled(false)

    • Add option on FixedWidthParserSettings to keep padding of parsed values #276: the @FixedWidth annotation now has a keepPadding property. This can also be set on FixedWidthFields, using the methods keepPaddingOn(columns) and stripPaddingFrom(columns).

    • Introduce UnescapedQuoteHandling.BACK_TO_DELIMITER #259: if an unescaped quote is detected, the parser will re-process the parsed value and split the unescaped value into individual values after each delimiter found.
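The re-splitting behaviour described in the BACK_TO_DELIMITER bullet above can be illustrated with a small plain-Java sketch (not the parser's actual implementation): the over-parsed value is simply split again at each delimiter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of the BACK_TO_DELIMITER idea: a value that was parsed
// past an unescaped quote is split again at each delimiter, turning the
// over-parsed content back into individual values.
class BackToDelimiterSketch {

    static List<String> reSplit(String overParsedValue, char delimiter) {
        // limit -1 keeps trailing empty values
        return Arrays.asList(overParsedValue.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    public static void main(String[] args) {
        // the unescaped quote made the parser read past two '|' delimiters:
        String overParsed = "test description with \"|1|other one";
        System.out.println(reSplit(overParsed, '|')); // [test description with ", 1, other one]
    }
}
```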

    Bugfixes:

    • An extra row with null returned #306

    • Cannot determine column separator #305

    • Wrong line ending detection when normalizeLineEndingsWithinQuotes = false #299

    • Column selection makes @Validate annotation misbehave #296

    • Fixed width parsing with look-ahead and non-contiguous field definitions #294

    • CsvParser.parse ignores headers parser setting in processStarted Processor's method #289

    Source code(tar.gz)
    Source code(zip)
  • v2.7.6(Sep 25, 2018)

    Enhancements:

    • allowing users to subclass ValidatedConversion
    • better handling of InvocationTargetExceptions when dealing with errors thrown from getters/setters

    Bugfixes:

    • custom validations weren't working when writing
    • CSV autodetection failed in some cases (issue #272)
    Source code(tar.gz)
    Source code(zip)
  • v2.7.3(Aug 2, 2018)

    Enhancements:

    • Performance improvements skipping values of de-selected fields with the CSV parser.

    • Adding support for regex validation and custom validations on class attributes and methods annotated with @Validate

    Bugfixes:

    • CsvRoutines.getInputDimension() returns one row less rowCount regardless of csvParserSettings.setHeaderExtractionEnabled() #262

    • record.toFieldMap() not working on FixedWidth #258

    Source code(tar.gz)
    Source code(zip)
  • v2.7.2(Jul 20, 2018)

    Bugfixes:

    • Problems when using two BeanProcessors with an InputValueSwitch #256

    • Column names in the header row get printed in wrong order despite index values being set correctly #255

    • Inconsistent result of parseLine on empty set of selected indexes #250

    Source code(tar.gz)
    Source code(zip)
  • v2.7.1(Jul 11, 2018)

    Fixed a couple of bugs:

    • Annotated fields won't be processed when reading rows with less elements (#253)

    • Incorrect handling of columnReorderingEnabled = false ( #254)

    Source code(tar.gz)
    Source code(zip)
  • v2.7.0(Jul 10, 2018)

    Enhancements:

    • Introduce @Validate annotation and conversion to perform basic validations #251

    Bugfixes:

    • Inconsistent result of parseLine on empty set of selected indexes #250

    • Fixed comments collecting on buffer update #252

    Source code(tar.gz)
    Source code(zip)
  • v2.6.4(Jul 10, 2018)

    Bugfixes:

    • Support Enum alternative values #240

    • better error message for more than max columns #247

    Enhancements:

    • FixedWidthWriter does not honour FixedWidthWriterSettings.setNullValue #238
    Source code(tar.gz)
    Source code(zip)
  • v2.6.3(Apr 15, 2018)

  • v2.6.2(Apr 4, 2018)

    Bug fixes

    • Concurrency issue when calling stopParsing() (#231)
    • Error deriving header names from annotated java beans when writing objects with meta-annotations.

    Enhancements

    • Implemented trim quoted values support (#230)
    Source code(tar.gz)
    Source code(zip)
  • v2.6.1(Mar 15, 2018)

    Issue #228 revealed a regression bug on the CSV parser which was introduced in version 2.6.0

    The last column of a row would be ignored if it's empty and preceded by a quoted value, e.g.

    A row such as "A", would be parsed as [A] instead of [A, null]

    This new release fixes the bug.

    Source code(tar.gz)
    Source code(zip)
  • v2.6.0(Feb 27, 2018)

    Enhancements:

    • CSV parser now parses quoted values ~30% faster

    • The CSV format detection process now has an option to provide a list of possible delimiters, in order of priority (i.e. settings.detectFormatAutomatically('-', '.');) (#214)

    • CSV writer allows selecting columns that should always have quotes (i.e. writerSettings.quoteFields("C", "D"); or writerSettings.quoteIndexes(0, 1, 4, 5);) (#191)

    • Introduced support for selecting and aggregating values in lists of java beans when inputs have repeated header names (#188)

    • Context class has two new methods for facilitating the usage of Record inside a RowProcessor (#211):

    public Record toRecord(String[] row);
    
    public RecordMetaData recordMetaData();
    

    • Adjusted the method signatures of AbstractWriter to properly handle writing rows based on collections of objects of any type (from a StackOverflow complaint).

    Bugfixes:

    • NullPointer when stopping parser when nothing is parsed (#219)

    • Reusing CsvRoutines object results in unexpected output (#224)

    Source code(tar.gz)
    Source code(zip)
  • v2.5.9(Nov 21, 2017)

    Bugfixes

    • #205 OOM on skip lines. AbstractCharInputReader.skipLines

    • #212 Bad results if charset is null in beginParsing

    • #203 CsvRoutines generating empty output with keepResourcesOpen

    • Fixed incorrect behavior processing comment lines at the end of the input, when no line ending is present after the commented out line.

    ENHANCEMENTS

    • Introduced the CompositeProcessor based on question raised in ticket #206

    • Adjustment on CsvFormatDetector to improve chances of detecting delimiters in small inputs.

    • Added expectedRowCount parameter to all methods that produce lists of rows, records or beans to prevent slow reallocation operations inside the resulting ArrayList. Useful when processing large inputs.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.8(Oct 16, 2017)

    • Trailing spaces in a CSV line lead to null instead of an empty String (#196)
    • On CSV, keepQuotes fails to print the closing quote if it's followed by a whitespace and a line ending (#197)
    • CsvParser does not properly detect the delimiter (with setDelimiterDetectionEnabled) (#198)
    • Routines' keepResourcesOpen not respected with writeAll(list of objects) (#201)
    Source code(tar.gz)
    Source code(zip)
  • v2.5.7(Oct 9, 2017)

    This version was released to bury the intermittent results produced by the parser when processing files and input streams without an explicit character encoding while readInputOnSeparateThread=true, as reported in issue #194.

    This problem appeared from version 2.5.0 onward and had been plaguing users until version 2.5.6; it is finally gone for good.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.6(Sep 22, 2017)

    This is the first Java 9 compatible version

    DO NOT use any previous release with Java 9 as it will blow up.

    • string.getChars used to throw ArrayIndexOutOfBoundsException (tested on JDK 6-8), but now it throws StringIndexOutOfBoundsException. This version was updated to catch IndexOutOfBoundsException instead, so it works with all JDKs (tested from 6 to 9).

    • SimpleDateFormat - Changes in the locales "broke" previously working date formats. If the default locale is "en_AU" then the following won't work with JDK 9: new SimpleDateFormat("yyyy-MMM-dd").parse("2015-DEC-25")

    Classes with annotations for dates now allow users to provide a locale. For example:

        @Parsed
        @Format(formats = "dd-MMM-yyyy", options = "locale=en_US_WIN")
        private Date myDate;
    
        //if you are in Australia, MMM will translate to the abbreviated format followed by period, 
        //i.e. "October"  becomes "Oct.". The formatter won't work to parse dates such as "2015-Oct-20"
        //Use "locale=en" to make it behave as it always did in Java 8 or earlier.
        @Parsed
        @Format(formats = "dd-MMM-yyyy", options = "locale=en")
        private Date australianDay;
    

    Also fixed a bug in the TrimConversion introduced in version 2.5.5 - it would break when processing blank or empty strings.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.5(Sep 8, 2017)

    Fixed a concurrency issue when processing a File or InputStream with no explicit character encoding provided, while setReadInputOnSeparateThread=true (the default). The issue could make the parser discard the first character of the input.

    More details in: https://github.com/uniVocity/univocity-parsers/issues/186 and https://github.com/uniVocity/univocity-parsers/issues/187

    **Other adjustments**

    • Improved CSV format detection

    • Fixed TrimConversion to remove trailing whitespaces when input is truncated to a maximum length.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.4(Sep 1, 2017)

    BUGFIXES

    • Invalid values being produced when a column selection is defined, reordering is disabled and the number of headers is less than the number of selected columns. (https://github.com/uniVocity/univocity-parsers/issues/183)

    • ParsingContext of *Routines returns information of the following row instead of the current one (https://github.com/uniVocity/univocity-parsers/issues/184)

    • currentParsedContent returns a first-char-repeated string from a ByteArrayInputStream with setReadInputOnSeparateThread disabled (https://github.com/uniVocity/univocity-parsers/issues/185)

    • Fixed issues handling values with unescaped quotes at the end of the input, and in combination with the keepQuotes flag - commit.

    ENHANCEMENTS

    • Allowing the same column to feed data into multiple fields of a java bean and its nested fields. Validation kicks in only when writing.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.3(Aug 17, 2017)

    Bug fixes

    • CsvParserSettings.setDelimiterDetectionEnabled() not working with input file/stream when the character encoding is not explicitly provided. (https://github.com/uniVocity/univocity-parsers/issues/178)

    • CSV format autodetection thrown off by fields with single or double quote in the middle of the value (also in https://github.com/uniVocity/univocity-parsers/issues/178)

    • Fixed-width header extraction fails (https://github.com/uniVocity/univocity-parsers/issues/182)

    • Failure to extract header row with FixedWidthParser + annotations (https://github.com/uniVocity/univocity-parsers/issues/180)

    Enhancements

    • Custom conversion classes can now be defined without a String[] args constructor (https://github.com/uniVocity/univocity-parsers/issues/181) if no parameters are used.
    Source code(tar.gz)
    Source code(zip)
  • v2.5.2(Aug 9, 2017)

    Fixed a bug in the CSV parser - a line ending following an escaped quote was handled as the end of the record (issue https://github.com/uniVocity/univocity-parsers/issues/177)

  • v2.5.1(Jul 29, 2017)

    The introduction of support for BOM marker handling created an unwanted side effect when the user does not provide the character encoding of the file being processed: performance was significantly reduced.

    This version fixes the problem and performance should be equivalent to previous versions, regardless of whether the user provides a character encoding explicitly, or the input has a BOM marker.

    Relevant issue: https://github.com/uniVocity/univocity-parsers/issues/176

  • v2.5.0(Jul 24, 2017)

    ENHANCEMENTS

    • Added methods to the parser that return Iterable<String[]> or Iterable<Record> (https://github.com/uniVocity/univocity-parsers/issues/151)

    Rows can now be iterated like this:

    	for (String[] row : parser.iterate(input, "UTF-8")) {
    		//do stuff
    	}
    

    And records:

    	for (Record row : parser.iterateRecords(new FileInputStream(input), "UTF-8")) {
    		//do stuff
    	}
    
    • Support multiple header names in field attribute of @Parsed annotation (https://github.com/uniVocity/univocity-parsers/issues/150)

    You can now use the same POJO to parse different files with different headers.
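    As a rough sketch of how this could be used (the class, field and header names below are made up for illustration, not taken from the release notes), the same annotated bean can consume two inputs whose header rows differ:

```java
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvRoutines;

import java.io.StringReader;

public class MultiHeaderExample {

	// One POJO matching two header variants: "name" in one file, "full_name" in another.
	public static class Person {
		@Parsed(field = {"name", "full_name"})
		public String name;

		@Parsed
		public int age;
	}

	public static void main(String[] args) {
		CsvParserSettings settings = new CsvParserSettings();
		settings.setHeaderExtractionEnabled(true);

		CsvRoutines routines = new CsvRoutines(settings);

		// Both inputs are parsed with the same Person class.
		for (Person p : routines.parseAll(Person.class, new StringReader("name,age\nJohn,30"))) {
			System.out.println(p.name + " " + p.age);
		}
		for (Person p : routines.parseAll(Person.class, new StringReader("full_name,age\nJane,25"))) {
			System.out.println(p.name + " " + p.age);
		}
	}
}
```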

    • Adding a HeaderTransformer to the @Nested annotation (https://github.com/uniVocity/univocity-parsers/issues/159)

    This allows one to nest the same class inside a parent multiple times, by changing the names assigned to the nested properties. Example for a car with 4 wheels:

    	public static class Wheel {
    		@Parsed
    		String brand;
    
    		@Parsed
    		int miles;
    	}
    
    	public static class Car {
    		@Nested
    		Wheel frontLeft;
    
    		@Nested
    		Wheel frontRight;
    
    		@Nested
    		Wheel rearLeft;
    
    		@Nested
    		Wheel rearRight;
    	}
    

    To parse an input organized like this: frontLeftWheelBrand,frontLeftWheelMiles,frontRightWheelBrand,frontRightWheelMiles,rearLeftWheelBrand,rearLeftWheelMiles,rearRightWheelBrand,rearRightWheelMiles, we need to apply a transformation to the property names of each Wheel instance. This can be done with something like this:

    	public static class NameTransformer extends HeaderTransformer {
    
    		private String prefix;
    
    		public NameTransformer(String... args) {
    			prefix = args[0];
    		}
    
    		@Override
    		public String transformName(Field field, String name) {
    			return prefix + Character.toUpperCase(name.charAt(0)) + name.substring(1);
    		}
    	}
    

    Now we can change the Nested annotations to be:

    	public static class Car {
    		@Nested(headerTransformer = NameTransformer.class, args = "frontLeftWheel")
    		Wheel frontLeft;
    
    		@Nested(headerTransformer = NameTransformer.class, args = "frontRightWheel")
    		Wheel frontRight;
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearLeftWheel")
    		Wheel rearLeft;
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearRightWheel")
    		Wheel rearRight;
    	}
    

    And parse the input with:

    
    	List<Car> cars = new CsvRoutines(settings).parseAll(Car.class, input);
    
    
    • added support for annotations in methods (https://github.com/uniVocity/univocity-parsers/issues/160)

    @Parsed or @Nested annotations now work on individual methods. Modifying the Car example above to use a List<Wheel>, one can do the following:

    	public static class Car {
    		List<Wheel> wheels = new ArrayList<Wheel>();
    		
    		@Nested(headerTransformer = NameTransformer.class, args = "frontLeftWheel")
    		private void setWheel1(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    
    		@Nested(headerTransformer = NameTransformer.class, args = "frontRightWheel")
    		private void setWheel2(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearLeftWheel")
    		private void setWheel3(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearRightWheel")
    		private void setWheel4(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    	}
    
    • added support and automatic handling of files with a BOM marker (https://github.com/uniVocity/univocity-parsers/issues/154)

    • Introduced properties from and to to the @FixedWidth annotation, along with FixedWidthFields.addField(int startPosition, int endPosition) to allow declaring field ranges instead of fixed lengths (https://github.com/uniVocity/univocity-parsers/issues/166)
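    A minimal sketch of declaring fields by position range (the field names, positions and input below are invented for illustration; exact index semantics should be checked against the javadoc):

```java
import com.univocity.parsers.fixed.FixedWidthFields;
import com.univocity.parsers.fixed.FixedWidthParser;
import com.univocity.parsers.fixed.FixedWidthParserSettings;

public class FieldRangeExample {
	public static void main(String[] args) {
		// Declare each field as a position range instead of a fixed length.
		FixedWidthFields fields = new FixedWidthFields();
		fields.addField("name", 0, 10);
		fields.addField("age", 10, 13);

		FixedWidthParserSettings settings = new FixedWidthParserSettings(fields);
		FixedWidthParser parser = new FixedWidthParser(settings);

		// Values are trimmed by default, so padding spaces are dropped.
		String[] row = parser.parseLine("John      42 ");
		System.out.println(row[0] + "|" + row[1]);
	}
}
```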

    • enabled column reordering in CSV, TSV and Fixed-Width writers. This allows users to select which columns of their input should be written and prevents generating empty columns. Just use selectFields with setColumnReorderingEnabled(true) (https://github.com/uniVocity/univocity-parsers/issues/167)
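    A sketch of what this enables when writing (the headers and values below are illustrative only): with reordering enabled, only the selected columns appear in the output, in selection order, instead of a full-width row with empty columns:

```java
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

import java.io.StringWriter;

public class SelectiveWriteExample {
	public static void main(String[] args) {
		CsvWriterSettings settings = new CsvWriterSettings();
		settings.setHeaders("id", "name", "email");

		// Write only two of the three declared columns, without leaving empty gaps.
		settings.selectFields("name", "email");
		settings.setColumnReorderingEnabled(true);

		StringWriter out = new StringWriter();
		CsvWriter writer = new CsvWriter(out, settings);
		writer.writeHeaders();
		writer.writeRow("John", "john@example.com");
		writer.close();

		System.out.print(out);
	}
}
```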

    • added a keepResourcesOpen property in CsvRoutines, TsvRoutines and FixedWidthRoutines to control whether or not to close the Writer after writing data to the output. Also controls whether the ResultSet will be closed when dumping data from it. (https://github.com/uniVocity/univocity-parsers/issues/172)

    BUGS FIXED

    • CSV format auto-detection thrown off by seemingly regular CSV (https://github.com/uniVocity/univocity-parsers/issues/161)

    • Can't provide custom headers when dumping a ResultSet (https://github.com/uniVocity/univocity-parsers/issues/157)

    • ParsingContext.currentParsedContent returns null when input is single length string (https://github.com/uniVocity/univocity-parsers/issues/165)

    • NullValue and EmptyValue are both applied (https://github.com/uniVocity/univocity-parsers/issues/158)

    • NumberFormatException when using record.getDate("field") in a concurrent environment (https://github.com/uniVocity/univocity-parsers/issues/156)

    • AutomaticConfiguration does not work with MultiBeanListProcessor (https://github.com/uniVocity/univocity-parsers/issues/149)

  • v2.4.1(Mar 20, 2017)

    This release includes:

    • a fix for concurrency issues processing annotations: https://github.com/uniVocity/univocity-parsers/issues/146

    • internal adjustments for exception handling and generation of error messages.

  • v2.4.0(Mar 14, 2017)

    Enhancements

    • Added support for nested objects with the newly introduced @Nested annotation: https://github.com/uniVocity/univocity-parsers/issues/139
    • CsvRoutines and other routine classes provide a ParsingContext object: https://github.com/uniVocity/univocity-parsers/issues/136

    Bugfixes:

    • Fixed incorrect escape of CSV when writing a quote escape character with setQuoteEscapingEnabled=true before writing any other character that would prompt the writer to enclose the field in quotes: https://github.com/uniVocity/univocity-parsers/issues/143
    • Fixed handling of unquoted values where two consecutive unescaped quotes appeared in the input: https://github.com/uniVocity/univocity-parsers/issues/143
    • IndexOutOfBoundsException when processing index-based annotations: https://github.com/uniVocity/univocity-parsers/commit/33a099c57a61d14d2fb2a58c22cd84806e9d27c8
    • Fixed width writing from ResultSet should respect user provided field length instead of using the resultset length - https://github.com/uniVocity/univocity-parsers/issues/135
    • Parser did not throw an error when the input failed to be read (e.g. due to socket timeouts) and would just silently stop. https://github.com/uniVocity/univocity-parsers/issues/140
  • v2.3.1(Feb 5, 2017)

    This release includes a couple of bug fixes:

    • setColumnReorderingEnabled(false) not working when parsing single line: https://github.com/uniVocity/univocity-parsers/issues/131
    • index property of @Parsed annotation not handled correctly: https://github.com/uniVocity/univocity-parsers/issues/132

    ...and some internal refactoring that should not affect anyone.
