uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

Overview


Welcome to univocity-parsers


We have finally updated the tutorial; please go to our website:

https://www.univocity.com/pages/parsers-tutorial

Bugs, contributions & support

If you find a bug, please report it on GitHub or send us an email at [email protected].

We try our best to eliminate all bugs as soon as possible, and you'll rarely see a bug stay open for more than 24 hours after it's reported. We do our best to answer all questions. Enhancements and suggestions are implemented on a best-effort basis.

Feel free to submit your contributions via pull requests. Every little bit is appreciated, from improvements to the documentation to a full-blown rewrite from scratch.

For commercial support, customizations or anything in between, please contact [email protected].

Thank you for using our parsers!

Please consider sponsoring our project, donating any amount via PayPal, or sending Bitcoin to the following address:

  • 3BcmUPTPfLDuYWWSBxGKkChkq5WMzC94J6

Thank you!

The univocity team.

Comments
  • Writing multi-schema / master-detail files

    Writing multi-schema / master-detail files

    I have seen the documentation about reading master-detail style files (https://github.com/uniVocity/univocity-parsers#reading-master-detail-style-files) and found https://github.com/uniVocity/univocity-parsers/issues/17 "creation of multiple types of Java beans". Is there something similar for writing files that I have missed so far?

    Here is the use case I'm evaluating:

    # Header; line; of; master; record
    # Header; line for; detailrecord1
    # Header; line; for; detail; record2
    MASTER; some; data; for; master1
    DETAIL1; first other; data
    DETAIL1; second other; data
    DETAIL2; data; for; detail; record2
    DETAIL2; data2; for; detail; record2
    DETAIL2; data3; for; detail; record2
    MASTER; some; data; for; master2
    DETAIL2; data; for; detail; record2
    ...
    

    This style might be too complex even for reading, I fear, as it has more than one kind of detail record (actually I even have to write more than two kinds of detail records :worried:).

    My thoughts on solving this so far: the headers can be declared as comments, so they would not be a problem. The master rows could be written as usual. For fixed-width output, the detail rows could be written using fixedWidthWriter.writeRow(string), where string is collected from a second, third, ... FixedWidthWriter using fww2.processRecordToString(). In the case of CSV (the use case above) this seems a bit more difficult to me...!?

    Did I miss something which makes this style of writing files easier?

    bug enhancement 
    opened by blackfoxoh 20
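A plain-Java sketch of the interleaving idea described in the question above (this does not use the univocity API; the writeRow/processRecordToString approach mentioned there would render each line through the library instead). Rows are tagged with their type in the first column and values are simply joined with the "; " separator from the sample:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of interleaving master and detail rows in one output.
// Each row is tagged with its type in the first column.
class MultiSchemaSketch {

    // renders one record: the row-type tag followed by its values
    static String renderRow(String type, String... values) {
        StringBuilder sb = new StringBuilder(type);
        for (String v : values) {
            sb.append("; ").append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        out.add(renderRow("MASTER", "some", "data", "for", "master1"));
        out.add(renderRow("DETAIL1", "first other", "data"));
        out.add(renderRow("DETAIL2", "data", "for", "detail", "record2"));
        for (String line : out) {
            System.out.println(line);
        }
    }
}
```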
  • null at the end of the record

    null at the end of the record

    For the fixed-width parser, if I set setSkipTrailingCharsUntilNewline=true I get a null value at the end of each line.

    Please see the example below for more details.

    Sample file:

    YearMake_Model___________________________________Description_____________________________Price___ 123456_78
    1997Ford_E350____________________________________ac, abs, moon___________________________3000.00_ 123456789
    1997Ford_E350____________________________________ac, abs, moon___________________________3000.00_ 23455889

    Output:

    Year|Make|Model|Description|Price|null
    1997|Ford|E350|ac, abs, moon|3000.00|null
    1997|Ford|E350|ac, abs, moon|3000.00|null

    bug 
    opened by suyogparlikar 15
  • CSVParser appends whitespace at the beginning of each column

    CSVParser appends whitespace at the beginning of each column

    I am new to this parser and I have a concern regarding CSV reading. When I read a CSV file, the parser returns each column value with whitespace at the beginning, which I don't want. Is this the default parsing behaviour that can't be changed, or is there a method to handle this case?

    Sample output:

    3, "Gunnar Nielsen Aaby", 24 34 5656, NA, NA, Denmark, DEN, 1920 Summer, 1920, Summer, Antwerpen, Football, Football Men's Football, NA

    invalid 
    opened by HMazharHameed 13
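For illustration, a plain-Java sketch of the behaviour discussed above: a naive split on the delimiter keeps the space that follows each comma, while trimming each value removes it. (In univocity-parsers, whitespace handling is governed by parser settings; this sketch does not use the library.)

```java
import java.util.Arrays;

// Plain-Java sketch: a naive split on ',' keeps the space following each
// comma, while trimming each value removes it.
class LeadingWhitespaceSketch {

    static String[] splitAndTrim(String line, boolean trim) {
        String[] values = line.split(",");
        if (trim) {
            for (int i = 0; i < values.length; i++) {
                values[i] = values[i].trim();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "3, Gunnar Nielsen Aaby, Denmark";
        System.out.println(Arrays.toString(splitAndTrim(line, false))); // values keep the leading space
        System.out.println(Arrays.toString(splitAndTrim(line, true)));  // [3, Gunnar Nielsen Aaby, Denmark]
    }
}
```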
  • Investigate crash building with JDK 9

    Investigate crash building with JDK 9

    Running a simple mvn clean install with JDK 9 results in the JVM crashing:

    [jbax@linux-pc univocity-parsers]$ mvn clean install
    [INFO] Scanning for projects...
    [INFO] Inspecting build with total of 1 modules...
    [INFO] Installing Nexus Staging features:
    [INFO]   ... total of 1 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
    [INFO]                                                                         
    [INFO] ------------------------------------------------------------------------
    [INFO] Building univocity-parsers 2.5.6
    [INFO] ------------------------------------------------------------------------
    [INFO] 
    [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ univocity-parsers ---
    [INFO] Deleting /home/jbax/dev/repository/univocity-parsers/target
    [INFO] 
    [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ univocity-parsers ---
    [INFO] Using 'UTF-8' encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory /home/jbax/dev/repository/univocity-parsers/src/main/resources
    [INFO] 
    [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ univocity-parsers ---
    [INFO] Changes detected - recompiling the module!
    [INFO] Compiling 201 source files to /home/jbax/dev/repository/univocity-parsers/target/classes
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f46cf0802f0, pid=15241, tid=15282
    #
    # JRE version: Java(TM) SE Runtime Environment (9.0+181) (build 9+181)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (9+181, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # V  [libjvm.so+0x9292f0]  JVMCIGlobals::check_jvmci_flags_are_consistent()+0x120
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" (or dumping to /home/jbax/dev/repository/univocity-parsers/core.15241)
    #
    # An error report file with more information is saved as:
    # /home/jbax/dev/repository/univocity-parsers/hs_err_pid15241.log
    [thread 15281 also had an error]
    #
    # Compiler replay data is saved as:
    # /home/jbax/dev/repository/univocity-parsers/replay_pid15241.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    #
    Aborted (core dumped)
    
    

    Attached the hs_err files: jdk_9_crash.zip

    It doesn't always fail with a crash; in some cases the enforcer plugin errors out instead. Removing the maven-enforcer-plugin from the pom.xml makes the crash happen 100% of the time.

    opened by jbax 13
  • Additional test cases for csv setCharToEscapeQuoteEscaping

    Additional test cases for csv setCharToEscapeQuoteEscaping

    Based on the documentation here, many users will set charToEscapeQuoteEscaping to \.

    However, this setting does not always work well. Here are new test cases: https://github.com/apache/spark/pull/17177#issuecomment-284607257

    IMHO,

    • The default setting should be charToEscapeQuoteEscaping = quoteChar
      • provided that quoteChar != quoteEscapeChar
    • The documentation should be updated

    What do you think about this?

    bug 
    opened by ep1804 13
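To make the discussion concrete, here is an illustrative plain-Java sketch (not univocity code) of how the three characters interact when writing a quoted value: the quote escape precedes a literal quote, and charToEscapeQuoteEscaping precedes a literal escape character so a reader can tell the two apart.

```java
// Illustrative sketch (not univocity code) of how quoteChar, quoteEscapeChar
// and charToEscapeQuoteEscaping interact when writing a quoted value.
class QuoteEscapeSketch {

    static String quoteField(String value, char quote, char quoteEscape, char escapeOfEscape) {
        StringBuilder sb = new StringBuilder();
        sb.append(quote);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == quoteEscape && quoteEscape != quote) {
                sb.append(escapeOfEscape); // escape a literal escape character
            } else if (c == quote) {
                sb.append(quoteEscape);    // escape a literal quote
            }
            sb.append(c);
        }
        sb.append(quote);
        return sb.toString();
    }

    public static void main(String[] args) {
        // quoteEscape = '\' and charToEscapeQuoteEscaping = '\':
        System.out.println(quoteField("a\"b\\c", '"', '\\', '\\')); // "a\"b\\c"
        // default CSV style, where the quote escapes itself (doubling):
        System.out.println(quoteField("a\"b", '"', '"', '"'));      // "a""b"
    }
}
```

With quoteChar == quoteEscapeChar (standard CSV doubling), no separate escape-of-escape is needed, which is why the suggestion above defaults charToEscapeQuoteEscaping to the quote character only when the two differ.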
  • Can we have a functionality so that all errors in a row can be collected and finally row is skipped?

    Can we have a functionality so that all errors in a row can be collected and finally row is skipped?

    Is there any way to collect all the errors in a row and then skip the row at the end? I am using a RetryableErrorHandler which, on getting a DataValidationException, reports the error and skips the row. I want to collect all the errors first and only then skip the row. The approach I am using to achieve this is to parse the CSV once to collect all the errors, using setDefaultValue() and keepRecord(), and then a second parse actually gives me the valid records. Is there a better way to achieve this?

    waiting for more details 
    opened by rahulbagad 12
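The idea of gathering every error before skipping a row can be sketched in plain Java (the names below are illustrative, not part of the univocity API): run every field validator, collect all failure messages, and let the caller skip the row only at the end.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of collecting every error in a row before skipping it:
// all validators run, all failure messages are gathered, and the caller
// decides to skip the row only after seeing the full list.
class CollectAllErrorsSketch {

    interface FieldValidator {
        String validate(String value); // null means OK, otherwise an error message
    }

    static List<String> validateRow(String[] row, FieldValidator[] validators) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < row.length && i < validators.length; i++) {
            String error = validators[i].validate(row[i]);
            if (error != null) {
                errors.add("column " + i + ": " + error);
            }
        }
        return errors; // skip the row if this list is non-empty
    }

    public static void main(String[] args) {
        FieldValidator notBlank = v -> (v == null || v.isEmpty()) ? "blank" : null;
        FieldValidator numeric = v -> (v != null && v.matches("\\d+")) ? null : "not a number";
        List<String> errors = validateRow(new String[]{"", "abc"},
                new FieldValidator[]{notBlank, numeric});
        System.out.println(errors); // both problems are reported before the row is skipped
    }
}
```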
  • AutomaticConfiguration do not work with MultiBeanListProcessor

    AutomaticConfiguration do not work with MultiBeanListProcessor

    When I instantiate a FixedWidthParser with a MultiBeanListProcessor, CommonParserSettings#configureFromAnnotations(beanClass) is not called, because the processor is not an instance of AbstractBeanProcessor. Shouldn't the method be called for each AbstractBeanProcessor in the MultiBeanListProcessor?

    The example code:

    FixedWidthParserSettings settings = new FixedWidthParserSettings();
    settings.setAutoConfigurationEnabled(true);
    settings.setHeaderExtractionEnabled(false);
    settings.getFormat().setLineSeparator("\n");
    
    MultiBeanListProcessor processor = new MultiBeanListProcessor(FileHeader.class, ...);   // FileHeader has an @Headers and fields with @Parsed
    settings.setProcessor(processor);
    
    FixedWidthParser parser = new FixedWidthParser(settings);     // configureFromAnnotations should be called here
    
    try (Reader reader = getReader("/positional-file")) {
    	parser.parse(reader);   // the exception is thrown here
    } catch (IOException e) {
    	e.printStackTrace();
    }
    
    

    The exception:

    com.univocity.parsers.common.DataProcessingException: Could not find fields [bankCode, bankName, batchCode] in input. Please enable header extraction in the parser settings in order to match field names.
    Internal state when error was thrown: line=0, column=0, record=1, charIndex=240
    	at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapFieldIndexes(BeanConversionProcessor.java:360)
    	at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapValuesToFields(BeanConversionProcessor.java:289)
    	at com.univocity.parsers.common.processor.core.BeanConversionProcessor.createBean(BeanConversionProcessor.java:457)
    	at com.univocity.parsers.common.processor.core.AbstractBeanProcessor.rowProcessed(AbstractBeanProcessor.java:51)
    	at com.univocity.parsers.common.processor.core.AbstractMultiBeanProcessor.rowProcessed(AbstractMultiBeanProcessor.java:101)
    	at com.univocity.parsers.common.Internal.process(Internal.java:21)
    	at com.univocity.parsers.common.AbstractParser.rowProcessed(AbstractParser.java:596)
    	at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:132)
    

    The value of context.headers() at BeanConversionProcessor.mapFieldIndexes is null.

    Is there any other way to use MultiBeanListProcessor with AutoConfiguration from @Headers?

    bug 
    opened by rbatista 12
  • Parsing of quoted text with quotes inside

    Parsing of quoted text with quotes inside

    I tried different parser settings for unescaped quote handling (UnescapedQuoteHandling.STOP_AT_DELIMITER or UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE), but the outcome is still wrong:

    CsvParserSettings parserSettings = new CsvParserSettings();
    parserSettings.setLineSeparatorDetectionEnabled(true);
    parserSettings.setHeaderExtractionEnabled(true);
    parserSettings.setDelimiterDetectionEnabled(true);
    parserSettings.setQuoteDetectionEnabled(true);
    parserSettings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
    parserSettings.setReadInputOnSeparateThread(true);
    parserSettings.setNumberOfRowsToSkip(rowsToSkip);
    parserSettings.trimValues(true);
    ColumnProcessor rowProcessor = new ColumnProcessor();
    parserSettings.setProcessor(rowProcessor);
    CsvParser lineParser = new CsvParser(parserSettings);

    Here's the input CSV file for parsing:

    "name"|"description"|"digit"|"other"
    "test one"|"test description with ""|"1"|"other one"
    "test two"|"test description without a quote"|"2"|"other two"

    Here's the output after parsing:

    ["name", "description", "digit", "other"]
    ["test one", "test description with "|"1", "other one"]
    ["test two", "test description without a quote", "2", "other two"]

    As you can see, "test description with "|"1" is grouped as one element

    duplicate 
    opened by JoyW3000 11
  • Parsing of quoted text with quotes inside

    Parsing of quoted text with quotes inside

    Hi, is it possible to parse this correctly with some settings?

    example1;"example with " inside";example3

    Or do I have to quote it?

    example1;"example with ""inside";example3

    Thanks for your help!

    bug question 
    opened by larsniemann1983 11
  • Null value set for missing fields

    Null value set for missing fields

    I have fields in my code which are missing from the CSV I am trying to parse. In my parser settings I have set the default null value to an empty string. However, for the missing fields, the value being set is null.

    I would like to know if this is the desired behaviour, or am I doing something wrong?

    bug 
    opened by raphkr 10
  • performance of csvwriter.processRecord() with conversions

    performance of csvwriter.processRecord() with conversions

    Hi, I've finished an implementation where I use CsvWriter (the brand new final version 2.0 of your great lib) with an OutputValueSwitch, and then write out rows by providing them as a Map to the method processRecord(). One row type of the switch has 57 columns, and 3 of them apply a TrimConversion. The Map I'm passing in has 41 entries, so some columns remain empty. processRecord() takes approximately 1700ms to write one row, which makes it very long-running. A second row type of the switch has 16 columns without any conversions; the input Map is nearly the same as for the former row type (so there are many more values than needed for this row type). This second row type takes at most one or two milliseconds. If I comment out the conversions for the first row type, it is ultra-fast as well. Therefore I think the problem is not specific to processing a Map, nor to using an OutputValueSwitch, but I described the whole use case just in case it matters...

    If you need any further details just let me know

    enhancement invalid 
    opened by blackfoxoh 10
Releases(v2.9.1)
  • v2.9.1(Jan 18, 2021)

    BUGs fixed

    • Quote escape configured to double quote (quote value) character if escape not detected #414

    • Delimiter detection returns first candidate delimiter even if it does not exist in the file #415

    • context.getSelectedHeaders() in RowProcessor processStarted() can return invalid results #416

    • DefaultNullRead of @Parsed does not work with enums #427

    • Missing fields not initialized if nested beans present #432

    • Possible race condition #424

    • implicit limitation on max column name length? #438

    Enhancements

    • Delimiter detection returns first candidate delimiter even if it does not exist in the file #415

    • Custom CsvFormatDetector #434

    • Detects "whitespace" as delimiter instead of "comma" #420

    Source code(tar.gz)
    Source code(zip)
  • v2.9.0(Aug 15, 2020)

    BUGs fixed

    • CSV auto-detection assigning line ending as quote escape (#409)

    • FixedWidthFields.keepPadding not working (#405)

    • Multi-char delimiter incorrectly detected inside quoted string (#404)

    • Fixed the repeatable conversions initialization in the DefaultConversionProcessor (#399)

    • Fix NPE in EnumConversion (#391)

    • fixed quoted parser when using non-printable char as delimiter (#378)

    Enhancements

    • Make the maxRowSample parameter for CSV auto-detection publicly configurable (#408)

    • settings.excludeFields() no longer throws errors for non-existing fields. (#383)

    • Expose InputAnalysisProcess implementations publicly (#381)

    • add "com.googlecode.openbeans" as an optional OSGi dependency (#411)

    Source code(tar.gz)
    Source code(zip)
  • v2.8.4(Dec 9, 2019)

    BUGS FIXED:

    • Value which contains line separator is NOT enclosed in quotes if line separator is 2 characters (#357)

    • Headers in Record context are changed if parser instance is reused (#350)

    • Record.getString(column) fails if type of column is declared as int and content in file is null (#355)

    ENHANCEMENTS:

    • Rows starting with comment char should be automatically quoted (#362)
    Source code(tar.gz)
    Source code(zip)
  • v2.8.3(Aug 8, 2019)

    BUGS:

    • #345 parseLine() is throwing ClassCastException when using lookahead

    • #337 Inconsistent parsing behavior when max. characters per column is set to -1 (unlimited)

    • #343 FixedWidthParser drops first char of next record if last field of current record is empty

    • #336 Single column, empty row CSV files result in empty rows and truncation of the last row

    • #328 Auto detect format with upper and lower thom is not working

    • Fixed an issue writing rows with a selection of fields where the number of columns is larger than the number of selected/available fields.

    ENHANCEMENTS:

    • On CSV format auto-detection, use order of allowed delimiters as order of preference

    • Performance improvement parsing values when maxCharsPerColumn = -1 (i.e. unlimited)

    • added support for writing instances of Record directly.

    Source code(tar.gz)
    Source code(zip)
  • v2.8.2(May 14, 2019)

    BUGS:

    • Headers being extracted when not expected, and this is leading to memory leak #326

    • Unclear interactions between iterate() methods and context/header parsing #314

    • containsColumn throws ArrayIndexOutOfBoundsException, in combination with selectFields #316

    • The result of the method getRecordNumber in TextParsingException is wrong. #324

    ENHANCEMENTS:

    • On CSV format auto-detection, resolve space as column separator #310

    • Updated default excel serialization settings to ALWAYS escape either \r or \n as \r\n. #315

    Source code(tar.gz)
    Source code(zip)
  • v2.8.1(Feb 6, 2019)

    This release fixes the bug reported in issue #309: headers parsed from one input were used again when another input was parsed using the same parser instance.

    Source code(tar.gz)
    Source code(zip)
  • v2.8.0(Feb 1, 2019)

    Enhancements:

    • Map column name to attribute #287: This is a big change. Basically you can now skip annotations altogether and manually define mappings from parsed columns to an object, including nested attributes.

    Now you can call getColumnMapper() from:

    1. any *Routines class
    2. any *Bean*Processor class (including BeanWriterProcessor) for writing
        mapper.attributeToIndex("name", 0); // name goes to the first column
        mapper.attributeToColumnName("name", "client_name"); // same thing, but the first column has header "client_name"
    

    Nested classes are also supported; you can do things such as the following, where name is buried in an object structure (assume a class Purchase with a Buyer attribute, which in turn has a Contact attribute, where the name is located):

        mapper.methodToIndex("buyer.contact.name", 0); 
       
        // use methods too. This assumes Contact has a `fullName(java.lang.String)` setter method:
        mapper.methodToColumnName("buyer.contact.fullName", String.class, "client_name");
    
        // this maps getter and setter methods named `fullName` to column `client_name`. The getters or setters are used depending if you are writing to a file or reading from it.
        mapper.methodToColumnName("buyer.contact.fullName", "client_name");
    

    You can also just give it a map:

        Map<String, Integer> mappings = new HashMap<String, Integer>();
        ... fill your map
        mapper.attributesToIndexes(mappings);
    
    

    Contrived unit tests: https://github.com/uniVocity/univocity-parsers/blob/master/src/test/java/com/univocity/parsers/issues/github/Github_287.java

    Other enhancements:

    • Support CSV delimiters of more than one character #209: instead of a single character, delimiters can now be any sequence of characters (e.g. ##, :-p, etc.).

    • Support creation of immutable objects #280: create instances of java beans with private constructors and no setter methods.

    • Enable case-sensitive column header matching #283: You can now target files with columns such as 'A', 'a', ' A ' and ' a '. Existing code should not be affected. For example, if you use parserSettings.selectFields("a") and the parsed header is ' A ', then ' A ' will be selected as usual. This update allows you to use parserSettings.selectFields(" A ", "a") when headers such as ' A ' and 'a' are both present (you can go wild here and target many different columns named aaa with different numbers of surrounding spaces or different character case combinations).

    • CsvParser beginParsing closes the stream #303: Introduced flag to prevent the parser from automatically closing the input: parserSettings.setAutoClosingEnabled(false)

    • Add option on FixedWidthParserSettings to keep padding of parsed values #276: the @FixedWidth annotation now has a keepPadding property. This can also be set on FixedWidthFields, using the methods keepPaddingOn(columns) and stripPaddingFrom(columns).

    • Introduce UnescapedQuoteHandling.BACK_TO_DELIMITER #259: if an unescaped quote is detected, the parser will re-process the parsed value and split the unescaped value into individual values after each delimiter found.
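The re-splitting behaviour described in the BACK_TO_DELIMITER bullet above can be illustrated with a small plain-Java sketch (not the parser's actual implementation): the over-parsed value is simply split again at each delimiter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of the BACK_TO_DELIMITER idea: a value that was parsed
// past an unescaped quote is split again at each delimiter, turning the
// over-parsed content back into individual values.
class BackToDelimiterSketch {

    static List<String> reSplit(String overParsedValue, char delimiter) {
        // limit -1 keeps trailing empty values
        return Arrays.asList(overParsedValue.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    public static void main(String[] args) {
        // the unescaped quote made the parser read past two '|' delimiters:
        String overParsed = "test description with \"|1|other one";
        System.out.println(reSplit(overParsed, '|')); // [test description with ", 1, other one]
    }
}
```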

    Bugfixes:

    • An extra row with null returned #306

    • Cannot determine column separator #305

    • Wrong line ending detection when normalizeLineEndingsWithinQuotes = false #299

    • Column selection makes @Validate annotation misbehave #296

    • Fixed width parsing with look-ahead and non-contiguous field definitions #294

    • CsvParser.parse ignores headers parser setting in processStarted Processor's method #289

    Source code(tar.gz)
    Source code(zip)
  • v2.7.6(Sep 25, 2018)

    Enhancements:

    • allowing users to subclass ValidatedConversion
    • better handling of InvocationTargetExceptions when dealing with errors thrown from getters/setters

    Bugfixes:

    • custom validations weren't working when writing
    • CSV autodetection failed in some cases (issue #272)
    Source code(tar.gz)
    Source code(zip)
  • v2.7.3(Aug 2, 2018)

    Enhancements:

    • Performance improvements skipping values of de-selected fields with the CSV parser.

    • Adding support for regex validation and custom validations on class attributes and methods annotated with @Validate

    Bugfixes:

    • CsvRoutines.getInputDimension() returns one row less rowCount regardless of csvParserSettings.setHeaderExtractionEnabled() #262

    • record.toFieldMap() not working on FixedWidth #258

    Source code(tar.gz)
    Source code(zip)
  • v2.7.2(Jul 20, 2018)

    Bugfixes:

    • Problems when using two BeanProcessors with an InputValueSwitch #256

    • Column names in the header row get printed in wrong order despite index values being set correctly #255

    • Inconsistent result of parseLine on empty set of selected indexes #250

    Source code(tar.gz)
    Source code(zip)
  • v2.7.1(Jul 11, 2018)

    Fixed a couple of bugs:

    • Annotated fields won't be processed when reading rows with less elements (#253)

    • Incorrect handling of columnReorderingEnabled = false ( #254)

    Source code(tar.gz)
    Source code(zip)
  • v2.7.0(Jul 10, 2018)

    Enhancements:

    • Introduce @Validate annotation and conversion to perform basic validations #251

    Bugfixes:

    • Inconsistent result of parseLine on empty set of selected indexes #250

    • Fixed comments collecting on buffer update #252

    Source code(tar.gz)
    Source code(zip)
  • v2.6.4(Jul 10, 2018)

    Bugfixes:

    • Support Enum alternative values #240

    • better error message for more than max columns #247

    Enhancements:

    • FixedWidthWriter does not honour FixedWidthWriterSettings.setNullValue #238
    Source code(tar.gz)
    Source code(zip)
  • v2.6.3(Apr 15, 2018)

  • v2.6.2(Apr 4, 2018)

    Bug fixes

    • Concurrency issue when calling stopParsing() (#231)
    • Error deriving header names from annotated java beans when writing objects with meta-annotations.

    Enhancements

    • Implemented trim quoted values support (#230)
    Source code(tar.gz)
    Source code(zip)
  • v2.6.1(Mar 15, 2018)

    Issue #228 revealed a regression bug on the CSV parser which was introduced in version 2.6.0

    The last column of a row would be ignored if it's empty and preceded by a quoted value, e.g.

    A row such as "A", would be parsed as [A] instead of [A, null]

    This new release fixes the bug.

    Source code(tar.gz)
    Source code(zip)
  • v2.6.0(Feb 27, 2018)

    Enhancements:

    • CSV parser now parses quoted values ~30% faster

    • The CSV format detection process now has an option to provide a list of possible delimiters, in order of priority (i.e. settings.detectFormatAutomatically('-', '.');) (#214)

    • CSV writer allows selecting columns that should always have quotes (i.e. writerSettings.quoteFields("C", "D"); or writerSettings.quoteIndexes(0, 1, 4, 5);) (#191)

    • Introduced support for selecting and aggregating values in lists of java beans when inputs have repeated header names (#188)

    • Context class has two new methods for facilitating the usage of Record inside a RowProcessor (#211):

    public Record toRecord(String[] row);
    
    public RecordMetaData recordMetaData();
    

    • Adjusted the method signatures of AbstractWriter to properly handle writing rows based on collections of objects of any type (from a StackOverflow complaint).

    Bugfixes:

    • NullPointer when stopping parser when nothing is parsed (#219)

    • Reusing CsvRoutines object results in unexpected output (#224)

    Source code(tar.gz)
    Source code(zip)
  • v2.5.9(Nov 21, 2017)

    Bugfixes

    • #205 OOM on skip lines. AbstractCharInputReader.skipLines

    • #212 Bad results if charset is null in beginParsing

    • #203 CsvRoutines generating empty output with keepResourcesOpen

    • Fixed incorrect behavior processing comment lines at the end of the input, when no line ending is present after the commented out line.

    ENHANCEMENTS

    • Introduced the CompositeProcessor based on question raised in ticket #206

    • Adjustment on CsvFormatDetector to improve chances of detecting delimiters in small inputs.

    • Added expectedRowCount parameter to all methods that produce lists of rows, records or beans to prevent slow reallocation operations inside the resulting ArrayList. Useful when processing large inputs.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.8(Oct 16, 2017)

    • Trailing spaces in a CSV line lead to null instead of an empty String (#196)
    • On CSV, keepQuotes fails to print the closing quote if it's followed by a whitespace and a line ending (#197)
    • CsvParser does not properly detect the delimiter (with setDelimiterDetectionEnabled) (#198)
    • Routines' keepResourcesOpen not respected with writeAll(list of objects) (#201)
    Source code(tar.gz)
    Source code(zip)
  • v2.5.7(Oct 9, 2017)

    This version was released to bury the intermittent results produced by the parser when processing files and input streams without an explicit character encoding while readInputOnSeparateThread=true, as reported in issue #194.

    This problem appeared from version 2.5.0 onward and had been plaguing users until version 2.5.6; it is finally gone for good.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.6(Sep 22, 2017)

    This is the first Java 9 compatible version

    DO NOT use any previous release with Java 9 as it will blow up.

    • string.getChars used to throw ArrayIndexOutOfBoundsException (tested on JDK 6-8), but now it throws StringIndexOutOfBoundsException. This version was updated to catch IndexOutOfBoundsException instead, so it works with all JDKs (tested from 6 to 9).

    • SimpleDateFormat - Changes in the locales "broke" previously working date formats. If the default locale is "en_AU" then the following won't work with JDK 9: new SimpleDateFormat("yyyy-MMM-dd").parse("2015-DEC-25")

    Classes with annotations for dates now allow users to provide a locale. For example:

        @Parsed
        @Format(formats = "dd-MMM-yyyy", options = "locale=en_US_WIN")
        private Date myDate;
    
        //if you are in Australia, MMM will translate to the abbreviated format followed by period, 
        //i.e. "October"  becomes "Oct.". The formatter won't work to parse dates such as "2015-Oct-20"
        //Use "locale=en" to make it behave as it always did in Java 8 or earlier.
        @Parsed
        @Format(formats = "dd-MMM-yyyy", options = "locale=en")
        private Date australianDay;
    

    Also fixed a bug in the TrimConversion introduced in version 2.5.5 - it would break when processing blank or empty strings.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.5(Sep 8, 2017)

    Fixed a concurrency issue when processing a File or InputStream with no explicit character encoding provided, while setReadInputOnSeparateThread=true (the default). The issue could make the parser discard the first character of the input.

    More details in: https://github.com/uniVocity/univocity-parsers/issues/186 and https://github.com/uniVocity/univocity-parsers/issues/187

    **Other adjustments**

    • Improved CSV format detection

    • Fixed TrimConversion to remove trailing whitespaces when input is truncated to a maximum length.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.4(Sep 1, 2017)

    BUGFIXES

    • Invalid values being produced when a column selection is defined, reordering is disabled and the number of headers is less than the number of selected columns. (https://github.com/uniVocity/univocity-parsers/issues/183)

    • ParsingContext of *Routines returns information of the following row instead of the current one (https://github.com/uniVocity/univocity-parsers/issues/184)

    • currentParsedContent returns a first-char-repeated string from a ByteArrayInputStream with setReadInputOnSeparateThread disabled (https://github.com/uniVocity/univocity-parsers/issues/185)

    • Fixed issues handling values with unescaped quotes at the end of the input, and in combination with the keepQuotes flag - commit.

    ENHANCEMENTS

    • Allowing the same column to feed data into multiple fields of a java bean and its nested fields. Validation kicks in only when writing.

    Source code(tar.gz)
    Source code(zip)
  • v2.5.3(Aug 17, 2017)

    Bug fixes

    • CsvParserSettings.setDelimiterDetectionEnabled() not working with input file/stream when the character encoding is not explicitly provided. (https://github.com/uniVocity/univocity-parsers/issues/178)

    • CSV format autodetection thrown off by fields with single or double quote in the middle of the value (also in https://github.com/uniVocity/univocity-parsers/issues/178)

    • Fixed-width header extraction fails (https://github.com/uniVocity/univocity-parsers/issues/182)

    • Failure to extract header row with FixedWidthParser + annotations (https://github.com/uniVocity/univocity-parsers/issues/180)

    Enhancements

    • Custom conversion classes can now be defined without a String[] args constructor (https://github.com/uniVocity/univocity-parsers/issues/181) if no parameters are used.
    Source code(tar.gz)
    Source code(zip)
  • v2.5.2(Aug 9, 2017)

    Fixed a bug in the CSV parser - a line ending following an escaped quote was handled as the end of the record (issue https://github.com/uniVocity/univocity-parsers/issues/177)

  • v2.5.1(Jul 29, 2017)

    The introduction of support for BOM marker handling created an unwanted side effect when the user does not provide the character encoding of the file being processed: performance was significantly reduced.

    This version fixes the problem and performance should be equivalent to previous versions, regardless of whether the user provides a character encoding explicitly, or the input has a BOM marker.

    Relevant issue: https://github.com/uniVocity/univocity-parsers/issues/176

  • v2.5.0(Jul 24, 2017)

    ENHANCEMENTS

    • Added methods to the parser that return Iterable<String[]> or Iterable<Record> (https://github.com/uniVocity/univocity-parsers/issues/151)

    Rows can now be iterated like this:

    	for (String[] row : parser.iterate(input, "UTF-8")) {
    		//do stuff
    	}
    

    And records:

    	for (Record row : parser.iterateRecords(new FileInputStream(input), "UTF-8")) {
    		//do stuff
    	}
    
    • Support multiple header names in field attribute of @Parsed annotation (https://github.com/uniVocity/univocity-parsers/issues/150)

    You can now use the same POJO to parse different files with different headers.
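    As a rough sketch of how this could be used (the class, field and header names below are made up for illustration, not taken from the release notes), the same annotated bean can consume two inputs whose header rows differ:

```java
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvRoutines;

import java.io.StringReader;

public class MultiHeaderExample {

	// One POJO matching two header variants: "name" in one file, "full_name" in another.
	public static class Person {
		@Parsed(field = {"name", "full_name"})
		public String name;

		@Parsed
		public int age;
	}

	public static void main(String[] args) {
		CsvParserSettings settings = new CsvParserSettings();
		settings.setHeaderExtractionEnabled(true);

		CsvRoutines routines = new CsvRoutines(settings);

		// Both inputs are parsed with the same Person class.
		for (Person p : routines.parseAll(Person.class, new StringReader("name,age\nJohn,30"))) {
			System.out.println(p.name + " " + p.age);
		}
		for (Person p : routines.parseAll(Person.class, new StringReader("full_name,age\nJane,25"))) {
			System.out.println(p.name + " " + p.age);
		}
	}
}
```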

    • Adding a HeaderTransformer to the @Nested annotation (https://github.com/uniVocity/univocity-parsers/issues/159)

    This allows one to nest the same class inside a parent multiple times, by changing the names assigned to the nested properties. Example for a car with 4 wheels:

    	public static class Wheel {
    		@Parsed
    		String brand;
    
    		@Parsed
    		int miles;
    	}
    
    	public static class Car {
    		@Nested
    		Wheel frontLeft;
    
    		@Nested
    		Wheel frontRight;
    
    		@Nested
    		Wheel rearLeft;
    
    		@Nested
    		Wheel rearRight;
    	}
    

    To parse an input organized like this: frontLeftWheelBrand,frontLeftWheelMiles,frontRightWheelBrand,frontRightWheelMiles,rearLeftWheelBrand,rearLeftWheelMiles,rearRightWheelBrand,rearRightWheelMiles, we need to apply a transformation to the property names of each Wheel instance. This can be done with something like this:

    	public static class NameTransformer extends HeaderTransformer {
    
    		private String prefix;
    
    		public NameTransformer(String... args) {
    			prefix = args[0];
    		}
    
    		@Override
    		public String transformName(Field field, String name) {
    			return prefix + Character.toUpperCase(name.charAt(0)) + name.substring(1);
    		}
    	}
    

    Now we can change the Nested annotations to be:

    	public static class Car {
    		@Nested(headerTransformer = NameTransformer.class, args = "frontLeftWheel")
    		Wheel frontLeft;
    
    		@Nested(headerTransformer = NameTransformer.class, args = "frontRightWheel")
    		Wheel frontRight;
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearLeftWheel")
    		Wheel rearLeft;
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearRightWheel")
    		Wheel rearRight;
    	}
    

    And parse the input with:

    
    	List<Car> cars = new CsvRoutines(settings).parseAll(Car.class, input);
    
    
    • added support for annotations in methods (https://github.com/uniVocity/univocity-parsers/issues/160)

    @Parsed or @Nested annotations now work on individual methods. Modifying the Car example above to use a List<Wheel>, one can do the following:

    	public static class Car {
    		List<Wheel> wheels = new ArrayList<Wheel>();
    		
    		@Nested(headerTransformer = NameTransformer.class, args = "frontLeftWheel")
    		private void setWheel1(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    
    		@Nested(headerTransformer = NameTransformer.class, args = "frontRightWheel")
    		private void setWheel2(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearLeftWheel")
    		private void setWheel3(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    
    		@Nested(headerTransformer = NameTransformer.class, args = "rearRightWheel")
    		private void setWheel4(Wheel wheel) { //bad method name created on purpose. Also private.
    			wheels.add(wheel);
    		}
    	}
    
    • added support and automatic handling of files with a BOM marker (https://github.com/uniVocity/univocity-parsers/issues/154)

    • Introduced properties from and to to the @FixedWidth annotation, along with FixedWidthFields.addField(int startPosition, int endPosition) to allow declaring field ranges instead of fixed lengths (https://github.com/uniVocity/univocity-parsers/issues/166)
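    A minimal sketch of declaring fields by position range (the field names, positions and input below are invented for illustration; exact index semantics should be checked against the javadoc):

```java
import com.univocity.parsers.fixed.FixedWidthFields;
import com.univocity.parsers.fixed.FixedWidthParser;
import com.univocity.parsers.fixed.FixedWidthParserSettings;

public class FieldRangeExample {
	public static void main(String[] args) {
		// Declare each field as a position range instead of a fixed length.
		FixedWidthFields fields = new FixedWidthFields();
		fields.addField("name", 0, 10);
		fields.addField("age", 10, 13);

		FixedWidthParserSettings settings = new FixedWidthParserSettings(fields);
		FixedWidthParser parser = new FixedWidthParser(settings);

		// Values are trimmed by default, so padding spaces are dropped.
		String[] row = parser.parseLine("John      42 ");
		System.out.println(row[0] + "|" + row[1]);
	}
}
```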

    • enabled column reordering in CSV, TSV and Fixed-Width writers. This allows users to select which columns of their input should be written and prevents generating empty columns. Just use selectFields with setColumnReorderingEnabled(true) (https://github.com/uniVocity/univocity-parsers/issues/167)
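    A sketch of what this enables when writing (the headers and values below are illustrative only): with reordering enabled, only the selected columns appear in the output, in selection order, instead of a full-width row with empty columns:

```java
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

import java.io.StringWriter;

public class SelectiveWriteExample {
	public static void main(String[] args) {
		CsvWriterSettings settings = new CsvWriterSettings();
		settings.setHeaders("id", "name", "email");

		// Write only two of the three declared columns, without leaving empty gaps.
		settings.selectFields("name", "email");
		settings.setColumnReorderingEnabled(true);

		StringWriter out = new StringWriter();
		CsvWriter writer = new CsvWriter(out, settings);
		writer.writeHeaders();
		writer.writeRow("John", "john@example.com");
		writer.close();

		System.out.print(out);
	}
}
```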

    • added a keepResourcesOpen property in CsvRoutines, TsvRoutines and FixedWidthRoutines to control whether or not to close the Writer after writing data to the output. Also controls whether the ResultSet will be closed when dumping data from it. (https://github.com/uniVocity/univocity-parsers/issues/172)

    BUGS FIXED

    • CSV format auto-detection thrown off by seemingly regular CSV (https://github.com/uniVocity/univocity-parsers/issues/161)

    • Can't provide custom headers when dumping a ResultSet (https://github.com/uniVocity/univocity-parsers/issues/157)

    • ParsingContext.currentParsedContent returns null when input is single length string (https://github.com/uniVocity/univocity-parsers/issues/165)

    • NullValue and EmptyValue are both applied (https://github.com/uniVocity/univocity-parsers/issues/158)

    • NumberFormatException when using record.getDate("field") in a concurrent environment (https://github.com/uniVocity/univocity-parsers/issues/156)

    • AutomaticConfiguration does not work with MultiBeanListProcessor (https://github.com/uniVocity/univocity-parsers/issues/149)

  • v2.4.1(Mar 20, 2017)

    This release includes:

    • a fix for concurrency issues processing annotations: https://github.com/uniVocity/univocity-parsers/issues/146

    • internal adjustments for exception handling and generation of error messages.

  • v2.4.0(Mar 14, 2017)

    Enhancements

    • Added support for nested objects with the newly introduced @Nested annotation: https://github.com/uniVocity/univocity-parsers/issues/139
    • CsvRoutines and other routine classes provide a ParsingContext object: https://github.com/uniVocity/univocity-parsers/issues/136

    Bugfixes:

    • Fixed incorrect escape of CSV when writing a quote escape character with setQuoteEscapingEnabled=true before writing any other character that would prompt the writer to enclose the field in quotes: https://github.com/uniVocity/univocity-parsers/issues/143
    • Fixed handling of unquoted values where two consecutive unescaped quotes appeared in the input: https://github.com/uniVocity/univocity-parsers/issues/143
    • IndexOutOfBoundsException when processing index-based annotations: https://github.com/uniVocity/univocity-parsers/commit/33a099c57a61d14d2fb2a58c22cd84806e9d27c8
    • Fixed width writing from ResultSet should respect user provided field length instead of using the resultset length - https://github.com/uniVocity/univocity-parsers/issues/135
    • Parser did not throw an error when the input failed to be read (e.g. due to socket timeouts) and would just silently stop. https://github.com/uniVocity/univocity-parsers/issues/140
  • v2.3.1(Feb 5, 2017)

    This release includes a couple of bug fixes:

    • setColumnReorderingEnabled(false) not working when parsing single line: https://github.com/uniVocity/univocity-parsers/issues/131
    • index property of @Parsed annotation not handled correctly: https://github.com/uniVocity/univocity-parsers/issues/132

    ...and some internal refactoring that should not affect anyone.
