sesseltjonna-csv is a high-performance CSV library with developer-friendly configuration options.
Projects using this library will benefit from:
For databinding, a very specific parser is generated per unique CSV file header, which yields extremely fast processing while allowing for per-field customizations.
The library also hosts ‘traditional’ CSV parsers (statically typed) for those wanting to work directly on String arrays.
The primary use-case for this library is large CSV files with more than 1000 lines, where the CSV file format is known and reasonably stable.
Bugs, feature suggestions and help requests can be filed with the issue tracker.
The project is implemented in Java and built using Maven. The project is available on the central Maven repository.
Use the builder to configure your parser.
CsvMapper<Trip> mapper = CsvMapper.builder(Trip.class)
        .stringField("route_id")
            .quoted()
            .optional()
        .stringField("service_id")
            .required()
        .build();
where each field must be marked either required() or optional(). The necessary Trip setters will be deduced from the field names (see further down for customization).
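The exact deduction rule is not spelled out here. Purely as an illustration, the sketch below shows one plausible convention, consistent with the route_id / Trip::setRouteId pairing in the explicit-setter example further down: snake_case headers become camelCase bean setters. The setterNameFor helper is hypothetical and is not the library's code.

```java
public class SetterNames {

    // Hypothetical helper (not the library's code): builds a JavaBean-style
    // setter name from a CSV header, capitalizing the first letter and every
    // letter that follows an underscore, e.g. "route_id" -> "setRouteId".
    static String setterNameFor(String header) {
        StringBuilder sb = new StringBuilder("set");
        boolean capitalizeNext = true;
        for (char c : header.toCharArray()) {
            if (c == '_') {
                capitalizeNext = true; // drop the underscore, capitalize what follows
            } else if (capitalizeNext) {
                sb.append(Character.toUpperCase(c));
                capitalizeNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(setterNameFor("route_id"));   // setRouteId
        System.out.println(setterNameFor("service_id")); // setServiceId
        System.out.println(setterNameFor("Population")); // setPopulation
    }
}
```

If a column name does not fit whatever convention the library actually applies, the explicit setters described further down sidestep the question entirely.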
Then create a CsvReader:
Reader reader = ...; // your input
CsvReader<Trip> csvReader = mapper.create(reader);
and call next() until it returns null:
do {
    Trip trip = csvReader.next();
    if(trip == null) {
        break;
    }
    // your code here
} while(true);
To run some custom logic before applying values, add your own consumer:
CsvMapper<City> mapping = CsvMapper.builder(City.class)
        .longField("Population")
            .consumer((city, n) -> city.setPopulation(n * 1000))
            .optional()
        .build();
or with custom (explicit) setters:
CsvMapper<Trip> mapper = CsvMapper.builder(Trip.class)
        .stringField("route_id")
            .setter(Trip::setRouteId)
            .quoted()
            .optional()
        .stringField("service_id")
            .setter(Trip::setServiceId)
            .required()
        .build();
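Both the custom consumer and the explicit setter come down to a function that receives the target object and the parsed column value. The sketch below models that with java.util.function.BiConsumer; the functional interface types the library actually uses, and the bindLong helper, are assumptions made for illustration only.

```java
import java.util.function.BiConsumer;

public class FieldBindingSketch {

    // Minimal stand-in for the City bean used in the consumer example.
    static class City {
        private long population;
        public void setPopulation(long population) { this.population = population; }
        public long getPopulation() { return population; }
    }

    // Hypothetical binding step: parse the raw column text as a long and pass it,
    // together with the target object, to whatever consumer was configured.
    // Assumption: an empty value is skipped, as one might do for an optional field.
    static <T> void bindLong(T target, String rawValue, BiConsumer<T, Long> consumer) {
        if (rawValue == null || rawValue.isEmpty()) {
            return;
        }
        consumer.accept(target, Long.parseLong(rawValue));
    }

    public static void main(String[] args) {
        // An explicit setter is just a method reference with the (target, value) shape...
        BiConsumer<City, Long> setter = City::setPopulation;
        // ...while a custom consumer can transform the value before storing it.
        BiConsumer<City, Long> scaled = (city, n) -> city.setPopulation(n * 1000);

        City a = new City();
        bindLong(a, "12", setter);
        City b = new City();
        bindLong(b, "12", scaled);

        System.out.println(a.getPopulation()); // 12
        System.out.println(b.getPopulation()); // 12000
    }
}
```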
The library supports an intermediate processor for handling complex references: when a column value maps to a child or parent object, the reference can be resolved at parse time or in a post-processing step. For example, a Country can be resolved when parsing a City using an instance of MyCountryLookup. First the mapper:
CsvMapper2<City, MyCountryLookup> mapper = CsvMapper2.builder(City.class, MyCountryLookup.class)
        .longField("Country")
            .consumer((city, lookup, country) -> city.setCountry(lookup.getCountry(country)))
            .optional()
        .build();
Then supply an instance of the intermediate processor when creating the CsvReader:
MyCountryLookup lookup = ...;
CsvReader<City> csvReader = mapper.create(reader, lookup);
This feature can be essential when parsing multiple CSV files, or even fragments of the same file, in parallel where entities reference each other: values are stored in intermediate processors, and references are resolved in a post-processing step.
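The following plain-Java sketch illustrates the resolve-now-or-defer idea without the library. Country, City and MyCountryLookup take their names from the example above, but their members here (addCountry, resolveCountry, resolvePending) are hypothetical. The lookup is not thread-safe; in a parallel setup one would presumably keep one processor per thread and merge the results afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupSketch {

    record Country(long id, String name) {}

    static class City {
        final String name;
        Country country;
        City(String name) { this.name = name; }
    }

    // Hypothetical intermediate processor: resolves countries that are already
    // known and defers the rest to a post-processing step. Not thread-safe.
    static class MyCountryLookup {
        private final Map<Long, Country> countries = new HashMap<>();
        private final List<Map.Entry<City, Long>> pending = new ArrayList<>();

        void addCountry(Country country) {
            countries.put(country.id(), country);
        }

        // Called for each City row while parsing.
        void resolveCountry(City city, long countryId) {
            Country country = countries.get(countryId);
            if (country != null) {
                city.country = country;                  // resolved at parse time
            } else {
                pending.add(Map.entry(city, countryId)); // resolved later
            }
        }

        // Post-processing: resolve deferred references once all countries are known.
        void resolvePending() {
            for (Map.Entry<City, Long> entry : pending) {
                entry.getKey().country = countries.get(entry.getValue());
            }
            pending.clear();
        }
    }

    public static void main(String[] args) {
        MyCountryLookup lookup = new MyCountryLookup();
        lookup.addCountry(new Country(1, "Norway"));

        City bergen = new City("Bergen");
        lookup.resolveCountry(bergen, 1); // known: resolved immediately

        City lund = new City("Lund");
        lookup.resolveCountry(lund, 2);   // not known yet: deferred

        lookup.addCountry(new Country(2, "Sweden")); // e.g. parsed from another file
        lookup.resolvePending();

        System.out.println(bergen.country.name() + ", " + lund.country.name()); // Norway, Sweden
    }
}
```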
To work directly on String arrays, create a CsvReader<String[]>:
Reader input = ...; // your input
CsvReader<String[]> csvReader = StringArrayCsvReader.builder().build(input);
String[] next;
do {
    next = csvReader.next();
    if(next == null) {
        break;
    }
    // your code here
} while(true);
Note that the String array itself is reused between lines, so copy it (for example with clone()) if you need to keep a row beyond the following call to next(). The column indexes can be rearranged using the builder's withColumnMapping(..) methods, which is useful when doing your own (efficient) hand-coded databinding.
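To show why the reuse matters, the sketch below uses a hypothetical ReusingReader (not the library's StringArrayCsvReader) that, like the reader described above, returns the same String[] instance for every line. Keeping references without copying leaves every stored row pointing at the last line read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedArraySketch {

    // Hypothetical stand-in for a reader that returns the same String[] for every line.
    // Naive comma split: no quoting support, for illustration only.
    static class ReusingReader {
        private final String[] lines;
        private final String[] row;
        private int index;

        ReusingReader(String[] lines, int columns) {
            this.lines = lines;
            this.row = new String[columns];
        }

        String[] next() {
            if (index >= lines.length) {
                return null; // end of input
            }
            String[] parts = lines[index++].split(",", -1);
            System.arraycopy(parts, 0, row, 0, Math.min(parts.length, row.length));
            return row; // same array instance every time
        }
    }

    // Reads all rows of a two-column input, optionally copying each row.
    static List<String[]> readAll(String[] input, boolean copy) {
        ReusingReader reader = new ReusingReader(input, 2);
        List<String[]> rows = new ArrayList<>();
        String[] row;
        while ((row = reader.next()) != null) {
            // Without clone(), every list element is the same array, overwritten by later lines.
            rows.add(copy ? row.clone() : row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] input = {"a,1", "b,2", "c,3"};
        System.out.println(Arrays.toString(readAll(input, false).get(0))); // [c, 3]
        System.out.println(Arrays.toString(readAll(input, true).get(0)));  // [a, 1]
    }
}
```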
The dynamically generated parsers are extremely fast (i.e. as good as a parser hand-tailored to the file being parsed). Note, however, that this assumes the number of distinct CSV file formats (headers) for a given application is limited, so that parsing is effectively performed by an already JIT-compiled class rather than by a newly generated class for each file.
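The library generates its parsers at runtime. Purely to illustrate why a per-header parser can be so fast, the sketch below hand-writes what such a parser amounts to for the header route_id,service_id; it is an assumption-laden illustration, not the code the library generates. Once the header is fixed, each column position is known, so rows are bound with direct setter calls and no per-row name lookups or reflection.

```java
import java.util.ArrayList;
import java.util.List;

public class SpecializedParserSketch {

    // Minimal Trip bean matching the examples above.
    static class Trip {
        private String routeId;
        private String serviceId;
        public void setRouteId(String routeId) { this.routeId = routeId; }
        public void setServiceId(String serviceId) { this.serviceId = serviceId; }
        public String getRouteId() { return routeId; }
        public String getServiceId() { return serviceId; }
    }

    static final String HEADER = "route_id,service_id";

    // Hand-written counterpart of a parser specialized for one header.
    // Assumes unquoted, well-formed rows separated by '\n'.
    static List<Trip> parse(String csv) {
        String[] lines = csv.split("\n");
        if (!lines[0].equals(HEADER)) {
            throw new IllegalArgumentException("Parser only supports header " + HEADER);
        }
        List<Trip> trips = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            int comma = line.indexOf(',');
            Trip trip = new Trip();
            trip.setRouteId(line.substring(0, comma));    // column 0, fixed position
            trip.setServiceId(line.substring(comma + 1)); // column 1, fixed position
            trips.add(trip);
        }
        return trips;
    }

    public static void main(String[] args) {
        List<Trip> trips = parse("route_id,service_id\nr1,weekday\nr2,weekend");
        System.out.println(trips.size() + " " + trips.get(1).getServiceId()); // 2 weekend
    }
}
```

A file with a different header (say, the columns swapped) would need a different class, which is why the number of distinct headers should stay small.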
To maximize performance (e.g. response time), the JVM should always be pre-warmed, regardless of the underlying implementation.
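A minimal warm-up sketch, using a hypothetical sumSecondColumn method as a stand-in for the real parsing work: the same code path is exercised many times before anything is timed, giving the JIT a chance to compile it. The iteration count is arbitrary; for real measurements a harness such as JMH, which runs dedicated warm-up iterations, is the more reliable choice.

```java
public class WarmUpSketch {

    // Hypothetical stand-in for the parsing work being measured:
    // sums the second column of a small unquoted CSV.
    static long sumSecondColumn(String csv) {
        long sum = 0;
        for (String line : csv.split("\n")) {
            sum += Long.parseLong(line.substring(line.indexOf(',') + 1));
        }
        return sum;
    }

    public static void main(String[] args) {
        String sample = "a,1\nb,2\nc,3";

        // Warm-up: run the same code path repeatedly before timing it.
        long sink = 0;
        for (int i = 0; i < 20_000; i++) {
            sink += sumSecondColumn(sample);
        }

        long start = System.nanoTime();
        long result = sumSecondColumn(sample);
        long elapsed = System.nanoTime() - start;

        // Printing sink keeps the warm-up results observably used.
        System.out.println("result=" + result + " sink=" + sink + " elapsed=" + elapsed + "ns");
    }
}
```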
JMH benchmark results.
If the parser runs alone on a multicore system, the ParallelReader from SimpleFlatMapper might improve performance further, by approximately 50%.
Performance note for single-shot scenarios and CsvMapper: if a custom setter is specified, the library invokes it to determine the underlying method invocation using ByteBuddy, so some additional class loading takes place.
The following rules/restrictions apply, mostly to stay in line with RFC 4180:
Also note that
See the project gtfs-databinding for a full working example.
If you have any questions, comments or feature requests, please open an issue.
Contributions are welcome, especially those with unit tests ;)