
xml-log-filter

High-performance filtering of to-be-logged XML. Reads, filters, formats and writes XML in a single step, drastically increasing throughput. Typical use-cases:

In a typical bare-bones system, this could translate to roughly a 5-10% overall performance improvement.

Features:

The processors have all been validated to handle valid documents using the latest W3C XML test suite.

Bugs, feature suggestions and help requests can be filed with the issue-tracker.

License

Apache 2.0

Obtain

The project is built with Maven and is available in the Maven Central repository.

<dependency>
    <groupId>com.github.skjolber.xml-log-filter</groupId>
    <artifactId>xml-log-filter-core</artifactId>
    <version>1.0.8</version>
</dependency>

Usage

See individual sub-modules for detailed usage instructions and examples.

Max text/CDATA node sizes

Configuring

factory.setMaxTextNodeLength(1024);
factory.setMaxCDATANodeLength(1024);

yields output like the following (shown with a smaller max length):

<parent>
    <child><![CDATA[QUJDREVGR0hJSktMTU5PUFFSU1...[TRUNCATED BY 46]]]></child>
</parent>

for CDATA and correspondingly for text nodes.
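The marker in the sample output suggests a simple rule: keep the first characters of the node value and report how many were dropped. A minimal Java sketch of that rule, assuming the limit counts UTF-16 characters; `TruncateSketch` and `truncate` are hypothetical names, not part of the project's API:

```java
public final class TruncateSketch {

    // Hypothetical helper: keeps the first maxLength characters of a text or CDATA
    // value and appends a marker stating how many characters were dropped.
    static String truncate(String value, int maxLength) {
        if (value.length() <= maxLength) {
            return value; // short enough: leave untouched
        }
        int cut = maxLength;
        // avoid splitting a surrogate pair (for example an emoji) in half
        if (cut > 0 && Character.isHighSurrogate(value.charAt(cut - 1))) {
            cut--;
        }
        int removed = value.length() - cut;
        return value.substring(0, cut) + "...[TRUNCATED BY " + removed + "]";
    }

    public static void main(String[] args) {
        System.out.println(truncate("ABCDEFGHIJ", 4)); // ABCD...[TRUNCATED BY 6]
        System.out.println(truncate("short", 1024));   // short
    }
}
```

Reporting the number of removed characters keeps the log line short while still telling the reader how large the original value was.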

Anonymizing attributes and/or elements

Configuring

factory.setAnonymizeFilters(new String[]{"/parent/child"}); // multiple paths supported

results in

<parent>
    <child>[*****]</child>
</parent>

See below for supported XPath syntax.
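A minimal sketch of the masking idea using the JDK's StAX API (`javax.xml.stream`), closer in spirit to the StAX-based equivalents mentioned under Performance than to the project's own processors. It assumes exact absolute paths built from local names (no wildcards or `//` search); `AnonymizeSketch` and `anonymize` are hypothetical names. Namespace prefixes and declarations, comments and processing instructions are not carried over to the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public final class AnonymizeSketch {

    static final String MASK = "[*****]";

    // Hypothetical helper: copies the document, masking element text and attribute values
    // whose absolute path (built from local names only) is contained in 'paths'.
    static String anonymize(String xml, Set<String> paths) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD/entity processing
            inputFactory.setProperty(XMLInputFactory.IS_COALESCING, true); // merge adjacent text
            XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

            Deque<String> path = new ArrayDeque<>(); // top of stack: current path, e.g. "/parent/child"
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String current = (path.isEmpty() ? "" : path.peek()) + "/" + reader.getLocalName();
                    path.push(current);
                    writer.writeStartElement(reader.getLocalName()); // prefixes are not copied
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        String name = reader.getAttributeLocalName(i);
                        boolean mask = paths.contains(current + "/@" + name);
                        writer.writeAttribute(name, mask ? MASK : reader.getAttributeValue(i));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.pop();
                    writer.writeEndElement();
                } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    boolean mask = !path.isEmpty() && paths.contains(path.peek()) && !reader.isWhiteSpace();
                    writer.writeCharacters(mask ? MASK : reader.getText());
                }
                // other events (comments, processing instructions, ...) are dropped in this sketch
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            // not well-formed: let the caller decide, e.g. log the raw document instead
            throw new IllegalArgumentException("Could not anonymize document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<parent><child id=\"7\">secret</child></parent>";
        System.out.println(anonymize(xml, Set.of("/parent/child", "/parent/child/@id")));
    }
}
```

Wrapping `XMLStreamException` in an unchecked exception leaves it to the caller to decide what to log when a document is not well-formed.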

Removing subtrees

Configuring

factory.setPruneFilters(new String[]{"/parent/child"}); // multiple paths supported

results in

<parent>
    <child><!-- [SUBTREE REMOVED] --></child>
</parent>

See below for supported XPath syntax.
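Pruning can be sketched the same way with the JDK's StAX API: when an element's path matches, write its start tag, emit the comment, skip every event up to the element's own end tag, and close it. This is a hypothetical illustration (`PruneSketch`, `prune` and `skipToEndElement` are made-up names) that assumes exact absolute paths and keeps the pruned element's attributes; it is not the project's implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public final class PruneSketch {

    // Hypothetical helper: keeps each matching element's tags (and attributes)
    // but replaces everything inside it with a "[SUBTREE REMOVED]" comment.
    static String prune(String xml, Set<String> paths) {
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD/entity processing
            XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

            Deque<String> path = new ArrayDeque<>(); // top of stack: current path, e.g. "/parent"
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String current = (path.isEmpty() ? "" : path.peek()) + "/" + reader.getLocalName();
                    writer.writeStartElement(reader.getLocalName()); // prefixes are not copied
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        writer.writeAttribute(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                    }
                    if (paths.contains(current)) {
                        writer.writeComment(" [SUBTREE REMOVED] ");
                        skipToEndElement(reader); // consumes the subtree, including its END_ELEMENT
                        writer.writeEndElement();
                    } else {
                        path.push(current);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.pop();
                    writer.writeEndElement();
                } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    writer.writeCharacters(reader.getText());
                }
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not prune document", e);
        }
    }

    // Reads events until the END_ELEMENT that closes the element the reader is positioned on.
    private static void skipToEndElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

Counting depth rather than comparing names means nested elements with the same name as the pruned element are skipped correctly.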

XPath expressions

A small subset of the XPath syntax is supported; however, multiple expressions can be used at once. Namespace prefixes in the XML are ignored: only local names are used to determine a match. Expressions are case-sensitive.

Anonymize

Supported syntax:

/my/xml/element
/my/xml/@attribute

with support for wildcards:

/my/xml/*
/my/xml/@*

or a simple any-level element search

//myElement

which cannot target attributes.
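To make the anonymize rules above concrete, here is a hypothetical matcher that takes an expression, the element path as a list of local names from the root, and an optional attribute name. It is a sketch under stated assumptions (it lets `*` stand for any single element step, not only the last one), not the project's matching code; `PathMatcherSketch` and `matches` are illustrative names.

```java
import java.util.List;

public final class PathMatcherSketch {

    // Hypothetical matcher for the expression subset.
    // elementPath: local names from the root, e.g. ["my", "xml", "element"]
    // attribute:   the attribute's local name, or null when matching an element
    static boolean matches(String expression, List<String> elementPath, String attribute) {
        if (expression.startsWith("//")) {
            // any-level search: matches an element (never an attribute) by local name
            return attribute == null && !elementPath.isEmpty()
                    && elementPath.get(elementPath.size() - 1).equals(expression.substring(2));
        }
        String[] steps = expression.substring(1).split("/");
        String last = steps[steps.length - 1];
        boolean attributeStep = last.startsWith("@");
        if (attributeStep != (attribute != null)) {
            return false; // element expressions only match elements, and vice versa
        }
        int elementSteps = attributeStep ? steps.length - 1 : steps.length;
        if (elementSteps != elementPath.size()) {
            return false; // absolute paths must match the full depth
        }
        for (int i = 0; i < elementSteps; i++) {
            if (!steps[i].equals("*") && !steps[i].equals(elementPath.get(i))) {
                return false; // case-sensitive comparison of local names
            }
        }
        return !attributeStep || last.equals("@*") || last.substring(1).equals(attribute);
    }
}
```

Because the element path holds local names only, `soap:Body` and `Body` compare equal, matching the namespace rule described above.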

Prune

Supported syntax:

/my/xml/element

with support for wildcards:

/my/xml/*

or a simple any-level element search

//myElement
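Prune accepts the same expressions as anonymize minus the attribute forms. A hypothetical regex-based check capturing the two grammars, which could reject unsupported expressions up front; it again assumes `*` may replace any single element step, and `ExpressionSyntaxSketch`, `isValidPruneExpression` and `isValidAnonymizeExpression` are illustrative names only:

```java
import java.util.regex.Pattern;

public final class ExpressionSyntaxSketch {

    // One element step: a local name (no '/', '@' or '*') or the '*' wildcard.
    private static final String STEP = "(?:[^/@*]+|\\*)";

    // Prune: absolute element paths, or an any-level element search like //myElement.
    private static final Pattern PRUNE =
            Pattern.compile("//[^/@*]+|(?:/" + STEP + ")+");

    // Anonymize: as for prune, plus an optional final attribute step (/@name or /@*).
    private static final Pattern ANONYMIZE =
            Pattern.compile("//[^/@*]+|(?:/" + STEP + ")+(?:/@" + STEP + ")?");

    static boolean isValidPruneExpression(String expression) {
        return PRUNE.matcher(expression).matches();
    }

    static boolean isValidAnonymizeExpression(String expression) {
        return ANONYMIZE.matcher(expression).matches();
    }
}
```

Note that neither grammar allows an attribute after `//`, reflecting that the any-level search cannot target attributes.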

Performance

The processors within this project are much faster than stock processors. This is expected, as parser/serializer features have been traded for performance.

The project has DOM- and StAX-based equivalents for feature and performance comparison. Depending on the implementation, benchmarks show throughput of approximately 5x-10x that of stock processors.

Memory use will be approximately two times the XML string size.

See this visualization and the JMH module for running detailed benchmarks.

Background

The project is intended as a complementary tool for use alongside XML frameworks, such as SOAP- or XML-based REST stacks. Its primary use-case is processing to-be-logged XML. The project relies on the fact that such frameworks have very good error handling, like schema validation, and therefore applies a simplified view of the XML syntax, handling only the happy case of a well-formed document. The frameworks themselves detect invalid documents and handle them as raw content.

See also

See projects

History