Practical Java Performance: Date Formatting Optimization
Table of Contents
- Introduction
- JMH: The Ultimate Java Benchmarking Tool
- The Formatters Used by ATSD
- New at the Zoo: ATSD DatetimeProcessor
- Supported Patterns
- Slides
- References
Introduction
One does not simply measure JVM code performance. A performance engineer must consider many variables that affect the resulting data. Several of these are listed below:
- Interpreted and compiled execution modes (implementation details).
- Speculative compiler optimizations based on collected profiling information.
- Other optimizations such as constant folding, loop unrolling, and dead code elimination.
JMH: The Ultimate Java Benchmarking Tool
Java Microbenchmark Harness (JMH) is a Java harness for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.
To use this tool, create a Maven project from the JMH archetype.
mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DgroupId=com.github.raipc \
  -DartifactId=benchmark \
  -Dversion=1.0
A sample JMH benchmark is shown below:
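import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class CharIsDigitBenchmark {
    // Each value is benchmarked separately.
    @Param(value = {"+7(955)123-45-67", "79551234567"})
    private String input;

    // Keeps digits by testing each char with Character.isDigit.
    @Benchmark
    public String writeDigitsCharacter() {
        final int length = input.length();
        final StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            char ch = input.charAt(i);
            if (Character.isDigit(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Keeps digits by looking up one-character substrings in a digit string.
    @Benchmark
    public String writeDigitsString() {
        final String digits = "0123456789";
        final int length = input.length();
        final StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            String ch = input.substring(i, i + 1);
            if (digits.contains(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}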
Each instrumented method is marked with the @Benchmark annotation.
The @State annotation marks the class that holds the benchmark state. It can be placed on the same class as the instrumented methods or on a separate class, in which case the state object is passed to the benchmark method as a parameter.
The @Param annotation marks parameterized fields. Initial values can be provided via annotation parameters or on the command line with -p {param-name}={param-value}.
Understand the principles of writing an informative JMH benchmark by studying the provided Examples.
Date Formatters Performance Comparison
Review date formatter benchmarks on GitHub.
To record benchmarks, build and run the project.
mvn clean package
java -jar target/benchmarks.jar
Benchmark                       Mode      Cnt    Score    Error  Units
parseCustomCurrentATSD        sample  3722617  504.680 ±  8.328  ns/op
parseJoda                     sample  2381165  934.201 ± 15.333  ns/op
parseOptimized                sample  2845261  142.150 ±  4.712  ns/op
printJodaIso8601              sample  3662844  509.362 ±  8.079  ns/op
printWithCustomPrinterIsoOpt  sample  2369326  271.344 ± 10.401  ns/op
The Formatters Used by ATSD
ATSD uses different date parsing and formatting libraries to cover all use cases. The following table clarifies which date formatting libraries are used by each ATSD subsystem.
ATSD Subsystem | Date Formatter |
---|---|
Data API | joda.time to parse, custom formatter to print. |
SQL | Apache Commons |
Rule Engine | joda.time |
Forecasts | SimpleDateFormat |
CSV Parser | SimpleDateFormat |
User Interface | SimpleDateFormat |
Each formatter supports a different set of patterns, so all of them must be documented and the differences between them highlighted.
New at the Zoo: ATSD DatetimeProcessor
The new date formatter was introduced to improve overall maintainability by reducing the number of supported libraries, date patterns, and documentation notes. After analyzing common use cases, the following API was created.
public interface DatetimeProcessor {
    long parseMillis(String datetime);
    long parseMillis(String datetime, @NotNull ZoneId zoneId);
    ZonedDateTime parse(String datetime);
    ZonedDateTime parse(String datetime, @NotNull ZoneId zoneId);
    String print(long timestamp);
    String print(long timestamp, @NotNull ZoneId zoneId);
}
DatetimeProcessor API use cases:
DatetimeProcessor fmt = DateTimeFormatterManager.createFormatter(pattern); // for multiple usage of custom format
String datetime = TimeUtils.formatDateTime(millis, pattern, zoneId); // for single usage of custom format
String datetime = TimeUtils.formatDateTime(millis, pattern);
long timestamp = TimeUtils.parseMillis(datetime, pattern);
long timestamp = TimeUtils.parseWithDefaultParser(datetime); // for tests or best-effort parsing
long timestamp = parseISO8601(String date);
String datetime = formatISO8601(long time);
String datetime = formatISO8601millis(long time);
String datetime = formatLocalNoTimezone(long time);
String datetime = formatLocalMillisNoTz(long time);
Supported Patterns
DatetimeProcessor supports Java 8 DateTimeFormatter patterns with several differences:
- No need to escape the T literal.
- The u pattern is translated to ccccc, day of week starting from Monday.
- The Z pattern is translated to XX, RFC822 offset, Z for zulu.
- The ZZ pattern is translated to XXX, ISO8601 offset, Z for zulu.
- The ZZZ pattern is translated to VV, Zone ID.
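The translation rules above can be illustrated with a small sketch. This is not the ATSD implementation: the class name PatternNormalizerSketch and the normalize method are hypothetical, and the code only assumes that the rules are applied to unquoted pattern letters before the pattern is handed to the standard DateTimeFormatter. One known limitation of the sketch: a bare T placed directly after a quoted literal would produce two adjacent quotes, which DateTimeFormatter reads as an escaped quote.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class PatternNormalizerSketch {

    /** Hypothetical helper: rewrites unquoted T, u, Z, ZZ and ZZZ; quoted text is copied as-is. */
    static String normalize(String pattern) {
        StringBuilder out = new StringBuilder(pattern.length() + 8);
        boolean quoted = false;
        int i = 0;
        while (i < pattern.length()) {
            char ch = pattern.charAt(i);
            if (ch == '\'') {
                // Toggle quoted mode; an escaped quote ('') toggles twice and nets out.
                quoted = !quoted;
                out.append(ch);
                i++;
                continue;
            }
            // Measure the run of identical characters starting at i.
            int end = i;
            while (end < pattern.length() && pattern.charAt(end) == ch) {
                end++;
            }
            String run = pattern.substring(i, end);
            if (quoted) {
                out.append(run);
            } else if (ch == 'T') {
                out.append('\'').append(run).append('\''); // escape the literal T
            } else if (ch == 'u') {
                out.append("ccccc");                       // day of week
            } else if (ch == 'Z' && run.length() <= 3) {
                out.append(run.length() == 1 ? "XX" : run.length() == 2 ? "XXX" : "VV");
            } else {
                out.append(run);
            }
            i = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ZonedDateTime time = ZonedDateTime.of(2017, 5, 1, 12, 0, 0, 0, ZoneId.of("Europe/Berlin"));
        String[] patterns = {"yyyy-MM-ddTHH:mm:ssZ", "yyyy-MM-dd HH:mm:ss ZZ", "HH:mm ZZZ"};
        for (String pattern : patterns) {
            String normalized = normalize(pattern);
            System.out.println(pattern + " -> " + normalized + " -> "
                    + DateTimeFormatter.ofPattern(normalized).format(time));
        }
    }
}
```

Without such a translation, DateTimeFormatter.ofPattern rejects an unquoted T with an IllegalArgumentException, because every ASCII letter is a reserved pattern letter.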
Implementation
The DatetimeProcessor interface is implemented by three classes.
DatetimeProcessorIso8601 and DatetimeProcessorLocal are highly optimized date processors for the ISO pattern yyyy-MM-ddTHH:mm:ss[.SSSSSSSSS][Z] and the local time pattern yyyy-MM-dd HH:mm:ss[.SSSSSSSSS][Z].
DatetimeProcessorCustom is a wrapper for java.time.format.DateTimeFormatter objects. Formatting strictly delegates to the default implementation. Parsing is performed semi-manually: after the resolving step, which validates, combines, and simplifies the various fields into more useful ones, the date object is constructed manually, with default values supplied for missing fields if needed.
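As a rough illustration of semi-manual parsing, the sketch below lets DateTimeFormatter resolve the input and then assembles a ZonedDateTime by hand, filling in a default time and zone when the pattern does not provide them. The class DefaultingParserSketch and its defaults (midnight, caller-supplied zone) are assumptions made for this example, not the actual DatetimeProcessorCustom code.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;

public class DefaultingParserSketch {

    /** Hypothetical helper: parses text and supplies defaults for a missing time or zone. */
    static ZonedDateTime parse(String text, DateTimeFormatter formatter, ZoneId defaultZone) {
        // Resolving step: the formatter validates and combines the parsed fields.
        TemporalAccessor resolved = formatter.parse(text);

        // These queries return null when the resolved fields cannot supply a value.
        LocalDate date = resolved.query(TemporalQueries.localDate());
        LocalTime time = resolved.query(TemporalQueries.localTime());
        ZoneId zone = resolved.query(TemporalQueries.zone());

        if (date == null) {
            throw new IllegalArgumentException("Pattern does not yield a complete date: " + text);
        }
        if (time == null) {
            time = LocalTime.MIDNIGHT; // assumed default for date-only patterns
        }
        if (zone == null) {
            zone = defaultZone;        // assumed default when no zone or offset is parsed
        }
        return ZonedDateTime.of(date, time, zone);
    }

    public static void main(String[] args) {
        DateTimeFormatter dateOnly = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        DateTimeFormatter withOffset = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm XX");
        System.out.println(parse("2017-05-01", dateOnly, ZoneOffset.UTC));
        System.out.println(parse("2017-05-01 10:15 +0300", withOffset, ZoneOffset.UTC));
    }
}
```

Calling ZonedDateTime.parse directly with a date-only pattern would fail, since neither the time nor the zone can be obtained from the input; building the object manually is what makes defaults possible.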
By design, DatetimeProcessor objects must not be constructed manually. Instead, use the DateTimeFormatterManager.createFormatter(pattern) factory method.
This method is responsible for a number of tasks:
- Attempt to acquire the DatetimeProcessor for the specified pattern from the cache.
- In case of a cache miss, normalize the pattern.
- Insert the most appropriate DatetimeProcessor implementation into the cache.
Caching
Caching date formatters is not an innovative idea: the libraries used previously took this approach under the hood.
- joda.time cached formatters in a ConcurrentHashMap limited to 5000 items.
- Apache Commons used an unlimited ConcurrentHashMap cache.
DatetimeProcessor objects are cached in the managed LRU cache dateTimeFormatters, limited by the cache.formatters.maximum.size property (defaults to 100), which is cleared on demand with the Settings > Cache Management form.
Advantages of the DatetimeProcessor caching method:
- Protection against cache pollution.
- The LRU replacement policy demonstrates higher throughput in the worst case, when many formatters are used.
- Size is controlled by the user.
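A size-bounded LRU cache with the lookup-then-insert flow described above can be sketched with the standard LinkedHashMap in access order. The class LruFormatterCacheSketch is hypothetical and caches plain DateTimeFormatter objects; the real dateTimeFormatters cache in ATSD is a managed cache whose implementation is not shown in this article.

```java
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruFormatterCacheSketch {
    private final Map<String, DateTimeFormatter> cache;

    LruFormatterCacheSketch(final int maxSize) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = Collections.synchronizedMap(
                new LinkedHashMap<String, DateTimeFormatter>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, DateTimeFormatter> eldest) {
                        // Evict the least recently used entry once the limit is exceeded.
                        return size() > maxSize;
                    }
                });
    }

    /** Returns the cached formatter, creating and inserting it on a cache miss. */
    DateTimeFormatter get(String pattern) {
        return cache.computeIfAbsent(pattern, DateTimeFormatter::ofPattern);
    }

    boolean contains(String pattern) {
        return cache.containsKey(pattern);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        LruFormatterCacheSketch formatters = new LruFormatterCacheSketch(100);
        formatters.get("yyyy-MM-dd HH:mm:ss");
        // Simulate pollution: many one-off patterns that are never reused.
        for (int i = 0; i < 500_000; i++) {
            formatters.get("'value: " + i + "' yyyy-MM-dd");
        }
        System.out.println("cache size after pollution: " + formatters.size());
    }
}
```

With an unlimited map, the loop in main would retain all 500000 formatters; with the bounded cache, memory stays constant and only the rarely used entries are discarded.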
Breaking Good
Here is an example of a cache pollution attack:
SELECT *, date_format(time, CONCAT('''time: ''', 'yyyy-MM-dd HH:mm:ss', ''', value: ', value, '''')) AS "time_and_value"
FROM "mpstat.cpu_busy"
LIMIT 500000
The above query only affects date formatting with a dynamic pattern, when DatetimeProcessor is returned by DateTimeFormatterManager. This does not affect the performance of date formatting in Data API or other subsystems.
A better query is shown here:
SELECT *, CONCAT('time: ', date_format(time, 'yyyy-MM-dd HH:mm:ss'), ', value: ', value) AS "time_and_value"
FROM "mpstat.cpu_busy"
LIMIT 500000
Performance
Some performance considerations:
- Use the JSR-310 ZoneOffset instead of TimeZone to parse zone offsets, which provides RFC822 offset support for free.
- Manipulate datetime units using OffsetDateTime instead of Calendar.
- Optimize the parseInt function by supporting only a limited set of characters.
- Implement the sizeInDigits function using a divide-and-conquer approach.
- Use JVM intrinsics if possible.
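Two of the items above, the restricted parseInt and the divide-and-conquer sizeInDigits, can be sketched as follows. The class DigitUtilsSketch and both method signatures are assumptions for illustration; the actual ATSD functions may differ. The parser accepts only ASCII digits in a substring (no sign, no whitespace, no overflow check, which is acceptable for short date fields), and the size function picks the digit count with a few nested comparisons instead of a loop or a string conversion.

```java
public class DigitUtilsSketch {

    /** Parses ASCII digits in value[from, to); rejects any other character. */
    static int parseInt(String value, int from, int to) {
        if (from >= to) {
            throw new NumberFormatException("Empty range");
        }
        int result = 0;
        for (int i = from; i < to; i++) {
            int digit = value.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("Not a digit at index " + i);
            }
            // No overflow check: callers are assumed to pass short date fields.
            result = result * 10 + digit;
        }
        return result;
    }

    /** Number of decimal digits in a non-negative int, found by halving the search range. */
    static int sizeInDigits(int number) {
        if (number < 0) {
            throw new IllegalArgumentException("Negative numbers are not supported");
        }
        if (number < 100_000) {
            if (number < 100) {
                return number < 10 ? 1 : 2;
            }
            if (number < 1_000) {
                return 3;
            }
            return number < 10_000 ? 4 : 5;
        }
        if (number < 10_000_000) {
            return number < 1_000_000 ? 6 : 7;
        }
        if (number < 100_000_000) {
            return 8;
        }
        return number < 1_000_000_000 ? 9 : 10;
    }

    public static void main(String[] args) {
        String datetime = "2017-05-01T12:30:45Z";
        int year = parseInt(datetime, 0, 4);
        int month = parseInt(datetime, 5, 7);
        System.out.println(year + " has " + sizeInDigits(year) + " digits, month = " + month);
    }
}
```

Parsing directly from index ranges also avoids allocating a substring for every date field.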
Slides
References
Aleksey Shipilev's Talk about Java Benchmarking
Tagir Valeev's Talk about JIT Optimizations
SimpleDateFormat Pattern Reference