In-memory cache with SQL-query capability

Darr Mirr
6 min readOct 19, 2021

Introduction

At this article we are going to take a look at TweeCache Java library.
TweeCache provides high-performance caching library for Java and capability to execute SQL-query against data stored in cache.

Get Stated

It is required to execute two precondition steps before to start adding dependency to project by your favorite build tool.

Preconditions

  1. Download improved version of Apache Calcite jar and pom files from project repository
  2. Download TweeCache jar and pom files from project release page

Maven

Follow to guide to installing 3rd party JARs in order to put Apache Calcite and TweeCache to maven local repository.

The last step is to add dependency to your project:

Maven dependency declaration

Gradle

Gradle has several ways to add dependency to project:

From local maven repository

Gradle supports maven local repository. Follow to guide to installing 3rd party JARs in order to put Apache Calcite and TweeCache to maven local repository.

The last step is to add dependency to your project:

Gradle dependency declaration (mavenLocal repository)

From local directory on your computer

Download jar file with dependencies (-all suffix at file name) from project release page and put it at some directory, for example, at ${project.root.dir}/libs.

The last step is to add dependency to your project:

Gradle dependency declaration (flatDir repository)

Create Cache Schema

Usually, Cache represents a key/value storage. Methods to opetate over data in cache are similar to java.util.Map ones. But SQL is tightly coupled with relation database managment system (RDBMS) that requires fixed table schema. Therefore, TweeCache internally creates schema where table storage is cache object and table columns are fields defined at cache value class.

TweeCache schema structure

TweeCache has special builder class dedicated to help construct TweeCache instance. Let’s look how to create TweeCeache object:

TweeCache object creation

Cache name must be unique due to TweeCache compiles at runtime schema class with provided cache name. Method withSchema represents builder to configure cache schema. You should add at least one table with table storage to new schema. Another words TweeCache could store multiple cache instances with different or the same data structures.

TweeCache internally uses Caffeine cache. You are free to configure Caffeine cache instance according to your needs. Please, see official Caffeine documentation.

You should invoke build method after cache schema is configured. It returts java.util.Optional<TweeCache> due to schema creation is complicated process. TweeCache intentionaly does not throw any exception. It logs any exceptions at error level and returns Optional.EMPTY in such case.

Populating Cache Scheme

TweeCache is empty just it has been created. So it is time to fill cache data. The simplest way to do that is use method put :

Populate TweeCache using by put method

TweeCache provides the same operations over cache instance as Caffeine one. The main difference is TweeCache contains multiple cache instances. Therefore, you have to provide information about table name. You have two options to do that:

  • implicitly using value’s class (as shown at code snippet above)
  • explicitly providing table name (as shown at code snippet below)
Populate TweeCache using by put method (explicit provide table name)

Point to notice
TweeCache schema does not store cache key. Therefore, it is recommend to store key as value’s field.

Retrieving data from Cache

TweeCache provides several options to retrieve data from Cache:

  • get value by id
  • get all values
  • get value(s) by SQL-query

Here is code example how to getById and getAllvalues:

Get value by its id and get all values at table

Getting value by its id and get all values are the same operations over cache instance as Caffeine one. The main difference is TweeCache contains multiple cache instances. Therefore, you have to provide information about table name. It is the similar toput and putAll methods.

Get data by SQL-query

TweeCache provides capability to execute SQL-query againts data stored in cache. You can query data at one cache instance or multiple cache instances using join sql statement.

Execute SQL-query against data at cache schema

Method query has two mandatory arguments:

  • SQL-query string
  • java.util.function.Function row mapper

And one optional:

  • java.util.Map SQL-parameters

TweeCache support named SQL-parameters.

Point to notice
It is more efficient to get value by its id using get method rather than using query one.

TweeCache internally uses Apache Calcite to execute SQL-query against data at cache schema. Apache Calcite validate SQL-query, build query plan and execute query against supplied data.

SQL-query validator

Apache Calcite has own SQL-query validator. There are some tips to write SQL queries:

  • always use cache schema name before table name
  • always use aliases for table name
  • always use aliases before table column name

SQL functions

TweeCache provides capability to define custom non-aggregate SQL functions. It can be useful for some cases. Let’s see one of them.

Let’s assume it is required to split comma separated string to array:

User-defined `string_to_array_int` function usage example

How to add SQL function? Here is steps to do that:

  1. Create public Java class
  2. Create public static method
  3. Make function implementation
  4. Add function to cache schema
Adding custom SQL function to cache schema

Performance

Parsing SQL-query, performing validation and building query plan requires time to execute. Therefore, all precondition items of executed SQL-queries are stored at internal cache. Cache item’s lifetime is equal to 15 minutes after last access. Therefore, first SQL-query execution requires more time than next one.

See Java microbenchmark harness log output:

# Warmup Iteration   1: 2,712 ms/op
# Warmup Iteration 2: 0,234 ms/op
# Warmup Iteration 3: 0,140 ms/op
# Warmup Iteration 4: 0,137 ms/op
# Warmup Iteration 5: 0,136 ms/op
Iteration 1: 0,137 ms/op
Iteration 2: 0,137 ms/op
Iteration 3: 0,138 ms/op
Iteration 4: 0,136 ms/op
Iteration 5: 0,136 ms/op

And simple benchmark results:

Benchmark                  Mode  Cnt  Score   Error  Units
BenchmarkTest.selectQuery avgt 25 0,144 ± 0,007 ms/op

First query (non-cached) is executed for 2 712 ms. But all next query execution takes average for 0,144 ms. Execution time depends on a lot of options but non-cached query is slower than cached one. Testing is performed on AMD A8–6410 APU.

Conclusion

In this article, we take a look at TweeCache Java library. It provides high-performance caching library for Java and capability to execute SQL-query against data stored in cache.

Library repository is available on Github.com.
See more usage examples at com.github.darrmirr.tweecache.TweeCacheTest class.

--

--