immudb index performance deep dive

Indexing is the first thing that comes to mind when we consider a database’s performance. A B-tree is a self-balancing m-way search tree. Because they stay balanced, B-trees are frequently used to manage and organise very large datasets and to speed up searches, especially range queries. A B-tree index speeds up data access because the storage engine doesn’t have to scan the whole table to find the desired data.

A B-tree:

  • keeps keys in sorted order for sequential traversing
  • uses a hierarchical index to minimize the number of disk reads
  • uses partially full blocks to speed up insertions and deletions
  • keeps the index balanced with a recursive algorithm

immudb stores its index in a slightly modified version of a typical B-tree, called a timed B-tree (TBtree). The TBtree indexes the records in a database for lookups by full key, key range, or key prefix. Like any B-tree index, it is useful only if the lookup uses a leftmost prefix of the indexed key. The index is useful for the following kinds of queries:

  • Match the full key
  • Match a leftmost prefix
  • Match a range of values
  • Match one part exactly and match a range on another part
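The four lookup kinds above can be sketched over a plain sorted list, which stands in for the sorted leaves of a B-tree. This is only an illustration: the composite keys (customer, order date) and the helper names are hypothetical, and immudb's TBtree is not implemented this way.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite keys (customer, order_date), kept sorted like B-tree leaves.
keys = sorted([
    ("alice", "2023-01-05"),
    ("alice", "2023-02-11"),
    ("alice", "2023-03-20"),
    ("bob", "2023-01-17"),
    ("carol", "2023-02-02"),
])

def full_match(key):
    """Match the full key: binary search, then compare."""
    i = bisect_left(keys, key)
    return i < len(keys) and keys[i] == key

def prefix_match(customer):
    """Match a leftmost prefix: every key whose first part equals `customer`."""
    lo = bisect_left(keys, (customer,))
    # customer + "\x00" is the smallest string greater than `customer`.
    hi = bisect_left(keys, (customer + "\x00",))
    return keys[lo:hi]

def range_match(low, high):
    """Match a range of values: all keys with low <= key <= high."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]

def exact_then_range(customer, start, end):
    """Match the first part exactly and a range on the second part."""
    lo = bisect_left(keys, (customer, start))
    hi = bisect_right(keys, (customer, end))
    return keys[lo:hi]
```

Every helper narrows the search with a binary search on the leading part of the key, which is why a lookup that skips the leftmost part (for example, "all orders on 2023-02-11 regardless of customer") cannot use the sorted order and would have to scan everything.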

Because the keys in the tree are sorted, the index helps with both lookups and ORDER BY queries (returning keys in sorted order). In general, if a B-tree can help find a row in a particular way, it can help you sort rows by the same criteria. So the index is useful for ORDER BY clauses that match any of the lookup types we just listed. Why does this matter, you may ask? It is what makes SQL support possible.

SQL support

Our SQL support layer is built on top of the TBtree too. The SQL schema for a table describes its columns and their data types:

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
   ....
);

immudb stores the schema and records of a table like any other key-value pair in its append-only log storage. To tell SQL entries apart, a prefix is prepended to the key that identifies the schema (or catalog, in immudb terminology).
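As a rough sketch of the idea, SQL rows can share one key space with ordinary entries if their keys carry a distinguishing prefix. The prefix bytes, the `row_key` layout, and the table ids below are assumptions made for illustration, not immudb's actual key encoding.

```python
import struct

SQL_PREFIX = b"\x02"  # hypothetical marker for SQL entries
KV_PREFIX = b"\x00"   # hypothetical marker for plain key-value entries

def table_prefix(table_id: int) -> bytes:
    """Prefix shared by every row of one table."""
    return SQL_PREFIX + struct.pack(">I", table_id)

def row_key(table_id: int, pk: int) -> bytes:
    """Key for one row: table prefix followed by the primary key.

    Big-endian fixed-width integers make byte order equal numeric order,
    so a sorted index returns rows in primary-key order.
    """
    return table_prefix(table_id) + struct.pack(">Q", pk)

# One flat key space holding plain entries and SQL rows side by side.
store = {
    KV_PREFIX + b"user:42": b"plain value",
    row_key(1, 10): b"row 10",
    row_key(1, 2): b"row 2",
    row_key(2, 1): b"other table",
}

def scan_table(table_id: int) -> list:
    """Prefix scan: all rows of a table, in primary-key order."""
    prefix = table_prefix(table_id)
    return [store[k] for k in sorted(store) if k.startswith(prefix)]
```

With this layout a table scan is just a prefix lookup on the index, and rows from other tables or plain key-value entries never appear in the result.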

Because the TBtree supports lookups by full key, key range, and key prefix, immudb can build its SQL support on top of it. The revisions (history) of each key are also stored in the TBtree, which makes time travel for any key fast: it only requires querying the index.
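One way to picture an index that keeps revisions is a map from each key to its list of versions, ordered by the transaction that wrote them. The `TimedIndex` class below is a hypothetical, in-memory sketch of that idea and does not reflect the TBtree's real node layout.

```python
from bisect import bisect_right

class TimedIndex:
    """Toy index that keeps every revision of a key (hypothetical sketch)."""

    def __init__(self):
        # key -> list of (tx_id, value_offset), in ascending tx_id order
        self._history = {}

    def put(self, key, tx_id, value_offset):
        revisions = self._history.setdefault(key, [])
        if revisions and tx_id <= revisions[-1][0]:
            raise ValueError("tx_id must be strictly increasing per key")
        revisions.append((tx_id, value_offset))

    def get(self, key, at_tx=None):
        """Return the value offset visible at `at_tx` (latest if None)."""
        revisions = self._history.get(key, [])
        if not revisions:
            return None
        if at_tx is None:
            return revisions[-1][1]
        # Find the last revision whose tx_id is <= at_tx.
        i = bisect_right(revisions, (at_tx, float("inf")))
        return revisions[i - 1][1] if i else None

    def history(self, key):
        """All revisions of a key, oldest first."""
        return list(self._history.get(key, []))

idx = TimedIndex()
idx.put("balance", tx_id=1, value_offset=0)
idx.put("balance", tx_id=5, value_offset=100)
idx.put("balance", tx_id=9, value_offset=250)
```

Reading the key "as of" transaction 7 is a binary search over three revisions and never touches the stored values.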

Note that the values are not stored alongside the key in the TBtree. Instead, for faster lookups, the index stores against each key the offset of its value in the value-log (explained below).
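The split between index and value-log can be sketched as follows. `ValueLog` is a hypothetical in-memory stand-in for an append-only file, and storing a length next to the offset is an assumption added so that the sketch can read the value back.

```python
import io

class ValueLog:
    """Append-only byte log; a hypothetical stand-in for a file on disk."""

    def __init__(self):
        self._buf = io.BytesIO()

    def append(self, value: bytes):
        """Write the value at the end and return where it landed."""
        offset = self._buf.seek(0, io.SEEK_END)
        self._buf.write(value)
        return offset, len(value)

    def read(self, offset: int, length: int) -> bytes:
        self._buf.seek(offset)
        return self._buf.read(length)

log = ValueLog()
index = {}  # key -> (offset, length); the index never holds the value itself

index[b"k1"] = log.append(b"hello")
index[b"k2"] = log.append(b"immutable world")

def get(key: bytes) -> bytes:
    """Look up the small index entry, then do one read from the log."""
    offset, length = index[key]
    return log.read(offset, length)
```

Index entries stay the same small size no matter how large the values are, so more keys fit in each tree node.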

Transaction support

Each new transaction in immudb currently creates a snapshot to allow concurrent operations on the database. I’ll write more about how MVCC support works in immudb in an upcoming blog post.
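Until then, the general idea of snapshot-based transactions can be sketched in a few lines. This is a deliberately naive illustration with hypothetical `Store` and `Tx` classes: it copies the whole state for every snapshot and does no conflict detection, and it is not a description of immudb's MVCC implementation.

```python
class Store:
    """Toy store whose transactions each read from their own snapshot."""

    def __init__(self):
        self._committed = {}

    def begin(self):
        # Naive snapshot: a full copy of the committed state.
        return Tx(self, dict(self._committed))

class Tx:
    def __init__(self, store, snapshot):
        self._store = store
        self._snapshot = snapshot
        self._writes = {}

    def get(self, key):
        # A transaction sees its own writes first, then its snapshot.
        if key in self._writes:
            return self._writes[key]
        return self._snapshot.get(key)

    def set(self, key, value):
        self._writes[key] = value

    def commit(self):
        # No conflict detection in this sketch: last committer wins.
        self._store._committed.update(self._writes)

store = Store()
setup = store.begin()
setup.set("a", 1)
setup.commit()

writer = store.begin()
reader = store.begin()
writer.set("a", 2)
writer.commit()
```

Here `reader` started before `writer` committed, so it keeps seeing the old value, while any transaction started afterwards sees the new one.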

The next post in the series covers data persistence in immudb.

Use Case - Tamper-resistant Clinical Trials

Goal:

Blockchain PoCs were unsuccessful due to complexity and lack of developers.

Still, the goal of data immutability and client verification is crucial. Furthermore, the system needs to be easy to use and operate (allowing backups, maintenance windows, and so on).

Implementation:

immudb runs in different datacenters across the globe. All clinical trial information is stored in immudb, either as transactions or as whole PDF documents.

Having that single source of truth with versioned, timestamped, and cryptographically verifiable records enables a whole new level of transparency and trust.

Use Case - Finance

Goal:

Store the source data, the decision, and the rulebase for financial support from governments in a timestamped and verifiable way.

A very important capability is comparing a historic decision (based on the rulebase at that time) with the rulebase at a different date. Fully cryptographically verifiable time travel queries are required to make that comparison.

Implementation:

While the source data, the rulebase, and the documented decision are stored as verifiable blobs in immudb, the transaction is stored using the relational layer of immudb.

That allows the use of immudb’s time travel capabilities to retrieve verified historic data and recalculate with the most recent rulebase.

Use Case - eCommerce and NFT marketplace

Goal:

Whether it’s an eCommerce platform or an NFT marketplace, the goals are similar:

  • High volume of transactions (potentially millions per second)
  • Ability to read and write multiple records within one transaction
  • Prevent overwrites of or updates to transactions
  • Comply with regulations (PCI, GDPR, …)


Implementation:

immudb is typically scaled out on hyperscalers (e.g. AWS, Google Cloud, Microsoft Azure) and distributed across the globe. Auditors are also distributed to track the verification proof over time. Additionally, the shop or marketplace applications store immudb cryptographic state information. That high level of integrity and tamper-evidence, combined with a very high transaction speed, is key for companies that choose immudb.

Use Case - IoT Sensor Data

Goal:

IoT sensor data received by devices collecting environment data needs to be stored locally in a cryptographically verifiable manner until the data is transferred to a central datacenter. The data integrity needs to be verifiable at any given point in time and while in transit.

Implementation:

immudb runs embedded on the IoT device itself and is consistently audited by external probes. The amount of data transferred for auditing is minimal, so auditing works even with very little bandwidth and unreliable connections.

Whenever an IoT device has a high-bandwidth connection, the data is transferred to a data center (a large immudb deployment), and the integrity of the data at both source and destination is fully verified.

Use Case - DevOps Evidence

Goal:

CI/CD and application build logs need to be stored in an auditable and tamper-evident way.
Very high performance is required, as the system should not slow down any build process.
Scalability is key, as billions of artifacts are expected within the next few years.
In addition to integrity validation, data needs to be retrievable by pipeline job id or digital asset checksum.

Implementation:

As part of the CI/CD audit functionality, data is stored within immudb using the key-value functionality. The key is either the CI/CD job id (e.g. from Jenkins or GitLab) or the checksum of the resulting build or container image.
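The keying scheme can be sketched like this. The in-memory `evidence` dict merely stands in for a key-value store such as immudb, and the `job:` and `artifact:` key prefixes, the JSON layout, and the `record_build` helper are all hypothetical.

```python
import hashlib
import json

evidence = {}  # stand-in for the key-value store; not an immudb client

def checksum(artifact: bytes) -> str:
    """SHA-256 digest of a build artifact, used as a lookup key."""
    return "sha256:" + hashlib.sha256(artifact).hexdigest()

def record_build(job_id: str, artifact: bytes, log_text: str) -> str:
    """Store one evidence entry under both the job id and the checksum."""
    digest = checksum(artifact)
    entry = json.dumps(
        {"job_id": job_id, "artifact": digest, "log": log_text},
        sort_keys=True,
    )
    evidence["job:" + job_id] = entry
    evidence["artifact:" + digest] = entry
    return digest

digest = record_build("jenkins-1234", b"container image bytes", "build ok")
by_job = json.loads(evidence["job:jenkins-1234"])
by_artifact = json.loads(evidence["artifact:" + digest])
```

Writing the entry under both keys lets an auditor start from either the pipeline run or the artifact found in production and reach the same record.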
