ebpf-linux-immudb


What is eBPF?

eBPF has gained a lot of momentum in recent years. The reason is that it brings to the Linux kernel what JavaScript brought to the web browser and what Lua brought to game engines.

By turning the Linux kernel into a programmable system, eBPF allows networking, security and observability solutions to be implemented without changing kernel code and without adding complexity to these subsystems.

How does eBPF work?

eBPF works by embedding a virtual machine into the Linux kernel. Think of it like the JavaScript virtual machine in your browser: the browser exposes the DOM and other browser APIs to the JavaScript VM, and in the same way the kernel exposes its observability subsystems (kprobes, tracepoints, uprobes, etc.) to eBPF programs.

How to use eBPF?

There are different ways to interact with the eBPF subsystem. One of them is the BPF Compiler Collection (BCC). For example, if you compile the following code with BCC, it recognizes the kprobe__ prefix in the function name and attaches a kprobe to the kernel function tcp_v4_connect:

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
{
    [...]
}
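Because BCC decides what to attach purely from the function name, the convention is easy to model. The Go sketch below covers only the two prefixes relevant here, `kprobe__` and `kretprobe__`; `attachTarget` is a hypothetical helper written for illustration, not part of BCC or gobpf, which do this matching internally.

```go
package main

import (
	"fmt"
	"strings"
)

// attachTarget is a hypothetical helper modelling BCC's naming convention:
// a function called "kprobe__<fn>" is attached as a kprobe on the kernel
// function <fn>, and "kretprobe__<fn>" as a kretprobe. Any other name is
// left alone and would have to be attached explicitly.
func attachTarget(name string) (kind, kernelFunc string, ok bool) {
	for _, prefix := range []string{"kprobe__", "kretprobe__"} {
		if strings.HasPrefix(name, prefix) && len(name) > len(prefix) {
			return strings.TrimSuffix(prefix, "__"), strings.TrimPrefix(name, prefix), true
		}
	}
	return "", "", false
}

func main() {
	for _, name := range []string{"kprobe__tcp_v4_connect", "get_return_value"} {
		kind, fn, ok := attachTarget(name)
		fmt.Printf("%-24s -> %q %q %v\n", name, kind, fn, ok)
	}
}
```

Functions without such a prefix, like `get_return_value` in the full program further down, are loaded and attached explicitly instead.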

The bpftrace tool was created by Alastair Robertson. It lets you instrument the kernel with a very simple language inspired by C and awk, and it uses eBPF under the hood. For example, you can save the filename passed to statfs() in a map, keyed by thread id, by doing:

tracepoint:syscalls:sys_enter_statfs
{
        @filename[tid] = args->pathname;
}

and then display it once the function exits:

tracepoint:syscalls:sys_exit_statfs
/@filename[tid]/
{
        $ret = args->ret;
        $errno = $ret >= 0 ? 0 : - $ret;

        printf("%-6d %-16s %3d %s\n", pid, comm, $errno,
            str(@filename[tid]));
        delete(@filename[tid]);
}

Brendan Gregg has written many utilities that ship as examples with bpftrace.

Snooping shell commands

Capturing all bash commands is just a matter of attaching a uretprobe to the readline function, which bash calls to read each command line.

BEGIN
{
        printf("Tracing bash commands... Hit Ctrl-C to end.\n");
        printf("%-9s %-6s %s\n", "TIME", "PID", "COMMAND");
}

uretprobe:/usr/lib64/libreadline.so:readline
{
        time("%H:%M:%S  ");
        printf("%-6d %s\n", pid, str(retval));
}

If the bash in your distribution is statically linked against readline, replace /usr/lib64/libreadline.so with /bin/bash.

$ sudo bpftrace bashreadline.bt

You will then see each command as soon as it is entered.

Creating a tamper-proof audit log of typed Linux commands

Sending data from eBPF to immudb

immudb is a database developed by CodeNotary that stores data tamper-proof and with full history. Clients verify the data cryptographically, so you can make sure nobody has changed the history since the last time you connected. immudb can run as a standalone database service or be embedded into your application.

We will create a Go program that loads an eBPF program to capture shell commands and sends them to immudb to be stored tamper-proof. If somebody tries to remove entries from the history of typed commands, other clients will notice, because the state they receive no longer validates against the previous cryptographic state they stored.
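The tamper-evidence idea can be illustrated with a toy hash chain: a client that remembers a digest of the history it has already seen can detect when that history is later rewritten, while entries appended afterwards remain acceptable. immudb's real proofs rely on Merkle-tree inclusion and consistency proofs and are more involved; `appendHash`, `digest` and `consistent` are hypothetical names for this sketch only, not immudb's algorithm.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// appendHash folds one entry into the running digest: h' = SHA256(h || entry).
// Changing any earlier entry changes every digest computed after it.
func appendHash(prev [32]byte, entry []byte) [32]byte {
	return sha256.Sum256(append(prev[:], entry...))
}

// digest computes the digest of a whole history, starting from all zeros.
func digest(history [][]byte) [32]byte {
	var h [32]byte
	for _, e := range history {
		h = appendHash(h, e)
	}
	return h
}

// consistent reports whether the first n entries of history still produce
// the digest the client saved when it last saw n entries.
func consistent(saved [32]byte, n int, history [][]byte) bool {
	if n > len(history) {
		return false
	}
	return digest(history[:n]) == saved
}

func main() {
	history := [][]byte{[]byte("ls -la"), []byte("cat /etc/fstab")}
	saved := digest(history) // state remembered by the client

	history = append(history, []byte("immuclient")) // new command appended
	fmt.Println("after append:", consistent(saved, 2, history))

	history[1] = []byte("true") // an old command is rewritten
	fmt.Println("after rewrite:", consistent(saved, 2, history))
}
```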


To learn interactively and get started with immudb from the command line and programming languages, visit the immudb Playground.



We could use bpftrace and immuclient to create a very simple script without doing any programming.

First we set the credentials:

$ export IMMUCLIENT_USERNAME=immudb IMMUCLIENT_PASSWORD=immudb IMMUCLIENT_DATABASE=defaultdb

We will write a uretprobe that triggers every time the readline function used by bash returns. The hook then uses the system() built-in to call immuclient and insert the uid, pid, timestamp and command into immudb.

Warning: because system() can run arbitrary commands, the bpftrace system() built-in must be enabled with --unsafe. Don't use system() in a real scenario: a typed command containing a quote character would let any user inject commands that run as root.

$ bpftrace --unsafe -e 'uretprobe:/usr/lib64/libreadline.so:readline { system("bin/immuclient set \"bash:%d-%d-%d\" \"user %d: %s\"\n", pid, nsecs, rand, uid, str(retval)); }'

Once the probe starts capturing commands, you will see the output of immuclient as it inserts values:

Attaching 1 probe...
tx:             1
key:            bash:27372-784451418-1577705011
value:          user 1000: cat /etc/fstab
hash:           f0294266d6631d6970b27359c3a4f427e2872548961f4a33acdb57ad04a89fef

tx:             2
key:            bash:27130--869677815-769938295
value:          user 1000: git pull --rebase
hash:           dea8733107c200ee0e88a1636f3781648f9512f39d88d175a14948f95d2c42c1
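The quoting problem from the warning comes from assembling a shell command line out of untrusted text. A program can instead hand each argument to immuclient as a separate argv element, with no shell in between, so quotes and `;` are stored as plain data. A minimal Go sketch, assuming `immuclient` is on the PATH and picks up the `IMMUCLIENT_*` variables exported above (`immuclientSet` is a hypothetical helper; it only builds the command):

```go
package main

import (
	"fmt"
	"os/exec"
)

// immuclientSet builds, but does not run, an "immuclient set <key> <value>"
// invocation. Each argument reaches the program as its own argv element,
// so quotes or ";" inside a typed command are never interpreted by a shell.
func immuclientSet(key, value string) *exec.Cmd {
	return exec.Command("immuclient", "set", key, value)
}

func main() {
	cmd := immuclientSet("bash:27372-784451418-1577705011", `user 1000: echo "hi"; id`)
	fmt.Printf("%q\n", cmd.Args)
	// cmd.Run() would execute immuclient with exactly these four arguments.
}
```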

Doing this from a programming language lets you get structured data and avoids the problems of the script version. To do this from Go, you can start with this example.

We define a struct that can be serialized to JSON, which we will use to store the event into immudb:

type Entry struct {
    Pid     uint32 `json:"Pid"`
    Command string `json:"Command"`
}

You need to set up an immudb client just before starting to read the events:

c, err := client.NewImmuClient(client.DefaultOptions())
if err != nil {
    log.Fatal(err)
}

ctx := context.Background()
// login with default username and password and storing a token
lr, err := c.Login(ctx, []byte(`immudb`), []byte(`immudb`))
if err != nil {
    log.Fatal(err)
}
// set up an authenticated context that will be required in future operations
md := metadata.Pairs("authorization", lr.Token)
ctx = metadata.NewOutgoingContext(context.Background(), md)

log.Printf("Connected to immudb")

And then on every event, we would insert it:

entry := Entry{Pid: event.Pid, Command: comm}
payload, err := json.Marshal(entry)
if err != nil {
    log.Fatal(err)
}

key := fmt.Sprintf("bash:%d:%s", time.Now().UnixNano(), shortuuid.New())
_, err = c.Set(ctx, []byte(key), payload)
if err != nil {
    log.Fatal(err)
}

Once built, run it, and it should start capturing commands into immudb. We can verify that by doing a scan on the "bash" prefix:

$ immuclient
immuclient>login immudb
Password:
Successfully logged in.
immudb user has the default password: please change it to ensure proper security
immuclient>scan bash
tx:             1
key:            bash:1617959978373826959:RL33FeAh8K2mjrreQ6nAQ9
value:          {"Pid":6793,"Command":"ls -la"}
hash:           7de2e26104283979302dfc790b15de0c1dd53f6275a580460e57aea612f72b1d

tx:             2
key:            bash:1617959983177004413:NDSGRabvtLaEAVH7MBYPmB
value:          {"Pid":6793,"Command":"cat /etc/fstab"}
hash:           2a0c9cac5fc79702e3e15516be24e084120db42fab679853792a4552eed3b254

tx:             3
key:            bash:1617959992408965502:vj9EcBDTNXyAx5dXGqFc5Q
value:          {"Pid":6793,"Command":"immuclient"}
hash:           1cdab4955185414ff5540fdf6d6c20fa60f83dc55c6f1da67a1bd8581c0957d3

In a real scenario, you would additionally:

  • Handle authentication properly
  • Use VerifiedSet and handle the case where the server has been tampered with

The full program is listed here and is also available in this GitHub repository.

// Copyright 2017 Louis McCormack
// Adapted by Duncan Mac-Vicar
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
    "bytes"
    "context"
    "encoding/binary"
    "encoding/json"
    "fmt"
    "github.com/codenotary/immudb/pkg/client"
    bpf "github.com/iovisor/gobpf/bcc"
    "github.com/renstrom/shortuuid"
    "google.golang.org/grpc/metadata"
    "log"
    "os"
    "os/signal"
    "time"
)

const source string = `
#include <uapi/linux/ptrace.h>

struct readline_event_t {
        u32 pid;
        char str[80];
} __attribute__((packed));

BPF_PERF_OUTPUT(readline_events);

int get_return_value(struct pt_regs *ctx) {
        struct readline_event_t event = {};
        u32 pid;
        if (!PT_REGS_RC(ctx))
                return 0;
        pid = bpf_get_current_pid_tgid();
        event.pid = pid;
        bpf_probe_read(&event.str, sizeof(event.str), (void *)PT_REGS_RC(ctx));
        readline_events.perf_submit(ctx, &event, sizeof(event));

        return 0;
}
`

type readlineEvent struct {
    Pid uint32
    Str [80]byte
}

type Entry struct {
    Pid     uint32 `json:"Pid"`
    Command string `json:"Command"`
}

func main() {
    m := bpf.NewModule(source, []string{})
    defer m.Close()

    readlineUretprobe, err := m.LoadUprobe("get_return_value")
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to load get_return_value: %s\n", err)
        os.Exit(1)
    }

    err = m.AttachUretprobe("/usr/lib64/libreadline.so", "readline", readlineUretprobe, -1)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to attach return_value: %s\n", err)
        os.Exit(1)
    }

    table := bpf.NewTable(m.TableId("readline_events"), m)

    channel := make(chan []byte)

    perfMap, err := bpf.InitPerfMap(table, channel, nil)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to init perf map: %s\n", err)
        os.Exit(1)
    }

    sig := make(chan os.Signal, 1)
    // os.Kill (SIGKILL) cannot be caught, so only Ctrl-C (os.Interrupt) is handled
    signal.Notify(sig, os.Interrupt)

    go func() {
        c, err := client.NewImmuClient(client.DefaultOptions())
        if err != nil {
            log.Fatal(err)
        }

        ctx := context.Background()
        // login with default username and password and storing a token
        lr, err := c.Login(ctx, []byte(`immudb`), []byte(`immudb`))
        if err != nil {
            log.Fatal(err)
        }
        // set up an authenticated context that will be required in future operations
        md := metadata.Pairs("authorization", lr.Token)
        ctx = metadata.NewOutgoingContext(context.Background(), md)

        log.Printf("Connected to immudb")

        var event readlineEvent
        for {
            data := <-channel
            err := binary.Read(bytes.NewBuffer(data), binary.LittleEndian, &event)
            if err != nil {
                fmt.Printf("failed to decode received data: %s\n", err)
                continue
            }
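            // Convert the C string (null-terminated) to a Go string; a command of
            // 80 bytes or more has no terminator, so fall back to the whole buffer
            n := bytes.IndexByte(event.Str[:], 0)
            if n < 0 {
                n = len(event.Str)
            }
            comm := string(event.Str[:n])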

            entry := Entry{Pid: event.Pid, Command: comm}
            payload, err := json.Marshal(entry)
            if err != nil {
                log.Fatal(err)
            }

            key := fmt.Sprintf("bash:%d:%s", time.Now().UnixNano(), shortuuid.New())
            _, err = c.Set(ctx, []byte(key), payload)
            if err != nil {
                log.Fatal(err)
            }
        }
    }()

    perfMap.Start()
    <-sig
    perfMap.Stop()
}
