Skip to content

AhmedTrb/Database

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple SQLite-Like Database in C

Overview

This project is a simple SQLite-inspired database implemented in C, designed to demonstrate core database concepts:

  • Persistent storage of rows in leaf nodes using a B+ tree.
  • Efficient search, insert, and traversal operations.
  • Basic command parsing for inserting and selecting rows.

The primary goal of this project is learning by doing: to deeply understand how databases internally manage data, optimize read/write operations, and scale efficiently.


Motivation

My journey into databases started with a simple video about indexing, which sparked my curiosity about how databases store data, manage queries, and optimize performance. Reading about database internals — pages on disk, I/O costs, and tree-based indexing

I wanted to go beyond theory and experience the challenges firsthand, so I decided to build a minimal database engine to better understand and visualize these concepts.

I detailed my learning about Databases basic data storage and why using B+trees in this Notion page Databases


Features

This simple database supports one table with the following schema:

  • id (integer)
  • username (string, max 32 characters)
  • email (string, max 255 characters)

The database supports the following commands:

  • insert <id> <username> <email>: Inserts a new row into the database.
  • select: Retrieves and displays all rows in the database.
  • select <id>: Searches for a row by its ID and displays it if found.
  • bulk_insert inserts rows from "sample_data.txt" file for testing.

Meta commands:

  • .exit: Exits the database program and frees allocated memory.
  • .btree: Displays the structure of the B+ tree.
  • .constants: Displays constants used in the database.
  • .help: Displays help information about available commands.
  • .cache: Displays LRU cache structure.

Building and Running

To build and run the database, follow these steps:

  1. Clone the repository:
    git clone https://github.com/AhmedTrb/Database.git
    cd Database
  2. Compile the code using gcc and run the executable:
    gcc -o db db.c
    ./db database.db

Database File Structure

The database uses a simple file-based storage mechanism to persist data using a B+Tree. The file structure is as follows:

+------------------- Database File -------------------+
|                                                     |
|  [PAGE 0] (Root Internal Node)                      |
|  ┌───────────────────────────────────────────────┐  |
|  │ Node Type: INTERNAL                           │  |
|  │ Is Root: true|
|  │ Num Keys: 1                                   │  |
|  │                                               │  |
|  │ Keys: [4]                                     │  |
|  │ Children: [1] [2]                             │  |
|  └───────────────────────────────────────────────┘  |
|                                                     |
|  [PAGE 1] (Leaf Node)                               |
|  ┌───────────────────────────────────────────────┐  |
|  │ Node Type: LEAF                               │  |
|  │ Num Cells: 3                                  │  |
|  │ Next Leaf: 2                                  │  |
|  │                                               │  |
|  │ Row #1 → id=1 username="ahmed" email="a@x.com"│  |
|  │ Row #2 → id=2 username="john"  email="j@x.com"│  |
|  │ Row #3 → id=3 username="ali"   email="al@x.com│  |
|  └───────────────────────────────────────────────┘  |
|                                                     |
|  [PAGE 2] (Leaf Node)                               |
|  ┌───────────────────────────────────────────────┐  |
|  │ Node Type: LEAF                               │  |
|  │ Num Cells: 2                                  │  |
|  │ Next Leaf: NULL                               │  |
|  │                                               │  |
|  │ Row #4 → id=4 username="lisa"  email="li@x.com│  |
|  │ Row #5 → id=5 username="omar"  email="o@x.com"│  |
|  └───────────────────────────────────────────────┘  |
|                                                     |
+-----------------------------------------------------+

LRU Cache

To optimize read operations, the database implements a simple LRU (Least Recently Used) cache. The cache stores recently accessed pages in memory, reducing the need for frequent disk I/O operations. The cache has a fixed capacity and uses a doubly linked list and a hash table to track the order of page usage. When a page is accessed, it is moved to the front of the list. If the cache is full and a new page needs to be added, the least recently used page (at the end of the list) is evicted.

+--------------------- LRU Cache ----------------------+
|                                                     |
|  Doubly Linked List (Order of Use):                 |
|                                                     |
|  HEAD (MRU)                                         |
|   ┌──────────┐    ┌──────────┐    ┌──────────┐      |
|   │ Page 12  │<-->| Page 8   │<-->| Page 5   │      |
|   └──────────┘    └──────────┘    └──────────┘      |
|                         ^                           |
|                         |                           |
|                      TAIL (LRU)                     |
|                                                     |
+-----------------------------------------------------+
|                                                     |
|  Hash Table (Direct Access by Page Number):         |
|                                                     |
|  Index: PageNum → Pointer                           |
|  -------------------------------------------------  |
|   [5]  → (ptr to node holding Page 5)               |
|   [8]  → (ptr to node holding Page 8)               |
|   [12] → (ptr to node holding Page 12)              |
|                                                     |
|  Empty slots are NULL for pages not cached.         |
+-----------------------------------------------------+

Acknowledgment

This project follows this excellent step-by-step tutorial DB Tutorial, which guided the initial implementation of:

  • B+ tree structure for table storage.
  • Row storage in leaf nodes.
  • Disk persistence through simple file-based storage.

I built upon this foundation by adding new utilities and experimenting with different aspects of the database to understand each component in depth. Things I added:

  • select id
  • LRU Cache to optimize read operations.
  • bulk_insert command to insert multiple rows from a file.
  • Debug and visualization utilities (.cache, .btree).

About

A simple SQLite-inspired database in C with B+ tree indexing

Topics

Resources

Stars

Watchers

Forks

Languages