This project is a simple SQLite-inspired database implemented in C, designed to demonstrate core database concepts:
- Persistent storage of rows in leaf nodes using a B+ tree.
- Efficient search, insert, and traversal operations.
- Basic command parsing for inserting and selecting rows.
The primary goal of this project is learning by doing: to deeply understand how databases internally manage data, optimize read/write operations, and scale efficiently.
My journey into databases started with a simple video about indexing, which sparked my curiosity about how databases store data, manage queries, and optimize performance. Reading about database internals — pages on disk, I/O costs, and tree-based indexing
I wanted to go beyond theory and experience the challenges firsthand, so I decided to build a minimal database engine to better understand and visualize these concepts.
I detailed my learning about Databases basic data storage and why using B+trees in this Notion page Databases
This simple database supports one table with the following schema:
id(integer)username(string, max 32 characters)email(string, max 255 characters)
The database supports the following commands:
insert <id> <username> <email>: Inserts a new row into the database.select: Retrieves and displays all rows in the database.select <id>: Searches for a row by its ID and displays it if found.bulk_insertinserts rows from "sample_data.txt" file for testing.
Meta commands:
.exit: Exits the database program and frees allocated memory..btree: Displays the structure of the B+ tree..constants: Displays constants used in the database..help: Displays help information about available commands..cache: Displays LRU cache structure.
To build and run the database, follow these steps:
- Clone the repository:
git clone https://github.com/AhmedTrb/Database.git cd Database - Compile the code using
gccand run the executable:gcc -o db db.c ./db database.db
The database uses a simple file-based storage mechanism to persist data using a B+Tree. The file structure is as follows:
+------------------- Database File -------------------+
| |
| [PAGE 0] (Root Internal Node) |
| ┌───────────────────────────────────────────────┐ |
| │ Node Type: INTERNAL │ |
| │ Is Root: true │ |
| │ Num Keys: 1 │ |
| │ │ |
| │ Keys: [4] │ |
| │ Children: [1] [2] │ |
| └───────────────────────────────────────────────┘ |
| |
| [PAGE 1] (Leaf Node) |
| ┌───────────────────────────────────────────────┐ |
| │ Node Type: LEAF │ |
| │ Num Cells: 3 │ |
| │ Next Leaf: 2 │ |
| │ │ |
| │ Row #1 → id=1 username="ahmed" email="a@x.com"│ |
| │ Row #2 → id=2 username="john" email="j@x.com"│ |
| │ Row #3 → id=3 username="ali" email="al@x.com│ |
| └───────────────────────────────────────────────┘ |
| |
| [PAGE 2] (Leaf Node) |
| ┌───────────────────────────────────────────────┐ |
| │ Node Type: LEAF │ |
| │ Num Cells: 2 │ |
| │ Next Leaf: NULL │ |
| │ │ |
| │ Row #4 → id=4 username="lisa" email="li@x.com│ |
| │ Row #5 → id=5 username="omar" email="o@x.com"│ |
| └───────────────────────────────────────────────┘ |
| |
+-----------------------------------------------------+
To optimize read operations, the database implements a simple LRU (Least Recently Used) cache. The cache stores recently accessed pages in memory, reducing the need for frequent disk I/O operations. The cache has a fixed capacity and uses a doubly linked list and a hash table to track the order of page usage. When a page is accessed, it is moved to the front of the list. If the cache is full and a new page needs to be added, the least recently used page (at the end of the list) is evicted.
+--------------------- LRU Cache ----------------------+
| |
| Doubly Linked List (Order of Use): |
| |
| HEAD (MRU) |
| ┌──────────┐ ┌──────────┐ ┌──────────┐ |
| │ Page 12 │<-->| Page 8 │<-->| Page 5 │ |
| └──────────┘ └──────────┘ └──────────┘ |
| ^ |
| | |
| TAIL (LRU) |
| |
+-----------------------------------------------------+
| |
| Hash Table (Direct Access by Page Number): |
| |
| Index: PageNum → Pointer |
| ------------------------------------------------- |
| [5] → (ptr to node holding Page 5) |
| [8] → (ptr to node holding Page 8) |
| [12] → (ptr to node holding Page 12) |
| |
| Empty slots are NULL for pages not cached. |
+-----------------------------------------------------+This project follows this excellent step-by-step tutorial DB Tutorial, which guided the initial implementation of:
- B+ tree structure for table storage.
- Row storage in leaf nodes.
- Disk persistence through simple file-based storage.
I built upon this foundation by adding new utilities and experimenting with different aspects of the database to understand each component in depth. Things I added:
- select id
- LRU Cache to optimize read operations.
- bulk_insert command to insert multiple rows from a file.
- Debug and visualization utilities (.cache, .btree).