Database
A database is an organized collection of data stored electronically and managed by a Database Management System (DBMS).
While datasets represent collections of data, databases provide the controlled environment where data is stored, managed, updated, and accessed efficiently.
Databases are designed to ensure consistency, reliability, performance, and security—especially when multiple users or applications access data at the same time.
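A database allows users and applications to store, query, update, and delete data, while the DBMS controls access and keeps that data consistent.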
In practice, databases act as the system of record for most operational and analytical applications.
Core Components of a Database :
1. Data
Actual stored values—rows, documents, key-value pairs, or files.
2. Schema
Defines structure: tables, columns, data types, constraints.
3. Database Engine
Handles storage, indexing, query execution, and transactions.
4. Indexes
Improve query performance by reducing data scans.
5. Transactions
Group related operations into all-or-nothing units, preserving data integrity under concurrent access and failures.
6. Metadata
Describes tables, columns, permissions, and statistics.
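The components above can be seen together in one small sketch. It assumes Python's built-in `sqlite3` module as the database engine. The `users` table and `idx_users_name` index are hypothetical names chosen for illustration. Other engines expose metadata differently; SQLite uses `PRAGMA table_info`.

```python
import sqlite3

# Database engine: the sqlite3 module wraps the SQLite engine (in-memory here).
conn = sqlite3.connect(":memory:")

# Schema: table structure, column types, and constraints.
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        name  TEXT NOT NULL
    )
""")

# Index: lets the engine find rows by name without scanning the whole table.
conn.execute("CREATE INDEX idx_users_name ON users (name)")

# Transaction + data: the connection used as a context manager commits on
# success and rolls back if an exception is raised inside the block.
with conn:
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [("ada@example.com", "Ada"), ("alan@example.com", "Alan")],
    )

# Metadata: the engine describes its own tables; row[1] is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns)  # ['id', 'email', 'name']

# Query plan: the last field of each row describes the access path,
# e.g. whether the lookup uses idx_users_name instead of a full scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE name = ?", ("Ada",)
).fetchall()
print(plan[0][-1])
```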
Types of Databases :
Databases are categorized based on data model and access patterns.
1. Relational Databases (RDBMS)
Store data in tables with rows and columns, governed by a fixed schema.
Examples:
Characteristics:
Use Cases:
2. NoSQL Databases
Designed for scalability and flexible schemas.
| Subtype | Example | Best For |
|---|---|---|
| Document | MongoDB | JSON-like records |
| Key-Value | Redis, DynamoDB | Caching, sessions |
| Column-Family | Cassandra | High-write workloads |
| Graph | Neo4j | Relationship analysis |
Used when scale or schema flexibility is critical.
3. Analytical Databases (Data Warehouses)
Optimized for reading large volumes of historical data.
Examples:
Characteristics:
4. Specialized Databases
Each is optimized for a specific access pattern.
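The sketch below contrasts the fixed schema of a relational database with the flexible schema of a document database. The relational side uses Python's built-in `sqlite3` module. The document side is only a stand-in: a plain list of JSON strings, not MongoDB's API. The `products` records are hypothetical.

```python
import json
import sqlite3

# Relational: the columns of a table are fixed by its schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO products (name) VALUES (?)", ("Laptop",))

try:
    # A column the schema does not define is rejected by the engine.
    conn.execute("INSERT INTO products (name, color) VALUES (?, ?)", ("Pen", "blue"))
    rejected = False
except sqlite3.OperationalError:
    rejected = True
print(rejected)  # True

# Document-style: each record is a self-describing JSON object, so records
# in the same collection may carry different fields.
collection = [
    {"id": 1, "name": "Laptop"},
    {"id": 2, "name": "Pen", "color": "blue"},
]
documents = [json.dumps(doc) for doc in collection]
print(documents[1])  # {"id": 2, "name": "Pen", "color": "blue"}
```

The trade-off is that the relational engine catches the unexpected field at write time. The document store accepts it, so any validation moves into the application.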
Database Operations (CRUD) :
Databases support four core operations, collectively known as CRUD:
| Operation | Meaning |
|---|---|
| Create | Insert new records |
| Read | Query existing data |
| Update | Modify records |
| Delete | Remove records |
These operations form the foundation of application and analytics workloads.
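Each CRUD operation maps to one SQL statement. This is a minimal sketch using Python's built-in `sqlite3` module, with a hypothetical `items` table. Parameters are passed with `?` placeholders rather than string formatting, which avoids SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL, qty INTEGER NOT NULL)"
)

# Create: insert new records
conn.executemany("INSERT INTO items (name, qty) VALUES (?, ?)", [("apple", 10), ("pear", 5)])

# Read: query existing data
rows = conn.execute("SELECT name, qty FROM items ORDER BY name").fetchall()
print(rows)  # [('apple', 10), ('pear', 5)]

# Update: modify matching records
conn.execute("UPDATE items SET qty = ? WHERE name = ?", (7, "apple"))

# Delete: remove matching records
conn.execute("DELETE FROM items WHERE name = ?", ("pear",))

conn.commit()  # commit the pending changes
print(conn.execute("SELECT name, qty FROM items").fetchall())  # [('apple', 7)]
```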
Transactions and ACID Properties :
Most traditional databases follow the ACID principles:
| Property | Meaning |
|---|---|
| Atomicity | All-or-nothing execution: a transaction either completes fully or has no effect |
| Consistency | Data rules (constraints) are always enforced |
| Isolation | Concurrent operations don’t interfere with one another |
| Durability | Committed data persists, even after a crash |
ACID ensures correctness, especially in financial and mission-critical systems.
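Atomicity and consistency can be illustrated with a money transfer. The sketch below assumes Python's built-in `sqlite3` module. The `accounts` table and `transfer` helper are hypothetical. A `CHECK` constraint forbids negative balances. The credit is applied before the debit on purpose, so a failed transfer has already changed one row, and the rollback must undo it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency: the CHECK constraint is a rule the database always enforces.
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit leaves a half-done transfer to undo.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back too

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: alice would go negative
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 80}
```

Without the transaction, the second call would leave Bob credited with 500 that was never debited from Alice.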
How Databases Are Used in Real Systems
| Use Case | Database Type | Example |
|---|---|---|
| Web applications | RDBMS / NoSQL | User profiles |
| Streaming ingestion | NoSQL | Event metadata |
| Analytics | Data Warehouse | Sales reporting |
| ML feature storage | SQL / NoSQL | Feature tables |
Databases often act as sources or sinks in data pipelines.
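A database can serve as a pipeline's source, its sink, or both. In this minimal extract-transform-load sketch, two in-memory SQLite databases (Python's built-in `sqlite3`) stand in for an operational source and an analytical sink. The `orders` and `sales_by_region` tables are hypothetical.

```python
import sqlite3

# Source: an operational database holding raw orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)
source.commit()

# Sink: a separate database that receives aggregated results.
sink = sqlite3.connect(":memory:")
sink.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")

# Extract + transform: aggregate in the source with SQL.
totals = source.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

# Load: write the transformed rows into the sink in one transaction.
with sink:
    sink.executemany("INSERT INTO sales_by_region VALUES (?, ?)", totals)

print(sink.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('north', 150.0), ('south', 80.0)]
```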
Databases in Data Engineering :
In modern architectures:
Databases coexist with file-based systems rather than replacing them.
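A common point of contact between databases and file-based systems is exporting query results to a file format that other tools can read. The sketch below assumes CSV as that format and Python's built-in `sqlite3` and `csv` modules. The `events` table is hypothetical, and an `io.StringIO` buffer stands in for a real file or object-storage path.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events (kind, ts) VALUES (?, ?)",
    [("click", "2024-01-01T10:00:00"), ("view", "2024-01-01T10:05:00")],
)

buffer = io.StringIO()  # stand-in for a file on disk or in object storage
cursor = conn.execute("SELECT id, kind, ts FROM events ORDER BY id")
writer = csv.writer(buffer)
# Header row taken from the query's column metadata (first field is the name).
writer.writerow([col[0] for col in cursor.description])
writer.writerows(cursor)  # one CSV row per result tuple

print(buffer.getvalue().splitlines()[0])  # id,kind,ts
```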
Limitations of Databases :
While powerful, databases have constraints:
These constraints are why large systems combine databases with data lakes and distributed storage.
Example – Database in a Research Context :
In a research study:
Managing study data in a database improves traceability, validation, and repeatability.
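The specific study is not described here. The sketch below therefore assumes a hypothetical `measurements` table and uses Python's built-in `sqlite3` module to show how a schema can support the three goals. `recorded_by`, `recorded_at`, and `protocol` columns support traceability. A `CHECK` constraint supports validation. A fixed, parameterized query supports repeatability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for study measurements.
conn.execute("""
    CREATE TABLE measurements (
        id          INTEGER PRIMARY KEY,
        subject_id  TEXT NOT NULL,
        value       REAL NOT NULL CHECK (value >= 0),  -- validation rule
        recorded_by TEXT NOT NULL,                     -- traceability: who
        recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- traceability: when
        protocol    TEXT NOT NULL                      -- procedure version used
    )
""")

INSERT = (
    "INSERT INTO measurements (subject_id, value, recorded_by, protocol) "
    "VALUES (?, ?, ?, ?)"
)
with conn:
    conn.execute(INSERT, ("S01", 4.2, "analyst_a", "v1.0"))

# Invalid data is rejected at insert time rather than discovered during analysis.
try:
    with conn:
        conn.execute(INSERT, ("S02", -1.0, "analyst_b", "v1.0"))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# A saved, parameterized query returns the same result whenever it is re-run
# on the same data.
QUERY = (
    "SELECT subject_id, value, protocol FROM measurements "
    "WHERE protocol = ? ORDER BY subject_id"
)
print(conn.execute(QUERY, ("v1.0",)).fetchall())  # [('S01', 4.2, 'v1.0')]
```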