10 sessions
What you will be able to do:
=> which in turn should bring good health, fortune and happiness
Advanced Database Design: understanding of normalization and denormalization concepts to optimize database structure
Triggers and procedures
if we have time : Common cloud services; Google BigQuery
If there’s a specific subject you want to address let me know.
Real datasets!
Goals
Lots of good resources around
Postgres tutorials
The best resource is the excellent documentation
organize complex structured data
Spotify example
Social Network example
Discussion : Why use a database instead of just … files like json, or excel / csv ?
Sometimes we hear the word database
used to describe an Excel file.
How can we put something as simple as a CSV or Excel file on the same level as these engineering marvels that are PostgreSQL, Weaviate, MongoDB, Neo4j, Redis, MySQL, etc… ?
So I asked my friend GPT-4 to give me a definition of a database:
In simple terms: “A database is like a smart notebook or filing system that helps you keep track of lots of information and find exactly what you need in no time.”
which definitely includes CSV files, Excel files, JSON files, XML files, and many other simple file-based formats.
From the Encyclopedia Britannica, we get:
database, any collection of data or information specifically organized for rapid search by computer. Databases are structured to facilitate storage, retrieval, modification, and deletion of data.
See also the Database article on Wikipedia.
a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data.
No longer just talking about quickly finding information (the search part) but also about:
This is where a simple spreadsheet file no longer meets the objective.
A database management system (DBMS) is distinguished from a simple spreadsheet by several essential functionalities.
Feature | Description | Excel | DBMS |
---|---|---|---|
Data Storage and Retrieval | Stores data in an organized manner and retrieves it as needed. | ✅ | ✅ |
Data Manipulation | Allows adding, modifying, or deleting data. | ✅ | ✅ |
Data Querying | Allows asking complex questions (queries) about the data. | ✔️ | ✅ |
Data Organization | Structures data in formats such as tables, documents, or graphs to facilitate management. | ✅ | ✅ |
Data Sharing | Allows multiple users or applications to use the database simultaneously. | ✅ | ✅ |
Data Security | Protects data against unauthorized access or corruption. | ✔️ | ✅ |
Concurrency Control | Manages multiple users modifying data at the same time without conflicts. | ✅ | |
Backup and Recovery | Ensures that data is not lost and can be restored in case of failure. | ✔️ | ✅ |
Data Integrity | Ensures that data remains accurate, consistent, and reliable. | ✅ | |
Performance Optimization | Provides tools to optimize the speed and efficiency of data retrievals and updates. | ✅ | |
Support for Transactions | Ensures that a group of operations (transactions) is completed entirely or not at all. ACID compliance |
✅ |
1991: Object-Oriented databases gain attention.
Most OODBs from the 90s are no longer used. But they influenced the evolution of both SQL and NoSQL databases.
1995: MySQL is released as open source
And in 2009, 2 new NoSQL databases are launched:
Mongodb is web scale
That was 15 years ago back in Aug 27, 2010.
The rise of the world wide web (myspace - 2003 😍, youtube - 2005) and the massive increase in application scale by several orders of magnitude.
Suddenly, we have millions of people simultaneously trying to access and modify Terabytes of data in milliseconds.
Relational databases cannot keep up with the scale of applications, the chaotic nature of unstructured data, and the speed requirements.
The promise of NoSQL is volume and speed.
and meanwhile, in 2013, Docker containers revolutionize database deployment
Vector search is booming. Vector search capabilities are being integrated into most existing DBMS, including PostgreSQL, MongoDB, and Neo4j.
In Brief
Relational databases
There are other types of databases than relational databases :
We have many databases to choose from. It all depends on scale, application nature, budget, etc.
Database Type | Purpose | Examples | Application |
---|---|---|---|
Relational - SQL | Fixed schema | PostgreSQL, MySQL, Oracle | Transactions, normalization |
Document Stores | Flexible schema, JSON-like documents | MongoDB, CouchDB | Web applications, content management |
Graph Databases | Relationship-centered data | Neo4j, ArangoDB | Social networks, recommendation engines |
Key-Value Stores | Simple and fast lookups | Redis, DynamoDB | Caching, session management |
Vector Databases | Similarity search, AI embeddings | Pinecone, Weaviate | AI applications, semantic search |
Column-Family Stores | Wide-column data, high scalability | Cassandra, HBase | Time-series, big data applications |
Time-Series Databases | Time-ordered data | InfluxDB, TimescaleDB | IoT, monitoring systems |
Check the ranking of all databases at https://db-engines.com/en/ranking
source: https://www.generativevalue.com/p/a-primer-on-databases
Also see this interactive map that lists all players in 2023.
Advantages
✅ Free to use
✅ Transparent (view + audit code)
✅ Customizable
✅ Strong community support
✅ No vendor lock-in
Disadvantages
❌ May need more setup & tuning
❌ Support = community or paid 3rd party
❌ Fewer enterprise features (sometimes)
Advantages
✅ Enterprise-grade support
✅ Robust tooling & integrations
✅ Optimized performance at scale
✅ Security/compliance features
Disadvantages
❌ Expensive licensing
❌ Closed codebase
❌ Vendor lock-in risk
❌ Limited customization
In terms of relational databases compare
PostgreSQL’s most mentioned advantage is the efficiency of its central algorithm, which means it outperforms many databases that are advertised as more advanced.
One of the most flexible open source databases around;
You can write functions in a wide range of server-side languages: Python, Perl, Java, Ruby, C, and R.
SQLite is arguably the most implemented database engine in the world,
unlike many other databases it is not a client-server engine; rather, the full software is embedded into each implementation
to build and implement a small database, SQLite is arguably the best way to go. It is extremely small,
also very Fast
Extensions : add-ons that enhance the functionality of a PostgreSQL database.
The Guardian: from MongoDB to postgresQL
Take a few minutes to read the article
https://www.theguardian.com/info/2018/nov/30/bye-bye-mongo-hello-postgres
from 2018, so no hasty conclusion. the article underlines the difficulty of choosing the right database. MongoDB has evolved a lot since then.
Reasons for Moving from MongoDB to Migration Process: PostgreSQL:
Feature Limitations: Alternatives like DynamoDB were considered but lacked essential features like encryption at rest, which Postgres on AWS RDS provided.