
Created by Claude Opus 4.1 with prompt
“a simple sketch or diagram of an RDBMS architecture, keep it simple”
This architecture shows how a SQL query flows from the client through various processing layers before reaching the actual data on disk, with each component playing a crucial role in ensuring efficient, reliable database operations.
SQL Clients, Web Apps, BI Tools
These are the programs that users interact with to send SQL queries to the database. They connect to the RDBMS server using protocols like JDBC, ODBC, or native database drivers (Python, Node.js, Java, etc.).
Query Parser - Checks SQL syntax and converts queries into an internal format the database can understand. It validates that tables and columns exist and that the user has the proper permissions.
Query Optimizer - Determines the most efficient way to execute a query by analyzing different execution plans. It considers factors like available indexes, table statistics, and join methods to minimize resource usage. This is a big part of what makes PostgreSQL so efficient.
Query Executor - Carries out the optimized query plan by coordinating with other components to retrieve or modify data according to the SQL command.
EXPLAIN shows the plan the optimizer chose for a query:
SELECT * FROM trees tr
JOIN taxonomy ta ON ta.id = tr.taxonomy_id
JOIN tree_species sp ON sp.id = ta.species_id
WHERE tr.circumference > 50 AND sp.species = 'tomentosa';

Manages how data is physically stored and retrieved from disk.
It handles different storage structures like heap files, B-trees for indexes, and implements algorithms for efficient data access.
Ensures database transactions are ACID: atomic, consistent, isolated, and durable.
Memory Cache
Concurrency Control
Data Files - Store the actual table records and rows containing your business data.
Index Files - Contain sorted pointers to data locations, speeding up queries by avoiding full table scans.
Log Files - Record all database changes for recovery purposes, allowing the system to restore data after crashes or rollback uncommitted transactions.
The primary language for interacting with the database from within. Users and applications use SQL commands (SELECT, INSERT, UPDATE, DELETE) to query and modify data directly.
Tools and utilities for moving data in and out of the database. This includes bulk loading tools, data migration utilities, and export functions for backing up or transferring data to other systems.
In PostgreSQL we have command-line executables:
createdb, dropdb, pg_dump, pg_dumpall, pg_restore
Organization and Optimization
Administrative functions that maintain database performance. This involves organizing data structures, managing storage allocation, and optimizing query execution paths to ensure fast response times.
In PostgreSQL, for instance, we have:
Autovacuum - Reclaims storage space by cleaning up dead tuples.
TOAST (oversized attribute storage) - Manages large field values.
VACUUM / REINDEX / CLUSTER - Manual commands to reorganize data and indexes for better performance.
SQL Layer - The query processing interface that interprets SQL commands, validates syntax, checks permissions, and translates high-level queries into low-level operations the database engine can execute.
Compacting and Cold Recovery - Maintenance operations that reclaim unused space (compacting) and restore the database from backups or after unexpected shutdowns (cold recovery). These ensure data integrity and efficient storage usage.
Database Core (Tables, Views, Index)
The logical database structure:
plus functions, triggers, sequences, etc.
NTFS, NFS, etc. - The underlying operating system file systems where database files are physically stored. The database management system interfaces with these file systems to read and write data to disk. Common examples include:

Master SQL, and you’ll never lack job opportunities
SQL was created in the early 1970s. Here are the major features added through its evolution:
SQL-86 (SQL-87) - First ANSI Standard
SQL-89 (SQL1)
SQL-92 (SQL2)
SQL:1999 (SQL3)
SQL:2003
SQL:2006
SQL:2008
SQL:2011
SQL:2016
SQL:2023 (Latest)
# Python changes every few years
print "Hello" # Python 2 (dead)
print("Hello") # Python 3
# JavaScript changes constantly
var x = 5; // Old way
let x = 5; // New way
const x = 5; // Newer way
-- SQL from 1990s still works today!
SELECT * FROM users WHERE age > 18;
SQL is the COBOL that actually stayed relevant
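You can verify the claim with Python's built-in sqlite3 module: the 1990s-style query runs unchanged today (the users table is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Ana", 25), ("Bo", 12), ("Carl", 40)])

# The exact query from the slide, unchanged
adults = conn.execute("SELECT * FROM users WHERE age > 18").fetchall()
print(adults)  # [('Ana', 25), ('Carl', 40)]
```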
Imperative (Python/Java/C++): “HOW to do it”
results = []
for user in users:
    if user.age > 21 and user.country == 'France':
        results.append(user.name)
return sorted(results)
Declarative (SQL): “WHAT you want”
You describe the result, the database figures out HOW
Declarative means you describe what you want, not how to do it.
In SQL, you state the result you want, and the database (query planner) figures out how to get it.
SELECT name FROM students WHERE grade > 15;
✅ You focus on what, not how — that’s why SQL is declarative.
SQL isn’t just for database admins anymore:
# Modern data stack
df = pd.read_sql("SELECT * FROM events WHERE date > '2024-01-01'", conn)
model.train(df)

from XKCD comics
-- NEVER do this
query = f"SELECT * FROM users WHERE name = '{user_input}'"
If the user enters: '; DROP TABLE users; --
The query becomes:
SELECT * FROM users WHERE name = '';
DROP TABLE users; --'
Boom! no more user table!
Always use parameterized queries!
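With Python's sqlite3 module, the fix is a ? placeholder: the driver sends the value separately, so it is never parsed as SQL (table and input are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

user_input = "'; DROP TABLE users; --"

# Safe: the value is bound to the ? placeholder, never spliced into the SQL text
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] -- no match, and the table survives

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1 -- alice is still there
```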
| Standard SQL | PostgreSQL | MySQL | SQL Server |
|---|---|---|---|
| SUBSTRING() | SUBSTRING() | SUBSTRING() | SUBSTRING() |
| ❌ | LIMIT 10 | LIMIT 10 | TOP 10 |
| ❌ | STRING_AGG() | GROUP_CONCAT() | STRING_AGG() |
| ❌ | ::TEXT | ❌ | CAST AS VARCHAR |
Core SQL (90%) is identical everywhere; fancy features (10%) vary.

SQL
├── DDL (Data Definition Language)
│   └── CREATE, ALTER, DROP
├── DML (Data Manipulation Language)
│   └── INSERT, UPDATE, DELETE
├── DQL (Data Query Language)
│   └── SELECT, FROM, WHERE, JOIN
├── DCL (Data Control Language)
│   └── GRANT, REVOKE
└── TCL (Transaction Control Language)
    └── COMMIT, ROLLBACK, BEGIN
You’ll use DML and DQL 90% of the time
Defines and manages database structure
create database ipsadb;
create table students (
id serial primary key,
name varchar(100) not null,
age int not null,
grade int not null
);
ALTER TABLE students ADD COLUMN graduation_year int;
DROP TABLE students;
In SQLite it suffices to run:
sqlite3 <database_name>.db
Adding db or .db as a suffix is good practice.
CREATE TABLE <table_name> (
id serial primary key,
<column_name> <data_type> <constraints>,
<column_name> <data_type> <constraints>,
<column_name> <data_type> <constraints> -- no comma after the last column
);
with index and keys (inline INDEX declarations are MySQL syntax; PostgreSQL and SQLite use a separate CREATE INDEX statement instead):
CREATE TABLE <table_name> (
id serial primary key,
<column_name> <data_type> <constraints>,
<column_name> <data_type> <constraints>,
<column_name> <data_type> <constraints>,
INDEX <index_name> (<column_name>),
UNIQUE INDEX <index_name> (<column_name>),
FOREIGN KEY (<column_name>) REFERENCES <table_name>(<column_name>)
);
in postgresql
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
hire_date DATE NOT NULL,
salary NUMERIC(10, 2) NOT NULL,
department_id INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
in sqlite
CREATE TABLE employees (
id INTEGER PRIMARY KEY AUTOINCREMENT,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
hire_date DATE NOT NULL,
salary NUMERIC NOT NULL,
department_id INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Let’s look at the postgresql documentation for the create table statement:

https://www.postgresql.org/docs/current/sql-createtable.html
Manages data within tables
INSERT INTO <table_name> (<column_name>, <column_name>, <column_name>, <column_name>)
VALUES (<value>, <value>, <value>, <value>);
for instance
INSERT INTO courses (title, description, semester, hours)
VALUES (
'intro to database',
'the best intro to database course in the universe',
'fall 2025',
4
);
The id gets auto-incremented, and any datetime column with a DEFAULT value of NOW() gets populated automatically.
It is good practice to use default values for datetime columns, and for traceability purposes to always have the two columns created_at and updated_at.
UPDATE <table_name>
SET <column_name> = <value>,
<column_name> = <value>
WHERE <condition>;
for instance:
UPDATE courses
SET title = 'intro to database',
description = 'the best intro to database course in the universe',
semester = 'fall 2025',
hours = 4
WHERE id = 1;
Retrieves data from the database
SELECT <column_name>, <column_name>
FROM <table_name>
WHERE <condition>;
for instance:
SELECT title, description
FROM courses
WHERE semester = 'fall 2025';
joining on the teacher id:
SELECT title, description, teachers.name
FROM courses
JOIN teachers ON courses.teacher_id = teachers.id
WHERE semester = 'fall 2025';
Controls access and permissions
GRANT SELECT, INSERT, UPDATE, DELETE ON <table_name> TO <user_name>;
for instance:
GRANT SELECT, INSERT, UPDATE, DELETE ON courses TO Alexis;
Manages database transactions. (The conditional IF/PRINT logic below is SQL Server's T-SQL; PostgreSQL would express it in a DO block or in application code.)
BEGIN TRANSACTION;
-- Update all course credits by 1
UPDATE courses
SET credits = credits + 1;
-- Give all teachers a 10% raise
UPDATE teachers
SET salary = salary * 1.10;
-- Check if any teacher salary is now over 200000
IF EXISTS (SELECT 1 FROM teachers WHERE salary > 200000)
BEGIN
-- Something went wrong, undo everything
ROLLBACK;
PRINT 'Rolled back - salary too high';
END
ELSE
BEGIN
-- Everything looks good, save the changes
COMMIT;
PRINT 'Changes saved successfully';
END
Both COMMIT and ROLLBACK are “terminal” commands for a transaction - they save/undo the work AND close the transaction in one go.
There’s no “END TRANSACTION” command in SQL. Once you COMMIT or ROLLBACK, you’re done!
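A sketch of the same commit-or-rollback logic driven from Python's sqlite3 module (table and salary threshold borrowed from the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teachers (name TEXT, salary REAL)")
conn.execute("INSERT INTO teachers VALUES ('Ada', 190000)")
conn.commit()

# Give all teachers a 10% raise inside a transaction
conn.execute("UPDATE teachers SET salary = salary * 1.10")

too_high = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM teachers WHERE salary > 200000)"
).fetchone()[0]

if too_high:
    conn.rollback()  # undo the raise AND end the transaction
else:
    conn.commit()    # save the raise AND end the transaction

salary = conn.execute("SELECT salary FROM teachers").fetchone()[0]
print(salary)  # 190000.0 -- the raise was rolled back
```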
Important Note: Some database texts combine DQL with DML since
SELECTstatements can be considered data manipulation, resulting in 4 sub-languages instead of 5.
Aggregate Functions (DQL)
COUNT(), SUM(), AVG(), MAX(), MIN()
GROUP BY for summarizing data
Scalar Functions (DQL)
COALESCE() - Handles NULL values
CAST() / CONVERT() - Type conversion
SUBSTRING(), CONCAT(), TRIM() - String manipulation
DATEADD(), DATEDIFF() - Date operations
ROUND(), FLOOR(), CEILING() - Math functions
Clauses & Operations (DQL)
GROUP BY - Groups rows for aggregation
HAVING - Filters grouped results
ORDER BY - Sorts result sets
DISTINCT - Removes duplicates
UNION / INTERSECT / EXCEPT - Set operations
Window Functions (DQL)
ROW_NUMBER(), RANK(), DENSE_RANK()
LAG(), LEAD() - Access other rows
Conditional Logic (DQL)
CASE WHEN - Conditional expressions
All these functions and clauses fall under the DQL umbrella.
Note: Some databases like PostgreSQL consider these part of the broader “SELECT statement syntax” rather than separate categories, but they all fall under the DQL umbrella since they’re used for querying data.
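Modern SQLite (3.25+, bundled with recent Python builds) supports window functions too; a minimal RANK() sketch over an invented scores table:

```python
import sqlite3  # window functions require SQLite >= 3.25 (2018)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, grade INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("Ana", 17), ("Bo", 12), ("Carl", 17)])

rows = conn.execute("""
    SELECT student, grade,
           RANK() OVER (ORDER BY grade DESC) AS rk
    FROM scores
""").fetchall()
# Ana and Carl tie at rank 1; Bo gets rank 3 (rank 2 is skipped)
print(rows)
```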
-- Create the structure
CREATE TABLE songs (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
artist TEXT NOT NULL,
duration_seconds INTEGER,
release_date DATE
);
-- Modify the structure
ALTER TABLE songs ADD COLUMN genre TEXT;
-- Destroy the structure
DROP TABLE songs; -- ⚠️ Everything gone!
-- Create: Add new data
INSERT INTO songs (title, artist, duration_seconds)
VALUES ('Flowers', 'Miley Cyrus', 200);
-- Read: Query data
SELECT title, artist FROM songs WHERE duration_seconds < 180;
-- Update: Modify existing data
UPDATE songs SET genre = 'Pop' WHERE artist = 'Taylor Swift';
-- Delete: Remove data
DELETE FROM songs WHERE release_date < '2000-01-01';
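The same CRUD cycle can be replayed end to end with Python's sqlite3 module (the UPDATE here targets the artist we actually inserted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE songs (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist TEXT NOT NULL,
    duration_seconds INTEGER,
    release_date DATE)""")

# Create
conn.execute("INSERT INTO songs (title, artist, duration_seconds) VALUES (?, ?, ?)",
             ("Flowers", "Miley Cyrus", 200))
# Read
short = conn.execute(
    "SELECT title FROM songs WHERE duration_seconds < 180").fetchall()
print(short)  # [] -- Flowers is 200 seconds, so it doesn't match
# Update (after adding the genre column)
conn.execute("ALTER TABLE songs ADD COLUMN genre TEXT")
conn.execute("UPDATE songs SET genre = 'Pop' WHERE artist = 'Miley Cyrus'")
# Delete (release_date is NULL, so the comparison is NULL and nothing matches)
conn.execute("DELETE FROM songs WHERE release_date < '2000-01-01'")

remaining = conn.execute("SELECT title, genre FROM songs").fetchall()
print(remaining)  # [('Flowers', 'Pop')]
```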
The everyday SQL you write
SELECT -- What columns?
FROM -- What table?
WHERE -- What rows?
GROUP BY -- How to group?
HAVING -- Filter groups?
ORDER BY -- What order?
LIMIT -- How many?
This pattern solves 82% of your data questions

-- Almost natural language
SELECT name, age
FROM students
WHERE grade > 15
ORDER BY age DESC
LIMIT 10;
Translates to: “Show me the names and ages of students with grades above 15, sorted by age (oldest first), but only the top 10”
This is why SQL survived: humans can read it
Without JOIN (multiple queries):
# Get user
user = db.query("SELECT * FROM users WHERE id = 42")

# Get their orders
orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")

# Get order details
for order in orders:
    items = db.query(f"SELECT * FROM items WHERE order_id = {order.id}")
With JOIN (one query):
SELECT u.name, o.date, i.product
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN items i ON o.id = i.order_id
WHERE u.id = 42;
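The one-query version in action, with toy data invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    CREATE TABLE items  (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT);
    INSERT INTO users  VALUES (42, 'Alexis');
    INSERT INTO orders VALUES (1, 42, '2024-05-01');
    INSERT INTO items  VALUES (1, 1, 'book'), (2, 1, 'pen');
""")

# One round trip instead of 1 + 1 + N queries
rows = conn.execute("""
    SELECT u.name, o.date, i.product
    FROM users u
    JOIN orders o ON u.id = o.user_id
    JOIN items  i ON o.id = i.order_id
    WHERE u.id = 42
""").fetchall()
print(rows)  # one row per item, user and order columns repeated
```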
A JOIN combines rows from two or more tables based on a related column between them.
SELECT columns
FROM table1
JOIN table2
ON table1.column = table2.column;
Using table aliases (the tr in FROM trees tr):
SELECT * FROM trees tr
JOIN taxonomy ta ON ta.id = tr.taxonomy_id
JOIN tree_species sp ON sp.id = ta.species_id
WHERE tr.circumference > 50 AND sp.species = 'tomentosa';
instead of
SELECT * FROM trees
JOIN taxonomy ON taxonomy.id = trees.taxonomy_id
JOIN tree_species ON tree_species.id = taxonomy.species_id
WHERE trees.circumference > 50 AND tree_species.species = 'tomentosa';
1. INNER JOIN (Default) Returns only matching records from both tables.
SELECT customers.name, orders.order_date
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id;
SELECT
m.movie_id,
m.title,
m.release_year,
a.actor_name,
a.role
FROM movies m
INNER JOIN actors a ON m.movie_id = a.movie_id
ORDER BY m.title;
Result example:
| movie_id | title | release_year | actor_name | role |
|---|---|---|---|---|
| 1 | The Matrix | 1999 | Keanu Reeves | Neo |
| 1 | The Matrix | 1999 | Laurence F. | Morpheus |
| 2 | Inception | 2010 | Leonardo D. | Cobb |
These movies with no actors will not be shown
| movie_id | title | release_year | actor_name | role |
|---|---|---|---|---|
| 3 | Documentary XYZ | 2023 | NULL | NULL |
| 4 | Silent Film | 1920 | NULL | NULL |
2. LEFT JOIN (LEFT OUTER JOIN)
Returns all records from the left table, and matched records from the right table.
SELECT customers.name, orders.order_date
FROM customers
LEFT JOIN orders
ON customers.customer_id = orders.customer_id;
NULL values appear for customers with no orders
Simple LEFT JOIN: All movies, even those without actors
SELECT
m.movie_id,
m.title,
m.release_year,
a.actor_name,
a.role
FROM movies m
LEFT JOIN actors a ON m.movie_id = a.movie_id
ORDER BY m.title;
Result example:
| movie_id | title | release_year | actor_name | role |
|---|---|---|---|---|
| 1 | The Matrix | 1999 | Keanu Reeves | Neo |
| 1 | The Matrix | 1999 | Laurence F. | Morpheus |
| 2 | Inception | 2010 | Leonardo D. | Cobb |
| 3 | Documentary XYZ | 2023 | NULL | NULL |
| 4 | Silent Film | 1920 | NULL | NULL |
The key point: LEFT JOIN keeps ALL movies, even if they have no actors in the actors table. Those movies will show NULL values for the actor columns. This is useful for:
Seeing all movies in your database (with or without cast info) Finding movies that have no actors assigned yet (like documentaries or movies with missing data) Making sure you don’t lose any movies from your results just because actor data is incomplete
3. RIGHT JOIN (RIGHT OUTER JOIN)
Returns all records from the right table, and matched records from the left table.
SELECT customers.name, orders.order_date
FROM customers
RIGHT JOIN orders
ON customers.customer_id = orders.customer_id;
NULL values appear for orders with no customer info
4. FULL OUTER JOIN
Returns all records when there’s a match in either table.
SELECT customers.name, orders.order_date
FROM customers
FULL OUTER JOIN orders
ON customers.customer_id = orders.customer_id;
Note: Not supported in MySQL, use UNION of LEFT and RIGHT JOIN
FULL OUTER JOIN:
All movies AND all actors, even unmatched ones
SELECT
m.movie_id,
m.title,
m.release_year,
a.actor_name,
a.role
FROM movies m
FULL OUTER JOIN actors a ON m.movie_id = a.movie_id
ORDER BY m.title, a.actor_name;
Result example:
| movie_id | title | release_year | actor_name | role |
|---|---|---|---|---|
| NULL | NULL | NULL | Alphonse | NULL |
| NULL | NULL | NULL | Nephew of Alphonse | NULL |
| 1 | The Matrix | 1999 | Keanu Reeves | Neo |
| 2 | Inception | 2010 | Leonardo D. | Cobb |
| 3 | Documentary XYZ | 2023 | NULL | NULL |
| 4 | Silent Film | 1920 | NULL | NULL |
The key difference: FULL OUTER JOIN keeps EVERYTHING from both tables.
This creates three groups in your results: matched movie/actor pairs, movies without any actors, and actors without any movies.
5. CROSS JOIN
Returns the Cartesian product of both tables (every row paired with every row).
SELECT customers.name, products.product_name
FROM customers
CROSS JOIN products;
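A quick sanity check of the Cartesian product with toy data: 3 customers × 2 products yields 6 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT);
    CREATE TABLE products  (product_name TEXT);
    INSERT INTO customers VALUES ('Ana'), ('Bo'), ('Carl');
    INSERT INTO products  VALUES ('pen'), ('book');
""")

# Every customer paired with every product: 3 x 2 = 6 rows
n = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN products").fetchone()[0]
print(n)  # 6
```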
So, given a database, how do you know if it needs to be normalized?
There are 3 types of anomalies that you can look for: insertion, update, and deletion anomalies.
A normalized database will solve these anomalies.
Consider the table of teachers and their courses
| id | teacher | course |
|---|---|---|
| 1 | Bjorn | Intermediate Database |
| 2 | Sofia | Crypto Currencies |
| 3 | Hussein | Data Analytics |
Now a new teacher, Alexis, is recruited by the school. The teacher does not have a course assigned yet. If we want to insert the teacher into the table, we need to put a NULL value in the course column.
NULL values are to be avoided!!! (more on that later)
More generally:
Consider now the following movies, directors, year and production house
| id | Film | Director | Production House |
|---|---|---|---|
| 1 | Sholay | Ramesh Sippy | Sippy Films |
| 2 | Dilwale Dulhania Le Jayenge | Aditya Chopra | Yash Raj Films |
| 3 | Kabhi Khushi Kabhie Gham | Karan Johar | Dharma Productions |
| 4 | Kuch Kuch Hota Hai | Karan Johar | Dharma Productions |
| 5 | Lagaan | Ashutosh Gowariker | Aamir Khan Productions |
| 6 | Dangal | Nitesh Tiwari | Aamir Khan Productions |
| 7 | Bajrangi Bhaijaan | Kabir Khan | Salman Khan Films |
| 8 | My Name Is Khan | Karan Johar | Dharma Productions |
| 9 | Gully Boy | Zoya Akhtar | Excel Entertainment |
| 10 | Zindagi Na Milegi Dobara | Zoya Akhtar | Excel Entertainment |
(chatGPT knows Bollywood 😄)
If one of the production house changes its name, we need to update multiple rows.
In a small example like this one, this is not a problem since the query is trivial but in large databases some rows may not be updated.
By moving the production house data into its own separate table, updating its name would only impact one row!
Here is a table about programmers, languages and (secret) projects
| Developer_Name | Language | Project_Name |
|---|---|---|
| Lisa | Python | Atomic Blaster |
| Bart | Java | Invisible tank |
| Montgomery | Python | Atomic Blaster |
| Maggie | JavaScript | Exploding Squirrel |
| Ned | C++ | War AI |
| Homer | Python | Atomic Blaster |
| Marge | Ruby | War AI |
Let's say that for imperative reasons we need to cover up the existence of the War AI project, so we delete all rows that mention that project from the database. An unfortunate consequence is that we also delete all mentions of Marge and Ned, since they are not involved in any other project.
So by deleting one attribute's values we also lose values of other attributes.
When you have different but related natural / logical entities packed together in the same table, this causes anomalies and you need to normalize.
Why is it a problem to have NULL values in a column?
Ambiguity: NULL can mean "unknown", "not applicable", or "missing", leading to confusion.
Comparison pitfalls: NULL needs special syntax (IS NULL vs =).
NULL values are ignored in aggregate functions (like SUM, COUNT).
Impacts indexing and performance: NULLs may be excluded from indexes in some databases.
So avoid NULL values if you can!
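These pitfalls are easy to demonstrate with a toy table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grade INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(10,), (20,), (None,)])

# = NULL never matches anything; you must write IS NULL
eq_null = conn.execute("SELECT COUNT(*) FROM t WHERE grade = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE grade IS NULL").fetchone()[0]
print(eq_null, is_null)  # 0 1

# Aggregates silently skip NULLs: AVG is 15 over two values, and COUNT(col) != COUNT(*)
avg, cnt_col, cnt_all = conn.execute(
    "SELECT AVG(grade), COUNT(grade), COUNT(*) FROM t").fetchone()
print(avg, cnt_col, cnt_all)  # 15.0 2 3
```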

SQLite stores the entire database, consisting of definitions, tables, indices, and data, as a single cross-platform file.
SQLite is everywhere
SQLite comes pre-installed on macOS. First check your version:
sqlite3 --version
or install it with Homebrew:
brew install sqlite
On Windows, download sqlite-tools-win-x64-*.zip (or win32 for 32-bit), extract it to C:\sqlite (it contains sqlite3.exe, sqldiff.exe, and sqlite3_analyzer.exe), add C:\sqlite to your PATH, then check with sqlite3 --version.
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
choco install sqlite
sqlite3 --version
winget install SQLite.SQLite
.help: Show all commands
.tables: List all tables
.schema: Show table structures
.quit: Exit SQLite
.databases: List attached databases
These are specific to sqlite3.
In postgresql you can use
\l: list databases
\c <database_name>: connect to a database
\d: list tables
\d <table_name>: show table structure
\q: quit
You can work in the terminal
sqlite3 <database_name> ".schema"
sqlite3 <database_name> ".tables"
sqlite3 <database_name> ".databases"
or execute a sql file
sqlite3 <database_name> < <sql_file>
Note the < before the sql file name
or open a sqlite session
sqlite3 <database_name>
and then run your queries and commands
We are working with the IMDB 1000 dataset, a list of the top 1000 movies on IMDB.
We will create a raw database, then a database with a primary key and proper data types, and finally a normalized database.
In SQLite it suffices to type
sqlite3 imdb_raw.db
this creates the database imdb_raw.db in the current directory. It’s empty

Let’s go back to the database in sqlite
sqlite3 imdb_raw.db
and list the tables, there is none
.tables
So we create the 1st table with a CREATE TABLE statement
The original file has 11 columns: rank, title, description, genre, duration, year, rating, votes, revenue, metascore, imdb_score, ….
We want to create the table with the same column names. We'll use a single TEXT type for all the columns to simplify things.
CREATE TABLE imdb_raw (
Poster_Link TEXT,
Series_Title TEXT,
Released_Year TEXT, -- keep TEXT for now (there are often odd values)
Certificate TEXT,
Runtime TEXT, -- e.g., "142 min"
Genre TEXT, -- comma-separated
IMDB_Rating TEXT,
Overview TEXT,
Meta_score TEXT,
Director TEXT,
Star1 TEXT,
Star2 TEXT,
Star3 TEXT,
Star4 TEXT,
No_of_Votes TEXT,
Gross TEXT -- e.g., "$44,008,000" or empty
);
We can check that the table exists with
.tables
and check the schema with
.schema
And we can import the data with
.mode csv
.import imdb_top_1000_no_header.csv imdb_raw
We need to remove the header row from the CSV before importing it, so here I've used a no-header version of the dataset.
To see the results with columns as lines, change the mode:
.mode line
Let’s check what we have with 2 queries
SELECT * FROM imdb_raw LIMIT 5;
And the title
SELECT Series_Title FROM imdb_raw LIMIT 5;
How many rows do we have
SELECT COUNT(*) FROM imdb_raw;
This should return 1000.
All columns have TEXT as their type for now. We need to change that.
What SQLite CAN’T do with ALTER TABLE:
ALTER TABLE imdb
ALTER COLUMN IMDB_Rating TYPE FLOAT USING IMDB_Rating::FLOAT;
ALTER TABLE imdb
ADD PRIMARY KEY (id);
we need to be able to identify each row uniquely.
This is the role of the primary key.
Each table should have a primary key.
Typically it is an auto-incrementing integer that is unique to each row.
In our case, working with SQLite, creating the primary key is a bit more involved; we leave it for now.
In a database like PostgreSQL, adding the key takes two steps:
ALTER TABLE imdb_raw
ADD COLUMN raw_id SERIAL;
ALTER TABLE imdb_raw
ADD PRIMARY KEY (raw_id);
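In SQLite, the usual workaround is to rebuild the table: create a copy with an INTEGER PRIMARY KEY, copy the rows over, then drop and rename. A sketch on a tiny stand-in table (not the real 16-column imdb_raw):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE imdb_raw (Series_Title TEXT);
    INSERT INTO imdb_raw VALUES ('The Matrix'), ('Inception');

    -- rebuild with a proper key
    CREATE TABLE imdb_new (
        raw_id INTEGER PRIMARY KEY,   -- alias for SQLite's rowid, auto-assigned
        Series_Title TEXT
    );
    INSERT INTO imdb_new (Series_Title) SELECT Series_Title FROM imdb_raw;
    DROP TABLE imdb_raw;
    ALTER TABLE imdb_new RENAME TO imdb_raw;
""")

rows = conn.execute("SELECT raw_id, Series_Title FROM imdb_raw").fetchall()
print(rows)  # [(1, 'The Matrix'), (2, 'Inception')]
```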
SELECT -- What columns?
FROM -- What table?
WHERE -- What rows?
to which we can add:
ORDER BY -- What order?
LIMIT -- How many?
and finally (out of the scope of this course)
GROUP BY -- How to group?
HAVING -- Filter groups?
We imported all the data as text; now we want to change the types of some columns.
actors: create one table with actors, and one join many to many table because movies have many actors
genres: we can create a table with genres, and one join many to many table because movies have many genres
directors: we can create a table with directors, and one join many to one table because movies have one director
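A minimal sketch of the directors split (one-to-many); the table and column names here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE directors (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE movies (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        director_id INTEGER REFERENCES directors(id)  -- many movies, one director
    );
    INSERT INTO directors (name) VALUES ('Zoya Akhtar');
    INSERT INTO movies (title, director_id) VALUES
        ('Gully Boy', 1), ('Zindagi Na Milegi Dobara', 1);
""")

# The director's name now lives in exactly one row
rows = conn.execute("""
    SELECT m.title, d.name FROM movies m
    JOIN directors d ON d.id = m.director_id
""").fetchall()
print(rows)
```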
The sequence of SQL statements to add a primary key, change data types, and normalize the dataset is available here
When we select a subset of columns rather than all of them, we are doing a projection.
select * from imdb_raw -- returns all the columns
whereas
select Series_Title, Released_Year from imdb_raw -- returns only the Series_Title and Released_Year columns
We can apply functions to the selected columns.
For instance, let's calculate some stats on the IMDB rating:
select avg(IMDB_Rating), max(IMDB_Rating), min(IMDB_Rating) from imdb_raw;
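Since every column was imported as TEXT, it is safer to CAST explicitly before aggregating. A sketch on a stand-in table with the same column name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imdb_raw (IMDB_Rating TEXT)")
conn.executemany("INSERT INTO imdb_raw VALUES (?)", [("9.3",), ("8.8",)])

# CAST the TEXT column to REAL so the aggregates work on numbers
row = conn.execute("""
    SELECT AVG(CAST(IMDB_Rating AS REAL)),
           MAX(CAST(IMDB_Rating AS REAL)),
           MIN(CAST(IMDB_Rating AS REAL))
    FROM imdb_raw
""").fetchone()
print(row)  # average is about 9.05, max 9.3, min 8.8
```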
When we select a subset of rows rather than all of them, we are doing a selection (filtering).
select Series_Title from imdb_raw where IMDB_Rating > 8;
we can use multiple conditions
select Series_Title
from imdb_raw
where IMDB_Rating > 8
AND Released_Year > 2000;
select Series_Title
from imdb_raw
where IMDB_Rating > 8
OR Released_Year > 2000;