PostgreSQL: Loading data

A course on Intermediate databases - Epita Fall 2025

Backups and Restore data dumps


plan

Your turn with the online dataset available at


Demo : data backup

On my local machine I have several databases (\l)

pgAdmin

pgAdmin list databases

CLI

pgAdmin list databases


Let’s back up one of these, for instance the treesdb_v01 database

pgAdmin backup database


In the Data Options tab, select

pgAdmin backup database


In the “Query Options” section, select

pgAdmin backup database

Nothing to do for the other tabs

and click Backup


Process

pgAdmin backup database

If we click on the process, we see the CLI equivalent

pgAdmin backup database

/<path>/pg_dump
--file "/Users/alexis/treesdb_backup.pg_dump"
--host "localhost" --port "5432" --username "alexis" --no-password
--format=p
--large-objects
--encoding "UTF8"
--section=pre-data
--section=data
--create
--clean
--if-exists
--verbose "treesdb_v01"

We can find all the settings that we selected in pgAdmin

I can copy-paste that command into my terminal and it should work


Now let’s … restore

Download the dataset

And restore it

Since the dump file is in plain text, it is just SQL statements that we can execute with psql.

psql -U alexis -h localhost -p 5432 -f <path>/treesdb_backup.pg_dump postgres

The -f flag tells psql to execute the statements in the file treesdb_backup.pg_dump.

Note

If I had used --format=c (custom, compressed), then to restore it I would have used pg_restore

pg_restore -U alexis -h localhost -p 5432 -d treesdb_backup <path>/treesdb_backup.pg_dump
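As a sketch of the custom-format round trip (paths and database names are illustrative, and the commands need a running server):

```shell
# dump in custom (compressed) format
pg_dump -U alexis -h localhost -p 5432 -Fc -f treesdb_v01.dump treesdb_v01

# restore it; --create recreates the database, so we connect to an existing one (postgres)
pg_restore -U alexis -h localhost -p 5432 -d postgres --create --clean --if-exists treesdb_v01.dump
```

The custom format also lets pg_restore work selectively (e.g. a single table with -t) and in parallel with -j, which the plain-text format cannot do.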

✅ run a sql file In pgAdmin

First rename your file from treesdb_backup.pg_dump to treesdb_backup.pg_dump.sql

pgAdmin won’t open the file when it does not have the .sql extension

Then to run a dump file or any SQL file:

This will execute the SQL statements in your dump (including CREATE DATABASE if you included it).

pgAdmin restore database

At this point you can simply use the PSQL tool in pgAdmin and run the above command

pgAdmin PSQL database


From now on, I will only use PSQL


The Titanic dataset is a classic dataset used in machine learning competitions

titanic

A list of the 891 passengers with their survival status, age, sex, class, fare, etc.

The Kaggle competition is to predict the survival of the passengers based on their features.

Here we will just load the dataset into a database and review some basic SQL queries.

The goal of the exercise is to be able to import any CSV dataset into a database.


Titanic CSV fields are:


Steps

In order to import a CSV file into a database

we will


Server vs Database vs Tables


database creation

CREATE DATABASE titanicdb;

in pgAdmin, right click on Databases (n) and select Create Database

It will show the SQL command that will be executed

CREATE DATABASE titanicdb
    WITH
    OWNER = alexis
    ENCODING = 'UTF8'
    LOCALE_PROVIDER = 'libc'
    CONNECTION LIMIT = -1
    IS_TEMPLATE = False;

Note: it’s always good practice to name the database with a db suffix or prefix.

Once the titanicdb database has been created, connect to it with

\c titanicdb

CLI

You can use the createdb command to create a database from the command line

createdb titanicdb
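Assuming the same connection settings as in the backup demo, a minimal sketch (these commands need a running server):

```shell
# create the database from the shell (connection flags mirror psql's)
createdb -U alexis -h localhost -p 5432 titanicdb

# its counterpart, dropdb, removes it again if you need to start over
dropdb -U alexis -h localhost -p 5432 titanicdb
```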

Create the table

CREATE TABLE titanic (
    passenger_id SERIAL PRIMARY KEY,
    survived     BOOLEAN,
    pclass       SMALLINT,
    name         TEXT NOT NULL,
    sex          VARCHAR(10),
    age          REAL,
    sibsp        SMALLINT,
    parch        SMALLINT,
    ticket       TEXT,
    fare         NUMERIC(8,2),
    cabin        TEXT,
    embarked     CHAR(1)
);

where

This is very precise; we could have used TEXT for all strings and INT / REAL for all numbers.

Note: apart from the primary key and the NOT NULL on name, there are no constraints on the table.
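To see the SERIAL column in action, a quick illustrative insert (the values are made up):

```sql
-- passenger_id is omitted, so the sequence supplies it
INSERT INTO titanic (survived, pclass, name, sex, age)
VALUES (TRUE, 1, 'Test Passenger', 'female', 29)
RETURNING passenger_id;

-- clean up the test row before loading the real data
DELETE FROM titanic WHERE name = 'Test Passenger';
```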


check

\d lists the relations (tables, views, sequences, etc.) in the current database

we have

localhost alexis@titanicdb=# \d
                   List of relations
 Schema |           Name           |   Type   | Owner
--------+--------------------------+----------+--------
 public | titanic                  | table    | alexis
 public | titanic_passenger_id_seq | sequence | alexis

Sequence?

Let’s check out the sequence

pgAdmin sequence

shows the behavior of the passenger_id serial number: from 1, increment 1, etc


Table


Notice the default value in the last column:

 nextval('titanic_passenger_id_seq'::regclass)

We can actually select the next value of the sequence

SELECT nextval('titanic_passenger_id_seq'::regclass);

Since our table is empty, this will return 1, then 2, etc.
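Two related sequence functions are worth knowing (a sketch; run setval only after the table is loaded). currval returns the last value generated in the current session, and setval repositions the sequence, which is needed after loading rows with explicit passenger_id values, as a CSV import does; otherwise the next insert would try to reuse id 1.

```sql
-- last value generated in this session (errors if nextval was never called here)
SELECT currval('titanic_passenger_id_seq');

-- realign the sequence with the data after an explicit-id load
SELECT setval('titanic_passenger_id_seq',
              (SELECT COALESCE(MAX(passenger_id), 1) FROM titanic));
```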


Import the csv data

Now, to import the CSV file into the table:

COPY titanic FROM '/path/to/titanic.csv' DELIMITER ',' CSV HEADER;

or

\copy titanic FROM '/path/to/titanic.csv' DELIMITER ',' CSV HEADER;

There may be permission issues with the first COPY command: it is executed by the server process, which must be able to read the file.

Note: COPY requires an absolute path on the server; \copy resolves the path on the client, so a relative path works there.

if everything goes well you should have

COPY 891

and to verify, this query should also return 891:

SELECT COUNT(*) FROM titanic;
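Beyond the row count, a couple of illustrative sanity checks on the freshly imported data:

```sql
-- survival split: the two groups should sum to 891
SELECT survived, COUNT(*) FROM titanic GROUP BY survived;

-- age range, and how many ages are missing
SELECT MIN(age), MAX(age),
       COUNT(*) FILTER (WHERE age IS NULL) AS missing_age
FROM titanic;
```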

The CSV Import Challenge

Common Problems You’ll Face:


Now that we have imported the data, let’s practice some basic queries.

practice-queries-titanic