Writing Scalable Functions
function design, PL/pgSQL, best practices, exception handling
Writing functions
TOC
Context
In the last session we saw how to write SQL functions and PL/pgSQL functions.
To start today’s class I’d like to go over
- General best practice to write functions
- Specifics of PL/pgSQL functions
Functions that scale
How would you define what is a well written function ?
We can look at 2 aspects to begin with
- proper naming
- single responsibility principle
Naming
A good way to name function is to have its name that starts with an action verb followed by the name of the object it acts upon.
For instance
- retrieve_use_record
- count_hits
- convert_rows_to_json
Some additional best practices for naming functions:
- Be specific: Names should precisely describe the function’s task.
- Use consistent terminology: Stick to a specific style or terminology throughout your codebase.
- Avoid abbreviations: Write full words to keep things clear unless abbreviations are universally understood in your team or project.
- Length: Aim for names that are long enough to be descriptive but not overly verbose.
Single responsibility principle
This is super important
- it facilitates comprehension
- it helps with writing automated tests
- it greatly reduces bugs and erratic behaviors
The SRP comes down to :
A function has one job and one job only !
Don’t
- extract data and transform it,
- log in a user and send an email,
- launch the video, update views counts, and generate subtitles,
… in the same function
Read more about the SRP on wikipedia
How to write a good prompt ?
Which leads me to good prompting techniques
Working with an LLM can be frustrating and depressing. Not because the answers are wrong (might be when it comes to coding) but because if done without effort, lazily, it kills the pleasure we get from creative work.
Instead of simply asking a questions and copy pasting the answer, you need to really engage with the LLM.
Difference between
from ;
What are best practices for naming functions?
to
I’m looking into best practices to name a function in a high level programming language (python, … or even PL/pgSQL).
My advice would be
- the function name starts with a verb followed by the name of the object it acts upon.
What do you think ?
In short
- give context
- ask for review, opinion or feedback
- reword what you understood
What are your techniques for writing efficient prompts ?
PL/pgSQL functions
The structure of a PL/pgSQL function
The following sections are taken from https://www.postgresqltutorial.com/postgresql-plpgsql/plpgsql-block-structure/. Please refer to this site for further details.
Blocks
PL/pgSQL blocks have a structure that includes a declaration section (optional) and an execution section (mandatory). Inside a block, variables are declared first, followed by the logic executed. Blocks can be nested, with labels for referencing.
Example:
DO $$
DECLARE
total_sales NUMERIC;
BEGIN
SELECT SUM(amount) INTO total_sales FROM sales;
RAISE NOTICE 'Total Sales: %', total_sales;
END $$;
Here, total_sales
is declared, a query calculates the total sales, and the result is displayed using RAISE NOTICE
.
Variables
In PL/pgSQL, variables are declared in the declaration section of a block and hold temporary data. You can initialize them upon declaration using :=
or by copying another variable’s data type using %TYPE
. Variables can have different scopes depending on where they’re declared.
Example:
CREATE OR REPLACE FUNCTION calculate_discount()
RETURNS NUMERIC AS $$
DECLARE
total_sales NUMERIC := 0;
discount NUMERIC;
BEGIN
SELECT SUM(amount) INTO total_sales FROM sales;
discount := total_sales * 0.1;
RAISE NOTICE 'Total Sales: %, Discount: %', total_sales, discount;
RETURN discount;
END $$
LANGUAGE plpgsql;
This declares and initializes variables and uses them in calculations.
Select INTO
In PL/pgSQL, the SELECT INTO
statement assigns the result of a query to variables. This is useful when retrieving a single value or multiple columns into variables within a function. You can use SELECT INTO
to process values and perform further calculations or return the result.
Example: Total Play Count for a Spotify Artist
CREATE OR REPLACE FUNCTION get_artist_play_count(artist_name TEXT)
RETURNS NUMERIC AS $$
DECLARE
total_plays NUMERIC;
BEGIN
SELECT SUM(play_count) INTO total_plays
FROM tracks
WHERE artist = artist_name;
RAISE NOTICE 'Total plays for %: %', artist_name, total_plays;
RETURN total_plays;
END $$
LANGUAGE plpgsql;
This function calculates the total play count for an artist in the Spotify database by selecting the sum of play_count
for all tracks associated with the given artist name.
RECORD Types
In PL/pgSQL, record types allow you to handle rows of data without needing to define a fixed structure. A record can hold a row from any table or query, and its fields are dynamically assigned at runtime. You can use %ROWTYPE
to copy the structure of a table.
This function retrieves the top artist by play count from a Spotify-like database.
CREATE OR REPLACE FUNCTION get_top_artist()
RETURNS RECORD AS $$
DECLARE
artist_record RECORD;
BEGIN
SELECT artist_name, COUNT(*) as plays INTO artist_record
FROM spotify_tracks
GROUP BY artist_name
ORDER BY plays DESC
LIMIT 1;
RETURN artist_record;
END $$
LANGUAGE plpgsql;
Declaring constant
To define a constant in PL/pgSQL, you use the following syntax:
constant_name constant data_type = expression;
RAISING NOTICE and Exceptions
In PL/pgSQL, errors and messages are handled using RAISE
to display notices, warnings, or errors. You can also use error-handling blocks with EXCEPTION
. Messages provide valuable debugging information.
Here’s a function example related to a Spotify-like database:
CREATE OR REPLACE FUNCTION calculate_streaming_revenue(artist_id INT)
RETURNS NUMERIC AS $$
DECLARE
total_revenue NUMERIC := 0;
BEGIN
BEGIN
SELECT SUM(streams * revenue_per_stream) INTO total_revenue
FROM tracks WHERE artist = artist_id;
IF total_revenue IS NULL THEN
RAISE NOTICE 'No streams found for artist ID: %', artist_id;
RETURN 0;
ELSE
RAISE NOTICE 'Total Revenue for Artist ID %: %', artist_id, total_revenue;
RETURN total_revenue;
END IF;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Error calculating revenue for artist ID %', artist_id;
END;
END $$
LANGUAGE plpgsql;
This function
- calculates an artist’s total streaming revenue and raises a message if no streams are found.
- uses an error handler that raises an exception if any unforeseen error occurs during the calculation process.
If Then Else
In PL/pgSQL, IF-THEN-ELSE
statements allow conditional logic. You can check conditions and perform different actions depending on the result.
Example:
CREATE OR REPLACE FUNCTION get_artist_popularity(artist_id INT)
RETURNS TEXT AS $$
DECLARE
popularity INT;
BEGIN
SELECT popularity INTO popularity FROM artists WHERE id = artist_id;
IF popularity > 80 THEN
RETURN 'Very Popular';
ELSIF popularity > 50 THEN
RETURN 'Moderately Popular';
ELSE
RETURN 'Less Popular';
END IF;
END $$
LANGUAGE plpgsql;
This function returns an artist’s popularity description based on their popularity score.
Looping
In PL/pgSQL, loops allow you to repeat actions multiple times. You can use basic loops, FOR
, WHILE
, and EXIT
to control the flow.
Example:
CREATE OR REPLACE FUNCTION get_top_streams(p_artist_id INT)
RETURNS INT AS $$
DECLARE
v_total_streams INT := 0;
v_track RECORD;
BEGIN
FOR v_track IN SELECT streams FROM tracks WHERE artist_id = p_artist_id LOOP
v_total_streams := v_total_streams + v_track.streams;
END LOOP;
RETURN v_total_streams;
END $$
LANGUAGE plpgsql;
This function loops through an artist’s tracks to calculate the total streams.
Function overloading
In PL/pgSQL, function overloading allows you to create multiple functions with the same name but different parameter types or numbers. PostgreSQL automatically selects the appropriate function based on the provided arguments.
Example:
CREATE OR REPLACE FUNCTION get_artist_revenue(p_artist_id INT)
RETURNS NUMERIC AS $$
BEGIN
RETURN (SELECT SUM(streams * revenue_per_stream) FROM tracks WHERE artist_id = p_artist_id);
END $$
LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION get_artist_revenue(p_artist_name TEXT)
RETURNS NUMERIC AS $$
BEGIN
RETURN (SELECT SUM(streams * revenue_per_stream) FROM tracks WHERE artist = p_artist_name);
END $$
LANGUAGE plpgsql;
This example overloads the get_artist_revenue
function to accept either an artist’s ID or name.
Naming parameters and variables
In PL/pgSQL, using prefixes like v_
for variables and p_
for parameters helps distinguish between the two and makes the code more readable. This convention avoids confusion when variables and parameters have similar names, making it clear which are local to the function and which are passed in.
For example:
v_total_sales
for a variable holding total sales.p_artist_id
for a function parameter representing an artist’s ID.
Other common prefixes include r_
for records and c_
for constants.
Practice
Let’s write the function or set of functions that creats the hash from the tree attributes.
For the sake of simplicity we will use the non normalized version of the treesdb
: treesdb_v02
Connect to treesdb_v02 database on your local
- connect to your PostgreSQL server with a PSQL session on your local, and connect to the
treesdb_v02
database - if the database is not already on your server, restore it from https://github.com/SkatAI/epitadb/blob/master/data/treesdb_v02.01.sql.backup
The goal is to create a sha256 generated hash string for each tree using a concatenation of some of its attributes: genre, varieties, species, and longitude and latitude (geolocation[0], geolocation[1])
We write a series of functions. Each with a unique task
- concatenate tree attributes
- generate the hash string
- inserts the has string into a new column
Viewing existing functions
You can view the functions you created and their input types with \df
and their definition with \sf <function_name>
concatenation
CREATE OR REPLACE FUNCTION concat_tree_attributes(p_tree_id integer)
RETURNS text AS
$$
DECLARE
v_concatenated_text text;
v_genre text;
v_species text;
v_variety text;
v_lon text;
v_lat text;
BEGIN
-- Fetch the tree attributes
SELECT COALESCE(genre, 'UNK'),
COALESCE(species, 'UNK'),
COALESCE(variety, 'UNK'),
COALESCE(CAST(geolocation[0] AS text), 'UNK'),
COALESCE(CAST(geolocation[1] AS text), 'UNK')
INTO v_genre, v_species, v_variety, v_lon, v_lat
FROM public.trees
WHERE id = p_tree_id;
-- Concatenate the attributes with '-' and replace spaces with '_'
v_concatenated_text := CONCAT_WS('-',v_genre, v_species, v_variety, v_lon, v_lat);
v_concatenated_text := REPLACE(v_concatenated_text , ' ', '_');
RETURN v_concatenated_text;
END;
$$
LANGUAGE plpgsql;
In this function, identify
- the RETURNS data type declaration
- the
$$
- the declaration of internal variables
- in the select statement, the
INTO
- how we split the calls to concat and REPLACE over 2 lines instead of chaining them ?
Test test test and test again
It’s important to always test your functions.
At least on a few examples. Take a random tree_id
and look at the output of the function.
select concat_tree_attributes(88);
concat_tree_attributes
-------------------------------------------------------
Prunus-UNK-Kanzan-2.387247686120145-48.86610339939845
ok now the second function. But first let’s check that the sha256 extension is installed
Activate Extension
The sha256 function is part of the pgcrypto
extension
You can list installed extension with
\dx
On treesdb_v02
you should just see PL/pgSQl.
\dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
Activate the pgcrypto
extension with
CREATE EXTENSION pgcrypto;
and now it’s listed
\dx
List of installed extensions
Name | Version | Schema | Description
----------+---------+------------+------------------------------
pgcrypto | 1.3 | public | cryptographic functions
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
To list all extensions available for installation
select * from pg_available_extensions order by name;
which returns
name | default_version | installed_version | comment
--------------------+-----------------+-------------------+--------------------------------------------------
adminpack | 2.1 | [null] | administrative functions for PostgreSQL
amcheck | 1.3 | [null] | functions for verifying relation integrity
autoinc | 1.0 | [null] | functions for autoincrementing fields
bloom | 1.0 | [null] | bloom access method - signature file based index
...
generate hash
CREATE OR REPLACE FUNCTION generate_sha256_hash(p_input_text text)
RETURNS text AS
$$
DECLARE
v_hashed_text text;
BEGIN
-- Generate SHA-256 hash and convert it to a hexadecimal string
v_hashed_text := encode(digest(p_input_text, 'sha256'), 'hex');
RETURN v_hashed_text;
END;
$$
LANGUAGE plpgsql;
Note
- the name and type of the input parameter
- the data type that is returned
Listing digest
and encode
with \dx
, why do we need to encode
the output of digest
?
test test test
Your function should return:
select generate_sha256_hash('hello world');
generate_sha256_hash
------------------------------------------------------------------
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
Create a new column in the trees table
simply done with
alter table trees add column hash text;
and insert the hash into the column
Given a tree id and its hash, the last function inserts the hash into the corresponding tree