Writing Scalable Functions

function design, PL/pgSQL, best practices, exception handling

Writing functions

TOC

Context

In the last session we saw how to write SQL functions and PL/pgSQL functions.

To start today’s class I’d like to go over

Functions that scale

How would you define what is a well written function ?

We can look at 2 aspects to begin with

Naming

A good way to name function is to have its name that starts with an action verb followed by the name of the object it acts upon.

For instance

Some additional best practices for naming functions:

Single responsibility principle

This is super important

The SRP comes down to :

A function has one job and one job only !

Don’t

… in the same function

Read more about the SRP on wikipedia

How to write a good prompt ?

Which leads me to good prompting techniques

Working with an LLM can be frustrating and depressing. Not because the answers are wrong (might be when it comes to coding) but because if done without effort, lazily, it kills the pleasure we get from creative work.

Instead of simply asking a questions and copy pasting the answer, you need to really engage with the LLM.

Difference between

from ;

What are best practices for naming functions?

to

I’m looking into best practices to name a function in a high level programming language (python, … or even PL/pgSQL).

My advice would be

  • the function name starts with a verb followed by the name of the object it acts upon.

What do you think ?

In short

What are your techniques for writing efficient prompts ?

PL/pgSQL functions

The structure of a PL/pgSQL function

The following sections are taken from https://www.postgresqltutorial.com/postgresql-plpgsql/plpgsql-block-structure/. Please refer to this site for further details.

Blocks

PL/pgSQL blocks have a structure that includes a declaration section (optional) and an execution section (mandatory). Inside a block, variables are declared first, followed by the logic executed. Blocks can be nested, with labels for referencing.

Example:

DO $$
DECLARE
    total_sales NUMERIC;
BEGIN
    SELECT SUM(amount) INTO total_sales FROM sales;
    RAISE NOTICE 'Total Sales: %', total_sales;
END $$;

Here, total_sales is declared, a query calculates the total sales, and the result is displayed using RAISE NOTICE.

Variables

In PL/pgSQL, variables are declared in the declaration section of a block and hold temporary data. You can initialize them upon declaration using := or by copying another variable’s data type using %TYPE. Variables can have different scopes depending on where they’re declared.

Example:

CREATE OR REPLACE FUNCTION calculate_discount()
RETURNS NUMERIC AS $$
DECLARE
    total_sales NUMERIC := 0;
    discount NUMERIC;
BEGIN
    SELECT SUM(amount) INTO total_sales FROM sales;
    discount := total_sales * 0.1;
    RAISE NOTICE 'Total Sales: %, Discount: %', total_sales, discount;
    RETURN discount;
END $$
LANGUAGE plpgsql;

This declares and initializes variables and uses them in calculations.

Select INTO

In PL/pgSQL, the SELECT INTO statement assigns the result of a query to variables. This is useful when retrieving a single value or multiple columns into variables within a function. You can use SELECT INTO to process values and perform further calculations or return the result.

Example: Total Play Count for a Spotify Artist

CREATE OR REPLACE FUNCTION get_artist_play_count(artist_name TEXT)
RETURNS NUMERIC AS $$
DECLARE
    total_plays NUMERIC;
BEGIN
    SELECT SUM(play_count) INTO total_plays
    FROM tracks
    WHERE artist = artist_name;

    RAISE NOTICE 'Total plays for %: %', artist_name, total_plays;
    RETURN total_plays;
END $$
LANGUAGE plpgsql;

This function calculates the total play count for an artist in the Spotify database by selecting the sum of play_count for all tracks associated with the given artist name.

RECORD Types

In PL/pgSQL, record types allow you to handle rows of data without needing to define a fixed structure. A record can hold a row from any table or query, and its fields are dynamically assigned at runtime. You can use %ROWTYPE to copy the structure of a table.

This function retrieves the top artist by play count from a Spotify-like database.

CREATE OR REPLACE FUNCTION get_top_artist()
RETURNS RECORD AS $$
DECLARE
    artist_record RECORD;
BEGIN
    SELECT artist_name, COUNT(*) as plays INTO artist_record
    FROM spotify_tracks
    GROUP BY artist_name
    ORDER BY plays DESC
    LIMIT 1;
    RETURN artist_record;
END $$
LANGUAGE plpgsql;

Declaring constant

To define a constant in PL/pgSQL, you use the following syntax:

constant_name constant data_type = expression;

RAISING NOTICE and Exceptions

In PL/pgSQL, errors and messages are handled using RAISE to display notices, warnings, or errors. You can also use error-handling blocks with EXCEPTION. Messages provide valuable debugging information.

Here’s a function example related to a Spotify-like database:

CREATE OR REPLACE FUNCTION calculate_streaming_revenue(artist_id INT)
RETURNS NUMERIC AS $$
DECLARE
    total_revenue NUMERIC := 0;
BEGIN
    BEGIN
        SELECT SUM(streams * revenue_per_stream) INTO total_revenue
        FROM tracks WHERE artist = artist_id;

        IF total_revenue IS NULL THEN
            RAISE NOTICE 'No streams found for artist ID: %', artist_id;
            RETURN 0;
        ELSE
            RAISE NOTICE 'Total Revenue for Artist ID %: %', artist_id, total_revenue;
            RETURN total_revenue;
        END IF;

    EXCEPTION
        WHEN OTHERS THEN
            RAISE EXCEPTION 'Error calculating revenue for artist ID %', artist_id;
    END;
END $$
LANGUAGE plpgsql;

This function

If Then Else

In PL/pgSQL, IF-THEN-ELSE statements allow conditional logic. You can check conditions and perform different actions depending on the result.

Example:

CREATE OR REPLACE FUNCTION get_artist_popularity(artist_id INT)
RETURNS TEXT AS $$
DECLARE
    popularity INT;
BEGIN
    SELECT popularity INTO popularity FROM artists WHERE id = artist_id;

    IF popularity > 80 THEN
        RETURN 'Very Popular';
    ELSIF popularity > 50 THEN
        RETURN 'Moderately Popular';
    ELSE
        RETURN 'Less Popular';
    END IF;
END $$
LANGUAGE plpgsql;

This function returns an artist’s popularity description based on their popularity score.

Looping

In PL/pgSQL, loops allow you to repeat actions multiple times. You can use basic loops, FOR, WHILE, and EXIT to control the flow.

Example:

CREATE OR REPLACE FUNCTION get_top_streams(p_artist_id INT)
RETURNS INT AS $$
DECLARE
    v_total_streams INT := 0;
    v_track RECORD;
BEGIN
    FOR v_track IN SELECT streams FROM tracks WHERE artist_id = p_artist_id LOOP
        v_total_streams := v_total_streams + v_track.streams;
    END LOOP;
    RETURN v_total_streams;
END $$
LANGUAGE plpgsql;

This function loops through an artist’s tracks to calculate the total streams.

Function overloading

In PL/pgSQL, function overloading allows you to create multiple functions with the same name but different parameter types or numbers. PostgreSQL automatically selects the appropriate function based on the provided arguments.

Example:

CREATE OR REPLACE FUNCTION get_artist_revenue(p_artist_id INT)
RETURNS NUMERIC AS $$
BEGIN
    RETURN (SELECT SUM(streams * revenue_per_stream) FROM tracks WHERE artist_id = p_artist_id);
END $$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION get_artist_revenue(p_artist_name TEXT)
RETURNS NUMERIC AS $$
BEGIN
    RETURN (SELECT SUM(streams * revenue_per_stream) FROM tracks WHERE artist = p_artist_name);
END $$
LANGUAGE plpgsql;

This example overloads the get_artist_revenue function to accept either an artist’s ID or name.

Naming parameters and variables

In PL/pgSQL, using prefixes like v_ for variables and p_ for parameters helps distinguish between the two and makes the code more readable. This convention avoids confusion when variables and parameters have similar names, making it clear which are local to the function and which are passed in.

For example:

Other common prefixes include r_ for records and c_ for constants.

Practice

Let’s write the function or set of functions that creats the hash from the tree attributes.

For the sake of simplicity we will use the non normalized version of the treesdb: treesdb_v02

Connect to treesdb_v02 database on your local

The goal is to create a sha256 generated hash string for each tree using a concatenation of some of its attributes: genre, varieties, species, and longitude and latitude (geolocation[0], geolocation[1])

We write a series of functions. Each with a unique task

Viewing existing functions

You can view the functions you created and their input types with \df

and their definition with \sf <function_name>

concatenation

CREATE OR REPLACE FUNCTION concat_tree_attributes(p_tree_id integer)
RETURNS text AS
$$
DECLARE
    v_concatenated_text text;
    v_genre text;
    v_species text;
    v_variety text;
    v_lon text;
    v_lat text;
BEGIN
    -- Fetch the tree attributes
    SELECT COALESCE(genre, 'UNK'),
           COALESCE(species, 'UNK'),
           COALESCE(variety, 'UNK'),
           COALESCE(CAST(geolocation[0] AS text), 'UNK'),
           COALESCE(CAST(geolocation[1] AS text), 'UNK')
    INTO v_genre, v_species, v_variety, v_lon, v_lat
    FROM public.trees
    WHERE id = p_tree_id;

    -- Concatenate the attributes with '-' and replace spaces with '_'
    v_concatenated_text := CONCAT_WS('-',v_genre, v_species, v_variety, v_lon, v_lat);

    v_concatenated_text := REPLACE(v_concatenated_text , ' ', '_');

    RETURN v_concatenated_text;
END;
$$
LANGUAGE plpgsql;

In this function, identify

Test test test and test again

It’s important to always test your functions.

At least on a few examples. Take a random tree_id and look at the output of the function.

select concat_tree_attributes(88);
                concat_tree_attributes
-------------------------------------------------------
 Prunus-UNK-Kanzan-2.387247686120145-48.86610339939845

ok now the second function. But first let’s check that the sha256 extension is installed

Activate Extension

The sha256 function is part of the pgcrypto extension

You can list installed extension with

\dx

On treesdb_v02 you should just see PL/pgSQl.

\dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language

Activate the pgcrypto extension with

CREATE EXTENSION pgcrypto;

and now it’s listed

\dx
                  List of installed extensions
   Name   | Version |   Schema   |         Description
----------+---------+------------+------------------------------
 pgcrypto | 1.3     | public     | cryptographic functions
 plpgsql  | 1.0     | pg_catalog | PL/pgSQL procedural language

To list all extensions available for installation

select * from pg_available_extensions order by name;

which returns

        name        | default_version | installed_version |                  comment
--------------------+-----------------+-------------------+--------------------------------------------------
 adminpack          | 2.1             | [null]            | administrative functions for PostgreSQL
 amcheck            | 1.3             | [null]            | functions for verifying relation integrity
 autoinc            | 1.0             | [null]            | functions for autoincrementing fields
 bloom              | 1.0             | [null]            | bloom access method - signature file based index
 ...

generate hash

CREATE OR REPLACE FUNCTION generate_sha256_hash(p_input_text text)
RETURNS text AS
$$
DECLARE
    v_hashed_text text;
BEGIN
    -- Generate SHA-256 hash and convert it to a hexadecimal string
    v_hashed_text := encode(digest(p_input_text, 'sha256'), 'hex');

    RETURN v_hashed_text;
END;
$$
LANGUAGE plpgsql;

Note

Listing digest and encode with \dx, why do we need to encode the output of digest?

test test test

Your function should return:

select generate_sha256_hash('hello world');
                       generate_sha256_hash
------------------------------------------------------------------
 b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9

Create a new column in the trees table

simply done with

alter table trees add column hash text;

and insert the hash into the column

Given a tree id and its hash, the last function inserts the hash into the corresponding tree