SQL Functions COALESCE and Strings
COALESCE, NULL handling, string functions, volatility
Functions
We’ve used functions in PostgreSQL before.
Simple math functions like SUM(), AVG(), MAX(), MIN() etc
In this document we look at important SQL functions you should know about.
Then we create functions based on SELECT statements.
- how to create a new SQL function
- difference in function volatility
- function overloading
- benefits of functions over simple queries
SQL functions use simple queries. They are efficient but lack any logic control such as LOOPs and variables.
For that we need a procedural language such as PL/pgSQL.
Important SQL functions
COALESCE
see https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL
The COALESCE function returns the first of its arguments that is not null.
If all arguments are null, COALESCE returns null.
select COALESCE(1,2,3,4,5,6) -> 1
select COALESCE(NULL,2) -> 2
select COALESCE(NULL,NULL) -> NULL
If the last column is a string COALESCE will always return that string if the other arguments are NULL.
So COALESCE is often used to substitute a default value for null values when data is retrieved for display, for example:
This query returns description
if it is not null, otherwise short_description
if it is not null, otherwise (none)
.
SELECT COALESCE(description, short_description, '(none)')
This query returns unknown
when domain
is null
SELECT COALESCE(domain, 'unknown' ) from trees; --
Practice
We’re working with the normalized version of the trees database. treesdb_v03
Using the trees table. select a random number of trees, (id
, height
, remarkable
).
If the column remarkable
is null, display unknown
instead
Note: all arguments of
COALESCE
need to have the same type. To cast a Boolean into a text usename_of_boolean_column::text
.
[solution]
<your query>
CONCAT function
To concatenate string you can use the CONCAT
function which concatenates the text representations of all the arguments and ignores NULL arguments.
concat ( val1 "any" [, val2 "any" [, ...] ] ) → text
For instance:
concat('abcde', 2, NULL, 22) → abcde222
the ||
symbol
You can also use the ||
symbol to concatenate strings.
text || text → text
Concatenates the two strings.
'Post' || 'greSQL' → PostgreSQL
Exercise
Using the treesdb_v03
, concatenate multiple columns into one
columns are :
- name
- genre
- species
- variety
- domain
- arrondissement
also
- if one value is null, replace with
UNK
, usingCOALESCE
. - use
concat_ws
to add a ‘-‘ (upper dash) between all the columns - replace all the spaces with a column value with ‘_’ with
REPLACE()
see https://www.postgresql.org/docs/current/functions-string.html
Note: you don’t have to return all the rows each time. You can limit to N random() rows;
Note: first build the main query that returns all the columns, then coalesce, then concatenate, then replace
<your query>
Create your own SQL function
You can create a SQL function with:
CREATE [OR REPLACE] FUNCTION function_name(parameter1 type, parameter2 type, ...)
RETURNS return_type
AS $$
SQL_statement()
$$ LANGUAGE SQL;
where SQL_STATEMENT
is a SQL query.
For instance
CREATE OR REPLACE FUNCTION add_numbers(a integer, b integer)
RETURNS integer
AS $$
SELECT a + b;
$$ LANGUAGE SQL;
which you can then use directly with :
SELECT add_numbers(5, 3);
Key points about SQL vs PL/pgSQL functions
- Performance: SQL functions can often be inlined by the query planner, potentially leading to better performance than PL/pgSQL functions for simple operations.
- Simplicity: They’re easier to write and understand for straightforward database operations.
- Limitations: SQL functions have limited procedural capabilities compared to PL/pgSQL. They can’t use variables, loops, or complex control structures.
Why use Functions instead of SQL statement
If SQL functions are just simple queries, why use them ? Why not write the SQL query instead ?
Using a simple SQL function to insert a new row versus using a direct INSERT
query has a few potential benefits and trade-offs, depending on your use case. Here are some points to consider:
Benefits of using a SQL function for INSERT operations:
- Reusability:
- If you frequently need to insert rows with similar logic (such as setting default values or checking conditions), you can encapsulate that logic in a function. This avoids writing the same
INSERT
query multiple times in different parts of your application. - Example: If you always insert a customer and automatically assign a default discount rate, this can be done inside a function.
- If you frequently need to insert rows with similar logic (such as setting default values or checking conditions), you can encapsulate that logic in a function. This avoids writing the same
- Abstraction:
- A function can abstract the underlying structure of the table. If you ever change the schema of the table (e.g., adding new columns), you can modify the function rather than updating every
INSERT
query across your application. - Example: If a column is added to a table, the function can take care of inserting default or derived values without needing to update application code.
- A function can abstract the underlying structure of the table. If you ever change the schema of the table (e.g., adding new columns), you can modify the function rather than updating every
- Encapsulation of Logic:
- Functions can enforce certain business logic, such as constraints or transformations, before performing the insert. For example, a function could check conditions or modify input data before inserting it, even though you’re using a simple SQL function.
- Example: A function could perform an
INSERT
only if a certain condition is met, like checking if the record already exists in the table.
- Parameter Handling:
- A function provides a simple interface where you pass parameters, and the function takes care of handling them, making the code cleaner when used from an external application.
- Security and Permissions:
- You can grant users or roles permission to execute the function without giving them direct
INSERT
permissions on the table. This can be useful for controlling access and enforcing specific behavior. - Example: You allow users to call the
insert_customer
function but do not give them directINSERT
privileges on thecustomers
table.
- You can grant users or roles permission to execute the function without giving them direct
- Transaction Control:
- Functions can be part of larger transactions. When used in combination with other database operations, a function call can contribute to an atomic transaction (all changes succeed or fail together), even in simple SQL functions.
Trade-offs of using a SQL function instead of a direct INSERT query:
- Overhead:
- There might be a slight performance overhead when calling a function compared to running a direct
INSERT
query, as the function adds an extra layer of abstraction. - For simple use cases, a direct
INSERT
query might be faster.
- There might be a slight performance overhead when calling a function compared to running a direct
- Complexity:
- If the function is very simple (e.g., just a one-liner
INSERT
without additional logic), the benefit of using a function might be minimal, and a directINSERT
query could suffice.
- If the function is very simple (e.g., just a one-liner
- Less Flexibility (in simple SQL functions):
- Simple SQL functions are limited in terms of logic and error handling compared to a direct query that you can wrap in more complex procedural code (e.g., using PL/pgSQL or an external programming language).
- Debugging and Transparency:
- If an error occurs, tracking down issues might be more complex when they are wrapped inside a function, compared to a direct
INSERT
where you can see exactly what query is being executed.
- If an error occurs, tracking down issues might be more complex when they are wrapped inside a function, compared to a direct
Example
Using a function:
CREATE OR REPLACE FUNCTION insert_customer(c_name TEXT, c_email TEXT)
RETURNS VOID AS $$
INSERT INTO customers (name, email)
VALUES (c_name, c_email);
$$ LANGUAGE sql;
-- Then you can call it:
SELECT insert_customer('John Doe', '[email protected]');
Direct query:
INSERT INTO customers (name, email)
VALUES ('John Doe', '[email protected]');
When to prefer functions over direct queries
- If you plan to reuse the logic multiple times across your codebase.
- If you need to enforce business rules or perform additional transformations.
- If you want to abstract the underlying table structure or schema.
- If you need to provide controlled access through permissions.
When to prefer direct queries
- For one-off or very simple inserts.
- When you want maximum performance and have no additional logic to handle.
In summary, functions provide reusability, encapsulation, and the ability to enforce logic, but if your insert operation is straightforward with no additional logic, a direct INSERT
query is sufficient.
Optimize SQL functions by setting their volatility
In PostgreSQL, function volatility categories (VOLATILE, STABLE, and IMMUTABLE) are important concepts that inform the query planner about the behavior of functions.
Setting the volatility of the SQL function helps the query planner optimize it.
To specify if a function should be considered as VOLATILE, STABLE or IMMUTABLE, you add the keyword after
$$ LANGUAGE SQL <VOLATILE | STABLE | IMMUTABLE>;
VOLATILE Functions
- Default category if not specified
- Behavior:
- Can modify the database
- Can return different results on successive calls with the same arguments
- Examples:
- Functions that use random number generation
- Functions that modify database state
- Functions that depend on current time
Here’s an example of a VOLATILE function:
CREATE FUNCTION get_random_number() RETURNS integer AS $$
SELECT floor(random() * 100)::integer;
$$ LANGUAGE SQL VOLATILE;
STABLE Functions
- Behavior:
- Cannot modify the database
- For the same input arguments, will return the same result within a single table scan
- Results might change across SQL statements
- Examples:
- Functions that depend on database content (but not on values that change within a single query)
- Functions that depend on configuration settings
- Current timestamp functions (within a transaction)
Here’s an example of a STABLE function:
CREATE FUNCTION get_current_timestamp() RETURNS timestamp AS $$
SELECT current_timestamp;
$$ LANGUAGE SQL STABLE;
-- Usage:
SELECT get_current_timestamp();
IMMUTABLE Functions
- Behavior:
- Cannot modify the database
- Always return the same result for the same input arguments
- Results do not depend on any database content or external state
- Examples:
- Mathematical functions
- String manipulation functions that don’t depend on database settings
Here’s an example of an IMMUTABLE function:
CREATE FUNCTION add_numbers(a integer, b integer) RETURNS integer AS $$
SELECT a + b;
$$ LANGUAGE SQL IMMUTABLE;
Key Points:
- Optimization:
- The query planner can optimize STABLE and IMMUTABLE functions more effectively than VOLATILE functions.
- IMMUTABLE function calls can be pre-computed during query planning if their arguments are constant.
- STABLE functions can be optimized in WHERE clauses and joins, as the planner knows their value won’t change within a query.
- Default Behavior:
- Functions are considered VOLATILE by default unless specified otherwise.
- Performance Impact:
- Correctly categorizing your functions can lead to significant performance improvements in complex queries.
- Incorrect Categorization:
- Marking a function as STABLE or IMMUTABLE when it’s actually VOLATILE can lead to incorrect query results.
- It’s safer to mark a function as more volatile than it actually is (e.g., STABLE instead of IMMUTABLE) if you’re unsure.
- Usage in Indexes:
- Only IMMUTABLE functions can be used directly in index definitions.
Here’s an example demonstrating how volatility affects query planning:
CREATE TABLE items (id serial PRIMARY KEY, price numeric);
-- VOLATILE function (default)
CREATE FUNCTION get_tax_rate() RETURNS numeric AS $$
SELECT 0.1; -- Assume this could change
$$ LANGUAGE SQL;
-- IMMUTABLE function
CREATE FUNCTION calculate_tax(price numeric) RETURNS numeric AS $$
SELECT price * 0.1;
$$ LANGUAGE SQL IMMUTABLE;
-- This query will recalculate get_tax_rate() for every row
EXPLAIN ANALYZE SELECT id, price * get_tax_rate() FROM items;
-- This query can optimize the calculate_tax() call
EXPLAIN ANALYZE SELECT id, calculate_tax(price) FROM items;
In the second query, the planner might be able to compute calculate_tax
just once if it determines that this optimization is beneficial.
Rule of thumb:
- VOLATILE: Use when the function can modify database state or its result can change even with the same inputs (e.g., random(), current_timestamp).
- STABLE: Use when the function doesn’t modify the database and returns consistent results for the same inputs within a single table scan, but may change across statements (e.g., now(), current_date).
- IMMUTABLE: Use when the function always returns the same result for the same arguments, regardless of any database or external state (e.g., abs(), lower()).
It’s safer to choose the more volatile option. Incorrectly marking a function as less volatile than it actually is can lead to incorrect query results, while being more conservative only potentially misses some optimization opportunities.