In this exercise, the goal is to generate a unique sha256 hash string for each tree entry in the treesdb_v03 database.
The SHA hash string that will be generated takes multiple columns describing tree characteristics as input.
sha256 in PostgreSQLThe sha256 function in PostgreSQL is used to generate a cryptographic hash of data,. It is part of the pgcrypto extension, which provides various cryptographic functions, including hashing algorithms.
To use the sha256 function, you utilize the digest() function provided by pgcrypto. It requires two input parameters:
'sha256'.Before using the function, you must install the pgcrypto extension, which can be done by running:
CREATE EXTENSION pgcrypto;
Once installed, you can hash a string as shown in the following example:
SELECT encode(digest('example_string', 'sha256'), 'hex') AS sha256_hash;
This example generates the SHA-256 hash of the string example_string and returns it in a human-readable hexadecimal format.
SELECT encode(digest('example_string', 'sha256'), 'hex') AS sha256_hash;
sha256_hash
------------------------------------------------------------------
9bfb55a8406617ff3e6767ec5d27fc6b5682c3a79c415ef6e084bf7d050273e6
Hashing the columns that fully describe a tree (such as name, domain, genre, species, variety, arrondissement) has several advantages:
sha256, you create a unique, fixed-length identifier (hash) for each tree. This ensures that even with varying column lengths, the hash size remains constant.Very useful for content that varies a lot in length like articles, blog posts, etc
Hashing a record also helps with:
In summary, hashing provides a compact, efficient, and secure way to uniquely identify and handle records in a database.
name, domain, genre, species, variety, arrondissement.
COALESCE to replace any NULL values with 'UNK' in the concatenated result.COALESCE(column_name, 'UNK').digest and encode functions definitionYou can use \df+ digest to find which version of the function you want to inspect.
and then
SELECT pg_catalog.pg_get_functiondef('digest(bytea, text)'::regprocedure);
To see the definition for digest(bytea, text).
sha256 function.sha256 hash string.id as input.sha256 hash generated from the concatenated columns of that tree.sha_id Columnsha_id of type text to the trees table .This column will store the hash values for each tree.
sha_id column.sha_id column.ids or a query that selects a subset of trees.sha_id column only contains unique values.PL/pgSQL function that:
sha256 hash on all rows in the database.sha_id column.NULL values in categorical columns.sha_id values.RAISE NOTICE statements within PL/pgSQL to help with debugging.