SHA256 Functions Worksheet

SHA256, pgcrypto, hashing, data integrity

Worksheet: Using Functions in PL/pgSQL

In this exercise, the goal is to generate a unique sha256 hash string for each tree entry in the treesdb_v03 database. The SHA hash string that will be generated takes multiple columns describing tree characteristics as input.

sha256 in PostgreSQL

The sha256 function in PostgreSQL is used to generate a cryptographic hash of data,. It is part of the pgcrypto extension, which provides various cryptographic functions, including hashing algorithms.

To use the sha256 function, you utilize the digest() function provided by pgcrypto. It requires two input parameters:

  1. Data: The string or data to be hashed.
  2. Algorithm: The name of the hash algorithm, in this case, 'sha256'.

Before using the function, you must install the pgcrypto extension, which can be done by running:

CREATE EXTENSION pgcrypto;

Once installed, you can hash a string as shown in the following example:

SELECT encode(digest('example_string', 'sha256'), 'hex') AS sha256_hash;

This example generates the SHA-256 hash of the string example_string and returns it in a human-readable hexadecimal format.

SELECT encode(digest('example_string', 'sha256'), 'hex') AS sha256_hash;
                           sha256_hash
------------------------------------------------------------------
 9bfb55a8406617ff3e6767ec5d27fc6b5682c3a79c415ef6e084bf7d050273e6

Benefits of sha256

Hashing the columns that fully describe a tree (such as name, domain, genre, species, variety, arrondissement) has several advantages:

1. Unique Identifier Creation

Very useful for content that varies a lot in length like articles, blog posts, etc

2. Efficient Comparison

Hashing a record also helps with:

In summary, hashing provides a compact, efficient, and secure way to uniquely identify and handle records in a database.

Your task

1. Concatenate Categorical Columns

2. Generate SHA256 Hash

You can use \df+ digest to find which version of the function you want to inspect. and then

SELECT pg_catalog.pg_get_functiondef('digest(bytea, text)'::regprocedure);

To see the definition for digest(bytea, text).

3. Create SQL Function for Tree Hash

4. Add sha_id Column

This column will store the hash values for each tree.

5. Modify Function: Insert Hash Value

6. Test the Function on Several Trees

7. Update Function: Handle Subset of Trees

8. Ensure Uniqueness of Hash Values

9. PL/pgSQL Function: Generate Hash for All Trees

Additional Notes