Neo4j Practice

Aura db

AuraDB offers free neo4j hosting.

Create an account on AuraDB and then create an instance.

Important, save the password, you won’t be able to see it afterwards

Aura Credentials

Click on the “Learning” button bottom left side of the bage,

Learning Aura

then select Query fundamentals

Query fundamentals

And connect to your instance

On page 2 of the tutorial, Load the Northwind dataset.

and click “run import”, upper right

What does the Load dataset do

we can see that multiple files have been uploaded

Query fundamentals

click on “show cypher” to understand how the data is loaded

cypher

The 1st cypher statements is

CYPHER 5 CREATE CONSTRAINT `territoryID_Territory_uniq` IF NOT EXISTS
FOR (n: `Territory`)
REQUIRE (n.`territoryID`) IS UNIQUE;

This creates a constraint named supplierID_Supplier_uniq that ensures uniqueness of the supplierID property across all nodes labeled Supplier. The constraint enforces data integrity and prevents duplicate suppliers in the dataset

the 2nd statement

Load statement

CYPHER 5 UNWIND $nodeRecords AS nodeRecord
WITH *
WHERE NOT nodeRecord.`supplierID` IN $idsToSkip AND NOT nodeRecord.`supplierID` IS NULL
MERGE (n: `Supplier` { `supplierID`: nodeRecord.`supplierID` })
SET n.`companyName` = nodeRecord.`companyName`
SET n.`contactName` = nodeRecord.`contactName`
SET n.`contactTitle` = nodeRecord.`contactTitle`
SET n.`address` = nodeRecord.`address`
SET n.`city` = nodeRecord.`city`
SET n.`region` = nodeRecord.`region`
SET n.`postalCode` = nodeRecord.`postalCode`
SET n.`country` = nodeRecord.`country`
SET n.`phone` = nodeRecord.`phone`
SET n.`fax` = nodeRecord.`fax`
SET n.`homePage` = nodeRecord.`homePage`;

Loads the dataset into the collection.

Node Creation/Update:

The subsequent SET statements populate all the supplier properties:

This is a typical bulk data loading pattern for Neo4j - you pass in an array of records as a parameter and process them all in a single transaction for efficiency.

The original $nodeRecords object is probably loaded with

LOAD CSV WITH HEADERS FROM 'file:///suppliers.csv' AS row
WITH collect({
  supplierID: toInteger(row.SupplierID),
  companyName: row.CompanyName,
  contactName: row.ContactName
  // ... other fields
}) AS nodeRecords

Then the UNWIND statement would follow

Loading a csv dataset

For nodes

3 steps:

  1. load the csv data into a node object
  2. [opt] create uniqueness constraints
  3. Unwind the node opbject to populate each element into the node collection

For relations

the cypher Load statement is

CYPHER 5 UNWIND $relRecords AS relRecord
MATCH (source: `Territory` { `territoryID`: relRecord.`territoryID` })
MATCH (target: `Region` { `regionID`: relRecord.`regionID` })
MERGE (source)-[r: `IN_REGION`]->(target);

We match the Ids of both sides of the relations with exiting node Ids.

And we must do that for every relations.

A lot more work than loading a csv or json into MongoDB, weaviate or even PostgresQL (although we need to create tables first).

Data loading prep

Data Preparation Challenges

we need

You can add constraints to enforce data integrity but schema validation really relies on upstream data cleaning and formatting work.

Including:

IDs can be of any type as long as they are unique.

Northwind dataset schema

The Northwind dataset is a fictional trading company database originally created by Microsoft for SQL Server tutorials. It represents a small trading company that imports and exports specialty foods.

Key entities in the Northwind dataset include:

The relationships between these entities demonstrate:

The relationships are defined as

Raw csv

The northwind dataset is available from the neo4j github repo: https://github.com/neo4j-contrib/northwind-neo4j

This other repo holds the csv files and the code to load the dataset https://github.com/neo4j-graph-examples/northwind

Schema viz

Once loaded you can see the whole schema with

CALL db.schema.visualization()

Northwind Schema

Let’s Explore

Explore nodes and rels

MATCH p=()-[:PURCHASED]->() RETURN p LIMIT 25;

Edit the query and remove the LIMIT 25 to see the whole graph with the PURCHASED relationship

Note the cypher query

MATCH (n:Product) RETURN n LIMIT 25;

Explore product node

A customer - purchased -> orders

So the cypher is

MATCH path=(c:Customer)-[pr:PURCHASED]->(o:Order) RETURN path LIMIT 25;

If you don’t want the graph but want to return the table instead, just return the entities and relations that you need from the query.

For instance if we want to return the customer’s contactName and companyName, the order’s orderID and shipName, the query becomes

MATCH (c:Customer)-[pr:PURCHASED]->(o:Order) RETURN c.contactName, c.companyName, o.orderID, o.shipName LIMIT 25;

and you get the table

Explore product node

Cypher query syntax

A basic Cypher query follows this structure:

MATCH [pattern] - Define what to find in the graph WHERE [conditions] - Optional: Filter the results RETURN [what to show] - Specify output LIMIT [number] - Optional: Restrict number of results

A MATCH pattern has three key components:

  1. Nodes: In parentheses ()
    • With labels: (p:Person)
    • Properties: (p:Person {name: “John”})
    • Anonymous: ()
  2. Relationships: In square brackets []
    • With types: -[:KNOWS]->
    • Direction: -> or <-
    • Anonymous: –
  3. Variables: For referencing
    • Node variables: (p:Person)
    • Relationship variables: -[r:KNOWS]->
    • Path variables: path=()–>()

Filtering, ordering, limiting, coalescing, … in cypher with the MATCH clause, is not that different from regular SQL queries syntax .

See the cheat sheet for the MATCH clause


Your turn

Knowing the relations:

Write the following queries for graph and tables

Returning more entities

This query returns the graph of suppliers who supply product > $50 and their categories.

MATCH path=(s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
WHERE p.unitPrice > 50
RETURN path

What if we also want to see the orders for these products ?

Using WITH

MATCH path=(s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
WHERE p.unitPrice > 50
WITH path, p
MATCH orderPath=(o:Order)-[: ORDERS ]->(p)
RETURN path, orderPath

this returns the first graph of suppliers, products and their categories. But also the orders for these products.

Back to practicing

Optional MATCH

OPTIONAL MATCH works like a regular MATCH but when no matching relationships/patterns are found, it returns NULL instead of filtering out the record.

This Returns ALL products with their orders (if they exist)

MATCH (p:Product)
WHERE p.productName = 'Chai'
OPTIONAL MATCH (p)<-[o:ORDERS]-()
RETURN p.productName, o;

Will show:

OPTIONAL MATCH is like a LEFT JOIN in SQL : it keeps records from the left side (Products) even when there’s no match on the right side (Orders).

So for instance looking at order counts

MATCH (p:Product)
WHERE p.produtName = "Aniseed Syrup"
OPTIONAL MATCH (p)<-[o:ORDERS]-()
RETURN p.productName,
       COUNT(o) as order_count;

Property Filters

These queries are equivalent and provide the same results.

MATCH (o:Order)-[:ORDERS]->(p:Product)
WHERE o.shipCity = 'London'
RETURN o.orderID, o.shipDate, p.productName, p.unitPrice

property pattern:

MATCH (o:Order { shipCity : 'London' })-[:ORDERS]->(p:Product)
RETURN o.orderID, o.shipDate, p.productName, p.unitPrice

The property pattern is slightly more performant as it filters earlier in the query execution. However, WHERE clauses offer more flexibility for complex conditions.

Back to you

MATCH (s:Supplier)-->(:Product)-->(c:Category)
RETURN s.companyName AS company, collect( DISTINCT c.categoryName) AS categories

Create new relationships with MERGE

The pattern to create relationship VERB with properties

MATCH <source to target relation>
WITH source, target, some way of counting
WHERE define threshold or some filtering
MERGE (source)<->r:VERB</-/>(target)
SET r.<property name> = <property value>

Let’s create new relationships : FREQUENTLY_BOUGHT_WITH: as in belong to the same order

Notice how you can point back to the product through the order.

First we find all the products that often (freq > 10) bought together.

MATCH (p1:Product)<-[:ORDERS]-(Order)-[:ORDERS]->(p2:Product)
WHERE p1 < p2
WITH p1, p2, count(*) AS frequency
WHERE frequency > 3
return p1.productName, p2.productName, frequency
ORDER BY frequency desc

Let’s use the results of that query to create a new relation called FREQUENTLY_BOUGHT_WITH and add the property weight = frequency

We do not specify any arrows which makes the relation bidirectional

MATCH (p1:Product)<-[:ORDERS]-(Order)-[:ORDERS]->(p2:Product)
WHERE p1 < p2
WITH p1, p2, count(*) AS frequency
WHERE frequency > 3
MERGE (p1)-[r:FREQUENTLY_BOUGHT_WITH]-(p2)
 SET r.weight= frequency

We see : Created 58 relationships, set 58 properties

now we can visualize the products using this relation

MATCH p=()-[:FREQUENTLY_BOUGHT_WITH]->() RETURN p

Your turn to create a new relationship

create new relations

Other example

MATCH (c:Customer)-[:PURCHASED]->(:Order)-[:ORDERS]->(p:Product)<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c2:Customer)
WHERE c < c2
// find similar customers
WITH c, c2, count(*) as similarity
// with at least 50 shared product purchases
WHERE similarity > 50
// create a relationship between the two without specifying direction
MERGE (c)-[sim:SIMILAR_TO]-(c2)
// set relationship weight from similairity
ON CREATE SET sim.weight = similarity