AuraDB offers free neo4j hosting.
Create an account on AuraDB and then create an instance.
Important, save the password, you won't be able to see it afterwards
Click on the "Learning" button bottom left side of the bage,
then select Query fundamentals
And connect to your instance
On page 2 of the tutorial, Load the Northwind dataset.
and click "run import", upper right
we can see that multiple files have been uploaded
click on "show cypher" to understand how the data is loaded
The 1st cypher statements is
CYPHER 5 CREATE CONSTRAINT `territoryID_Territory_uniq` IF NOT EXISTS
FOR (n: `Territory`)
REQUIRE (n.`territoryID`) IS UNIQUE;
This creates a constraint named supplierID_Supplier_uniq
that ensures uniqueness of the supplierID property across all nodes labeled Supplier.
The constraint enforces data integrity and prevents duplicate suppliers in the dataset
the 2nd statement
Load statement
CYPHER 5 UNWIND $nodeRecords AS nodeRecord
WITH *
WHERE NOT nodeRecord.`supplierID` IN $idsToSkip AND NOT nodeRecord.`supplierID` IS NULL
MERGE (n: `Supplier` { `supplierID`: nodeRecord.`supplierID` })
SET n.`companyName` = nodeRecord.`companyName`
SET n.`contactName` = nodeRecord.`contactName`
SET n.`contactTitle` = nodeRecord.`contactTitle`
SET n.`address` = nodeRecord.`address`
SET n.`city` = nodeRecord.`city`
SET n.`region` = nodeRecord.`region`
SET n.`postalCode` = nodeRecord.`postalCode`
SET n.`country` = nodeRecord.`country`
SET n.`phone` = nodeRecord.`phone`
SET n.`fax` = nodeRecord.`fax`
SET n.`homePage` = nodeRecord.`homePage`;
Loads the dataset into the collection.
UNWIND $nodeRecords AS nodeRecord : Takes a parameter array $nodeRecords and processes each record individuallyWHERE NOT nodeRecord.supplierID IN $idsToSkip and NOT nodeRecord.supplierID IS NULL - Filters out records that should be skipped (via $idsToSkip parameter) and records with null supplier IDsNode Creation/Update:
MERGE (n: Supplier{supplierID: nodeRecord.supplierID }): Either finds an existing Supplier node with the given supplierID or creates a new one if it doesn't existThe subsequent SET statements populate all the supplier properties:
companyName - The supplier's company name
contactName - Primary contact person ...
It uses MERGE instead of CREATE to handle potential duplicates gracefully: If a supplier with the same ID already exists, it will update the properties rather than create a duplicate The $idsToSkip parameter allows you to exclude certain suppliers from being processed (useful for incremental loads or handling problematic data)
This is a typical bulk data loading pattern for Neo4j - you pass in an array of records as a parameter and process them all in a single transaction for efficiency.
The original $nodeRecords object is probably loaded with
LOAD CSV WITH HEADERS FROM 'file:///suppliers.csv' AS row
WITH collect({
supplierID: toInteger(row.SupplierID),
companyName: row.CompanyName,
contactName: row.ContactName
// ... other fields
}) AS nodeRecords
Then the UNWIND statement would follow
3 steps:
the cypher Load statement is
CYPHER 5 UNWIND $relRecords AS relRecord
MATCH (source: `Territory` { `territoryID`: relRecord.`territoryID` })
MATCH (target: `Region` { `regionID`: relRecord.`regionID` })
MERGE (source)-[r: `IN_REGION`]->(target);
We match the Ids of both sides of the relations with exiting node Ids.
And we must do that for every relations.
A lot more work than loading a csv or json into MongoDB, weaviate or even PostgresQL (although we need to create tables first).
Data Preparation Challenges
we need
You can add constraints to enforce data integrity but schema validation really relies on upstream data cleaning and formatting work.
Including:
IDs can be of any type as long as they are unique.
The Northwind dataset is a fictional trading company database originally created by Microsoft for SQL Server tutorials. It represents a small trading company that imports and exports specialty foods.
Key entities in the Northwind dataset include:
The relationships between these entities demonstrate:
The relationships are defined as
The northwind dataset is available from the neo4j github repo: https://github.com/neo4j-contrib/northwind-neo4j
This other repo holds the csv files and the code to load the dataset https://github.com/neo4j-graph-examples/northwind
Once loaded you can see the whole schema with
CALL db.schema.visualization()

MATCH p=()-[:PURCHASED]->() RETURN p LIMIT 25;
Edit the query and remove the LIMIT 25 to see the whole graph with the PURCHASED relationship
Note the cypher query
MATCH (n:Product) RETURN n LIMIT 25;
A customer - purchased -> orders
So the cypher is
MATCH path=(c:Customer)-[pr:PURCHASED]->(o:Order) RETURN path LIMIT 25;
If you don't want the graph but want to return the table instead, just return the entities and relations that you need from the query.
For instance if we want to return the customer's contactName and companyName, the order's orderID and shipName, the query becomes
MATCH (c:Customer)-[pr:PURCHASED]->(o:Order) RETURN c.contactName, c.companyName, o.orderID, o.shipName LIMIT 25;
and you get the table
A basic Cypher query follows this structure:
MATCH [pattern] - Define what to find in the graph
WHERE [conditions] - Optional: Filter the results
RETURN [what to show] - Specify output
LIMIT [number] - Optional: Restrict number of results
A MATCH pattern has three key components:
Nodes: In parentheses ()
Relationships: In square brackets []
Variables: For referencing
Filtering, ordering, limiting, coalescing, ... in cypher with the MATCH clause, is not that different from regular SQL queries syntax .
See the cheat sheet for the MATCH clause
Knowing the relations:
Write the following queries for graph and tables
This query returns the graph of suppliers who supply product > $50 and their categories.
MATCH path=(s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
WHERE p.unitPrice > 50
RETURN path
What if we also want to see the orders for these products ?
Using WITH
MATCH path=(s:Supplier)-[:SUPPLIES]->(p:Product)-[:PART_OF]->(c:Category)
WHERE p.unitPrice > 50
WITH path, p
MATCH orderPath=(o:Order)-[: ORDERS ]->(p)
RETURN path, orderPath
this returns the first graph of suppliers, products and their categories. But also the orders for these products.
Which categories have the most expensive products?
max(p.unitPrice) AS maxPriceFind all orders shipped to London
find all the products shipped to London
Who are our top 5 customers by number of orders?
OPTIONAL MATCH works like a regular MATCH but when no matching relationships/patterns are found, it returns NULL instead of filtering out the record.
This Returns ALL products with their orders (if they exist)
MATCH (p:Product)
WHERE p.productName = 'Chai'
OPTIONAL MATCH (p)<-[o:ORDERS]-()
RETURN p.productName, o;
Will show:
OPTIONAL MATCH is like a LEFT JOIN in SQL : it keeps records from the left side (Products) even when there's no match on the right side (Orders).
So for instance looking at order counts
MATCH (p:Product)
WHERE p.produtName = "Aniseed Syrup"
OPTIONAL MATCH (p)<-[o:ORDERS]-()
RETURN p.productName,
COUNT(o) as order_count;
These queries are equivalent and provide the same results.
MATCH (o:Order)-[:ORDERS]->(p:Product)
WHERE o.shipCity = 'London'
RETURN o.orderID, o.shipDate, p.productName, p.unitPrice
property pattern:
MATCH (o:Order { shipCity : 'London' })-[:ORDERS]->(p:Product)
RETURN o.orderID, o.shipDate, p.productName, p.unitPrice
The property pattern is slightly more performant as it filters earlier in the query execution. However, WHERE clauses offer more flexibility for complex conditions.
find all the employees reporting to mr / mrs 'Fuller'
which product categories each supplier provides ?
MATCH (s:Supplier)-->(:Product)-->(c:Category)
RETURN s.companyName AS company, collect( DISTINCT c.categoryName) AS categories
The pattern to create relationship VERB with properties
MATCH <source to target relation>
WITH source, target, some way of counting
WHERE define threshold or some filtering
MERGE (source)<->r:VERB</-/>(target)
SET r.<property name> = <property value>
Let's create new relationships : FREQUENTLY_BOUGHT_WITH: as in belong to the same order
Notice how you can point back to the product through the order.
First we find all the products that often (freq > 10) bought together.
MATCH (p1:Product)<-[:ORDERS]-(Order)-[:ORDERS]->(p2:Product)
WHERE p1 < p2
WITH p1, p2, count(*) AS frequency
WHERE frequency > 3
return p1.productName, p2.productName, frequency
ORDER BY frequency desc
Let's use the results of that query to create a new relation called FREQUENTLY_BOUGHT_WITH
and add the property weight = frequency
We do not specify any arrows which makes the relation bidirectional
MATCH (p1:Product)<-[:ORDERS]-(Order)-[:ORDERS]->(p2:Product)
WHERE p1 < p2
WITH p1, p2, count(*) AS frequency
WHERE frequency > 3
MERGE (p1)-[r:FREQUENTLY_BOUGHT_WITH]-(p2)
SET r.weight= frequency
We see : Created 58 relationships, set 58 properties
now we can visualize the products using this relation
MATCH p=()-[:FREQUENTLY_BOUGHT_WITH]->() RETURN p
create new relations
MATCH (c:Customer)-[:PURCHASED]->(:Order)-[:ORDERS]->(p:Product)<-[:ORDERS]-(:Order)<-[:PURCHASED]-(c2:Customer)
WHERE c < c2
// find similar customers
WITH c, c2, count(*) as similarity
// with at least 50 shared product purchases
WHERE similarity > 50
// create a relationship between the two without specifying direction
MERGE (c)-[sim:SIMILAR_TO]-(c2)
// set relationship weight from similairity
ON CREATE SET sim.weight = similarity