The POLE dataset is a crime-investigation dataset available in the Neo4j Aura sample-dataset tutorials.
POLE stands for Person, Object, Location, Event - a standard data model used in policing and investigative work. This version of the dataset uses crime data from Manchester, U.K.
The dataset contains 29,000 crimes at 15,000 locations, generating 106,000 relationships between the nodes, built from publicly available Greater Manchester street-level crime data.
The POLE data model focuses on four basic entity types and the relationships between them: Persons, Objects, Locations, and Events.
The model uses relationships such as "lives with" and "party to" to reveal deep and complex networks of connections, obscure family relationships, and social associations.
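As a taste of what that makes possible, here is a minimal sketch of a "who is connected to whom through a crime" query. The relationship type names in the imported graph may differ from the plain-English names above, so the pattern is left untyped; CALL db.schema.visualization() shows the actual schema after the import.
// People connected to the same crime, and how each of them is connected
MATCH (p1:Person)-[r1]-(c:Crime)-[r2]-(p2:Person)
WHERE p1 <> p2
RETURN p1.name, type(r1), c.type, type(r2), p2.name
LIMIT 10;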
Important! Loading the dataset in Aura will ask us to select an instance. Since we're on the free tier, we only have one instance, which is already populated with the Northwind data, and importing the new dataset would just add the new nodes and relationships on top of the existing ones.
So before importing the new data, we need to remove the data already in the instance.
// Delete all relationships first (directed pattern, so each relationship matches only once)
MATCH ()-[r]->()
DELETE r;
// Then delete all nodes
MATCH (n)
DELETE n;
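If the instance holds a lot of data, deleting everything in a single transaction can run out of memory. A batched sketch, assuming a recent Neo4j version (4.4+ / Aura) where CALL { ... } IN TRANSACTIONS is available (in Browser it may need the :auto prefix):
// Delete nodes (and any remaining relationships) in batches of 10,000
MATCH (n)
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS;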
We must also remove the existing constraints and indexes.
To do it manually, we first list them with
SHOW CONSTRAINTS;
SHOW INDEXES;
and then, for each one returned, run
DROP CONSTRAINT constraint_name_here;
DROP INDEX index_name_here;
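To avoid typing every name by hand, one trick is to have Cypher generate the DROP statements and then run the output; a sketch, assuming SHOW ... YIELD is available (Neo4j 4.3+):
// Generate one DROP statement per constraint
SHOW CONSTRAINTS YIELD name
RETURN "DROP CONSTRAINT " + name + " IF EXISTS;" AS statement;
// And the same for indexes (skipping the built-in LOOKUP indexes, which should stay)
SHOW INDEXES YIELD name, type
WHERE type <> "LOOKUP"
RETURN "DROP INDEX " + name + " IF EXISTS;" AS statement;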
There's a way to do it in one line with the APOC library, but on Aura it's more complicated.
APOC stands for "Awesome Procedures On Cypher" - it's a comprehensive library of procedures and functions that extends Neo4j's capabilities far beyond what's available in standard Cypher.
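For the record, the APOC one-liner looks like this; it drops every index and constraint not listed in the two (empty) maps. This assumes apoc.schema.assert is among the APOC procedures enabled on the instance, which may not be the case on Aura.
// Drop all indexes and constraints in one call (APOC)
CALL apoc.schema.assert({}, {});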
In the end, the easiest way to start fresh with a new dataset is simply to delete the instance and create a new one.
The dataset is available at https://github.com/neo4j-graph-examples/pole as a Neo4j dump, and in the Aura tutorials.
Go to https://console-preview.neo4j.io/guides/sample-datasets and click on the Crime investigation card,
then follow the tutorial until it asks you to load the dataset.
Click on "import POLE",
and then click on "run Import" (upper right side).
// Verify the import: count all Crime nodes
MATCH (c:Crime)
RETURN count(c) as total_crimes;
// Break the crimes down by type
MATCH (c:Crime)
RETURN c.type as crime_type, count(c) as count
ORDER BY count DESC;
// Crimes that occurred at locations whose postcode starts with BL3
MATCH (c:Crime)-[:OCCURRED_AT]->(l:Location)
WHERE l.postcode STARTS WITH "BL3"
RETURN c.id, c.type, l.postcode, l.address
ORDER BY l.postcode;
// People whose name starts with "John"
MATCH (p:Person)
WHERE p.name STARTS WITH "John"
RETURN p.name, p.id
ORDER BY p.name;
// People connected to burglaries, and the type of their connection
MATCH (p:Person)-[r]-(c:Crime)
WHERE c.type = "Burglary"
RETURN p.name, type(r) as relationship_type, c.id
ORDER BY p.name;
// Locations with more than 5 recorded crimes, listing every crime type seen there
MATCH (l:Location)<-[:OCCURRED_AT]-(c:Crime)
WITH l, count(c) as crime_count, collect(c.type) as crime_types
WHERE crime_count > 5
RETURN l.address, crime_count,
crime_types as all_crimes
ORDER BY crime_count DESC;
// Same report, but with the genuinely most common crime type per location:
// group by type first and sort by frequency, so head() of the collected list is the most common type
MATCH (l:Location)<-[:OCCURRED_AT]-(c:Crime)
WITH l, c.type as type, count(*) as type_count
ORDER BY type_count DESC
WITH l, collect(type) as crime_types, sum(type_count) as crime_count
WHERE crime_count > 5
RETURN l.address, crime_count,
crime_types as all_crimes,
head(crime_types) as most_common_type
ORDER BY crime_count DESC;
// Officers with the heaviest caseloads, alongside the average caseload per officer
// (the average must be aggregated across all officers first, then re-expanded with UNWIND)
MATCH (o:Officer)-[:INVESTIGATED]->(c:Crime)
WITH o, count(c) as cases_handled
WITH collect({officer: o.name, cases: cases_handled}) as per_officer,
round(avg(cases_handled)) as avg_cases_per_officer
UNWIND per_officer as row
RETURN row.officer as officer, row.cases as cases_handled, avg_cases_per_officer
ORDER BY cases_handled DESC
LIMIT 10;
// Pairs of people who know each other and are both linked to the same crime
MATCH (p1:Person)-[:KNOWS]-(p2:Person)
MATCH (p1)-[r1]-(c:Crime)-[r2]-(p2)
RETURN p1.name, p2.name, c.id, c.type, type(r1), type(r2)
ORDER BY c.id;
// Crime counts per crime type within each postcode area (first 4 characters of the postcode)
MATCH (c:Crime)-[:OCCURRED_AT]->(l:Location)
WHERE l.postcode IS NOT NULL
WITH substring(l.postcode, 0, 4) as area, c.type as crime_type, count(c) as crime_count
RETURN area, crime_type, crime_count
ORDER BY area, crime_count DESC;
// Open cases: crimes whose outcome mentions an ongoing investigation, with the investigating officer
MATCH (c:Crime)<-[:INVESTIGATED]-(o:Officer)
WHERE c.outcome CONTAINS "investigation" OR c.outcome CONTAINS "ongoing"
RETURN c.id, c.type, c.outcome, o.name, c.date
ORDER BY c.date DESC;
// Locations with more than one distinct crime type (crime-type diversity)
MATCH (l:Location)<-[:OCCURRED_AT]-(c:Crime)
WITH l, collect(DISTINCT c.type) as crime_types, count(c) as total_crimes
WHERE size(crime_types) > 1
RETURN l.address, crime_types, size(crime_types) as diversity_score, total_crimes
ORDER BY diversity_score DESC, total_crimes DESC;
The next queries use more complex patterns, path finding, and advanced aggregation.
// People within 1 to 3 KNOWS hops of each other who are each linked to a violent crime
MATCH (p1:Person)-[:KNOWS*1..3]-(p2:Person)
WHERE p1 <> p2
MATCH (p1)-[]-(c1:Crime), (p2)-[]-(c2:Crime)
WHERE c1.type CONTAINS "Violence" AND c2.type CONTAINS "Violence"
RETURN p1.name, p2.name,
shortestPath((p1)-[:KNOWS*]-(p2)) as connection_path,
c1.id as crime1, c2.id as crime2
LIMIT 20;
// People linked to crimes in more than one month (repeat activity over time)
MATCH (p:Person)-[r]-(c:Crime)
WITH p, c, substring(c.date, 0, 7) as month
WITH p, collect(DISTINCT month) as months,
collect({crime: c.id, type: c.type, month: month}) as crimes
WHERE size(months) > 1
RETURN p.name, months, size(months) as active_months,
[crime IN crimes | crime.type] as crime_types,
crimes
ORDER BY size(months) DESC;
// Pairs of officers who investigated more than one crime together
MATCH (o1:Officer)-[:INVESTIGATED]->(c:Crime)<-[:INVESTIGATED]-(o2:Officer)
WHERE o1 <> o2
WITH o1, o2, collect(c.id) as shared_cases, count(c) as collaborations
WHERE collaborations > 1
RETURN o1.name, o2.name, collaborations, shared_cases
ORDER BY collaborations DESC;
// Weight each crime by severity and rank locations by total risk score
MATCH (l:Location)<-[:OCCURRED_AT]-(c:Crime)
WITH l, c,
CASE c.type
WHEN "Violence and sexual offences" THEN 5
WHEN "Criminal damage and arson" THEN 4
WHEN "Burglary" THEN 4
WHEN "Anti-social behaviour" THEN 2
WHEN "Public order" THEN 3
ELSE 2
END as severity_weight
WITH l, count(c) as crime_count, sum(severity_weight) as risk_score,
collect(c.type) as crime_types
RETURN l.address, l.postcode, crime_count, risk_score,
round(risk_score * 1.0 / crime_count, 2) as avg_severity,
crime_types
ORDER BY risk_score DESC
LIMIT 10;
// People who appear both as witnesses and as suspects, in different crimes
MATCH (p:Person)-[:WITNESS_TO]-(c1:Crime)
MATCH (p)-[:SUSPECT_IN]-(c2:Crime)
WHERE c1 <> c2
WITH p,
collect(DISTINCT c1.id) as witnessed_crimes,
collect(DISTINCT c2.id) as suspected_crimes
RETURN p.name,
size(witnessed_crimes) as witness_count,
size(suspected_crimes) as suspect_count,
witnessed_crimes, suspected_crimes
ORDER BY witness_count DESC, suspect_count DESC;
The last queries move towards complex analytics, graph-algorithm concepts, and optimization.
// The same person linked to two crimes within 7 days of each other, in the same postcode area
MATCH (p:Person)-[r]-(c1:Crime)-[:OCCURRED_AT]->(l1:Location)
MATCH (p)-[r2]-(c2:Crime)-[:OCCURRED_AT]->(l2:Location)
WHERE c1 <> c2
AND abs(duration.between(date(c1.date), date(c2.date)).days) <= 7
AND substring(l1.postcode, 0, 4) = substring(l2.postcode, 0, 4)
WITH p, c1, c2, l1, l2,
duration.between(date(c1.date), date(c2.date)).days as day_diff
RETURN p.name,
c1.id as first_crime, c1.date as first_date, c1.type as first_type,
c2.id as second_crime, c2.date as second_date, c2.type as second_type,
day_diff,
substring(l1.postcode, 0, 4) as area
ORDER BY p.name, first_date;
// A simple influence score: weighted count of person and crime connections
MATCH (p:Person)
OPTIONAL MATCH (p)-[:KNOWS]-(p2:Person)
OPTIONAL MATCH (p)-[r]-(c:Crime)
WITH p, count(DISTINCT p2) as person_connections,
count(DISTINCT c) as crime_connections,
collect(DISTINCT type(r)) as relationship_types
WITH p, person_connections, crime_connections, relationship_types,
(person_connections * 2 + crime_connections) as influence_score
WHERE influence_score > 0
RETURN p.name,
person_connections,
crime_connections,
relationship_types,
influence_score
ORDER BY influence_score DESC
LIMIT 15;
// Seasonal crime patterns per postcode area, with a concentration ratio
MATCH (l:Location)<-[:OCCURRED_AT]-(c:Crime)
WITH l, c,
toInteger(substring(c.date, 8, 2)) as day,
toInteger(substring(c.date, 5, 2)) as month,
substring(l.postcode, 0, 4) as area
WITH area, month,
count(c) as crime_count,
collect(DISTINCT c.type) as crime_types,
collect(DISTINCT l.address) as locations,
round(avg(day), 1) as avg_day_of_month
WHERE crime_count >= 3
WITH area, month, crime_count, crime_types, locations, avg_day_of_month,
CASE
WHEN month IN [6,7,8] THEN "Summer"
WHEN month IN [12,1,2] THEN "Winter"
WHEN month IN [3,4,5] THEN "Spring"
ELSE "Autumn"
END as season
RETURN area, season, month, crime_count,
size(locations) as location_diversity,
crime_types,
avg_day_of_month,
round(crime_count * 1.0 / size(locations), 2) as concentration_ratio
ORDER BY crime_count DESC, concentration_ratio DESC;