← Back to dbsql

The New York Trees census dataset

4 min read

Trees of New York

Browsing along the bigquery public dataset, on a rainy day in october, I stumbled on the New York City trees dataset.

Well well well, this looks like a great dataset for a nice exam.

The main dataset for the 2015 survey has 673k trees for a total uncompressed file size of 122 Mb. Bit too much for your laptops. I extracted all the trees in Manhattan.

This subset is available https://skatai.com/assets/db/new_york_trees.sql

In a psql session, you can restore the dataset with

\i <path to the sql file>/new_york_trees.sql

or the equivalent for windows.

This will create

  • the new_york_trees database
  • the trees table

which holds 65423 trees.

The data schema is a bit different than the Paris dataset

AttributeData TypeDescription
tree_idINTEGERPRIARY KEY
created_atDATEThe date tree points were collected in the census software
tree_dbhINTEGERDiameter of the tree, measured at approximately 54" / 137cm above the ground.
stump_diamINTEGERDiameter of stump measured through the center, rounded to the nearest inch.
statusSTRINGIndicates whether the tree is alive, standing dead, or a stump.
healthSTRINGIndicates the user's perception of tree health.
spc_latinSTRINGScientific name for species, e.g. "Acer rubrum"
spc_commonSTRINGCommon name for species, e.g. "red maple"
user_typeSTRINGThis field describes the category of user who collected this tree point's data.
root_stoneSTRINGIndicates the presence of a root problem caused by paving stones in tree bed
root_grateSTRINGIndicates the presence of a root problem caused by metal grates in tree bed
root_otherSTRINGIndicates the presence of other root problems
trunk_wireSTRINGIndicates the presence of a trunk problem caused by wires or rope wrapped around the trunk
trnk_lightSTRINGIndicates the presence of a trunk problem caused by lighting installed on the tree
trnk_otherSTRINGIndicates the presence of other trunk problems
brch_lightSTRINGIndicates the presence of a branch problem caused by lights (usually string lights) or wires in the branches
brch_shoeSTRINGIndicates the presence of a branch problem caused by sneakers in the branches
brch_otherSTRINGIndicates the presence of other branch problems
addressSTRINGNearest estimated address to tree
zipcodeINTEGERFive-digit zipcode in which tree is located
zip_citySTRINGCity as derived from zipcode. This is often (but not always) the same as borough.
boronameSTRINGName of borough in which tree point is located
nta_nameSTRINGNTA name corresponding to the neighborhood tabulation area from the 2010 US Census that the tree point falls into.
latitudeFLOATLatitude of point, in decimal degrees
longitudeFLOATLongitude of point, in decimal degrees

Enjoy and explore

Once you have restored the database

make sure you have 65423 trees

Things you can start exploring

values of

  • root_..., trnk_..., brch_...
  • problems
  • spc_common
  • status and health

Scope of the exam

  • basic querying
  • CTEs
  • window functions
  • index creation
  • explain a query plan
  • create a view
  • write simple functions in SQL
  • write simple functions in pgplsql