The New York Trees census dataset
census of all the trees in Manhattan
Trees of New York
Browsing along the bigquery public dataset, on a rainy day in october, I stumbled on the New York City trees dataset.
Well well well, this looks like a great dataset for a nice exam.
The main dataset for the 2015 survey has 673k trees for a total uncompressed file size of 122 Mb. Bit too much for your laptops. I extracted all the trees in Manhattan.
This subset is available https://skatai.com/assets/db/new_york_trees.sql
In a psql session, you can restore the dataset with
\i <path to the sql file>/new_york_trees.sql
or the equivalent for windows.
This will create
- the
new_york_treesdatabase - the
treestable
which holds 65423 trees.
The data schema is a bit different than the Paris dataset
| Attribute | Data Type | Description |
|---|---|---|
| tree_id | INTEGER | PRIARY KEY |
| created_at | DATE | The date tree points were collected in the census software |
| tree_dbh | INTEGER | Diameter of the tree, measured at approximately 54” / 137cm above the ground. |
| stump_diam | INTEGER | Diameter of stump measured through the center, rounded to the nearest inch. |
| status | STRING | Indicates whether the tree is alive, standing dead, or a stump. |
| health | STRING | Indicates the user’s perception of tree health. |
| spc_latin | STRING | Scientific name for species, e.g. “Acer rubrum” |
| spc_common | STRING | Common name for species, e.g. “red maple” |
| user_type | STRING | This field describes the category of user who collected this tree point’s data. |
| root_stone | STRING | Indicates the presence of a root problem caused by paving stones in tree bed |
| root_grate | STRING | Indicates the presence of a root problem caused by metal grates in tree bed |
| root_other | STRING | Indicates the presence of other root problems |
| trunk_wire | STRING | Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk |
| trnk_light | STRING | Indicates the presence of a trunk problem caused by lighting installed on the tree |
| trnk_other | STRING | Indicates the presence of other trunk problems |
| brch_light | STRING | Indicates the presence of a branch problem caused by lights (usually string lights) or wires in the branches |
| brch_shoe | STRING | Indicates the presence of a branch problem caused by sneakers in the branches |
| brch_other | STRING | Indicates the presence of other branch problems |
| address | STRING | Nearest estimated address to tree |
| zipcode | INTEGER | Five-digit zipcode in which tree is located |
| zip_city | STRING | City as derived from zipcode. This is often (but not always) the same as borough. |
| boroname | STRING | Name of borough in which tree point is located |
| nta_name | STRING | NTA name corresponding to the neighborhood tabulation area from the 2010 US Census that the tree point falls into. |
| latitude | FLOAT | Latitude of point, in decimal degrees |
| longitude | FLOAT | Longitude of point, in decimal degrees |
Enjoy and explore
Once you have restored the database
make sure you have 65423 trees
Things you can start exploring
values of
root_...,trnk_...,brch_...problemsspc_commonstatusandhealth
Scope of the exam
- basic querying
- CTEs
- window functions
- index creation
- explain a query plan
- create a view
- write simple functions in SQL
- write simple functions in pgplsql