Joe Murphy portfolio: NYC Parks Department site scrape

Pictured: one of the stalled NYC Parks projects.

Scraped a multi-page website with Python and turned it into a spreadsheet, which the reporter turned into a story.
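The multi-page part of a scrape like this usually comes down to a loop that fetches a page, keeps its HTML, and follows the link to the next page. The sketch below is a minimal illustration, not the code used for this project: it assumes each listing page links onward with an anchor whose text is "Next", which the write-up does not confirm. `crawl_pages`, `fetch_url` and `NEXT_LINK_RE` are hypothetical names. The fetch function is passed in so the loop can be tried without network access.

```python
import re
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

# ASSUMPTION: each listing page links to the next one with an anchor whose
# text is "Next". The real Parks pages may paginate differently.
NEXT_LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>\s*Next\s*</a>', re.IGNORECASE)


def fetch_url(url: str) -> str:
    """Download one page as text (hypothetical helper; swap in any HTTP client)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_pages(
    start_url: str,
    fetch: Callable[[str], str] = fetch_url,
    max_pages: int = 100,
) -> Iterator[tuple[str, str]]:
    """Yield (url, html) for each page, following "Next" links until none remain."""
    seen: set[str] = set()
    url: Optional[str] = start_url
    # Stop when there is no next link, when a link loops back, or at max_pages.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        match = NEXT_LINK_RE.search(html)
        # Resolve relative links such as "/projects?page=2" against the current URL.
        url = urljoin(url, match.group(1)) if match else None
```

Tracking visited URLs keeps a mis-linked page from sending the loop in circles, and `max_pages` is a second guard against runaway crawls.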

What I did

  • Analyzed the markup the NYC Parks Dept. used to publish its list of capital projects, including the variations in that markup.
  • Wrote Python code to parse the markup and its variations, turning the 700+ projects into a spreadsheet for the reporter. The work included parsing dates and writing complex regular expressions.
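A minimal sketch of regex-based parsing that tolerates markup variations, parses dates, and writes spreadsheet-ready CSV follows. Everything about the markup here is an assumption invented for illustration: the `<div class="project">` wrapper, the name sitting in either an `<h3>` or a `<strong>`, the "Start date" / "Started" / "Began" label wordings, and the list of date formats. `parse_projects`, `parse_date` and `to_csv` are hypothetical names, not the project's actual code.

```python
import csv
import io
import re
from datetime import date, datetime
from typing import Optional

# ASSUMPTION: every project sits in its own <div> whose class contains "project",
# with no nested <div> inside it. This markup is invented for illustration.
BLOCK_RE = re.compile(r'<div[^>]*class="[^"]*project[^"]*"[^>]*>(.*?)</div>', re.I | re.S)

# Variation 1: the project name may be in an <h3> or a <strong> tag.
NAME_RE = re.compile(r"<(h3|strong)[^>]*>\s*(.+?)\s*</\1>", re.I | re.S)

# Variation 2: the start-date label may be "Start date", "Started" or "Began",
# and the colon may sit inside or outside the label's own tag.
DATE_RE = re.compile(
    r"""
    \b(?:start(?:ed)?(?:\s+date)?|began)  # label wording variants
    \s*(?:</[a-z0-9]+>\s*)?               # optional closing tag before the colon
    :\s*
    (?:<[^>]+>\s*)*                       # skip any tags between label and value
    (?P<raw>[^<]+)                        # the raw date text
    """,
    re.I | re.X,
)

# Variation 3: dates appear in several formats (also an assumption).
DATE_FORMATS = ("%m/%d/%Y", "%B %d, %Y", "%b %d, %Y", "%B %Y", "%b %Y")


def parse_date(raw: str) -> Optional[date]:
    """Try each known format; return None if the text is not a recognizable date."""
    text = " ".join(raw.split())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_projects(html: str) -> list[dict[str, str]]:
    """Turn one page of project markup into spreadsheet rows."""
    rows = []
    for block in BLOCK_RE.findall(html):
        name = NAME_RE.search(block)
        if name is None:
            continue  # skip blocks that don't look like a project
        found = DATE_RE.search(block)
        raw = " ".join(found.group("raw").split()) if found else ""
        parsed = parse_date(raw) if raw else None
        rows.append({
            "name": " ".join(name.group(2).split()),
            "start_raw": raw,  # keep the original text so odd values can be checked by hand
            "start_date": parsed.isoformat() if parsed else "",
        })
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Write the rows as CSV text that opens in any spreadsheet program."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "start_raw", "start_date"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the raw date text next to the parsed ISO date makes it easy to spot entries the patterns did not recognize. If the pages nest `<div>` tags inside a project, a real HTML parser would be a safer way to split the blocks than `BLOCK_RE`.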

NYC Parks Department site scrape project links

Article: NYC Parks Dept. has 43 projects stalled five years or more

URL: One of the scraped pages, NYC Parks Dept. Capital Project Tracker