Scrape a multi-page website with Python and turn it into a spreadsheet that the reporter then turned into a story.
What I did
- Analyzed the markup and markup variations the NYC Parks Dept. used to publish its list of capital projects.
- Wrote Python code to parse that markup and its variations and turn the 700+ projects into a spreadsheet of data for the reporter. This included parsing dates and writing complex regular expressions.
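The approach described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the original project code: the sample markup, the field labels, the date formats, and the names `parse_date`, `parse_projects`, and `to_csv` are all assumptions made up for the example. It shows one regular expression tolerating two markup variations, a date parser that tries several formats in turn, and the rows written out as CSV.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical markup: two variations of the same kind of project record.
SAMPLE_HTML = """
<div class="project"><h3>Playground Reconstruction</h3>
<p><strong>Design Start:</strong> March 2014</p>
<p><strong>Projected Completion:</strong> 06/30/2016</p></div>
<div class="project"><h3>Comfort Station Repair</h3>
<p><b>Design start</b> - Jan 2013</p>
<p><b>Projected completion</b> - TBD</p></div>
"""

PROJECT_RE = re.compile(r'<div class="project">(.*?)</div>', re.DOTALL)
TITLE_RE = re.compile(r"<h3>(.*?)</h3>", re.DOTALL)
# Accepts <strong> or <b> labels, with or without a colon,
# followed by an optional "-" or ":" separator before the value.
FIELD_RE = re.compile(
    r"<(?:strong|b)>\s*(?P<label>[^<:]+?)\s*:?\s*</(?:strong|b)>"
    r"\s*[-:]?\s*(?P<value>[^<]+)",
    re.IGNORECASE,
)

# Assumed date formats; a real site may need more.
DATE_FORMATS = ("%m/%d/%Y", "%B %Y", "%b %Y")


def parse_date(text):
    """Return a date for the first format that matches, else None."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_projects(html):
    """Turn each project block into a dict with normalized keys."""
    rows = []
    for block in PROJECT_RE.findall(html):
        title = TITLE_RE.search(block)
        row = {"title": title.group(1).strip() if title else ""}
        for m in FIELD_RE.finditer(block):
            key = m.group("label").strip().lower().replace(" ", "_")
            parsed = parse_date(m.group("value"))
            # Keep unparseable values (such as "TBD") as raw text.
            row[key] = parsed.isoformat() if parsed else m.group("value").strip()
        rows.append(row)
    return rows


def to_csv(rows, columns):
    """Serialize rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = parse_projects(SAMPLE_HTML)
    print(to_csv(rows, ["title", "design_start", "projected_completion"]))
```

Keeping values that fail date parsing as raw text, rather than dropping them, lets a reporter see entries like "TBD" in the spreadsheet and decide how to treat them.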
NYC Parks Department site scrape project links
Article: NYC Parks Dept. has 43 projects stalled five years or more
URL: One of the scraped pages, NYC Parks Dept. Capital Project Tracker