Bringing an open data set to researchers and journalists
A Rails app surfacing public tax data for nonprofits. It renders XML sources into readable views, backed by Postgres full-text search.
- Ruby on Rails
- MRJob on EMR
A project to collect, analyze, and present data on the history of an influential news aggregator, The Drudge Report.
This project consists of a web scraper, which is done, and an analysis of the data, which isn’t. I started it years ago but stopped after I hit a wall. I came back to it a few months ago, and it’s been a great way to learn new Python and data tools and work on something fun while going through my Scala classes.
Biggest win: 10X speedup in run time with multiprocessing and asyncio.
Tools: Python, Parquet, Arrow, S3
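For readers curious how multiprocessing and asyncio can be combined in a scraper, here is a minimal sketch. It is not the project's actual code. Each worker process runs its own event loop so a batch of downloads overlaps, and parsing is spread across cores. The names `fetch`, `parse`, `scrape_batch`, and `scrape_all`, the example URLs, and the simulated network delay are all assumptions made for illustration.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


async def fetch(url: str) -> str:
    """Hypothetical stand-in for a network request (a real scraper would use an HTTP client)."""
    await asyncio.sleep(0.01)  # simulated I/O wait
    return f"<html><title>{url}</title></html>"


async def fetch_batch(urls: list[str]) -> list[str]:
    """Fetch every URL in the batch concurrently on one event loop."""
    return await asyncio.gather(*(fetch(u) for u in urls))


def parse(html: str) -> str:
    """Hypothetical stand-in for CPU-bound parsing: extract the <title> text."""
    start = html.index("<title>") + len("<title>")
    return html[start:html.index("</title>")]


def scrape_batch(urls: list[str]) -> list[str]:
    """Runs inside a worker process: async fetching first, then parsing."""
    pages = asyncio.run(fetch_batch(urls))
    return [parse(page) for page in pages]


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scrape_all(urls: list[str], workers: int = 4, batch_size: int = 25) -> list[str]:
    """Fan batches out to worker processes and flatten the results in order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scrape_batch, chunked(urls, batch_size))
        return [title for batch in results for title in batch]


if __name__ == "__main__":
    # Placeholder URLs; nothing is actually downloaded in this sketch.
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    titles = scrape_all(urls)
    print(len(titles), titles[:2])
```

The split matters because asyncio alone only helps with waiting on the network, while multiprocessing alone leaves each process idle during downloads. How much speedup this gives depends on the mix of I/O and parsing work in a given scraper.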
After I’ve taken the Spark class in my Scala series, I’m going to use EMR to enrich and play around with the data. I’m hoping to write a few blog posts on what I find.