
Hi, I'm a software engineer in Brooklyn, New York, focusing on data automation.

I have a strong background in analytics and have worked in politics, marketing, finance, and mobile e-commerce. Currently, I have the pleasure of working with the Data Engineering team at Betterment.

After programming professionally for the last 4 years, I am:

  • Fluent in Python and SQL, with professional experience writing Ruby, JavaScript, and R. Currently learning Scala.
  • Experienced in ETL, as well as designing schemas and optimizing queries for a variety of data systems: PostgreSQL, Redshift, MySQL, Hive, and Vertica.
  • Comfortable with cloud computing on Amazon S3, EC2, and EMR.
  • Able to write web apps in Flask and Ruby on Rails, with apps deployed on Heroku and on an internal company network.

Skills: Ruby, JavaScript, SQL, Python

Betterment

Software Engineer, Data

January 2017 – Present

The Data Engineering team at Betterment builds customer-facing and internal services to increase the efficiency of the company. We do this through ETL, data export, event tracking with AWS Lambda and Kinesis, and building real-time web services that serve the product and internal customers. My responsibilities include:

  • Write new data pipelines in our Airflow/Redshift stack and migrate old pipelines from our legacy Luigi/MySQL architecture (a minimal DAG sketch follows this list).
  • Contribute to internal Data Engineering frameworks in the course of performing my own duties, improving tooling for the rest of the team.
  • Support analytics, research and marketing teams with query development, code review, and new ETL work.
  • Handle urgent data issues as they arise during business hours, and around the clock via PagerDuty during on-call rotations.
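
For a flavor of what one of those pipelines looks like, here is a minimal Airflow DAG sketch, written against a recent Airflow release. The DAG id, task names, and callables are illustrative placeholders, not actual Betterment code.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_events(**context):
        """Pull the previous day's events from the source system (placeholder)."""


    def load_to_redshift(**context):
        """Load the staged data into Redshift (placeholder)."""


    default_args = {
        "owner": "data-eng",            # hypothetical owner
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="example_daily_events",  # hypothetical pipeline name
        start_date=datetime(2017, 1, 1),
        schedule="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_events", python_callable=extract_events)
        load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

        extract >> load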

Hillary for America

Data Engineer, Analytics Tech

October 2015 – November 2016

The Technical Analytics team at HFA served as an in-house technical consultancy for the Analytics organization. We supported a team of 150 analysts at our Brooklyn HQ and around the country, as well as the rest of the organization, in using our tools and systems more efficiently and effectively.

Of all the work I did on the campaign, I'm most proud of the tools I built for the survey team. I developed a flexible automation system for our mission-critical survey weighting and reporting operations, allowing our experts to focus on their work rather than fight with their tools. This work included:

  • An automated survey weighting system (Jinja/SQL; sketched after this list)
  • Schema development and ETLs for survey data (Docker, Vertica, PostgreSQL, numpy)
  • An automated reporting system (Docker, Jinja/RMarkdown)
  • A web app allowing survey managers to operate the system (Flask, SQLAlchemy, jQuery)
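
To give a sense of the weighting automation, here is a condensed sketch of the Jinja-templated SQL approach. The template, table, and column names are hypothetical; they illustrate the pattern, not the campaign's actual schema.

    from jinja2 import Template

    # Hypothetical template: weighted share of a target response, broken out by a grouping column.
    WEIGHTED_TOPLINE = Template("""
    SELECT
        {{ group_col }},
        SUM(weight * CASE WHEN response = '{{ target }}' THEN 1 ELSE 0 END)
            / NULLIF(SUM(weight), 0) AS weighted_share
    FROM {{ responses_table }}
    WHERE survey_id = {{ survey_id }}
    GROUP BY {{ group_col }}
    ORDER BY {{ group_col }};
    """)

    sql = WEIGHTED_TOPLINE.render(
        group_col="state",
        target="support",
        responses_table="survey.responses_weighted",
        survey_id=42,
    )
    print(sql)  # in the real system, the rendered SQL is handed to the database / reporting step

The Flask app in the list above gave survey managers a front end for kicking off runs of the system.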


Additionally, I:

  • Served as a consultant on data projects, advising on analyses, data modeling, etc.
  • Wrote and maintained modular, reusable code and applied it across subsequent data processing projects, making them faster and more reliable.
  • Scraped a lot of county election result websites.

Optimizely

Product/Data Analyst

As one of the first analytics hires at Optimizely, I helped bridge the gap between the company and its data.

  • Owned the development of analytics for the new mobile SDK, from designing events to building dashboards, in consultation with engineers and stakeholders
  • Developed data pipelines for data scientists so they could focus on building models rather than wrangling the data warehouse

University of California, Santa Cruz

Economics and Politics

Graduated with Honors in June of 2009

501cData

Bringing an open data set to researchers and journalists

A Rails app surfacing public tax data for nonprofits. It renders XML sources into readable views, backed by Postgres full-text search.

Tools used:

  • Ruby on Rails
  • PostgreSQL
  • MRJob on EMR (sketched below)
  • Heroku
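
The EMR step is where MRJob comes in. Here is a minimal sketch of what such a job might look like, assuming an index file where each line describes one filing as "ein,tax_year,object_key" (an illustrative format, not the app's actual input).

    from mrjob.job import MRJob


    class MRFilingsPerYear(MRJob):
        """Count filings per tax year from a hypothetical CSV index of 990 filings."""

        def mapper(self, _, line):
            try:
                ein, tax_year, object_key = line.strip().split(",")
            except ValueError:
                return  # skip malformed lines
            yield tax_year, 1

        def reducer(self, tax_year, counts):
            yield tax_year, sum(counts)


    if __name__ == "__main__":
        MRFilingsPerYear.run()

Run locally with python mr_filings_per_year.py index.csv, or on a cluster with mrjob's -r emr runner.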

Site: https://www.501cdata.org

Code: https://github.com/davidshere/irs-aws-990s


Drudge Research

A project to collect, analyze, and present data on the history of an influential news aggregator, The Drudge Report.

This project consists of a web scraper, which is done, and an analysis of the data, which isn't. I started it years ago but stopped after hitting a wall. I came back to it a few months ago, and it's been a great way to learn new Python and data tools and work on something fun while going through my Scala classes.

Biggest win: 10X speedup in run time with multiprocessing and asyncio.

Tools: Python, Parquet, Arrow, S3
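
For reference, here is a condensed sketch of the fetch/parse split behind that speedup: asyncio handles the I/O-bound downloads, a process pool handles the CPU-bound parsing, and pyarrow writes the result to Parquet. The URLs, parsing logic, and column names are placeholders rather than the project's actual code.

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    import aiohttp
    import pyarrow as pa
    import pyarrow.parquet as pq


    def parse_page(html):
        """CPU-bound parsing of one archived page (placeholder logic)."""
        return {"n_links": html.count("<a "), "n_bytes": len(html)}


    async def fetch(session, url):
        async with session.get(url) as resp:
            return await resp.text()


    async def crawl(urls):
        loop = asyncio.get_running_loop()
        # I/O-bound: download all pages concurrently on the event loop.
        async with aiohttp.ClientSession() as session:
            pages = await asyncio.gather(*(fetch(session, u) for u in urls))
        # CPU-bound: parse in parallel across processes.
        with ProcessPoolExecutor() as pool:
            rows = await asyncio.gather(
                *(loop.run_in_executor(pool, parse_page, page) for page in pages)
            )
        return list(rows)


    if __name__ == "__main__":
        urls = ["https://example.com/page/1", "https://example.com/page/2"]  # placeholders
        rows = asyncio.run(crawl(urls))
        pq.write_table(pa.Table.from_pylist(rows), "pages.parquet")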

After I've taken the Spark class in my Scala series, I'm going to use EMR to enrich and explore the data. I'm hoping to write a few blog posts on what I find.

Code: https://github.com/davidshere/drudge_research