Personal Projects
Have a look at my projects below to get an idea of what I’ve worked on, how I apply my skills, and how I approach a project from concept to execution. Reach out if you’d like to learn more about a specific project.
Real Estate Data Project
I created a reusable program that cleans Airbnb and Zillow data and connects to a PostgreSQL database. It also contains reusable class-based functions to perform exploratory analysis, calculate cap rates, and recommend optimal ZIP codes for investment.
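The cap-rate logic is simple enough to sketch. The figures, ZIP codes, and function name below are hypothetical; in the project the inputs come from the PostgreSQL database.

```python
def cap_rate(annual_rent, annual_expenses, price):
    """Cap rate = net operating income (NOI) / purchase price."""
    noi = annual_rent - annual_expenses
    return noi / price

# Rank hypothetical ZIP codes by estimated cap rate
zips = {
    "07030": cap_rate(36_000, 9_000, 450_000),
    "07047": cap_rate(30_000, 8_000, 300_000),
}
best = max(zips, key=zips.get)
```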
Libraries: pandas, NumPy, psycopg2, SQLAlchemy, Matplotlib

Operations Data Project
I analyzed a data set of clothing entering a warehouse using advanced SQL and Python. I calculated weighted averages in SQL with window functions and the FILTER clause to identify clothing that took longer to process in the warehouse. The queries also use subqueries, common table expressions, CASE statements, UNION ALL, and ALTER TABLE ... UPDATE.
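A SQL weighted average can be sketched in portable syntax. The table, columns, and numbers below are invented, and SQLite stands in for the project's PostgreSQL database (where `SUM(...) FILTER (WHERE ...)` would also be available; the same effect can be had with `SUM(CASE ...)`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (category TEXT, qty INTEGER, hours REAL);
    INSERT INTO items VALUES
        ('shirts', 10, 2.0), ('shirts', 30, 4.0), ('coats', 5, 8.0);
""")
# Weighted average processing time per category, slowest first
rows = conn.execute("""
    WITH per_cat AS (
        SELECT category, SUM(hours * qty) * 1.0 / SUM(qty) AS wavg_hours
        FROM items
        GROUP BY category
    )
    SELECT category, wavg_hours FROM per_cat ORDER BY wavg_hours DESC
""").fetchall()
```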
Libraries: pandas, NumPy, Seaborn, Matplotlib, SQLAlchemy

Time Series Model in Python
I used Python to create a time series model. I modeled airline passenger data using ARMA, ARIMA, and SARIMA time series models; SARIMA performed best on this data due to the presence of seasonality. The project also includes a coding challenge: how would I build a grid-search method for SARIMA parameters, similar to what pmdarima provides for ARIMA? As a related example, I included code I wrote to optimize the parameters of a logistic regression model using multithreading, which can be a powerful tool when employed in the right way.
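The multithreaded grid-search idea can be sketched with a stand-in loss function. In the project, `score` would fit a SARIMA (or logistic regression) model and return a validation metric; here it is a toy function so the sketch is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def score(params):
    # Stand-in for fitting a model and returning a validation loss;
    # the real version would fit SARIMA or logistic regression here.
    p, q = params
    return (p - 1) ** 2 + (q - 2) ** 2

grid = list(product(range(3), range(4)))      # candidate (p, q) pairs
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(score, grid))      # evaluate candidates in parallel
best = grid[min(range(len(grid)), key=losses.__getitem__)]
```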
Libraries: pandas, NumPy, Matplotlib, statsmodels, scikit-learn, pmdarima

Tax Data Project
I performed statistical analysis of the effect of the urban population share on tax rates as a percent of GDP. I corrected data entry issues and encoded dummy variables. I used a panel data model, and also ran a regular OLS regression for demonstration purposes. Moreover, I ran the Hausman test to choose between fixed effects and random effects.
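The Hausman statistic itself is a short NumPy computation. The coefficient vectors and covariance matrices below are made-up numbers for illustration; in the project they come from the fitted linearmodels estimators.

```python
import numpy as np

# Illustrative (made-up) estimates and covariances
b_fe = np.array([1.00, 2.00])    # fixed-effects coefficients
b_re = np.array([0.90, 2.10])    # random-effects coefficients
V_fe = np.diag([0.05, 0.05])     # covariance of b_fe
V_re = np.diag([0.02, 0.02])     # covariance of b_re

diff = b_fe - b_re
H = diff @ np.linalg.inv(V_fe - V_re) @ diff   # Hausman statistic, df = 2
# Compare H to the chi-squared critical value (5.99 at the 5% level, df = 2);
# a small H fails to reject random effects.
```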
Libraries: pandas, NumPy, SciPy, Matplotlib, statsmodels, linearmodels

DC Bike Share SQL Project
I created and executed custom PostgreSQL functions and stored procedures against DC Bike Share data. I also followed along with a tutorial that used similar bike share data in MS SQL Server, where I practiced declaring variables and manipulating dates. Finally, I practiced stored procedures in both MS SQL Server and MySQL.
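A PostgreSQL function only runs inside a Postgres server, but the idea of a reusable database function can be sketched with SQLite, which lets Python register custom SQL functions. The function name, table, and data below are invented stand-ins, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a Python function callable from SQL (hypothetical name and logic)
conn.create_function("ride_minutes", 1, lambda seconds: seconds / 60.0)

conn.execute("CREATE TABLE rides (duration_sec INTEGER)")
conn.executemany("INSERT INTO rides VALUES (?)", [(600,), (1800,)])
avg_minutes = conn.execute(
    "SELECT AVG(ride_minutes(duration_sec)) FROM rides"
).fetchone()[0]
```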
Libraries: pandas, psycopg2, SQLAlchemy

Yahoo Finance Web Scraper
As my first Python project, I built a Yahoo Finance web scraper that uses multithreading to quickly scrape stock price history, dividend payments, statistics, and financial information from the Yahoo Finance website. It cleans the data and outputs it as a pandas DataFrame. The project also contains code to scrape the earnings and dividend calendars from Nasdaq. The code is organized into classes with public and private methods.
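The class layout can be sketched without touching the network. The class name, tickers, and payload below are hypothetical, and `_fetch` is stubbed; the real version requests and parses the Yahoo Finance pages.

```python
from concurrent.futures import ThreadPoolExecutor

class QuoteScraper:
    """Skeleton of the scraper's structure; network calls are stubbed."""

    def __init__(self, tickers):
        self.tickers = tickers

    def _fetch(self, ticker):
        # "Private" helper; the real version would request the Yahoo page
        return {"ticker": ticker, "price": 100.0}   # placeholder payload

    def _clean(self, raw):
        return (raw["ticker"], round(raw["price"], 2))

    def run(self):
        # Public entry point: fetch tickers concurrently, then clean
        with ThreadPoolExecutor(max_workers=4) as pool:
            raw = list(pool.map(self._fetch, self.tickers))
        return [self._clean(r) for r in raw]

results = QuoteScraper(["AAPL", "MSFT", "TSLA"]).run()
```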
Libraries: Requests, pandas, datetime, time, NumPy, lxml, BeautifulSoup (bs4), json, os, sys, concurrent.futures, math

Python Customer Segmentation
I used K-Means to segment customers in Python based on the number of transactions, the number of items purchased, and the total amount spent.
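The project used scikit-learn's KMeans; the algorithm itself can be sketched in a few lines of NumPy with toy customer data (transactions, items, total spent), which makes the segmentation idea concrete.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm; a stand-in for sklearn.cluster.KMeans."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy customers: (transactions, items purchased, total spent); two clear groups
X = np.array([[2, 3, 50], [3, 4, 60], [40, 90, 900], [42, 88, 950]], dtype=float)
labels, centers = kmeans(X, k=2)
```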
Libraries: pandas, NumPy, scikit-learn, Matplotlib

Stata Data Cleaning
I programmatically downloaded data files from IPEDS using Python. Then, I harmonized the data sets into a consistent panel dataset that is easy for others to use. I reshaped each data set to be long by unitid, academic rank, contract length, and sex, which required converting triply-wide data to long format for some years. I also encoded and recoded categorical variables for consistency across years.
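The project does this reshaping in Stata (`reshape long`); the same idea can be sketched in pandas with hypothetical unitids and salary columns.

```python
import pandas as pd

df = pd.DataFrame({
    "unitid": [1001, 1002],
    "salary2019": [70_000, 65_000],
    "salary2020": [72_000, 66_000],
})
# Equivalent in spirit to Stata's: reshape long salary, i(unitid) j(year)
long = pd.wide_to_long(df, stubnames="salary", i="unitid", j="year")
```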

Kickstarter Data Project
I performed data cleaning and analysis on Kickstarter data. I dropped duplicates, fixed inconsistent data entry, created new features, and encoded dummy variables. I also ran statistical tests, including the Chi-Square test, the F-test for joint significance, the Kruskal-Wallis H test, and Dunn's test.
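The project ran these tests through SciPy and scikit-posthocs; the Chi-Square statistic itself reduces to a few lines of NumPy, shown here on a toy 2x2 contingency table (not the project's actual counts).

```python
import numpy as np

# Toy 2x2 contingency table (e.g., funded vs. not funded, by category)
observed = np.array([[10, 20],
                     [20, 10]])
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()
chi2 = ((observed - expected) ** 2 / expected).sum()   # df = 1 for a 2x2 table
```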
Libraries: pandas, NumPy, SciPy, scikit-learn, Matplotlib, scikit-posthocs

Loan Data Project
I created a SQL database containing one table. Then, I aggregated the current balance by lender type, loan-to-value cohort, and loan-age cohort, using common table expressions, UNION ALL, CASE statements, and window functions.
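The cohort aggregation can be sketched with a CTE and a CASE bucket. The table, column names, and loan values below are invented, and SQLite stands in for the project's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (lender_type TEXT, ltv REAL, balance REAL)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", [
    ("Bank", 0.75, 100.0), ("Bank", 0.90, 50.0), ("Credit Union", 0.60, 80.0),
])
# Bucket loans into LTV cohorts with CASE, then aggregate balances per cohort
rows = conn.execute("""
    WITH cohorts AS (
        SELECT lender_type,
               CASE WHEN ltv < 0.8 THEN 'LTV < 80%' ELSE 'LTV >= 80%' END
                   AS ltv_cohort,
               balance
        FROM loans
    )
    SELECT lender_type, ltv_cohort, SUM(balance) AS total_balance
    FROM cohorts
    GROUP BY lender_type, ltv_cohort
    ORDER BY lender_type, ltv_cohort
""").fetchall()
```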
Libraries: pandas, SQLAlchemy, os, Matplotlib

Business Data Project
I created three tables in a SQL database. I used UPDATE ... SET ... WHERE, date functions, joins, LIKE, CASE statements, and subqueries. I wrote a number of queries to determine:
1. How many unique customers had transactions in 2021
2. The last names of the customers who purchased a particular product in February 2021
3. The total orders made for a product type, and how many orders with a `transact_type` of `SALE` went to customers in New Jersey
4. How many unique customers had successful, non-returned transactions in 2021
5. A list of all the customers with a `VOID` order for any Product ID/SKU that starts with the letter `t`
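The first question above can be sketched with `COUNT(DISTINCT ...)` and a date filter. The schema and rows below are invented for illustration, with SQLite standing in for the project's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, tx_date TEXT, transact_type TEXT)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (1, "2021-02-14", "SALE"), (1, "2021-06-01", "SALE"),
    (2, "2021-03-03", "VOID"), (3, "2020-12-30", "SALE"),
])
# Unique customers with at least one transaction in 2021
n_2021 = conn.execute("""
    SELECT COUNT(DISTINCT customer_id)
    FROM transactions
    WHERE strftime('%Y', tx_date) = '2021'
""").fetchone()[0]
```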
I also wrote a short script to access the City Bikes API in Python.
Libraries: pandas, SQLAlchemy, os, Requests

Construction Permit Project
I analyzed a large building permit data set and the population of each ZIP code from the 2010 Census. I created two SQL tables: Permits and Population. I ran a number of SQL queries using window functions, subqueries, common table expressions, and joins. I also transferred some information from SQL to Python to calculate Z-scores and run a Chi-Square test, calculations that SQLite does not support natively.
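The Z-score step is a one-liner in NumPy once the data is out of SQL. The permit counts below are hypothetical, not the project's actual figures.

```python
import numpy as np

# Hypothetical permit counts per ZIP code
permits = np.array([10, 20, 30, 40, 100], dtype=float)
z = (permits - permits.mean()) / permits.std()
outliers = permits[np.abs(z) > 1.5]   # ZIPs with unusually high/low activity
```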
Libraries: pandas, datetime, NumPy, SciPy, SQLAlchemy

Ethereum Smart Contracts
I followed along with a 16-hour Solidity/Python course from freeCodeCamp.org, using Ganache CLI, the Ganache GUI, Brownie, and Web3.py to create and interact with smart contracts on a variety of networks.
Libraries: Brownie, Ganache, pytest, web3.py

NetworkX Graphs
Using the Energy Information Administration (EIA) API, I created a network graph in Python of oil-producing countries, linking countries whose production decisions are highly correlated. I wrote custom code to fine-tune label placement and margins.
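The correlation-to-edges step can be sketched with simulated production series (the country names and data below are invented). The resulting edge list is what would be fed into a `networkx.Graph` for drawing.

```python
import numpy as np

rng = np.random.default_rng(0)
common = rng.normal(size=60)                      # shared production driver
production = {
    "A": common + rng.normal(scale=0.1, size=60),
    "B": common + rng.normal(scale=0.1, size=60),
    "C": rng.normal(size=60),                     # independent producer
}
names = list(production)
corr = np.corrcoef([production[n] for n in names])
# Keep an edge wherever two producers' series are highly correlated
edges = [(names[i], names[j])
         for i in range(len(names))
         for j in range(i + 1, len(names))
         if corr[i, j] > 0.8]
```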
Libraries: NumPy, pandas, EIA, NetworkX, Matplotlib

Parse PDF Documents
I used Tabula to scrape an imperfect table from a PDF document. I cleaned the messy output to extract the desired information and exported it to Excel.
Libraries: tabula-py, pandas, itertools

Option Greeks Visualizations
I created two visualizations to depict the relationship between:
1. Vega and ask price
2. Implied Volatility and ask price
I used custom annotation code to label each point with the days left until expiration.
Libraries: pandas, Matplotlib, datetime, time

Scrape Google Search Results
This program scrapes the Google News RSS feed based on a user-provided search term and writes the results to a text file. I used it to automate a list of Twitter posts for bulk upload.
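The project parses the live feed with Newspaper and BS4; the core RSS-to-text step can be sketched with the standard library against an inline stand-in for the feed response (the headlines and URLs below are placeholders).

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the Google News RSS response (real code fetches the feed)
rss = """<rss version="2.0"><channel>
<item><title>Example headline</title><link>https://example.com/a</link></item>
<item><title>Second story</title><link>https://example.com/b</link></item>
</channel></rss>"""

root = ET.fromstring(rss)
results = [(item.findtext("title"), item.findtext("link"))
           for item in root.iter("item")]
lines = [f"{title} - {link}" for title, link in results]  # text-file rows
```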
Libraries: newspaper3k, BeautifulSoup (bs4), urllib

Merge Docx and PDF Files
I wrote programs to quickly combine Microsoft Word and Adobe PDF documents into one easy-to-manage file. These programs saved me time when organizing files for work and school projects; at times, I merged up to 100 files at once.
Libraries: python-docx, PyPDF2, os

Organize Files in Powershell
I wrote PowerShell scripts to move items, rename files, and delete old GitHub forks (mostly from Python tutorials).

These scripts saved me a lot of time: I could quickly sort through my downloads folder and rename PowerPoint lecture slides to a consistent format.
