Anurag Aiyer


Portfolio Website

View the Project on GitHub arag1/portfolio

Data Science Portfolio

Technical Skills: Python, SQL, AWS, Docker, Tableau, Airflow

Education

Work Experience

Associate Product Manager @ Visa (August 2022 - June 2023)

Retail Data Analytics Intern @ Samsung Electronics America (June 2021 - August 2021)

Data Science Intern @ Takeda (May 2020 - August 2020)

Data Science Researcher @ UC Berkeley IEOR Department (January 2020 - May 2020)

Projects

Predicting NYC UHI Index (March 2025)

Project Code

Predicted the Urban Heat Island (UHI) index from Sentinel-2 and Landsat 8 satellite data bands retrieved through the Planetary Computer API. All work was done in Python with pandas, using random forest regression for prediction. Experimented with different train/test splits, moving from a 70/30 split to a 90/10 split, to improve test and validation accuracy.
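A minimal sketch of the split comparison, assuming scikit-learn's `RandomForestRegressor` and `train_test_split`. The feature matrix is synthetic, standing in for per-location band values pulled from the Planetary Computer; the number of bands, the target formula, the `evaluate_split` helper, and the hyperparameters are illustrative assumptions, not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def evaluate_split(X, y, test_size, seed=42):
    """Fit a random forest on one train/test split and return the test R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))


# Synthetic stand-in data (assumption): each row is a location, each column a
# satellite band value; y is a made-up UHI index driven by two of the bands.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 6))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(0.0, 0.02, size=500)

# Compare the two train/test proportions mentioned above.
splits = {"70/30": 0.3, "90/10": 0.1}
scores = {label: evaluate_split(X, y, t) for label, t in splits.items()}
print(scores)
```

A larger training share gives the forest more rows to learn from, but a 10% test set makes its score a noisier estimate, so k-fold cross-validation is a common way to confirm that an improvement is not an artifact of one particular split.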

NYPD Arrests Deep Dive (August 2024)

Presentation

Identified key drivers of NYC arrests by uncovering noticeable increases in arrests by borough, offense type, ethnicity, and age group. We synthesized our findings into a Tableau dashboard to help the NYPD better equip itself to reduce crime. Our recommendations call for a collective effort to improve officer training and to build a better understanding of NYC regulations.

Dashboard
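The borough, offense-type, ethnicity, and age-group breakdowns all reduce to the same operation: count arrests per group per year, then rank the groups by how much their counts changed. A minimal pandas sketch with made-up rows; the column names (`year`, `borough`, `offense`) and the `yoy_change` helper are hypothetical and do not reflect the actual NYPD arrests schema.

```python
import pandas as pd

# Made-up arrest records for illustration only (hypothetical columns).
arrests = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023],
    "borough": ["Bronx", "Brooklyn", "Bronx", "Bronx",
                "Bronx", "Brooklyn", "Brooklyn", "Bronx"],
    "offense": ["Assault", "Larceny", "Larceny", "Assault",
                "Assault", "Larceny", "Assault", "Larceny"],
})


def yoy_change(df, group_col):
    """Percent change in arrest counts between the first and last year, per group."""
    # Rows: groups; columns: years; values: number of arrests.
    counts = df.groupby([group_col, "year"]).size().unstack("year", fill_value=0)
    first, last = counts.columns.min(), counts.columns.max()
    pct = (counts[last] - counts[first]) / counts[first] * 100
    # Largest increases first, i.e. the candidate "key drivers".
    return pct.sort_values(ascending=False)


by_borough = yoy_change(arrests, "borough")
by_offense = yoy_change(arrests, "offense")
print(by_borough)
print(by_offense)
```

The same helper applies unchanged to ethnicity or age-group columns. A group with zero arrests in the first year yields an infinite percentage and would need separate handling.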

End-to-End Data Pipeline for State Economic Metrics (August 2024)

Project Zip | Documentation

The purpose of this data pipeline project is to streamline the integration, processing, and analysis of economic datasets from sources such as the USDA Economic Research Service and Kaggle. The project aims to deliver a self-contained data pipeline that can be deployed easily using Docker. It automates data ingestion, transforms the datasets through Airflow-managed tasks, and stores the results in a SQL database. The documentation covers dataset descriptions, normalization into 11 tables, and the generation of pandas profiling reports for thorough data exploration. The architecture has separate stages for data loading, transformation, and storage, with Airflow orchestrating the ETL tasks and PostgreSQL serving as the database backend.
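The extract, transform, and load stages, including normalization of a flat source file into separate tables, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the CSV columns and values are made up, the two-table schema is hypothetical (the project normalizes into 11 tables), the standard-library `sqlite3` module stands in for PostgreSQL, and ordinary function calls stand in for Airflow tasks.

```python
import csv
import io
import sqlite3

# Made-up flat extract (assumption) mixing state attributes and yearly metrics.
RAW_CSV = """state,region,year,median_income,unemployment_rate
Ohio,Midwest,2021,62000,5.1
Ohio,Midwest,2022,65000,4.0
Texas,South,2021,66000,5.7
Texas,South,2022,72000,3.9
"""


def extract(text):
    """Read the raw CSV into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize flat rows into a states dimension and a metrics fact table."""
    states = {}  # state name -> (state_id, region)
    metrics = []
    for row in rows:
        state_id = states.setdefault(row["state"], (len(states) + 1, row["region"]))[0]
        metrics.append((state_id, int(row["year"]),
                        float(row["median_income"]), float(row["unemployment_rate"])))
    state_rows = [(sid, name, region) for name, (sid, region) in states.items()]
    return state_rows, metrics


def load(conn, state_rows, metrics):
    """Create the normalized tables and insert the transformed rows."""
    conn.executescript("""
        CREATE TABLE states (state_id INTEGER PRIMARY KEY, name TEXT UNIQUE, region TEXT);
        CREATE TABLE metrics (
            state_id INTEGER REFERENCES states(state_id),
            year INTEGER,
            median_income REAL,
            unemployment_rate REAL,
            PRIMARY KEY (state_id, year)
        );
    """)
    conn.executemany("INSERT INTO states VALUES (?, ?, ?)", state_rows)
    conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", metrics)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL backend
load(conn, *transform(extract(RAW_CSV)))

query = """
    SELECT s.name, AVG(m.median_income)
    FROM metrics m JOIN states s USING (state_id)
    GROUP BY s.name ORDER BY s.name
"""
print(conn.execute(query).fetchall())
```

Keeping each stage a separate function mirrors how the stages would map onto individual Airflow tasks, so a failed step can be retried without rerunning the whole pipeline.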

Crypto Coin Prediction using Neural Networks (May 2024)

Dataset | Project Zip | Presentation

The goal of the project is to predict daily Bitcoin prices using deep learning models covered in the Johns Hopkins Neural Network course. We researched 7 different datasets covering weekly, hourly, and daily price changes. Overall, the data contains 234 cryptocurrencies and altcoins with historical open, high, low, close, and volume (OHLCV) prices traded on the Binance exchange. Our group chose the D1 dataset because it contains more data at the daily scale and is therefore better suited to prediction. We trained RNN, LSTM, and GRU networks and compared their performance and accuracy.
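RNN, LSTM, and GRU layers all consume fixed-length sequences, so the daily closes first have to be cut into sliding windows with the next day's close as the target. A minimal NumPy sketch; the `make_windows` name, the lookback length, the sample prices, and min-max scaling are illustrative assumptions rather than the project's exact preprocessing. The output has the `(samples, timesteps, features)` layout that, for example, Keras recurrent layers accept by default.

```python
import numpy as np


def make_windows(prices, lookback):
    """Turn 1-D daily closes into (samples, lookback, 1) inputs and next-day targets."""
    prices = np.asarray(prices, dtype=np.float32)
    if len(prices) <= lookback:
        raise ValueError("need more prices than the lookback length")
    lo, hi = prices.min(), prices.max()
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    # Scaling uses the whole series for brevity; fitting lo/hi on the training
    # portion only avoids leaking future information into the inputs.
    scaled = (prices - lo) / (hi - lo)
    # Window i covers days i .. i+lookback-1; its target is day i+lookback.
    X = np.stack([scaled[i:i + lookback] for i in range(len(scaled) - lookback)])
    y = scaled[lookback:]
    return X[..., np.newaxis], y


# Made-up closing prices for illustration.
X, y = make_windows([10, 11, 12, 13, 14, 15], lookback=3)
print(X.shape, y.shape)
```

The same `(X, y)` pair can be fed to each of the three architectures, which keeps the RNN/LSTM/GRU comparison focused on the model rather than on differences in preprocessing.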

Modeling Stock Market Behavior (May 2024)

Project Code

The goal of the project is to model how stock prices vary year over year. To do this, our team built a Markov chain from historical price information and computed its stationary distribution. We then compared the distribution of the Markov chain with the distribution of the test set of stock prices, following a method developed in a previous study in 2011. The data consisted of the closing prices of each Dow Jones Industrial Average member from 2014-2019, which gives 1007 points per symbol across the 30 symbols in the Dow Jones. These prices served as the reference for producing the stationary distribution of the Markov chain, which we then used to project the stocks for 2023-2024.
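One way to build such a chain is to discretize daily returns into states, count state-to-state transitions to get a row-stochastic matrix $P$, and solve $\pi = \pi P$ with $\sum_i \pi_i = 1$ for the stationary distribution. A minimal NumPy sketch; the three states (down/flat/up), the 0.5% threshold, and the synthetic 1007-point price path are assumptions for illustration, not the discretization used in the 2011 study or in the project.

```python
import numpy as np


def discretize(closes, threshold=0.005):
    """Map day-over-day returns to states 0 (down), 1 (flat), 2 (up)."""
    closes = np.asarray(closes, dtype=float)
    returns = np.diff(closes) / closes[:-1]
    return np.where(returns > threshold, 2, np.where(returns < -threshold, 0, 1))


def transition_matrix(states, n_states=3):
    """Estimate P[a, b] = Pr(next state = b | current state = a) from counts."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows never observed fall back to a uniform distribution.
    return np.where(row_sums > 0, counts / np.where(row_sums == 0, 1, row_sums), 1.0 / n_states)


def stationary_distribution(P):
    """Solve pi = pi P as the left eigenvector of P for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    vec = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return vec / vec.sum()  # normalize so the probabilities sum to 1


# Synthetic price path standing in for one Dow Jones symbol (assumption).
rng = np.random.default_rng(1)
closes = 100 * np.cumprod(1 + rng.normal(0.0, 0.01, size=1007))
P = transition_matrix(discretize(closes))
pi = stationary_distribution(P)
print(P.round(3))
print(pi.round(3))
```

The stationary vector gives the long-run share of down, flat, and up days implied by the training period; comparing it with the state frequencies observed in a held-out period is one way to carry out the comparison with the test set described above.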

Credit Card and FICO Score Exploration (April 2023)

Deliverable

Used Python to explore patterns between FICO scores and credit cards. The analysis showed that Visa and Mastercard are the two companies that authorize the highest credit limits for their clients. Most middle-class workers who own fewer than 3 credit cards have an average FICO score of 800 or higher. Additional steps to improve this project would include examining other variables, such as ethnicity, location, and education, to clarify whether there is a correlation between the number of credit cards and FICO score.
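The card-count comparison and the proposed correlation check can be sketched with pandas. The records below are made up, so any pattern they show exists by construction and is not a project finding; the column names and the card-count buckets are illustrative assumptions. Spearman correlation is computed here as the Pearson correlation of ranks, which avoids assuming a linear relationship between card count and score.

```python
import pandas as pd

# Made-up records for illustration only (hypothetical columns).
cards = pd.DataFrame({
    "num_cards": [1, 2, 2, 3, 4, 5, 1, 6],
    "fico": [810, 805, 790, 760, 740, 720, 800, 700],
})

# Bucket card counts; "1-2" corresponds to owning fewer than 3 cards.
cards["card_bucket"] = pd.cut(
    cards["num_cards"], bins=[0, 2, 4, float("inf")], labels=["1-2", "3-4", "5+"]
)
avg_fico = cards.groupby("card_bucket", observed=True)["fico"].mean()

# Spearman correlation = Pearson correlation of the (average) ranks.
rho = cards["num_cards"].rank().corr(cards["fico"].rank())

print(avg_fico)
print(round(rho, 3))
```

Extending this to ethnicity, location, or education would mean adding those columns and repeating the grouped means within each subgroup, which helps separate the card-count effect from those other variables.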