R Programming

World Bank - Global Database on Intergenerational Mobility (GDIM)

Data analysis and visualization project using the World Bank's comprehensive dataset on intergenerational mobility across countries.

Techniques: Clustering, choropleth visualization, OLS regression

Retail Dataset II Analysis

Comprehensive customer analytics on online retail data from Dr. Daqing Chen, London South Bank University.

Techniques: RFM analysis, heat maps, customer segmentation, clustering, retention and lifetime value analysis

Python Programming

Introduction to Data Science in Python

Simple Regex

Basic regular expression operations for text processing.
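A minimal sketch of the kind of operation this covers, using Python's standard `re` module. The sample string and both patterns are hypothetical illustrations, not taken from the project.

```python
import re

# Hypothetical sample text; not taken from the project data.
text = "Contact amy@example.com or bob@example.org before 2024-03-15."

# Find every email-like token (a deliberately simple pattern, not RFC-complete).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)

# Capture year, month, and day from an ISO-style date.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = date_match.groups()

print(emails)
print(year, month, day)
```

`re.findall` returns all non-overlapping matches as a list, while `re.search` returns only the first match and exposes its capture groups.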

Data Cleaning & Analysis

Fundamental data cleaning and analysis techniques.

Renewable Energy Analysis

Data analysis of renewable energy indicators and trends.

Sports Analytics

Correlation analysis of win/loss rates with population data from MLB, NBA, and NHL.
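The correlation step can be sketched with a plain Pearson coefficient. The function name `pearson_r` and the numbers below are hypothetical placeholders for illustration only; they are not results or data from the project, which may well have used a library routine instead.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration: area population (millions) and
# the win/loss ratio of the team based there.
population = [1.0, 2.0, 3.0, 4.0]
win_loss_ratio = [0.40, 0.50, 0.45, 0.60]

r = pearson_r(population, win_loss_ratio)
print(round(r, 3))
```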

Applied Plotting, Charting & Data Representation in Python

Temperature Records Visualization

Visualization of the days in 2015 on which Ann Arbor temperatures broke the record highs or lows of the 2005-2014 baseline.
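The record-breaking comparison behind such a plot can be sketched as follows. The function `record_breakers`, the data layout, and the toy readings are assumptions made for illustration; the project's actual data and plotting code are not shown here.

```python
# Hypothetical toy readings keyed by (month, day), in degrees Celsius.
baseline = {  # 2005-2014 observations for each calendar day
    (1, 1): [-5.0, 3.2, 1.1],
    (7, 4): [24.0, 31.5, 29.8],
}
year_2015 = {(1, 1): -7.3, (7, 4): 30.0}

def record_breakers(baseline, current):
    """Split current-year readings into new record highs and new record lows."""
    highs, lows = {}, {}
    for day, value in current.items():
        history = baseline.get(day)
        if not history:
            continue  # no baseline for this day, nothing to compare against
        if value > max(history):
            highs[day] = value
        elif value < min(history):
            lows[day] = value
    return highs, lows

highs, lows = record_breakers(baseline, year_2015)
print(highs, lows)
```

Only the days returned in `highs` and `lows` would be overlaid as points on the baseline band.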

Applied Machine Learning in Python

KNN Classifier - Breast Cancer Dataset

Training a K-Nearest Neighbors classifier on scikit-learn's breast cancer dataset.

Multiple Classifiers - Mushroom Dataset

Training decision tree, SVC, linear, and Lasso classifiers on the UCI Mushroom dataset.

Fraud Detection Analysis

Training dummy and SVC classifiers on fraud detection data.

Blight Ticket Prediction

Random forest classifier to predict blight ticket violations.

Applied Text Mining in Python

Datetime Processing

Cleaning and sorting datetime data for text analysis.
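A minimal sketch of extracting dates from free text and sorting by them, using only the standard library. The sample notes, the helper `extract_date`, the two supported formats, and the rule that two-digit years fall in the 1900s are all assumptions for illustration.

```python
import re
from datetime import date

# Hypothetical free-text notes; not the project's data.
notes = [
    "Seen on 03/25/1993 for follow-up",
    "Admitted 6/18/85",
    "Discharged 2001-07-08",
]

def extract_date(line):
    """Return the first date found in line, or None if no pattern matches."""
    m = re.search(r"(\d{4})-(\d{1,2})-(\d{1,2})", line)  # YYYY-MM-DD
    if m:
        y, mo, d = map(int, m.groups())
        return date(y, mo, d)
    m = re.search(r"(\d{1,2})/(\d{1,2})/(\d{2,4})", line)  # MM/DD/YY or MM/DD/YYYY
    if m:
        mo, d, y = map(int, m.groups())
        if y < 100:
            y += 1900  # assumption: two-digit years belong to the 1900s
        return date(y, mo, d)
    return None

# Pair each parsed date with its original index, then sort chronologically.
parsed = [(extract_date(n), i) for i, n in enumerate(notes)]
order = [i for d, i in sorted(p for p in parsed if p[0] is not None)]
print(order)
```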

Text Processing - Moby Dick

Text parsing, lemmatization, and spell-checking analysis of Moby Dick.

Spam Detection Model

Logistic regression and count vectorizer for spam message classification.

Document Topic Classification

Machine learning to measure document similarity and assign topics to documents.

Applied Social Network Analysis in Python

Graph Creation & Manipulation

Creating and manipulating graph structures and networks.

Network Connectivity Analysis

Analysis of network connectivity and graph properties.

Centrality & PageRank Algorithms

Node centrality analysis, Scaled PageRank, and HITS algorithms.
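For reference, scaled PageRank can be sketched as a damped power iteration in plain Python. This is an illustrative implementation under stated assumptions (a hypothetical `scaled_pagerank` function, a damping factor of 0.85, a fixed iteration count, and a made-up four-node graph), not the project's code, which may have relied on a graph library instead.

```python
def scaled_pagerank(graph, alpha=0.85, iterations=100):
    """Damped power-iteration PageRank on a dict {node: [successors]}.

    Assumes every node appears as a key. Nodes without outgoing edges
    spread their score uniformly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes, redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each node passes a damped, equal share of its score to its successors.
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += alpha * rank[v] / len(graph[v])
        rank = new_rank
    return rank

# Hypothetical four-node directed graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = scaled_pagerank(graph)
print(max(ranks, key=ranks.get))
```

The `(1 - alpha) / n` term is what distinguishes the scaled variant from basic PageRank: every node keeps a small baseline score, so score cannot pool indefinitely in a closed group of nodes.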

Link Prediction Model

Logistic regression for predicting future connections (AUC: 0.935).