R Programming
World Bank - Global Database on Intergenerational Mobility (GDIM)
Data analysis and visualization project using the World Bank's comprehensive dataset on intergenerational mobility across countries.
Retail Dataset II Analysis
Customer analytics on the online retail dataset provided by Dr. Daqing Chen (London South Bank University).
Python Programming
Introduction to Data Science in Python
Simple Regex
Basic regular expression operations for text processing.
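As a minimal sketch of the kind of operation this covers, the snippet below uses Python's standard `re` module to find dates, capture their parts, and normalize whitespace. The sample text and the patterns are illustrative assumptions, not taken from the project itself.

```python
import re

# Illustrative sample text (an assumption, not project data).
text = "Orders shipped on 2015-03-14 and 2015-11-02 to Ann Arbor."

# Find every ISO-style date: four digits, two digits, two digits.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# Capture year, month, and day of the first date with named groups.
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)
first_year = match.group("year") if match else None

# Collapse runs of whitespace into a single space.
normalized = re.sub(r"\s+", " ", "too    many   spaces")
```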
Data Cleaning & Analysis
Fundamental data cleaning and analysis techniques.
Renewable Energy Analysis
Data analysis of renewable energy indicators and trends.
Sports Analytics
Correlation analysis of win/loss rates with population data from MLB, NBA, and NHL.
Applied Plotting, Charting & Data Representation in Python
Temperature Records Visualization
Visualization of the 2015 daily temperatures in Ann Arbor that broke the record highs and lows of the 2005-2014 baseline.
Applied Machine Learning in Python
KNN Classifier - Breast Cancer Dataset
Training a K-Nearest Neighbors classifier on scikit-learn's breast cancer dataset.
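A minimal sketch of this workflow, assuming scikit-learn is installed. The split settings and `n_neighbors=5` are illustrative choices, not necessarily those used in the project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled breast cancer dataset as feature matrix X and labels y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set; random_state and stratify are assumptions made
# here so the split is reproducible and class-balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Fit a K-Nearest Neighbors classifier (k=5 is an illustrative choice).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Mean accuracy on the held-out data.
accuracy = knn.score(X_test, y_test)
```

KNN is distance-based, so scaling the features first (for example with `StandardScaler`) is often worth trying; it is left out here to keep the sketch short.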
Multiple Classifiers - Mushroom Dataset
Training decision tree, SVC, linear, and Lasso classifiers on the UCI Mushroom dataset.
Fraud Detection Analysis
Training a dummy baseline classifier and an SVC on fraud detection data.
Blight Ticket Prediction
Random forest classifier to predict blight ticket violations.
Applied Text Mining in Python
Datetime Processing
Cleaning and sorting datetime data for text analysis.
Text Processing - Moby Dick
Text parsing, lemmatization, and spell-checking analysis of Moby Dick.
Spam Detection Model
Spam message classification using a count vectorizer and logistic regression.
Document Topic Classification
Comparing document similarity and assigning topics to documents with machine learning.
Applied Social Network Analysis in Python
Graph Creation & Manipulation
Creating and manipulating graph structures and networks.
Network Connectivity Analysis
Analysis of network connectivity and graph properties.
Centrality & PageRank Algorithms
Node centrality analysis, Scaled PageRank, and HITS algorithms.
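To illustrate the idea behind scaled PageRank, here taken to mean PageRank with a damping parameter `alpha`, the sketch below runs a plain power iteration in pure Python. The function name `scaled_pagerank`, the toy graph, and the uniform handling of nodes without out-links are assumptions for illustration; the project itself may rely on a graph library instead.

```python
def scaled_pagerank(graph, alpha=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of out-links.

    Every link target must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the "random jump" share (1 - alpha) / n.
        new_rank = {node: (1.0 - alpha) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                # Split the damped rank evenly among out-links.
                share = alpha * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Assumption: a node with no out-links spreads its rank uniformly.
                share = alpha * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: C is pointed to by A, B, and D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = scaled_pagerank(toy_graph)
top_node = max(ranks, key=ranks.get)
```

The scores sum to 1, and a node with no incoming links (D here) ends up with only the random-jump share `(1 - alpha) / n`.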
Link Prediction Model
Logistic regression for predicting future connections (AUC: 0.935).