Personal Technical Project

Data Analytics for Public Policy

A demonstration of how Python can be used to explore a complex dataset, visualize patterns, and translate exploratory analysis into clear, decision-relevant insights for policy and program design.

Visual Overview

The visualizations below were made in Python using an open-source diabetes dataset from 2004. They show how visual analysis can help us understand how demographics and clinical indicators relate to health outcomes.

Outcome Distribution

Histogram showing distribution of diabetes progression scores

Progression by Age & Sex

Bar chart showing average progression by age group and sex

Health Metrics & Progression

Scatter plots showing health metrics versus progression

Strength of Association

Bar chart showing strength of linear relationship between predictors and progression

Analytical Approach

This project applies a four-step exploratory workflow to the question: "Which factors most influence diabetes progression?"

Step 1

Learn About the Data

Use histograms to understand the shape of the outcome and identify any unusual patterns.
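As a rough illustration of this step, the sketch below computes the bin counts that sit behind a histogram and prints them as a text chart. It uses only the standard library. The scores are synthetic placeholders, not values from the dataset, and the function name `histogram_counts` and the bin width of 50 are assumptions chosen for the example.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge.

    Assumes bin_width is a positive integer. Empty bins between the
    smallest and largest occupied bin are included with a count of 0,
    so gaps in the distribution stay visible.
    """
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    edges = range(min(counts), max(counts) + bin_width, bin_width)
    return {edge: counts.get(edge, 0) for edge in edges}


# Synthetic illustrative scores, NOT values from the real dataset.
scores = [25, 48, 72, 75, 90, 101, 110, 135, 141, 151, 178, 206, 233, 310]
bins = histogram_counts(scores, bin_width=50)

# Print a simple text histogram: one '#' per observation in the bin.
for lower_edge, count in bins.items():
    print(f"{lower_edge:>3}-{lower_edge + 49:<3} {'#' * count}")
```

In a text chart like this, a long right tail or an isolated high bin is the kind of unusual pattern this step is meant to surface before any modeling begins.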

Step 2

Look at Demographics

Use bar charts to compare average progression across demographic groups, such as age group and sex.
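The numbers behind a grouped bar chart are simply group averages. The sketch below shows one way to compute them with the standard library. The patient records are synthetic, the age bands in `age_group` are hypothetical cut points (the ones used for the original chart are not stated), and the `"F"`/`"M"` labels are illustrative stand-ins for however sex is coded in the data.

```python
from collections import defaultdict
from statistics import mean


def age_group(age):
    """Assign a hypothetical age band; these cut points are an assumption."""
    if age < 40:
        return "under 40"
    if age < 60:
        return "40-59"
    return "60+"


def mean_by_group(records, key_fields, value_field):
    """Average value_field within each combination of key_fields."""
    groups = defaultdict(list)
    for row in records:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row[value_field])
    return {key: mean(values) for key, values in sorted(groups.items())}


# Synthetic illustrative records, NOT rows from the real dataset.
patients = [
    {"age": 34, "sex": "F", "progression": 100},
    {"age": 37, "sex": "M", "progression": 120},
    {"age": 45, "sex": "F", "progression": 150},
    {"age": 52, "sex": "F", "progression": 170},
    {"age": 58, "sex": "M", "progression": 140},
    {"age": 63, "sex": "M", "progression": 180},
    {"age": 66, "sex": "M", "progression": 200},
]
for patient in patients:
    patient["age_group"] = age_group(patient["age"])

averages = mean_by_group(patients, ["age_group", "sex"], "progression")
for (group, sex), value in averages.items():
    print(f"{group:<9} {sex}  {value}")
```

Each entry in `averages` corresponds to one bar. Groups with very few records produce unstable averages, so it is worth checking group sizes alongside the means.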

Step 3

Look at Health Factors

Use scatter plots with trend lines to examine how key health indicators relate to progression.
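A trend line on a scatter plot is usually an ordinary least-squares fit. The sketch below assumes that is the kind of trend line meant here and computes its slope and intercept by hand. The BMI and progression values are synthetic, and `fit_trend_line` is a name made up for this example.

```python
from statistics import mean


def fit_trend_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic illustrative values, NOT measurements from the real dataset.
bmi = [20.0, 24.0, 28.0, 32.0, 36.0]
progression = [80.0, 130.0, 150.0, 210.0, 230.0]

slope, intercept = fit_trend_line(bmi, progression)
print(f"progression = {slope:.2f} * bmi + {intercept:.2f}")
```

The slope is the change in the fitted progression score per one-unit increase in the predictor, so its size depends on the predictor's units.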

Step 4

Confirm Relationships

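Calculate trend slopes for continuous variables to see how strongly each factor is associated with progression. Slopes are directly comparable across factors only when the predictors share a common scale.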
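One scale-free way to compare predictors is the Pearson correlation coefficient, which equals the trend slope obtained after both variables are standardized. The sketch below swaps in that measure as an assumption; it is not necessarily the exact calculation behind the original chart. All values are synthetic, and the names `pearson_r` and `rank_predictors` are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: the slope of y on x after standardizing both."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_predictors(predictors, outcome):
    """Sort predictors by absolute correlation with the outcome, strongest first."""
    scores = {name: pearson_r(values, outcome) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic illustrative values, NOT measurements from the real dataset.
progression = [80.0, 130.0, 150.0, 210.0, 230.0]
predictors = {
    "bmi": [20.0, 24.0, 28.0, 32.0, 36.0],
    "hdl": [60.0, 62.0, 50.0, 45.0, 48.0],
    "age": [50.0, 35.0, 60.0, 40.0, 55.0],
}

ranking = rank_predictors(predictors, progression)
for name, r in ranking:
    print(f"{name:<4} r = {r:+.2f}")
```

Ranking by absolute value keeps strong negative relationships near the top of the list, while the sign still shows the direction of each association.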

Key Findings

BMI and triglycerides (S5) showed the strongest positive correlations with diabetes progression, which suggests that public health teams may want to prioritize those factors when exploring interventions.

Strongest Associations

BMI & Triglycerides (S5)

These variables show the steepest trend slopes with progression, making them the strongest candidates for further investigation in this dataset.

Protective Indicator

HDL Cholesterol (S3)

Higher HDL cholesterol is negatively correlated with progression, which is consistent with a protective effect but does not establish one.

Limited Demographic Impact

Age & Sex

Age and sex show weak associations with progression, which suggests that interventions focused on clinical health factors may be more promising than demographic targeting.

Limitations

This analysis is exploratory and descriptive. It highlights patterns but does not estimate causal effects or predict individual outcomes. The findings should be read as early signals that guide further investigation, not definitive evidence.

Dataset Constraints

The dataset is small, collected in 2004, and includes a limited set of clinical variables. Important behavioral, social, and environmental factors are not captured.

Correlation, Not Causation

The analysis shows associations but can't confirm causation. Establishing causation requires a study design that supports valid comparisons.

Model Simplicity

Trend slopes provide a simple way to compare relationships, but they do not account for interactions, nonlinear effects, or confounding variables.

Generalizability

Because the dataset is narrow and standardized, the patterns may not generalize to broader or more diverse populations without further validation.
