Names are more than labels: they reflect culture, identity, and social change. This dataset contains 2.2 million records from U.S. Social Security card applications over three decades, broken down by state, gender, year, and name. Its cultural relevance and approachable nature make it well suited to Python-based exploratory analysis and storytelling.
- Cultural resonance: Everyone connects to names, making insights relatable.
- Scale: Millions of records across decades and regions.
- Diversity: Gender, geography, and time dimensions allow for rich comparisons.
- Trend potential: Names rise and fall with cultural events, celebrities, and societal shifts.
- The most popular names of each decade and how they change over time.
- Names with the biggest jumps and drops in popularity.
- Regional differences in naming across U.S. states.
- The rise of gender-neutral names and evolving cultural preferences.
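
One way to make the gender-neutral question measurable is sketched below. It is only an illustration under stated assumptions: the row layout `(name, gender, year, count)`, the function name `neutral_share_by_year`, the 0.3 threshold, and the sample counts are all hypothetical and not taken from the dataset. A name counts as "neutral" in a given year when the less common gender accounts for at least `threshold` of its records. The sketch sticks to the standard library so it runs without any installs; the same logic maps naturally onto a pandas `groupby`.

```python
from collections import defaultdict


def neutral_share_by_year(rows, threshold=0.3):
    """Return {year: fraction of records that went to gender-neutral names}.

    Assumes gender labels are already normalized to "F" or "M".
    """
    # counts[year][name] -> {"F": n, "M": n}
    counts = defaultdict(lambda: defaultdict(lambda: {"F": 0, "M": 0}))
    for name, gender, year, count in rows:
        counts[year][name][gender] += count

    shares = {}
    for year, names in sorted(counts.items()):
        total = 0
        neutral = 0
        for by_gender in names.values():
            n = by_gender["F"] + by_gender["M"]
            total += n
            # Minority share: 0.0 for a single-gender name, 0.5 for an even split.
            if n and min(by_gender.values()) / n >= threshold:
                neutral += n
        shares[year] = neutral / total if total else 0.0
    return shares


# Made-up sample rows, for illustration only.
rows = [
    ("Riley", "F", 1990, 10), ("Riley", "M", 1990, 90),
    ("Emma", "F", 1990, 100),
    ("Riley", "F", 2010, 60), ("Riley", "M", 2010, 40),
    ("Emma", "F", 2010, 100),
]
shares = neutral_share_by_year(rows)
print(shares)
```

In the sample, "Riley" moves from a 10/90 split to a 60/40 split, so the neutral share of records rises between the two years. The threshold is a judgment call; a stricter value such as 0.4 would flag fewer names.
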
- Data Cleaning
  - Normalize state codes, gender labels, and handle missing values.
  - Aggregate counts by decade, gender, and state.
- Exploratory Analysis
  - Identify top names by decade, gender, and region.
  - Detect names with sharp rises or declines in popularity.
- Visualization
  - Line charts for name popularity over time.
  - Heatmaps for state-wise trends.
  - Word clouds for most popular names per decade.
- Advanced Analysis
  - Forecast future name popularity using time series models.
  - Detect cultural spikes linked to events or celebrities.
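
The cleaning and exploratory steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`state`, `gender`, `year`, `name`, `count`), the `GENDER_MAP` label mapping, the helper names, and the sample rows are all assumptions. It uses only the standard library so it runs anywhere; in the notebooks the same steps would more likely be written with pandas.

```python
from collections import defaultdict

# Assumed mapping from messy gender labels to canonical codes (hypothetical).
GENDER_MAP = {"f": "F", "female": "F", "m": "M", "male": "M"}


def clean(raw_rows):
    """Normalize state codes and gender labels; drop rows with missing values."""
    cleaned = []
    for row in raw_rows:
        state = (row.get("state") or "").strip().upper()
        gender = GENDER_MAP.get((row.get("gender") or "").strip().lower())
        name = (row.get("name") or "").strip()
        try:
            year = int(row.get("year"))
            count = int(row.get("count"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric year/count
        if len(state) != 2 or gender is None or not name:
            continue  # missing or unrecognized state, gender, or name
        cleaned.append({"state": state, "gender": gender, "year": year,
                        "name": name, "count": count})
    return cleaned


def by_decade(rows):
    """Sum counts per (decade, gender, name), across all states."""
    totals = defaultdict(int)
    for r in rows:
        decade = r["year"] // 10 * 10
        totals[(decade, r["gender"], r["name"])] += r["count"]
    return dict(totals)


def top_names(totals, decade, gender, n=3):
    """Return the n most common (name, count) pairs for one decade and gender."""
    ranked = sorted(
        ((name, c) for (d, g, name), c in totals.items()
         if d == decade and g == gender),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:n]


def biggest_changes(totals, gender, old_decade, new_decade):
    """Return (name, change) pairs, sorted from largest rise to largest drop."""
    names = {name for (d, g, name) in totals
             if g == gender and d in (old_decade, new_decade)}
    changes = {
        name: totals.get((new_decade, gender, name), 0)
        - totals.get((old_decade, gender, name), 0)
        for name in names
    }
    return sorted(changes.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample rows, for illustration only.
raw = [
    {"state": " ca", "gender": "Female", "year": "1991", "name": "Jessica", "count": "500"},
    {"state": "NY", "gender": "f", "year": "1995", "name": "Jessica", "count": "300"},
    {"state": "TX", "gender": "F", "year": "1998", "name": "Emma", "count": "100"},
    {"state": "CA", "gender": "F", "year": "2004", "name": "Emma", "count": "700"},
    {"state": "NY", "gender": "F", "year": "2006", "name": "Jessica", "count": "200"},
    {"state": "TX", "gender": "", "year": "2006", "name": "Emma", "count": "50"},  # dropped
]
cleaned = clean(raw)
totals = by_decade(cleaned)
top_1990s = top_names(totals, 1990, "F")
changes = biggest_changes(totals, "F", 1990, 2000)
print(top_1990s)
print(changes)
```

The change here is measured in absolute counts, which favors already-popular names; a percentage change or a change in rank would surface smaller names that grew quickly. Adding `state` to the aggregation key would give the regional view.
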
- A clear picture of naming trends across decades.
- Regional storytelling that highlights cultural diversity.
- Insights into societal shifts (e.g., gender-neutral naming).
- Engaging visualizations that make the analysis accessible to all audiences.

us-baby-names-analysis
│
├── data
│   ├── raw/ → Original dataset (CSV)
│   └── processed/ → Cleaned and aggregated data (decade, gender, state)
│
├── notebooks
│   ├── eda.ipynb → Exploratory Data Analysis (popular names, jumps/drops, gender differences)
│   ├── visualization.ipynb → Trend charts, heatmaps, word clouds
│   └── forecasting.ipynb → Predictive modeling for future name popularity
│
├── visuals
│   ├── charts/ → Line charts, heatmaps, bar plots
│   └── wordclouds/ → Word clouds of popular names per decade
│
├── docs
│   ├── dataset_story.md → Narrative introduction (Dataset Story section)
│   └── analysis_report.md → Final written report with insights and impact
│
└── README.md → Project overview, Dataset Story, workflow, and results

- Python (pandas, NumPy) for data cleaning and analysis.
- Matplotlib/Seaborn/Plotly for visualizations.
- Jupyter Notebooks for interactive exploration.
- WordCloud/NLP libraries for text-based insights.