---
title: "New York City Population by Borough 1950 - 2040"
output: rmarkdown::html_vignette
author: "Robert Hutto"
vignette: >
  %\VignetteIndexEntry{New York City Population by Borough 1950 - 2040}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(nycOpenData)
library(ggplot2)
library(dplyr)
library(tidyr)
```

## Introduction

New York City is made up of five boroughs: the Bronx, Brooklyn, Manhattan, Queens, and Staten Island. Each borough's population has shifted over the decades. On the NYC Open Data portal, the [New York City Population by Borough, 1950 - 2040](https://data.cityofnewyork.us/City-Government/New-York-City-Population-by-Borough-1940-2040/xywu-7bv9) dataset hosts historical population figures and projections for each borough.

Calling `nyc_pull_dataset()` with the dataset ID `xywu-7bv9` returns historical census data and population projections from 1950 to 2040, which makes it possible to analyze demographic trends across all five boroughs.

The `nycOpenData` package provides a streamlined interface to the NYC Open Data portal and its many datasets. It is currently used as a primary tool for teaching data acquisition in [Reproducible Research Using R](https://martinezc1-reproducible-research-using-r.share.connect.posit.cloud/), helping students bridge the gap between raw city APIs and tidy data analysis.

## Pulling a Small Sample

Let's start by pulling a small sample to see the structure:

```{r small-sample}
small_sample <- nyc_pull_dataset("xywu-7bv9", limit = 5)
small_sample

# List the column names in the dataset
names(small_sample)
```

## Pulling Full Dataset

Now let's pull the complete dataset to work with:

```{r full-data}
population_data <- nyc_pull_dataset("xywu-7bv9")

population_data |>
  slice_head(n = 6)
```

## Filtering by Borough

We can filter for a specific borough. Let's look at Brooklyn's population over time:

```{r filter-brooklyn}
brooklyn_pop <- nyc_pull_dataset("xywu-7bv9", filters = list(borough = "Brooklyn"))

brooklyn_pop
```
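
The analysis later in this vignette trims whitespace from `borough` before filtering, which suggests the stored values may carry padding. If the server-side filter above returns no rows, filtering locally on trimmed values is one possible fallback. The sketch below illustrates the idea on a small made-up table; `toy_boroughs` and its numbers are hypothetical and are not taken from the dataset.

```{r filter-local-sketch}
library(dplyr)

# Hypothetical stand-in for a pulled dataset; all values are invented
toy_boroughs <- tibble(
  borough = c("NYC Total", "   Brooklyn", "   Queens"),
  x2020   = c("300", "200", "100")
)

# An exact match misses the padded value
toy_boroughs |>
  filter(borough == "Brooklyn") |>
  nrow()

# Trimming first lets the match succeed
toy_boroughs |>
  mutate(borough = trimws(borough)) |>
  filter(borough == "Brooklyn")
```

The trade-off is that a local filter requires pulling the full table first, which is cheap for a dataset this small but may not be for larger ones.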

## Mini Analysis

Let's visualize population trends across boroughs. First, we need to reshape the data from wide to long format.

```{r population-trends, fig.alt="Line chart showing population trends for NYC's five boroughs from 1950 to 2040.", fig.cap="Population trends for NYC's five boroughs from 1950 to 2040, including historical data and projections.", fig.height=6, fig.width=8}

# Pull the full dataset; the Total Population rows are filtered in the next step
population_data <- nyc_pull_dataset("xywu-7bv9")

# Clean borough names and filter to get individual boroughs (exclude NYC Total)
borough_data <- population_data |>
  mutate(borough = trimws(borough)) |>  # Remove leading/trailing spaces
  filter(age_group == "Total Population", borough != "NYC Total")

# Reshape from wide to long format
pop_long <- borough_data |>
  select(
    borough,
    x1950, x1960, x1970, x1980, x1990,
    x2000, x2010, x2020, x2030, x2040
  ) |>
  pivot_longer(cols = starts_with("x"), names_to = "year", values_to = "population") |>
  mutate(
    year = as.numeric(gsub("x", "", year)),
    population = as.numeric(population)
  )

# Create line chart
ggplot(pop_long, aes(x = year, y = population, color = borough)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  labs(
    title = "NYC Population by Borough: 1950-2040",
    subtitle = "Historical data and projections",
    x = "Year",
    y = "Population",
    color = "Borough"
  ) +
  theme(legend.position = "bottom")
```
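
To see what the wide-to-long reshape does in isolation, here is a minimal sketch on a made-up table. The table `toy_wide` and its numbers are hypothetical. The sketch also uses the `names_prefix`, `names_transform`, and `values_transform` arguments of `tidyr::pivot_longer()` to fold the prefix stripping and type conversion into the pivot itself; this is an alternative to the `gsub()` and `as.numeric()` step used above, not a description of what that chunk does.

```{r reshape-sketch}
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per borough, one column per year (invented numbers)
toy_wide <- tibble(
  borough = c("A", "B"),
  x2000   = c("10", "20"),
  x2010   = c("11", "22")
)

toy_long <- toy_wide |>
  pivot_longer(
    cols = starts_with("x"),                  # the year columns
    names_to = "year",
    names_prefix = "x",                       # strip the leading "x" while pivoting
    names_transform = list(year = as.integer),
    values_to = "population",
    values_transform = list(population = as.numeric)
  )

toy_long
```

Each borough-year pair becomes its own row, which is the shape `ggplot2` expects when mapping `year` to the x-axis and `borough` to color.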

We can also look at which borough is projected to have the largest population in 2040:

```{r summary-2040}
pop_long |>
  filter(year == 2040) |>
  arrange(desc(population))
```
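
A related question is how much each borough is projected to change between two years. The sketch below shows one way to compute a percent change per borough from a long-format table. It runs on a hypothetical table, `toy_pop`, with invented numbers shaped like `pop_long`; the same pipeline should apply to `pop_long` itself, assuming it contains one row per borough for each of the two years.

```{r change-sketch}
library(dplyr)

# Hypothetical long-format table shaped like `pop_long` (invented numbers)
toy_pop <- tibble(
  borough    = rep(c("A", "B"), each = 2),
  year       = rep(c(2020, 2040), times = 2),
  population = c(100, 110, 200, 190)
)

toy_change <- toy_pop |>
  group_by(borough) |>
  summarise(
    pop_2020   = population[year == 2020],
    pop_2040   = population[year == 2040],
    pct_change = 100 * (pop_2040 - pop_2020) / pop_2020
  ) |>
  arrange(desc(pct_change))

toy_change
```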

## Summary

The `nyc_pull_dataset()` function, called with the dataset ID `xywu-7bv9`, provides easy access to New York City demographic data spanning 1950 to 2040. This enables analysis of long-term population trends, comparisons across boroughs, and exploration of projected future changes.

The `nycOpenData` package serves as an interface to the NYC Open Data portal, shortening the path from raw city APIs to analysis-ready data. By handling details of data acquisition such as pagination, type-casting, and complex filtering, it lets users focus on analysis rather than data engineering.

As demonstrated in this vignette, the package handles targeted retrieval and filtering, after which standard tools such as `tidyr` and `ggplot2` take care of reshaping and visualization.

## How to Cite

If you use this package for research or educational purposes, please cite it as follows:

Martinez C (2026). nycOpenData: Convenient Access to NYC Open Data API Endpoints. R package version 0.1.6, <https://martinezc1.github.io/nycOpenData/>.
