Package {pulso}


Title: Load Microdata from Colombia's 'GEIH' ('DANE')
Version: 0.1.0
Description: Programmatic access to microdata from Colombia's Gran Encuesta Integrada de Hogares ('GEIH'), published by 'DANE'. Provides a tidy interface to download, parse, and harmonize labor market surveys from 2007 to present. R companion to the 'pulso-co' 'Python' package.
License: MIT + file LICENSE
URL: https://github.com/Stebandido77/pulso
BugReports: https://github.com/Stebandido77/pulso/issues
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Imports: httr2 (≥ 1.0.0), jsonlite (≥ 1.8.0), tibble (≥ 3.2.0), rlang (≥ 1.1.0), cli (≥ 3.6.0), fs (≥ 1.6.0), digest (≥ 0.6.0), data.table (≥ 1.14.0), dplyr (≥ 1.0.0), readxl (≥ 1.4.0), xml2 (≥ 1.3.0)
Suggests: haven (≥ 2.5.0), vctrs (≥ 0.6.0), tidyr, testthat (≥ 3.2.0), knitr, rmarkdown
VignetteBuilder: knitr
Config/testthat/edition: 3
Config/roxygen2/version: 8.0.0
NeedsCompilation: no
Packaged: 2026-06-05 22:59:24 UTC; esteb
Author: Esteban Labastidas [aut, cre]
Maintainer: Esteban Labastidas <estebanlabastidas123@gmail.com>
Repository: CRAN
Date/Publication: 2026-06-12 10:00:08 UTC

pulso: Load Microdata from Colombia's 'GEIH' ('DANE')

Description

R companion to the 'pulso-co' 'Python' package. Provides programmatic access to microdata from Colombia's Gran Encuesta Integrada de Hogares ('GEIH'), published by 'DANE', and to Banco de la Republica monetary policy data.

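Loading microdata

pulso_load(): load a single module for one year-month.
pulso_load_merged(): load and merge several modules for one year-month.

Describing columns and variables

pulso_describe(): summarize a GEIH module.
pulso_describe_column(): describe the metadata of one column.
pulso_describe_variable(): summarize a harmonized variable.
pulso_list_columns_metadata(): list metadata for all columns of a tibble.
pulso_list_variables(): list harmonized variables from the variable map.

Catalog and validation

pulso_list_validated_range(): list validated GEIH periods.
pulso_validation_status(): validation status for a single period.

Banco de la Republica

pulso_tpm(): fetch the Tasa de Politica Monetaria (TPM).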

Comparison with Python

This package mirrors the API of the 'pulso-co' 'Python' package. See the package vignette for details.

Author(s)

Maintainer: Esteban Labastidas <estebanlabastidas123@gmail.com>

See Also

Useful links:

https://github.com/Stebandido77/pulso

Report bugs at https://github.com/Stebandido77/pulso/issues


Describe a GEIH module

Description

Returns a human-readable summary of a GEIH survey module: its survey level, available epochs, and harmonized canonical variables.

Usage

pulso_describe(module)

Arguments

module

Character. Module name (e.g., "ocupados"). Must be one of the modules registered in sources.json.

Value

A multi-line character string.

Examples

pulso_describe("ocupados")


Describe metadata for a single column

Description

Formats the metadata for one column of a tibble loaded with metadata = TRUE as a multi-line string suitable for cat(). The output format mirrors Python's pulso.describe_column().

Usage

pulso_describe_column(df, column)

Arguments

df

A tibble loaded with pulso_load(..., metadata = TRUE).

column

Character. Column name to describe.

Value

A multi-line character string.

Examples


df <- pulso_load(2024, 6, "ocupados", metadata = TRUE)
cat(pulso_describe_column(df, "P6020"))



Describe a harmonized variable

Description

Returns a human-readable summary of a canonical variable defined in variable_map.json: its module, description, source mappings per epoch, and any comparability warning.

Usage

pulso_describe_variable(variable_name)

Arguments

variable_name

Character. Canonical variable name (e.g., "sexo"). The lookup is case-insensitive.

Value

A single multi-line character string.

Examples

pulso_describe_variable("sexo")
pulso_describe_variable("edad")


List metadata for all columns of a tibble

Description

Returns a tibble with one row per column of df, summarizing that column's metadata.

Usage

pulso_list_columns_metadata(df)

Arguments

df

A tibble loaded with pulso_load(..., metadata = TRUE).

Value

A tibble with columns: column, label, type, module, source, has_categories.

Examples


df <- pulso_load(2024, 6, "ocupados", metadata = TRUE)
# Use a name that does not shadow base::summary()
col_meta <- pulso_list_columns_metadata(df)
print(col_meta)



List validated GEIH periods

Description

Returns the registry entries whose downloaded data has been manually validated against DANE's published figures.

Usage

pulso_list_validated_range()

Value

A tibble with one row per validated (year, month) pair, with columns: year (integer), month (integer), epoch (character), validated (logical, always TRUE), validated_at (character, ISO-8601 or NA), num_modules (integer). Sorted by year, month ascending.
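The returned table can be used to check a period before trying to load it. The following is a minimal sketch under stated assumptions: a base data frame with invented rows stands in for the tibble, and is_validated() is a hypothetical helper, not part of pulso.

```r
# Toy stand-in for the tibble returned by pulso_list_validated_range().
# The rows are invented for illustration, not real registry contents.
validated <- data.frame(
  year = c(2024L, 2024L),
  month = c(5L, 6L),
  epoch = c("epoch_a", "epoch_a"),            # placeholder epoch labels
  validated = c(TRUE, TRUE),
  validated_at = c(NA_character_, "2025-01-15"),
  num_modules = c(8L, 8L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when (year, month) appears in the validated table.
is_validated <- function(tbl, year, month) {
  any(tbl$year == year & tbl$month == month)
}

is_validated(validated, 2024L, 6L)  # TRUE
is_validated(validated, 2023L, 1L)  # FALSE
```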

Examples

pulso_list_validated_range()


List harmonized variables from the variable map

Description

Returns a tibble summarizing all canonical variables defined in variable_map.json, optionally filtered by module.

Usage

pulso_list_variables(module = NULL)

Arguments

module

Character or NULL. When non-NULL, restricts output to variables whose module field equals this value (exact match, case-sensitive). Available modules in the current variable map: caracteristicas_generales, desocupados, inactivos, ocupados, otros_ingresos, vivienda_hogares.

Value

A tibble with one row per variable and columns:

canonical_name

chr. Key used throughout the pulso package.

module

chr. Survey module the variable belongs to.

description_es

chr. Spanish description, or NA if absent.

comparability

chr. Always NA in this version (see Note).

has_warning

lgl. TRUE when a comparability_warning is present and non-empty for the variable.

num_epochs

int. Number of epoch mappings defined.

Rows are sorted ascending by canonical_name.

Note

The comparability column is always NA_character_ in the current version because the comparability field (expected values: "high" / "limited") has not yet been added to variable_map.json. Only comparability_warning (free text) is present in the data.

Examples

# All variables
pulso_list_variables()

# Subset by module
pulso_list_variables(module = "ocupados")


Load GEIH microdata for a single year-month-module

Description

Downloads and parses microdata from Colombia's Gran Encuesta Integrada de Hogares (GEIH), published by DANE.

Usage

pulso_load(
  year,
  month,
  module,
  area = NULL,
  harmonize = TRUE,
  cache = TRUE,
  metadata = FALSE,
  allow_unvalidated = FALSE
)

Arguments

year

Integer. Year (2007 to current year).

month

Integer. Month (1-12).

module

Character. Module name (e.g., "ocupados").

area

Character or NULL. Optional area filter. NOT IMPLEMENTED in v0.1.0.

harmonize

Logical. Whether to apply harmonization. Default TRUE.

cache

Logical. Whether to cache downloads. Default TRUE.

metadata

Logical. Whether to attach DANE metadata to the result. Default FALSE for Python parity. When TRUE, the metadata is attached via attr(df, "pulso_metadata"), and the first such call triggers a lazy download of the codebook.

allow_unvalidated

Logical. When FALSE (default), raises pulso_data_not_validated for periods not yet manually validated against DANE's published figures. Set to TRUE to load such periods anyway, with a warning.
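One way to react to an unvalidated period is a classed tryCatch() handler. The sketch below assumes that pulso_data_not_validated is the class of an R error condition; the text names it but does not say how it is signaled. fake_load() is a hypothetical stand-in, so the example runs without the package or a network connection.

```r
# Hypothetical stand-in for pulso_load(): it always signals an error whose
# class vector includes "pulso_data_not_validated" (an assumption about how
# the real function reports unvalidated periods).
fake_load <- function(year, month, module) {
  cond <- structure(
    list(
      message = sprintf("Period %d-%02d has not been validated", year, month),
      call = NULL
    ),
    class = c("pulso_data_not_validated", "error", "condition")
  )
  stop(cond)
}

# Catch only the validation condition; any other error still propagates.
result <- tryCatch(
  fake_load(2007L, 1L, "ocupados"),
  pulso_data_not_validated = function(e) {
    message("Skipping period: ", conditionMessage(e))
    NULL
  }
)

is.null(result)  # TRUE: the handler returned NULL
```

Catching the specific class rather than a generic error keeps unrelated failures, such as a network problem, visible to the caller.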

Value

A tibble with the microdata. If metadata = TRUE, the tibble has an attribute "pulso_metadata" with structured column info.
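The attribute can be read back with attr(). The sketch below uses a toy data frame rather than real GEIH data, and the internal layout of the metadata (a named list with one entry per column) is an assumption made for illustration only.

```r
# Toy data frame standing in for the result of pulso_load(..., metadata = TRUE).
df <- data.frame(P6020 = c(1L, 2L, 1L))

# Assumed layout: a named list with one entry per column. The label and type
# values are placeholders, not taken from the DANE codebook.
attr(df, "pulso_metadata") <- list(
  P6020 = list(label = "placeholder label", type = "integer")
)

meta <- attr(df, "pulso_metadata")
names(meta)        # columns that carry metadata
meta$P6020$type

# A data frame without the attribute yields NULL, which is a convenient
# test for whether metadata was attached.
plain <- data.frame(x = 1:3)
is.null(attr(plain, "pulso_metadata"))  # TRUE
```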

Examples


df <- pulso_load(year = 2024, month = 6, module = "ocupados",
                 metadata = TRUE)
cat(pulso_describe_column(df, "P6020"))



Load and merge GEIH microdata across multiple modules

Description

Downloads, parses, and joins microdata from multiple GEIH modules for a single year-month. All modules must be at the same survey level (persona or hogar).

Usage

pulso_load_merged(
  year,
  month,
  modules,
  harmonize = TRUE,
  cache = TRUE,
  metadata = FALSE,
  allow_unvalidated = FALSE
)

Arguments

year

Integer. Year (2007 to current year).

month

Integer. Month (1-12).

modules

Character vector. Module names to merge (length >= 2). Use pulso_load for a single module.

harmonize

Logical. Whether to lowercase column names. Default TRUE.

cache

Logical. Whether to cache downloads. Default TRUE.

metadata

Logical. Whether to attach DANE metadata. Default FALSE. When TRUE, metadata is composed from the first module's column index as a v0.1.0 approximation (multi-module metadata composition is planned for v0.2.0).

allow_unvalidated

Logical. Passed through to each pulso_load() call. Default FALSE.

Value

A tibble with columns from all requested modules, joined on shared identifier keys (directorio, secuencia_p, orden for persona-level modules). The join is an outer join by default so that modules covering different person subsets (e.g., ocupados vs desocupados) are combined correctly.
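The effect of an outer join on the persona-level keys can be illustrated with base R's merge(). This is a sketch, not the package's implementation: the data frames are toy data, and the non-key columns (ingreso, semanas_buscando) are invented names.

```r
# Two toy persona-level modules covering different people.
ocupados <- data.frame(
  directorio = c(1L, 1L), secuencia_p = c(1L, 1L), orden = c(1L, 2L),
  ingreso = c(1200, 900)               # invented column
)
desocupados <- data.frame(
  directorio = c(1L, 2L), secuencia_p = c(1L, 1L), orden = c(3L, 1L),
  semanas_buscando = c(4L, 10L)        # invented column
)

keys <- c("directorio", "secuencia_p", "orden")

# all = TRUE requests a full outer join: every person from either module is
# kept, and columns from the module a person is absent from are filled with NA.
merged <- merge(ocupados, desocupados, by = keys, all = TRUE)

nrow(merged)                 # 4
sum(is.na(merged$ingreso))   # 2
```

An inner join on the same inputs would return zero rows, because no person appears in both modules.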

Note

Mixed-level merges (persona + hogar modules in the same call) are deferred to v0.2.0. If you need a hogar-level module, merge the result manually after separate calls.
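A manual mixed-level merge might look like the sketch below. It assumes that hogar-level tables are identified by directorio and secuencia_p, which the text does not state. The data frames are toy data, and tipo_vivienda is an invented column.

```r
# Toy persona-level result, as might come from pulso_load_merged().
persona <- data.frame(
  directorio = c(1L, 1L, 2L), secuencia_p = c(1L, 1L, 1L),
  orden = c(1L, 2L, 1L)
)

# Toy hogar-level module, as might come from a separate pulso_load() call.
hogar <- data.frame(
  directorio = c(1L, 2L), secuencia_p = c(1L, 1L),
  tipo_vivienda = c("casa", "apartamento")   # invented column
)

# Assumed household keys; check them against the real data before use.
hogar_keys <- c("directorio", "secuencia_p")

# all.x = TRUE keeps every person and repeats the household values on each
# member's row.
combined <- merge(persona, hogar, by = hogar_keys, all.x = TRUE)

nrow(combined) == nrow(persona)  # TRUE: the person count is unchanged
```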

Examples


df <- pulso_load_merged(2024, 6, c("ocupados", "caracteristicas_generales"))
nrow(df)



Fetch Tasa de Politica Monetaria (TPM) from Banco de la Republica

Description

Returns Colombia's monetary policy rate as a tibble. Data source: Banco de la Republica SDMX API (DF_CBR_DAILY_HIST). When the API is unavailable, falls back to a bundled snapshot extending to 2026-04-21.

Usage

pulso_tpm(start = NULL, end = NULL, use_fixture = NULL)

Arguments

start

Optional start date as character "YYYY-MM-DD" or Date.

end

Optional end date as character "YYYY-MM-DD" or Date.

use_fixture

Logical or NULL. If NULL (default), auto-detect: tries API first, falls back to snapshot if unavailable. If TRUE, always use bundled snapshot. If FALSE, force API call.
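The three use_fixture modes can be pictured as a small dispatch around tryCatch(). The sketch below illustrates the described behavior and is not the package's code: fetch_from_api(), load_snapshot(), and get_tpm() are hypothetical, and the rate values are invented.

```r
# Hypothetical stand-ins; neither function is part of pulso.
fetch_from_api <- function() stop("API unavailable")  # simulates an outage
load_snapshot <- function() {
  data.frame(
    fecha = as.Date(c("2024-01-15", "2024-03-15")),
    valor = c(10.0, 9.5),                              # invented values
    serie = "tpm"
  )
}

get_tpm <- function(use_fixture = NULL) {
  if (isTRUE(use_fixture)) return(load_snapshot())    # always use the snapshot
  if (isFALSE(use_fixture)) return(fetch_from_api())  # force the API; may fail
  # NULL: try the API first and fall back to the snapshot on any error.
  tryCatch(fetch_from_api(), error = function(e) load_snapshot())
}

tpm <- get_tpm()
nrow(tpm)  # 2: the snapshot was used because the simulated API call failed

# Filtering by date, as the start argument would.
tpm[tpm$fecha >= as.Date("2024-02-01"), ]
```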

Value

A tibble with columns:

fecha

Date. Observation date.

valor

numeric. TPM rate in percentage points.

serie

character. Always "tpm".

Examples


# Recent TPM
tpm_2024 <- pulso_tpm(start = "2024-01-01", end = "2024-12-31")
# Full history (uses fixture if API down)
tpm_all <- pulso_tpm()


Validation status for a GEIH period

Description

Returns structured metadata about a specific year-month entry in the pulso registry, including whether it has been manually validated against published DANE figures.

Usage

pulso_validation_status(year, month)

Arguments

year

Integer. Year (2007 to current year).

month

Integer. Month (1-12).

Value

A one-row tibble with columns: year, month, epoch, validated, validated_by, validated_at, source_url, file_size_mb, modules_available, checksum_sha256.

Examples


pulso_validation_status(2024, 6)