datazoom.amazonia

The datazoom.amazonia package facilitates access to official Brazilian Amazon data, including agriculture, deforestation, and production. The package provides functions that download and pre-process selected datasets.

Installation

You can install the released version of datazoom.amazonia from CRAN with:

install.packages("datazoom.amazonia")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("datazoompuc/datazoom.amazonia")

1 - Environmental data

| Dataset | Description |
|---|---|
| PRODES | Yearly deforestation |
| DETER | Alerts on forest cover changes |
| DEGRAD | Forest degradation |
| Imazon | Deforestation pressure in the Amazon |
| IBAMA | Environmental fines |
| MapBiomas | Land cover and land use |
| TerraClimate | Climate data |
| SEEG | Greenhouse gas emission estimates |
| CENSOAGRO | Agriculture activities |

2 - Social data

| Dataset | Description |
|---|---|
| IPS | Amazon Social Progress Index |
| IEMA | Access to electricity in the Amazon region |
| Population | Population |

3 - Economic data

| Dataset | Description |
|---|---|
| COMEX | Brazilian foreign trade |
| BACI | Global foreign trade |
| PIB-Munic | Municipal GDP |
| CEMPRE | Central register of companies |
| PAM | Agricultural production |
| PEVS | Forestry and extraction |
| PPM | Livestock farming |
| SIGMINE | Mining |
| ANEEL | Energy development |
| EPE | Energy consumption |

4 - Other tools

| Tool | Description |
|---|---|
| Legal Amazon Municipalities | Dataset with Brazilian cities and whether they belong to the Legal Amazon |
| The ‘googledrive’ package | Troubleshooting and information for downloads from Google Drive |

Environmental Data

PRODES

Overview

PRODES (Projeto de Monitoramento da Floresta Amazônica Brasileira - Project for Monitoring the Brazilian Amazon Forest) is Brazil’s primary satellite-based deforestation monitoring system operated by INPE (National Institute for Space Research).

This dataset provides:

PRODES is the authoritative source for Amazon deforestation statistics, used for environmental monitoring, policy evaluation, and international reporting on Brazilian forest protection.

Data Source and Methodology

PRODES monitoring:

  - Uses satellite imagery from multiple sources (Landsat, CBERS, others)
  - Automated and manual analysis to detect clear-cut deforestation
  - Annual assessment of forest loss
  - Recently (2020+), raster data published through TerraBrasilis
  - Data available as both raster files and aggregated by municipality

For more information, visit INPE PRODES Project and TerraBrasilis.


Available Datasets

1. deforestation

Clear-cut deforestation areas (complete forest loss).

2. residual_deforestation

Deforestation that was not captured in previous surveys (detected through improved methodology).

3. native_vegetation

Remaining native forest and natural vegetation areas.

4. hydrography

Water bodies and hydrographic features.

5. non_forest

Non-forest areas including savanna, grasslands, and other vegetation types.

6. clouds

Cloud cover in satellite imagery (data quality indicator).


Important Data Characteristics

Raw vs. Treated Data

Cumulative vs. Incremental Data

| Year | Type | Definition |
|---|---|---|
| 2007 | Cumulative | All deforestation from 1988-2007 |
| 2008-2023 | Incremental | Deforestation detected in that specific year |

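When analyzing time trends, note that the 2007 value is the loss accumulated from 1988 to 2007, not a single-year figure, so it is not directly comparable with the annual values from 2008 onward.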
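As a rough illustration of the cumulative versus incremental distinction, the sketch below separates the 2007 baseline from the annual increments and rebuilds a running total. The data frame and its column names (`year`, `deforestation_km2`) are made-up stand-ins, not the actual output of `load_prodes()`; check the real column names after loading.

```r
# Toy totals for one municipality; values and column names are hypothetical
prodes <- data.frame(
  year = c(2007, 2008, 2009, 2010),
  deforestation_km2 = c(500, 12, 9, 7)
)

# 2007 is a stock (loss accumulated since 1988); later years are annual flows
baseline <- prodes$deforestation_km2[prodes$year == 2007]
annual <- prodes[prodes$year >= 2008, ]

# Rebuild the cumulative series by adding annual increments to the baseline
annual$cumulative_km2 <- baseline + cumsum(annual$deforestation_km2)
print(annual)
```

Keeping the baseline out of the annual series avoids a spurious drop between 2007 and 2008 when plotting trends.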

Data Organization


Function Parameters

1. dataset

Selects which landscape classification to download.

dataset = "deforestation"             # Clear-cut deforestation (main product)
dataset = "residual_deforestation"    # Previously undetected deforestation
dataset = "native_vegetation"         # Remaining forests and vegetation (2023 only)
dataset = "hydrography"               # Water bodies (2023 only)
dataset = "non_forest"                # Savanna and other non-forest (2023 only)
dataset = "clouds"                    # Cloud cover (2023 only)

2. raw_data

Controls data format: raster or municipality aggregates.

raw_data = FALSE  # logical

Important: Raster data is large; ensure sufficient storage and memory.

3. time_period

Specifies which year(s) to download.

Available by dataset:

| Dataset | Available Years |
|---|---|
| deforestation | 2007 (cumulative), 2008-2023 (incremental) |
| residual_deforestation | 2010-2023 |
| native_vegetation | 2023 only |
| hydrography | 2023 only |
| non_forest | 2023 only |
| clouds | 2023 only |

time_period = 2020              # single year
time_period = c(2015, 2020)     # multiple years
time_period = 2015:2020         # range of years (for deforestation only)
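Because availability differs by dataset, it can help to check a request before starting a download. The helper below is a minimal sketch: `check_prodes_years()` is a hypothetical function written for this illustration, not part of the package, and the year ranges are simply copied from the availability table above.

```r
# Years listed in the availability table above
prodes_years <- list(
  deforestation          = 2007:2023,
  residual_deforestation = 2010:2023,
  native_vegetation      = 2023,
  hydrography            = 2023,
  non_forest             = 2023,
  clouds                 = 2023
)

# Hypothetical helper: stop early if a requested year is not available
check_prodes_years <- function(dataset, time_period) {
  if (!dataset %in% names(prodes_years)) {
    stop("Unknown dataset: ", dataset)
  }
  missing_years <- setdiff(time_period, prodes_years[[dataset]])
  if (length(missing_years) > 0) {
    stop("Not available for '", dataset, "': ",
         paste(missing_years, collapse = ", "))
  }
  invisible(TRUE)
}

check_prodes_years("deforestation", 2015:2020)  # passes silently

# A request outside the listed years raises an error, caught here
ok <- tryCatch(check_prodes_years("clouds", 2020), error = function(e) FALSE)
print(ok)
```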

4. language

Output language.

language = "eng"  # character string

Examples

Example 1: Deforestation data for a single year

# download treated deforestation data for 2023
deforestation <- load_prodes(
  dataset = "deforestation",
  raw_data = FALSE,
  time_period = 2023,
  language = "eng"
)

Example 2: Deforestation data over multiple years

# download treated deforestation data for 2008 to 2023
deforestation_series <- load_prodes(
  dataset = "deforestation",
  raw_data = FALSE,
  time_period = 2008:2023,
  language = "eng"
)

Example 3: Cumulative forest loss since 1988

# download 2007 (cumulative 1988-2007) plus all years up to 2023
all_deforestation <- load_prodes(
  dataset = "deforestation",
  raw_data = FALSE,
  time_period = c(2007, 2008:2023),
  language = "eng"
)

Example 4: Residual deforestation

# download treated residual deforestation data for 2020
residual <- load_prodes(
  dataset = "residual_deforestation",
  raw_data = FALSE,
  time_period = 2020,
  language = "eng"
)

Data Notes

Raw Raster Data Characteristics

Municipality Aggregates

Data Quality and Accuracy

  1. Detection accuracy: ~95% for clear-cut deforestation
  2. Minimum mapping unit: Small forest clearances may not be detected
  3. Cloud contamination: Clouds may prevent detection in some areas (see clouds dataset)
  4. Seasonal effects: Deforestation easier to detect in dry season

Limitations

  1. Clear-cut only: Degradation and partial logging not captured (see DEGRAD for that)
  2. Legal Amazon only: Data limited to Legal Amazon definition; doesn’t cover all forest regions
  3. Raster data very large: May require specialized tools and significant computing resources
  4. Historical changes: INPE occasionally revises historical estimates as methodology improves
  5. Recent data provisional: 2023 data may be subject to revision as final processing completes

DETER

Overview

DETER (Real-Time Detection System) is a satellite-based monitoring system operated by INPE (National Institute for Space Research) that detects and reports changes in forest cover in near real time. This system provides:

DETER is the primary early-warning system for deforestation in the Amazon, used by Brazilian environmental agencies for enforcement, research institutions for analysis, and internationally for monitoring compliance with forest conservation goals.

Data Source and Methodology

DETER monitoring:

  - Uses satellite imagery from multiple sources (MODIS, Landsat, Sentinel)
  - Automated and manual analysis to detect recent disturbances
  - Generates “alerts”: polygons representing areas of detected change
  - Updates available regularly (frequency varies by satellite availability)
  - Operated by INPE’s Remote Sensing Division

For technical details, visit INPE DETER System.


Available Datasets

1. deter_amz (DETER Amazon)

DETER monitoring data for the Legal Amazon.

2. deter_cerrado (DETER Cerrado)

DETER monitoring data for the Cerrado biome.


Important Data Characteristics

Raw Data Structure

The raw DETER data from INPE reports one alert per row, and each row is typically associated with a single municipality. However, many alerts overlap multiple municipalities (typically 2-4), not all of which appear in the original records.

Data Processing

This package provides an important enhancement: spatial intersection with IBGE municipality geometries (2019 version) to identify ALL municipalities that each DETER alert overlaps with. This creates a more complete and accurate geographic picture than the original raw data.
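The idea behind the spatial intersection can be sketched with the `sf` package (assumed to be installed) on toy geometries. Everything here is illustrative: the two unit squares stand in for municipalities, the rectangle stands in for an alert, and the column names are hypothetical. This is not the package's internal code.

```r
library(sf)

# Two adjacent toy "municipalities" (unit squares, no CRS; illustrative only)
square <- function(xmin) {
  st_polygon(list(rbind(
    c(xmin, 0), c(xmin + 1, 0), c(xmin + 1, 1), c(xmin, 1), c(xmin, 0)
  )))
}
munis <- st_sf(muni = c("A", "B"), geometry = st_sfc(square(0), square(1)))

# One toy "alert" that straddles the shared boundary at x = 1
alert_poly <- st_polygon(list(rbind(
  c(0.5, 0.25), c(1.5, 0.25), c(1.5, 0.75), c(0.5, 0.75), c(0.5, 0.25)
)))
alerts <- st_sf(alert_id = 1, geometry = st_sfc(alert_poly))

# The intersection yields one row per alert-municipality overlap
pieces <- suppressWarnings(st_intersection(alerts, munis))
pieces$area <- as.numeric(st_area(pieces))
print(st_drop_geometry(pieces))
```

A single alert becomes two rows, one per municipality, each carrying only the overlapping share of the area. Keep this in mind when counting alerts in intersected data, since the same alert can appear more than once.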

Important note on CRS metadata: The CRS (Coordinate Reference System) information may need verification after loading, as coordinate system metadata can sometimes be unclear in the original INPE data.


Function Parameters

1. dataset

Selects which biome’s DETER data to download.

dataset = "deter_amz"      # Legal Amazon monitoring
dataset = "deter_cerrado"  # Cerrado biome monitoring

2. raw_data

Controls whether to download the original data or the processed/enhanced version.

raw_data = FALSE  # logical

Recommendation: Use raw_data = FALSE to get the enhanced municipality identification from spatial intersection.

3. language

Output language for variable names and documentation.

language = "eng"  # character string

Examples

# download treated DETER Amazon data
deter_amz <- load_deter(
  dataset = "deter_amz",
  raw_data = FALSE,
  language = "eng"
)

# download treated DETER Cerrado data
deter_cerrado <- load_deter(
  dataset = "deter_cerrado",
  raw_data = FALSE,
  language = "eng"
)

Data Notes

Raw Data Limitations

Data Structure

Each alert/row typically contains:

  - Spatial geometry: Polygon coordinates (sf object)
  - Detection date: When the alert was issued
  - Alert type: Deforestation, forest degradation, or other disturbance
  - Area: Size of detected change in hectares
  - Municipality: Geographic unit identification
  - State: Brazilian state
  - Metadata: Satellite source, confidence level (varies by product)

Important Considerations

  1. Alert types vary: “Deforestation” is permanent forest loss; “degradation” is forest damage but not clear-cut
  2. Near real-time data: Alerts are issued frequently; data is continuously updated
  3. Minimum detection size: Varies by sensor; typically 25 hectares for Amazon, larger for Cerrado
  4. CRS metadata: Verify coordinate system after loading; typically UTM zones for Brazil
  5. Overlapping municipalities: Enhanced version accounts for alerts crossing municipality boundaries
  6. False positives possible: Satellite detection can occasionally misclassify cloud shadows or other features as forest loss

DEGRAD

Overview

The DEGRAD project is a research initiative that uses satellite imagery to monitor forest degradation in the Amazon. Unlike DETER’s near real-time alerts, DEGRAD provides a more detailed annual analysis of forest degradation patterns.

This dataset captures:

DEGRAD data is valuable for understanding forest degradation as a distinct phenomenon from clear-cut deforestation, important for carbon accounting, biodiversity protection, and understanding transition stages toward complete forest loss.

Data Source and Methodology

DEGRAD monitoring:

  - Conducted by INPE’s forest monitoring programs
  - Uses satellite imagery interpretation to identify forest degradation signs
  - Focuses on selective logging, small-scale agriculture, forest fires, and other degrading activities
  - Released as annual editions with comprehensive analysis
  - Limited documentation available (original INPE documentation is sparse)

For information, visit INPE Forest Monitoring.


Important Data Characteristics

Data Organization

Important: DEGRAD data is organized differently than real-time systems. Key points:

  1. Yearly editions: Data is organized by publication year (e.g., “DEGRAD 2016”), not event year
  2. Mixed event years: A DEGRAD edition may contain degradation events from different years
  3. Documentation limited: Original INPE documentation is minimal; users should be aware of potential inconsistencies

Spatial Integration

This package enhances the raw DEGRAD data by:

  - Intersecting DEGRAD spatial polygons with IBGE municipality boundaries (2019 version)
  - Providing municipality identification for each degradation event
  - Converting to Simple Features (sf) objects for spatial analysis

Note on CRS: Coordinate system metadata should be verified after loading, as original INPE data sometimes has unclear CRS information.


Available Dataset

degrad (Forest Degradation)

Detailed monitoring of forest degradation across the Legal Amazon.


Function Parameters

1. dataset

Only one dataset is available:

dataset = "degrad"  # Forest degradation monitoring

2. raw_data

Controls whether to download the original data or the processed/enhanced version.

raw_data = FALSE  # logical

Recommendation: Use raw_data = FALSE for most applications to get municipality-level information.

3. time_period

Specifies which year(s) of degradation events to download.

Important: When you request a year, you get events from that year regardless of which DEGRAD edition they appear in.

time_period = 2015              # single year
time_period = c(2010, 2015)     # multiple specific years
time_period = 2010:2015         # range of years

4. language

Output language for variable names and documentation.

language = "eng"  # character string

Data Structure

The returned data is a Simple Features (SF) spatial object with:


Examples

# download treated forest degradation data from 2010 to 2012
data <- load_degrad(
  dataset = "degrad",
  raw_data = FALSE,
  time_period = 2010:2012,
  language = "eng"
)

Data Notes

Data Organization Complexity

The annual edition structure (e.g., “DEGRAD 2016”) mixed with variable event years within those editions means:

  - When you request year 2015, you get all detected 2015 events regardless of edition
  - Some 2015 events may appear in both DEGRAD 2015 and DEGRAD 2016 editions
  - Duplication is handled in the data loading process

Degradation Types

Common degradation types include:

  - Selective logging: Commercial timber extraction
  - Forest fires: Fire damage to forest areas
  - Agricultural clearing: Small-scale farming expansion
  - Mining: Degradation from mining activities
  - Other: Mixed or unclassified degradation causes

(Exact categories vary by edition; verify with your loaded data)

Spatial Considerations

  1. Polygons not points: Each event is a geometric polygon, not a single location point
  2. Municipality intersection: Treated data identifies all municipalities that each polygon overlaps
  3. CRS verification: Check coordinate system after loading
  4. Geometry validity: Some polygons may have validity issues; use st_is_valid() to check
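The CRS and validity checks from the list above can be sketched with the `sf` package (assumed to be installed). The self-intersecting polygon is a toy example built for this illustration, not DEGRAD data; with real data, `shapes` would be the object returned by the loading function.

```r
library(sf)

# A self-intersecting "bowtie" polygon, a common kind of invalid geometry
bowtie <- st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
)))
shapes <- st_sf(id = 1, geometry = st_sfc(bowtie))

# Inspect the CRS metadata; this toy object has none attached
print(st_crs(shapes))

# Check validity; the bowtie is reported as invalid
print(st_is_valid(shapes))

# Repair the geometry before computing areas or intersections
fixed <- st_make_valid(shapes)
print(st_is_valid(fixed))
```

If the CRS is missing but known from the data provider, it can be assigned with `st_set_crs()`. Which code to use is something to confirm against INPE's metadata rather than assume.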

Data Limitations

  1. Limited documentation: INPE’s original documentation for DEGRAD is sparse
  2. Mixed time periods: Events from different years appear in same edition
  3. Possible inconsistencies: Classification and methodology may vary across editions
  4. Detection limits: Minimum detectable degradation size varies by methodology/edition
  5. Not real-time: This is annual analysis, not near-real-time detection like DETER

Imazon

Overview

Imazon (Instituto do Homem e Meio Ambiente da Amazônia - Institute of Man and Environment of the Amazon) is an independent Brazilian research organization that produces its own deforestation monitoring and analysis.

This dataset provides:

Imazon’s deforestation pressure index helps identify which municipalities face high deforestation risk, valuable for targeted conservation, enforcement, and development planning.

Data Source

Imazon’s classification is based on:

  - Deforestation monitoring using satellite imagery
  - Analysis of deforestation trends and drivers
  - Risk assessment methodology to classify municipalities
  - Regular updates as monitoring data accumulates

For more information, visit Imazon Official Website.


Available Dataset

imazon_shp (Imazon Municipality Pressure Index)

Municipality-level deforestation pressure classification with spatial geometries.


Deforestation Pressure Categories

Imazon’s four-level classification system:

| Category | Pressure Level | Description |
|---|---|---|
| 0 | No/Low Pressure | Minimal deforestation threat |
| 1 | Moderate Pressure | Some deforestation risk |
| 2 | High Pressure | Significant deforestation threat |
| 3 | Very High Pressure | Severe/Critical deforestation risk |
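For analysis it is often convenient to turn the numeric codes into labeled, ordered categories. The sketch below does this on a toy data frame. The column name `pressure` and the municipality values are hypothetical, so check the actual names in the loaded data.

```r
# Toy classification codes; `pressure` is a hypothetical column name
imazon <- data.frame(
  municipality = c("A", "B", "C", "D"),
  pressure = c(0, 3, 1, 2)
)

# Attach the category labels from the table above as an ordered factor
imazon$pressure_level <- factor(
  imazon$pressure,
  levels = 0:3,
  labels = c("No/Low", "Moderate", "High", "Very High"),
  ordered = TRUE
)

# Ordered factors allow comparisons such as "High or above"
high_or_above <- imazon[imazon$pressure_level >= "High", ]
print(high_or_above)
```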


Function Parameters

1. dataset

Only one dataset is available:

dataset = "imazon_shp"  # Imazon municipality pressure classification

2. raw_data

Controls whether to download the original data or the processed/cleaned version.

raw_data = FALSE  # logical

3. language

Output language for variable names and documentation.

language = "eng"  # character string

Examples

# download treated Imazon deforestation pressure data
data <- load_imazon(
  raw_data = FALSE,
  language = "eng"
)

Data Notes

Data Format

Pressure Categories

The 0-3 scale represents Imazon’s assessment of deforestation risk:

  - Based on historical deforestation trends
  - Current forest cover
  - Accessibility and economic drivers
  - Human pressure factors

Important Limitations

  1. Static classification: This is a classification/index, not time-series deforestation data
  2. Methodology may change: Imazon’s methodology for calculating pressure may be updated
  3. Subjective risk assessment: Classification involves judgment in addition to objective metrics
  4. Not current alerts: This is not real-time detection like DETER; it’s a classification product
  5. Municipal-level only: Finer spatial resolution not available

Using with Other Datasets

This dataset works well in combination with:

  - DETER: Compare pressure classification with actual recent deforestation
  - DEGRAD: Correlate pressure with degradation patterns
  - CEMPRE: Analyze relationship between employment and deforestation pressure
  - COMEX: Examine trade patterns in high-pressure vs low-pressure areas


🔴 This function uses the googledrive package to download data. In case of authentication errors, see googledrive.

IBAMA

Overview

The Brazilian Institute of Environment and Renewable Natural Resources (Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais Renováveis - IBAMA) dataset documents environmental enforcement actions, including embargoes of productive assets and fines issued for environmental violations. This dataset provides individual-level records of environmental infractions from 2005 onwards, representing the enforcement activity of Brazil’s primary federal environmental agency.

The data covers environmental violations across multiple sectors including deforestation, illegal mining, wildlife trafficking, illegal fishing, and other environmental crimes.

Data Coverage

The IBAMA dataset includes:

Dataset Description

Available Datasets

The function provides access to three distinct datasets tracking different stages of environmental enforcement:

  1. Embargoed Areas ("embargoed_areas")
  2. Distributed Fines ("distributed_fines")
  3. Collected Fines ("collected_fines")

Key Variables

  1. Infraction Information: Type of violation, legal basis, date of infraction
  2. Location: State, municipality, coordinates (when available)
  3. Enforcement Action: Type of action (embargo, fine), date of action
  4. Financial Data: Fine amount, payment status, payment date
  5. Entity Information: Individual or corporate identifier, sector
  6. Embargo Details: Embargoed area size, land type, geographic descriptors

Data Aggregation

The function returns either:

  - Raw Data: Individual infraction records (original format)
  - Aggregated Data: Summary statistics for each time-location period, including:
    - Total number of infractions
    - Infractions sent to prosecution
    - Infractions with ongoing legal proceedings
    - Embargoed area totals
    - Fine totals and collection rates
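To make the raw versus aggregated distinction concrete, the sketch below collapses a few toy infraction records into one row per state and year. The records and column names (`state`, `year`, `fine_value`) are invented for this illustration and do not reflect the package's actual output schema or its aggregation code.

```r
# Toy infraction records; all values and column names are hypothetical
infractions <- data.frame(
  state = c("PA", "PA", "AM", "AM", "AM"),
  year = c(2019, 2019, 2019, 2020, 2020),
  fine_value = c(1000, 2500, 400, 800, 1200)
)

# One row per state-year with the number of infractions and total fines
counts <- aggregate(fine_value ~ state + year, data = infractions, FUN = length)
totals <- aggregate(fine_value ~ state + year, data = infractions, FUN = sum)
names(counts)[3] <- "n_infractions"
names(totals)[3] <- "total_fines"

summary_table <- merge(counts, totals, by = c("state", "year"))
print(summary_table)
```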


Function Parameters

Options:

  1. dataset: Three possible choices
  2. raw_data:
  3. states:
  4. language:


Examples

# download treated embargoed areas data in English
data <- load_ibama(
  dataset = "embargoed_areas",
  raw_data = FALSE,
  language = "eng"
)

# download treated collected fines data from Bahia
data <- load_ibama(
  dataset = "collected_fines",
  raw_data = FALSE,
  states = "BA",
  language = "pt"
)

MapBiomas

Overview

The MapBiomas project provides comprehensive satellite-based data on land cover and land use changes across Brazil. The project uses machine learning algorithms applied to Landsat satellite imagery to classify and track changes in biomes, land cover types, and economic activities over time. This dataset is essential for understanding deforestation, agricultural expansion, conservation outcomes, and landscape dynamics in the Amazon and other biomes.

MapBiomas represents one of the most detailed and scientifically rigorous sources of remote sensing data on Brazilian land cover, with coverage spanning from 1985 to present and annual updates.

Data Coverage

The MapBiomas dataset includes:

Available Datasets

1. Land Cover ("mapbiomas_cover")

Annual maps of land cover types including:

  - Forest Types: Native forest, forest formation, forest plantation
  - Non-Forest Vegetation: Grassland/pasture, shrub vegetation, herbaceous
  - Agricultural Land: Temporary crops (annual/seasonal), perennial crops (sugarcane, coffee, cocoa)
  - Urban Areas: Urban/built-up land and infrastructure
  - Water Bodies: Rivers, lakes, reservoirs, aquaculture
  - Non-vegetated: Mining, bare soil, rock outcrops
  - Other Categories: Cloud coverage, nodata

Key Applications: Deforestation monitoring, agricultural expansion, urbanization patterns, forest conservation

2. Land Cover Transitions ("mapbiomas_transition")

Year-to-year changes in land cover including:

  - Deforestation: Conversion from forest to other uses
  - Reforestation: Conversion to forest types
  - Agricultural Transitions: Changes between crop types or from other uses to agriculture
  - Degradation: Forest to degraded/shrub vegetation
  - Regeneration: Abandoned areas returning to vegetation

Key Applications: Deforestation tracking, regeneration assessment, agricultural dynamics, conservation effectiveness
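As a small illustration of how transition records translate into forest loss and gain, the sketch below sums areas by direction of change. The table, class labels, and column names (`from`, `to`, `area_ha`) are invented for this example. The real transition data uses MapBiomas class codes and its own column names, which should be checked after loading.

```r
# Toy transition table; class labels and column names are hypothetical
transitions <- data.frame(
  from = c("forest", "forest", "pasture", "agriculture"),
  to = c("pasture", "agriculture", "forest", "pasture"),
  area_ha = c(120, 30, 45, 10)
)

# Forest loss: any transition leaving the forest class
forest_loss <- with(transitions, sum(area_ha[from == "forest" & to != "forest"]))

# Forest gain: any transition entering the forest class
forest_gain <- with(transitions, sum(area_ha[from != "forest" & to == "forest"]))

net_change <- forest_gain - forest_loss
print(c(loss = forest_loss, gain = forest_gain, net = net_change))
```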

3. Deforestation and Regeneration ("mapbiomas_deforestation_regeneration")

Specific focus on forest cover changes:

  - Deforestation: Permanent loss of forest cover
  - Forest Regeneration: Secondary forest regrowth and natural recovery
  - Degradation Signals: Forest areas showing stress indicators
  - Cumulative Deforestation: Total forest loss since baseline

Key Applications: Amazon monitoring, conservation evaluation, regeneration potential, climate impact studies

4. Mining Activities ("mapbiomas_mining")

Areas used for mining operations:

  - Active Mining: Currently operational extraction sites
  - Abandoned Mining: Previous mining areas, potentially available for rehabilitation
  - Mining Extent: Total area impacted by mining activities
  - Mining Types: Surface mining, artisanal mining, infrastructure

Key Applications: Environmental impact assessment, land degradation, restoration planning, environmental compliance

5. Irrigation ("mapbiomas_irrigation")

Note: Temporarily unavailable - new collection coming soon

Previously included:

  - Irrigated Areas: Extent of irrigation in agricultural systems
  - Irrigation Type: Drip, center-pivot, flood irrigation
  - Crop Types Under Irrigation: Which crops receive irrigation

6. Water Bodies ("mapbiomas_water")

Note: Temporarily unavailable - new collection coming soon

Previously included:

  - Surface Water Extent: Permanent and seasonal water bodies
  - Water Type: Natural rivers/lakes vs. artificial reservoirs
  - Seasonal Variation: Dry vs. wet season water extent

7. Wildfire Burn Scars ("mapbiomas_fire")

Areas affected by wildfires:

  - Burn Scars: Areas burned in recent fires
  - Fire Extent: Total area impacted per year
  - Fire Frequency: Repeated burning in same areas (indicating high fire risk)
  - Fire Season: Temporal pattern of fires

Key Applications: Fire monitoring, ecosystem vulnerability, climate impacts, fire management


Function Parameters

Options:

  1. dataset: Seven possible choices (two temporarily unavailable)
  2. raw_data:
  3. geo_level: Varies by dataset
  4. language:

Measurement Units and Notes



Examples

# download treated MapBiomas land cover data by municipality
data <- load_mapbiomas(
  dataset = "mapbiomas_cover",
  raw_data = FALSE,
  geo_level = "municipality",
  language = "eng"
)

# download treated data on mining on indigenous lands
data <- load_mapbiomas(
  dataset = "mapbiomas_mining",
  raw_data = FALSE,
  geo_level = "indigenous_land",
  language = "eng"
)

# download treated wildfire burn scar data by state
data <- load_mapbiomas(
  dataset = "mapbiomas_fire",
  raw_data = FALSE,
  geo_level = "state",
  language = "eng"
)

TerraClimate

Overview

TerraClimate is a global climate and climatic water balance dataset developed by the Climatology Lab at University of California, Merced. This package provides access to TerraClimate data for Brazil and the Amazon region.

This dataset provides:

TerraClimate is essential for understanding climate variability, water availability, drought risk, agricultural potential, and climate change impacts across Brazil and the Amazon.

Data Source and Methodology

TerraClimate data:

  - Compiled by the Climatology Lab at the University of California, Merced
  - Integrates satellite and ground-based observations
  - Validated against station networks
  - Downscaled to ~4km global resolution
  - Monthly temporal resolution

For more information, visit TerraClimate Project.


Available Climate Variables

TerraClimate provides 14 main climate and water balance variables:

| Dataset | Code | Description | Units |
|---|---|---|---|
| max_temperature | tmax | Maximum 2-m Temperature | °C |
| min_temperature | tmin | Minimum 2-m Temperature | °C |
| wind_speed | ws | Wind Speed at 10-m | m/s |
| vapor_pressure_deficit | vpd | Vapor Pressure Deficit | kPa |
| vapor_pressure | vap | 2-m Vapor Pressure | kPa |
| snow_water_equivalent | swe | Snow Water Equivalent at End of Month | mm |
| shortwave_radiation_flux | srad | Downward Shortwave Radiation Flux | W/m² |
| soil_moisture | soil | Soil Moisture at End of Month | mm |
| runoff | q | Runoff | mm |
| precipitation | ppt | Accumulated Precipitation | mm |
| potential_evaporation | pet | Reference Evapotranspiration | mm |
| climatic_water_deficit | def | Climatic Water Deficit | mm |
| water_evaporation | aet | Actual Evapotranspiration | mm |
| palmer_drought_severity_index | PDSI | Palmer Drought Severity Index | unitless |
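When scripting over several variables, it can be handy to keep the dataset names and TerraClimate codes from the table above in a named vector. This is only a convenience sketch: the vector name `terraclimate_codes` is made up here and is not an object exported by the package.

```r
# Dataset names and codes copied from the table above
terraclimate_codes <- c(
  max_temperature               = "tmax",
  min_temperature               = "tmin",
  wind_speed                    = "ws",
  vapor_pressure_deficit        = "vpd",
  vapor_pressure                = "vap",
  snow_water_equivalent         = "swe",
  shortwave_radiation_flux      = "srad",
  soil_moisture                 = "soil",
  runoff                        = "q",
  precipitation                 = "ppt",
  potential_evaporation         = "pet",
  climatic_water_deficit        = "def",
  water_evaporation             = "aet",
  palmer_drought_severity_index = "PDSI"
)

# Look up the codes for the variables of interest
print(terraclimate_codes[c("precipitation", "water_evaporation")])

# Reverse lookup: which dataset name corresponds to a given code
print(names(terraclimate_codes)[terraclimate_codes == "def"])
```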

Data Format and Coverage

Spatial Resolution

Temporal Resolution

Data Type


Function Parameters

1. dataset

Selects which climate variable to download.

# Temperature and radiation
dataset = "max_temperature"         # tmax
dataset = "min_temperature"         # tmin
dataset = "shortwave_radiation_flux" # srad

# Water and moisture
dataset = "precipitation"           # ppt
dataset = "potential_evaporation"   # pet
dataset = "water_evaporation"       # aet (actual evapotranspiration)
dataset = "soil_moisture"           # soil
dataset = "runoff"                  # q

# Atmospheric variables
dataset = "wind_speed"              # ws
dataset = "vapor_pressure"          # vap
dataset = "vapor_pressure_deficit"  # vpd

# Drought and composite indices
dataset = "climatic_water_deficit"  # def
dataset = "palmer_drought_severity_index" # PDSI
dataset = "snow_water_equivalent"   # swe

2. raw_data

Controls data format returned.

raw_data = FALSE  # logical

3. time_period

Specifies which year(s) to download.

Available range: 1958 to present (most recent months have 2-3 month lag)

time_period = 2020              # single year
time_period = c(2010, 2020)     # specific years
time_period = 2010:2020         # range of years

4. legal_amazon_only

Restricts geographic coverage to the Legal Amazon region.

legal_amazon_only = TRUE  # logical

Recommendation: Use TRUE to significantly reduce download size for Amazon-focused research.

5. language

Output language for variable names and documentation.

language = "eng"  # character string

Important Download Considerations

Data Size: TerraClimate raster data is substantial. Consider:

Recommendations:

  - Use legal_amazon_only = TRUE to reduce size by ~95%
  - Download single or 2-3 year periods rather than decades at once
  - Use high-speed internet connection
  - Have at least 10-50 GB free disk space for multi-year downloads


Examples

# download precipitation data for the Legal Amazon (2020)
precip <- load_climate(
  dataset = "precipitation",
  time_period = 2020,
  legal_amazon_only = TRUE,
  language = "eng"
)

# download maximum temperature for multiple years, all of Brazil
max_temp <- load_climate(
  dataset = "max_temperature",
  time_period = 2010:2012,
  language = "eng"
)

Data Notes

Variable Definitions

Data Quality

Important Limitations

  1. Raster data large: Multi-year downloads can be hundreds of MB to GBs
  2. 4km resolution: Suitable for regional analysis; may miss local variation
  3. Monthly aggregation: Daily and sub-daily variation not captured
  4. Recent data lag: Most recent 2-3 months not yet available
  5. Interpolation uncertainty: Some regions have lower data density
  6. Snow/ice areas: Less accurate in high mountains or glaciated regions (not major issue for Amazon)

Due to large file size:

  - Aggregate early: Summarize by month/year quickly to reduce memory use
  - Legal Amazon only: Massive size reduction if working in Amazon region
  - Subset years: Download only years of interest rather than decades
  - Extract points: If doing point-based analysis, extract specific coordinates to simplify data
  - Use cloud computing: Consider cloud platforms (Google Earth Engine) for very large analyses
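The "aggregate early" advice can be sketched as follows on a toy monthly table. The data frame and its column names (`municipality_code`, `date`, `precipitation`) are assumptions made for this illustration, not the actual structure returned by `load_climate()`.

```r
# Toy monthly precipitation for one municipality; column names are hypothetical
monthly <- data.frame(
  municipality_code = 1,
  date = seq(as.Date("2019-01-01"), as.Date("2020-12-01"), by = "month"),
  precipitation = rep(c(300, 100), each = 12)
)

# Collapse 24 monthly rows into 2 yearly totals
monthly$year <- as.integer(format(monthly$date, "%Y"))
yearly <- aggregate(
  precipitation ~ municipality_code + year,
  data = monthly,
  FUN = sum
)
print(yearly)
```

Summing suits accumulated variables such as precipitation. For temperature or index variables, a mean or other summary would be the more natural choice.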


SEEG

Overview

SEEG (Sistema de Estimativa de Emissões e Remoções de Gases de Efeito Estufa - System of Estimates of Emissions and Removals of Greenhouse Gases) is Brazil’s most comprehensive greenhouse gas emissions database developed by Observatório do Clima (Climate Observatory).

This dataset provides:

SEEG is the primary tool for understanding Brazil’s greenhouse gas emissions profile, tracking progress toward climate goals, identifying emission hotspots, and supporting climate policy.

Data Source and Methodology

SEEG emissions estimates are compiled using:

  - Government data from multiple agencies (MAPA, IBGE, ANP, etc.)
  - Satellite monitoring of deforestation and land use
  - International IPCC methodology standards
  - Peer-reviewed scientific research
  - Regular updates as new government data becomes available

For more information, visit SEEG Project and Observatório do Clima.


Available Datasets

1. seeg (All Sectors Combined)

Complete greenhouse gas emissions across all sectors in one dataset.

2. seeg_farming (Agricultural and Livestock Emissions)

Greenhouse gas emissions from agriculture and livestock activities.

3. seeg_energy (Energy Sector Emissions)

Emissions from energy production and consumption.

4. seeg_land (Land Use Change Emissions)

Emissions and removals from changes in forest cover and land use.

5. seeg_industry (Industrial Process Emissions)

Emissions from manufacturing and industrial processes.

6. seeg_residuals (Waste and Residuals Emissions)

Emissions from waste management, landfills, and waste treatment.


Important Data Characteristics

Collection 9 Data

The data provided is from SEEG’s Collection 9:

  - Time period: 2000-2018
  - Methodology: Latest available when data was compiled
  - Quality: Peer-reviewed and validated
  - Revisions: May be updated in future SEEG collections as better data becomes available

Emissions Units

Download Considerations

Important: The complete SEEG dataset is quite large. When downloading:

  - Entire datasets are downloaded as single files; year selection is limited
  - A stable, high-speed internet connection is recommended
  - Downloads may take time depending on connection speed
  - Ensure sufficient disk space for storage


Function Parameters

1. dataset

Selects which emission sector(s) to download.

dataset = "seeg"              # All sectors (raw_data = TRUE only)
dataset = "seeg_farming"      # Agriculture and livestock
dataset = "seeg_energy"       # Energy sector
dataset = "seeg_land"         # Land use changes
dataset = "seeg_industry"     # Industrial processes
dataset = "seeg_residuals"    # Waste and residuals

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. geo_level

Specifies geographic aggregation level.

geo_level = "state"  # character string

4. language

Output language for variable names and labels.

language = "eng"  # character string

Note on timing: Downloads may take considerable time due to file size.
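Since downloads are slow, it can help to catch an invalid combination of arguments before calling load_seeg. The sketch below checks the constraint noted above, that dataset = "seeg" is only available with raw_data = TRUE. It uses base R only; check_seeg_args is a hypothetical helper, not part of the package, and the geo_level values are simply those that appear in the examples of this section.

```r
# Hypothetical pre-flight check; not part of datazoom.amazonia.
check_seeg_args <- function(dataset, raw_data, geo_level) {
  sectors <- c("seeg", "seeg_farming", "seeg_energy",
               "seeg_land", "seeg_industry", "seeg_residuals")
  # geo_level values as used in the examples of this section
  geo_levels <- c("country", "state", "municipality")

  if (!dataset %in% sectors) {
    stop("unknown dataset: ", dataset)
  }
  if (!geo_level %in% geo_levels) {
    stop("unknown geo_level: ", geo_level)
  }
  # The combined dataset is documented as raw-only
  if (dataset == "seeg" && !isTRUE(raw_data)) {
    stop('dataset = "seeg" is only available with raw_data = TRUE')
  }
  invisible(TRUE)
}

# Passes silently for a valid combination
check_seeg_args("seeg_farming", raw_data = FALSE, geo_level = "state")
```

Calling the helper with dataset = "seeg" and raw_data = FALSE stops with an error instead of starting a download.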


Examples

Example 1: All sectors combined (raw data) at the country level

# download raw SEEG data (all sectors) at the country level
# note: dataset = "seeg" only works with raw_data = TRUE
all_emissions <- load_seeg(
  dataset = "seeg",
  raw_data = TRUE,
  geo_level = "country",
  language = "eng"
)

Example 2: Agricultural emissions by state

# download treated agricultural emissions at the state level
farming <- load_seeg(
  dataset = "seeg_farming",
  raw_data = FALSE,
  geo_level = "state",
  language = "eng"
)

Example 3: Land use change emissions by state

# download treated land use change emissions at the state level
land_use <- load_seeg(
  dataset = "seeg_land",
  raw_data = FALSE,
  geo_level = "state",
  language = "eng"
)

Example 4: Energy emissions by municipality

# download treated energy emissions at the municipality level
energy <- load_seeg(
  dataset = "seeg_energy",
  raw_data = FALSE,
  geo_level = "municipality",
  language = "eng"
)

Example 5: Industrial process emissions by state

# download treated industrial process emissions at the state level
industry <- load_seeg(
  dataset = "seeg_industry",
  raw_data = FALSE,
  geo_level = "state",
  language = "eng"
)

Example 6: Waste emissions by state

# download treated waste emissions at the state level
residuals <- load_seeg(
  dataset = "seeg_residuals",
  raw_data = FALSE,
  geo_level = "state",
  language = "eng"
)

Data Notes

Emission Sources Included

SEEG includes all major anthropogenic emission sources:
- Agriculture (livestock, crops, soil)
- Energy (electricity, transport, heating)
- Land use change (deforestation, afforestation)
- Industrial processes (cement, chemicals, metals)
- Waste (landfills, wastewater)

Methodology

Estimates follow:
- IPCC guidelines for national greenhouse gas inventories
- Brazilian national inventory standards
- International best practices
- Transparent, documented assumptions

Data Quality

Limitations

  1. Fixed time period: Collection 9 covers 2000-2018 only
  2. File size: Large downloads; requires good internet
  3. Year aggregation: Cannot select individual years; entire dataset downloaded
  4. Revisions: Methodology may change in future SEEG releases
  5. Sub-national uncertainty: Municipal and state estimates have higher uncertainty than national

🔴 This function uses the googledrive package to download data at the municipality level. In case of authentication errors, see googledrive.

CENSOAGRO

Overview

The Census of Agriculture (Censo Agropecuário) is Brazil’s comprehensive survey of agricultural establishments and activities, conducted by IBGE (Instituto Brasileiro de Geografia e Estatística). This census collects detailed information about:

The census provides critical data for agricultural policy, market research, and understanding the structure of Brazilian agriculture across regional and temporal dimensions.

Data Coverage

Data is collected at multiple geographic levels:
- Country level: aggregate national statistics
- State level: disaggregated by Brazilian states
- Municipality level: available for select datasets (currently "livestock_production")

Historical data spans from 1920 onwards, with different time series available for different datasets based on IBGE’s survey methodology evolution.


Available Datasets

1. agricultural_land_area

Provides comprehensive data on total agricultural land area and the number of agricultural properties.

2. agricultural_area_use

Details how agricultural properties use their land (crop farming, pasture, forests, etc.).

3. agricultural_employees_tractors

Captures information about the agricultural workforce and mechanization levels.

4. agricultural_producer_condition

Describes the tenure status of agricultural land (ownership, rental, partnership, etc.).

5. animal_production

Details the number of livestock animals farmed by species and type.

6. animal_products

Quantifies production volumes of animal-based products.

7. vegetable_production_area

Provides detailed crop production data including area planted and volume produced.

8. vegetable_production_temporary

Focuses specifically on temporary crops (annual crops that must be replanted each season).

9. vegetable_production_permanent

Focuses on permanent crops (perennial crops that produce for multiple years).

10. livestock_production

Specialized dataset on bovine cattle production and related establishments.


Function Parameters

1. dataset

Selects which dataset to download. See dataset descriptions above.

dataset = "agricultural_land_area"  # character string

2. raw_data

Controls whether to download the original data or the processed/cleaned version.

Default behavior: Raw data typically requires more cleaning and interpretation, while treated data is ready for immediate analysis.

raw_data = FALSE  # logical

3. geo_level

Specifies the geographic aggregation level.

geo_level = "state"  # character string

4. time_period

Defines which year(s) to download. Availability varies by dataset:

Dataset | Available Years
--- | ---
agricultural_land_area | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017
agricultural_area_use | 1970, 1975, 1980, 1985, 1995, 2006, 2017
agricultural_employees_tractors | 1970, 1975, 1980, 1985, 1995, 2006, 2017
agricultural_producer_condition | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017
animal_production | 1970, 1975, 1980, 1985, 1995, 2006, 2017
animal_products | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017
vegetable_production_area | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017
vegetable_production_temporary | 1970, 1975, 1980, 1985, 1995, 2006, 2017
vegetable_production_permanent | 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017
livestock_production | 2017

You can request a single year or a range of years:

time_period = 2006           # single year
time_period = c(1995, 2006)  # multiple specific years
time_period = 1995:2006      # will select years within this range that are available
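The range form works because only census years inside the range are kept. The effect can be reproduced in base R with intersect(). The vector below copies the years listed above for agricultural_area_use; this illustrates the outcome only and is not the package's internal code.

```r
# Census years listed above for "agricultural_area_use"
available_years <- c(1970, 1975, 1980, 1985, 1995, 2006, 2017)

# A requested range usually contains many non-census years
requested <- 1995:2006

# Keep only the requested years that actually exist
selected <- intersect(requested, available_years)
print(selected)
```

Here the twelve requested years reduce to the two census years 1995 and 2006.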

5. language

Output language for variable names and labels.

language = "eng"  # character string

Examples

# download treated land area data at the country level in 2017
data <- load_censoagro(
  dataset = "agricultural_land_area",
  raw_data = FALSE,
  geo_level = "country",
  time_period = 2017,
  language = "eng"
)

# download treated temporary crop data by state for 1995, in Portuguese
data <- load_censoagro(
  dataset = "vegetable_production_temporary",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 1995,
  language = "pt"
)

# download municipality-level cattle data (only available for livestock_production)
data <- load_censoagro(
  dataset = "livestock_production",
  raw_data = FALSE,
  geo_level = "municipality",
  time_period = 2017,
  language = "eng"
)

Data Notes

Raw vs. Treated Data

Data Organization

When using treated data, the output is typically in long format with one row per observation unit, containing:
- Geographic identifiers (state, municipality if applicable)
- Year of the census
- Product/category names (crop type, animal species, etc.)
- Quantitative measurements (area, quantity, count)
- Number of establishments/properties
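Long format is convenient for filtering, but comparisons across census years are often easier to read with one column per year. A minimal sketch using base R's xtabs() on a toy table; the column names (state, year, area) and all values are invented for illustration and do not come from the package output.

```r
# Toy data in long format; column names and values are made up for illustration
long <- data.frame(
  state = c("AC", "AC", "AM", "AM"),
  year  = c(2006, 2017, 2006, 2017),
  area  = c(10, 12, 20, 25)
)

# One row per state, one column per census year
wide <- xtabs(area ~ state + year, data = long)
print(wide)
```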

Important Considerations

  1. Time gaps: Census data is not collected every year. Years with no data simply won’t be available.
  2. Geographic changes: Brazil’s state boundaries have changed historically; use caution when comparing very old data
  3. Definition changes: IBGE’s classification of crops and agricultural activities has evolved. Variables may not be directly comparable across all decades.
  4. Municipality data: Currently only available for livestock_production in 2017
  5. Download size: Historical data requests with multiple years may be large; plan accordingly

Citing the Data

When using this data in research or publications, cite:

IBGE - Instituto Brasileiro de Geografia e Estatística. Censo Agropecuário. Available at: https://sidra.ibge.gov.br/pesquisa/censo-agropecuario


Social Data

IPS

Overview

The Amazon Social Progress Index (IPS) is a comprehensive indicator framework that measures social and environmental progress in the Legal Amazon region. This collaborative initiative combines:

This dataset captures:

The IPS provides a holistic view of sustainable development, moving beyond simple economic measures (GDP) to encompass environmental sustainability and social well-being.

Data Source and Methodology

The Social Progress Index:
- Based on 50+ individual indicators across 12 domains
- Uses data from government agencies, NGOs, and research institutions
- Aggregated into 3 main dimensions and 12 subdimensions
- Indexed to 0-100 scale for comparability
- Methodologically rigorous with transparent weighting

For detailed methodology, visit Social Progress Imperative.


Available Dimensions

The IPS framework includes 8 main dataset options:

1. all

Complete Social Progress Index with all dimensions and indicators.

2. life_quality

Indicators related to quality of life and well-being.

3. sanit_habit

Sanitation and habitat indicators.

4. violence

Public safety and violence indicators.

5. educ

Education and literacy indicators.

6. communic

Communication and connectivity indicators.

7. mortality

Health and mortality indicators.

8. deforest

Environmental and deforestation indicators.


Function Parameters

1. dataset

Selects which dimension(s) to download.

dataset = "all"           # All dimensions
dataset = "life_quality"  # Quality of life metrics
dataset = "sanit_habit"   # Sanitation and habitat
dataset = "violence"      # Public safety and violence
dataset = "educ"          # Education indicators
dataset = "communic"      # Communication and connectivity
dataset = "mortality"     # Health and mortality
dataset = "deforest"      # Environmental and deforestation

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. time_period

Specifies which assessment year(s) to download.

Available years: 2014, 2018, 2021, 2023

time_period = 2023              # Most recent
time_period = c(2018, 2023)     # Specific years
time_period = c(2014, 2018, 2021, 2023)  # Multiple years

4. language

Output language for variable names and labels.

language = "eng"  # character string

Examples

# download raw IPS data from 2014
data <- load_ips(
  dataset = "all",
  raw_data = TRUE,
  time_period = 2014,
  language = "eng"
)

# download treated deforestation IPS data from 2018, in Portuguese
data <- load_ips(
  dataset = "deforest",
  raw_data = FALSE,
  time_period = 2018,
  language = "pt"
)

Data Notes

Index Scales

Dimensions and Indicators

Each dimension contains multiple indicators:
- Life quality: 4-6 indicators
- Sanitation/habitat: 3-5 indicators
- Violence: 3-4 indicators
- Education: 3-4 indicators
- Communication: 2-3 indicators
- Mortality: 3-4 indicators
- Deforestation: 2-3 indicators

(Exact number varies by year and methodology)

Temporal Comparisons

When comparing across years (2014, 2018, 2021, 2023):
- Methodology may have evolved between assessments
- New indicators may have been added
- Some municipalities may not have data in all years
- Use caution when comparing the earliest assessment (2014) with the most recent (2023)
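One way to handle municipalities that are missing in some years is to restrict the comparison to those observed in every year of interest. A minimal base R sketch on a toy panel; the column names (municipality, year, score) and all values are invented for illustration and are not the package's actual variable names.

```r
# Toy panel; column names and values are invented for illustration
ips <- data.frame(
  municipality = c("A", "A", "A", "B", "B", "C", "C", "C"),
  year         = c(2014, 2018, 2023, 2014, 2023, 2014, 2018, 2023),
  score        = c(55.1, 56.0, 57.2, 48.3, 50.1, 61.0, 60.2, 62.4)
)

years_wanted <- c(2014, 2018, 2023)

# Count, for each municipality, how many of the wanted years it appears in
in_scope <- ips[ips$year %in% years_wanted, ]
n_years  <- tapply(in_scope$year, in_scope$municipality,
                   function(y) length(unique(y)))

# Keep only municipalities observed in every wanted year
complete <- names(n_years)[n_years == length(years_wanted)]
balanced <- in_scope[in_scope$municipality %in% complete, ]
print(balanced)
```

In this toy panel, municipality "B" has no 2018 record and is dropped, leaving a balanced panel of "A" and "C".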

Missing Data

Geographic Coverage


IEMA

Overview

Data from the Institute of Energy and Environment (Instituto de Energia e Meio Ambiente, IEMA) documenting access to electricity across the Amazon region. The dataset reports the population without access to electricity throughout the Legal Amazon in 2018, offering insight into energy poverty and infrastructure gaps in the region.

The IEMA dataset is particularly valuable for understanding energy access disparities in the Amazon, including remote communities, indigenous territories, and areas with limited infrastructure development.

Data Coverage

The IEMA dataset includes:

Dataset Description

Key Variables

  1. Population Without Energy Access: Number of individuals lacking access to electric energy
  2. Geographic Identifiers: Municipality, state, and biome information
  3. Settlement Type: Rural vs. urban classification where available
  4. Indigenous Territory Data: Energy access in indigenous lands (where applicable)
  5. Infrastructure Indicators: Distance to grid, cost of connection, barriers to access

Geographic Coverage

The Legal Amazon encompasses:
- 9 States: Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima, and Tocantins in full, plus part of Maranhão
- Total Area: Approximately 5 million square kilometers
- Population Focus: Amazon-dwelling populations with particular attention to vulnerable groups

Energy Access Dimensions

The dataset addresses multiple aspects of energy poverty:
- Access Type: Grid connection vs. off-grid solutions
- Reliability: Service quality and availability
- Affordability: Connection costs and monthly tariffs
- Geographic Barriers: Remote location and access challenges
- Population Characteristics: Indigenous communities, rural settlements, urban poor


Function Parameters

Options:

  1. dataset: "iema"

  2. raw_data: TRUE for the original data, FALSE for the treated version

  3. language: "pt" for Portuguese or "eng" for English


Data Limitations

Since this dataset captures a single year (2018), it represents a snapshot rather than a time series. The cross-sectional nature means:
- No Trend Analysis: Cannot track changes over time without merging with other sources
- Temporal Stability: Conditions may have changed since 2018
- Recent Updates: Users may need to contact IEMA for more recent data


Examples

# download treated IEMA energy access data
data <- load_iema(
  raw_data = FALSE,
  language = "eng"
)

🔴 This function uses the googledrive package to download data. In case of authentication errors, see googledrive.

Population

Overview

The Population dataset provides Brazil’s official population statistics from IBGE (Brazilian Institute of Geography and Statistics). This dataset includes:

Population data is fundamental to many analyses, providing denominators for per capita metrics, understanding urbanization patterns, tracking demographic trends, and supporting policy and development planning.

Data Source and Methodology

Population data comes from:
- Census years (2007, 2010): Actual population enumeration
- Estimation years (2001-2006, 2008-2009, 2011-2021): IBGE official population projections based on census data and vital statistics
- Methodology: Cohort-component methodology considering births, deaths, migration
- Updates: Revised annually as new vital statistics data becomes available

For more information, visit IBGE Population Estimates.


Available Dataset

population

Official Brazilian population statistics and estimates.


Important Data Characteristics

Census Years vs. Estimate Years

Year Type | Years | Nature
--- | --- | ---
Census | 2007, 2010 | Actual population count
Estimates | 2001-2006, 2008-2009, 2011-2021 | Official IBGE projections

Census years provide actual counts; other years are official projections.

Population Definition


Function Parameters

1. dataset

Only one dataset is available:

dataset = "population"  # Population statistics

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. geo_level

Specifies geographic aggregation level.

geo_level = "municipality"  # character string

4. time_period

Specifies which year(s) to download.

Available years: 2001-2006, 2007 (census), 2008-2009, 2010 (census), 2011-2021

time_period = 2020              # single year
time_period = c(2010, 2020)     # specific years (includes 2010 census)
time_period = 2010:2020         # range of years
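When a requested range mixes census and estimate years, it can be useful to flag which is which before analysis. A minimal base R sketch that applies the year classification given in this section; the object names are illustrative and nothing here calls the package.

```r
# Census years as listed in this section; all other available years are estimates
census_years <- c(2007, 2010)

requested <- 2005:2012

# Label each requested year by the nature of its figure
year_type <- data.frame(
  year = requested,
  type = ifelse(requested %in% census_years, "census", "estimate")
)
print(year_type)
```

The resulting table can be merged onto downloaded data by year to keep the distinction visible in later steps.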

5. language

Output language for variable names.

language = "eng"  # character string

Examples

Example 1: Population by state for a single year

# download treated population data at the state level for 2021
pop_2021 <- load_population(
  dataset = "population",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2021,
  language = "eng"
)

Example 2: Population by state over time

# download treated population data at the state level for 2010 to 2021
pop_series <- load_population(
  dataset = "population",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2010:2021,
  language = "eng"
)

Example 3: Population by municipality

# download treated population data at the municipality level for 2020
pop_munic <- load_population(
  dataset = "population",
  raw_data = FALSE,
  geo_level = "municipality",
  time_period = 2020,
  language = "eng"
)

Data Notes

Data Structure

Each record contains:
- Geographic identifier (state or municipality name/code)
- Year
- Population count

The structure is simple, which makes it a convenient base for other calculations.
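Because population is mostly used as a denominator, a typical next step is to join it to another table and compute a per capita figure. A minimal base R sketch with merge(); both tables are toy data, and the column names (state, year, population, value) and all numbers are invented for illustration rather than taken from the package output.

```r
# Toy tables; column names and numbers are invented for illustration
population <- data.frame(
  state = c("AC", "AM", "PA"),
  year  = c(2020, 2020, 2020),
  population = c(900000, 4200000, 8700000)
)
indicator <- data.frame(
  state = c("AC", "AM", "PA"),
  year  = c(2020, 2020, 2020),
  value = c(1800, 6300, 26100)
)

# Join on the shared geographic and time keys
merged <- merge(indicator, population, by = c("state", "year"))

# Express the indicator per 100,000 inhabitants
merged$value_per_100k <- merged$value / merged$population * 1e5
print(merged)
```

Joining on both state and year avoids silently pairing an indicator with the population of a different year.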

Census vs. Estimate Quality

Municipal Variations

Important Limitations

  1. Estimate years: Non-census years are projections, not actual counts
  2. Lag in release: May take time for latest estimates to be published
  3. Revisions: Estimates revised annually as better vital statistics data becomes available
  4. Boundary changes: Municipal boundaries changed; affects historical comparisons
  5. Definition: Includes registered residents; may not match actual lived population

Economic Data

COMEX

Overview

The COMEX (Comércio Exterior - Foreign Trade) dataset provides Brazil’s official international trade statistics extracted from Siscomex, the Integrated System of Foreign Trade maintained by the Brazilian government.

This dataset captures:

COMEX is the primary official source for Brazil’s international trade statistics, widely used for trade policy analysis, business intelligence, academic research, and economic monitoring.

Data Source and Coverage

COMEX data comes from:
- Official records from Siscomex (Brazil’s foreign trade system)
- Mandatory declarations by exporters and importers
- Updated monthly with current month data
- Historical data from 1989 onwards

Important note on nomenclature: From 1989 to 1996, Brazil used a different system of product nomenclature (NBM - Nomenclatura Brasileira de Mercadorias). Conversions to the current nomenclature system are available, and the package handles them transparently.

For more information, visit the Brazilian Ministry of Productivity, Employment and Foreign Trade.


Available Datasets

1. export_mun (Exports by Municipality)

Export data disaggregated at the municipality level.

2. import_mun (Imports by Municipality)

Import data disaggregated at the municipality level.

3. export_prod (Exports by Producer)

Export data organized by producer/exporter and product.

4. import_prod (Imports by Producer)

Import data organized by importer/distributor and product.


Function Parameters

1. dataset

Selects which trade dataset to download.

dataset = "export_mun"   # exports by municipality
dataset = "import_mun"   # imports by municipality
dataset = "export_prod"  # exports by producer/exporter
dataset = "import_prod"  # imports by producer/importer

2. raw_data

Controls whether to download the original data or the processed/cleaned version.

raw_data = FALSE  # logical

3. time_period

Specifies which year(s) to download. Available from 1989 onwards.

time_period = 2020              # single year
time_period = c(2018, 2020)     # specific years
time_period = 2015:2020         # range of years

Note: Monthly data means each year can be quite large. Consider downloading specific years or ranges to manage file size.

4. language

Output language for variable names and documentation.

language = "eng"  # character string

Examples

# download treated exports data by municipality from 2020 to 2021
data <- load_br_trade(
  dataset = "export_mun",
  raw_data = FALSE,
  time_period = 2020:2021,
  language = "eng"
)

# download treated imports data by municipality from 2020 to 2021
data <- load_br_trade(
  dataset = "import_mun",
  raw_data = FALSE,
  time_period = 2020:2021,
  language = "eng"
)

Data Notes

Raw vs. Treated Data

Product Classification

Data Characteristics

  1. Monthly frequency: Data is reported monthly; aggregation to annual or quarterly is straightforward
  2. Producer vs. Municipality:
  3. Missing data: Some small trade flows may not be reported
  4. Currency: All values in USD
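The aggregation from monthly to annual values mentioned in the first item can be done with base R's aggregate(). A minimal sketch on toy records; the column names (year, month, municipality, value_usd) and all values are invented for illustration and may differ from the package's actual output.

```r
# Toy monthly records; column names and values are invented for illustration
trade <- data.frame(
  year  = c(2020, 2020, 2020, 2021, 2021),
  month = c(1, 2, 3, 1, 2),
  municipality = c("A", "A", "B", "A", "B"),
  value_usd = c(100, 150, 80, 120, 90)
)

# Sum monthly values within each municipality and year
annual <- aggregate(value_usd ~ municipality + year, data = trade, FUN = sum)
print(annual)
```

Quarterly totals follow the same pattern once a quarter column has been derived from the month.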

Nomenclature Conversion

When combining data from 1989-1996 with data from 1997 onwards, be aware:
- Product categories may differ between nomenclature systems
- Conversions are available but not always 1:1
- Comparisons between the earliest and the most recent years call for caution

BACI

Overview

BACI (Base pour l’Analyse du Commerce International) is a comprehensive database of bilateral trade flows developed by CEPII (Centre d’Études Prospectives et d’Informations Internationales), a leading French research center on international trade.

This dataset provides:

The BACI dataset is widely used in academic research, policy analysis, and international trade studies due to its comprehensive coverage and quality control procedures.

Data Source and Methodology

BACI is constructed from UN Comtrade data with significant processing:
- Reconciliation of discrepancies between reported imports and exports
- Imputation of missing values using econometric techniques
- Classification using the Harmonized System (HS) nomenclature
- All values converted to USD for comparability

For detailed methodological information, visit the CEPII BACI documentation.


Available Dataset

HS92 (Harmonized System 1992)

The HS92 classification provides trade data organized under the 1992 version of the Harmonized System nomenclature.


Important Information About Downloads

Download Size and Time

Important: The BACI dataset is very large. All data is packaged in a single compressed file on the CEPII server. Even if you only need data for specific years, the entire dataset must be downloaded first, then processed locally.

Recommendation: Plan your download during off-peak hours or use a stable, high-speed connection.
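Because the whole archive is fetched in one piece, a slow connection can exceed R's default download timeout of 60 seconds. A minimal sketch of raising it for the session with base R's options(). Whether load_baci is affected by this setting depends on how the package performs the download, which is an assumption here.

```r
# Raise the timeout (in seconds) used by R's built-in download functions,
# keeping the previous setting so it can be restored afterwards.
old <- options(timeout = max(600, getOption("timeout")))

# ... call the download function here ...

# Restore the previous setting
options(old)
```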


Function Parameters

1. dataset

Currently only one dataset is available:

dataset = "HS92"  # Harmonized System 1992 classification

2. raw_data

Controls whether to download the original data or the processed/cleaned version.

raw_data = FALSE  # logical

3. time_period

Specifies which year(s) to download. You can request single or multiple years.

time_period = 2016              # single year
time_period = c(2010, 2015)     # multiple specific years
time_period = 2010:2020         # range (will select available years)

4. language

Output language for variable names and documentation.

language = "eng"  # character string


Examples

# download treated trade data for 2016 (HS92 classification)
# Warning: large download, may take a long time
trade_2016 <- load_baci(
  dataset = "HS92",
  raw_data = FALSE,
  time_period = 2016,
  language = "eng"
)

Data Notes

Raw vs. Treated Data

Data Structure

Each row typically represents a trade flow with:
- Exporter: Country code (ISO 3-letter code)
- Importer: Country code (ISO 3-letter code)
- Year: Calendar year of the trade flow
- Product code: HS 6-digit classification
- Product name: Description of the product
- Value: Trade value in USD
- Quantity: Physical quantity (where available)
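With one row per exporter, importer, year and product, totals by trading partner are a filter followed by a grouped sum. A minimal base R sketch on toy flows; the column names and values are invented for illustration and may not match the package output. Product codes are stored as character strings so that leading zeros are preserved.

```r
# Toy bilateral flows; column names and values are invented for illustration
flows <- data.frame(
  exporter = c("BRA", "BRA", "BRA", "ARG"),
  importer = c("CHN", "CHN", "USA", "BRA"),
  year     = c(2016, 2016, 2016, 2016),
  product  = c("120100", "260111", "090111", "100190"),
  value    = c(500, 300, 200, 50)
)

# Brazil's exports, summed over products for each destination
bra_exports <- flows[flows$exporter == "BRA", ]
by_partner  <- aggregate(value ~ importer + year, data = bra_exports, FUN = sum)

# Largest destinations first
by_partner <- by_partner[order(-by_partner$value), ]
print(by_partner)
```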

Country and Product Coverage

Quality Notes


PIB-Munic

Overview

PIBMUNIC (Produto Interno Bruto por Município - Gross Domestic Product by Municipality) is Brazil’s official municipal-level GDP data produced by IBGE. This dataset provides:

PIBMUNIC is essential for understanding Brazil’s regional economic structure, identifying economic disparities, analyzing sectoral specialization, and assessing economic development across municipalities.

Data Source and Methodology

PIBMUNIC data is compiled by IBGE using:
- Data from CEMPRE (firm registry) for employment and output
- Production and consumption surveys
- Tax and financial records
- Trade and services data
- National accounts framework aligned with international standards

For more information, visit IBGE National Accounts.


Available Dataset

pibmunic

Complete municipal GDP statistics with sectoral detail.


Function Parameters

1. dataset

Only one dataset is available:

dataset = "pibmunic"  # Municipal GDP data

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. geo_level

Specifies geographic aggregation level.

geo_level = "municipality"  # character string

4. time_period

Specifies which year(s) to download.

time_period = 2020              # single year
time_period = c(2015, 2020)     # specific years
time_period = 2015:2020         # range of years

5. language

Output language for variable names.

language = "eng"  # character string

Examples

Example 1: Municipal GDP for a single year

# download treated municipal GDP data for 2020
pib_munic <- load_pibmunic(
  dataset = "pibmunic",
  raw_data = FALSE,
  geo_level = "municipality",
  time_period = 2020,
  language = "eng"
)

Example 2: State-level GDP over time

# download treated state-level GDP data for 2015 to 2020
pib_state <- load_pibmunic(
  dataset = "pibmunic",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2015:2020,
  language = "eng"
)

Example 3: Country-level GDP in Portuguese

# download treated country-level GDP data for 2010 to 2020 in Portuguese
pib_br <- load_pibmunic(
  dataset = "pibmunic",
  raw_data = FALSE,
  geo_level = "country",
  time_period = 2010:2020,
  language = "pt"
)

Data Notes

Variables Structure

Typical variables include:
- gdp: Gross Domestic Product at current prices (R$)
- value_added_agriculture: Farming and forestry sector
- value_added_industry: Manufacturing and construction
- value_added_services: Commerce, finance, transport, etc.
- value_added_public_admin: Government services
- net_taxes: Taxes minus subsidies
- Additional detail depends on specific IBGE release

Current Prices

Data Coverage

Important Limitations

  1. Current prices: Values not adjusted for inflation
  2. Time lag: Recent years may not be available
  3. Confidentiality: Some small municipalities may have aggregated data
  4. Methodological changes: IBGE occasionally updates national accounts methodology
  5. Municipal boundaries: Changed in 2021; affects historical comparisons
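Since values are at current prices, comparisons over time usually require deflating by a price index first. A minimal base R sketch of the arithmetic. The GDP figures and the price index are invented for illustration; a real analysis would use an official deflator, which this package section does not provide.

```r
# Toy series; the GDP figures and the price index are invented for illustration
gdp <- data.frame(
  year = c(2018, 2019, 2020),
  gdp_current = c(1000, 1100, 1210),
  price_index = c(100, 105, 110)   # hypothetical deflator, base 2018 = 100
)

# Re-express every year at 2020 prices:
# real value = current value * (index in base year / index in own year)
base_index <- gdp$price_index[gdp$year == 2020]
gdp$gdp_real_2020 <- gdp$gdp_current * base_index / gdp$price_index
print(gdp)
```

The base year keeps its current-price value, while earlier years are scaled up by the price change between that year and the base year.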

CEMPRE

Overview

Employment, salary and firm data from IBGE’s Cadastro Central de Empresas (CEMPRE). This comprehensive dataset provides information on companies and other organizations registered with Brazil’s tax authority (Receita Federal), including employment levels, wage information, and business establishment data across Brazilian municipalities and sectors.

The CEMPRE dataset is one of the most detailed sources of firm-level data in Brazil, covering virtually all formal enterprises and organizations operating in the country.

Data Coverage

The CEMPRE dataset includes:

Dataset Description

Key Variables

  1. Establishment Count: Number of registered establishments
  2. Employment: Total number of employees and average employees per establishment
  3. Wages: Average salary, total payroll, and wage bill information
  4. Economic Classification: CNAE codes for sector identification
  5. Location: State, region, and municipality identifiers

Geographic Aggregation Levels

The data is available at three different aggregation levels:
- Country Level: Aggregate statistics for all of Brazil
- State Level: Data aggregated by state (27 units)
- Municipality Level: Data disaggregated to municipality level (5,570+ municipalities)

Sectoral Detail

Data can be retrieved either disaggregated by sector or in aggregate form:
- Sectoral Disaggregation: Detailed breakdown by CNAE 2.0 (main divisions and subdivisions)
- Aggregate: Total across all sectors


Function Parameters

Options:

  1. dataset: "cempre"

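  2. raw_data: TRUE for the original data, FALSE for the treated version

  3. geo_level: "country", "state", or "municipality"

  4. time_period: Specifies the years for which data will be downloaded (e.g., 2010:2020 for 2010 through 2020)

  5. language: "pt" for Portuguese or "eng" for English

  6. sectors: TRUE to return data split by sector, FALSE for the total across all sectors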


Examples

# download raw data at the country level from 2008 to 2010
data <- load_cempre(
  raw_data = TRUE,
  geo_level = "country",
  time_period = 2008:2010,
  language = "eng"
)

# download treated state-level data split by sector, in Portuguese
data <- load_cempre(
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2008:2010,
  language = "pt",
  sectors = TRUE
)

PAM

Municipal Agricultural Production (PAM, from the Portuguese Produção Agrícola Municipal) is a nationwide annual survey conducted by IBGE (Brazilian Institute of Geography and Statistics). It provides information on agricultural products, such as quantity produced, area planted and harvested, average yield, and monetary value of output. Products are divided into permanent and temporary crops, with dedicated surveys for the four products that yield multiple harvests a year (beans, potato, peanut and corn). In total the survey covers 64 agricultural products (31 temporary crops and 33 permanent crops). A product is only included in the dataset, however, if its planted area exceeds 1 acre or its output exceeds one tonne.

Permanent crops have a long cycle and can be harvested repeatedly over the years without replanting. Temporary crops have short or medium cycles and must be replanted after each harvest.

The data also has multiple aggregation levels, such as nationwide, by region, mesoregion and microregion, as well as state and municipality.

The data available has a yearly frequency and is available from 1974 to the present, with the exception of the four multiple-harvest products, which are only available from 2003. More information can be found on this link (only in Portuguese).


Options:

  1. dataset: See tables below

  2. raw_data: there are two options: TRUE for the raw data, FALSE for the treated version

  3. geo_level: "country", "region", "state", or "municipality"

  4. time_period: picks the years for which the data will be downloaded

  5. language: you can choose between Portuguese ("pt") and English ("eng")


The datasets supported are shown in the tables below, made up of both the original databases and their narrower subsets. Note that downloading only specific crops is considerably faster.

Full datasets provided by IBGE:
dataset
all_crops
temporary_crops
permanent_crops
corn
potato
peanut
beans
Datasets generated from Temporary Crops:
dataset | Name (pt) | Name (eng)
--- | --- | ---
total_temporary | Total | Total
abacaxi | Abacaxi | Pineapple
alfafa | Alfafa Fenada | Alfafa Fenada
alho | Alho | Garlic
algodao_herbaceo | Algodao Herbaceo (em Caroco) | Herbaceous Cotton (in Caroco)
amendoim_temporary | Amendoim (em Casca) | Peanuts (in Shell)
arroz | Arroz (em Casca) | Rice (in husk)
aveia | Aveia (em Grao) | Oats (in grain)
batata_doce | Batata Doce | Sweet potato
batata_inglesa | Batata Inglesa | English potato
cana_de_acucar | Cana de Acucar | Sugar cane
cana_para_forragem | Cana para Forragem | Forage cane
castor_bean | Mamona (Baga) | Castor bean (Berry)
cebola | Cebola | Onion
cevada | Cevada (em Grao) | Barley (in Grain)
ervilha | Ervilha (em Grao) | Pea (in Grain)
fava | Fava (em Grao) | Broad Bean (in Grain)
feijao_temporary | Feijao (em Grao) | Beans (in Grain)
fumo | Fumo (em Folha) | Smoke (in Sheet)
girassol_sementes | Girassol (em Grao) | Sunflower (in Grain)
juta_fibra | Juta (Fibra) | Jute (Fiber)
linho_sementes | Linho (Semente) | Linen (Seed)
malva_fibra | Malva (Fibra) | Malva (Fiber)
mandioca | Mandioca | Cassava
melancia | Melancia | watermelon
melao | Melao | Melon
milho_temporary | Milho (em Grao) | corn (in grain)
rami_fibra | Rami (Fibra) | Ramie (Fiber)
rye | Centeio (em Grao) | Rye (in grain)
soja | Soja (em Grao) | Soybean (in grain)
sorgo | Sorgo (em Grao) | Sorghum (in Grain)
tomate | Tomate | Tomato
trigo | Trigo (em Grao) | Wheat (in grain)
triticale | Triticale (em Grao) | Triticale (in grain)
Datasets generated from Permanent Crops:
dataset Name (pt) Name (eng)
acai Acai Acai
annatto_seeds Urucum (Semente) Annatto (Seed)
apple Maca Apple
avocado Abacate Avocado
banana Banana (Cacho) Banana (Bunch)
black_pepper Pimenta do Reino Black pepper
cashew Caju Cashew
cashew_nut Castanha de Caju Cashew Nuts
cocoa_beans Cacau (em Amendoa) Cocoa (in Almonds)
coffee_arabica Cafe (em Grao) Arabica Cafe (in Grao) Arabica
coffee_canephora Cafe (em Grao) Canephora Cafe (in Grain) Canephora
coffee_total Cafe (em Grao) Total Coffee (in Grain) Total
coconut Coco da Baia Coconut
coconut_bunch Dende (Cacho de Coco) Coconut Bunch
cotton_arboreo Algodao Arboreo (em Caroco) Arboreo cotton (in Caroco)
fig Figo Fig
grape Uva Grape
guarana_seeds Guarana (Semente) Guarana (Seed)
guava Goiaba Guava
heart_of_palm Palmito Palm heart
india_tea Cha da India (Folha Verde) India Tea (Leaf)
khaki Caqui Khaki
lemon Limao Lemon
mango Manga Mango
papaya Mamao Papaya
passion_fruit Maracuja Passion fruit
peach Pessego Peach
pear Pera Pear
permanent_total Total Total
quince Marmelo Quince
rubber_coagulated_latex Borracha (Latex Coagulado) Rubber (Coagulated Latex)
rubber_liquid_latex Borracha (Latex Liquido) Rubber (Liquid Latex)
sisal_or_agave Sisal ou Agave (Fibra) Sisal or Agave (Fiber)
tangerine Tangerina Tangerine
tung Tungue (Fruto Seco) Tung (Dry Fruit)
walnut Noz (Fruto Seco) Walnut (Dry Fruit)
yerba_mate Erva Mate (Folha Verde) Mate Herb (Leaf)

Examples:

# download treated data at the state level from 2010 to 2011 for all crops
data <- load_pam(
  dataset = "all_crops",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2010:2011,
  language = "eng"
)
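Because crop-specific downloads are faster, a narrower request may be preferable when only one product is needed. The sketch below asks for soybean data only; the choice of crop, years, and geographic level is illustrative, and the call is guarded so that it is attempted only in an interactive session with the package installed.

```r
# Arguments for a single-crop request; "soja" comes from the temporary crops table
pam_args <- list(
  dataset = "soja",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2010:2011,
  language = "eng"
)

# The download needs an internet connection, so it is only attempted
# interactively and when datazoom.amazonia is installed
if (interactive() && requireNamespace("datazoom.amazonia", quietly = TRUE)) {
  soy <- do.call(datazoom.amazonia::load_pam, pam_args)
  head(soy)
}
```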

PEVS

Overview

PEVS (Produção da Silvicultura e da Extração Vegetal - Silviculture and Forestry Extraction Production) is a comprehensive annual survey conducted by IBGE that collects data on forestry and related activities in Brazil.

This dataset provides:

PEVS is Brazil’s primary source for forestry production statistics, important for understanding the timber industry, forest management practices, and sustainable use of forest resources.

Data Source and Coverage

PEVS data comes from:
- Direct surveys of forestry companies and producers
- Administrative records from forestry operations
- Compiled and validated by IBGE’s agriculture statistics division
- Annual release with data for previous year
- Covers both industrial and subsistence-level forestry activities

For more information, visit IBGE Agriculture Statistics.


Available Datasets

1. pevs_forest_crops

Production data from forest crop plantations (timber and non-timber products).

2. pevs_silviculture

Data on silviculture activities including afforestation, reforestation, and forest management.

3. pevs_silviculture_area

Total existing area used for silviculture operations, disaggregated by forest species.


Function Parameters

1. dataset

Selects which forestry dataset to download.

dataset = "pevs_forest_crops"       # Forest crop production (1986-2019)
dataset = "pevs_silviculture"       # Silviculture production (1986-2019)
dataset = "pevs_silviculture_area"  # Silviculture land area (2013-2019)

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. geo_level

Specifies the geographic aggregation level.

geo_level = "state"  # character string

4. time_period

Defines which year(s) to download.

Important: Dataset-specific availability:

Dataset Available Years
pevs_forest_crops 1986-2019
pevs_silviculture 1986-2019
pevs_silviculture_area 2013-2019

time_period = 2019              # single year
time_period = c(2010, 2015)     # specific years
time_period = 2010:2019         # range of years
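Since availability differs by dataset, it can help to check a request before downloading. The following is a minimal sketch using only the year ranges from the table above; `unavailable_years()` is a hypothetical helper written for this illustration and is not part of datazoom.amazonia.

```r
# Year ranges copied from the availability table above
pevs_years <- list(
  pevs_forest_crops      = 1986:2019,
  pevs_silviculture      = 1986:2019,
  pevs_silviculture_area = 2013:2019
)

# Hypothetical helper (not part of datazoom.amazonia): returns the requested
# years that fall outside the documented range for a dataset
unavailable_years <- function(dataset, time_period) {
  if (!dataset %in% names(pevs_years)) {
    stop("Unknown PEVS dataset: ", dataset)
  }
  setdiff(time_period, pevs_years[[dataset]])
}

unavailable_years("pevs_silviculture_area", 2010:2015)  # 2010 2011 2012
unavailable_years("pevs_forest_crops", 2010:2015)       # integer(0)
```

An empty result means every requested year is within the documented range.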

5. language

Output language for variable names.

language = "eng"  # character string

Examples

Example 1: Forest crop production by state

# download treated forest crops data at the state level for 2019
forest_crops <- load_pevs(
  dataset = "pevs_forest_crops",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2019,
  language = "eng"
)

Example 2: Silviculture area by state over time

# download treated silviculture area data at the state level for 2013 to 2019
silvi_area <- load_pevs(
  dataset = "pevs_silviculture_area",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2013:2019,
  language = "eng"
)

Example 3: Silviculture production by region

# download treated silviculture production data at the region level for 2019
silvi_prod <- load_pevs(
  dataset = "pevs_silviculture",
  raw_data = FALSE,
  geo_level = "region",
  time_period = 2019,
  language = "eng"
)

Data Notes

Data Structure

Each row typically represents:
- A geographic unit (country, region, state, or municipality)
- A specific year
- A product type or forest species
- Quantity produced (in appropriate units: cubic meters, tons, etc.)
- Value of production (in currency units)

Product Categories

Forest crops include:
- Timber species: Eucalyptus, pine, other timber
- Non-timber products: Charcoal, resins, turpentine, cork, bark
- Other forest products: Tannin, rosin, pulpwood

Exact categories vary by dataset and year.

Raw vs. Treated Data

Important Limitations

  1. Data collection changes: Survey methodology may evolve over time
  2. Area vs. production: Area data only available from 2013; earlier years have production data only
  3. Municipality data sparse: Many small municipalities may have zero or no reported data
  4. Seasonal nature: Some products are seasonal; annual aggregates smooth out variation
  5. Informal forestry: May undercount small-scale or informal forestry operations

Units of Measurement


PPM

Overview

PPM (Pesquisa da Pecuária Municipal - Municipal Livestock Survey) is Brazil’s comprehensive annual survey of livestock activities conducted by IBGE. This dataset provides:

PPM is the primary data source for understanding Brazil’s livestock sector, which is economically significant and globally important for beef, poultry, and dairy exports.

Data Source and Methodology

PPM data is compiled from:
- Direct surveys of livestock producers
- Agricultural censuses and administrative records
- Municipal agriculture secretariats
- Processed and validated by IBGE
- Annual release with data for reference year

For more information, visit IBGE Livestock Statistics.


Available Datasets

1. ppm_livestock_inventory

Total livestock herds disaggregated by animal species.

2. ppm_sheep_farming

Specialized data on sheep production and wool/fleece harvest.

3. ppm_animal_origin_production

Production of animal-based products (milk, eggs, honey, wool, etc.).

4. ppm_cow_farming

Detailed dairy cow farming data with milking and productivity metrics.

5. ppm_aquaculture

Aquaculture activities including fish, shrimp, and mollusk farming.


Function Parameters

1. dataset

Selects which livestock/animal production dataset to download.

dataset = "ppm_livestock_inventory"     # Animal populations by species
dataset = "ppm_sheep_farming"           # Sheep and wool production
dataset = "ppm_animal_origin_production"  # Milk, eggs, honey, wool
dataset = "ppm_cow_farming"             # Dairy cow productivity
dataset = "ppm_aquaculture"             # Fish and aquaculture production

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. geo_level

Specifies geographic aggregation level.

geo_level = "state"  # character string

4. time_period

Specifies which year(s) to download.

time_period = 2020              # single year
time_period = c(2010, 2020)     # specific years
time_period = 2010:2020         # range of years

Note: All datasets are available from 1974 onwards, though aquaculture data is more complete from the 2000s on.

5. language

Output language for variable names.

language = "eng"  # character string

Examples

Example 1: Livestock inventory by state

# download treated livestock inventory data at the state level for 2020
livestock <- load_ppm(
  dataset = "ppm_livestock_inventory",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2020,
  language = "eng"
)

Example 2: Dairy cow farming by state

# download treated dairy cow data at the state level for 2020
dairy <- load_ppm(
  dataset = "ppm_cow_farming",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2020,
  language = "eng"
)

Example 3: Animal origin production at the country level

# download treated animal origin production data at the country level for 2020
animal_products <- load_ppm(
  dataset = "ppm_animal_origin_production",
  raw_data = FALSE,
  geo_level = "country",
  time_period = 2020,
  language = "eng"
)

Example 4: Sheep farming by state

# download treated sheep farming data at the state level for 2020
sheep <- load_ppm(
  dataset = "ppm_sheep_farming",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2020,
  language = "eng"
)

Example 5: Aquaculture by state over time

# download treated aquaculture data at the state level for 2015 to 2020
aquaculture <- load_ppm(
  dataset = "ppm_aquaculture",
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2015:2020,
  language = "eng"
)

Data Notes

Data Structure

Each record typically contains:
- Geographic identifier (state or municipality)
- Year
- Animal species or product type
- Quantity (number of animals or production volume)
- Value (if applicable)
- Number of establishments

Units of Measurement

Raw vs. Treated Data

Important Limitations

  1. Survey-based data: Subject to sampling and reporting error
  2. Informal operations: May undercount small or informal livestock operations
  3. Data lag: Published with delay; recent years may not be available
  4. Aquaculture newer: Aquaculture data less complete for very early years
  5. Methodology changes: Survey methods may evolve; can affect comparability

SIGMINE

Overview

SIGMINE (Sistema de Informações Geográficas da Mineração - Mining Geographic Information System) is Brazil’s official mining registry maintained by the National Mining Agency (ANM), the regulatory body that succeeded the former National Department of Mineral Production (DNPM).

This dataset provides:

SIGMINE data is critical for understanding Brazil’s mining sector impact on the environment, economy, and land use, particularly in sensitive regions like the Amazon.

Data Source and Information

SIGMINE data comes from:
- Brazilian mining authorization and regulatory system
- Geological Service of Brazil (SGB-CPRM)
- Environmental licensing agencies
- Updated continuously as mining permits are issued/revoked
- Publicly accessible through government databases

For more information, visit the National Mining Agency (ANM) and mining databases.


Available Dataset

sigmine_active (Active Mining Operations)

Comprehensive registry of legally operating mining projects in Brazil.


Function Parameters

1. dataset

Only one dataset is available:

dataset = "sigmine_active"  # Active mining operations

2. raw_data

Controls whether to download original or processed data.

raw_data = FALSE  # logical

3. language

Output language for variable names and documentation.

language = "eng"  # character string

Data Structure

SIGMINE data typically includes:


Examples

Example 1: Active mining data in Portuguese

# download treated active mining data in Portuguese
mining_active <- load_sigmine(
  dataset = "sigmine_active",
  raw_data = FALSE,
  language = "pt"
)

Example 2: Active mining data in English

# download treated active mining data in English
mining_eng <- load_sigmine(
  dataset = "sigmine_active",
  raw_data = FALSE,
  language = "eng"
)

Data Notes

Mining Authorization Status

SIGMINE tracks mines at various stages:
- Active: Currently operating, producing minerals
- Suspended: Temporarily halted operations
- Recovery: Reclamation and environmental restoration phase
- Other: Various other regulatory statuses

Minerals Covered

SIGMINE includes:
- Precious metals: Gold, silver, platinum
- Industrial minerals: Iron, copper, aluminum, tin, others
- Energy minerals: Coal, uranium
- Gemstones: Diamonds, emeralds, others
- Construction minerals: Sand, gravel, granite

Data Updates

Geographic Information

Important Limitations

  1. Only legal operations: Artisanal/informal mining not included
  2. Status lags: Official status may lag actual operational status
  3. Environmental permits: May not reflect real-time compliance
  4. Spatial accuracy: Boundaries may have uncertain precision
  5. Definition changes: Mining classification/categories may evolve

ANEEL

Loads data from the National Electrical Energy Agency (ANEEL), a Brazilian independent federal agency linked to the Ministry of Mines and Energy (MME). ANEEL works to provide favorable conditions for the Electrical Energy Market to develop with balance and for the benefit of society.

For now, three datasets are available for download: Energy Development Budget, Energy Generation, and Energy Enterprises Distributed.

Energy Development Budget

The Energy Development Budget dataset showcases the Energy Development Account’s (CDE) annual budget expenses. The CDE is designed to promote the Brazilian energy development and is managed by the Electrical Energy Commercialization Chamber (CCEE).

In the current implementation, data is available from 2017 to 2022 and must be downloaded by year. The year argument can be a single year or a vector of years. The dataset includes the type of expense, its value in R$ (Reais), and its share of total CDE budget expenses in each year.*

*Note that ‘share_of_total’ values sum to 1 for each year available.
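The property noted above can be checked directly once the data is loaded. The sketch below runs the check on a toy stand-in; the values and the `year` and `expense_type` column names are assumptions made for illustration, and only `share_of_total` is named in this documentation.

```r
# Toy stand-in for the treated budget data; the values and the `year` and
# `expense_type` column names are made up, only `share_of_total` is documented
budget <- data.frame(
  year = c(2019, 2019, 2019, 2020, 2020),
  expense_type = c("A", "B", "C", "A", "B"),
  share_of_total = c(0.5, 0.3, 0.2, 0.6, 0.4)
)

# Sum the shares within each year
share_sums <- tapply(budget$share_of_total, budget$year, sum)

# Compare with a tolerance, since floating-point sums are rarely exactly 1
all(abs(share_sums - 1) < 1e-8)
```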

Energy Generation

The Energy Generation dataset presents information from ANEEL’s Generation Information System (SIGA). SIGA provides information about Brazil’s installed electricity generation capacity.

The dataset provides information at the individual venture/entity level. It contains information about the power, source, stage, type of permission, origin and final fuel with which each venture/entity operates, as well as other legal, technical and geographical information.* Operation start dates in the dataset range from 1924 to 2022.

* For more details on each variable, access This link and select “Manual do Usuario”.

Energy Enterprises

The Energy Enterprises dataset presents information about distributed micro and mini generators, covered by the Regulatory Resolution nº 482/2012. The list of projects is classified by variables that make up their identification, namely: connected distributor, project code, numerical nucleus of the project code, owner name, production class, subgroup, number of consumer units that receive credits, connection date, type of generating unit, source, installed power, municipality, and federative unit where it is located.

The data is expressed in quantities and installed power in kW (kilowatt). The quantity corresponds to the number of distributed micro or mini generators installed in the specified period. The installed power is defined by the sum of the nominal active electric power of the generating units.

* For more details on each variable, access This link and select “Dicionário de dados”.


Options:

  1. dataset: there are three choices:
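  2. raw_data: there are two options:
    TRUE: if you want the data as it is originally.
    FALSE: if you want the treated version of the data.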
  3. language: you can choose between Portuguese ("pt") and English ("eng")
  4. year: only used for "energy_development_budget". It can be a single year or a vector of years between 2017 and 2022. This argument is required for "energy_development_budget" and can be omitted for the other datasets.

Examples:

# download treated data about energy generation
clean_aneel <- load_aneel(
  dataset = "energy_generation",
  raw_data = FALSE
)


# download raw CDE data for one year
budget_aneel_2019 <- load_aneel(
  dataset = "energy_development_budget",
  raw_data = TRUE,
  year = 2019
)

# download raw CDE data for multiple years
budget_aneel_multi <- load_aneel(
  dataset = "energy_development_budget",
  raw_data = TRUE,
  year = c(2020, 2021)
)

EPE

Loads data from the Energy Research Company (EPE), a Brazilian public company that works closely with the Brazilian Ministry of Mines and Energy (MME) and other agencies to ensure the sustainable development of Brazil’s energy infrastructure. EPE’s duty on that mission is to support MME with quality research and studies in order to aid Brazil’s energy infrastructure planning.

For now, four datasets are available for download: Consumer Energy Consumption, Industrial Energy Consumption, the National Energy Balance, and the State Energy Production Panel. All of them were obtained from the EPE website.

Consumer Energy Consumption

The Consumer Energy Consumption dataset provides monthly data from 2004 to 2025 about energy consumption and number of consumers. The data is organized by State, Region, or Electric Subsystem, and is broken down by class of service and type of consumer.

The available classes are: Residential, Commercial, Industrial, Rural, and Others. For each observation, the dataset reports the type of consumer (Captive or Free), total consumption in megawatt-hours (MWh), and the number of consumers.

When using the Subsystem or Region level, consumer totals are provided but are not disaggregated for all classes and consumer types.

Industrial Energy Consumption

The Industrial Energy Consumption dataset provides monthly data from 2004 to 2025 on energy consumption by industrial sector. Data is available at the State or Subsystem level. Each observation identifies the industrial sector responsible for the consumption and the amount consumed in megawatt-hours (MWh).

National Energy Balance

The National Energy Balance is a thorough and extensive study developed and published by EPE that contains data on energy production, consumption, imports, exports, transformation, and final use.

The processed dataset provides yearly data from 2003 to 2023. It covers all Brazilian energy sources (such as petróleo, gás natural, carvão, eletricidade, lenha, solar and others) and distinguishes between different types of energy flow: production, transformation, final consumption, losses, and adjustments.

Each energy source appears as a separate column in the original spreadsheets. The cleaned data is returned in long format, with one row per combination of year, energy source, and account type. The account type is labeled to indicate whether it refers to production, transformation (for example, “TRANSFORMAÇÃO – REFINARIAS DE PETRÓLEO”), or consumption (for example, “CONSUMO – RESIDENCIAL”).
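The wide-to-long conversion described above can be sketched in base R as follows. This is an illustration on a toy table, not the package's actual cleaning code; all column names and values are assumptions.

```r
# Toy wide table: one row per year and account, one column per energy source.
# All names and values here are illustrative, not taken from the EPE files.
wide <- data.frame(
  year = c(2022, 2022),
  account = c("PRODUCTION", "FINAL CONSUMPTION"),
  oil = c(100, 80),
  natural_gas = c(40, 35)
)

# Every column other than the identifiers is treated as an energy source
source_cols <- setdiff(names(wide), c("year", "account"))

# Stack the source columns: one row per year, account and energy source
long <- do.call(rbind, lapply(source_cols, function(src) {
  data.frame(
    year = wide$year,
    account = wide$account,
    energy_source = src,
    value = wide[[src]]
  )
}))

long
```

Two accounts and two sources give four rows in the long table.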

State Energy Production Panel

The State Energy Production Panel provides yearly data from 2011 to 2024 on electricity generation by energy source and Brazilian state. The data is sourced from EPE’s BEN Chapter 8 (Dados Estaduais) and covers all 27 federative units (the 26 states and the Federal District).

Each row corresponds to one state-year combination. The dataset includes production by source (hydro, wind, solar, nuclear, thermal, sugar cane, firewood, black liquor, other renewables, steam coal, natural gas, coke oven gas, fuel oil, diesel, and other non-renewables), total production, and an indicator of whether the state belongs to the Legal Amazon.


Options:

  1. dataset: there are four choices:
    "consumer_energy_consumption": monthly energy consumption and consumers by State, Region or Electric Subsystem
    "industrial_energy_consumption": monthly industrial energy consumption by State or Subsystem
    "national_energy_balance": yearly energy flow by account and energy source
    "energy_state_panel": yearly energy production by source and state

  2. raw_data: there are two options:
    TRUE: if you want the data as it is originally.
    FALSE: if you want the treated version of the data.

  3. geo_level: only applies to "consumer_energy_consumption" and "industrial_energy_consumption" datasets.
    "state"
    "subsystem"

  4. language: you can choose between Portuguese ("pt") and English ("eng")


Examples:

# download treated data about consumer energy consumption at the state level
clean_epe <- load_epe(
  dataset = "consumer_energy_consumption",
  geo_level = "state",
  raw_data = FALSE
)

# download treated data from the National Energy Balance
balance <- load_epe(
  dataset = "national_energy_balance",
  raw_data = FALSE
)

# download treated data on state energy production panel
state_panel <- load_epe(
  dataset = "energy_state_panel",
  raw_data = FALSE,
  language = "eng"
)

Other tools

Legal Amazon Municipalities

Overview

The municipalities dataset is a foundational reference dataset included in the datazoom.amazonia package. It contains key information about all Brazilian municipalities and identifies which are located in (or overlapping with) the Legal Amazon region.

This dataset includes:

The municipalities dataset is essential infrastructure for this package—most package functions that return geographic data match results to this municipalities reference to enable Legal Amazon filtering and consistent geographic identification.

Data Source

The municipalities dataset is compiled from:
- IBGE: Brazilian Institute of Geography and Statistics official municipal data
- Legal Amazon definition: Based on official Brazilian government definition
- Spatial boundaries: IBGE 2019 municipal boundary shapefile
- Maintained by: datazoom.amazonia package developers


Dataset Structure

The municipalities dataset contains the following information:

Key Variables

Data Types


Accessing the Dataset

In R Code

# Load Brazilian municipalities dataset
data <- datazoom.amazonia::municipalities

# Or after loading the package (dplyr supplies %>% and filter() used below)
library(datazoom.amazonia)
library(dplyr)
data <- municipalities

# View structure
str(municipalities)
head(municipalities)

# Filter for Legal Amazon municipalities only
amazon_municipalities <- municipalities %>%
  filter(legal_amazon == TRUE)

What It Contains

The dataset includes all 5,570+ Brazilian municipalities with:
- Official IBGE identification codes
- State and region classification
- Legal Amazon flag for filtering
- Spatial geometries for geographic analysis


Common Use Cases

Use Case 1: Filtering to the Legal Amazon

Many analyses focus specifically on the Legal Amazon region. Use the municipalities dataset to identify relevant municipalities:

library(dplyr)
# Load any dataset with municipality information
data <- load_prodes(
  dataset = "deforestation",
  raw_data = FALSE,
  geo_level = "municipality",
  language = "eng"
)

# Filter to Legal Amazon using municipalities reference
amazon_data <- data %>%
  inner_join(
    municipalities %>% 
      filter(legal_amazon == TRUE) %>%
      select(code, name),
    by = c("municipality_code" = "code")
  )

Use Case 2: Partial Amazon Municipalities

Important: Some municipalities are only partially within the Legal Amazon.

For statistics reported at municipality level in this package:
- Partial Amazon municipalities: Data is reported for only the Amazon-included portion
- Identification: The municipalities dataset identifies these cases
- Interpretation: When a municipality is partially in the Amazon, reported statistics reflect the Amazon portion only

# Identify fully vs. partially included municipalities
full_amazon <- municipalities %>%
  filter(legal_amazon == TRUE & fully_included == TRUE)

partial_amazon <- municipalities %>%
  filter(legal_amazon == TRUE & fully_included == FALSE)

print(paste("Fully in Amazon:", nrow(full_amazon)))
print(paste("Partially in Amazon:", nrow(partial_amazon)))

Key Characteristics

Municipal Coverage

The Legal Amazon includes:
- States: All of Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima, and Tocantins, plus the part of Maranhão west of the 44th meridian
- Municipalities: over 770 municipalities fully or partially in the Legal Amazon
- Definition: Based on official Brazilian legislation (currently Complementary Law 124/2007)

Partial Inclusion

Important note on municipality-level data:
- Partial municipalities: Some municipalities extend beyond Legal Amazon boundaries
- Data reporting: When data is reported at municipality level for partial municipalities, it reflects only the Legal Amazon portion
- Identification: This dataset identifies which municipalities are partial

# Check for partial municipalities
partial_check <- municipalities %>%
  filter(legal_amazon == TRUE) %>%
  filter(!is.na(amazon_percentage)) %>%
  filter(amazon_percentage < 100)

if (nrow(partial_check) > 0) {
  print("Municipalities partially in Legal Amazon:")
  print(partial_check)
}

Important Notes

Boundary Changes

Brazilian municipalities occasionally undergo changes:
- New municipalities: Created from existing ones (last major change 2021)
- Boundary adjustments: IBGE periodically refines municipal boundaries
- Historical data: When comparing very old data with recent data, be aware municipalities may have been reorganized

Code Consistency

Always use IBGE municipality codes (not names) when:
- Merging multiple datasets
- Doing time-series analysis
- Comparing across sources

Municipality names may change or be ambiguous; codes are unique and stable.
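A small base R illustration of why codes are safer than names for joins. The tables below are invented for the example: the codes are not real IBGE codes, and the column names are assumptions rather than the package's actual schema.

```r
# Toy tables; the codes, names and values are invented for illustration.
# Two different municipalities share the name "Bom Jesus".
reference <- data.frame(
  code = c(1001, 2001),
  name = c("Bom Jesus", "Bom Jesus"),
  state = c("AA", "BB")
)
measurements <- data.frame(
  code = c(1001, 2001),
  name = c("Bom Jesus", "Bom Jesus"),
  value = c(10, 20)
)

# Joining on the name pairs every "Bom Jesus" with every other one
by_name <- merge(reference, measurements, by = "name")
nrow(by_name)  # 4 rows: two of them are false matches

# Joining on the code keeps exactly one row per municipality
by_code <- merge(reference, measurements[, c("code", "value")], by = "code")
nrow(by_code)  # 2 rows
```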

Spatial Data

The municipalities dataset includes spatial boundaries (when loaded as SF object):
- Format: Simple features (SF) polygons
- CRS: WGS84 (EPSG:4326)
- Use: Spatial operations, mapping, spatial joins with other geographic data


Accessing Spatial Data

For Mapping and Spatial Analysis

library(sf)
library(dplyr)    # provides %>%, filter(), group_by(), summarize()
library(ggplot2)

# Load municipalities with geometry
municipalities_sf <- municipalities %>%
  st_as_sf()  # If not already SF format

# Map Legal Amazon
amazon_map <- municipalities_sf %>%
  filter(legal_amazon == TRUE)

ggplot(amazon_map) +
  geom_sf(fill = "lightgreen", color = "darkgreen") +
  labs(title = "Legal Amazon Municipalities") +
  theme_minimal()

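# Spatial operations example: count municipalities and total area by state
# (assumes the geometry column is named `geometry`)
munic_by_state <- municipalities_sf %>%
  group_by(state) %>%
  summarize(
    num_municipalities = n(),
    # st_area() returns square meters; convert to square kilometers
    total_area_km2 = as.numeric(sum(st_area(geometry))) / 1e6,
    .groups = 'drop'
  )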

Troubleshooting

Matching Issues

Problem: Municipality names don’t match between datasets
Solution: Use IBGE municipality codes instead of names for joining data

Problem: Some municipalities missing after filtering
Solution: Check for name spelling variations; use code-based matching

Geographic Analysis

Problem: Spatial operations are slow
Solution: Simplify geometries (st_simplify()) or work with state/region level first

Problem: Mapping appears incorrect
Solution: Verify CRS (should be WGS84); check for invalid geometries (st_is_valid())

Data Integration

Problem: Aggregating municipality data to regions
Solution: Use left_join() with municipalities dataset to add region information

Problem: Partial municipalities causing data discrepancies
Solution: Account for partial Amazon municipalities; some statistics are reported for Amazon portion only
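The region aggregation suggested above can be sketched without extra packages. The example uses base R's `merge(..., all.x = TRUE)` as the equivalent of a left join; the tables and their column names are toy assumptions, not the package's actual schema.

```r
# Toy inputs; column names and values are illustrative assumptions
munic_ref <- data.frame(
  code = c(1, 2, 3),
  region = c("North", "North", "Northeast")
)
munic_data <- data.frame(
  code = c(1, 2, 3),
  value = c(5, 7, 11)
)

# Left join: keep every data row and attach its region
with_region <- merge(munic_data, munic_ref, by = "code", all.x = TRUE)

# Sum the values within each region
region_totals <- aggregate(value ~ region, data = with_region, FUN = sum)
region_totals
```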


The ‘googledrive’ package

For some of our functions, the original data is stored in Google Drive and exceeds the file size limit for which direct downloads are possible. As a result, the googledrive package is required to download the data through the Google Drive API and run the function.

The first time the package is called, it requires you to link your Google account and grant permissions to be able to download data through the Google Drive API.

You must tick all boxes when the permissions page opens, or else the following error will occur:

# Error in `gargle_abort_request_failed()`:
# ! Client error: (403) Forbidden
# Insufficient Permission: Request had insufficient authentication scopes.
# • domain: global
# • reason: insufficientPermissions
# • message: Insufficient Permission: Request had insufficient authentication
#  scopes.
# Run `rlang::last_error()` to see where the error occurred.
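One way to avoid surprises is to authenticate explicitly before calling a function that downloads from Google Drive. The sketch below uses `drive_auth()` and `drive_has_token()` from googledrive; whether an explicit call is needed depends on your setup, so treat it as an optional step rather than a required one.

```r
# Only attempt authentication in an interactive session with googledrive installed
can_authenticate <- interactive() &&
  requireNamespace("googledrive", quietly = TRUE)

if (can_authenticate) {
  # Opens the browser flow; tick every permission box when prompted
  googledrive::drive_auth()

  # TRUE once a token is available for subsequent downloads
  googledrive::drive_has_token()
}
```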

For further information, see the official googledrive package page.

Credits

DataZoom is developed by a team at Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Department of Economics. Our official website is at: https://datazoom.com.br/pt/.

To cite package datazoom.amazonia in publications use:

Data Zoom (2023). Data Zoom: Simplifying Access To Brazilian Microdata.
https://datazoom.com.br/en/

A BibTeX entry for LaTeX users is:

@Unpublished{DataZoom2023,
    author = {Data Zoom},
    title = {Data Zoom: Simplifying Access To Brazilian Microdata},
    url = {https://datazoom.com.br/en/},
    year = {2023},
}