The datazoom.amazonia package facilitates access to official Brazilian Amazon data, including data on agriculture, deforestation, and production. The package provides functions that download and pre-process selected datasets.
You can install the released version of datazoom.amazonia from CRAN with:

install.packages("datazoom.amazonia")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("datazoompuc/datazoom.amazonia")
PRODES (Projeto de Monitoramento da Floresta Amazônica Brasileira - Project for Monitoring the Brazilian Amazon Forest) is Brazil’s primary satellite-based deforestation monitoring system, operated by INPE (National Institute for Space Research).
This dataset provides:
PRODES is the authoritative source for Amazon deforestation statistics, used for environmental monitoring, policy evaluation, and international reporting on Brazilian forest protection.
PRODES monitoring:

- Uses satellite imagery from multiple sources (Landsat, CBERS, others)
- Combines automated and manual analysis to detect clear-cut deforestation
- Provides an annual assessment of forest loss
- Since 2020, raster data is published through TerraBrasilis
- Data is available both as raster files and aggregated by municipality

For more information, visit INPE PRODES Project and TerraBrasilis.
- deforestation: Clear-cut deforestation areas (complete forest loss).
- residual_deforestation: Deforestation that was not captured in previous surveys (detected through improved methodology).
- native_vegetation: Remaining native forest and natural vegetation areas.
- hydrography: Water bodies and hydrographic features.
- non_forest: Non-forest areas including savanna, grasslands, and other vegetation types.
- clouds: Cloud cover in satellite imagery (data quality indicator).

- Raw data (raw_data = TRUE): Returns SpatRaster objects (raster grids) from TerraBrasilis
- Treated data (raw_data = FALSE): Aggregated to municipality level, showing total affected area

| Year | Type | Definition |
|---|---|---|
| 2007 | Cumulative | All deforestation from 1988-2007 |
| 2008-2023 | Incremental | Deforestation detected in that specific year |
When analyzing time trends, be aware that 2007 includes 19 years of accumulated loss.
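As a minimal sketch of why this matters, the toy data frame below mimics treated output with hypothetical columns `year` and `area_km2` (the actual column names and values returned by `load_prodes()` may differ). It keeps the 2007 cumulative figure apart as a baseline before computing year-over-year changes.

```r
# Toy stand-in for treated PRODES output; column names and values are
# illustrative assumptions, not the package's actual schema.
prodes_toy <- data.frame(
  year     = c(2007, 2008, 2009, 2010),
  area_km2 = c(1900, 120, 95, 80)
)

# 2007 is a cumulative total (1988-2007), so keep it apart as a baseline.
baseline    <- prodes_toy[prodes_toy$year == 2007, ]
incremental <- prodes_toy[prodes_toy$year >= 2008, ]

# Year-over-year change is only meaningful among the incremental years.
incremental$change_km2 <- c(NA, diff(incremental$area_km2))

# Cumulative loss = baseline + running sum of the yearly increments.
incremental$cumulative_km2 <- baseline$area_km2 + cumsum(incremental$area_km2)

print(incremental)
```

Mixing the 2007 row into a trend line without this separation would show a spurious collapse in deforestation between 2007 and 2008.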
Selects which landscape classification to download.
dataset = "deforestation" # Clear-cut deforestation (main product)
dataset = "residual_deforestation" # Previously undetected deforestation
dataset = "native_vegetation" # Remaining forests and vegetation (2023 only)
dataset = "hydrography" # Water bodies (2023 only)
dataset = "non_forest" # Savanna and other non-forest (2023 only)
dataset = "clouds" # Cloud cover (2023 only)

Controls data format: raster or municipality aggregates.

- TRUE: Returns raw raster data (SpatRaster objects)
- FALSE: Returns municipality-level aggregated area data

raw_data = FALSE # logical

Important: Raster data is large; ensure sufficient storage and memory.
Specifies which year(s) to download.
Available by dataset:
| Dataset | Available Years |
|---|---|
| deforestation | 2007 (cumulative), 2008-2023 (incremental) |
| residual_deforestation | 2010-2023 |
| native_vegetation | 2023 only |
| hydrography | 2023 only |
| non_forest | 2023 only |
| clouds | 2023 only |

time_period = 2020 # single year
time_period = c(2015, 2020) # multiple years
time_period = 2015:2020 # range of years (for deforestation only)

Output language.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated deforestation data for 2023
deforestation <- load_prodes(
dataset = "deforestation",
raw_data = FALSE,
time_period = 2023,
language = "eng"
)

# download treated deforestation data for 2008 to 2023
deforestation_series <- load_prodes(
dataset = "deforestation",
raw_data = FALSE,
time_period = 2008:2023,
language = "eng"
)

# download 2007 (cumulative 1988-2007) plus all years up to 2023
all_deforestation <- load_prodes(
dataset = "deforestation",
raw_data = FALSE,
time_period = c(2007, 2008:2023),
language = "eng"
)

# download treated residual deforestation data for 2020
residual <- load_prodes(
dataset = "residual_deforestation",
raw_data = FALSE,
time_period = 2020,
language = "eng"
)

DETER (Real-Time Detection System) is a satellite-based monitoring system operated by INPE (National Institute for Space Research) that detects and reports changes in forest cover in near real time. This system provides:
DETER is the primary early-warning system for deforestation in the Amazon, used by Brazilian environmental agencies for enforcement, research institutions for analysis, and internationally for monitoring compliance with forest conservation goals.
DETER monitoring:

- Uses satellite imagery from multiple sources (MODIS, Landsat, Sentinel)
- Combines automated and manual analysis to detect recent disturbances
- Generates “alerts”: polygons representing areas of detected change
- Updates are available regularly (frequency varies by satellite availability)
- Operated by INPE’s Remote Sensing Division

For technical details, visit INPE DETER System.
- deter_amz: DETER monitoring data for the Legal Amazon.
- deter_cerrado: DETER monitoring data for the Cerrado biome.

The raw DETER data from INPE reports one alert per row, and each row is typically associated with a single municipality. However, many alerts overlap multiple municipalities (typically 2-4), and the original records do not list all of them.

### Data Processing

This package provides an important enhancement: spatial intersection with IBGE municipality geometries (2019 version) to identify ALL municipalities that each DETER alert overlaps with. This creates a more complete and accurate geographic picture than the original raw data.
Important note on CRS metadata: The CRS (Coordinate Reference System) information may need verification after loading, as coordinate system metadata can sometimes be unclear in the original INPE data.
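The idea behind the intersection step, together with a basic CRS check, can be sketched with the sf package on toy geometries. The squares, the column names (`alert_id`, `muni`), and the choice of EPSG:5880 are illustrative assumptions; this is not the package's internal code.

```r
# Sketch only: toy geometries standing in for one DETER alert and two
# municipalities. Requires the sf package; skipped if it is not installed.
if (requireNamespace("sf", quietly = TRUE)) {
  # Helper to build an axis-aligned square polygon.
  square <- function(xmin, ymin, xmax, ymax) {
    sf::st_polygon(list(rbind(
      c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
    )))
  }

  # Two adjacent hypothetical "municipalities" in a projected CRS (metres).
  munis <- sf::st_sf(
    muni     = c("A", "B"),
    geometry = sf::st_sfc(square(0, 0, 10, 10), square(10, 0, 20, 10), crs = 5880)
  )

  # One "alert" polygon straddling the shared border.
  alerts <- sf::st_sf(
    alert_id = 1L,
    geometry = sf::st_sfc(square(8, 2, 14, 6), crs = 5880)
  )

  # Verify that CRS metadata is present and consistent before intersecting.
  stopifnot(!is.na(sf::st_crs(alerts)), sf::st_crs(alerts) == sf::st_crs(munis))

  # The intersection yields one row per alert-municipality overlap.
  pieces <- sf::st_intersection(alerts, munis)
  pieces$overlap_area <- as.numeric(sf::st_area(pieces))
  print(sf::st_drop_geometry(pieces))
}
```

Because a single alert can produce several rows after intersection, area totals should be computed from the overlap pieces rather than by repeating the full alert area for every municipality.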
Selects which biome’s DETER data to download.
dataset = "deter_amz" # Legal Amazon monitoring
dataset = "deter_cerrado" # Cerrado biome monitoring

Controls whether to download the original data or the processed/enhanced version.

- TRUE: Returns raw INPE data with limited municipality information
- FALSE: Returns treated data with spatial intersection to identify all affected municipalities, and standardized English variable names

raw_data = FALSE # logical

Recommendation: Use raw_data = FALSE to get the enhanced municipality identification from spatial intersection.
Output language for variable names and documentation.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated DETER Amazon data
deter_amz <- load_deter(
dataset = "deter_amz",
raw_data = FALSE,
language = "eng"
)
# download treated DETER Cerrado data
deter_cerrado <- load_deter(
dataset = "deter_cerrado",
raw_data = FALSE,
language = "eng"
)

Each alert/row typically contains:

- Spatial geometry: Polygon coordinates (sf object)
- Detection date: When the alert was issued
- Alert type: Deforestation, forest degradation, or other disturbance
- Area: Size of detected change in hectares
- Municipality: Geographic unit identification
- State: Brazilian state
- Metadata: Satellite source, confidence level (varies by product)

The DEGRAD project is a research initiative that uses satellite imagery to monitor forest degradation in the Amazon. Unlike DETER’s near real-time alerts, DEGRAD provides a more detailed annual analysis of forest degradation patterns.
This dataset captures:
DEGRAD data is valuable for understanding forest degradation as a distinct phenomenon from clear-cut deforestation, important for carbon accounting, biodiversity protection, and understanding transition stages toward complete forest loss.
DEGRAD monitoring:

- Conducted by INPE’s forest monitoring programs
- Uses satellite imagery interpretation to identify forest degradation signs
- Focuses on selective logging, small-scale agriculture, forest fires, and other degrading activities
- Released as annual editions with comprehensive analysis
- Limited documentation available (original INPE documentation is sparse)

For information, visit INPE Forest Monitoring.
Important: DEGRAD data is organized differently than real-time systems. Key points:
This package enhances the raw DEGRAD data by:

- Intersecting DEGRAD spatial polygons with IBGE municipality boundaries (2019 version)
- Providing municipality identification for each degradation event
- Converting to Simple Features (sf) objects for spatial analysis

Note on CRS: Coordinate system metadata should be verified after loading, as original INPE data sometimes has unclear CRS information.
Detailed monitoring of forest degradation across the Legal Amazon.
Only one dataset is available:
dataset = "degrad" # Forest degradation monitoring

Controls whether to download the original data or the processed/enhanced version.

- TRUE: Returns raw INPE data with minimal processing
- FALSE: Returns treated data with English variable names, municipality identification from spatial intersection, and standardized formatting

raw_data = FALSE # logical

Recommendation: Use raw_data = FALSE for most applications to get municipality-level information.

### 3. time_period

Specifies which year(s) of degradation events to download.
Important: When you request a year, you get events from that year regardless of which DEGRAD edition they appear in.
time_period = 2015 # single year
time_period = c(2010, 2015) # multiple specific years
time_period = 2010:2015 # range of years

Output language for variable names and documentation.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

The returned data is a Simple Features (SF) spatial object with:
# download treated forest degradation data from 2010 to 2012
data <- load_degrad(
dataset = "degrad",
raw_data = FALSE,
time_period = 2010:2012,
language = "eng"
)

The annual edition structure (e.g., “DEGRAD 2016”) mixed with variable event years within those editions means:

- When you request year 2015, you get all detected 2015 events regardless of edition
- Some 2015 events may appear in both DEGRAD 2015 and DEGRAD 2016 editions
- Duplication is handled in the data loading process

Common degradation types include:

- Selective logging: Commercial timber extraction
- Forest fires: Fire damage to forest areas
- Agricultural clearing: Small-scale farming expansion
- Mining: Degradation from mining activities
- Other: Mixed or unclassified degradation causes

(Exact categories vary by edition; verify with your loaded data)
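Since categories differ across editions, tabulating them right after loading is a quick sanity check. The sketch below uses a toy data frame; the column names `year`, `class_name`, and `area_ha` and the category strings are hypothetical placeholders for whatever the loaded data actually contains.

```r
# Toy stand-in for treated DEGRAD output (hypothetical columns and values).
degrad_toy <- data.frame(
  year       = c(2010, 2010, 2011, 2011, 2011),
  class_name = c("selective_logging", "fire", "fire", "fire", "mining"),
  area_ha    = c(12.5, 40.0, 8.0, 22.0, 3.5)
)

# Which categories are present, and how often, in each year?
print(table(degrad_toy$year, degrad_toy$class_name))

# Total degraded area by year and category.
area_by_class <- aggregate(area_ha ~ year + class_name, data = degrad_toy, FUN = sum)
print(area_by_class)
```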
Use st_is_valid() to check geometry validity.

Imazon (Instituto do Homem e Meio Ambiente da Amazônia - Institute of Man and Environment of the Amazon) is an independent Brazilian research organization that produces its own deforestation monitoring and analysis.
This dataset provides:
Imazon’s deforestation pressure index helps identify which municipalities face high deforestation risk, valuable for targeted conservation, enforcement, and development planning.
Imazon’s classification is based on:

- Deforestation monitoring using satellite imagery
- Analysis of deforestation trends and drivers
- Risk assessment methodology to classify municipalities
- Regular updates as monitoring data accumulates

For more information, visit Imazon Official Website.
Municipality-level deforestation pressure classification with spatial geometries.
Imazon’s four-level classification system:

| Category | Pressure Level | Description |
|---|---|---|
| 0 | No/Low Pressure | Minimal deforestation threat |
| 1 | Moderate Pressure | Some deforestation risk |
| 2 | High Pressure | Significant deforestation threat |
| 3 | Very High Pressure | Severe/Critical deforestation risk |

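For readability in tables and plots, the numeric codes can be mapped to labels. The sketch below is a minimal illustration on toy data; the column names `municipality_code` and `pressure` and the values are assumptions, not the documented output of `load_imazon()`.

```r
# Hypothetical treated output: one row per municipality with a 0-3 code.
imazon_toy <- data.frame(
  municipality_code = c(1100015, 1100023, 1100031, 1100049),
  pressure          = c(0, 2, 3, 1)
)

# Attach readable labels following the category table above.
pressure_labels <- c("No/Low", "Moderate", "High", "Very High")
imazon_toy$pressure_label <- factor(
  imazon_toy$pressure,
  levels  = 0:3,
  labels  = pressure_labels,
  ordered = TRUE
)

# Ordered factors allow comparisons such as "High or above".
high_or_above <- imazon_toy[imazon_toy$pressure_label >= "High", ]
print(high_or_above)
```

An ordered factor keeps the ranking of the categories, so filters and sorted summaries behave as expected without referring back to the raw codes.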
Only one dataset is available:
dataset = "imazon_shp" # Imazon municipality pressure classification

Controls whether to download the original data or the processed/cleaned version.

- TRUE: Returns raw shapefile data exactly as provided by Imazon
- FALSE: Returns treated data with standardized English variable names, consistent formatting, and SF object structure

raw_data = FALSE # logical

Output language for variable names and documentation.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated Imazon deforestation pressure data
data <- load_imazon(
raw_data = FALSE,
language = "eng"
)

The 0-3 scale represents Imazon’s assessment of deforestation risk, based on:

- Historical deforestation trends
- Current forest cover
- Accessibility and economic drivers
- Human pressure factors

This dataset works well in combination with:

- DETER: Compare pressure classification with actual recent deforestation
- DEGRAD: Correlate pressure with degradation patterns
- CEMPRE: Analyze relationship between employment and deforestation pressure
- COMEX: Examine trade patterns in high-pressure vs low-pressure areas

🔴 This function uses the googledrive package to
download data. In case of authentication errors, see googledrive.
The Brazilian Institute of Environment and Renewable Natural Resources (Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais Renováveis - IBAMA) dataset documents environmental enforcement actions, including embargoes of productive assets and fines issued for environmental violations. This dataset provides individual-level records of environmental infractions from 2005 onwards, representing the enforcement activity of Brazil’s primary federal environmental agency.
The data covers environmental violations across multiple sectors including deforestation, illegal mining, wildlife trafficking, illegal fishing, and other environmental crimes.
The IBAMA dataset includes:
The function provides access to three distinct datasets tracking different stages of environmental enforcement:

- Embargoed areas ("embargoed_areas")
- Distributed fines ("distributed_fines")
- Collected fines ("collected_fines")

The function returns either:

- Raw Data: Individual infraction records (original format)
- Aggregated Data: Summary statistics for each time-location period, including:
  - Total number of infractions
  - Infractions sent to prosecution
  - Infractions with ongoing legal proceedings
  - Embargoed area totals
  - Fine totals and collection rates

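The kind of aggregation described above can be approximated in a few lines of base R. This is a sketch on toy records, not the package's own treatment; the column names (`municipality`, `year`, `fine_value`, `paid`) and values are hypothetical.

```r
# Toy individual-level infraction records (hypothetical columns and values).
infractions <- data.frame(
  municipality = c("A", "A", "B", "B", "B"),
  year         = c(2019, 2019, 2019, 2020, 2020),
  fine_value   = c(1000, 2500, 400, 900, 100),
  paid         = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Helper columns: a counter, and the fine value only when it was paid.
infractions$n          <- 1
infractions$paid_value <- ifelse(infractions$paid, infractions$fine_value, 0)

# Collapse to one row per municipality-year.
summary_tbl <- aggregate(
  cbind(n, fine_value, paid_value) ~ municipality + year,
  data = infractions, FUN = sum
)

# Share of the issued fine value that was actually collected.
summary_tbl$collection_rate <- summary_tbl$paid_value / summary_tbl$fine_value
print(summary_tbl)
```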
Options:

dataset:

- "embargoed_areas": Embargoed productive areas
- "distributed_fines": Issued fines (paid or unpaid)
- "collected_fines": Fines that have been paid

raw_data:

- TRUE: Individual-level records as originally recorded
- FALSE: Treated/aggregated version of the data

states:

- "all": All states
- "AC" (Acre): A single state
- c("AC", "AM", "AP") (Acre, Amazonas, Amapá): Several states

language:

- "pt": Portuguese language labels and names
- "eng": English language

# download treated embargoed areas data in english
data <- load_ibama(
dataset = "embargoed_areas",
raw_data = FALSE,
language = "eng"
)
# download treated collected fines data from Bahia
data <- load_ibama(
dataset = "collected_fines",
raw_data = FALSE,
states = "BA",
language = "pt"
)

The MapBiomas project provides comprehensive satellite-based data on land cover and land use changes across Brazil. The project uses machine learning algorithms applied to Landsat satellite imagery to classify and track changes in biomes, land cover types, and economic activities over time. This dataset is essential for understanding deforestation, agricultural expansion, conservation outcomes, and landscape dynamics in the Amazon and other biomes.
MapBiomas represents one of the most detailed and scientifically rigorous sources of remote sensing data on Brazilian land cover, with coverage spanning from 1985 to present and annual updates.
The MapBiomas dataset includes:

"mapbiomas_cover"

Annual maps of land cover types including:

- Forest Types: Native forest, forest formation, forest plantation
- Non-Forest Vegetation: Grassland/pasture, shrub vegetation, herbaceous
- Agricultural Land: Temporary crops (annual/seasonal), perennial crops (sugarcane, coffee, cocoa)
- Urban Areas: Urban/built-up land and infrastructure
- Water Bodies: Rivers, lakes, reservoirs, aquaculture
- Non-vegetated: Mining, bare soil, rock outcrops
- Other Categories: Cloud coverage, nodata

Key Applications: Deforestation monitoring, agricultural expansion, urbanization patterns, forest conservation

"mapbiomas_transition"

Year-to-year changes in land cover including:

- Deforestation: Conversion from forest to other uses
- Reforestation: Conversion to forest types
- Agricultural Transitions: Changes between crop types or from other uses to agriculture
- Degradation: Forest to degraded/shrub vegetation
- Regeneration: Abandoned areas returning to vegetation

Key Applications: Deforestation tracking, regeneration assessment, agricultural dynamics, conservation effectiveness

"mapbiomas_deforestation_regeneration"

Specific focus on forest cover changes:

- Deforestation: Permanent loss of forest cover
- Forest Regeneration: Secondary forest regrowth and natural recovery
- Degradation Signals: Forest areas showing stress indicators
- Cumulative Deforestation: Total forest loss since baseline

Key Applications: Amazon monitoring, conservation evaluation, regeneration potential, climate impact studies

"mapbiomas_mining"

Areas used for mining operations:

- Active Mining: Currently operational extraction sites
- Abandoned Mining: Previous mining areas, potentially available for rehabilitation
- Mining Extent: Total area impacted by mining activities
- Mining Types: Surface mining, artisanal mining, infrastructure

Key Applications: Environmental impact assessment, land degradation, restoration planning, environmental compliance

"mapbiomas_irrigation"

Note: Temporarily unavailable - new collection coming soon

Previously included:

- Irrigated Areas: Extent of irrigation in agricultural systems
- Irrigation Type: Drip, center-pivot, flood irrigation
- Crop Types Under Irrigation: Which crops receive irrigation

"mapbiomas_water"

Note: Temporarily unavailable - new collection coming soon

Previously included:

- Surface Water Extent: Permanent and seasonal water bodies
- Water Type: Natural rivers/lakes vs. artificial reservoirs
- Seasonal Variation: Dry vs. wet season water extent

"mapbiomas_fire"

Areas affected by wildfires:

- Burn Scars: Areas burned in recent fires
- Fire Extent: Total area impacted per year
- Fire Frequency: Repeated burning in same areas (indicating high fire risk)
- Fire Season: Temporal pattern of fires

Key Applications: Fire monitoring, ecosystem vulnerability, climate impacts, fire management
Options:

dataset:

- "mapbiomas_cover": Land cover types
- "mapbiomas_transition": Changes in land cover
- "mapbiomas_deforestation_regeneration": Forest cover changes
- "mapbiomas_mining": Mining areas
- "mapbiomas_irrigation": Irrigated areas (temporarily unavailable)
- "mapbiomas_water": Water bodies (temporarily unavailable)
- "mapbiomas_fire": Wildfire burn scars

raw_data:

- TRUE: Data in original format from MapBiomas
- FALSE: Cleaned and standardized version (recommended for analysis)

geo_level (available values depend on the dataset):

- "mapbiomas_cover": "municipality" or "indigenous_land"
- "mapbiomas_transition": "municipality" or "biome" (faster)
- "mapbiomas_deforestation_regeneration": "municipality" only
- "mapbiomas_mining": "indigenous_land" or "municipality"
- "mapbiomas_irrigation": "state" or "biome"
- "mapbiomas_water": "municipality", "state", or "biome"
- "mapbiomas_fire": "state" only

language:

- "pt": Portuguese language labels
- "eng": English language

# download treated MapBiomas land cover data by municipality
data <- load_mapbiomas(
dataset = "mapbiomas_cover",
raw_data = FALSE,
geo_level = "municipality",
language = "eng"
)
# download treated data on mining on indigenous lands
data <- load_mapbiomas(
dataset = "mapbiomas_mining",
raw_data = FALSE,
geo_level = "indigenous_land",
language = "eng"
)
# download treated wildfire burn scar data by state
data <- load_mapbiomas(
dataset = "mapbiomas_fire",
raw_data = FALSE,
geo_level = "state",
language = "eng"
)

TerraClimate is a global climate and climatic water balance dataset developed by the Climatology Lab at University of California, Merced. This package provides access to TerraClimate data for Brazil and the Amazon region.
This dataset provides:
TerraClimate is essential for understanding climate variability, water availability, drought risk, agricultural potential, and climate change impacts across Brazil and the Amazon.
TerraClimate data:

- Is compiled by the University of California Climatology Lab
- Integrates satellite and ground-based observations
- Is validated against station networks
- Is downscaled to ~4km global resolution
- Has monthly temporal resolution

For more information, visit TerraClimate Project.
TerraClimate provides 14 main climate and water balance variables:
| Dataset | Code | Description | Units |
|---|---|---|---|
| max_temperature | tmax | Maximum 2-m Temperature | °C |
| min_temperature | tmin | Minimum 2-m Temperature | °C |
| wind_speed | ws | Wind Speed at 10-m | m/s |
| vapor_pressure_deficit | vpd | Vapor Pressure Deficit | kPa |
| vapor_pressure | vap | 2-m Vapor Pressure | kPa |
| snow_water_equivalent | swe | Snow Water Equivalent at End of Month | mm |
| shortwave_radiation_flux | srad | Downward Shortwave Radiation Flux | W/m² |
| soil_moisture | soil | Soil Moisture at End of Month | mm |
| runoff | q | Runoff | mm |
| precipitation | ppt | Accumulated Precipitation | mm |
| potential_evaporation | pet | Reference Evapotranspiration | mm |
| climatic_water_deficit | def | Climatic Water Deficit | mm |
| water_evaporation | aet | Actual Evapotranspiration | mm |
| palmer_drought_severity_index | PDSI | Palmer Drought Severity Index | unitless |
Selects which climate variable to download.
# Temperature and radiation
dataset = "max_temperature" # tmax
dataset = "min_temperature" # tmin
dataset = "shortwave_radiation_flux" # srad
# Water and moisture
dataset = "precipitation" # ppt
dataset = "potential_evaporation" # pet
dataset = "water_evaporation" # aet (actual evapotranspiration)
dataset = "soil_moisture" # soil
dataset = "runoff" # q
# Atmospheric variables
dataset = "wind_speed" # ws
dataset = "vapor_pressure" # vap
dataset = "vapor_pressure_deficit" # vpd
# Drought and composite indices
dataset = "climatic_water_deficit" # def
dataset = "palmer_drought_severity_index" # PDSI
dataset = "snow_water_equivalent" # swe

Controls data format returned.

- TRUE: Returns raw raster data (NetCDF, SpatRaster format)
- FALSE: Returns aggregated data (specific format depends on configuration)

raw_data = FALSE # logical

Specifies which year(s) to download.
Available range: 1958 to present (most recent months have 2-3 month lag)
time_period = 2020 # single year
time_period = c(2010, 2020) # specific years
time_period = 2010:2020 # range of years

Restricts geographic coverage to Legal Amazon region.

- TRUE: Downloads only data for Legal Amazon region (much smaller files)
- FALSE: Downloads data for all Brazil (larger files)

legal_amazon_only = TRUE # logical

Recommendation: Use TRUE to significantly reduce download size for Amazon-focused research.
Output language for variable names and documentation.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

Data Size: TerraClimate raster data is substantial. Consider:

Recommendations:

- Use legal_amazon_only = TRUE to reduce size by ~95%
- Download single or 2-3 year periods rather than decades at once
- Use a high-speed internet connection
- Have at least 10-50 GB of free disk space for multi-year downloads

# download precipitation data for the Legal Amazon (2020)
precip <- load_climate(
dataset = "precipitation",
time_period = 2020,
legal_amazon_only = TRUE,
language = "eng"
)
# download maximum temperature for multiple years, all of Brazil
max_temp <- load_climate(
dataset = "max_temperature",
time_period = 2010:2012,
language = "eng"
)

Validation: Validated against independent station networks
Uncertainty: Varies by region; higher in data-sparse areas
Interpolation: Satellite and station data combined and downscaled to 4km
Reliability: Generally excellent for temperature and precipitation; water balance variables have higher uncertainty
Due to large file size:

- Aggregate early: Summarize by month/year quickly to reduce memory use
- Legal Amazon only: Massive size reduction if working in Amazon region
- Subset years: Download only years of interest rather than decades
- Extract points: If doing point-based analysis, extract specific coordinates to simplify data
- Use cloud computing: Consider cloud platforms (Google Earth Engine) for very large analyses

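The "aggregate early" advice can be illustrated with base R on simulated values. The table layout (one row per municipality and month) and the column names `municipality`, `year`, `month`, and `ppt_mm` are assumptions made for this sketch; the actual shape of the treated output depends on the package.

```r
# Simulated monthly precipitation for two hypothetical municipalities (mm).
set.seed(1)
monthly <- expand.grid(month = 1:12, year = 2019:2020, municipality = c("A", "B"))
monthly$ppt_mm <- round(runif(nrow(monthly), min = 50, max = 300))

# Aggregate early: collapse 48 monthly rows into 4 municipality-year totals.
annual <- aggregate(ppt_mm ~ municipality + year, data = monthly, FUN = sum)
print(annual)

# The monthly table can then be dropped to free memory.
rm(monthly)
```

Applying the same pattern right after each download keeps only the summary in memory, which matters more as the number of years grows.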
SEEG (Sistema de Estimativa de Emissões e Remoções de Gases de Efeito Estufa - System of Estimates of Emissions and Removals of Greenhouse Gases) is Brazil’s most comprehensive greenhouse gas emissions database developed by Observatório do Clima (Climate Observatory).
This dataset provides:
SEEG is the primary tool for understanding Brazil’s greenhouse gas emissions profile, tracking progress toward climate goals, identifying emission hotspots, and supporting climate policy.
SEEG emissions estimates are compiled using:

- Government data from multiple agencies (MAPA, IBGE, ANP, etc.)
- Satellite monitoring of deforestation and land use
- International IPCC methodology standards
- Peer-reviewed scientific research
- Regular updates as new government data becomes available

For more information, visit SEEG Project and Observatório do Clima.
- seeg: Complete greenhouse gas emissions across all sectors in one dataset. Only available with raw_data = TRUE.
- seeg_farming: Greenhouse gas emissions from agriculture and livestock activities.
- seeg_energy: Emissions from energy production and consumption.
- seeg_land: Emissions and removals from changes in forest cover and land use.
- seeg_industry: Emissions from manufacturing and industrial processes.
- seeg_residuals: Emissions from waste management, landfills, and waste treatment.

The data provided is from SEEG’s Collection 9:

- Time period: 2000-2018
- Methodology: Latest available when data was compiled
- Quality: Peer-reviewed and validated
- Revisions: May be updated in future SEEG collections as better data becomes available

Important: The complete SEEG dataset is quite large. When downloading:

- Entire datasets are downloaded as single files; year selection is limited
- A stable, high-speed internet connection is recommended
- Downloads may take time depending on connection speed
- Ensure sufficient disk space for storage

Selects which emission sector(s) to download.
dataset = "seeg" # All sectors (raw_data = TRUE only)
dataset = "seeg_farming" # Agriculture and livestock
dataset = "seeg_energy" # Energy sector
dataset = "seeg_land" # Land use changes
dataset = "seeg_industry" # Industrial processes
dataset = "seeg_residuals" # Waste and residuals

Controls whether to download original or processed data.

- TRUE: Returns raw SEEG data format (more detailed)
- FALSE: Returns treated data with English variable names and standardized format

raw_data = FALSE # logical

Specifies geographic aggregation level.

- "country": National total
- "state": State-level emissions (27 units)
- "municipality": All 5,570+ municipalities

geo_level = "state" # character string

Output language for variable names and labels.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

Note on timing: Downloads may take considerable time due to file size.
# download raw SEEG data (all sectors) at the country level
# note: dataset = "seeg" only works with raw_data = TRUE
all_emissions <- load_seeg(
dataset = "seeg",
raw_data = TRUE,
geo_level = "country",
language = "eng"
)

# download treated agricultural emissions at the state level
farming <- load_seeg(
dataset = "seeg_farming",
raw_data = FALSE,
geo_level = "state",
language = "eng"
)

# download treated land use change emissions at the state level
land_use <- load_seeg(
dataset = "seeg_land",
raw_data = FALSE,
geo_level = "state",
language = "eng"
)

# download treated energy emissions at the municipality level
energy <- load_seeg(
dataset = "seeg_energy",
raw_data = FALSE,
geo_level = "municipality",
language = "eng"
)

# download treated industrial process emissions at the state level
industry <- load_seeg(
dataset = "seeg_industry",
raw_data = FALSE,
geo_level = "state",
language = "eng"
)

# download treated waste emissions at the state level
residuals <- load_seeg(
dataset = "seeg_residuals",
raw_data = FALSE,
geo_level = "state",
language = "eng"
)

SEEG includes all major anthropogenic emission sources:

- Agriculture (livestock, crops, soil)
- Energy (electricity, transport, heating)
- Land use change (deforestation, afforestation)
- Industrial processes (cement, chemicals, metals)
- Waste (landfills, wastewater)

Estimates follow:

- IPCC guidelines for national greenhouse gas inventories
- Brazilian national inventory standards
- International best practices
- Transparent, documented assumptions

🔴 This function uses the googledrive package to
download data at the municipality level. In case of authentication
errors, see googledrive.
The Census of Agriculture (Censo Agropecuário) is Brazil’s comprehensive survey of agricultural establishments and activities, conducted by IBGE (Instituto Brasileiro de Geografia e Estatística). This census collects detailed information about:
The census provides critical data for agricultural policy, market research, and understanding the structure of Brazilian agriculture across regional and temporal dimensions.
Data is collected at multiple geographic levels:

- Country level: aggregate national statistics
- State level: disaggregated by Brazilian states
- Municipality level: available for select datasets (currently "livestock_production")

Historical data spans from 1920 onwards, with different time series available for different datasets based on IBGE’s survey methodology evolution.
- agricultural_land_area: Provides comprehensive data on total agricultural land area and the number of agricultural properties.
- agricultural_area_use: Details how agricultural properties use their land (crop farming, pasture, forests, etc.).
- agricultural_employees_tractors: Captures information about the agricultural workforce and mechanization levels.
- agricultural_producer_condition: Describes the tenure status of agricultural land (ownership, rental, partnership, etc.).
- animal_production: Details the number of livestock animals farmed by species and type.
- animal_products: Quantifies production volumes of animal-based products.
- vegetable_production_area: Provides detailed crop production data including area planted and volume produced.
- vegetable_production_temporary: Focuses specifically on temporary crops (annual crops that must be replanted each season).
- vegetable_production_permanent: Focuses on permanent crops (perennial crops that produce for multiple years).
- livestock_production: Specialized dataset on bovine cattle production and related establishments.

Selects which dataset to download. See dataset descriptions above.
dataset = "agricultural_land_area" # character string

Controls whether to download the original data or the processed/cleaned version.

- TRUE: Returns raw data exactly as published by IBGE
- FALSE: Returns treated data with standardized formatting, variable names in English, and consistent units

Default behavior: Raw data typically requires more cleaning and interpretation, while treated data is ready for immediate analysis.

raw_data = FALSE # logical

Specifies the geographic aggregation level.

- "country": National aggregate
- "state": Disaggregated by Brazilian state
- "municipality": Available only for "livestock_production" dataset

geo_level = "state" # character string

Defines which year(s) to download. Availability varies by dataset:

| Dataset | Available Years |
|---|---|
| agricultural_land_area | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| agricultural_area_use | 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| agricultural_employees_tractors | 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| agricultural_producer_condition | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| animal_production | 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| animal_products | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| vegetable_production_area | 1920, 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| vegetable_production_temporary | 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| vegetable_production_permanent | 1940, 1950, 1960, 1970, 1975, 1980, 1985, 1995, 2006, 2017 |
| livestock_production | 2017 |

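Because census years are irregular, a requested range only matches the years listed in the table above. The helper below is a hypothetical illustration of that matching (it is not a function from the package), using a subset of the table.

```r
# Census years per dataset, copied from the table above (subset shown).
available_years <- list(
  agricultural_land_area = c(1920, 1940, 1950, 1960, 1970, 1975, 1980,
                             1985, 1995, 2006, 2017),
  agricultural_area_use  = c(1970, 1975, 1980, 1985, 1995, 2006, 2017),
  livestock_production   = 2017
)

# Hypothetical helper: keep only the requested years that exist
# for the chosen dataset.
match_census_years <- function(dataset, time_period) {
  years <- available_years[[dataset]]
  if (is.null(years)) stop("Unknown dataset: ", dataset)
  intersect(time_period, years)
}

# A 1980:2006 request matches four census years for this dataset.
match_census_years("agricultural_area_use", 1980:2006)

# No census year falls in 1995:2006 for livestock_production.
match_census_years("livestock_production", 1995:2006)
```

Checking a request this way before downloading makes it clear when a range will return fewer years than expected, or none at all.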
You can request a single year or a range of years:
time_period = 2006 # single year
time_period = c(1995, 2006) # multiple specific years
time_period = 1995:2006 # will select years within this range that are available

Output language for variable names and labels.

- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated land area data at the country level in 2017
data <- load_censoagro(
dataset = "agricultural_land_area",
raw_data = FALSE,
geo_level = "country",
time_period = 2017,
language = "eng"
)
# download treated temporary crop data by state in 1995 in portuguese
data <- load_censoagro(
dataset = "vegetable_production_temporary",
raw_data = FALSE,
geo_level = "state",
time_period = 1995,
language = "pt"
)
# download municipality-level cattle data (only available for livestock_production)
data <- load_censoagro(
dataset = "livestock_production",
raw_data = FALSE,
geo_level = "municipality",
time_period = 2017,
language = "eng"
)

- Raw data (raw_data = TRUE): Exactly as published by IBGE, with original formatting and Portuguese variable names
- Treated data (raw_data = FALSE): Cleaned and standardized with English variable names, consistent units (hectares for area, kilograms for production quantities), and NA values properly handled

When using treated data, the output is typically in long format with one row per observation unit, containing:
- Geographic identifiers (state, municipality if applicable)
- Year of the census
- Product/category names (crop type, animal species, etc.)
- Quantitative measurements (area, quantity, count)
- Number of establishments/properties

Municipality-level data is available only for livestock_production in 2017.

When using this data in research or publications, cite:
IBGE - Instituto Brasileiro de Geografia e Estatística. Censo Agropecuário. Available at: https://sidra.ibge.gov.br/pesquisa/censo-agropecuario
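
Since treated Censo Agro output is typically in long format, a common first step is to reshape it so that each product becomes a column. The sketch below uses made-up values, and the column names (`state`, `product`, `area_ha`) are assumptions for illustration; the columns actually returned by load_censoagro() may differ.

```r
# Toy long-format data; values and column names are illustrative only
long <- data.frame(
  state   = c("AC", "AC", "AM", "AM"),
  product = c("cassava", "corn", "cassava", "corn"),
  area_ha = c(120, 80, 300, 50)
)

# One row per state, one column per product (base R, no extra packages)
wide <- xtabs(area_ha ~ state + product, data = long)
wide
```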
The Amazon Social Progress Index (IPS) is a comprehensive indicator framework that measures social and environmental progress in the Legal Amazon region. This collaborative initiative combines:
This dataset captures:
The IPS provides a holistic view of sustainable development, moving beyond simple economic measures (GDP) to encompass environmental sustainability and social well-being.
The Social Progress Index:
- Based on 50+ individual indicators across 12 domains
- Uses data from government agencies, NGOs, and research institutions
- Aggregated into 3 main dimensions and 12 subdimensions
- Indexed to 0-100 scale for comparability
- Methodologically rigorous with transparent weighting
For detailed methodology, visit Social Progress Imperative.
The IPS framework includes 8 main dataset options:
Complete Social Progress Index with all dimensions and indicators.
Indicators related to quality of life and well-being.
Sanitation and habitat indicators.
Public safety and violence indicators.
Education and literacy indicators.
Communication and connectivity indicators.
Health and mortality indicators.
Environmental and deforestation indicators.
Selects which dimension(s) to download.
dataset = "all" # All dimensions
dataset = "life_quality" # Quality of life metrics
dataset = "sanit_habit" # Sanitation and habitat
dataset = "violence" # Public safety and violence
dataset = "educ" # Education indicators
dataset = "communic" # Communication and connectivity
dataset = "mortality" # Health and mortality
dataset = "deforest" # Environmental and deforestation

Controls whether to download original or processed data.
- TRUE: Returns raw data exactly as published
- FALSE: Returns treated data with standardized English variable names and formatting

raw_data = FALSE # logical

Specifies which assessment year(s) to download.
Available years: 2014, 2018, 2021, 2023
time_period = 2023 # Most recent
time_period = c(2018, 2023) # Specific years
time_period = c(2014, 2018, 2021, 2023) # Multiple years

Output language for variable names and labels.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download raw IPS data from 2014
data <- load_ips(
dataset = "all",
raw_data = TRUE,
time_period = 2014,
language = "eng"
)
# download treated deforestation IPS data from 2018 in portuguese
data <- load_ips(
dataset = "deforest",
raw_data = FALSE,
time_period = 2018,
language = "pt"
)

Each dimension contains multiple indicators:
- Life quality: 4-6 indicators
- Sanitation/habitat: 3-5 indicators
- Violence: 3-4 indicators
- Education: 3-4 indicators
- Communication: 2-3 indicators
- Mortality: 3-4 indicators
- Deforestation: 2-3 indicators

(Exact number varies by year and methodology)
When comparing across years (2014, 2018, 2021, 2023):
- Methodology may have evolved between assessments
- New indicators may have been added
- Some municipalities may not have data in all years
- Use caution when comparing the oldest data (2014) with recent data (2023)
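
Because some municipalities lack data in some years, yearly averages should skip missing values and report how many municipalities were actually observed. A minimal sketch with made-up scores; the column names (`municipality`, `year`, `score`) are assumptions and not necessarily those returned by load_ips().

```r
# Toy IPS-like scores; values and column names are illustrative only
ips <- data.frame(
  municipality = c("A", "B", "C", "A", "B", "C"),
  year         = c(2014, 2014, 2014, 2023, 2023, 2023),
  score        = c(50, 60, NA, 55, NA, 65)
)

# Mean score per year, skipping municipalities with no data that year
mean_by_year <- tapply(ips$score, ips$year, mean, na.rm = TRUE)

# Number of municipalities actually observed in each year
n_observed <- tapply(!is.na(ips$score), ips$year, sum)
```

Reporting `n_observed` next to each mean shows when two years are averaged over different sets of municipalities.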
- Use na.rm = TRUE in aggregations to handle missing values

Data from the Institute of Energy and Environment (Instituto de Energia e Meio Ambiente - IEMA), documenting electric energy access across the Amazon region. This dataset provides comprehensive information on populations without access to electric energy throughout the Legal Amazon in 2018, offering critical insights into energy poverty and infrastructure gaps in the region.
The IEMA dataset is particularly valuable for understanding energy access disparities in the Amazon, including remote communities, indigenous territories, and areas with limited infrastructure development.
The IEMA dataset includes:
The Legal Amazon encompasses:
- 9 States: Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima, and Tocantins in full, plus part of Maranhão
- Partial Coverage: Only part of Maranhão falls within the region
- Total Area: Approximately 5 million square kilometers
- Population Focus: Amazon-dwelling populations with particular attention to vulnerable groups
The dataset addresses multiple aspects of energy poverty:
- Access Type: Grid connection vs. off-grid solutions
- Reliability: Service quality and availability
- Affordability: Connection costs and monthly tariffs
- Geographic Barriers: Remote location and access challenges
- Population Characteristics: Indigenous communities, rural settlements, urban poor
Options:
- dataset: "iema"
- raw_data:
  - TRUE: Data in original format from IEMA
  - FALSE: Cleaned and standardized version
- language:
  - "pt": Portuguese language
  - "eng": English language

Since this dataset captures a single year (2018), it represents a snapshot rather than a time series. The cross-sectional nature means:
- No Trend Analysis: Cannot track changes over time without merging with other sources
- Temporal Stability: Conditions may have changed since 2018
- Recent Updates: Users may need to contact IEMA for more recent data
# download treated IEMA energy access data
data <- load_iema(
raw_data = FALSE,
language = "eng"
)

🔴 This function uses the googledrive package to download data. In case of authentication errors, see googledrive.

The Population dataset provides Brazil’s official population statistics from IBGE (Brazilian Institute of Geography and Statistics). This dataset includes:
Population data is fundamental to many analyses, providing denominators for per capita metrics, understanding urbanization patterns, tracking demographic trends, and supporting policy and development planning.
Population data comes from:
- Census years (2007, 2010): Actual population enumeration
- Estimation years (2001-2006, 2008-2009, 2011-2021): IBGE official population projections based on census data and vital statistics
- Methodology: Cohort-component methodology considering births, deaths, migration
- Updates: Revised annually as new vital statistics data becomes available
For more information, visit IBGE Population Estimates.
Official Brazilian population statistics and estimates.
| Year Type | Years | Nature |
|---|---|---|
| Census | 2007, 2010 | Actual population count |
| Estimates | 2001-2006, 2008-2009, 2011-2021 | Official IBGE projections |
Census years provide actual counts; other years are official projections.
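
When mixing counts and projections in one analysis, it can help to flag each year by its nature. The helper below simply encodes the table above; `population_year_type()` is a hypothetical name and not a package function.

```r
# Classify a year as an enumeration or an official projection,
# following the year lists in the table above
population_year_type <- function(year) {
  census_years   <- c(2007, 2010)
  estimate_years <- c(2001:2006, 2008:2009, 2011:2021)
  ifelse(year %in% census_years, "census",
         ifelse(year %in% estimate_years, "estimate", NA_character_))
}

# Years outside the documented coverage come back as NA
population_year_type(c(2006, 2007, 2022))
```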
Only one dataset is available:
dataset = "population" # Population statistics

Controls whether to download original or processed data.
- TRUE: Returns raw IBGE data format
- FALSE: Returns treated data with English variable names and standardized format

raw_data = FALSE # logical

Specifies geographic aggregation level.
- "country": National total
- "state": State-level population (27 units)
- "municipality": All 5,570+ municipalities

geo_level = "municipality" # character string

Specifies which year(s) to download.
Available years: 2001-2006, 2007 (census), 2008-2009, 2010 (census), 2011-2021
time_period = 2020 # single year
time_period = c(2010, 2020) # specific years (includes 2010 census)
time_period = 2010:2020 # range of years

Output language for variable names.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated population data at the state level for 2021
pop_2021 <- load_population(
dataset = "population",
raw_data = FALSE,
geo_level = "state",
time_period = 2021,
language = "eng"
)

# download treated population data at the state level for 2010 to 2021
pop_series <- load_population(
dataset = "population",
raw_data = FALSE,
geo_level = "state",
time_period = 2010:2021,
language = "eng"
)

# download treated population data at the municipality level for 2020
pop_munic <- load_population(
dataset = "population",
raw_data = FALSE,
geo_level = "municipality",
time_period = 2020,
language = "eng"
)

Each record contains:
- Geographic identifier (state or municipality name/code)
- Year
- Population count
Simple and straightforward structure, useful as base for other calculations.
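
One such calculation is a per capita metric, obtained by joining population to another table on its geographic identifier and year. A minimal sketch with made-up figures; the column names (`state`, `year`, `population`, `gdp`) are assumptions and may not match the package output.

```r
# Toy tables; all values are invented for illustration
pop <- data.frame(state = c("AC", "AM"), year = c(2020, 2020),
                  population = c(900000, 4200000))
gdp <- data.frame(state = c("AC", "AM"), year = c(2020, 2020),
                  gdp = c(1.8e10, 1.26e11))

# Join on both keys so rows are matched by state and by year
merged <- merge(gdp, pop, by = c("state", "year"))

# Population acts as the denominator
merged$gdp_per_capita <- merged$gdp / merged$population
```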
COMEX (Comércio Exterior - Foreign Trade) dataset provides Brazil’s official international trade statistics extracted from Siscomex, the Integrated System of Foreign Trade maintained by the Brazilian government.
This dataset captures:
COMEX is the primary official source for Brazil’s international trade statistics, widely used for trade policy analysis, business intelligence, academic research, and economic monitoring.
COMEX data comes from:
- Official records from Siscomex (Brazil’s foreign trade system)
- Mandatory declarations by exporters and importers
- Updated monthly with current month data
- Historical data from 1989 onwards
Important note on nomenclature: From 1989 to 1996, Brazil used a different system of product nomenclature (NBM - Nomenclatura Brasileira de Mercadorias). Conversions to the current nomenclature system are available, and the package handles them transparently.
For more information, visit the Brazilian Ministry of Productivity, Employment and Foreign Trade.
Export data disaggregated at the municipality level.
Import data disaggregated at the municipality level.
Export data organized by producer/exporter and product.
Import data organized by importer/distributor and product.
Selects which trade dataset to download.
dataset = "export_mun" # exports by municipality
dataset = "import_mun" # imports by municipality
dataset = "export_prod" # exports by producer/exporter
dataset = "import_prod" # imports by producer/importer

Controls whether to download the original data or the processed/cleaned version.
- TRUE: Returns raw data exactly as published by Siscomex
- FALSE: Returns treated data with standardized formatting, English variable names, and cleaned values

raw_data = FALSE # logical

Specifies which year(s) to download. Available from 1989 onwards.
time_period = 2020 # single year
time_period = c(2018, 2020) # specific years
time_period = 2015:2020 # range of years

Note: Monthly data means each year can be quite large. Consider downloading specific years or ranges to manage file size.

Output language for variable names and documentation.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated exports data by municipality from 2020 to 2021
data <- load_br_trade(
dataset = "export_mun",
raw_data = FALSE,
time_period = 2020:2021,
language = "eng"
)
# download treated imports data by municipality from 2020 to 2021
data <- load_br_trade(
dataset = "import_mun",
raw_data = FALSE,
time_period = 2020:2021,
language = "eng"
)

- Raw data (raw_data = TRUE): Original Siscomex format, potentially with inconsistencies and naming conventions from different time periods
- Treated data (raw_data = FALSE): Standardized with English variable names, consistent units (USD for values), and cleaned formatting

When combining data from 1989-1996 with data from 1997 onwards, be aware:
- Product categories may differ between nomenclature systems
- Conversions are available but not always 1:1
- Compare very old with recent data with caution
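
One way to keep the nomenclature break visible in an analysis is to tag each row with the regime its year falls under. A minimal sketch on toy data; the column names (`year`, `value_usd`) and the labels `"legacy"` and `"current"` are assumptions chosen for illustration.

```r
# Toy trade records; values are invented for illustration
trade <- data.frame(
  year      = c(1990, 1996, 1997, 2020),
  value_usd = c(10, 20, 30, 40)
)

# 1989-1996 used the older nomenclature; 1997 onwards uses the current one
trade$nomenclature <- ifelse(trade$year <= 1996, "legacy", "current")

# Summaries can then be reported separately for each regime
totals <- tapply(trade$value_usd, trade$nomenclature, sum)
```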
BACI (Base pour l’Analyse du Commerce International) is a comprehensive database of bilateral trade flows developed by CEPII (Centre d’Études Prospectives et d’Informations Internationales), a leading French research center on international trade.
This dataset provides:
The BACI dataset is widely used in academic research, policy analysis, and international trade studies due to its comprehensive coverage and quality control procedures.
BACI is constructed from UN Comtrade data with significant processing:
- Reconciliation of discrepancies between reported imports and exports
- Imputation of missing values using econometric techniques
- Classification using the Harmonized System (HS) nomenclature
- All values converted to USD for comparability
For detailed methodological information, visit the CEPII BACI documentation.
The HS92 classification provides trade data organized under the 1992 version of the Harmonized System nomenclature.
Important: The BACI dataset is very large. All data is packaged in a single compressed file on the CEPII server. Even if you only need data for specific years, the entire dataset must be downloaded first, then processed locally.
Recommendation: Plan your download during off-peak hours or use a stable, high-speed connection.
Currently only one dataset is available:
dataset = "HS92" # Harmonized System 1992 classification

Controls whether to download the original data or the processed/cleaned version.
- TRUE: Returns raw CEPII data with original formatting and column names
- FALSE: Returns treated data with standardized English variable names and cleaned formatting

raw_data = FALSE # logical

Specifies which year(s) to download. You can request single or multiple years.
time_period = 2016 # single year
time_period = c(2010, 2015) # multiple specific years
time_period = 2010:2020 # range (will select available years)

Output language for variable names and documentation.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated trade data for 2016 (HS92 classification)
# Warning: large download, may take a long time
trade_2016 <- load_baci(
dataset = "HS92",
raw_data = FALSE,
time_period = 2016,
language = "eng"
)

- Raw data (raw_data = TRUE): CEPII’s original format with variable codes and potentially non-standard naming
- Treated data (raw_data = FALSE): Standardized with English variable names, consistent column formatting, and ready for immediate analysis

Each row typically represents a trade flow with:
- Exporter: Country code (ISO 3-letter code)
- Importer: Country code (ISO 3-letter code)
- Year: Calendar year of the trade flow
- Product code: HS 6-digit classification
- Product name: Description of the product
- Value: Trade value in USD
- Quantity: Physical quantity (where available)
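
With one row per bilateral flow, totals by exporter are a simple grouped sum. The sketch below uses three invented flows; the column names (`exporter`, `importer`, `year`, `value_usd`) are assumptions modeled on the fields listed above, not the exact names returned by load_baci().

```r
# Toy bilateral flows; values are invented for illustration
flows <- data.frame(
  exporter  = c("BRA", "BRA", "ARG"),
  importer  = c("CHN", "USA", "BRA"),
  year      = c(2016, 2016, 2016),
  value_usd = c(100, 50, 30)
)

# Total export value per exporting country
exports <- aggregate(value_usd ~ exporter, data = flows, FUN = sum)
exports
```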
PIBMUNIC (Produto Interno Bruto por Município - Gross Domestic Product by Municipality) is Brazil’s official municipal-level GDP data produced by IBGE. This dataset provides:
PIBMUNIC is essential for understanding Brazil’s regional economic structure, identifying economic disparities, analyzing sectoral specialization, and assessing economic development across municipalities.
PIBMUNIC data is compiled by IBGE using:
- Data from CEMPRE (firm registry) for employment and output
- Production and consumption surveys
- Tax and financial records
- Trade and services data
- National accounts framework aligned with international standards
For more information, visit IBGE National Accounts.
Complete municipal GDP statistics with sectoral detail.
Only one dataset is available:
dataset = "pibmunic" # Municipal GDP data

Controls whether to download original or processed data.
- TRUE: Returns raw IBGE format
- FALSE: Returns treated data with English variable names and standardized formatting

raw_data = FALSE # logical

Specifies geographic aggregation level.
- "country": National aggregate
- "state": State-level aggregation
- "municipality": Most detailed level with all municipalities

geo_level = "municipality" # character string

Specifies which year(s) to download.
time_period = 2020 # single year
time_period = c(2015, 2020) # specific years
time_period = 2015:2020 # range of years

Output language for variable names.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated municipal GDP data for 2020
pib_munic <- load_pibmunic(
dataset = "pibmunic",
raw_data = FALSE,
geo_level = "municipality",
time_period = 2020,
language = "eng"
)

# download treated state-level GDP data for 2015 to 2020
pib_state <- load_pibmunic(
dataset = "pibmunic",
raw_data = FALSE,
geo_level = "state",
time_period = 2015:2020,
language = "eng"
)

# download treated country-level GDP data for 2010 to 2020 in Portuguese
pib_br <- load_pibmunic(
dataset = "pibmunic",
raw_data = FALSE,
geo_level = "country",
time_period = 2010:2020,
language = "pt"
)

Typical variables include:
- gdp: Gross Domestic Product at current prices (R$)
- value_added_agriculture: Farming and forestry sector
- value_added_industry: Manufacturing and construction
- value_added_services: Commerce, finance, transport, etc.
- value_added_public_admin: Government services
- net_taxes: Taxes minus subsidies
- Additional detail depends on specific IBGE release
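
Sector shares of GDP follow directly from these variables. The sketch below uses invented figures for two hypothetical municipalities and assumes the column names match the variables listed above, which may not hold for every IBGE release.

```r
# Toy municipal accounts; all values are invented for illustration
pib <- data.frame(
  municipality             = c("X", "Y"),
  gdp                      = c(1000, 500),
  value_added_agriculture  = c(200, 50),
  value_added_industry     = c(300, 100),
  value_added_services     = c(300, 250),
  value_added_public_admin = c(100, 50),
  net_taxes                = c(100, 50)
)

sector_cols <- c("value_added_agriculture", "value_added_industry",
                 "value_added_services", "value_added_public_admin")

# Each sector's value added as a share of municipal GDP
shares <- pib[sector_cols] / pib$gdp
```

In this toy data the four value-added columns plus `net_taxes` sum to `gdp`, which is a useful consistency check to run on real data as well.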
Employment, salary and firm data from IBGE’s Cadastro Central de Empresas (CEMPRE). This comprehensive dataset provides information on companies and other organizations registered with Brazil’s tax authority (Receita Federal), including employment levels, wage information, and business establishment data across Brazilian municipalities and sectors.
The CEMPRE dataset is one of the most detailed sources of firm-level data in Brazil, covering virtually all formal enterprises and organizations operating in the country.
The CEMPRE dataset includes:
The data is available at three different aggregation levels:
- Country Level: Aggregate statistics for all of Brazil
- State Level: Data aggregated by state (27 units)
- Municipality Level: Data disaggregated to municipality level (5,570+ municipalities)

Data can be retrieved with sector disaggregation or in aggregate form:
- Sectoral Disaggregation: Detailed breakdown by CNAE 2.0 (main divisions and subdivisions)
- Aggregate: Total across all sectors
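
The relationship between the two forms can be illustrated by summing sector-level rows back to a total per state. This is a toy sketch: the column names (`state`, `sector`, `employees`), the sector codes, and the values are placeholders, not actual CEMPRE output.

```r
# Toy sector-level data; codes and values are invented for illustration
by_sector <- data.frame(
  state     = c("PA", "PA", "AM", "AM"),
  sector    = c("C", "G", "C", "G"),
  employees = c(100, 250, 80, 120)
)

# Collapsing over sectors gives one total per state
totals <- aggregate(employees ~ state, data = by_sector, FUN = sum)
totals
```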
Options:
- dataset: "cempre"
- raw_data:
  - TRUE: Returns the data in its original format from IBGE
  - FALSE: Returns cleaned and standardized data
- geo_level:
  - "country": National aggregate
  - "state": Aggregated by state
  - "municipality": Disaggregated to municipality level (detailed results)
- time_period: Specifies the years for which data will be downloaded (e.g., 2010:2020 for 2010 through 2020)
- language:
  - "pt": Portuguese language (variable names and labels)
  - "eng": English language
- sectors:
  - TRUE: Data is returned separated and disaggregated by economic sector (CNAE)
  - FALSE: Data is aggregated across all sectors

# download raw data at the country level from 2008 to 2010
data <- load_cempre(
raw_data = TRUE,
geo_level = "country",
time_period = 2008:2010,
language = "eng"
)
# download treated state-level data split by sector in portuguese
data <- load_cempre(
raw_data = FALSE,
geo_level = "state",
time_period = 2008:2010,
language = "pt",
sectors = TRUE
)

Municipal Agricultural Production (PAM, in Portuguese) is a nationwide annual survey conducted by IBGE (Brazilian Institute of Geography and Statistics) which provides information on agricultural products, such as quantity produced, area planted and harvested, average quantity of output and monetary value of such output. The products are divided into permanent and temporary crops, with dedicated surveys for the four products that yield multiple harvests a year (beans, potato, peanut and corn). In total, the survey covers 64 agricultural products (31 temporary crops and 33 permanent crops). Output, however, is only included in the dataset if the planted area occupies over 1 acre or if output exceeds one tonne.
Permanent farming is characterized by a cycle of long duration, whose harvests may be done multiple times across the years without the need of planting seeds again. Temporary farming, on the other hand, consists of cycles of short and medium duration, which after harvesting require planting seeds again.
The data also has multiple aggregation levels, such as nationwide, by region, mesoregion and microregion, as well as state and municipality.
The data available has a yearly frequency and is available from 1974 to the present, with the exception of the four multiple-harvest products, which are only available from 2003. More information can be found on this link (only in Portuguese).
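
The average output per harvested area (yield) can also be recomputed from quantity and area, which is handy after aggregating rows. A minimal sketch with invented values; the column names and units (`quantity_t` in tonnes, `area_harvested_ha` in hectares) are assumptions, so check the units in the data you actually download.

```r
# Toy PAM-like records; values, names, and units are illustrative only
pam <- data.frame(
  crop              = c("soja", "milho", "arroz"),
  quantity_t        = c(3000, 5000, 0),
  area_harvested_ha = c(1000, 1000, 0)
)

# Yield in kg per hectare; left as NA where no area was harvested
pam$yield_kg_per_ha <- ifelse(
  pam$area_harvested_ha > 0,
  pam$quantity_t * 1000 / pam$area_harvested_ha,
  NA_real_
)
```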
Options:
- dataset: See tables below
- raw_data: there are two options:
  - TRUE: if you want the data as it is originally.
  - FALSE: if you want the treated version of the data.
- geo_level: "country", "region", "state", or "municipality"
- time_period: picks the years for which the data will be downloaded
- language: you can choose between Portuguese ("pt") and English ("eng")
The datasets supported are shown in the tables below, made up of both the original databases and their narrower subsets. Note that downloading only specific crops is considerably faster.
| dataset |
|---|
| all_crops |
| temporary_crops |
| permanent_crops |
| corn |
| potato |
| peanut |
| beans |
| dataset | Name (pt) | Name (eng) |
|---|---|---|
| total_temporary | Total | Total |
| abacaxi | Abacaxi | Pineapple |
| alfafa | Alfafa Fenada | Alfafa Fenada |
| alho | Alho | Garlic |
| algodao_herbaceo | Algodao Herbaceo (em Caroco) | Herbaceous Cotton (in Caroco) |
| amendoim_temporary | Amendoim (em Casca) | Peanuts (in Shell) |
| arroz | Arroz (em Casca) | Rice (in husk) |
| aveia | Aveia (em Grao) | Oats (in grain) |
| batata_doce | Batata Doce | Sweet potato |
| batata_inglesa | Batata Inglesa | English potato |
| cana_de_acucar | Cana de Acucar | Sugar cane |
| cana_para_forragem | Cana para Forragem | Forage cane |
| castor_bean | Mamona (Baga) | Castor bean (Berry) |
| cebola | Cebola | Onion |
| cevada | Cevada (em Grao) | Barley (in Grain) |
| ervilha | Ervilha (em Grao) | Pea (in Grain) |
| fava | Fava (em Grao) | Broad Bean (in Grain) |
| feijao_temporary | Feijao (em Grao) | Beans (in Grain) |
| fumo | Fumo (em Folha) | Tobacco (in Leaf) |
| girassol_sementes | Girassol (em Grao) | Sunflower (in Grain) |
| juta_fibra | Juta (Fibra) | Jute (Fiber) |
| linho_sementes | Linho (Semente) | Linen (Seed) |
| malva_fibra | Malva (Fibra) | Malva (Fiber) |
| mandioca | Mandioca | Cassava |
| melancia | Melancia | watermelon |
| melao | Melao | Melon |
| milho_temporary | Milho (em Grao) | corn (in grain) |
| rami_fibra | Rami (Fibra) | Ramie (Fiber) |
| rye | Centeio (em Grao) | Rye (in grain) |
| soja | Soja (em Grao) | Soybean (in grain) |
| sorgo | Sorgo (em Grao) | Sorghum (in Grain) |
| tomate | Tomate | Tomato |
| trigo | Trigo (em Grao) | Wheat (in grain) |
| triticale | Triticale (em Grao) | Triticale (in grain) |
| dataset | Name (pt) | Name (eng) |
|---|---|---|
| acai | Acai | Acai |
| annatto_seeds | Urucum (Semente) | Annatto (Seed) |
| apple | Maca | Apple |
| avocado | Abacate | Avocado |
| banana | Banana (Cacho) | Banana (Bunch) |
| black_pepper | Pimenta do Reino | Black pepper |
| cashew | Caju | Cashew |
| cashew_nut | Castanha de Caju | Cashew Nuts |
| cocoa_beans | Cacau (em Amendoa) | Cocoa (in Almonds) |
| coffee_arabica | Cafe (em Grao) Arabica | Coffee (in Grain) Arabica |
| coffee_canephora | Cafe (em Grao) Canephora | Coffee (in Grain) Canephora |
| coffee_total | Cafe (em Grao) Total | Coffee (in Grain) Total |
| coconut | Coco da Baia | Coconut |
| coconut_bunch | Dende (Cacho de Coco) | Coconut Bunch |
| cotton_arboreo | Algodao Arboreo (em Caroco) | Arboreo cotton (in Caroco) |
| fig | Figo | Fig |
| grape | Uva | Grape |
| guarana_seeds | Guarana (Semente) | Guarana (Seed) |
| guava | Goiaba | Guava |
| heart_of_palm | Palmito | Palm heart |
| india_tea | Cha da India (Folha Verde) | India Tea (Leaf) |
| khaki | Caqui | Khaki |
| lemon | Limao | Lemon |
| mango | Manga | Mango |
| papaya | Mamao | Papaya |
| passion_fruit | Maracuja | Passion fruit |
| peach | Pessego | Peach |
| pear | Pera | Pear |
| permanent_total | Total | Total |
| quince | Marmelo | Quince |
| rubber_coagulated_latex | Borracha (Latex Coagulado) | Rubber (Coagulated Latex) |
| rubber_liquid_latex | Borracha (Latex Liquido) | Rubber (Liquid Latex) |
| sisal_or_agave | Sisal ou Agave (Fibra) | Sisal or Agave (Fiber) |
| tangerine | Tangerina | Tangerine |
| tung | Tungue (Fruto Seco) | Tung (Dry Fruit) |
| walnut | Noz (Fruto Seco) | Walnut (Dry Fruit) |
| yerba_mate | Erva Mate (Folha Verde) | Mate Herb (Leaf) |
Examples:
# download treated data at the state level from 2010 to 2011 for all crops
data <- load_pam(
dataset = "all_crops",
raw_data = FALSE,
geo_level = "state",
time_period = 2010:2011,
language = "eng"
)

PEVS (Produção da Silvicultura e da Extração Vegetal - Silviculture and Forestry Extraction Production) is a comprehensive annual survey conducted by IBGE that collects data on forestry and related activities in Brazil.
This dataset provides:
PEVS is Brazil’s primary source for forestry production statistics, important for understanding the timber industry, forest management practices, and sustainable use of forest resources.
PEVS data comes from:
- Direct surveys of forestry companies and producers
- Administrative records from forestry operations
- Compiled and validated by IBGE’s agriculture statistics division
- Annual release with data for previous year
- Covers both industrial and subsistence-level forestry activities
For more information, visit IBGE Agriculture Statistics.
Production data from forest crop plantations (timber and non-timber products).
Data on silviculture activities including afforestation, reforestation, and forest management.
Total existing area used for silviculture operations, disaggregated by forest species.
Selects which forestry dataset to download.
dataset = "pevs_forest_crops" # Forest crop production (1986-2019)
dataset = "pevs_silviculture" # Silviculture production (1986-2019)
dataset = "pevs_silviculture_area" # Silviculture land area (2013-2019)

Controls whether to download original or processed data.
- TRUE: Returns raw IBGE data with original Portuguese variable names
- FALSE: Returns treated data with English variable names, standardized units, and cleaned formatting

raw_data = FALSE # logical

Specifies the geographic aggregation level.
- "country": National aggregate
- "region": Brazilian geographic regions (North, Northeast, Center-West, Southeast, South)
- "state": State-level data (27 units)
- "municipality": Most granular level with 5,570+ municipalities

geo_level = "state" # character string

Defines which year(s) to download.
Important: Dataset-specific availability:
| Dataset | Available Years |
|---|---|
| pevs_forest_crops | 1986-2019 |
| pevs_silviculture | 1986-2019 |
| pevs_silviculture_area | 2013-2019 |
time_period = 2019 # single year
time_period = c(2010, 2015) # specific years
time_period = 2010:2019 # range of years

Output language for variable names.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated forest crops data at the state level for 2019
forest_crops <- load_pevs(
dataset = "pevs_forest_crops",
raw_data = FALSE,
geo_level = "state",
time_period = 2019,
language = "eng"
)

# download treated silviculture area data at the state level for 2013 to 2019
silvi_area <- load_pevs(
dataset = "pevs_silviculture_area",
raw_data = FALSE,
geo_level = "state",
time_period = 2013:2019,
language = "eng"
)

# download treated silviculture production data at the region level for 2019
silvi_prod <- load_pevs(
dataset = "pevs_silviculture",
raw_data = FALSE,
geo_level = "region",
time_period = 2019,
language = "eng"
)

Each row typically represents:
- A geographic unit (country, region, state, or municipality)
- A specific year
- A product type or forest species
- Quantity produced (in appropriate units: cubic meters, tons, etc.)
- Value of production (in currency units)

Forest crops include:
- Timber species: Eucalyptus, pine, other timber
- Non-timber products: Charcoal, resins, turpentine, cork, bark
- Other forest products: Tannin, rosin, pulpwood
Exact categories vary by dataset and year.
- Raw data (raw_data = TRUE): IBGE original format, Portuguese variable names
- Treated data (raw_data = FALSE): English names, standardized units, cleaned formatting

PPM (Pesquisa da Pecuária Municipal - Municipal Livestock Survey) is Brazil’s comprehensive annual survey of livestock activities conducted by IBGE. This dataset provides:
PPM is the primary data source for understanding Brazil’s livestock sector, which is economically significant and globally important for beef, poultry, and dairy exports.
PPM data is compiled from:
- Direct surveys of livestock producers
- Agricultural censuses and administrative records
- Municipal agriculture secretariats
- Processed and validated by IBGE
- Annual release with data for reference year
For more information, visit IBGE Livestock Statistics.
Total livestock herds disaggregated by animal species.
Specialized data on sheep production and wool/fleece harvest.
Production of animal-based products (milk, eggs, honey, wool, etc.).
Detailed dairy cow farming data with milking and productivity metrics.
Aquaculture activities including fish, shrimp, and mollusk farming.
Selects which livestock/animal production dataset to download.
dataset = "ppm_livestock_inventory" # Animal populations by species
dataset = "ppm_sheep_farming" # Sheep and wool production
dataset = "ppm_animal_origin_production" # Milk, eggs, honey, wool
dataset = "ppm_cow_farming" # Dairy cow productivity
dataset = "ppm_aquaculture" # Fish and aquaculture production

Controls whether to download original or processed data.
- TRUE: Returns raw IBGE format
- FALSE: Returns treated data with English variable names and standardized units

raw_data = FALSE # logical

Specifies geographic aggregation level.
- "country": National aggregate
- "region": Brazilian geographic regions (5 regions)
- "state": State-level data (27 units)
- "municipality": All 5,570+ municipalities

geo_level = "state" # character string

Specifies which year(s) to download.
time_period = 2020 # single year
time_period = c(2010, 2020) # specific years
time_period = 2010:2020 # range of years

Note: All datasets are available from 1974 onwards, though aquaculture coverage is more complete from the 2000s.

Output language for variable names.
- "pt": Portuguese
- "eng": English

language = "eng" # character string

# download treated livestock inventory data at the state level for 2020
livestock <- load_ppm(
dataset = "ppm_livestock_inventory",
raw_data = FALSE,
geo_level = "state",
time_period = 2020,
language = "eng"
)

# download treated dairy cow data at the state level for 2020
dairy <- load_ppm(
dataset = "ppm_cow_farming",
raw_data = FALSE,
geo_level = "state",
time_period = 2020,
language = "eng"
)

# download treated animal origin production data at the country level for 2020
animal_products <- load_ppm(
dataset = "ppm_animal_origin_production",
raw_data = FALSE,
geo_level = "country",
time_period = 2020,
language = "eng"
)

# download treated sheep farming data at the state level for 2020
sheep <- load_ppm(
dataset = "ppm_sheep_farming",
raw_data = FALSE,
geo_level = "state",
time_period = 2020,
language = "eng"
)

# download treated aquaculture data at the state level for 2015 to 2020
aquaculture <- load_ppm(
dataset = "ppm_aquaculture",
raw_data = FALSE,
geo_level = "state",
time_period = 2015:2020,
language = "eng"
)

Each record typically contains:
- Geographic identifier (state or municipality)
- Year
- Animal species or product type
- Quantity (number of animals or production volume)
- Value (if applicable)
- Number of establishments
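
With one record per geographic unit and year, year-over-year herd growth is a short calculation. A minimal sketch with invented numbers for a single state; the column names (`state`, `year`, `cattle`) are assumptions and may differ from the load_ppm() output.

```r
# Toy herd series; values are invented for illustration
herd <- data.frame(
  state  = "PA",
  year   = 2018:2020,
  cattle = c(20.0e6, 21.0e6, 22.05e6)
)

# Sort by year so consecutive rows are consecutive years
herd <- herd[order(herd$year), ]

# Percent change relative to the previous year (first year has no baseline)
herd$pct_change <- c(NA, diff(herd$cattle) / head(herd$cattle, -1) * 100)
```

For data covering several states, the same calculation would need to be applied within each state rather than across the whole table.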
SIGMINE (Sistema de Informações Geográficas da Mineração - Mining Geographic Information System) is Brazil’s official mining registry maintained by the National Mining Agency (ANM), the regulatory body that succeeded the former National Department of Mineral Production (DNPM).
This dataset provides:
SIGMINE data is critical for understanding Brazil’s mining sector impact on the environment, economy, and land use, particularly in sensitive regions like the Amazon.
SIGMINE data comes from:
- Brazilian mining authorization and regulatory system
- Geological Service of Brazil (SGB-CPRM)
- Environmental licensing agencies
- Updated continuously as mining permits are issued/revoked
- Publicly accessible through government databases
For more information, visit the National Mining Agency (ANM) and mining databases.
Comprehensive registry of legally operating mining projects in Brazil.
Only one dataset is available:
dataset = "sigmine_active" # Active mining operations
Controls whether to download original or processed data.
- TRUE: Returns raw SIGMINE/DNPM data format with original coding
- FALSE: Returns treated data with English variable names and standardized formatting
raw_data = FALSE # logical
Output language for variable names and documentation.
- "pt": Portuguese
- "eng": English
language = "eng" # character string
SIGMINE data typically includes:
# download treated active mining data in Portuguese
mining_active <- load_sigmine(
dataset = "sigmine_active",
raw_data = FALSE,
language = "pt"
)
# download treated active mining data in English
mining_eng <- load_sigmine(
dataset = "sigmine_active",
raw_data = FALSE,
language = "eng"
)
SIGMINE tracks mines at various stages:
- Active: Currently operating, producing minerals
- Suspended: Temporarily halted operations
- Recovery: Reclamation and environmental restoration phase
- Other: Various other regulatory statuses
SIGMINE includes:
- Precious metals: Gold, silver, platinum
- Industrial minerals: Iron, copper, aluminum, tin, others
- Energy minerals: Coal, uranium
- Gemstones: Diamonds, emeralds, others
- Construction minerals: Sand, gravel, granite
Loads data from the National Electrical Energy Agency (ANEEL), a Brazilian independent federal agency linked to the Ministry of Mines and Energy (MME). ANEEL works to provide favorable conditions for the Electrical Energy Market to develop with balance and for the benefit of society.
As of now, three datasets are available for download: Energy Development Budget, Energy Generation, and Energy Enterprises Distributed.
The Energy Development Budget dataset showcases the annual budget expenses of the Energy Development Account (CDE). The CDE is designed to promote Brazilian energy development and is managed by the Electrical Energy Commercialization Chamber (CCEE).
In the current implementation, data is available from 2017 to 2022 and must be downloaded by year. The year argument can be a single year or a vector of years. The dataset includes the type of expense, its value in R$ (Reais), and its share of the total CDE budget expenses in each year.*
*Note that ‘share_of_total’ values sum to 1 for each year available.
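The note above can be checked with a small worked example. This is only a sketch on made-up numbers: apart from `share_of_total`, the column names (`year`, `expense_type`, `value`) are assumptions and may differ from the actual output of `load_aneel()`.

```r
# Made-up budget lines; only `share_of_total` is a name taken from the text above.
cde_toy <- data.frame(
  year         = c(2019, 2019, 2019, 2020, 2020),
  expense_type = c("A", "B", "C", "A", "B"),
  value        = c(50, 30, 20, 10, 30)
)

# Share of each expense in its year's total: value / sum of values in that year
cde_toy$share_of_total <- cde_toy$value / ave(cde_toy$value, cde_toy$year, FUN = sum)

# Within each year, the shares should add up to 1
yearly_sum <- tapply(cde_toy$share_of_total, cde_toy$year, sum)
print(yearly_sum)
```

Because shares are computed within each year, they are not comparable as absolute amounts across years; the `value` column is the one to use for that.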
The Energy Generation dataset showcases information from ANEEL’s Generation Information System (SIGA). SIGA provides information about the installed capacity of Brazilian electrical energy generation.
The dataset provides information at the individual venture/entity level. It contains information about the power, source, stage, type of permission, origin and final fuel with which each venture/entity operates, as well as other legal, technical and geographical information.* Operation start dates contained in the dataset go as far back as 1924 up to 2022.
* For more details on each variable, access this link and select “Manual do Usuario”.
The Energy Enterprises dataset showcases information about distributed micro and mini generators, covered by the Regulatory Resolution nº 482/2012. The list of projects is classified by variables that make up their identification, namely: connected distributor, project code, numerical nucleus of the project code, owner name, production class, subgroup, name of the owner, number of consumer units that receive credits, connection date, type of generating unit, source, installed power, municipality, and federative unit where it is located.
The data is expressed in quantities and installed power in kW (kilowatt). The quantity corresponds to the number of distributed micro or mini generators installed in the specified period. The installed power is defined by the sum of the nominal active electric power of the generating units.
* For more details on each variable, access this link and select “Dicionário de dados”.
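To make the two measures concrete, the sketch below computes them on a made-up table: the quantity is a count of generating units and the installed power is the sum of their nominal power in kW. Column names and values are hypothetical and are not taken from the actual dataset.

```r
# Made-up distributed generation units; names and values are illustrative only.
dg_toy <- data.frame(
  municipality       = c("Manaus", "Manaus", "Porto Velho"),
  installed_power_kw = c(5.0, 3.5, 10.0)
)

# Quantity: number of generating units per municipality
quantity <- tapply(dg_toy$installed_power_kw, dg_toy$municipality, length)

# Installed power: sum of the nominal power of those units, in kW
power_kw <- tapply(dg_toy$installed_power_kw, dg_toy$municipality, sum)

dg_summary <- data.frame(
  municipality       = names(power_kw),
  quantity           = as.integer(quantity),
  installed_power_kw = as.numeric(power_kw)
)
print(dg_summary)
```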
Options:
dataset: there are three choices:
"energy_development_budget": government spending towards energy sources
"energy_generation": energy generation by entity/corporation
"energy_enterprises_distributed": distributed micro and mini generators
raw_data: there are two options:
TRUE: if you want the data as it is originally.
FALSE: if you want the treated version of the data.
language: you can choose between Portuguese ("pt") and English ("eng")
Examples:
# download treated data about energy generation
clean_aneel <- load_aneel(
dataset = "energy_generation",
raw_data = FALSE
)
# download raw CDE data for one year
budget_aneel_2019 <- load_aneel(
dataset = "energy_development_budget",
raw_data = TRUE,
year = 2019
)
# download raw CDE data for multiple years
budget_aneel_multi <- load_aneel(
dataset = "energy_development_budget",
raw_data = TRUE,
year = c(2020, 2021)
)
Loads data from the Energy Research Company (EPE), a Brazilian public company that works closely with the Brazilian Ministry of Mines and Energy (MME) and other agencies to ensure the sustainable development of Brazil’s energy infrastructure. EPE’s role in that mission is to support MME with quality research and studies that inform Brazil’s energy infrastructure planning.
As of now, four datasets are available for download: Consumer Energy Consumption, Industrial Energy Consumption, the National Energy Balance, and the State Energy Production Panel. All of them were obtained from the EPE website.
The Consumer Energy Consumption dataset provides monthly data from 2004 to 2025 about energy consumption and number of consumers. The data is organized by State, Region, or Electric Subsystem, and is broken down by class of service and type of consumer.
The available classes are: Residential, Commercial, Industrial, Rural, and Others. For each observation, the dataset reports the type of consumer (Captive or Free), total consumption in megawatt-hours (MWh), and the number of consumers.
When using the Subsystem or Region level, consumer totals are provided but are not disaggregated for all classes and consumer types.
The Industrial Energy Consumption dataset provides monthly data from 2004 to 2025 on energy consumption by industrial sector. Data is available at the State or Subsystem level. Each observation identifies the industrial sector responsible for the consumption and the amount consumed in megawatt-hours (MWh).
The National Energy Balance is a thorough and extensive study developed and published by EPE that contains useful data about energy production, consumption, imports, exports, transformation, and final use.
The processed dataset provides yearly data from 2003 to 2023. It covers all Brazilian energy sources (such as oil, natural gas, coal, electricity, firewood, solar and others) and distinguishes between different types of energy flow: production, transformation, final consumption, losses, and adjustments.
Each energy source appears as a separate column in the original spreadsheets. The cleaned data is returned in long format, with one row per combination of year, energy source, and account type. The account type is labeled to indicate whether it refers to production, transformation (for example, “TRANSFORMAÇÃO – REFINARIAS DE PETRÓLEO”), or consumption (for example, “CONSUMO – RESIDENCIAL”).
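The wide-to-long reshaping described above can be sketched in base R as follows. This is not the package's actual cleaning code; the table, the column names (`account`, `oil`, `natural_gas`, `energy_source`, `value`), and the numbers are all made up for illustration.

```r
# Made-up wide table: one column per energy source, as in the original spreadsheets.
ben_wide <- data.frame(
  year        = c(2022, 2022, 2023, 2023),
  account     = c("PRODUCTION", "FINAL CONSUMPTION", "PRODUCTION", "FINAL CONSUMPTION"),
  oil         = c(10, 8, 11, 9),
  natural_gas = c(5, 4, 6, 5)
)

source_cols <- c("oil", "natural_gas")

# Stack the source columns: one row per year x account x energy source
ben_long <- do.call(rbind, lapply(source_cols, function(src) {
  data.frame(
    year          = ben_wide$year,
    account       = ben_wide$account,
    energy_source = src,
    value         = ben_wide[[src]]
  )
}))
print(ben_long)
```

The long layout has one row per combination, so the row count equals the number of wide rows times the number of source columns.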
The State Energy Production Panel provides yearly data from 2011 to 2024 on electricity generation by energy source and Brazilian state. The data is sourced from EPE’s BEN Chapter 8 (Dados Estaduais) and covers all 27 federative units (the 26 states and the Federal District).
Each row corresponds to one state-year combination. The dataset includes production by source (hydro, wind, solar, nuclear, thermal, sugar cane, firewood, black liquor, other renewables, steam coal, natural gas, coke oven gas, fuel oil, diesel, and other non-renewables), total production, and an indicator of whether the state belongs to the Legal Amazon.
Options:
dataset: there are four choices:
"consumer_energy_consumption": monthly energy consumption
and consumers by State, Region or Electric Subsystem
"industrial_energy_consumption": monthly industrial energy
consumption by State or Subsystem
"national_energy_balance": yearly energy flow by account
and energy source
"energy_state_panel": yearly energy production by source
and state
raw_data: there are two options:
TRUE: if you want the data as it is originally.
FALSE: if you want the treated version of the
data.
geo_level: only applies to
"consumer_energy_consumption" and
"industrial_energy_consumption" datasets.
"state"
"subsystem"
language: you can choose between Portuguese
("pt") and English ("eng")
Examples:
# download treated data about consumer energy consumption at the state level
clean_epe <- load_epe(
dataset = "consumer_energy_consumption",
geo_level = "state",
raw_data = FALSE
)
# download treated data from the National Energy Balance
balance <- load_epe(
dataset = "national_energy_balance",
raw_data = FALSE
)
# download treated data on state energy production panel
state_panel <- load_epe(
dataset = "energy_state_panel",
raw_data = FALSE,
language = "eng"
)
The municipalities dataset is a foundational reference dataset included in the datazoom.amazonia package. It contains key information about all Brazilian municipalities and identifies which are located in (or overlapping with) the Legal Amazon region.
This dataset includes:
The municipalities dataset is essential infrastructure for this package: most package functions that return geographic data match results to this municipalities reference to enable Legal Amazon filtering and consistent geographic identification.
The municipalities dataset is compiled from:
- IBGE: Brazilian Institute of Geography and Statistics official municipal data
- Legal Amazon definition: Based on official Brazilian government definition
- Spatial boundaries: IBGE 2019 municipal boundary shapefile
- Maintained by: datazoom.amazonia package developers
The municipalities dataset contains the following information:
# Load Brazilian municipalities dataset
data <- datazoom.amazonia::municipalities
# Or after loading the package
library(datazoom.amazonia)
data <- municipalities
# View structure
str(municipalities)
head(municipalities)
# Filter for Legal Amazon municipalities only
library(dplyr) # provides %>% and filter()
amazon_municipalities <- municipalities %>%
  filter(legal_amazon == TRUE)
The dataset includes all 5,570+ Brazilian municipalities with:
- Official IBGE identification codes
- State and region classification
- Legal Amazon flag for filtering
- Spatial geometries for geographic analysis
Many analyses focus specifically on the Legal Amazon region. Use the municipalities dataset to identify relevant municipalities:
library(dplyr)
# Load any dataset with municipality information
data <- load_prodes(
dataset = "deforestation",
raw_data = FALSE,
geo_level = "municipality",
language = "eng"
)
# Filter to Legal Amazon using municipalities reference
amazon_data <- data %>%
inner_join(
municipalities %>%
filter(legal_amazon == TRUE) %>%
select(code, name),
by = c("municipality_code" = "code")
)
Important: Some municipalities are only partially within the Legal Amazon.
For statistics reported at municipality level in this package:
- Partial Amazon municipalities: Data is reported for only the Amazon-included portion
- Identification: The municipalities dataset identifies these cases
- Interpretation: When a municipality is partially in the Amazon, reported statistics reflect the Amazon portion only
# Identify fully vs. partially included municipalities
full_amazon <- municipalities %>%
filter(legal_amazon == TRUE & fully_included == TRUE)
partial_amazon <- municipalities %>%
filter(legal_amazon == TRUE & fully_included == FALSE)
print(paste("Fully in Amazon:", nrow(full_amazon)))
print(paste("Partially in Amazon:", nrow(partial_amazon)))
The Legal Amazon includes:
- States: Acre, Amapá, Amazonas, Mato Grosso, Pará, Rondônia, Roraima, and Tocantins in full, plus the western part of Maranhão
- Municipalities: over 770 municipalities fully or partially in the Legal Amazon
- Definition: Based on official Brazilian legislation (currently Complementary Law 124/2007)
Important note on municipality-level data:
- Partial municipalities: Some municipalities extend beyond Legal Amazon boundaries
- Data reporting: When data is reported at municipality level for partial municipalities, it reflects only the Legal Amazon portion
- Identification: This dataset identifies which municipalities are partial
# Check for partial municipalities
partial_check <- municipalities %>%
filter(legal_amazon == TRUE) %>%
filter(!is.na(amazon_percentage)) %>%
filter(amazon_percentage < 100)
if (nrow(partial_check) > 0) {
print("Municipalities partially in Legal Amazon:")
print(partial_check)
}
Brazilian municipalities occasionally undergo changes:
- New municipalities: Created from existing ones (last major change 2021)
- Boundary adjustments: IBGE periodically refines municipal boundaries
- Historical data: When comparing very old data with recent data, be aware municipalities may have been reorganized
Always use IBGE municipality codes (not names) when:
- Merging multiple datasets
- Doing time-series analysis
- Comparing across sources
Municipality names may change or be ambiguous; codes are unique and stable.
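The difference between name-based and code-based merging can be seen on a toy example. The tables, column names, and three-digit codes below are invented for illustration (real IBGE municipality codes are longer); only base R's `merge()` is used.

```r
# Two made-up sources describing the same three municipalities.
# Two of them share a name, and one name is spelled differently across sources.
source_a <- data.frame(
  code = c(101, 102, 103),
  name = c("Bom Jesus", "Bom Jesus", "Santa Maria"),
  x    = c(1, 2, 3)
)
source_b <- data.frame(
  code = c(101, 102, 103),
  name = c("Bom Jesus", "Bom Jesus", "SANTA MARIA"),
  y    = c(10, 20, 30)
)

# Joining on names: the duplicated name cross-matches and the
# differently spelled one is silently dropped
by_name <- merge(source_a, source_b, by = "name")

# Joining on codes: exactly one row per municipality
by_code <- merge(source_a, source_b, by = "code")

print(nrow(by_name))
print(nrow(by_code))
```

In the name-based result, the row count alone gives no hint that one municipality was lost and two were duplicated, which is why codes are the safer key.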
The municipalities dataset includes spatial boundaries (when loaded as an sf object):
- Format: Simple features (sf) polygons
- CRS: WGS84 (EPSG:4326)
- Use: Spatial operations, mapping, spatial joins with other geographic data
library(sf)
library(ggplot2)
library(dplyr) # provides %>%, filter(), group_by(), summarize()
# Load municipalities with geometry
municipalities_sf <- municipalities %>%
st_as_sf() # If not already SF format
# Map Legal Amazon
amazon_map <- municipalities_sf %>%
filter(legal_amazon == TRUE)
ggplot(amazon_map) +
geom_sf(fill = "lightgreen", color = "darkgreen") +
labs(title = "Legal Amazon Municipalities") +
theme_minimal()
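# Spatial operations example: count municipalities and total area by state
munic_by_state <- municipalities_sf %>%
  # compute each municipality's area before grouping; inside summarize(),
  # `.` would refer to the whole table rather than the current group
  mutate(area_km2 = as.numeric(st_area(.)) / 1e6) %>%
  group_by(state) %>%
  summarize(
    num_municipalities = n(),
    total_area_km2 = sum(area_km2, na.rm = TRUE),
    .groups = "drop"
  )
Problem: Municipality names don’t match between datasets
Solution: Use IBGE municipality codes instead of names for joining data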
Problem: Some municipalities missing after filtering
Solution: Check for name spelling variations; use code-based matching
Problem: Spatial operations are slow
Solution: Simplify geometries (st_simplify()) or work with state/region level first
Problem: Mapping appears incorrect
Solution: Verify CRS (should be WGS84); check for invalid geometries (st_is_valid())
Problem: Aggregating municipality data to regions
Solution: Use left_join() with municipalities dataset to add region information
Problem: Partial municipalities causing data discrepancies
Solution: Account for partial Amazon municipalities; some statistics are reported for Amazon portion only
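The region-aggregation tip can be sketched with base R, where `merge(..., all.x = TRUE)` plays the role of `left_join()`. The reference table below is a made-up stand-in for the municipalities dataset, and the column names (`code`, `region`, `municipality_code`, `value`) are assumptions.

```r
# Made-up reference table standing in for the municipalities dataset
munic_ref <- data.frame(
  code   = c(101, 102, 103),
  region = c("North", "North", "Central-West")
)

# Made-up municipality-level data; code 104 has no match in the reference
munic_data <- data.frame(
  municipality_code = c(101, 102, 103, 104),
  value             = c(5, 7, 11, 1)
)

# Left join: keep every data row, attach region where the code matches
joined <- merge(munic_data, munic_ref,
                by.x = "municipality_code", by.y = "code", all.x = TRUE)

# Rows whose code was not found are worth inspecting before aggregating
unmatched <- joined[is.na(joined$region), ]

# Sum values by region (rows with a missing region are dropped by aggregate())
by_region <- aggregate(value ~ region, data = joined, FUN = sum)
print(by_region)
```

Checking `unmatched` first makes it visible when rows drop out of the regional totals because their codes are missing from the reference.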
For some of our functions, the original data is stored in Google
Drive and exceeds the file size limit for which direct downloads are
possible. As a result, the googledrive package is required
to download the data through the Google Drive API and run the
function.
The first time the package is called, it requires you to link your Google account and grant permissions to be able to download data through the Google Drive API.
You must tick all boxes when the permissions page opens, or else the following error will occur:
# Error in `gargle_abort_request_failed()`:
# ! Client error: (403) Forbidden
# Insufficient Permission: Request had insufficient authentication scopes.
# • domain: global
# • reason: insufficientPermissions
# • message: Insufficient Permission: Request had insufficient authentication
# scopes.
# Run `rlang::last_error()` to see where the error occurred.
For further information, click here to access the official package page.
DataZoom is developed by a team at Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Department of Economics. Our official website is at: https://datazoom.com.br/pt/.
To cite package datazoom.amazonia in publications
use:
Data Zoom (2023). Data Zoom: Simplifying Access To Brazilian Microdata.
https://datazoom.com.br/en/
A BibTeX entry for LaTeX users is:
@Unpublished{DataZoom2023,
author = {Data Zoom},
title = {Data Zoom: Simplifying Access To Brazilian Microdata},
url = {https://datazoom.com.br/en/},
year = {2023},
}