CEOdata provides convenient access to the microdata
(individual-level survey responses) produced by the Centre d’Estudis
d’Opinió (CEO), the public opinion institute of the Government of
Catalonia.
This vignette is fully offline and uses the bundled small example
datasets in the data/ folder.
The central entry point is the function
CEOdata(), which downloads and imports
microdata into R. Depending on the arguments provided, CEOdata() can
retrieve either:
- an accumulated microdata series, which pools several survey waves (argument series), or
- the microdata of a single study, identified by its REO code (argument reo).
In addition to data retrieval, the package includes:
- CEOmeta(), which provides metadata for all individual studies: a complete list and details of all available surveys, with the option to search for specific topics.
- CEOaccumulated_meta(), which provides access to the list of available accumulated microdata series.
- CEOsearch(), which searches variable names, variable labels, and value labels within downloaded datasets.
Together, these functions provide a coherent workflow for
discovering, downloading, and exploring CEO survey microdata directly
from R.
An accumulated microdata series is a dataset that combines the individual responses from multiple CEO surveys conducted under a common design and topic.
For example, the series “BOP_presencial” contains the
accumulated microdata of the Baròmetres d’Opinió
Política conducted face-to-face since 2014. Each row
corresponds to an individual respondent, while the dataset aggregates
responses across several survey waves (each identified by a different
REO code).
In contrast to downloading a single study (via its REO code), working with an accumulated series allows users to:
- Analyse trends across time
- Pool observations to increase statistical power
- Work with a harmonised questionnaire structure across waves
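To make the first two points concrete, the sketch below uses a small hypothetical data frame shaped like an accumulated series: one row per respondent, a REO column identifying the wave, and a PONDERA weight column (both column names are taken from the outputs in this vignette). All values, and the `satisfied` indicator, are invented for illustration; it only shows the difference between per-wave estimates (trends) and a pooled estimate.

```r
# Hypothetical pooled data: the values are invented; only the column names
# REO (survey wave) and PONDERA (weight) follow the outputs in this vignette.
pooled <- data.frame(
  REO       = c(1100, 1100, 1100, 1119, 1119),
  PONDERA   = c(1, 1, 1, 0.8, 1.2),
  satisfied = c(1, 0, 1, 1, 0)  # hypothetical 0/1 indicator
)

# Respondents contributed by each wave
respondents_per_wave <- table(pooled$REO)
print(respondents_per_wave)

# Weighted share per wave: one point per wave, suitable for a trend line
by_wave <- sapply(split(pooled, pooled$REO),
                  function(w) weighted.mean(w$satisfied, w$PONDERA))
print(by_wave)

# Pooled weighted share using every respondent in the series
pooled_share <- weighted.mean(pooled$satisfied, pooled$PONDERA)
print(pooled_share)
```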
Each series is identified by a codi_serie. The available accumulated
microdata series can be inspected using
CEOaccumulated_meta():
## # A tibble: 6 × 10
## codi_serie titol_serie mode_admin data_inici data_fi reo estat univers
## <chr> <chr> <chr> <date> <date> <chr> <chr> <chr>
## 1 BOP_telefoni… Microdades… Telefònica 2006-03-06 2013-11-14 346,… Seri… Poblac…
## 2 BOP_presenci… Microdades… Presencial 2014-03-02 2025-06-28 746,… Seri… Poblac…
## 3 Context Microdades… Presencia… 2024-12-09 2021-05-19 760,… Seri… Poblac…
## 4 VdG_telefoni… Microdades… Telefònica 2007-12-12 2021-12-15 406,… Seri… Poblac…
## 5 VdG_presenci… Microdades… Presencial 2009-05-26 2019-12-12 511,… Seri… Poblac…
## 6 VdG_autoadmi… Microdades… Autoadmin… 2022-12-02 2025-12-08 1044… Seri… Poblac…
## # ℹ 2 more variables: microdades_1 <chr>, microdades_2 <chr>
This function returns a tibble where each row corresponds to an accumulated series. The most relevant columns are:
- codi_serie: identifier used to download the series.
- titol_serie: descriptive title of the series.
- mode_admin: mode of administration.
- data_inici and data_fi: temporal coverage.
- reo: the list of REO codes, separated by commas, that the accumulated series contains.
- estat: whether the series is active or inactive.
- microdades_1: direct link to the microdata file (used by CEOdata()).
To see only the identifiers:
## [1] "BOP_telefonica" "BOP_presencial" "Context"
## [4] "VdG_telefonica" "VdG_presencial" "VdG_autoadministrada"
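Because the reo column stores several REO codes in a single comma-separated string, it often has to be split before use, for example to count the waves in a series. The sketch below assumes plain integer codes, possibly with spaces after the commas; the codes in it are hypothetical, since the real values are truncated in the output above.

```r
# Hypothetical value of the `reo` column; the real strings are truncated
# in the output above, so these codes are invented for illustration.
reo_string <- "746, 768,790"

# Split on commas, drop surrounding spaces and convert to integers
reo_codes <- as.integer(trimws(strsplit(reo_string, ",", fixed = TRUE)[[1]]))
print(reo_codes)

# Number of survey waves contained in the series
n_waves <- length(reo_codes)
print(n_waves)
```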
You can also filter the metadata to inspect a specific series:
## # A tibble: 1 × 10
## codi_serie titol_serie mode_admin data_inici data_fi reo estat univers
## <chr> <chr> <chr> <date> <date> <chr> <chr> <chr>
## 1 BOP_presenci… Microdades… Presencial 2014-03-02 2025-06-28 746,… Seri… Poblac…
## # ℹ 2 more variables: microdades_1 <chr>, microdades_2 <chr>
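Since the metadata is an ordinary tibble, standard subsetting works on it. The base R sketch below uses a two-row stand-in whose codes and dates are copied from the CEOaccumulated_meta() output above, and shows both selecting one series by codi_serie and finding which series cover a given date through data_inici and data_fi. The same filters could be written with dplyr::filter().

```r
# Small stand-in for the metadata table; codes, modes and dates are copied
# from the CEOaccumulated_meta() output shown above.
meta <- data.frame(
  codi_serie = c("BOP_telefonica", "BOP_presencial"),
  mode_admin = c("Telefònica", "Presencial"),
  data_inici = as.Date(c("2006-03-06", "2014-03-02")),
  data_fi    = as.Date(c("2013-11-14", "2025-06-28")),
  stringsAsFactors = FALSE
)

# Keep a single series by its identifier
bop <- meta[meta$codi_serie == "BOP_presencial", ]
print(bop)

# Which series cover a given date?
target <- as.Date("2010-01-01")
covering <- meta$codi_serie[meta$data_inici <= target & meta$data_fi >= target]
print(covering)
```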
Once a codi_serie has been identified, the corresponding
dataset can be loaded using CEOdata(). In this offline
vignette, the only accumulated series available as an example is
“BOP_presencial”.
## # A tibble: 6 × 14
## PONDERA REO BOP_NUM ANY DATA_FIN DATA_INI SEXE EDAT INGRESSOS_1_15
## <dbl> <dbl> <fct> <fct> <date> <date> <fct> <dbl> <fct>
## 1 1 1119 Feb. 25 … 2025 2025-03-10 2025-03-10 Masc… 26 Més de 6.000 €
## 2 1 1119 Feb. 25 … 2025 2025-03-05 2025-03-05 Masc… 30 De 3.001 a 4.…
## 3 1 1119 Feb. 25 … 2025 2025-03-12 2025-03-12 Masc… 48 No té cap tip…
## 4 1 1119 Feb. 25 … 2025 2025-02-18 2025-02-18 Feme… 30 De 1001 a 120…
## 5 1 1119 Feb. 25 … 2025 2025-03-03 2025-03-03 Feme… 36 De 2.001 a 2.…
## 6 1 1119 Feb. 25 … 2025 2025-02-17 2025-02-17 Masc… 22 Més de 6.000 €
## # ℹ 5 more variables: LLOC_NAIX <fct>, INTERES_POL <fct>,
## # SATIS_DEMOCRACIA <fct>, EDAT_GR <fct>, EDAT_CEO <fct>
This is equivalent to explicitly specifying the series:
## # A tibble: 6 × 14
## PONDERA REO BOP_NUM ANY DATA_FIN DATA_INI SEXE EDAT INGRESSOS_1_15
## <dbl> <dbl> <fct> <fct> <date> <date> <fct> <dbl> <fct>
## 1 1 1119 Feb. 25 … 2025 2025-03-10 2025-03-10 Masc… 26 Més de 6.000 €
## 2 1 1119 Feb. 25 … 2025 2025-03-05 2025-03-05 Masc… 30 De 3.001 a 4.…
## 3 1 1119 Feb. 25 … 2025 2025-03-12 2025-03-12 Masc… 48 No té cap tip…
## 4 1 1119 Feb. 25 … 2025 2025-02-18 2025-02-18 Feme… 30 De 1001 a 120…
## 5 1 1119 Feb. 25 … 2025 2025-03-03 2025-03-03 Feme… 36 De 2.001 a 2.…
## 6 1 1119 Feb. 25 … 2025 2025-02-17 2025-02-17 Masc… 22 Més de 6.000 €
## # ℹ 5 more variables: LLOC_NAIX <fct>, INTERES_POL <fct>,
## # SATIS_DEMOCRACIA <fct>, EDAT_GR <fct>, EDAT_CEO <fct>
Attempting to load a different accumulated series in this offline vignette returns an informative error. With an internet connection, the other series can be downloaded normally:
## Error : Offline vignette example includes only series = 'BOP_presencial'.
The returned object is a tibble where each row represents an individual respondent and columns correspond to survey variables. Accumulated series typically combine multiple survey waves that share a comparable questionnaire structure.
By default, SPSS-labelled variables are converted into standard R
factors. To retain the original haven_labelled format, set raw = TRUE:
## # A tibble: 6 × 14
## PONDERA REO BOP_NUM ANY DATA_FIN DATA_INI SEXE EDAT
## <dbl> <dbl> <dbl+lbl> <dbl+lb> <date> <date> <dbl+l> <dbl>
## 1 1 1119 61 [Feb. 25 - 1119] 2025 2025-03-10 2025-03-10 1 [Mas… 26
## 2 1 1119 61 [Feb. 25 - 1119] 2025 2025-03-05 2025-03-05 1 [Mas… 30
## 3 1 1119 61 [Feb. 25 - 1119] 2025 2025-03-12 2025-03-12 1 [Mas… 48
## 4 1 1119 61 [Feb. 25 - 1119] 2025 2025-02-18 2025-02-18 2 [Fem… 30
## 5 1 1119 61 [Feb. 25 - 1119] 2025 2025-03-03 2025-03-03 2 [Fem… 36
## 6 1 1119 61 [Feb. 25 - 1119] 2025 2025-02-17 2025-02-17 1 [Mas… 22
## # ℹ 6 more variables: INGRESSOS_1_15 <dbl+lbl>, LLOC_NAIX <dbl+lbl>,
## # INTERES_POL <dbl+lbl>, SATIS_DEMOCRACIA <dbl+lbl>, EDAT_GR <dbl+lbl>,
## # EDAT_CEO <dbl+lbl>
All individual surveys from the Generalitat de Catalunya are identified by a REO code (Registre d’Estudis d’Opinió). Each REO corresponds to a specific survey wave conducted at a given time.
The available studies in the offline example can be inspected using
CEOmeta():
## # A tibble: 6 × 48
## REO `Titol enquesta` `Titol estudi` `Metodologia enquesta`
## <fct> <chr> <chr> <fct>
## 1 1150 Baròmetre de la bicicleta. 2025 Baròmetre de … quantitativa
## 2 1149 Enquesta de valoració del Govern … Enquesta de v… quantitativa
## 3 1148 Enquesta de satisfacció dels serv… Enquesta de s… quantitativa
## 4 1147 Elements cohesionadors de la soci… Elements cohe… quantitativa
## 5 1146 Avaluació de la satisfacció de le… Avaluació de … quantitativa
## 6 1145 Baròmetre d'Opinió Política. 3a o… Baròmetre d'O… quantitativa
## # ℹ 44 more variables: `Metode de recollida de dades` <fct>, Objectius <chr>,
## # `Ambit territorial` <fct>, Cost <dbl>, `Promotors enquesta` <chr>,
## # `Executors enquesta` <chr>, `Promotors estudi` <chr>,
## # `Executors estudi` <chr>, `Data de treball de camp` <chr>,
## # `Dia inici treball de camp` <date>, `Dia final treball de camp` <date>,
## # Univers <chr>, Mostra <chr>, `Mostra estudis quantitatius` <dbl>,
## # `Mostra estudis qualitatius` <chr>, `Error mostral` <chr>, …
This function returns a tibble where each row corresponds to a study. Among the most relevant columns are:
- REO: the study identifier.
- Titol enquesta: descriptive title of the study.
- Objectius: description of the survey goals and contents.
- Dia inici treball de camp and Dia final treball de camp: fieldwork dates.
- microdata_available: logical indicator of whether microdata are publicly available.
- Microdades_1: direct link to the microdata file (used by CEOdata()).
Surveys conducted by the CEO itself have publicly available microdata, but surveys from other institutions of the Catalan government may not have microdata available for retrieval. To keep only the surveys whose microdata can be retrieved:
## # A tibble: 3 × 48
## REO `Titol enquesta` `Titol estudi` `Metodologia enquesta`
## <fct> <chr> <chr> <fct>
## 1 1149 Enquesta de valoració del Govern … Enquesta de v… quantitativa
## 2 1145 Baròmetre d'Opinió Política. 3a o… Baròmetre d'O… quantitativa
## 3 1143 Enquesta sobre postveritat i teor… Enquesta sobr… quantitativa
## # ℹ 44 more variables: `Metode de recollida de dades` <fct>, Objectius <chr>,
## # `Ambit territorial` <fct>, Cost <dbl>, `Promotors enquesta` <chr>,
## # `Executors enquesta` <chr>, `Promotors estudi` <chr>,
## # `Executors estudi` <chr>, `Data de treball de camp` <chr>,
## # `Dia inici treball de camp` <date>, `Dia final treball de camp` <date>,
## # Univers <chr>, Mostra <chr>, `Mostra estudis quantitatius` <dbl>,
## # `Mostra estudis qualitatius` <chr>, `Error mostral` <chr>, …
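The filter above can be reproduced on any copy of the metadata. The base R sketch below uses a stand-in table whose REO codes and availability flags follow the two outputs above (1149, 1145 and 1143 are the studies with microdata). It uses `%in% TRUE` rather than `== TRUE` so that a missing (NA) flag is treated as "not available" instead of producing an NA row.

```r
# Stand-in for the CEOmeta() table: REO codes and availability flags
# follow the outputs above.
meta <- data.frame(
  REO = c("1150", "1149", "1148", "1145", "1143"),
  microdata_available = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only retrievable studies; `%in% TRUE` also drops NA flags safely
retrievable <- meta[meta$microdata_available %in% TRUE, ]
print(retrievable$REO)
```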
The search argument allows users to look for keywords
across several descriptive fields (such as the title, summary, or objectives).
Search terms must be written in Catalan.
## # A tibble: 1 × 48
## REO `Titol enquesta` `Titol estudi` `Metodologia enquesta`
## <fct> <chr> <chr> <fct>
## 1 1145 Baròmetre d'Opinió Política. 3a o… Baròmetre d'O… quantitativa
## # ℹ 44 more variables: `Metode de recollida de dades` <fct>, Objectius <chr>,
## # `Ambit territorial` <fct>, Cost <dbl>, `Promotors enquesta` <chr>,
## # `Executors enquesta` <chr>, `Promotors estudi` <chr>,
## # `Executors estudi` <chr>, `Data de treball de camp` <chr>,
## # `Dia inici treball de camp` <date>, `Dia final treball de camp` <date>,
## # Univers <chr>, Mostra <chr>, `Mostra estudis quantitatius` <dbl>,
## # `Mostra estudis qualitatius` <chr>, `Error mostral` <chr>, …
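Conceptually, a keyword search of this kind flags a study when any of several text fields matches. The sketch below illustrates that idea with a hypothetical helper, `search_studies()`, and an invented table; it is not the package's implementation, and the field names `titol` and `objectius` are assumptions. The keyword is treated as a case-insensitive regular expression.

```r
# Hypothetical metadata: REO values, titles and objectives are invented.
studies <- data.frame(
  REO       = c("A", "B", "C"),
  titol     = c("Enquesta de mobilitat urbana", "Enquesta sobre el vot jove",
                "Estudi de salut i benestar"),
  objectius = c("La bicicleta a la ciutat", "El vot a les eleccions",
                "Salut mental i benestar"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where any of `fields` matches `keyword`
search_studies <- function(meta, keyword, fields) {
  hits <- Reduce(`|`, lapply(fields, function(f)
    grepl(keyword, meta[[f]], ignore.case = TRUE)))
  meta[hits, , drop = FALSE]
}

print(search_studies(studies, "bicicleta", c("titol", "objectius")))
print(search_studies(studies, "VOT", c("titol", "objectius")))
```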
Once you have identified the REO code of a study, you can load its
microdata using CEOdata(reo = ...).
## # A tibble: 6 × 11
## PONDERA REO SEXE BOP_NUM EDAT INGRESSOS_1_15 LLOC_NAIX INTERES_POL
## <dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 1 1145 Femení Oct. 25 - 11… 37 De 1.801 a 2.… A Catalu… Gens
## 2 1 1145 Masculí Oct. 25 - 11… 19 De 1.801 a 2.… A Catalu… Gens
## 3 1 1145 Femení Oct. 25 - 11… 37 De 2.401 a 3.… A Catalu… Poc
## 4 1 1145 Femení Oct. 25 - 11… 83 De 1001 a 120… A altres… Bastant
## 5 1 1145 Masculí Oct. 25 - 11… 70 De 2.001 a 2.… A Catalu… Bastant
## 6 1 1145 Masculí Oct. 25 - 11… 65 De 4.501 a 5.… A altres… Bastant
## # ℹ 3 more variables: SATIS_DEMOCRACIA <fct>, EDAT_GR <fct>, EDAT_CEO <fct>
The returned object is a tibble where each row corresponds to an
individual respondent and columns correspond to survey variables. If
microdata are not available for a given REO, CEOdata() returns an
informative error.
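When several studies are loaded in a loop, one study without microdata should not stop the whole batch. The sketch below catches the error and moves on. `fake_loader()` and `load_many()` are hypothetical names; the stub only mimics the behaviour described above (stopping with a message when a study has no microdata), so the sketch runs offline.

```r
# Hypothetical stand-in for a loader that, like CEOdata(), stops with an
# informative error when a study has no public microdata.
fake_loader <- function(reo) {
  if (reo != 1145) stop("No microdata available for REO ", reo)
  data.frame(REO = reo, PONDERA = 1)
}

# Try several REO codes and keep going when one of them fails
load_many <- function(reos, loader) {
  out <- lapply(reos, function(r) tryCatch(loader(r), error = function(e) {
    message("Skipping REO ", r, ": ", conditionMessage(e))
    NULL
  }))
  names(out) <- reos
  Filter(Negate(is.null), out)  # drop the studies that failed
}

res <- load_many(c(1145, 1150), fake_loader)
print(names(res))
```

With an internet connection, the stub could be replaced by something like `function(r) CEOdata(reo = r)`.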
As with accumulated series, by default the package converts
SPSS-labelled variables into standard R factors. To keep the raw
haven_labelled format, set raw = TRUE:
## # A tibble: 6 × 11
## PONDERA REO SEXE BOP_NUM EDAT INGRESSOS_1_15 LLOC_NAIX INTERES_POL
## <dbl> <dbl> <dbl+lbl> <dbl+lb> <dbl+l> <dbl+lbl> <dbl+lbl> <dbl+lbl>
## 1 1 1145 2 [Femení] 63 [Oct… 37 [37] 8 [De 1.801 … 1 [A Cat… 4 [Gens]
## 2 1 1145 1 [Mascul… 63 [Oct… 19 [19] 8 [De 1.801 … 1 [A Cat… 4 [Gens]
## 3 1 1145 2 [Femení] 63 [Oct… 37 [37] 10 [De 2.401 … 1 [A Cat… 3 [Poc]
## 4 1 1145 2 [Femení] 63 [Oct… 83 [83] 6 [De 1001 a… 2 [A alt… 2 [Bastant]
## 5 1 1145 1 [Mascul… 63 [Oct… 70 [70] 9 [De 2.001 … 1 [A Cat… 2 [Bastant]
## 6 1 1145 1 [Mascul… 63 [Oct… 65 [65] 13 [De 4.501 … 2 [A alt… 2 [Bastant]
## # ℹ 3 more variables: SATIS_DEMOCRACIA <dbl+lbl>, EDAT_GR <dbl+lbl>,
## # EDAT_CEO <dbl+lbl>
Once a dataset has been downloaded using CEOdata(), the
function CEOsearch() can be used to look for keywords in
the variable labels or value labels. This is especially useful when
working with large questionnaires and searching for specific topics.
You can search for keywords in the variable labels, for example in the last retrieved dataset. Keywords must be written in Catalan: the label found below, for instance, contains “satisfet/a” rather than the English “satisfied”.
## # A tibble: 1 × 2
## Variable Label
## <chr> <chr>
## 1 SATIS_DEMOCRACIA 26. Està vostè molt, bastant, poc o gens satisfet/a amb el f…
Sometimes the relevant information appears in the value labels rather than in the variable labels. You can therefore also search within the response categories:
## # A tibble: 6 × 2
## Variable Value
## <fct> <fct>
## 1 LLOC_NAIX A Catalunya
## 2 LLOC_NAIX A altres comunitats autònomes
## 3 LLOC_NAIX Unió Europea
## 4 LLOC_NAIX Resta del món
## 5 LLOC_NAIX Fora d'Espanya
## 6 LLOC_NAIX No ho sap
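To clarify what a variable-label search operates on, the sketch below stores labels in a "label" attribute on each column, which is how haven-imported SPSS files usually carry variable labels, and matches a keyword against them. `search_labels()` is a hypothetical helper and the labels are invented; this is not CEOsearch()'s actual implementation.

```r
# Toy dataset with invented variable labels stored as "label" attributes
d <- data.frame(SEXE = c(1, 2), EDAT = c(26, 30), SATIS_DEMOCRACIA = c(1, 3))
attr(d$SEXE, "label") <- "Sexe de la persona entrevistada"
attr(d$EDAT, "label") <- "Edat de la persona entrevistada"
attr(d$SATIS_DEMOCRACIA, "label") <- "Grau de satisfacció amb la democràcia"

# Hypothetical helper: return variables whose label matches `keyword`
search_labels <- function(data, keyword) {
  labels <- vapply(data, function(x) {
    lab <- attr(x, "label")
    if (is.null(lab)) NA_character_ else lab
  }, character(1))
  hits <- !is.na(labels) & grepl(keyword, labels, ignore.case = TRUE)
  data.frame(Variable = names(labels)[hits], Label = unname(labels[hits]),
             stringsAsFactors = FALSE)
}

res <- search_labels(d, "satisf")
print(res)
```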
CEO microdata are originally distributed as SPSS (.sav)
files. These files store categorical variables using value labels
(e.g. 1 = Yes, 2 = No) rather than plain R
factors.
By default, CEOdata() converts SPSS-labelled variables
into standard R factors. This makes the dataset immediately convenient
for descriptive statistics, modelling, and plotting in R,
as most workflows expect factors rather than labelled vectors. If you
prefer the labelled structure, for example to retain the exact numeric codes,
set the argument raw = TRUE when retrieving any
dataset.
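Why exact numeric codes can matter is easiest to see in a small base R sketch of label-to-factor conversion. It is not necessarily how CEOdata performs the conversion on haven_labelled vectors. The codes and labels are hypothetical, with 99 standing in for a "don't know" code. Once a variable is a factor, its integer values are level positions, not the original SPSS codes.

```r
# Hypothetical SPSS-style value labels for a numeric variable
codes  <- c(1, 2, 99, 1)
labels <- c("Yes" = 1, "No" = 2, "Don't know" = 99)

# Factor conversion: each code is replaced by its label
f <- factor(codes, levels = unname(labels), labels = names(labels))
print(f)

# Pitfall: as.integer() on a factor returns level positions, so 99 becomes 3
print(as.integer(f))

# The original codes can only be recovered through the label mapping
recovered <- unname(labels[as.character(f)])
print(recovered)
```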
In online use, CEOdata retrieves datasets directly from
the official open data platform of the Generalitat de Catalunya. In this
vignette, all examples are run offline with fixed local files. Online
retrieval has implications for reproducibility: the published files may be
updated or corrected, and active accumulated series grow as new survey waves
are added. As a consequence, repeated calls to CEOdata() at
different points in time may return slightly different datasets.
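To enhance reproducibility in applied research, it is recommended, among other practices, to record the version of CEOdata used in the analysis, for example with packageVersion("CEOdata"):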
## [1] '1.4.0'
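A simple complementary practice is to freeze the exact data used in an analysis on disk, together with the retrieval date, and to read that copy in later sessions instead of downloading again. The base R sketch below uses a toy data frame in place of a downloaded dataset; the file name and the `retrieved_on` attribute are arbitrary choices for illustration.

```r
# A toy data frame stands in for a dataset returned by CEOdata()
d <- data.frame(REO = c(1145, 1145), PONDERA = c(1, 1))

# Record when the data were retrieved; with the package installed, the
# version could be stored too, e.g.:
# attr(d, "ceodata_version") <- as.character(packageVersion("CEOdata"))
attr(d, "retrieved_on") <- as.character(Sys.Date())

# Save a dated local copy (tempdir() keeps this example self-contained)
path <- file.path(tempdir(), paste0("ceo_reo1145_", Sys.Date(), ".rds"))
saveRDS(d, path)

# Later analyses read the frozen copy instead of downloading again
d_frozen <- readRDS(path)
print(attr(d_frozen, "retrieved_on"))
```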
CEOdata aims to provide convenient and transparent
access to official survey data, but reproducible research practices
remain the responsibility of the analyst.