Out of the box, deident provides a set of
transformations to aid the de-identification of data sets. Each
transformation is implemented as an R6Class that extends
BaseDeident. User-defined transformations can be
implemented in the same manner.
To demonstrate the different transformations we supply a toy data set,
df, comprising 26 observations of three variables:
X if B <= 13,
Y if B > 13
Apply a cached random replacement cipher. Repeated occurrences of the same key receive the same replacement value.
Implemented deident options:
deident(df, "psudonymize", A)
deident(df, "Pseudonymizer", A)
deident(df, Pseudonymizer, A)
deident(df, Pseudonymizer$new(), A)
psu <- Pseudonymizer$new()
deident(df, psu, A)
By default, Pseudonymizer replaces each value in the selected variables
with a random alphanumeric string of 5 characters. This behaviour can be
changed by calling set_method on an instantiated Pseudonymizer
with the desired function:
psu <- Pseudonymizer$new()
# Replacement method: returns a random string of 12 lower-case letters
new_method <- function(key, ...) {
  paste(sample(letters, 12, replace = TRUE), collapse = "")
}
psu$set_method(new_method)
deident(df, psu, A)
#> DeidentList
#> 1 step(s) implemented
#> Step 1 : 'Pseudonymizer' on variable(s) A
#> For data:
#> columns: A, B, C
The first argument passed to the method is the key to be transformed.
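To make the idea of a cached replacement cipher concrete, the following is a minimal base R sketch. It is not the package's implementation: `make_cached_cipher` and its `n_chars` argument are hypothetical names introduced here for illustration only.

```r
# Hypothetical helper (not part of deident): returns a function that
# replaces keys with random strings and remembers earlier replacements.
make_cached_cipher <- function(n_chars = 5) {
  cache <- character(0)            # names are keys, values are replacements
  pool <- c(letters, LETTERS, 0:9) # alphanumeric characters to draw from
  function(keys) {
    keys <- as.character(keys)
    for (key in unique(keys)) {
      if (!key %in% names(cache)) {
        # First time this key is seen: draw and store a new random string
        cache[[key]] <<- paste(sample(pool, n_chars, replace = TRUE),
                               collapse = "")
      }
    }
    unname(cache[keys])
  }
}

cipher <- make_cached_cipher()
first  <- cipher(c("a", "b", "a"))  # "a" appears twice in one call
second <- cipher("a")               # and again in a later call
```

In this sketch `first[1]`, `first[3]` and `second` are identical because the replacement for "a" is cached. Note that the sketch does not guard against two different keys drawing the same random string.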
Apply cryptographic hashing to a variable.
Implemented deident options:
deident(df, "encrypt", A)
deident(df, "Encrypter", A)
deident(df, Encrypter, A)
deident(df, Encrypter$new(), A)
encrypt <- Encrypter$new()
deident(df, encrypt, A)
At initialization, Encrypter can be given
hash_key and seed values to control the
hashing. Users are advised to set both values and
keep them private.
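As a rough illustration of setting these values, the sketch below constructs an Encrypter with both arguments. The argument names hash_key and seed come from the text above; that hash_key accepts a character string and seed a number is an assumption, and the data frame `toy` is a stand-in invented for this example.

```r
# Runs only if deident is installed; otherwise the block is skipped.
if (requireNamespace("deident", quietly = TRUE)) {
  library(deident)

  # Stand-in data (an assumption, not the vignette's df)
  toy <- data.frame(A = letters, B = 1:26)

  # Assumed types: a character hash_key and a numeric seed.
  # In real use, keep both values out of scripts that are shared.
  enc <- Encrypter$new(hash_key = "replace-with-a-private-key", seed = 2024)
  deident(toy, enc, A)
}
```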
Apply Gaussian white noise to a numeric variable.
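The general technique can be sketched in base R as follows. This is an illustration of adding Gaussian white noise, not the package's own code: `add_gaussian_noise` and its `sd` argument are hypothetical, and the package may scale its noise differently.

```r
# Hypothetical helper (not part of deident): add zero-mean Gaussian
# noise with standard deviation `sd` to every element of `x`.
add_gaussian_noise <- function(x, sd = 0.1) {
  stopifnot(is.numeric(x))
  x + rnorm(length(x), mean = 0, sd = sd)
}

set.seed(1)  # fixed seed so the example is reproducible
b <- 1:26
noisy <- add_gaussian_noise(b, sd = 0.5)
```

A larger `sd` hides individual values more thoroughly but distorts summary statistics more.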
Implemented deident options:
Aggregate categorical values according to a user-supplied list. The
list must be supplied to Blur at initialization.
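One way to picture list-based aggregation is the base R sketch below. It is not the package's implementation: `blur_categories` is a hypothetical helper, and leaving unmapped values unchanged is an assumption made for this example.

```r
# Hypothetical helper (not part of deident): map each value of `x` to
# the broader category named in `mapping`; unmapped values are kept.
blur_categories <- function(x, mapping) {
  x <- as.character(x)
  known <- x %in% names(mapping)
  out <- x
  out[known] <- as.character(unlist(mapping[x[known]], use.names = FALSE))
  out
}

# Names are the original values, elements are the aggregated labels
mapping <- list(a = "vowel", e = "vowel", b = "consonant", c = "consonant")
blurred <- blur_categories(c("a", "b", "c", "e", "z"), mapping)
```

Here "z" has no entry in `mapping` and therefore passes through unchanged.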
Implemented deident options:
Aggregate numeric values according to a user-supplied vector of
breaks (cuts). If no vector is supplied, NumericBlurer
defaults to a binary classification about 0.
Implemented deident options:
deident(df, "numeric_blur", B)
deident(df, "NumericBlurer", B)
deident(df, NumericBlurer, B)
deident(df, NumericBlurer$new(), B)
numeric_blur <- NumericBlurer$new()
deident(df, numeric_blur, B)
At initialization, NumericBlurer takes an argument
cuts that defines the limits of each interval.
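The effect of a vector of cuts can be sketched with base R's `cut`. This is an illustration only: `blur_numeric` is a hypothetical helper, and whether the package closes its intervals on the right, as `cut` does by default, is an assumption.

```r
# Hypothetical helper (not part of deident): bin `x` into the intervals
# delimited by `cuts`, with open-ended intervals at both extremes.
blur_numeric <- function(x, cuts = 0) {
  breaks <- c(-Inf, sort(cuts), Inf)
  cut(x, breaks = breaks)  # right-closed intervals by default
}

b <- 1:26
binned <- blur_numeric(b, cuts = c(5, 10, 20))
table(binned)  # number of observations per interval

# With the default cuts = 0 the result is a two-level split about 0
binary <- blur_numeric(c(-2, -1, 1, 2))
```

Three cut points produce four intervals; the single default cut point produces the binary classification described above.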
Apply Shuffler to a data set after first grouping the
data on one or more columns. The grouping must be defined at
initialization.
Implemented deident options:
At initialization, GroupedShuffler takes an argument
limit: if any aggregated subgroup has fewer than
limit observations, all of its values are dropped.
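The combination of grouping, shuffling and a minimum group size can be sketched in base R as below. This is not the package's implementation: `grouped_shuffle` and the data frame `toy` are hypothetical, and representing "dropped" values as NA is an assumption.

```r
# Hypothetical helper (not part of deident): shuffle `column` within each
# level of `group`; groups smaller than `limit` have their values set to NA.
grouped_shuffle <- function(data, column, group, limit = 0) {
  groups <- split(seq_len(nrow(data)), data[[group]])
  for (rows in groups) {
    if (length(rows) < limit) {
      data[[column]][rows] <- NA
    } else {
      # sample.int avoids the surprise of sample(x) when x has length one
      data[[column]][rows] <- data[[column]][rows][sample.int(length(rows))]
    }
  }
  data
}

# Stand-in data: group "Z" has a single observation
toy <- data.frame(A = letters[1:6], B = 1:6,
                  C = c("X", "X", "X", "Y", "Y", "Z"))
out <- grouped_shuffle(toy, column = "B", group = "C", limit = 2)
```

Values of B stay within their own group of C, so group-level summaries are preserved, while the single-row group "Z" is blanked because a shuffle could not hide its value.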