featdelta is an R package for incremental feature
engineering with database persistence. It is designed for workflows
where raw observations live in a database, feature logic is easier to
write and test in R, and the computed features should be stored back in
a database table for modelling, reporting, monitoring, or downstream
reuse.
Instead of rebuilding the same feature table from scratch whenever
new rows arrive, featdelta lets you define feature
expressions in R, fetch only rows that have not yet been processed,
compute the features locally, and upsert the results into a persistent
feature table.
install.packages("featdelta")You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("LordRudolf/featdelta")The standard pipeline is:
fd_define();The main orchestration function is fd_run(), which
combines the fetch, compute, and upsert steps.
This example uses an in-memory SQLite database and
mtcars, but the same pattern applies to database tables
selected with ordinary SQL.
library(DBI)
library(RSQLite)
library(featdelta)
cars <- mtcars
cars$id <- seq_len(nrow(cars))
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "raw_cars", cars[1:20, ], overwrite = TRUE)
defs <- fd_define(
transmission = ifelse(am == 1, "automatic", "manual"),
hp_per_cyl = hp / cyl,
wt_per_hp = wt / hp
)
run_day_one <- fd_run(
con = con,
sql = "SELECT * FROM raw_cars ORDER BY id",
defs = defs,
key = "id",
feat_table_name = "car_features"
)
dbGetQuery(con, "SELECT * FROM car_features ORDER BY id")When more raw rows arrive, call the same pipeline again. By default,
fd_run(fetch_mode = "new_only") processes only keys that
are not already in the feature table.
dbAppendTable(con, "raw_cars", cars[21:30, ])
run_day_two <- fd_run(
con = con,
sql = "SELECT * FROM raw_cars ORDER BY id",
defs = defs,
key = "id",
feat_table_name = "car_features"
)
dbGetQuery(con, "SELECT * FROM car_features ORDER BY id")If feature definitions change and existing rows should be recomputed,
use fetch_mode = "all" to refresh the rows returned by the
SQL query.
defs_v2 <- fd_define(
transmission = ifelse(am == 1, "automatic", "manual"),
hp_per_cyl = hp / cyl,
wt_per_hp = wt / hp,
mpg_per_cyl = mpg / cyl
)
fd_run(
con = con,
sql = "SELECT * FROM raw_cars ORDER BY id",
defs = defs_v2,
key = "id",
feat_table_name = "car_features",
fetch_mode = "all"
)For feature logic that naturally produces several columns at once,
use fd_block().
defs <- fd_define(
engine_ratios = fd_block({
data.frame(
hp_per_cyl = hp / cyl,
disp_per_cyl = disp / cyl,
wt_per_hp = wt / hp
)
})
)The package includes vignettes covering the getting-started workflow, feature definition patterns, database pipeline details, production patterns, and scheduled runs.