Knot Selection for Smooth Profiles of Visual Meteor Showers

2026-05-16

Setup

Load the package before running any of the code below:

library(vismeteor)

Introduction

The activity of a visual meteor shower is a smooth, a priori unknown function of solar longitude. The classical evaluation summarises this by computing the Zenithal Hourly Rate (ZHR) per observation interval and plotting it as a noisy stepped curve. Single-interval ZHRs have high variance — Poisson noise plus observer-to-observer scatter — and the underlying activity profile is barely visible.

A spline-based fit across solar longitude collapses this noise into a smooth, bias-controlled curve, provided that the spline’s complexity (number and location of interior knots) is chosen wisely. select_knots() automates that choice: given a candidate grid of possible interior knots and a user-supplied score function (BIC, AIC, or a cross-validation criterion), it returns a small subset that minimises the score.

Conceptually, the supplied knot_candidates form a grid of intervals over the solar longitude: between two adjacent candidates lies an interval in which the spline can bend independently. select_knots() decides which interval boundaries are actually required — i.e. which positions carry enough curvature to justify a knot under the score. The natural resolution of the input grid in visual meteor work is the observation granularity itself: typical VMDB observation intervals are 10–15 minutes long, corresponding to roughly 0.007–0.010 degrees of solar longitude. A fine candidate grid at that resolution is the practice recommendation; select_knots() will then reduce it to a handful of statistically significant knots.

When to use select_knots()

The point of select_knots() is to keep the knot positions explicit and few: a handful of values you can plot, label, and reason about, instead of a dense anonymous knot grid hidden inside a penalised smoother. The function is model-agnostic — your score_fun picks the model family, offset, covariates, and the criterion (BIC, AIC, or a cross-validation score) you optimise.

This function is the right tool when

Prefer alternatives when

Important clarification. select_knots() chooses knots for a good fit, not for detecting extrema. Knots typically land next to maxima or minima — where the curvature is high — and not on them. Local extrema of the activity profile become visible from the shape of the fitted curve, not from the list of selected knots.

The implementation is generic, but its motivation comes directly from the visual-meteor use case: activity profiles are the natural setting in which knot positions can be physically interpreted — main maximum, sub-maxima, shower onset and end. The same interpretive value carries over to other meteor-related smooth fits — for instance the apparent magnitude distribution of a shower. Alternative packages (mgcv, earth) replace explicit knots with continuous penalty mechanisms, which gives up that interpretation.

Data preparation

We use the package’s example data set PER_2015_rates (visual rate observations of the Perseids 2015; see ?PER_2015_rates and ?load_vmdb). The raw filtering follows standard practice:

data(PER_2015_rates)
obs <- PER_2015_rates$observations
obs <- subset(
    obs,
    rad_alt > 15 & moon_alt < 0 & sun_alt < -5 &
        sl_start >= 139.2 & sl_end <= 140.6
)

deg2rad <- \(x) x * pi / 180
obs$sl <- (obs$sl_start + obs$sl_end) / 2
obs$t <- with(obs, t_eff * sin(deg2rad(rad_alt)) / f)

rates <- data.frame(
    sl     = obs$sl,
    t      = obs$t,
    limmag = obs$lim_magn,
    freq   = obs$freq
)
rates <- rates[order(rates$sl), ]
nrow(rates)
#> [1] 1986

What each step does:

We will call t simply the observing time from now on — not to be confused with t_eff, the raw effective observing time after breaks; t is t_eff after the zenith (sin(rad_alt)) and cloud-cover (/ f) corrections.

Model: Poisson GLM with a spline of solar longitude

fit_glm <- \(rates, knots) {
    f <- if (length(knots) == 0L) {
        freq ~ splines::ns(sl) + limmag + offset(log(t))
    } else {
        freq ~ splines::ns(sl, knots = knots) +
            limmag + offset(log(t))
    }
    stats::glm(f, data = rates, family = stats::poisson())
}
score_bic <- \(rates, knots) stats::BIC(fit_glm(rates, knots))

offset(log(t)) makes the linear predictor a log-rate per observing hour at the observer’s own limiting magnitude. The limmag term then rescales that rate to the ZHR reference (limmag = 6.5): exp(coef(limmag)) is the data-driven population index r of the shower under this model. The spline term splines::ns() is a natural cubic spline, well-behaved at the boundaries.

A practical note on the offset: offset() inside a formula is a formula-special and must stay unqualified — both stats::offset(...) in the formula and passing the term via glm()’s offset = argument break predict() on newdata.

Knot candidates

A regular grid is the simplest choice but ignores the actual observation density. In practice the candidate grid is built from the data themselves via freq_quantile() (see ?freq_quantile): partition the observations, ordered by solar longitude, into bins each containing at least a fixed minimum number of meteors, then use the meteor-weighted solar-longitude midpoint of each bin as a candidate. This produces a fine grid in observation-rich regions and a coarse grid where data is sparse — exactly the resolution select_knots() benefits from:

rates$q <- vismeteor::freq_quantile(rates$freq, 12) # >= 12 meteors per bin
rates$qsl <- with(
    rates,
    ave(freq * sl, q, FUN = sum) /
        ave(freq, q, FUN = sum)
)
cand <- unique(round(rates$qsl, 2))
length(cand)
#> [1] 103

The round(..., 2) step collapses bins whose centres are practically indistinguishable (resolution 0.01 deg). The minimum bin count (12) controls how dense the candidate grid is; a smaller value gives a finer grid (slower fit), a larger value a coarser one.

Fixing knots

The fixed_knots argument pins knot positions that must be present in every fitted model. In forward mode they are set from the start and the search adds further knots on top; in backward mode they are never proposed for removal, neither singly nor by bulk removal. fixed_knots need not be a subset of knot_candidates — values supplied there are forced into the working set regardless.

A fixed knot is a constraint on the search, not a finding from it: the criterion no longer has the freedom to reject it. The earlier warning that knots typically land next to extrema rather than on them still applies — forcing a knot exactly on, say, a presumed maximum overrides that pattern at the chosen position. Use it when there is a non-data reason a position must be in the model (a documented event time, an external anchor, a stable reference across analyses); see ?select_knots for the full semantics.

Forward selection

The vignette runs serially (n_cores = 1L) because the search window is narrow. For a full Perseids activity range (≈ 107–155 deg) the per-round candidate scoring benefits from n_cores = parallel::detectCores() - 1L; see ?select_knots for the reproducibility recipe with mclapply().

res_fwd <- vismeteor::select_knots(rates, cand, score_bic)
sort(c(res_fwd$knots, res_fwd$fixed_knots))
#> [1] 139.56 139.63 139.65 139.70 139.80 139.84 139.86 140.59
res_fwd$score
#> [1] 14106.83
plot(res_fwd$history$n_knots, res_fwd$history$score,
    type = "b", pch = 19, col = "blue",
    xlab = "number of interior knots", ylab = "BIC",
    main = "Forward selection: BIC trajectory"
)
points(tail(res_fwd$history$n_knots, 1L), res_fwd$score,
    pch = 19, col = "red", cex = 1.5
)
abline(h = res_fwd$score, lty = 2, col = "grey50")

BIC trajectory of the forward search over the number of interior knots, with the score optimum at the endpoint

The red point marks the score minimum — and with the default n_steps = NULL it is also the endpoint of the trajectory, because the search stops as soon as no further addition improves the score. To see what one more step would do, re-run the algorithm starting from the optimum with a positive n_steps:

res_fwd_plus1 <- vismeteor::select_knots(rates, cand, score_bic,
    fixed_knots = c(
        res_fwd$knots,
        res_fwd$fixed_knots
    ),
    n_steps = 1L
)
res_fwd_plus1$score - res_fwd$score # always >= 0: how much worse one step makes it
#> [1] 2.593168

n_steps is an exploration knob, not a model-selection knob: it forces the algorithm to take a fixed number of further iterations regardless of whether they improve the score, so that the immediate score landscape past the optimum becomes visible through history. A shallow rise tells you the BIC optimum is soft and that a slightly richer model would not cost much; a steep rise tells you the optimum is sharp and the parsimony is justified. The same recipe — re-call with fixed_knots = c(prev$knots, prev$fixed_knots), n_steps = k — extends to as many follow-up steps as you like; see ?select_knots.

Backward elimination

res_bwd <- vismeteor::select_knots(rates, cand, score_bic, backward = TRUE)
sort(c(res_bwd$knots, res_bwd$fixed_knots))
#>  [1] 139.68 139.69 139.70 139.71 139.74 139.75 139.77 140.44 140.50 140.51
#> [11] 140.52 140.53
res_bwd$score
#> [1] 14060.24
plot(res_bwd$history$n_knots, res_bwd$history$score,
    type = "b", pch = 19, col = "blue",
    xlab = "number of interior knots", ylab = "BIC",
    main = "Backward elimination: BIC trajectory"
)
points(tail(res_bwd$history$n_knots, 1L), res_bwd$score,
    pch = 19, col = "red", cex = 1.5
)
abline(h = res_bwd$score, lty = 2, col = "grey50")

BIC trajectory of the backward search over the number of interior knots, with the score optimum at the endpoint

The trajectory starts at the full candidate pool on the right and moves left as knots are removed; gaps along the x-axis correspond to bulk removal rounds (see bulk_gap in ?select_knots). The red point again marks the score minimum.

Forward vs. backward: when and why they differ

Both searches walk the lattice of knot subsets greedily — each round commits to the locally best add or drop — but they start at opposite corners:

That is why the two paths can disagree: at any intermediate state the score of a candidate move depends on what is currently selected. Adding k_i to the empty set is a genuinely different operation from adding k_i to a set that already contains its neighbours, and the two searches encounter k_i in incompatible contexts.

Practical recommendation. For activity-profile fitting we recommend backward = TRUE for the final analysis, even though the function’s API default is FALSE (the function is generic and defaults to the cheaper-per-fit direction). The reason is qualitative, not performance: starting from the overfit end ensures that all real curvature is already present in the initial model, so the search only has to decide what is redundant — which is the safer mode against the forward “miss-by-combination” failure. Use backward = FALSE (forward) for a quick, sparse first sketch or when the full-candidate initial fit is itself too expensive (very large candidate pool, complex model family).

In practice neither direction is uniformly faster: forward iterates many small fits one knot at a time, while backward starts with the single most expensive fit and then collapses fast via the bulk-removal mechanism (bulk_gap = 4L by default; see ?select_knots).

In any case, running both directions and keeping the lower-BIC result is a cheap robustness check; large disagreements (in either knot count or score) are a warning that the chosen criterion is on the flat part of its landscape and that the data alone may not justify a unique parsimonious model. The n_steps argument lets either search continue for a fixed number of iterations past its first local minimum, which is useful for inspecting how flat or sharp that minimum is.

Final fit and ZHR curve

best <- if (res_fwd$score <= res_bwd$score) {
    c(res_fwd$knots, res_fwd$fixed_knots)
} else {
    c(res_bwd$knots, res_bwd$fixed_knots)
}
fit <- fit_glm(rates, sort(best))

# Data-driven population index of the shower, with 95% Wald CI
# (log-scale CI on the limmag coefficient, exponentiated).
limmag_row <- summary(fit)$coefficients["limmag", ]
r_hat <- exp(limmag_row["Estimate"])
r_ci <- exp(limmag_row["Estimate"] +
    c(-1, 1) * 1.96 * limmag_row["Std. Error"])
c(
    estimate = unname(r_hat),
    lower = unname(r_ci[1]),
    upper = unname(r_ci[2])
)
#> estimate    lower    upper 
#> 1.218461 1.192027 1.245482

The point estimate is the population index r implied by the rate-vs-limmag slope of this dataset; the CI width tells you how sharply that slope is pinned down by the observations (a tight band means a well-determined r, a wide band means the data leave it essentially free). It enters the ZHR prediction below.

sl_grid <- seq(139.2, 140.6, length.out = 400)
pred_df <- data.frame(sl = sl_grid, limmag = 6.5, t = 1)
link <- predict(fit, newdata = pred_df, type = "link", se.fit = TRUE)
zhr <- exp(link$fit)
zhr_lo <- exp(link$fit - 1.96 * link$se.fit)
zhr_hi <- exp(link$fit + 1.96 * link$se.fit)

plot(sl_grid, zhr,
    type = "n",
    ylim = range(c(zhr_lo, zhr_hi)),
    xlab = "solar longitude (deg)", ylab = "ZHR",
    main = "PER 2015 - fitted activity profile (95% CI)"
)
polygon(c(sl_grid, rev(sl_grid)),
    c(zhr_lo, rev(zhr_hi)),
    col = adjustcolor("blue", alpha.f = 0.2), border = NA
)
lines(sl_grid, zhr, col = "blue", lwd = 2)
rug(best, col = "red")

Fitted ZHR curve over solar longitude with a 95% confidence band and selected knots marked on the axis

The prediction is evaluated with t = 1 (one ZHR-hour) and limmag = 6.5 (the reference limiting magnitude), so the response is the ZHR per hour on the chosen reference scale. The red rug marks on the x-axis show the selected interior knots; note that they sit next to the activity peak rather than on top of it, as expected — that is where the curvature is highest.

The shaded band is the pointwise 95 % confidence interval of the fit. We obtain it on the link (log) scale — where the GLM’s standard errors are approximately Gaussian — and then transform back with exp(). The band reflects parameter uncertainty of the spline + linear predictor; it is not a prediction interval for individual observations, which would also have to account for Poisson dispersion of the counts themselves.

Residual analysis with quantile bins

Raw residuals of individual Poisson observations with small expectations are visually uninformative. A cleaner diagnostic groups the observations into bins of fixed minimum meteor count using freq_quantile() and compares the observed and expected totals per bin.

# Use a coarser binning here than for the candidate grid above
# (>= 500 meteors per bin) so that bin uncertainties are small enough
# to read systematic deviations from the diagnostic plot.
rates$qres <- vismeteor::freq_quantile(rates$freq, 500)
rates$pred <- predict(fit, type = "response")

bins <- with(rates, {
    obs_n <- tapply(freq, qres, sum)
    pred_n <- tapply(pred, qres, sum)
    data.frame(
        sl    = tapply(sl * freq, qres, sum) / obs_n,
        ratio = obs_n / pred_n,
        se    = sqrt(obs_n) / pred_n
    )
})
nrow(bins)
#> [1] 40

plot(bins$sl, bins$ratio,
    pch = 19, col = "blue",
    ylim = range(c(
        bins$ratio - 2 * bins$se,
        bins$ratio + 2 * bins$se
    )),
    xlab = "solar longitude (deg)",
    ylab = "observed / expected",
    main = "PER 2015 - binned residual ratios (+/- 1 sigma)"
)
arrows(bins$sl, bins$ratio - bins$se,
    bins$sl, bins$ratio + bins$se,
    angle = 90, code = 3, length = 0.03, col = "blue"
)
abline(h = 1, lty = 2, col = "grey50")

Binned observed-over-expected ratios with one-sigma error bars over solar longitude

A well-fitting model scatters the bin ratios randomly around 1, with about two thirds of the one-sigma error bars crossing the dashed line. Systematic arches over solar longitude would indicate too few knots in a region; visible scatter with no trend is Poisson over-dispersion (not pursued here).

The same freq_quantile() helper drives both the candidate grid above and the diagnostic here, but with different minimum bin sizes: a small minimum (12) for a fine candidate grid that select_knots() can prune from, and a larger minimum (500) for a small number of stable bins on which residual deviations are visually detectable. In the narrow vignette window the larger minimum yields only a handful of bins (see the nrow(bins) output); a full activity-range analysis would lower this minimum to keep enough bins for a visible trend.

See also