The purpose of this vignette is to demonstrate the inference performance of nativeORT's ONNX Runtime bindings, including its CoreML support. The goal is to show that nativeORT can run inference in real time, i.e. with per-frame latency below the ~33.4 ms frame budget of a 29.97 fps video stream.
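As a quick sanity check, the per-frame latency budget follows directly from the frame rate:

```r
# Per-frame latency budget for an NTSC-rate (29.97 fps) stream:
# every frame must be processed in at most 1000 / 29.97 milliseconds.
fps <- 29.97
budget_ms <- 1000 / fps
budget_ms  # roughly 33.4 ms per frame
```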
This is tested with 100 inference runs on a 1x3x256x256 array on an Apple M1 machine, simulating frames of an incoming video stream.
```r
# Simulated RGB 256x256 image in NCHW layout
input <- array(
  runif(1 * 3 * 256 * 256),
  dim = c(1L, 3L, 256L, 256L)
)

session <- nativeORT::ort_session(
  model_path,
  threads = 0L,
  opt_level = 99L
)

times_cpu <- numeric(100)
for (i in 1:100) {
  times_cpu[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}
```
```r
# CoreML: cache compiled models on disk to avoid recompilation
dir.create(path.expand("~/.nativeORT/cache"),
  recursive = TRUE, showWarnings = FALSE
)

session <- nativeORT::ort_session(
  model_path,
  provider = "coreml",
  cache_dir = path.expand("~/.nativeORT/cache"),
  threads = 0L,
  opt_level = 99L
)

times_coreml <- numeric(100)
for (i in 1:100) {
  times_coreml[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}
```

```r
library(ggplot2)

results <- data.frame(
  run = rep(seq_along(times_cpu), 2),
  provider = c(
    rep("CPU (nativeORT)", length(times_cpu)),
    rep("CoreML (nativeORT)", length(times_coreml))
  ),
  latency_ms = c(times_cpu, times_coreml)
)

ggplot(results, aes(x = run, y = latency_ms, color = provider)) +
  geom_line() +
  geom_hline(yintercept = 33.3, linetype = "dashed", color = "red") +
  annotate("text", x = 85, y = 40, label = "29.97 fps threshold") +
  labs(
    title = "Inference Latency Across Inference Engines",
    subtitle = "YOLOv11n, 256x256 Images, Apple M1",
    x = "Run",
    y = "Latency (ms)"
  ) +
  theme_minimal()
```

Notably, nativeORT runs substantially below the real-time budget. Thanks to optimizations in the C++ bindings, CPU and CoreML latency are near parity; the CoreML runs, however, are more stable because they execute on dedicated hardware, whereas the CPU is subject to slowdowns when other processes compete for cores.
CoreML does require a warmup (visible as the initial spike), but after one or two inferences it reaches real-time performance. At a median latency of 7-8 milliseconds on an Apple M1, there is ample headroom to run post-processing and still stay under the target latency.
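Because of that warmup, summary statistics are more informative when the first few runs are excluded. A minimal sketch of that computation follows; the timing vector here is simulated for illustration (in the vignette it would be the `times_coreml` vector measured above), and the warmup count of 2 is an assumption, not something nativeORT reports.

```r
# Sketch: summarize steady-state latency with warmup runs excluded.
# Simulated timings: a warmup spike followed by ~7.5 ms steady-state runs.
set.seed(1)
times_coreml <- c(120, 35, rnorm(98, mean = 7.5, sd = 0.4))

warmup <- 2                                # assumed number of warmup runs
steady <- times_coreml[-seq_len(warmup)]   # drop warmup iterations

median_ms <- median(steady)
implied_fps <- 1000 / median_ms
headroom_ms <- 1000 / 29.97 - median_ms    # budget left for post-processing

c(median_ms = median_ms, implied_fps = implied_fps, headroom_ms = headroom_ms)
```

Dropping the warmup runs before summarizing avoids letting one or two compilation-dominated iterations distort the median and the implied frame rate.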