---
title: "BenchmarkPerformance"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{BenchmarkPerformance}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{nativeORT}
VignetteBuilder: knitr
---


```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

if (!nativeORT::ort_is_installed()) {
  knitr::opts_chunk$set(eval=FALSE)
}
```

```{r load,include=FALSE, eval = FALSE}
library(nativeORT)
library(ggplot2)

model_path <- file.path(tools::R_user_dir("nativeORT", "data"), "yolo11n.onnx")

if (!file.exists(model_path)) {
  message("Downloading YOLOv11 model (~12Mb)...")
  dir.create(dirname(model_path), recursive=TRUE, showWarnings = FALSE)
  utils::download.file(
    url="https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo11n.onnx",
    destfile=model_path,
    mode="wb"
  )
  message("yolo11n.onnx downloaded!")
}
```

The purpose of this Vignette is to demonstrate the performance of onnxruntime inference with nativeORT, including CoreML capabilities. This will demonstrate nativeORT is capable of running at real-time (sub-29.97fps) inferencing.

This is tested on 50 256x256 arrays on an Apple M1 machine, simulating an incoming video stream.

## nativeORT CPU & CoreML

```{r benchmark, eval = FALSE}
# typical RGB 256x256 image
input <- array(
  runif(1 * 3 * 256 * 256),
  dim=c(1L, 3L, 256L, 256L)
)

session <- nativeORT::ort_session(model_path,
                                  threads=0L,
                                  opt_level=99L)

times_cpu <- numeric(100)
for (i in 1:100){
  times_cpu[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}

# CoreML
dir.create(path.expand("~/.nativeORT/cache"),
           recursive = TRUE, showWarnings = FALSE
           )
session <- nativeORT::ort_session(model_path,
                                  provider='coreml',
                                  cache_dir=path.expand("~/.nativeORT/cache"),
                                  threads=0L,
                                  opt_level=99L
           )

times_coreml <- numeric(100)
for (i in 1:100){
  times_coreml[i] <- system.time(
    nativeORT::ort_infer_raw(session, input)
  )["elapsed"] * 1000
}
```

```{r plot, fig.width=7, eval = FALSE}
results <- data.frame(
  run=rep(1:length(times_cpu), 2),
  provider=c(
    rep("CPU (nativeORT)", length(times_cpu)),
    rep("CoreML (nativeORT)", length(times_coreml))
  ),
  latency_ms=c(times_cpu, times_coreml)
)

ggplot(results, aes(x=run, y=latency_ms, color=provider)) +
  geom_line() + 
  geom_hline(yintercept=33.3, linetype="dashed", color="red") +
  annotate("text", x=85, y=40, label="29.97 fps threshold") +
  labs(
    title="Inference Latency Across Inference Engines",
    subtitle="YOLOv11n, 256x256 Images, Apple M1",
    x="Run",
    y="Latency (ms)"
  ) +
  theme_minimal()
```

## Results

Notably, nativeORT can run substantially below real-time requirements. Due to optimization in the C++ bindings, the CPU and CoreML latency are near parity; however, it is of note that the CoreML runs offer better stability as they sit on dedicated hardware, whereas the CPU is subject to slowdowns when other processes hit.

CoreML does require a warmup (as noticed in the spike) but after one or two inferences, it becomes real-time performant. At a median latency of 7-8 milliseconds on Apple M1 Silicon, there is still time to run post-processing and remain under target latency.