llmjson repairs malformed JSON strings, particularly
those generated by Large Language Models (LLMs). It uses Rust for fast,
reliable JSON repair based on a vendored and bug-fixed version of the llm_json crate.
Repaired JSON can be returned directly as R objects with return_objects = TRUE, and schemas can mark fields as .required and use .default for missing required fields.

You can install the development version of llmjson from GitHub:
# install.packages("remotes")
remotes::install_github("DyfanJones/llmjson")

Or from r-universe:
install.packages('llmjson', repos = c('https://dyfanjones.r-universe.dev', 'https://cloud.r-project.org'))

This package requires the Rust toolchain to be installed on your system. If you don’t have Rust installed, install it first, for example with rustup, the official Rust toolchain installer.
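You can check from R whether the Rust tools are on your PATH before installing. This is a small sketch that assumes a source build needs both cargo and rustc, which is typical for Rust-backed R packages but is not stated in this README.

```r
# Sys.which() returns the full path of each command, or "" if it is not on the PATH
rust_tools <- Sys.which(c("cargo", "rustc"))
print(rust_tools)

# Assumption: building llmjson from source needs both cargo and rustc
if (all(nzchar(rust_tools))) {
  message("Rust toolchain found")
} else {
  message("Rust toolchain not found: install it before installing llmjson from source")
}
```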
library(llmjson)
# Repair JSON with trailing comma
repair_json_str('{"key": "value",}')
#> [1] "{\"key\":\"value\"}"
# Repair JSON with unquoted keys
repair_json_str('{key: "value"}')
#> [1] "{\"key\":\"value\"}"
# Repair incomplete JSON
repair_json_str('{"name": "John", "age": 30')
#> [1] "{\"name\":\"John\",\"age\":30}"
# Repair JSON with single quotes
repair_json_str("{'name': 'John'}")
#> [1] "{\"name\":\"John\"}"

Instead of returning a JSON string, you can get R objects directly:
# Return as R list instead of JSON string
result <- repair_json_str('{"name": "Alice", "age": 30}', return_objects = TRUE)
result
#> $name
#> [1] "Alice"
#>
#> $age
#> [1] 30
# Works with all repair functions
result <- repair_json_file("data.json", return_objects = TRUE)

JSON numbers outside the range of R’s 32-bit integers (-2,147,483,647 to 2,147,483,647; the value -2,147,483,648 is reserved for NA) need special handling. The int64 parameter controls how these large integers are converted:
json_str <- '{"id": 9007199254740993}'
# Option 1: "double" (default) - Convert to R numeric (may lose precision)
result <- repair_json_str(json_str, return_objects = TRUE, int64 = "double")
result$id
#> [1] 9.007199e+15 # Lost precision: stored value is 9007199254740992, not 9007199254740993
# Option 2: "string" - Preserve exact value as character
result <- repair_json_str(json_str, return_objects = TRUE, int64 = "string")
result$id
#> [1] "9007199254740993" # Exact value preserved
# Option 3: "bit64" - Use bit64 package for true 64-bit integers
# Requires: install.packages("bit64")
result <- repair_json_str(json_str, return_objects = TRUE, int64 = "bit64")
result$id
#> integer64
#> [1] 9007199254740993 # Exact value preserved with integer type

Which option should I use?
- "double" (default) if your integers fit safely in double precision (magnitudes up to 2^53 = 9,007,199,254,740,992) and you don’t need exact integer arithmetic
- "string" if you need to preserve exact values and plan to pass them to other systems
- "bit64" if you need exact integer arithmetic on large integers in R

Define schemas to validate JSON structure and ensure correct R types. The schema system is inspired by the structr package and provides an intuitive way to define expected JSON structures:
# Define a schema for a user object
schema <- json_object(
name = json_string(),
age = json_integer(),
email = json_string()
)
# Repair and validate with schema
result <- repair_json_str(
'{"name": "Alice", "age": "30", "email": "alice@example.com"}',
schema = schema,
return_objects = TRUE
)
# Note: age is coerced from string "30" to integer 30
str(result)
#> List of 3
#> $ name : chr "Alice"
#> $ age : int 30
#> $ email: chr "alice@example.com"

Control how missing fields are handled with the .required and .default parameters:
Required fields (.required = TRUE):
- Missing fields are added with their .default value (or their type’s default if no explicit default is given)
- Always appear in the output

Optional fields (.required = FALSE, the default):
- Missing fields are omitted entirely from the output
- Only appear if present in the input JSON
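The two rules can be restated in a few lines of base R. The helper fill_missing_fields() below is hypothetical and only mirrors the documented behavior for a flat object; it is not llmjson's implementation. The spec format (a list of required/default pairs) is an assumption for illustration, and the helper treats a NULL value as missing and drops fields not named in the spec, which may differ from what llmjson does.

```r
# Hypothetical helper: apply the .required/.default rules to a flat parsed object.
# `spec` maps each field name to list(required = <logical>, default = <value>).
fill_missing_fields <- function(obj, spec) {
  out <- list()
  for (field in names(spec)) {
    rule <- spec[[field]]
    if (!is.null(obj[[field]])) {
      out[[field]] <- obj[[field]]   # present (non-NULL) in the input: keep it
    } else if (isTRUE(rule$required)) {
      out[[field]] <- rule$default   # required but missing: fill in the default
    }
    # optional and missing: leave it out of the result entirely
  }
  out                                # fields not named in `spec` are dropped
}

spec <- list(
  name     = list(required = TRUE,  default = ""),
  age      = list(required = TRUE,  default = 25L),
  nickname = list(required = FALSE, default = "")
)
str(fill_missing_fields(list(name = "Alice"), spec))
#> List of 2
#>  $ name: chr "Alice"
#>  $ age : int 25
```

Here age is filled because it is required, while nickname is simply absent because it is optional.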
# Example 1: Required field with explicit default
schema <- json_object(
name = json_string(.required = TRUE),
age = json_integer(.default = 25L, .required = TRUE) # required, will use default if missing
)
result <- repair_json_str('{"name": "Alice"}', schema = schema, return_objects = TRUE)
result
#> $name
#> [1] "Alice"
#>
#> $age
#> [1] 25
# Example 2: Optional field (omitted when missing)
schema <- json_object(
name = json_string(.required = TRUE),
nickname = json_string(.required = FALSE) # optional, omitted if not in input
)
result <- repair_json_str('{"name": "Bob"}', schema = schema, return_objects = TRUE)
result
#> $name
#> [1] "Bob"
# Note: nickname is not present since it was optional and missing from input
# Example 3: Required field with type default
schema <- json_object(
name = json_string(.required = TRUE),
age = json_integer(.required = TRUE) # required, will use type default (0L) if missing
)
result <- repair_json_str('{"name": "Charlie"}', schema = schema, return_objects = TRUE)
result
#> $name
#> [1] "Charlie"
#>
#> $age
#> [1] 0

Build complex schemas with nested objects and arrays:
# Schema with nested object and array
schema <- json_object(
name = json_string(),
address = json_object(
city = json_string(),
zip = json_integer()
),
scores = json_array(json_integer())
)
json_str <- '{
"name": "Alice",
"address": {"city": "NYC", "zip": "10001"},
"scores": [90, 85, 95]
}'
result <- repair_json_str(json_str, schema = schema, return_objects = TRUE)
str(result)
#> List of 3
#> $ name : chr "Alice"
#> $ address:List of 2
#> ..$ city: chr "NYC"
#> ..$ zip : int 10001
#> $ scores : int [1:3] 90 85 95

For repeated use with the same schema, use json_schema() to compile the schema once and reuse it many times.
# Define your schema
schema <- json_object(
name = json_string(),
age = json_integer(),
email = json_string()
)
# Build it once - this creates an optimized internal representation
built_schema <- json_schema(schema)
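# Example inputs: any character vector of JSON strings works here
json_strings <- c(
'{"name": "Alice", "age": "30", "email": "alice@example.com"}',
'{name: "Bob", age: 25, email: "bob@example.com",}'
)
# Reuse many times - much faster!
for (json_str in json_strings) {
result <- repair_json_str(json_str, schema = built_schema, return_objects = TRUE)
# Process result...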
}

Performance comparison (complex nested schema):
- Without json_schema(): ~266µs per call
- With json_schema(): ~51µs per call (5.2x faster)
- No schema: ~44µs per call

The performance benefit is especially significant for:
- Complex nested schemas with multiple levels
- Batch processing of many JSON strings
- Performance-critical applications
- Real-time data processing pipelines
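To see what the per-call figures mean for batch work, the arithmetic below scales them up. The timings are the approximate numbers quoted for one complex nested schema; the batch size of 10,000 strings is an arbitrary assumption, and real timings will vary with the schema and the input.

```r
# Approximate per-call timings in microseconds, as quoted for a complex nested schema
per_call_us <- c(unbuilt_schema = 266, built_schema = 51, no_schema = 44)

# Speed-up from compiling the schema once with json_schema()
speedup <- per_call_us[["unbuilt_schema"]] / per_call_us[["built_schema"]]
round(speedup, 1)
#> [1] 5.2

# Total seconds for an assumed batch of 10,000 JSON strings (names are kept)
n_strings <- 10000
total_s <- n_strings * per_call_us / 1e6
total_s

# Extra per-call cost of a built schema compared with no schema at all
per_call_us[["built_schema"]] - per_call_us[["no_schema"]]
#> [1] 7
```

At that scale, compiling the schema once cuts the total from about 2.66 s to 0.51 s, and a built schema adds only about 7µs per call over skipping validation.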
# Read and repair JSON from a file
repair_json_file("malformed.json")
# With schema validation
schema <- json_object(
name = json_string(.required = TRUE),
age = json_integer(.default = 25L, .required = TRUE) # required field with default
)
result <- repair_json_file("data.json", schema = schema, return_objects = TRUE)

# Repair JSON from raw byte vector
raw_data <- charToRaw('{"key": "value",}')
repair_json_raw(raw_data)
#> [1] "{\"key\":\"value\"}"
# With return_objects
result <- repair_json_raw(raw_data, return_objects = TRUE)

Read and repair JSON from any R connection (files, URLs, pipes, compressed files, etc.):
# Read from a file connection
conn <- file("malformed.json", "r")
result <- repair_json_conn(conn)
close(conn)
# Read from a URL
conn <- url("https://api.example.com/data.json")
result <- repair_json_conn(conn, return_objects = TRUE)
close(conn)
# Read from a compressed file
conn <- gzfile("data.json.gz", "r")
result <- repair_json_conn(conn, return_objects = TRUE, int64 = "string")
close(conn)
# Use local() with on.exit() so the connection is closed automatically, even on error
result <- local({
conn <- file("malformed.json", "r")
on.exit(close(conn))
repair_json_conn(conn, return_objects = TRUE)
})

Large Language Models often generate JSON that is almost correct but has minor syntax errors. This package helps you handle those cases gracefully:
# LLM might output JSON with trailing commas and unquoted keys
llm_output <- '{
users: [
{name: "Alice", age: 30,},
{name: "Bob", age: 25,},
],
}'
# Option 1: Repair and parse with your chosen JSON parser (e.g., jsonlite)
repaired <- repair_json_str(llm_output)
(parsed <- jsonlite::fromJSON(repaired))
#> $users
#> age name
#> 1 30 Alice
#> 2 25 Bob
# Option 2: Use schema with return_objects for type safety
schema <- json_object(
users = json_array(json_object(
name = json_string(),
age = json_integer()
))
)
result <- repair_json_str(llm_output, schema = schema, return_objects = TRUE)
str(result)
#> List of 1
#> $ users:List of 2
#> ..$ :List of 2
#> .. ..$ name: chr "Alice"
#> .. ..$ age : int 30
#> ..$ :List of 2
#> .. ..$ name: chr "Bob"
#> .. ..$ age : int 25

All repair functions support the schema, return_objects, ensure_ascii, and int64 parameters:
- repair_json_str(json_str, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Repair a malformed JSON string
- repair_json_file(path, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Read and repair JSON from a file
- repair_json_raw(raw_bytes, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Repair JSON from a raw byte vector
- repair_json_conn(conn, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Read and repair JSON from an R connection (file, URL, pipe, etc.)

Parameters:
- schema - Optional schema definition (R list from json_object(), etc.) or built schema (from json_schema())
- return_objects - If TRUE, returns R objects instead of JSON strings
- ensure_ascii - If TRUE (default), escape non-ASCII characters in the output JSON
- int64 - Policy for handling 64-bit integers: "double" (default), "string", or "bit64"
Schema functions:
- json_schema(schema) - Compile a schema definition for efficient reuse (about 5x faster for a complex nested schema)
- json_object(..., .required) - Define a JSON object with named fields
- json_integer(.default, .required) - Integer field (default: 0L)
- json_number(.default, .required) - Number/numeric field (default: 0.0)
- json_string(.default, .required) - String field (default: "")
- json_boolean(.default, .required) - Boolean field (default: FALSE)
- json_enum(.values, .default, .required) - Enum field with allowed values (default: first value)
- json_date(.default, .format, .required) - Date field with format specification
- json_timestamp(.default, .format, .tz, .required) - POSIXct datetime field
- json_array(items, .required) - Array with specified item type
- json_any(.required) - Accept any JSON type

While R has several JSON parsing packages, such as jsonlite, they typically fail when they encounter malformed JSON. llmjson is specifically designed to handle the common errors that LLMs make when generating JSON output.
This package includes a vendored and bug-fixed version of the llm_json Rust crate (v1.0.1) by Ribelo, which is itself a Rust port of the Python json_repair library by Stefano Baccianella (mangiucugna). Our vendored version includes critical bug fixes for array parsing not present in the upstream release.
The schema system was inspired by the structr package, which provides elegant patterns for defining and validating data structures in R.
Please note that the llmjson project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.