| Type: | Package |
| Title: | Tidy Access to Women's Tennis Association (WTA) Data |
| Version: | 0.1.0 |
| Description: | Scrapes and tidies publicly available data from the Women's Tennis Association website (https://www.wtatennis.com). Provides helpers to retrieve player biographies, singles and doubles career overviews, match histories, live rankings and aggregate statistics. Dynamic pages are rendered through a headless 'Chrome' session so 'JavaScript'-generated content is fully captured, and all outputs are returned as tidy data frames suitable for downstream analysis or visualisation. |
| License: | Apache License (≥ 2) |
| URL: | https://github.com/Angnar-97/matchpointR |
| BugReports: | https://github.com/Angnar-97/matchpointR/issues |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | chromote, cli, jsonlite, magick, purrr, rvest, stringr, tibble, xml2 |
| Suggests: | httr2, knitr, rmarkdown, rsvg, testthat (≥ 3.0.0), withr |
| Config/testthat/edition: | 3 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-04-26 14:34:10 UTC; User |
| Author: | Alejandro Navas González [aut, cre] (alias: Angnar) |
| Maintainer: | Alejandro Navas González <angnar@telaris.es> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-28 19:40:12 UTC |
matchpointR: Tidy Access to Women's Tennis Association (WTA) Data
Description
matchpointR is a small scraper toolkit that turns the public pages of
https://www.wtatennis.com into tidy data frames. It ships helpers for
player biographies, career highlights, full match histories and live
rankings.
Details
Dynamic content is rendered through a headless Chrome session using the chromote package, so JavaScript-generated sections (matches, rankings) are fully captured before parsing. Where possible the package reads structured JSON-LD (schema.org) data instead of scraping CSS classes, for resilience against site redesigns.
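Here is a minimal sketch of the JSON-LD approach. It assumes the page embeds a schema.org block in a script tag of type application/ld+json. The HTML string, the placeholder person and the helper name read_json_ld are hypothetical, and none of this is part of matchpointR's exported API.

```r
library(xml2)
library(jsonlite)

# Toy page with a hypothetical schema.org Person block (placeholder data).
html <- '<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "name": "Example Player", "birthDate": "2000-01-01"}
</script></head><body><p>Bio</p></body></html>'

# Hypothetical helper: parse every JSON-LD block on a page into a list.
read_json_ld <- function(html) {
  doc <- read_html(html)
  # Select all <script type="application/ld+json"> nodes, whatever their CSS classes.
  nodes <- xml_find_all(doc, "//script[@type='application/ld+json']")
  # simplifyVector = FALSE keeps nested JSON objects as nested lists.
  lapply(xml_text(nodes), fromJSON, simplifyVector = FALSE)
}

blocks <- read_json_ld(html)
# Keep the block describing a Person.
person <- Filter(function(b) identical(b[["@type"]], "Person"), blocks)[[1]]
person$name
```

The selector targets the script type rather than any visual class name, which is why a layout redesign is less likely to break it.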
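Main functions
wta_player_url(): build a canonical player URL from an id and slug.
wta_get_player_basics(): one-row biography (name, nationality, birth data, height, handedness, images).
wta_get_player_overview(): career highlights (current ranks, titles, prize money, career high).
wta_get_player_matches(): full match history, one row per match.
wta_get_rankings(): current singles or doubles rankings.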
Author
Alejandro Navas González (Angnar).
Author(s)
Maintainer: Alejandro Navas González <angnar@telaris.es> (Angnar)
See Also
Useful links:
https://github.com/Angnar-97/matchpointR
Report bugs at https://github.com/Angnar-97/matchpointR/issues
Fetch fully rendered HTML with chromote
Description
Opens a headless Chrome session via chromote, waits for the page to settle, optionally clicks a "load more" button and/or scrolls, and returns the complete page source.
Usage
.chromote_get_html(
url,
wait = 8,
click_more_selector = NULL,
scroll = TRUE,
max_clicks = 50L,
session = NULL
)
Arguments
| url | Character. Destination URL. |
| wait | Numeric. Seconds to wait after initial navigation. Default 8. |
| click_more_selector | Optional CSS selector for a "load more" button that is clicked repeatedly until it disappears. |
| scroll | Logical. Scroll to the bottom after each click? Default TRUE. |
| max_clicks | Integer. Safety cap for the click loop. Default 50. |
| session | Optional pre-existing chromote session. |
Value
A character string containing the full page source.
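The steps above can be sketched with chromote's ChromoteSession, Page.navigate and Runtime.evaluate. This is a hypothetical helper (chromote_get_html_sketch) that mirrors the documented arguments but is not the package's implementation. The fixed one-second pause between clicks is an assumption. Running it requires chromote and a local Chrome installation.

```r
# Hypothetical sketch of the render / click / scroll loop; not matchpointR's code.
chromote_get_html_sketch <- function(url, wait = 8, click_more_selector = NULL,
                                     scroll = TRUE, max_clicks = 50L,
                                     session = NULL) {
  own_session <- is.null(session)
  if (own_session) session <- chromote::ChromoteSession$new()
  # Close the browser tab only if this function opened it.
  on.exit(if (own_session) session$close(), add = TRUE)

  session$Page$navigate(url)
  Sys.sleep(wait)  # let JavaScript-generated content settle

  # Evaluate a JavaScript expression in the page and return its value.
  js <- function(expr) session$Runtime$evaluate(expr)$result$value
  scroll_js <- "window.scrollTo(0, document.body.scrollHeight); true"

  if (!is.null(click_more_selector)) {
    # Click the button if present; report whether it was found.
    click_js <- sprintf(
      "(() => { const b = document.querySelector(%s); if (!b) return false; b.click(); return true; })()",
      encodeString(click_more_selector, quote = '"')
    )
    for (i in seq_len(max_clicks)) {
      if (!isTRUE(js(click_js))) break  # button gone: everything is loaded
      if (scroll) js(scroll_js)
      Sys.sleep(1)  # assumed pause for the next batch to render
    }
  } else if (scroll) {
    js(scroll_js)
  }

  js("document.documentElement.outerHTML")
}
```

A thin parser on top would then pass the returned string to xml2::read_html().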
Read dynamic HTML into an xml2 document
Description
Thin wrapper around .chromote_get_html() that parses the rendered HTML
with xml2::read_html().
Usage
.read_html_dynamic(
url,
wait = 8,
click_more_selector = NULL,
scroll = TRUE,
max_clicks = 50L,
session = NULL
)
Arguments
| url | Character. Destination URL. |
| wait | Numeric. Seconds to wait after initial navigation. Default 8. |
| click_more_selector | Optional CSS selector for a "load more" button that is clicked repeatedly until it disappears. |
| scroll | Logical. Scroll to the bottom after each click? Default TRUE. |
| max_clicks | Integer. Safety cap for the click loop. Default 50. |
| session | Optional pre-existing chromote session. |
Value
An xml2::xml_document.
Get basic bio for a WTA player
Description
Parses the profile header of a WTA player page and returns a one-row tibble with name, nationality, birth date, birth place, height and handedness. The bulk of the data is read from the page's JSON-LD (schema.org Person) block, which is more stable than the visual markup; height is read from the profile bio block as a fallback.
Usage
wta_get_player_basics(player_url, download_images = TRUE)
Arguments
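| player_url | Character. Full URL to a player page. Build it with wta_player_url(). |
| download_images | Logical. When TRUE (the default), the headshot and nationality flag are downloaded as magick-image objects. |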
Value
A one-row tibble::tibble() with columns:
| player_id | Numeric WTA id parsed from @id. |
| name, given_name, family_name | Name fields. |
| birth_date | Date of birth (ISO 8601 character string). |
| nationality, birth_place, birth_country | Geography fields. |
| height | Height string as shown on the bio (e.g. 5' 9" (1.74m)). |
| handedness | Dominant hand ("Right-Handed" or "Left-Handed"). |
| nationality_code | Three-letter IOC/ISO code extracted from the flag image (e.g. "CZE", "USA"). |
| player_image_url, nationality_flag_url | Headshot and flag URLs. |
| player_image | magick-image of the headshot, when download_images = TRUE. |
| nationality_flag | magick-image of the flag SVG, when download_images = TRUE and the suggested package rsvg is installed (otherwise NA). |
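For downstream analysis, the player_id and height columns can be post-processed as shown below. This is a sketch that assumes the @id value contains the numeric id as its first run of digits, and that the height string ends with a metric part such as (1.74m). The helper names parse_player_id and height_to_cm are hypothetical.

```r
# Hypothetical helper: first run of digits in the @id string, as a number.
parse_player_id <- function(at_id) {
  as.numeric(regmatches(at_id, regexpr("[0-9]+", at_id)))
}

# Hypothetical helper: metric part of e.g. 5' 9" (1.74m), converted to centimetres.
height_to_cm <- function(height) {
  m <- regmatches(height, regexpr("[0-9.]+(?=m\\))", height, perl = TRUE))
  if (length(m) == 0) return(NA_real_)  # no metric value found
  round(as.numeric(m) * 100)
}

# Assumed @id format, for illustration only.
parse_player_id("https://www.wtatennis.com/players/320301/katerina-siniakova")
height_to_cm("5' 9\" (1.74m)")
```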
Examples
wta_get_player_basics(wta_player_url(320301, "katerina-siniakova"))
Get the match history for a WTA player
Description
Walks the dynamic "Matches" page of a player profile, clicking the "Show more" button until the full history is loaded, and returns one row per match with tournament, round, opponent, score and result.
Usage
wta_get_player_matches(player_url, max_clicks = 50L)
Arguments
| player_url | Character. URL to the player page; the function normalises it to the matches page. |
| max_clicks | Integer. Safety cap for the "Show more" click loop. Defaults to 50. |
Value
A tibble::tibble() with one row per match and columns:
tournament, tournament_date, round, opponent, opponent_seed,
opponent_country, opponent_rank, score, result.
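URL normalisation can be sketched as follows. The sketch assumes that the section is the last path segment and that it is either "matches" or "overview". The helper name normalise_matches_url is hypothetical and does not describe how matchpointR does it internally.

```r
# Hypothetical helper: point any player URL at its matches section.
normalise_matches_url <- function(player_url) {
  base <- sub("/+$", "", player_url)             # drop trailing slashes
  base <- sub("/(matches|overview)$", "", base)  # drop a known section segment
  paste0(base, "/matches")
}

normalise_matches_url("https://www.wtatennis.com/players/320301/katerina-siniakova")
```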
Examples
url <- wta_player_url(320301, "katerina-siniakova", "matches")
wta_get_player_matches(url)
Get a WTA player's career highlights
Description
Returns the structured "additional properties" block from the page's JSON-LD: current singles and doubles rank, career titles and career prize money. These are supplemented with the career-high singles rank read from the bio side panel.
Usage
wta_get_player_overview(player_url)
Arguments
| player_url | Character. URL to the player overview page. |
Value
A long-format tibble::tibble() with columns metric and
value. Rows include singles_rank, doubles_rank,
singles_career_titles, doubles_career_titles,
career_prize_money, career_high.
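The long format is convenient for stacking several players. If you need one column per metric, base R can reshape it as in the sketch below. The sketch assumes value is a character column. The data frame uses placeholder values, not real player data.

```r
# Placeholder long-format output (metric/value), not real player data.
overview <- data.frame(
  metric = c("singles_rank", "doubles_rank", "career_high"),
  value  = c("10", "1", "5"),
  stringsAsFactors = FALSE
)

# Name the values by metric, then turn the named list into a one-row data frame.
wide <- as.data.frame(as.list(setNames(overview$value, overview$metric)),
                      stringsAsFactors = FALSE)
wide
```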
Examples
wta_get_player_overview(wta_player_url(320301, "katerina-siniakova"))
Get the current WTA rankings
Description
Scrapes the rankings table at
https://www.wtatennis.com/rankings/singles (or /doubles) and returns
a tidy tibble. The initial page renders the first 50 rows; if the widget
has not hydrated yet, increase the browser dwell time with wait.
Usage
wta_get_rankings(type = c("singles", "doubles"), top = NULL, wait = 12)
Arguments
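| type | Character. One of "singles" or "doubles". Defaults to "singles". |
| top | Integer. Limit the output to the top-ranked players, e.g. top = 50. The default, NULL, applies no limit. |
| wait | Numeric. Seconds to wait for the rankings widget to hydrate after navigation. Defaults to 12. |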
Value
A tibble::tibble() with one row per player and columns:
rank, player_id, player, country, age,
tournaments_played, points.
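The following sketch shows how a rendered rankings table turns into tidy rows and how the top limit applies. It uses xml2 on a small static table whose markup and class names are hypothetical, because the real widget's structure is not documented here. It also assumes that points use comma thousands separators. The helper parse_rankings_sketch is not part of the package.

```r
library(xml2)

# Hypothetical markup standing in for the hydrated rankings widget.
html <- '<table>
<tr><td class="rank">1</td><td class="player">Player A</td><td class="points">9,000</td></tr>
<tr><td class="rank">2</td><td class="player">Player B</td><td class="points">8,500</td></tr>
<tr><td class="rank">3</td><td class="player">Player C</td><td class="points">7,250</td></tr>
</table>'

parse_rankings_sketch <- function(html, top = NULL) {
  rows <- xml_find_all(read_html(html), "//tr")
  # Text of the cell with a given class in each row (NA if the cell is missing).
  cell <- function(cls) xml_text(xml_find_first(rows, sprintf(".//td[@class='%s']", cls)))
  out <- data.frame(
    rank   = as.integer(cell("rank")),
    player = cell("player"),
    points = as.numeric(gsub(",", "", cell("points"))),  # strip thousands separators
    stringsAsFactors = FALSE
  )
  # Keep only the highest-ranked rows when top is given.
  if (!is.null(top)) out <- head(out[order(out$rank), ], top)
  out
}

parse_rankings_sketch(html, top = 2)
```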
Examples
wta_get_rankings("singles", top = 50)
Build a WTA player URL
Description
Convenience wrapper to assemble a canonical player URL from a numeric id and an optional slug.
Usage
wta_player_url(id, slug = NULL, section = c("overview", "matches"))
Arguments
| id | Character or integer. The WTA numeric player id (e.g. 320301). |
| slug | Optional character. Player slug (e.g. "katerina-siniakova"). |
| section | Optional character. Page section to append as a path segment, one of "overview" or "matches". |
Value
A single character string with the full URL.
Examples
wta_player_url(320301, "katerina-siniakova")
wta_player_url(320301, "katerina-siniakova", "matches")
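URL assembly can be sketched as a hypothetical reimplementation, wta_player_url_sketch. It rests on two assumptions: the base path is https://www.wtatennis.com/players/, and "overview" maps to the bare profile path while other sections add a trailing segment. The real function may differ on both points.

```r
# Hypothetical reimplementation; the path layout is an assumption.
wta_player_url_sketch <- function(id, slug = NULL,
                                  section = c("overview", "matches")) {
  section <- match.arg(section)  # defaults to "overview"
  # A NULL slug simply drops out of c().
  parts <- c("https://www.wtatennis.com/players", as.character(id), slug)
  # Assumption: overview is the bare profile path; other sections add a segment.
  if (section != "overview") parts <- c(parts, section)
  paste(parts, collapse = "/")
}

wta_player_url_sketch(320301, "katerina-siniakova", "matches")
```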