Package {ewens}


Title: Ewens Distribution
Version: 0.1.0
Description: Implements the probability mass function of, and random draws from, the Ewens distribution, a probability distribution over partitions of an integer, as described in Ewens (1972) <doi:10.1016/0040-5809(72)90035-4>.
License: MIT + file LICENSE
Encoding: UTF-8
URL: https://github.com/chrishanretty/ewens
BugReports: https://github.com/chrishanretty/ewens/issues
RoxygenNote: 7.3.2
Imports: copula (≥ 1.0)
Suggests: knitr, quarto
VignetteBuilder: quarto
NeedsCompilation: yes
Packaged: 2026-05-14 11:01:47 UTC; chanret
Author: Chris Hanretty [aut, cre]
Maintainer: Chris Hanretty <chris.hanretty@gmail.com>
Repository: CRAN
Date/Publication: 2026-05-19 09:30:14 UTC

Probability mass function for the Ewens distribution

Description

Gives the probability mass function for the Ewens distribution, as described in Ewens, Warren (1972). "The sampling theory of selectively neutral alleles". Theoretical Population Biology. 3: 87–112. doi:10.1016/0040-5809(72)90035-4.

Usage

dewens(x, theta = 1, log = FALSE)

Arguments

x

A vector giving class memberships of each observation in the sample

theta

A non-negative parameter governing the expected sample diversity.

log

If TRUE, probabilities are returned as log(p). Default is FALSE.

Details

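Let n be the sample size and let m_j denote the number of classes that contain exactly j of the n observations. The probability of the vector of counts m_1, ..., m_n is given by the expression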

\frac{n!}{\theta (\theta + 1) ... (\theta + n - 1)}\prod_{j=1}^n \frac{\theta^{m_j}}{j^{m_j} m_j!}
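The expression above can be evaluated directly in base R. The following is a minimal sketch, not the package's implementation: the helper name ewens_log_pmf_sketch is hypothetical, and the code assumes theta > 0 so that every logarithm is finite. It derives the counts m_j from a membership vector and works on the log scale to avoid overflow in n!.

```r
# Hypothetical helper (not part of the package): log-probability of the
# count vector m_1, ..., m_n implied by a membership vector x.
# Assumes theta > 0.
ewens_log_pmf_sketch <- function(x, theta = 1) {
  n <- length(x)
  class_sizes <- as.integer(table(x))      # number of members in each class
  m <- tabulate(class_sizes, nbins = n)    # m[j] = number of classes of size j
  j <- seq_len(n)
  # log of n! / (theta (theta + 1) ... (theta + n - 1))
  log_lead <- lgamma(n + 1) - sum(log(theta + j - 1))
  # log of prod_j theta^{m_j} / (j^{m_j} m_j!)
  log_prod <- sum(m * log(theta) - m * log(j) - lgamma(m + 1))
  log_lead + log_prod
}

# The three possible count vectors for n = 3 at theta = 1
p_one_class   <- exp(ewens_log_pmf_sketch(c("a", "a", "a"), theta = 1))
p_two_classes <- exp(ewens_log_pmf_sketch(c("a", "a", "b"), theta = 1))
p_all_single  <- exp(ewens_log_pmf_sketch(c("a", "b", "c"), theta = 1))
```

For n = 3 and theta = 1 the three values are 1/3, 1/2 and 1/6, which sum to one as a probability mass function over count vectors should.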

Value

A numeric vector giving a probability (or if log = TRUE, a log probability)

Examples

x <- sample(LETTERS, 120, replace = TRUE)
dewens(x, theta = 1)
dewens(x, theta = 0) ## returns NaN since vector incompatible with zero diversity


Probability mass function for the number of classes from a Ewens distribution

Description

Probability mass function for the number of classes from a Ewens distribution

Usage

dewens_k(k, n, theta)

Arguments

k

An integer number of classes at which to evaluate the PMF

n

A sample size not less than k

theta

A non-negative parameter governing the expected sample diversity.

Details

In a sample of size n from a Ewens distribution with parameter \theta, the probability of observing k classes is given by the expression

Pr(K = k) = \lvert{} S^k_n \rvert{} \frac{\theta^k}{\theta (\theta + 1) ... (\theta + n - 1)}

where \lvert{} S^k_n \rvert{} is the absolute value of a Stirling number of the first kind.
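As an illustration of this expression, the sketch below builds the unsigned Stirling numbers of the first kind from their standard recurrence and evaluates the PMF for every k from 1 to n. Both helper names are hypothetical and this is not necessarily how dewens_k computes the result; the direct product also overflows for large n, so the sketch is only suitable for modest sample sizes and assumes theta > 0.

```r
# Hypothetical helper: unsigned Stirling numbers of the first kind c(n, k)
# for k = 0, ..., n, using c(n, k) = (n - 1) c(n - 1, k) + c(n - 1, k - 1).
unsigned_stirling1_row <- function(n) {
  row <- 1                                   # n = 0: c(0, 0) = 1
  for (i in seq_len(n)) {
    row <- c(0, row) + (i - 1) * c(row, 0)   # move from row i - 1 to row i
  }
  row                                        # element k + 1 holds c(n, k)
}

# Hypothetical helper: Pr(K = k) for k = 1, ..., n. Assumes theta > 0.
k_pmf_sketch <- function(n, theta) {
  k <- seq_len(n)
  stirling <- unsigned_stirling1_row(n)[k + 1]
  rising <- prod(theta + seq_len(n) - 1)     # theta (theta + 1) ... (theta + n - 1)
  stirling * theta^k / rising
}

pmf_20 <- k_pmf_sketch(20, theta = 1)
```

With n = 20 and theta = 1 the first element of pmf_20 is 19!/20! = 0.05, and the whole vector sums to one.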

Value

The probability of observing k classes

Examples

dewens_k(1, 20, theta = 1) ## Pretty unlikely we just see one class


Calculate expected number of classes in a sample of size n given theta

Description

The expected number of classes in a sample of size n from the Ewens distribution is given by \theta \sum_{j=1}^{n} \frac{1}{\theta + j - 1}. This is often more convenient than summing k Pr(K = k) over the PMF given by dewens_k.

Usage

ewens_k_exact(n, theta)

Arguments

n

The sample size

theta

The non-negative parameter governing expected sample diversity
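The sum in the description is short enough to write out by hand. The sketch below uses a hypothetical helper name and is not the package's code. It pulls the j = 1 term out of the sum (that term equals 1 whenever theta > 0), which also gives the sensible limit of one class at theta = 0.

```r
# Hypothetical helper: expected number of classes in a sample of size n.
# The j = 1 term, theta / theta, is written as 1 so that theta = 0 is handled.
expected_k_sketch <- function(n, theta) {
  j <- seq_len(n)[-1]                    # j = 2, ..., n (empty when n = 1)
  1 + theta * sum(1 / (theta + j - 1))
}

e3 <- expected_k_sketch(3, theta = 1)    # 1 + 1/2 + 1/3
e0 <- expected_k_sketch(10, theta = 0)   # a single class
```

At theta = 1 the expression reduces to the harmonic number 1 + 1/2 + ... + 1/n, so e3 equals 11/6.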


Maximum likelihood estimate of theta given sample vector with class memberships

Description

Maximum likelihood estimate of theta given sample vector with class memberships

Usage

ewens_mle(x)

Arguments

x

A vector containing class memberships; sample size n and number of classes k are calculated from this

Value

A scalar giving the estimate of theta
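One way to obtain such an estimate, shown here as a sketch under stated assumptions rather than as the method ewens_mle actually uses, relies on the fact that the Ewens likelihood depends on theta only through n and k, and that its score equation sets the expected number of classes equal to the observed k. The helper name mle_theta_sketch and the search interval are assumptions; the fixed interval is adequate for moderate n only.

```r
# Hypothetical helper: solve E[K; theta] = k for theta with uniroot().
mle_theta_sketch <- function(x) {
  n <- length(x)
  k <- length(unique(x))
  if (k <= 1) return(0)      # one class: likelihood is maximised at theta = 0
  if (k >= n) return(Inf)    # all singletons: likelihood increases without bound
  j <- seq_len(n)[-1]
  expected_k <- function(theta) 1 + theta * sum(1 / (theta + j - 1))
  # E[K] increases from 1 to n in theta, so the root is unique
  uniroot(function(theta) expected_k(theta) - k,
          lower = 1e-8, upper = 1e8, tol = 1e-10)$root
}

theta_hat <- mle_theta_sketch(c("a", "a", "b"))
```

For n = 3 and k = 2 the equation can be solved by hand and gives theta = sqrt(2), which the numerical root reproduces.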


Draw from a generalized Chinese Restaurant Process

Description

Draw from a generalized Chinese Restaurant Process

Usage

gcrp(n, alpha = 0, theta = 1)

Arguments

n

The sample size.

alpha

A parameter between zero and one inclusive governing the expected sample diversity

theta

A non-negative parameter governing the expected sample diversity.

Value

A vector of length n consisting of numeric class labels.

Examples

gcrp(100, alpha = 0, theta = 1)
gcrp(120, alpha = 0, theta = 0.5)
gcrp(10, alpha = 0, theta = 0)
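For readers who want to see the seating mechanics, the following base R sketch simulates the usual two-parameter Chinese Restaurant Process: with k occupied tables, a table holding c customers is chosen with weight c - alpha and a new table with weight theta + k * alpha. The function name gcrp_sketch is hypothetical, the code assumes 0 <= alpha <= 1 and theta > -alpha, and it is not claimed to match the internals of gcrp.

```r
# Hypothetical helper: sequential seating under the two-parameter process.
gcrp_sketch <- function(n, alpha = 0, theta = 1) {
  if (n < 1) return(integer(0))
  labels <- integer(n)
  labels[1] <- 1L                  # the first customer always opens table 1
  sizes <- 1L                      # current occupancy of each table
  for (i in seq_len(n)[-1]) {
    k <- length(sizes)
    # unnormalised weights: the k existing tables, then a new table
    w <- c(sizes - alpha, theta + k * alpha)
    choice <- sample.int(k + 1L, size = 1L, prob = w)
    if (choice > k) {
      sizes <- c(sizes, 1L)        # open a new table
    } else {
      sizes[choice] <- sizes[choice] + 1L
    }
    labels[i] <- choice
  }
  labels
}

set.seed(1)
draw <- gcrp_sketch(50, alpha = 0.5, theta = 1)
```

Setting alpha = 0 gives the one-parameter process, and with alpha = 0 and theta = 0 the weight on a new table is zero, so every customer joins table 1.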

Draw from the Ewens distribution

Description

Returns a vector of class memberships drawn from the Ewens distribution.

Usage

rewens(n, theta = 1)

Arguments

n

The sample size.

theta

A non-negative parameter governing the expected sample diversity.

Details

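Although this command is described as sampling from the Ewens distribution, it is easier to think of it as a particular instantiation of the Chinese Restaurant Process, run for n "customers". The j-th customer joins an existing table with probability proportional to the number of customers already seated there, or starts a new table with probability proportional to \theta.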

Value

A vector of length n consisting of numeric class labels.

Examples

rewens(100, 1)
rewens(120, 0.5)
rewens(10, 0) ## equal to rep(1, 10)
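
As a small worked check, assuming the standard Chinese Restaurant Process seating rule (customer j joins an existing class of size c with probability c / (\theta + j - 1) and opens a new class with probability \theta / (\theta + j - 1)), the probability that all of n = 3 customers end up in one class is:

```latex
\Pr(\text{one class} \mid n = 3)
  = \underbrace{1}_{\text{customer 1}}
    \cdot \underbrace{\frac{1}{\theta + 1}}_{\text{customer 2 joins}}
    \cdot \underbrace{\frac{2}{\theta + 2}}_{\text{customer 3 joins}}
  = \frac{2}{(\theta + 1)(\theta + 2)}
```

At \theta = 1 this equals 1/3, and at \theta = 0 it equals 1, which agrees with the comment on the last example line: with zero diversity every customer joins the first class.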

Draw from the Griffiths-Engen-McCloskey distribution

Description

Draw from the Griffiths-Engen-McCloskey distribution

Usage

rgem(alpha = 0, theta = 1, trunc_at = 500)

Arguments

alpha

A parameter between zero and one

theta

A parameter which must be greater than -alpha

trunc_at

An integer which specifies the maximum number of components to return

Details

The Griffiths-Engen-McCloskey distribution is the infinite-dimensional counterpart to the Ewens sampling distribution. This function does not return an infinite-dimensional vector(!), but returns a vector of shares created by a "stick-breaking" construction. The vector of shares is returned after trunc_at sticks are broken; this can mean that there is still a non-negligible residual amount.

Value

A vector of shares of length trunc_at which may sum to less than one
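
The stick-breaking construction can be sketched in a few lines of base R. The version below assumes the usual two-parameter form, in which the i-th break is a Beta(1 - alpha, theta + i * alpha) fraction of whatever stick remains; the name gem_sketch is hypothetical, the code assumes 0 <= alpha < 1 and theta > -alpha, and rgem itself may be implemented differently.

```r
# Hypothetical helper: truncated stick-breaking shares.
gem_sketch <- function(alpha = 0, theta = 1, trunc_at = 500) {
  i <- seq_len(trunc_at)
  # fraction of the remaining stick taken at each break
  v <- rbeta(trunc_at, shape1 = 1 - alpha, shape2 = theta + i * alpha)
  # length of stick left before each break: 1, (1 - v1), (1 - v1)(1 - v2), ...
  remaining <- c(1, cumprod(1 - v)[-trunc_at])
  v * remaining
}

set.seed(42)
shares <- gem_sketch(alpha = 0, theta = 1, trunc_at = 500)
residual <- 1 - sum(shares)   # mass left on the unbroken part of the stick
```

The residual is the product of all the (1 - v) terms, so it shrinks as trunc_at grows; larger values of alpha or theta spread mass over more components and leave a larger residual at a given truncation point.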