Find peaks in a Ks distribution with Gaussian Mixture Models

Usage

find_ks_peaks(ks, npeaks = 2, min_ks = 0.01, max_ks = 4, verbose = FALSE)

Arguments

ks

A numeric vector of Ks values.

npeaks

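Numeric scalar or vector indicating the number of peaks in the Ks distribution. If you don't know how many peaks there are, you can pass a vector of candidate values (e.g., c(2, 3, 4)), and the number of peaks with the best BIC (Bayesian Information Criterion) will be selected as the optimal. In the verbose output shown in Examples, the selected model is the one with the highest (least negative) BIC value. Default: 2.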

min_ks

Numeric scalar indicating the minimum Ks value. Very small Ks values are generally removed to exclude allelic and/or splice variants and to prevent a component from being fitted to infinity. Default: 0.01.

max_ks

Numeric scalar indicating the maximum Ks value. Very large Ks values are usually removed to account for Ks saturation. Default: 4.

verbose

Logical indicating whether messages should be printed on screen. Default: FALSE.
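
The effect of min_ks and max_ks can be pictured with a few lines of base R. This is only a sketch of the filtering idea, not the function's internal code: the toy vector `ks_toy` is made up for illustration, and treating both bounds as inclusive is an assumption.

```r
# Toy Ks values (hypothetical, for illustration only)
ks_toy <- c(0.001, 0.005, 0.2, 0.8, 1.5, 3.9, 4.7, 6.2)

min_ks <- 0.01
max_ks <- 4

# Keep values inside [min_ks, max_ks]; inclusive bounds are an assumption
ks_filtered <- ks_toy[ks_toy >= min_ks & ks_toy <= max_ks]
ks_filtered
```

In this toy vector, the two near-zero values and the two values above 4 are dropped before any mixture model would be fitted.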

Value

A list with the following elements:

mean

Numeric vector with the estimated means, one per peak.

sd

Numeric vector with the estimated standard deviations, one per peak.

lambda

Numeric vector with the estimated mixture weights, one per peak.

ks

Numeric vector with the Ks values retained after filtering by min_ks and max_ks.
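
One way to read the returned elements together is as a weighted sum of normal densities. The sketch below uses made-up values in place of a real result: the list `peaks_toy` and the helper `mixture_density()` are hypothetical names, not part of the package, and the components are assumed to be Gaussian, as the title suggests.

```r
# Hypothetical stand-in for the list returned by find_ks_peaks()
peaks_toy <- list(
    mean = c(0.5, 1.8),
    sd = c(0.2, 0.6),
    lambda = c(0.3, 0.7)
)

# Mixture density: sum over components of lambda_i * N(x | mean_i, sd_i)
mixture_density <- function(x, peaks) {
    dens <- vapply(seq_along(peaks$mean), function(i) {
        peaks$lambda[i] * dnorm(x, mean = peaks$mean[i], sd = peaks$sd[i])
    }, numeric(length(x)))
    # vapply() returns a matrix (rows = x, columns = components) when
    # length(x) > 1, and a plain vector when length(x) == 1
    if (is.matrix(dens)) rowSums(dens) else sum(dens)
}

# Evaluate the mixture density on a grid of Ks values
x <- seq(0, 4, by = 0.01)
y <- mixture_density(x, peaks_toy)
```

A curve built this way could be overlaid on a histogram of the filtered Ks values to check visually how well the fitted peaks describe the data.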

Examples

data(fungi_kaks)
scerevisiae_kaks <- fungi_kaks$saccharomyces_cerevisiae
ks <- scerevisiae_kaks$Ks

# Find 2 peaks in Ks distribution
peaks <- find_ks_peaks(ks, npeaks = 2)

# From 2 to 4 peaks, verbose = TRUE to show BIC values
peaks <- find_ks_peaks(ks, npeaks = c(2, 3, 4), verbose = TRUE)
#> Optimal number of peaks: 4
#> Bayesian Information Criterion (BIC): 
#>           E         V
#> 2 -4833.369 -4312.906
#> 3 -4109.263 -3625.427
#> 4 -4124.051 -3397.111
#> 
#> Top 3 models based on the BIC criterion: 
#>       V,4       V,3       E,3 
#> -3397.111 -3625.427 -4109.263
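
To see how the optimal number of peaks follows from the BIC table printed above, the values can be copied into a matrix and the best entry located by hand. This is a sketch of the selection logic, not the package's code; it assumes the sign convention visible in the output, where the best model has the highest (least negative) BIC.

```r
# BIC values copied from the verbose output above
bic <- matrix(
    c(-4833.369, -4312.906,
      -4109.263, -3625.427,
      -4124.051, -3397.111),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("2", "3", "4"), c("E", "V"))
)

# Locate the highest BIC (assumed convention: higher is better)
best <- which(bic == max(bic), arr.ind = TRUE)
best_npeaks <- as.integer(rownames(bic)[best[1, "row"]])
best_model <- colnames(bic)[best[1, "col"]]
```

Here `best_npeaks` and `best_model` point to the row and column of the top-ranked entry, which matches the first model listed under "Top 3 models based on the BIC criterion".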