Split gene pairs based on their Ks peaks — split_pairs_by

This function classifies gene pairs by age when the Ks distribution has two or more peaks. Each pair is assigned to a peak if its Ks value lies within a given number of standard deviations (nsd) of that peak's mean. Newer gene pairs thus fall around the highest peak, and older gene pairs fall around the smaller peaks.

Usage

split_pairs_by_peak(ks_df, peaks, nsd = 2, binwidth = 0.05)

Arguments

ks_df: A 3-column data frame with gene pairs in columns 1 and 2, and Ks values for the gene pair in column 3.
peaks: A list with mean, standard deviation, and amplitude of Ks peaks as generated by find_ks_peaks.
nsd: Numeric scalar with the number of standard deviations to consider on each side of a peak's mean. Default: 2.
binwidth: Numeric scalar with binwidth for the histogram. Default: 0.05.
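
The role of nsd can be illustrated with a minimal base R sketch. It is not the package's implementation: the peak means and standard deviations are hypothetical values, and assign_peak is a helper invented for this illustration. Each peak gets the interval mean ± nsd × sd, and a Ks value receives the ID of the first interval that contains it, or NA if it lies outside all of them. How the real function handles overlapping intervals is not covered here.

```r
# Hypothetical peak parameters (illustrative values, not find_ks_peaks output)
peak_mean <- c(0.5, 1.5)
peak_sd   <- c(0.1, 0.2)
nsd <- 2

# Boundaries for each peak: mean - nsd * sd and mean + nsd * sd
lower <- peak_mean - nsd * peak_sd
upper <- peak_mean + nsd * peak_sd

# Toy Ks values
ks <- c(0.35, 0.52, 0.90, 1.20, 1.85, 2.40)

# Hypothetical helper: ID of the first interval containing x, or NA if none
assign_peak <- function(x, lower, upper) {
  hit <- which(x >= lower & x <= upper)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

peak_id <- vapply(ks, assign_peak, integer(1), lower = lower, upper = upper)
peak_id
```

A larger nsd widens every interval, so more pairs are assigned to a peak, at the cost of a higher chance that neighboring intervals overlap.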

Value

A list with the following elements:

pairs: A 4-column data frame with the variables dup1 (character), dup2 (character), ks (numeric), and peak (numeric), representing the two genes of each duplicate pair, the Ks value of the pair, and the ID of the peak the pair was assigned to, respectively.
plot: A ggplot object with Ks peaks as returned by plot_ks_peaks, but with dashed red lines indicating boundaries for each peak.
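
Because the pairs element carries a peak column, per-peak subsets can be obtained with base R. The sketch below uses a hand-made data frame with the documented column names. The gene IDs and Ks values are made up and are not real output of this function.

```r
# Toy data frame with the documented columns (all values are made up)
pairs <- data.frame(
  dup1 = c("g1", "g3", "g5", "g7"),
  dup2 = c("g2", "g4", "g6", "g8"),
  ks   = c(0.12, 0.15, 1.30, 1.42),
  peak = c(1, 1, 2, 2)
)

# Number of pairs assigned to each peak
pairs_per_peak <- table(pairs$peak)

# One data frame per peak, named by peak ID
by_peak <- split(pairs, pairs$peak)

# Pairs assigned to the second peak
peak2_pairs <- by_peak[["2"]]
```

With actual output, the pairs element of the returned list would take the place of the toy data frame.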

Examples

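# Assumes the doubletrouble package, which provides fungi_kaks and the functions below
library(doubletrouble)
data(fungi_kaks)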
scerevisiae_kaks <- fungi_kaks$saccharomyces_cerevisiae

# Create a data frame of duplicate pairs and Ks values
ks_df <- scerevisiae_kaks[, c("dup1", "dup2", "Ks")]

# Create list of peaks
peaks <- find_ks_peaks(ks_df$Ks, npeaks = 2)

# Split pairs
spairs <- split_pairs_by_peak(ks_df, peaks)