This function classifies gene pairs by age when the Ks distribution has two or more peaks. Newer gene pairs fall within a set number of standard deviations (see nsd) of the highest peak, and older gene pairs fall close to the smaller peaks.
Arguments
- ks_df
A 3-column data frame with gene pairs in columns 1 and 2, and Ks values for the gene pair in column 3.
- peaks
A list with mean, standard deviation, and amplitude of Ks peaks as generated by find_ks_peaks.
- nsd
Numeric with the number of standard deviations to consider for each peak.
- binwidth
Numeric scalar with the bin width for the histogram. Default: 0.05.
Value
A list with the following elements:
- pairs
A 4-column data frame with the variables dup1 and dup2 (character; the two genes of each duplicate pair), ks (numeric; the Ks value of the pair), and peak (numeric; the ID of the peak the pair was assigned to).
- plot
A ggplot object with Ks peaks as returned by plot_ks_peaks, but with dashed red lines indicating boundaries for each peak.
Examples
data(fungi_kaks)
scerevisiae_kaks <- fungi_kaks$saccharomyces_cerevisiae
# Create a data frame of duplicate pairs and Ks values
ks_df <- scerevisiae_kaks[, c("dup1", "dup2", "Ks")]
# Create list of peaks
peaks <- find_ks_peaks(ks_df$Ks, npeaks = 2)
# Split pairs by peak
spairs <- split_pairs_by_peak(ks_df, peaks)
# Inspect the classified pairs and the annotated plot
head(spairs$pairs)
spairs$plot
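The role of nsd can be illustrated with a minimal base R sketch. The helper assign_peak, its default of nsd = 2, and the toy values are hypothetical and chosen only for illustration; they are not part of the package, and the function itself may handle overlapping windows or boundary cases differently.

```r
# Hypothetical helper (not the package's implementation): assign each Ks value
# to the first peak whose window [mean - nsd * sd, mean + nsd * sd] contains it.
# Values outside every window are returned as NA.
assign_peak <- function(ks, means, sds, nsd = 2) {
    lower <- means - nsd * sds
    upper <- means + nsd * sds
    vapply(ks, function(x) {
        hit <- which(x >= lower & x <= upper)
        if (length(hit) == 0) NA_integer_ else hit[1]
    }, integer(1))
}

# Toy peak parameters and Ks values (made up for this sketch, not real data)
toy_means <- c(0.5, 1.5)
toy_sds <- c(0.1, 0.2)
toy_ks <- c(0.45, 0.62, 1.0, 1.3, 1.95)

# Windows are [0.3, 0.7] for peak 1 and [1.1, 1.9] for peak 2
toy_peak <- assign_peak(toy_ks, toy_means, toy_sds, nsd = 2)
print(data.frame(ks = toy_ks, peak = toy_peak))
```

In this toy case, the first two values land in peak 1, the fourth lands in peak 2, and the remaining two fall outside both windows. A larger nsd widens each window, so more pairs get a peak at the cost of less sharply separated groups.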