
MCScanX provides a clustering module for viewing the relationships among collinear segments in multiple genomes (or heavily redundant genomes). It takes the pairwise segments predicted by dynamic programming (DAGchainer in particular) and then tries to build consensus segments from a set of related, overlapping segments.

Usage

rcpp_mcscanx_file(
  blast_file,
  gff_file,
  prefix = "out",
  outdir = "",
  match_score = 50L,
  gap_penalty = -1L,
  match_size = 5L,
  e_value = 1e-05,
  max_gaps = 25L,
  overlap_window = 5L,
  is_pairwise = FALSE,
  in_synteny = 0L,
  species_id_length = 3L,
  verbose = FALSE
)

Arguments

blast_file

Character indicating the path to the BLAST/DIAMOND output file.

gff_file

Character indicating the path to the "gff" file, which is a tab-delimited file with 4 columns indicating the chromosome name, gene id, gene start position, and gene end position, respectively.
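
As a minimal sketch of the four-column layout described above, the following base R code writes a toy file and reads it back. The chromosome names, gene IDs, coordinates, and the absence of a header line are illustrative assumptions, not requirements stated in this documentation.

```r
# Toy gene annotation; all names and coordinates are made up for illustration
genes <- data.frame(
  chr = c("AthChr1", "AthChr1", "AlyChr1"),
  gene = c("Ath_g1", "Ath_g2", "Aly_g1"),
  start = c(100L, 2500L, 300L),
  end = c(1200L, 3900L, 1500L)
)

# Write as tab-delimited text without header, row names, or quotes
# (omitting the header is an assumption of this sketch)
gff_path <- tempfile(fileext = ".gff")
write.table(
  genes, file = gff_path, sep = "\t",
  quote = FALSE, row.names = FALSE, col.names = FALSE
)

# Read the file back to confirm the four-column layout
gff_check <- read.delim(gff_path, header = FALSE)
```

The resulting path in gff_path could then be passed as the gff_file argument.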

prefix

Character indicating the prefix to output files. Default: "out".

outdir

Character indicating the path to the output directory. Default: "".

match_score

Numeric indicating the match score. Default: 50.

gap_penalty

Numeric indicating the gap penalty. Default: -1.

match_size

Numeric indicating the minimum number of genes required to call synteny. Default: 5.

e_value

Numeric indicating the maximum e-value allowed (hits with a larger e-value are not considered). Default: 1e-5.

max_gaps

Numeric indicating the maximum number of gaps allowed between genes. Gaps are measured in number of genes, so max_gaps = 20 means that at most 20 genes can lie between two homologous genes for synteny to be called. Default: 25.
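
The gap unit can be illustrated with a short sketch. It assumes that each gene has an integer rank (its order along a chromosome) and applies a simplified reading of the parameter; the helper genes_between() is hypothetical and this is not the chaining logic used internally.

```r
# Hypothetical helper: number of genes lying strictly between two genes,
# given their ranks (order) along the same chromosome
genes_between <- function(rank_a, rank_b) {
  abs(rank_b - rank_a) - 1L
}

max_gaps <- 25L

# Ranks of consecutive anchor genes on one chromosome (toy values)
anchor_ranks <- c(3L, 10L, 40L)
gaps <- genes_between(head(anchor_ranks, -1), tail(anchor_ranks, -1))

# TRUE where neighbouring anchors are close enough under this simplified rule
within_limit <- gaps <= max_gaps
```

Here the first pair of anchors is separated by 6 genes and the second by 29, so only the first pair satisfies max_gaps = 25 in this toy setting.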

overlap_window

Numeric indicating the overlap window. Default: 5.

is_pairwise

Logical indicating whether only pairwise blocks should be reported. Default: FALSE.

in_synteny

Numeric indicating the type of collinear blocks, where 0 indicates both intra- and interspecies comparisons, 1 indicates intraspecies comparisons only, and 2 indicates interspecies comparisons only. Default: 0.

species_id_length

Integer indicating the length of the species IDs. Default: 3.
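
How species_id_length and in_synteny might interact can be sketched as follows. The sketch assumes that the species ID is a fixed-length prefix of the chromosome name, which this documentation does not state explicitly; classify_pair() and keep_pair() are hypothetical helpers, not functions of the package.

```r
# Hypothetical helper: label a pair of identifiers as intra- or interspecies,
# assuming the species ID is the first `species_id_length` characters
classify_pair <- function(id_a, id_b, species_id_length = 3L) {
  sp_a <- substr(id_a, 1, species_id_length)
  sp_b <- substr(id_b, 1, species_id_length)
  ifelse(sp_a == sp_b, "intra", "inter")
}

# Hypothetical helper: keep pairs matching the in_synteny code (0, 1, or 2)
keep_pair <- function(pair_type, in_synteny = 0L) {
  switch(
    as.character(in_synteny),
    "0" = rep(TRUE, length(pair_type)),
    "1" = pair_type == "intra",
    "2" = pair_type == "inter",
    stop("in_synteny must be 0, 1, or 2")
  )
}

# Toy chromosome names with three-character species prefixes
pair_type <- classify_pair(c("AthChr1", "AthChr2"), c("AthChr3", "AlyChr1"))
kept_inter <- keep_pair(pair_type, in_synteny = 2L)
```

With in_synteny = 2, only the second toy pair (different prefixes) would be retained under these assumptions.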

verbose

Logical indicating whether to print progress messages to the screen. Default: FALSE.

Value

NULL. As a side effect, a .collinearity file is created in the directory specified in outdir.
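
Because the function returns NULL, results have to be read from disk. The sketch below assumes that the output file is named by appending ".collinearity" to prefix; that naming scheme is an assumption and should be checked against the files actually produced.

```r
prefix <- "out"
outdir <- tempdir()

# Assumed naming: <outdir>/<prefix>.collinearity
collinearity_path <- file.path(outdir, paste0(prefix, ".collinearity"))

# After a run, check that the file exists before reading it
if (file.exists(collinearity_path)) {
  collinearity_lines <- readLines(collinearity_path)
}
```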

References

Wang et al. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research, 40(7), e49.

Haas et al. (2004) DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics, 20(18), 3643-3646.

Author

Kristian K Ullrich and Fabricio Almeida-Silva