SiteStripper

SiteStripper is a Perl program that removes fast-evolving sites from alignments. In doing this, it takes data partitioning and secondary structure into account so that you can still run phylogenetic analyses with partitioned models or a doublet model on the output alignments.

SiteStripper does not calculate site rates. You need to use other software such as HyPhy or TreeFinder to do this.

Downloads

sitestripper.pl v.1.02 download (zip file)
example files download (zip file)

User guide

SiteStripper is a command-line program. Just run the program and it will show you a list of command-line flags you can use to control the program (see also below). The alignment file has to be in fasta format. The rates file has to be in the HyPhy format. See the example files for more details.

SiteStripper is designed to keep track of which partition (character set) any give site is in. After removing the fast-evolving sites, it will return the character set definitions that allow you to run phylogenetic analyses with the same partitioning strategy as before but with a subset of the characters. If you want to use this option, you need to make a file in which the partitions are defined. See the parts.txt file of the example for formatting. To run the example with partitions, type perl sitestripper.pl -a alignment.fas -r siterates.txt -f .8 -o out.nex -pt parts.txt

If you have paired bases (e.g. stem regions in rRNA genes) you can specify this and the paired bases will either be retained together or discarded together. To determine whether a pair is discarded or retained, the mean of its base rates is calculated and compared to the rates of other bases (and base pairs). If you want to use this option, you need a file in which you specify base pairings as follows:

1:12
2:11
3:10
4:9
18:28
19:27
20:26
			

If you want to see whether removing fast sites gives you better results than removing random sites, you can have the program generate alignments from which randomly picked sites were removed. See Cocquyt et al. 2010 for an example.

Here is a complete list of the command-line flags you can use.

mandatory parameters
   -a    alignment_file  (must be in fasta format)
   -r    rates_file  (must be in the HyPhy output format)
		   if you wish to remove random sites (-ra option), this is not mandatory
   -f    fraction of characters to keep
   -o    output alignment  (in nexus format)

optional parameters
   -pa   file with list of paired bases
   -pt   file with definitions of data partitions  (one per line)
   -of   output format: fasta|nexus  (default: nexus)
   -ra   number of alignments with random sites stripped to be generated
			

Citation

If you find these tools useful, please cite them in your work. I recommend citing them as follows:
Verbruggen H. (2012) SiteStripper version 1.01. http://www.phycoweb.net/software

Notes and disclaimer

SiteStripper is in development and has not been tested extensively. It is quite plausible that incorrectly formatted input could lead to nonsensical output. In such cases, you should double-check your input, compare it to the example files and try again. If this still doesn't work, please feel free to write me an email (heroen.verbruggen@gmail.com).

SiteStripper is open-source and distributed according to the GNU General Public License. The program is provided "as is" without warranty of any kind, either expressed or implied. The entire risk as to the quality and performance of the program is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction. The author will not be liable to you for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the program (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the program to operate with any other programs).