University of Minnesota

Consensus Finder

Consensus protein sequences are useful for numerous applications. Mutating a protein to be more like the consensus of its homologs often increases its stability, allowing it to function at higher temperatures and to express in more soluble form when produced recombinantly. Consensus Finder identifies a consensus sequence and predicts potentially stabilizing mutations.

Consensus Finder starts from your protein sequence, finds similar sequences in the NCBI database, aligns them, removes redundant or highly similar sequences, trims the alignment to the length of the original query, and analyzes the consensus. The output is a trimmed alignment, a consensus sequence, frequency and count tables for the amino acids at each position, and a list of suggested mutations to consensus that may be stabilizing.
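The analysis step at the end of this pipeline can be sketched in a few lines of Python. This is an illustrative sketch only, not the code Consensus Finder runs: the function name `column_tables`, the toy alignment, and the choice to ignore gaps when computing frequencies are all assumptions made for the example.

```python
from collections import Counter


def column_tables(alignment):
    """Return per-column counts, per-column frequencies, and the consensus.

    `alignment` is a list of aligned sequences of equal length; '-' is a gap.
    Gaps are left out of the counts (an assumption made for this sketch).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    counts, freqs, consensus = [], [], []
    for col in range(length):
        # Count the residues observed in this column, skipping gaps.
        column = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(column.values())
        counts.append(dict(column))
        freqs.append({aa: n / total for aa, n in column.items()} if total else {})
        # The most frequent residue is the consensus; an all-gap column gives '-'.
        consensus.append(column.most_common(1)[0][0] if total else "-")
    return counts, freqs, "".join(consensus)


# Hypothetical toy alignment, already trimmed to the query length.
alignment = ["MKVLA", "MKILA", "MRVLA", "MKV-A"]
counts, freqs, consensus = column_tables(alignment)
print(consensus)
print(counts[1], freqs[1])
```

In this sketch the count and frequency tables are lists with one dictionary per alignment column, which mirrors the per-position tables described above.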

For help and further instructions, please visit the help guide.

Input PDB code:
Note: The letters in PDB codes should be upper case; for example, enter 1EY0, not 1ey0.
--OR--
Input FASTA file:
Note: Upload your protein sequence (not a DNA sequence) as a FASTA-formatted text file with no spaces in the file name.

Email Address: (optional)

Optional Operations

Set maximum sequences for BLAST search (Range: 10 - 10000; Default: 2000)
Set maximum E-value for BLAST search (Range: 1e-30 - 1e-1; Default: 1e-3)
Conservation threshold for suggesting mutations (Range: 0.05 - 0.99 or blank to only use ratio; Default: blank)
Minimum ratio for determining consensus (Range: 1-100; Default: 7)
Use only matched portions, not complete sequences
Iterations of ClustalW alignments (Range: 1 - 5; Default: 1)
CD-HIT redundancy (Range: 0.5 - 1.0; Default: 0.9)
Use options below to avoid mutations in or near the active site
PDB chain to use to define the active site (Range: A, B, etc.; Default: A)
Amino acid number used to identify the center of the active site, e.g. the primary catalytic residue (Range: 1 - [number of residues in the protein])
Size of active site defined by distance in Ångströms to the active-site amino acid (Range: 2 - 20; Default: 5)
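How the ratio, conservation threshold, and active-site options might interact can be illustrated with a short Python sketch. It rests on assumptions rather than on the tool's actual code: the ratio is taken to be the consensus residue's frequency divided by the query residue's frequency, the active site is approximated by one coordinate per residue, and the names `suggest_mutations` and `residues_within` and all data are hypothetical.

```python
import math


def suggest_mutations(query, freqs, min_ratio=7.0, threshold=None, excluded=()):
    """List substitutions toward consensus as strings such as 'K2R'.

    query     : query sequence, one residue per alignment column
    freqs     : per-position dicts mapping residue -> frequency (0 to 1)
    min_ratio : assumed meaning: consensus frequency / query residue frequency
    threshold : optional minimum consensus frequency; None uses the ratio only
    excluded  : 1-based positions to skip, e.g. residues near the active site
    """
    suggestions = []
    for pos, (aa, column) in enumerate(zip(query, freqs), start=1):
        if pos in excluded or not column:
            continue
        cons_aa, cons_freq = max(column.items(), key=lambda item: item[1])
        if cons_aa == aa:
            continue  # the query already matches the consensus here
        query_freq = column.get(aa, 0.0)
        # A query residue never seen in the alignment gives an infinite ratio.
        ratio = cons_freq / query_freq if query_freq > 0 else math.inf
        if ratio < min_ratio:
            continue
        if threshold is not None and cons_freq < threshold:
            continue
        suggestions.append(f"{aa}{pos}{cons_aa}")
    return suggestions


def residues_within(coords, center, radius=5.0):
    """Return residue numbers whose coordinate lies within `radius` Å of `center`.

    Simplification for this sketch: one (x, y, z) point per residue.
    """
    origin = coords[center]
    return {num for num, xyz in coords.items() if math.dist(xyz, origin) <= radius}


# Hypothetical query, frequency table, and per-residue coordinates.
query = "MKVLA"
freqs = [
    {"M": 1.0},
    {"R": 0.8, "K": 0.1, "Q": 0.1},
    {"V": 0.6, "I": 0.4},
    {"L": 0.5, "F": 0.5},
    {"G": 0.9, "A": 0.1},
]
coords = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0),
          4: (11.4, 0.0, 0.0), 5: (15.2, 0.0, 0.0)}

near_site = residues_within(coords, center=5, radius=5.0)
print(suggest_mutations(query, freqs))
print(suggest_mutations(query, freqs, threshold=0.85))
print(suggest_mutations(query, freqs, excluded=near_site))
```

Leaving `threshold` as `None` corresponds to the blank default above, where only the ratio decides whether a substitution is suggested.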

Download Consensus Finder for local use

To cite Consensus Finder:
B. J. Jones, C. N. E. Kan, C. Luo, R. J. Kazlauskas (2020) Consensus Finder web tool to predict stabilizing substitutions in proteins. Meth. Enzymol. (Enzyme Engineering and evolution, including directed evolution, D. Tawfik, Ed.), 643, 129-48; https://doi.org/10.1016/bs.mie.2020.07.010; preprint on bioRxiv

B. J. Jones, H. Y. Lim, J. Huang, R. J. Kazlauskas (2017) Comparison of five protein engineering strategies to stabilize an α/β-hydrolase. Biochemistry 56, 6521–32; doi:10.1021/acs.biochem.7b00571

Consensus Finder uses the following tools:
blastp (2.2.31+): C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T. L. Madden (2009) BLAST+: architecture and applications. BMC Bioinformatics 10, 421; doi:10.1186/1471-2105-10-421

CD-HIT (4.6.4): W. Li, L. Jaroszewski, A. Godzik (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282-3; doi:10.1093/bioinformatics/17.3.282; Id. (2002) Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 18, 77-82; doi:10.1093/bioinformatics/18.1.77

Clustal Omega (1.2.0): F. Sievers et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539; doi:10.1038/msb.2011.75