NAME

GeneTuner, a script for the manual curation of gene families.


SYNOPSIS

perl genetuner.pl [project_name] [gene_name]


DESCRIPTION

This program allows the user to define the exons of a set of genes in a set of genomic contigs. It is particularly suited for similarity-based predictions applied to gene families. Usually, GeneTuner is run downstream of BlastSniffer, which provides all of the input files this program needs. The purpose of both programs is to automatically perform several time-consuming steps in the manual annotation of genes by homology.

Usually, annotations start with a TBLASTN comparison between a known protein and a genomic sequence. The user must then decide which resulting hits correspond to bona fide exons of new putative genes. BlastSniffer handles this task. Then, every exon junction has to be manually assessed and corrected, and the new gene must be rebuilt by copying the specified subsequences from the template contig. GeneTuner lets the user choose exon junctions, and also add and remove exons, directly from the genomic sequence. To help with this task, the template sequence is shown aligned to the contig.

Furthermore, the user does not need to keep track of the frame. The length of every exon is a multiple of three, so that the reading frame is always correct. At the end, the program decides automatically if each junction must be edited to improve the similarity to the template sequence and/or to get canonical exon/intron junctions.

The recommended procedure to use these programs consists of the following steps:

  1. Store all of the starting protein sequences in one folder (ppath) with a given extension (aa_ext). Optionally, store the corresponding nucleotide sequences with the same names in a different folder (npath) and/or with a different extension (nt_ext).
  2. Run TBLASTN to compare each of the starting protein sequences with the template genomic sequence (dbpath). Store the output files in a folder (tbnpath) with the extension .tbn.
  3. Run BlastSniffer on those files (see BlastSniffer documentation).
  4. Open GeneTuner and load the project file created by BlastSniffer. This can be done from the command line or by choosing Load a project at the welcome screen. Every time you load a project, a different tbn file is edited, unless gene_name is provided after project_name.
  5. Edit the predicted exons, save the result, and exit. Repeat steps 4 and 5 until you have processed every tbn file.
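
Step 2 can be scripted. The following is a minimal sketch in Python, not part of GeneTuner. It assumes the NCBI BLAST+ tblastn executable (the original workflow may rely on a different BLAST release) and assumes that each output file is named after its query, with the .tbn extension. The function name is hypothetical.

```python
from pathlib import Path


def tblastn_commands(ppath, aa_ext, dbpath, tbnpath, evalue=10):
    """Build one TBLASTN command per protein file found in ppath.

    Assumptions: BLAST+ command-line flags, and output files named
    <query name>.tbn inside tbnpath.
    """
    commands = []
    for query in sorted(Path(ppath).glob(f"*.{aa_ext}")):
        out = Path(tbnpath) / (query.stem + ".tbn")
        commands.append([
            "tblastn",
            "-query", str(query),
            "-db", str(dbpath),
            "-out", str(out),
            "-evalue", str(evalue),
        ])
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True). Building the commands separately from running them makes it easy to inspect them first.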

Welcome screen

Unless a valid project name is provided on the command line, the user is taken to the welcome screen. This screen allows the user to create a new project (usually not necessary) or load an existing project. Once a project is created or loaded, GeneTuner processes the next tbn file and takes the user to the editing screen.

Editing screen

These are the elements of the editing screen:

  ____________________________________________________
 |                                                    |
 |c19h_usp1       length: 785                         |(1)
 |Chromosome: Chr8        strand: 1                   |(2)
 |                                                    |
 |                                                    |
 |                                                    |
 |                                                    |
 |                  |25424819                         |(3)
 |                           995            1009      |(4)
 |                           gttcctgcagcacag          |(4)
 |                           |||||||||||||||          |
 | TTTCTCTGTTTCCCAGTGACCAGGTGGTTCCTGCAGCACAGCCCCCCTC  |(5)
 |                                                    |
 | F  L  C  F  P  V  T  R  W  F  L  Q  H  S  P  P  P  |(6)
 |  F  S  V  S  Q  *  P  G  G  S  C  S  T  A  P  L    |(6)
 |   S  L  F  P  S  D  Q  V  V  P  A  A  Q  P  P  S   |(6)
 |                  |  |  |  |  |  |  |  |        |   |
 |                  D  Q  V  V  P  A  A  Q  S  -  S   |(7)
 |                  58                                |(7)
 |                                                    |
 |                                                    |
 |                                                    |
 |                                                    |
 |Lacking residues                                    |(8)
 |____________________________________________________|
  1. Name of the tbn file in process, and length of the protein sequence.
  2. Name and strand (+/-) of the chromosome (or contig) being displayed.
  3. Current cursor position in the contig.
  4. Matching sequence in the starting nucleotide sequence, found by BLASTN.
  5. Sequence of the contig.
  6. Translation of the contig in all three frames in the displayed strand.
  7. Matching sequence in the starting protein sequence, found by TBLASTN.
  8. Warnings related to the current exon. Messages and prompts for the user are also displayed here.
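
Element 6 can be illustrated with a short Python sketch. This is not GeneTuner's own code: it assumes the standard genetic code and translates only the strand it is given. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with T, C, A, G at each position.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_three_frames(dna):
    """Translate the given strand in frames 0, 1 and 2.

    Incomplete trailing codons are dropped; codons with unknown
    bases are shown as 'X'.
    """
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames
```

Applied to the contig row (5) of the example screen, the three strings returned begin like the three rows marked (6).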

Controls

The editing screen is controlled with the keyboard. Controls are case-insensitive.

Navigation

Basic navigation through the contig is performed with the cursor keys. The left and right arrows move the cursor one base at a time in normal mode and three bases at a time in exon extension mode. The up and down arrows move the cursor to the previous or next exon boundary. Because the alignment of the contig with the template sequence is very important, its hits can be browsed in the same way, with "K" and "L" for the BLASTN alignment and "," and "." for the TBLASTN alignment.

Editing exons

When a putative gene is loaded, its genomic sequence is shown, with 5000 extra bases upstream and downstream. If more extra bases are needed, < adds 5000 extra bases upstream and > adds 5000 extra bases downstream.

The predicted exons are highlighted. To start a new exon, place the cursor at the beginning and press the space bar. Then, extend the exon up to the end and press the space bar again. While extending the exon, the minimal step is changed to three bases instead of one, so that the translation frame is always conserved.

If the space bar is pressed while the cursor is inside an exon, the user will enter extension mode for that exon. To directly change the start or end of an exon, place the cursor on the new boundary and press "Z" to change the start of the next exon or "X" to change the end of the previous exon. If necessary, the boundary will be automatically shifted to preserve the translation frame.
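
The frame-preserving shift can be pictured with the following Python sketch. It is an illustration only: the coordinates are assumed to be 1-based and inclusive, and the direction of the shift (always shortening the exon) is an assumption, since the actual rule used by GeneTuner is not described here. Both function names are hypothetical.

```python
def snap_exon_end(start, end):
    """Return an end coordinate that makes the exon length a multiple of three.

    Assumption for this sketch: the end is moved upstream by 0, 1 or 2
    bases, so the exon is shortened rather than extended.
    """
    length = end - start + 1
    if length < 3:
        raise ValueError("an exon must span at least one codon")
    return end - (length % 3)


def snap_exon_start(start, end):
    """Same idea for the start: move it downstream by 0, 1 or 2 bases."""
    length = end - start + 1
    if length < 3:
        raise ValueError("an exon must span at least one codon")
    return start + (length % 3)
```

For example, an exon drawn from 100 to 110 spans 11 bases; snapping the end gives 100 to 108, and snapping the start gives 102 to 110, both 9 bases long.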

To remove an exon, simply place the cursor inside that exon and press "A". The last exon edit can be undone by pressing Ctrl+Z.

Advanced search

When looking for dissimilar and/or short exons, it may be necessary to use some advanced search capabilities. First, the limit expect value of the BLASTN and TBLASTN comparisons may be raised to improve their sensitivity. However, this will also increase the noise in the BLAST comparison. To avoid this noise increase, local TBLASTN comparisons can be performed with selected chunks of query protein and genomic sequence. Finally, users can perform nucleotide and peptide pattern searches.

BLAST sensitivity

By default, GeneTuner performs gapped, unfiltered BLASTN and TBLASTN comparisons with a limit expect value of 10. This value can be raised or lowered with the keys "E" and "D". Raising the limit expect value is useful when long stretches of the starting and target sequences are dissimilar. On the other hand, a higher limit is likely to produce multiple non-significant hits, which may also slow down execution of the program.

Local TBLASTN

TBLASTN searches can be confined to specific regions of the query protein and the target contig. This procedure allows the user to increase the sensitivity of the search while avoiding hits outside the region of interest. Also, this mode is optimized for short query sequences. The boundaries for this search can be defined in two complementary ways:

The limit expect value for the current local TBLASTN comparison can be raised or lowered by pressing "Y" and "H". Local hits are discarded when another local or general TBLASTN is performed.

Search patterns

The genomic contig can be probed with nucleotide and peptide patterns. This allows the user to find very short exons and conserved motifs which may not be amenable to BLAST searches.
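
As an illustration, a nucleotide pattern search around a cursor position could look like the Python sketch below. It is not GeneTuner's implementation: it uses Python's re module in place of Perl regular expressions, takes 0-based positions, and ignores case. Both function names are hypothetical.

```python
import re


def find_next(contig, pattern, cursor):
    """Start of the first match strictly downstream of cursor, or None."""
    match = re.compile(pattern, re.IGNORECASE).search(contig, cursor + 1)
    return match.start() if match else None


def find_previous(contig, pattern, cursor):
    """Start of the last match beginning upstream of cursor, or None.

    Only non-overlapping matches are considered, as reported by finditer.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    starts = [m.start() for m in regex.finditer(contig) if m.start() < cursor]
    return starts[-1] if starts else None
```

A character class such as AG[GT] matches either AGG or AGT, which is the kind of short motif a BLAST search is unlikely to report.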

To enter a nucleotide pattern, press Ctrl+F. To enter a peptide pattern, press Ctrl+B. Then, press F2 to find the previous occurrence of the pattern upstream from the cursor or F3 to find the next occurrence of the pattern downstream from the cursor. Patterns use Perl regular expression syntax. The following elements are supported:

Project files

Project files are saved into the projects folder with the .gt extension. They contain the following information:

 tbnpath  => path to the folder where the starting ".tbn"
             files are stored
 basepath => path to the folder where the result folders
             will be stored
 dbpath   => path to the genomic (template) sequence
 ppath    => path to the starting protein sequences folder
 aa_ext   => extension of the starting protein sequences
 npath    => path to the starting nucleotide sequences
             folder (optional)
 nt_ext   => extension of the starting nucleotide sequences
             (optional)

Both GeneTuner and BlastSniffer can create compatible project files from user input.
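
The fields listed above can be checked with a few lines of Python once they have been read into a dictionary. This sketch does not read .gt files, because their on-disk syntax is not specified here; it only encodes which keys are required and which are optional. The function name is hypothetical.

```python
REQUIRED_KEYS = ("tbnpath", "basepath", "dbpath", "ppath", "aa_ext")
OPTIONAL_KEYS = ("npath", "nt_ext")


def check_project(settings):
    """Return a list of problems found in a dict of project settings."""
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if not settings.get(key)
    ]
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    problems += [
        f"unknown key: {key}" for key in sorted(settings) if key not in known
    ]
    return problems
```

An empty list means that every required field is present and no unexpected field was found.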


ARGUMENTS

 --help        print this help


OPTIONS

None implemented.


FILES

 genetuner.pl       script file


DEPENDENCIES

GeneTuner requires BioPerl, specifically the Bio::DB::Fasta and Bio::SeqIO modules. The Win32 version also requires Win32::Console, while the *NIX version requires Term::Screen.


INPUT

BlastSniffer creates the input files GeneTuner needs. For a gene called example.aa, GeneTuner needs one file called example.xml in the basepath/example folder. This file must contain an XML (Extensible Markup Language) description of the predicted gene. Here is a simple example:

 <gene>                                            
   <exon>                                           
     <chromosome>chr8</chromosome>                  
     <template>                                     
       <from>28268822</from>                        
       <to>28268947</to>                            
     </template>                                    
     <warnings>Query does not start at 1</warnings> 
   </exon>                                          
   <exon>                                           
     <chromosome>chr8</chromosome>                  
     <template>                                     
       <from>28269898</from>                        
       <to>28270059</to>                            
     </template>                                            
   </exon>                                          
 </gene>
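
A file with this layout can be read with Python's standard library, as in the minimal sketch below. It relies only on the element names visible in the example; any other elements that real files may contain are ignored. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_exons(xml_text):
    """Return one dict per <exon> element of a gene description."""
    exons = []
    for exon in ET.fromstring(xml_text).findall("exon"):
        exons.append({
            "chromosome": exon.findtext("chromosome"),
            "start": int(exon.findtext("template/from")),
            "end": int(exon.findtext("template/to")),
            "warnings": exon.findtext("warnings"),  # None when absent
        })
    return exons
```

For the example above, and assuming inclusive coordinates, the two exons are 126 and 162 bases long, both multiples of three.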


OUTPUT


LICENSE

This program is free software and can be redistributed under the same terms as Perl. See http://www.perl.com/pub/a/language/misc/Artistic.html


AUTHOR

Copyright (C) 2008, Victor Quesada

e-mail: quesadavictor@uniovi.es