GENE TEAMS and DOMAIN TEAMS
The algorithmic part:
Bergeron, A., Corteel, S., Raffinot, M., 2002. The algorithmic of gene teams. In: Workshop on Algorithms in Bioinformatics (WABI). No. 2452 in Lecture Notes in Computer Science. Springer-Verlag, Berlin, pp. 464--476.
Béal, M.-P., Bergeron, A., Corteel, S., Raffinot, M., 2003. An Algorithmic View of Gene Teams. In: Theoretical Computer Science 320(2-3), pp. 395--418, 2004.
The bioinformatic part:
Gene Teams
Luc, N., Risler, J.-L., Bergeron, A., Raffinot, M., 2002. Gene Teams: A New Formalization of Gene Clusters for Comparative Genomics. In: Comput Biol Chem. 2003 Feb;27(1):59-67.
Domain Teams (New!)
S. Pasek, A. Bergeron, J-L. Risler, A. Louis, E. Ollivier and M. Raffinot, 2004. Identification of Genomic Features using Domain Teams. Technical report 01-07-2004-LGI of Laboratoire Génome et Informatique, Evry, France.
Paper (pdf + gzip)
Source code and supplementary material.
Old Team program (but still of use!)
The TEAM (version 3) program is here (tar + gzip): TEAM-3.0.tar.gz.
It was written by Nicolas Luc and Mathieu Raffinot.
This archive contains 9 files:
TEAM-3.0/Makefile
TEAM-3.0/LICENCE
TEAM-3.0/team_option.c
TEAM-3.0/team_option.h
TEAM-3.0/parse_annot_file.c
TEAM-3.0/parse_annot_file.h
TEAM-3.0/teams-10-Oct-2002.c
TEAM-3.0/Example.annot
TEAM-3.0/Example.data
To compile it, simply type make in the subdirectory TEAM-3.0. This should build the executable team. Note that it uses getopt.h, which must be available on your system.
Here is the README.
See the LICENCE file before any use.
I. USAGE (version 3)
team -f <data file> -d <delta> [-s] [[-v] [-n] [-p]] [-c] [-g <name of designated gene>] [-a <file of annotations>]
***
Required parameters
-f <data file>: name of the main input file, in the format described in section II.
-d <integer>: maximal distance allowed between two consecutive genes of a gene team.
***
Optional parameters
[-s]: silent option, do not output anything (for speed tests).
[-v]: verbose option.
    Default: prints the indices of the elements of the gene teams on each strand,
    separated by as many '*' as their distance on the strand.
    Together with:
    [-n]: the names of the genes are printed on each strand instead of their indices
    [-p]: the distance is indicated with '<-dist->' instead of '*'
[-c]: takes into account the circularity of certain genomes if they are marked as circular
    in the input file (DIR-LENGTH C-[size], see next section)
[-g <name of designated gene>]: outputs only the gene teams containing the designated gene.
[-a <file of annotations>]: annotations of the groups of genes of the input file,
    in the format specified in the next section.
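When TEAM is driven from a script, the option list above can be assembled programmatically. The following is a minimal sketch in Python; the helper name build_team_command and its keyword arguments are hypothetical, and only the flags themselves come from the usage line above. It builds the argument list without running the program.

```python
def build_team_command(data_file, delta, *, silent=False, verbose=False,
                       names=False, distances=False, circular=False,
                       gene=None, annotations=None, program="./team"):
    """Return the argv list for one TEAM run (hypothetical helper).

    Only the flags documented in the usage line are emitted; the helper
    does not check whether the binary or the input files exist.
    """
    cmd = [program, "-f", str(data_file), "-d", str(int(delta))]
    if silent:
        cmd.append("-s")
    if verbose:
        cmd.append("-v")
    if names:
        cmd.append("-n")
    if distances:
        cmd.append("-p")
    if circular:
        cmd.append("-c")
    if gene is not None:
        cmd += ["-g", gene]
    if annotations is not None:
        cmd += ["-a", str(annotations)]
    return cmd


if __name__ == "__main__":
    # Print the command line for a verbose run with gene names
    print(" ".join(build_team_command("Example.data", 2, verbose=True, names=True)))
```

The resulting list can be passed to subprocess.run once the team executable has been built.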
II. Input Format
1. Input format of the main data file
A typical example:
file "Example.data":
DIR-LENGTH  L  C-12  C-9  C-7
IdA         2  4     4    1
IdB         1  7     9    4
TestC       4  8     5    6
ElE         3  6     3    2
IdFinal     6  11    6    3
The DIR-LENGTH line is required. It specifies the direction (Linear: L, Circular: C) of each chromosome. If a chromosome is circular (C), the letter must be followed by "-" and the chromosome length: "C-12" means circular with length 12.
Then each gene identifier (IdA, IdB, ...) is followed by its position on each chromosome, separated by tabs or spaces.
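To make the layout concrete, here is a minimal reading sketch in Python. It is not the parser shipped in parse_annot_file.c or the TEAM sources; the function name parse_team_data is hypothetical, and it assumes the DIR-LENGTH line comes first and that each gene sits on one line with exactly one integer position per chromosome.

```python
EXAMPLE_DATA = """\
DIR-LENGTH L C-12 C-9 C-7
IdA 2 4 4 1
IdB 1 7 9 4
TestC 4 8 5 6
ElE 3 6 3 2
IdFinal 6 11 6 3
"""


def parse_team_data(text):
    """Parse a TEAM data file into (chromosomes, positions).

    chromosomes: list of ("L", None) or ("C", length) tuples.
    positions:   dict mapping gene name -> list of positions, one per chromosome.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0][0] != "DIR-LENGTH":
        raise ValueError("the first line must start with DIR-LENGTH")
    chromosomes = []
    for field in rows[0][1:]:
        if field == "L":
            chromosomes.append(("L", None))
        elif field.startswith("C-"):
            chromosomes.append(("C", int(field[2:])))
        else:
            raise ValueError("unknown chromosome descriptor: " + field)
    positions = {}
    for fields in rows[1:]:
        name, values = fields[0], [int(v) for v in fields[1:]]
        if len(values) != len(chromosomes):
            raise ValueError("wrong number of positions for gene " + name)
        positions[name] = values
    return chromosomes, positions


if __name__ == "__main__":
    chroms, pos = parse_team_data(EXAMPLE_DATA)
    print(len(chroms), "chromosomes,", len(pos), "genes")
```

Whitespace splitting accepts both tabs and spaces, as the format description allows.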
2. Input format of the optional annotation file
A typical example corresponding to Example.data is "Example.annot":
A> IdA
Annotation part 1
Annotation part 2 of IdA
A> IdB
Annotation of IdB
A> TestC
Annotation of TestC
A> IdFinal
Annotation of IdFinal
A> ElE
Annotation ELE1
Annotation ELE2
Annotation ELE3
Each gene identifier (introduced by A>) is followed by its annotation on all genomes, which may span several lines. The gene identifiers are not required to appear in the same order as in the .data file.
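The annotation layout can be read with a few lines of Python. This is only an illustrative sketch, not the code of parse_annot_file.c; the function name parse_annotations is hypothetical, and it assumes every line after an "A>" marker belongs to that gene until the next marker.

```python
EXAMPLE_ANNOT = """\
A> IdA
Annotation part 1
Annotation part 2 of IdA
A> IdB
Annotation of IdB
A> TestC
Annotation of TestC
A> IdFinal
Annotation of IdFinal
A> ElE
Annotation ELE1
Annotation ELE2
Annotation ELE3
"""


def parse_annotations(text):
    """Return a dict mapping each gene identifier to its annotation lines."""
    annotations = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("A>"):
            # A new gene identifier starts a new annotation block
            current = line[2:].strip()
            annotations[current] = []
        elif current is not None:
            annotations[current].append(line)
    return annotations


if __name__ == "__main__":
    for gene, lines in parse_annotations(EXAMPLE_ANNOT).items():
        print(gene, "->", len(lines), "annotation line(s)")
```

Because the result is keyed by identifier, the order of the blocks in the file does not matter.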
IV. Example of output and explanations
The simplest case: the program just returns the different gene teams, listed by gene identifier.
[root@localhost PROG]# ./team -f Ex.data -d 2
-> Lonely gene: [IdFinal]
-> Lonely gene: [IdB]
-> Lonely gene: [TestC]
-> Gene team: [IdA,ElE]
Again:
[root@localhost PROG]# ./team -f Ex.data -d 3
-> Gene team: [IdB,IdA,ElE,TestC,IdFinal]
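The two runs above can be checked by hand with a naive refinement: sort the candidate genes on one chromosome, cut wherever two neighbours are more than delta apart, and repeat on each piece until no chromosome produces a cut. The Python sketch below follows that idea. It is not the algorithm implemented in TEAM, which is more efficient. It assumes linear chromosomes, one position per gene per chromosome, distance taken as the difference of positions, and the positions of the Example.data file of section II; the function names are hypothetical.

```python
EXAMPLE_POSITIONS = {
    "IdA": [2, 4, 4, 1],
    "IdB": [1, 7, 9, 4],
    "TestC": [4, 8, 5, 6],
    "ElE": [3, 6, 3, 2],
    "IdFinal": [6, 11, 6, 3],
}


def split_on_chromosome(genes, positions, chrom, delta):
    """Cut a gene list wherever two neighbours on chrom are more than delta apart."""
    ordered = sorted(genes, key=lambda g: positions[g][chrom])
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if positions[cur][chrom] - positions[prev][chrom] > delta:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return groups


def gene_teams(positions, delta):
    """Return the maximal gene sets that stay delta-connected on every chromosome."""
    n_chrom = len(next(iter(positions.values())))
    pending = [list(positions)]
    teams = []
    while pending:
        candidate = pending.pop()
        for chrom in range(n_chrom):
            groups = split_on_chromosome(candidate, positions, chrom, delta)
            if len(groups) > 1:
                pending.extend(groups)  # refine each piece separately
                break
        else:
            teams.append(set(candidate))  # no chromosome splits it
    return teams


if __name__ == "__main__":
    for delta in (2, 3):
        print(delta, [sorted(team) for team in gene_teams(EXAMPLE_POSITIONS, delta)])
```

A team of size one corresponds to what the program reports as a lonely gene.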
Verbose option: it returns the gene teams and a schematic representation of their positions on the genome.
All chromosomes are here considered as linear (no -c option).
[root@localhost PROG]# ./team -f Ex.data -d 2 -v
Gene: 0, Name IdA
Gene: 1, Name IdB
Gene: 2, Name TestC
Gene: 3, Name ElE
Gene: 4, Name IdFinal
All genes will be considered
Number of input genes: 5
Number of input chromosomes: 4
-> Lonely gene: [IdFinal]
-> Lonely gene: [IdB]
-> Lonely gene: [TestC]
-> Gene team: [IdA:0,ElE:3]
strand[1](line) :
0 3
strand[2](line) :
0 * 3
strand[3](line) :
3 0
strand[4](line) :
0 3
Verbose option with -n:
[root@localhost PROG]# ./team -f Ex.data -d 2 -v -n
Gene: 0, Name IdA
Gene: 1, Name IdB
Gene: 2, Name TestC
Gene: 3, Name ElE
Gene: 4, Name IdFinal
All genes will be considered
Number of input genes: 5
Number of input chromosomes: 4
-> Lonely gene: [IdFinal]
-> Lonely gene: [IdB]
-> Lonely gene: [TestC]
-> Gene team: [IdA,ElE]
strand[1](line) :
IdA ElE
strand[2](line) :
IdA * ElE
strand[3](line) :
ElE IdA
strand[4](line) :
IdA ElE
Verbose option with -p:
[root@localhost PROG]# ./team -f Ex.data -d 2 -v -p
Gene: 0, Name IdA
Gene: 1, Name IdB
Gene: 2, Name TestC
Gene: 3, Name ElE
Gene: 4, Name IdFinal
All genes will be considered
Number of input genes: 5
Number of input chromosomes: 4
-> Lonely gene: [IdFinal]
-> Lonely gene: [IdB]
-> Lonely gene: [TestC]
-> Gene team: [IdA:0,ElE:3]
strand[1](line) :
0 3
strand[2](line) :
0 <-1-> 3
strand[3](line) :
3 0
strand[4](line) :
0 3
Verbose option with -n and -p:
[root@localhost PROG]# ./team -f Ex.data -d 3 -v -n -p
Gene: 0, Name IdA
Gene: 1, Name IdB
Gene: 2, Name TestC
Gene: 3, Name ElE
Gene: 4, Name IdFinal
All genes will be considered
Number of input genes: 5
Number of input chromosomes: 4
-> Gene team: [IdB,IdA,ElE,TestC,IdFinal]
strand[1](line) :
IdB IdA ElE TestC <-1-> IdFinal
strand[2](line) :
IdA <-1-> ElE IdB TestC <-2-> IdFinal
strand[3](line) :
ElE IdA TestC IdFinal <-2-> IdB
strand[4](line) :
IdA ElE IdFinal IdB <-1-> TestC
Circular chromosomes: the -c option is set.
[root@localhost PROG]# ./team -f Ex.data -d 2 -v -n -c
Gene: 0, Name IdA
Gene: 1, Name IdB
Gene: 2, Name TestC
Gene: 3, Name ElE
Gene: 4, Name IdFinal
All genes will be considered
Number of input genes: 5
Number of input chromosomes: 4
-> Lonely gene: [IdFinal]
-> Lonely gene: [IdB]
-> Gene team: [IdA,ElE,TestC]
strand[1](line) :
IdA ElE TestC
strand[2](line) :
IdA * ElE * TestC
strand[3](line) :
ElE IdA TestC
strand[4](line) :
TestC * IdA ElE
The (line) indication just indicates whether the instance of the gene team on that strand has to be considered as circular or not. Warning: it does not correspond to the circularity or linearity of the original genome.
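On a circular chromosome, the last and first genes of the sorted order are also neighbours, which is why TestC joins IdA and ElE in the run above. The Python sketch below illustrates that wrap-around step on a single chromosome. It is an assumption-based illustration rather than the TEAM implementation: the function name split_circular is hypothetical, and the wrap-around gap is assumed to be first + length - last.

```python
def split_circular(pos, length, delta):
    """Group genes of one circular chromosome into delta-connected runs.

    pos:    dict mapping gene name -> position on this chromosome
    length: chromosome length (the number after "C-" in DIR-LENGTH)
    """
    ordered = sorted(pos, key=pos.get)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if pos[cur] - pos[prev] > delta:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    # Gap that crosses the origin, from the last gene back to the first one
    wrap_gap = pos[ordered[0]] + length - pos[ordered[-1]]
    if len(groups) > 1 and wrap_gap <= delta:
        # The last run continues into the first one across the origin
        groups[0] = groups.pop() + groups[0]
    return groups


if __name__ == "__main__":
    # Fourth chromosome of Example.data (C-7), restricted to three genes
    print(split_circular({"IdA": 1, "ElE": 2, "TestC": 6}, 7, 2))
    # Third chromosome (C-9): IdB stays apart, the wrap-around gap is 3
    print(split_circular({"ElE": 3, "IdA": 4, "TestC": 5, "IdB": 9}, 9, 2))
```

With delta 2 on the chromosome of length 7, the gap from TestC (position 6) back to IdA (position 1) is 2, so the three genes form one run that starts at TestC, matching the order shown for strand[4].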
With annotations:
[root@localhost PROG]# ./team -f Ex.data -d 2 -v -n -c -a Example.annot
Reading the annotation file: Example.annot
... done (5 annotations)
Gene: 0, Name IdA
Gene: 1, Name IdB
Gene: 2, Name TestC
Gene: 3, Name ElE
Gene: 4, Name IdFinal
All genes will be considered
Number of input genes: 5
Number of input chromosomes: 4
-> Lonely gene: [IdFinal]
*** IdFinal
\ Annotation of IdFinal
-> Lonely gene: [IdB]
*** IdB
\ Annotation of IdB
-> Gene team: [IdA,ElE,TestC]
*** IdA
\ Annotation part 1
\ Annotation part 2 of IdA
*** ElE
\ Annotation ELE1
\ Annotation ELE2
\ Annotation ELE3
\
*** TestC
\ Annotation of TestC
strand[1](line) :
IdA ElE TestC
strand[2](line) :
IdA * ElE * TestC
strand[3](line) :
ElE IdA TestC
strand[4](line) :
TestC * IdA ElE