GENE TEAMS and DOMAIN TEAMS


The algorithmic part:


Bergeron, A., Corteel, S., Raffinot, M., 2002. The algorithmic of gene teams.
    In: Workshop on Algorithms in Bioinformatics (WABI). No. 2452 in Lecture
    Notes in Computer Science. Springer-Verlag, Berlin, pp. 464--476.

Béal, M.-P., Bergeron, A., Corteel, S., Raffinot, M., 2003. An Algorithmic View of Gene Teams.
    In: Theoretical Computer Science 320(2-3), pp. 395--418, 2004.


The bioinformatic part:


Gene Teams

Luc, N., Risler, J.-L., Bergeron, A., Raffinot, M., 2002. Gene Teams:
    A New Formalization of Gene Clusters for Comparative Genomics.
    In: Comput Biol Chem. 2003 Feb;27(1):59-67.



Domain Teams (New !)

S. Pasek, A. Bergeron, J-L. Risler, A. Louis, E. Ollivier and M. Raffinot, 2004
     Identification of Genomic Features using Domain Teams
    Technical report 01-07-2004-LGI of Laboratoire Génome et Informatique, Evry, France.
Paper (pdf + gzip)
Source code and supplementary material.




Old TEAM program (but still useful!)

The TEAM (version 3) program is here (tar + gzip):
TEAM-3.0.tar.gz.
It was written by Nicolas Luc and Mathieu Raffinot.

This archive contains 9 files:
TEAM-3.0/Makefile
TEAM-3.0/LICENCE
TEAM-3.0/team_option.c
TEAM-3.0/team_option.h
TEAM-3.0/parse_annot_file.c
TEAM-3.0/parse_annot_file.h
TEAM-3.0/teams-10-Oct-2002.c
TEAM-3.0/Example.annot
TEAM-3.0/Example.data

To compile it, simply type make in the subdirectory TEAM-3.0. This should build the
team program. Note that it uses getopt.h, which must be available on your system.

Here is the README.
See the LICENCE file before any use.


I. USAGE (version 3)

team -f <data file> -d <delta> [-s] [[-v] [-n] [-p]] [-c] [-g <name of designated gene>]
        [-a <file of annotations>]


*** Required parameters

-f <data file> gives the name of the main input file, in the
format described in section II.


-d <delta> is an integer: the maximal distance allowed in a gene team between two
consecutive genes.


*** Optional parameters

[-s]: silent option, does not output anything (for speed tests).

[-v]: verbose option.
      Default: prints the indices of the elements of the gene teams on each strand,
               separated by as many '*' as there are positions between them on the
               strand (in the examples below, genes at positions 4 and 6 are
               separated by a single '*').

      Together with:

      [-n]: the names of the genes are printed on each strand instead of their indices
      [-p]: the distance is indicated with '<- dist ->' instead of '*'

[-c]: takes into account the circularity of genomes that are marked as circular in
       the input file (DIR-LENGTH C-[size], see next section)


[-g <name of designated gene>]: outputs only the gene teams containing the designated gene.

[-a <file of annotations>]: annotations of the groups of genes of the
                            input file, in the format specified in the next section.



II. Input Format


1. Input format of the main data file

A typical example: file "Example.data":
DIR-LENGTH     L     C-12     C-9     C-7
IdA            2     4        4       1
IdB            1     7        9       4
TestC          4     8        5       6
ElE            3     6        3       2
IdFinal        6     11       6       3

The DIR-LENGTH line is required. It specifies the direction (linear: L,
circular: C) of each chromosome. If a chromosome is circular (C), the C
must be followed by a hyphen ("-") and the chromosome length: "C-12" means
circular with length 12.

Then each gene identifier (IdA, IdB, ...) is followed by its position on
each chromosome, separated by tabs or spaces.
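
As an illustration of the format just described, here is a minimal Python sketch of a reader for the main data file. It is not part of TEAM: the function name parse_team_data and the returned structures are assumptions made for this example, and it assumes that DIR-LENGTH is the first non-empty line.

```python
def parse_team_data(text):
    """Parse a TEAM-style data file into chromosome descriptors and positions."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, gene_rows = rows[0], rows[1:]
    if header[0] != "DIR-LENGTH":
        raise ValueError("the first line must start with DIR-LENGTH")

    chromosomes = []  # one (is_circular, length) pair per chromosome
    for field in header[1:]:
        if field == "L":
            chromosomes.append((False, None))
        elif field.startswith("C-"):
            chromosomes.append((True, int(field[2:])))
        else:
            raise ValueError(f"unknown chromosome descriptor: {field!r}")

    positions = {}  # gene identifier -> list of positions, one per chromosome
    for name, *values in gene_rows:
        if len(values) != len(chromosomes):
            raise ValueError(f"{name}: expected {len(chromosomes)} positions")
        positions[name] = [int(v) for v in values]
    return chromosomes, positions


EXAMPLE_DATA = """\
DIR-LENGTH     L     C-12     C-9     C-7
IdA            2     4        4       1
IdB            1     7        9       4
TestC          4     8        5       6
ElE            3     6        3       2
IdFinal        6     11       6       3
"""

chromosomes, positions = parse_team_data(EXAMPLE_DATA)
print(chromosomes)           # [(False, None), (True, 12), (True, 9), (True, 7)]
print(positions["IdFinal"])  # [6, 11, 6, 3]
```

Keeping the circular length next to the L/C flag makes it easy to decide later, per chromosome, whether a wrap-around distance has to be considered.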


2. Input format of the optional annotation file

A typical example corresponding to Example.data is "Example.annot":

A> IdA
Annotation part 1
Annotation part 2 of IdA

A> IdB
Annotation of IdB

A> TestC
Annotation of TestC

A> IdFinal
Annotation of IdFinal

A> ElE
Annotation ELE1
Annotation ELE2
Annotation ELE3

Each gene identifier (introduced by A>) is followed by its annotation
(one or more lines) for all genomes. The gene identifiers are not required
to appear in the same order as in the .data file.
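
The annotation format can be read with a few lines of Python. The sketch below is only an illustration, not the parser shipped in parse_annot_file.c: the name parse_annotations is hypothetical, and skipping blank lines is an assumption of this example.

```python
def parse_annotations(text):
    """Map each gene identifier to the list of its annotation lines."""
    annotations = {}
    current = None  # identifier of the record being read
    for line in text.splitlines():
        if line.startswith("A>"):
            current = line[2:].strip()
            annotations[current] = []
        elif current is not None and line.strip():
            annotations[current].append(line.strip())
    return annotations


EXAMPLE_ANNOT = """\
A> IdA
Annotation part 1
Annotation part 2 of IdA

A> IdB
Annotation of IdB

A> ElE
Annotation ELE1
Annotation ELE2
Annotation ELE3
"""

annotations = parse_annotations(EXAMPLE_ANNOT)
print(annotations["IdA"])       # ['Annotation part 1', 'Annotation part 2 of IdA']
print(len(annotations["ElE"]))  # 3
```

Because the result is keyed by identifier, the order of the records in the annotation file does not matter.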


III. Example of Output and Explanations


In the simplest case, the program just returns the different gene teams, listing the gene identifiers.


[root@localhost PROG]# ./team -f Ex.data -d 2
  -> Lonely gene: [IdFinal]

  -> Lonely gene: [IdB]

  -> Lonely gene: [TestC]

  -> Gene team:   [IdA,ElE]


Again, with delta = 3:


[root@localhost PROG]# ./team -f Ex.data -d 3
  -> Gene team:   [IdB,IdA,ElE,TestC,IdFinal]
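
The two runs above can be checked by hand with a naive procedure: take a set of genes, sort it on each chromosome, and split it wherever two consecutive genes are more than delta apart, repeating until no chromosome splits a set any further. The Python sketch below does exactly that for linear chromosomes. It is a simple illustration written for this page, not the algorithm of the papers cited above and not the code of the TEAM program; the function name gene_teams is an assumption.

```python
# Positions of the five genes on the four chromosomes of Example.data.
POSITIONS = {
    "IdA": [2, 4, 4, 1],
    "IdB": [1, 7, 9, 4],
    "TestC": [4, 8, 5, 6],
    "ElE": [3, 6, 3, 2],
    "IdFinal": [6, 11, 6, 3],
}


def gene_teams(positions, delta):
    """Return the gene teams for the given delta (all chromosomes linear)."""
    n_chrom = len(next(iter(positions.values())))
    teams, stack = [], [sorted(positions)]
    while stack:
        genes = stack.pop()
        for c in range(n_chrom):
            ordered = sorted(genes, key=lambda g: positions[g][c])
            parts = [[ordered[0]]]
            for prev, gene in zip(ordered, ordered[1:]):
                if positions[gene][c] - positions[prev][c] > delta:
                    parts.append([])  # gap too large: start a new run
                parts[-1].append(gene)
            if len(parts) > 1:
                stack.extend(parts)  # re-examine each run separately
                break
        else:
            teams.append(set(genes))  # no chromosome splits this set
    return teams


for delta in (2, 3):
    print(delta, sorted(sorted(team) for team in gene_teams(POSITIONS, delta)))
```

With delta = 2 this yields the team [IdA, ElE] and three lonely genes; with delta = 3 all five genes form a single team, as in the listings above.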



Verbose option:

It returns the gene teams and a schematic representation of their positions on the genome.
All chromosomes are here considered linear (no -c option).

[root@localhost PROG]# ./team -f Ex.data -d 2 -v

Gene: 0, Name  IdA
Gene: 1, Name  IdB
Gene: 2, Name  TestC
Gene: 3, Name  ElE
Gene: 4, Name  IdFinal

All genes will be considered

Number of input genes: 5
Number of input chromosomes: 4

  -> Lonely gene: [IdFinal]

  -> Lonely gene: [IdB]

  -> Lonely gene: [TestC]

  -> Gene team:   [IdA:0,ElE:3]
strand[1](line) :
  0  3
strand[2](line) :
  0  *  3
strand[3](line) :
  3  0
strand[4](line) :
  0  3


Verbose option with -n:

[root@localhost PROG]# ./team -f Ex.data -d 2 -v -n

Gene: 0, Name  IdA
Gene: 1, Name  IdB
Gene: 2, Name  TestC
Gene: 3, Name  ElE
Gene: 4, Name  IdFinal

All genes will be considered

Number of input genes: 5
Number of input chromosomes: 4

  -> Lonely gene: [IdFinal]

  -> Lonely gene: [IdB]

  -> Lonely gene: [TestC]

  -> Gene team:   [IdA,ElE]
strand[1](line) :
  IdA  ElE
strand[2](line) :
  IdA  *  ElE
strand[3](line) :
  ElE  IdA
strand[4](line) :
  IdA  ElE


Verbose option with -p:

[root@localhost PROG]# ./team -f Ex.data -d 2 -v -p

Gene: 0, Name  IdA
Gene: 1, Name  IdB
Gene: 2, Name  TestC
Gene: 3, Name  ElE
Gene: 4, Name  IdFinal

All genes will be considered

Number of input genes: 5
Number of input chromosomes: 4

  -> Lonely gene: [IdFinal]

  -> Lonely gene: [IdB]

  -> Lonely gene: [TestC]

  -> Gene team:   [IdA:0,ElE:3]
strand[1](line) :
  0  3
strand[2](line) :
  0 <-1->  3
strand[3](line) :
  3  0
strand[4](line) :
  0  3


Verbose option with -n and -p:


[root@localhost PROG]# ./team -f Ex.data -d 3 -v -n -p

Gene: 0, Name  IdA
Gene: 1, Name  IdB
Gene: 2, Name  TestC
Gene: 3, Name  ElE
Gene: 4, Name  IdFinal

All genes will be considered

Number of input genes: 5
Number of input chromosomes: 4

  -> Gene team:   [IdB,IdA,ElE,TestC,IdFinal]
strand[1](line) :
  IdB  IdA  ElE  TestC <-1->  IdFinal
strand[2](line) :
  IdA <-1->  ElE  IdB  TestC <-2->  IdFinal
strand[3](line) :
  ElE  IdA  TestC  IdFinal <-2->  IdB
strand[4](line) :
  IdA  ElE  IdFinal  IdB <-1->  TestC
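
The strand lines of the verbose listings above can be reconstructed from the positions in the data file. In these listings the number shown between two genes is their position difference minus one (positions 8 and 11 give '<-2->'). The Python sketch below follows that reading for linear strands; the function name render_strand and the exact spacing are assumptions inferred from the listings, and repeating '  *' for larger gaps is a guess, since no listing shows that case.

```python
# Positions of the five genes on the four chromosomes of Example.data.
POSITIONS = {
    "IdA": [2, 4, 4, 1],
    "IdB": [1, 7, 9, 4],
    "TestC": [4, 8, 5, 6],
    "ElE": [3, 6, 3, 2],
    "IdFinal": [6, 11, 6, 3],
}


def render_strand(team, positions, chrom, use_numbers=False):
    """Draw the genes of a team in strand order, marking skipped positions."""
    ordered = sorted(team, key=lambda g: positions[g][chrom])
    out = "  " + ordered[0]
    for prev, gene in zip(ordered, ordered[1:]):
        skipped = positions[gene][chrom] - positions[prev][chrom] - 1
        if skipped > 0:
            # -p style prints the count, the default style prints stars
            out += f" <-{skipped}->" if use_numbers else "  *" * skipped
        out += "  " + gene
    return out


print(render_strand(POSITIONS, POSITIONS, 1, use_numbers=True))
print(render_strand({"IdA", "ElE"}, POSITIONS, 1))
```

The first call reproduces the strand[2] line of the last listing, and the second the strand[2] line of the -v -n listing for delta = 2.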



Circular chromosome. The -c option is set.

[root@localhost PROG]# ./team -f Ex.data -d 2 -v -n  -c

Gene: 0, Name  IdA
Gene: 1, Name  IdB
Gene: 2, Name  TestC
Gene: 3, Name  ElE
Gene: 4, Name  IdFinal

All genes will be considered

Number of input genes: 5
Number of input chromosomes: 4

  -> Lonely gene: [IdFinal]

  -> Lonely gene: [IdB]

  -> Gene team:   [IdA,ElE,TestC]
strand[1](line) :
  IdA  ElE  TestC
strand[2](line) :
  IdA  *  ElE  *  TestC
strand[3](line) :
  ElE  IdA  TestC
strand[4](line) :
  TestC  *  IdA  ElE


The (line) indication just indicates whether the instance of the gene team
on that strand is to be considered circular or not. Warning: it does not
correspond to the circularity or linearity of the original genome.
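
To see why TestC joins the team of IdA and ElE once -c is set, it helps to compute the wrap-around distance on the fourth chromosome (C-7): TestC is at position 6 and IdA at position 1, so going through the end of the circle the distance is 1 + 7 - 6 = 2. The Python sketch below splits a set of genes into runs on one chromosome and merges the last run with the first when the wrap-around gap is small enough. It is a hypothetical helper written for this explanation (split_by_gaps is not a TEAM function), and the wrap-around formula is an assumption consistent with the listing above.

```python
def split_by_gaps(genes, pos, delta, length=None):
    """Split genes into runs whose consecutive gaps are at most delta.

    pos maps each gene to its position on one chromosome; length is the
    chromosome length when it is circular, or None when it is linear.
    """
    ordered = sorted(genes, key=pos.__getitem__)
    parts = [[ordered[0]]]
    for prev, gene in zip(ordered, ordered[1:]):
        if pos[gene] - pos[prev] > delta:
            parts.append([])  # gap too large: start a new run
        parts[-1].append(gene)
    if length is not None and len(parts) > 1:
        # Distance from the last gene to the first one through the origin.
        wrap_gap = pos[ordered[0]] + length - pos[ordered[-1]]
        if wrap_gap <= delta:
            parts[0] = parts.pop() + parts[0]
    return parts


# Fourth chromosome of Example.data (C-7), restricted to three genes.
pos = {"IdA": 1, "ElE": 2, "TestC": 6}
print(split_by_gaps(pos, pos, 2))            # [['IdA', 'ElE'], ['TestC']]
print(split_by_gaps(pos, pos, 2, length=7))  # [['TestC', 'IdA', 'ElE']]
```

The merged run starts with TestC, which matches the order shown on strand[4] in the listing.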


With annotations:

[root@localhost PROG]# ./team -f Ex.data -d 2 -v -n  -c -a Example.annot
Reading the annotation file: Example.annot
... done (5 annotations)

Gene: 0, Name  IdA
Gene: 1, Name  IdB
Gene: 2, Name  TestC
Gene: 3, Name  ElE
Gene: 4, Name  IdFinal

All genes will be considered

Number of input genes: 5
Number of input chromosomes: 4

  -> Lonely gene: [IdFinal]

  *** IdFinal
    \ Annotation of IdFinal

  -> Lonely gene: [IdB]

  *** IdB
    \ Annotation of IdB

  -> Gene team:   [IdA,ElE,TestC]
  *** IdA
    \ Annotation part 1
    \ Annotation part 2 of IdA
  *** ElE
    \ Annotation ELE1
    \ Annotation ELE2
    \ Annotation ELE3
    \
  *** TestC
    \ Annotation of TestC

strand[1](line) :
  IdA  ElE  TestC
strand[2](line) :
  IdA  *  ElE  *  TestC
strand[3](line) :
  ElE  IdA  TestC
strand[4](line) :
  TestC  *  IdA  ElE