Source | Dataset name |
---|---|
Genome assembly of A. thaliana | A. thaliana_chromosomes.fasta |
RepeatExplorer2 clustering results | A. thaliana RepeatExplorer2 - Archive |
 | CLUSTER_TABLE_manual_correction.csv |
A. thaliana TAIR annotation | A.thaliana_TAIR_genes.gff |
 | A.thaliana_TAIR_ssr.gff |
 | A.thaliana_TAIR_transposons.gff |
Input Parameter | Value |
---|---|
RepeatExplorer archive | A. thaliana RepeatExplorer2 - Archive |
minimal read depth coverage of contigs | 5 |
min_length | 50 |
Input Parameter | Value |
---|---|
Contigs - Library of Repeats from TAREAN/RepeatExplorer pipeline | Contigs(Repeat library) extracted from RepeatExplorer archive (obtained in step 3.) |
CLUSTER_TABLE from RepeatExplorer pipeline | CLUSTER_TABLE_manual_correction.csv (imported from data library in step 1) |
Output will be named Annotated Repeat Library. It is a FASTA file with updated sequence names that include classification information:
>CL3Contig14#All/repeat/mobile_element/Class_I/LTR/Ty3_gypsy/chromovirus/Tekay
ATAAACTGTTGTTTTCCTTTGACAGGCTGGGTAGTATTATGTTAGCCACGTTATGCTGTCAAAATTTTATTGATTGGTGG
GTTATTTAATTAGCCACTGCAGTAGGCGGGTTTCCGTGGACCAGCCTATTAGAGGGGCACGGTAAACCCCGTTATATTTA
CCTTAGTATGAGAGGCGCGAAGGATATCTCCTAAGGTACGATAGAGTTCACTTCACGCC
>CL3Contig16#All/repeat/mobile_element/Class_I/LTR/Ty3_gypsy/chromovirus/Tekay
ATAGGAATAATTATACGAGTTCGAATCAATAGCGATGTAGAAATCTTGCTATTGTGAAGTGAGTAACCTTATCATCATCT
TAATTATCTAAAGTATGTTTT
>CL12Contig19#All/repeat/mobile_element/Class_I/LTR/Ty1_copia/SIRE
GGAGAAATGACGGGCTGAAGAATATTGAGAAACAGAACTGACTTAGTCGACCAAGATGTGAGTTAGTCGACTAAATGCTC
TCTGCCAAAATCTGGACTTCAAGACAAAACCAACTTAGTCGACCAAGAAATGAGTTAGTCGACTAAATGC
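The header layout shown above (contig name, then `#`, then a `/`-separated classification path) can be split with a few lines of Python. This is a minimal sketch that assumes every header follows the pattern of the three examples; `parse_header` and `read_headers` are hypothetical helper names and are not part of the RepeatExplorer tools.

```python
def parse_header(header):
    """Split an annotated FASTA header into (contig name, classification levels)."""
    # Assumption: the contig name and the classification are separated by '#'.
    name, _, classification = header.lstrip(">").strip().partition("#")
    levels = classification.split("/") if classification else []
    return name, levels


def read_headers(fasta_text):
    """Yield (name, levels) for every header line in FASTA-formatted text."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            yield parse_header(line)


# Shortened sequences, for illustration only.
example = """>CL3Contig14#All/repeat/mobile_element/Class_I/LTR/Ty3_gypsy/chromovirus/Tekay
ATAAACTGTTGTTTTCC
>CL12Contig19#All/repeat/mobile_element/Class_I/LTR/Ty1_copia/SIRE
GGAGAAATGACGGGCTG
"""

for name, levels in read_headers(example):
    # The last level is the most specific classification.
    print(name, levels[-1] if levels else "unclassified")
```

The last element of the classification path is the most specific level, for example `Tekay` or `SIRE` in the records above.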
Input Parameter | Value |
---|---|
Genome Assembly to annotate | A.thaliana_chromosomes.fasta (imported from data library in step 1) |
RepeatExplorer based Library of Repetitive Sequences | Annotated Repeat Library (obtained in previous step 4.) |
sensitivity | Default sensitivity |
Run time ~ 2 minutes
There are two output datasets: the raw output from RepeatMasker and the parsed repeat annotation in GFF3 format.
Rename the GFF3 output to “RM_RE_library”. This GFF3 file will be used later as input for the genome browser.
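If the GFF3 dataset is downloaded from the history, a quick sanity check is to sum the annotated length per sequence. The sketch below relies only on the standard nine-column, tab-separated GFF3 layout; the function name `annotated_length_by_seqid` and the example records are hypothetical, and overlapping features are counted more than once.

```python
from collections import defaultdict


def annotated_length_by_seqid(gff3_text):
    """Return the summed feature length (bp) per sequence ID in GFF3 text."""
    totals = defaultdict(int)
    for line in gff3_text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF3 feature line
        start, end = int(fields[3]), int(fields[4])
        # GFF3 coordinates are 1-based and inclusive.
        totals[fields[0]] += end - start + 1
    return dict(totals)


# Made-up records, for illustration only.
example = (
    "##gff-version 3\n"
    "Chr1\tRM\trepeat\t1\t100\t.\t+\t.\tName=x\n"
    "Chr1\tRM\trepeat\t201\t250\t.\t-\t.\tName=y\n"
    "Chr2\tRM\trepeat\t10\t19\t.\t+\t.\tName=z\n"
)
print(annotated_length_by_seqid(example))
```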
Input Parameter | Value |
---|---|
Choose the type of sequence data | Fasta |
Sequences in fasta format | A. thaliana_chromosomes.fasta (imported in step 1.) |
Select taxon and protein domain database version (REXdb) | Viridiplantae_version_3.0 |
Select scoring matrix | BLOSUM80 |
Run iterative search | No |
Run time ~ 60 minutes
This tool creates three output datasets. For subsequent steps we will use the output labeled “DANTE on data 1, full output”, which is an annotation in GFF3 format.
Rename the output dataset to “DANTE”.
Input Parameter | Value |
---|---|
Filtered gff3 output from DANTE pipeline | DANTE (output from step 6.) |
Reference sequence matching DANTE output | A.thaliana_chromosomes.fasta (imported from data library in step 1) |
Run time ~ 5 minutes
This output contains the LTR retrotransposon annotation in the GFF3 format.
Rename the output dataset to “Unfiltered_LTR_retrotransposons”. This output must be processed to exclude potentially chimeric elements.
Input Parameter | Value |
---|---|
GFF3 output from DANTE_LTR retrotransposon identification pipeline | Unfiltered_LTR_retrotransposons (output from step 7.) |
Reference sequence matching input GFF3 | A.thaliana_chromosomes.fasta (imported from data library in step 1) |
This step creates several outputs:
Dataset | Note |
---|---|
Validated LTR retrotransposons annotation (GFF3) | Annotation in GFF3 format |
Non-redundant library of LTR retrotransposons (FASTA) | Sequences in FASTA format to be used for library based assembly annotation |
Library of full length LTR retrotransposons (FASTA) | Sequences in FASTA format to be used for library based assembly annotation. This dataset will be used in the next step for library based assembly annotation |
Library of 5'LTR of retrotransposons (FASTA) | |
Library of 3'LTR of retrotransposons (FASTA) | |
LTR retrotransposons lengths summary | Graphical summary |
Reported retrotransposons are divided into the following ranks. The rank of each element is recorded in the GFF3 file (Validated LTR retrotransposons annotation):
Rank | Annotation |
---|---|
DLTP | Elements with identified protein Domains, 5'LTR, 3'LTR, TSD and PBS |
DLP | Elements with identified protein Domains, 5'LTR, 3'LTR and PBS (TSD was not found) |
DLT | Elements with identified protein Domains, 5'LTR, 3'LTR and TSD (PBS was not found) |
DL | Elements with identified protein Domains, 5'LTR and 3'LTR (PBS and TSD were not found) |
D | Cluster of protein Domains with the same classification |
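The rank codes in the table are built by appending one letter per detected structural feature. The following sketch only illustrates that naming scheme; `rank_code` is a hypothetical helper and is not how DANTE_LTR itself assigns ranks.

```python
def rank_code(has_ltrs, has_tsd, has_pbs):
    """Build a rank code from the features found for one element.

    D = protein Domains, L = both LTRs, T = TSD, P = PBS.
    """
    code = "D"  # every reported element has protein domains
    if not has_ltrs:
        # Without LTRs the element is only a cluster of domains.
        return code
    code += "L"
    if has_tsd:
        code += "T"
    if has_pbs:
        code += "P"
    return code


print(rank_code(True, True, True))     # DLTP
print(rank_code(True, False, True))    # DLP
print(rank_code(True, True, False))    # DLT
print(rank_code(True, False, False))   # DL
print(rank_code(False, False, False))  # D
```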
Rename GFF3 annotation output from this tool to “LTR_retrotransposons”.
Input Parameter | Value |
---|---|
Genome/ Assembly to annotate | A.thaliana_chromosomes.fasta (imported from data library in step 1.) |
RepeatExplorer based Library of Repetitive Sequences | Library of full length LTR retrotransposons (FASTA) (from step 8.) |
sensitivity | Default sensitivity |
Rename the GFF3 output to “RM_LTR_library”
Input Parameter | Value |
---|---|
Reference genome to display | Use a genome from history |
Select the reference genome | A.thaliana_chromosomes.fasta |
We will add two track groups to the genome browser.
Click on +Insert Track Group and rename the Track Category from “Default” to “Annotation”. Then click on +Insert Annotation Track and select all GFF3 datasets available in the history. Below, change JBrowse Track Type [Advanced] to “Neat Canvas Features”.