Abstract
The rapid accumulation of sequenced plant genomes over the past decade has outpaced progress on the still-difficult problem of genome-wide protein-coding gene annotation. In all plant genomes, a substantial fraction of protein-coding genes are poorly annotated or unannotated and remain functionally uncharacterized. We identified unannotated proteins in three model organisms representing distinct branches of the green lineage (Viridiplantae): Arabidopsis thaliana (eudicot), Setaria viridis (monocot), and Chlamydomonas reinhardtii (chlorophyte alga). Using similarity searching, we identified a subset of these unannotated proteins that were conserved among the three species and defined them as Deep Green proteins. We then performed bioinformatic, genomic, and structural predictions to begin classifying Deep Green genes and proteins. Compared with the whole proteome of each species, the Deep Green set was enriched for proteins with predicted chloroplast targeting signals, which are predictive of photosynthetic or plastid functions. This result was consistent with the set's enrichment for daylight-phase diurnal expression patterns. Structural predictions using AlphaFold, compared against known structures, showed that a significant proportion of Deep Green proteins may possess novel folds. Although currently defined for only three organisms, the Deep Green genes and proteins provide a starting resource of high-value targets for further investigation of potentially new protein structures and functions conserved across the green lineage.
Original language | American English |
---|---|
Number of pages | 15 |
Journal | Plant Direct |
Volume | 7 |
Issue number | 12 |
DOIs | |
State | Published - 2023 |
NREL Publication Number
- NREL/JA-2700-83602
Keywords
- Arabidopsis
- Deep Green conserved proteins
- functional annotation
- protein structure
- Setaria