The advent of rapid and inexpensive DNA sequencing has led to an explosion of data waiting to be transformed into knowledge about genome organization and function. Gene prediction is customarily the starting point for genome analysis. This paper presents a bioinformatics study of the oil palm genome, including comparative genomics analysis, database and tool development, and mining of biological data for genes of interest. We have annotated 26,059 oil palm genes integrated from two independent gene-prediction pipelines, Fgenesh++ and Seqping. This integrated annotation constitutes a significant improvement over the preliminary annotation published in 2013. We conducted a comprehensive analysis of intronless, resistance and fatty acid biosynthesis genes, and demonstrated the high quality of the current genome annotation. We identified 3,658 intronless genes in the oil palm genome, an important resource for evolutionary studies. Further analysis of the oil palm genes revealed 210 candidate resistance genes involved in pathogen defense. Fatty acids have diverse applications ranging from food to industrial feedstocks, and we identified 42 key genes involved in fatty acid biosynthesis in oil palm. These results provide an important resource for studies of plant genomes and a theoretical foundation for marker-assisted breeding of oil palm and related crops.
- oil palm
- gene prediction