A CATALOG OF GENES FOR PLANT GLYCEROLIPID BIOSYNTHESIS
Sergei Mekhedov, Oscar Martínez de Ilárduya, and John Ohlrogge
Dept. of Botany and Plant Pathology, Michigan State University, East Lansing, MI 48824

Please note: This catalog is part of a manuscript:
Mekhedov, S., de Ilárduya, O., and Ohlrogge, J. (2000)
TOWARDS A FUNCTIONAL CATALOG OF THE PLANT GENOME:
A SURVEY OF GENES FOR LIPID BIOSYNTHESIS, Plant Physiology 122:389-401

UPDATE: For an expanded update focusing on Arabidopsis, please see:
THE ARABIDOPSIS LIPID GENE DATABASE

The current version of this catalog contains more than 2600 sequence files, many of them with annotation and the results of our analysis. This version is updated as of Aug. 1999 and includes essentially all publicly available genomic, cDNA, EST and GSS sequences for 62 plant polypeptides involved in lipid metabolism in higher plant species. An important feature of the catalog is the set of multiple alignments of amino acid sequences deduced from genomic and EST sequences. This version of the dataset covers approximately 70% of the Arabidopsis genome.
NOTE: Many pages of this database display best (or correctly) only on a 17-inch or larger monitor or at a screen resolution of 800 x 600 or greater. The multiple alignments and some other files are large and may take substantial time to download via modem or from locations outside the USA. If access to this site is too slow from your computer, you may request a CD-ROM version by sending a written request by postal mail to

Sergei Mekhedov, Dept. Botany and Plant Pathology, Michigan State University, East Lansing, MI 48824, USA
 
CATALOG
Table 1. List of reactions surveyed and estimated numbers of genes for each reaction in Arabidopsis and rice.
Schematic of fatty acid biosynthesis.
Table 2 and histograms. Comparison of numbers of ESTs in public databases for plant enzymes involved in fatty acid and lipid metabolism.
Schematic of glycerolipid biosynthesis.
Table 3. Genes which are missing from GenBank for plant enzymes involved in fatty acid and lipid metabolism.
Please let us know if you find errors, omissions, or other problems.

Please send comments to Sergei Mekhedov at mekhedov@pilot.msu.edu

ABSTRACT OF SUBMITTED MANUSCRIPT

Public databases now include vast amounts of recently acquired DNA sequence that is only partially annotated, often by automated methods that are prone to errors. The maximum information value of these databases can be derived only by further detailed analyses, which frequently require careful examination of records in the context of biological function. In this study we present an example of such an analysis focused on plant glycerolipid synthesis. Public databases were searched for sequences corresponding to 62 plant polypeptides involved in lipid metabolism. Comprehensive search results and analyses of genes, cDNAs and ESTs are available on-line (http://www.canr.msu.edu/lgc).

Multiple alignments provided a method to estimate the number of genes in gene families. Further analysis of sequences allowed us to tentatively identify several previously undescribed genes. For example, two genomic sequences were identified as candidates for the palmitate-specific monogalactosyldiacylglycerol desaturase (FAD5). A candidate genomic sequence for the keto-acyl-ACP synthase involved in mitochondrial fatty acid biosynthesis was also identified. Biotin carboxyl carrier protein (BCCP) in Arabidopsis is encoded by at least two genes, and the most abundant BCCP transcript has not yet been characterized. The large number (>165,000) of plant ESTs also provides an opportunity to perform "digital northern" comparisons of expression levels across many genes. In general, EST abundance correlated with the biochemical and flux characteristics of the enzymes in Arabidopsis leaf tissue. In a few cases, statistically significant differences in EST abundance were observed among genes that catalyze similar reactions in fatty acid metabolism. For example, FatB acyl-ACP thioesterase ESTs occur 21 times compared to 7 times for FatA acyl-ACP thioesterase, although flux through the FatA reaction is several-fold higher than through FatB. Such comparisons may provide initial clues toward previously undescribed regulatory phenomena. The abundance of ESTs for ACP compared with that of stearoyl-ACP desaturase and enoyl-ACP reductase suggests that the concentrations of some enzymes of fatty acid synthesis may be higher than those of their acyl-ACP substrates.
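The "digital northern" comparison in the abstract amounts to asking whether two EST counts drawn from the same pool of libraries differ more than sampling noise would allow. The manuscript's exact statistical procedure is not given on this page, so the following is only a minimal sketch under an assumed null model: if FatA and FatB were expressed equally, each of the 28 ESTs matching either gene would be equally likely to come from either one, and the 21 versus 7 split can be checked with an exact two-sided binomial test. The function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than the
    observed one (the usual 'minimum likelihood' definition of two-sided).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point ties are counted as equal.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Counts from the text: FatB ESTs vs FatA ESTs.
# Null hypothesis (an assumption): each of the 28 ESTs is equally likely
# to belong to FatB or to FatA.
fatb, fata = 21, 7
p_value = binomial_two_sided_p(fatb, fatb + fata)
print(f"FatB={fatb}, FatA={fata}, two-sided p = {p_value:.4f}")
# prints: FatB=21, FatA=7, two-sided p = 0.0125
```

Under these assumptions the split is unlikely to arise by chance. The model ignores differences between libraries, tissues and cloning biases, so it is best read as a first screen for candidate regulatory differences rather than a direct measure of expression.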

CONCLUSIONS

By surveying GenBank data and ESTs, the data-mining analyses described in the manuscript have yielded several new types of information. First, a number of previously undescribed genes for plant glycerolipid synthesis have been putatively identified. Second, the extent to which proteins of the plant lipid synthesis pathway are encoded by gene families, and the size of each family, have been estimated. Third, more than 160,000 publicly available ESTs have been analyzed to provide a digital northern estimate of gene expression levels for 62 plant proteins involved in lipid metabolism. With only a few exceptions, the EST abundance patterns for Arabidopsis and rice are very similar, which lends support to this method of estimating relative gene expression levels. Such a pathway-wide overview has not been available from previous analyses and has provided new insights into the regulation of expression of the pathway. In the near future, as sequence data continue to accumulate rapidly, such detailed analyses of gene sequences and ESTs for many metabolic pathways will become an essential approach that contributes to the development of a functional catalog of plant genes. Of course, such analyses require revision as more information becomes available, and a website provides users with a convenient source of such updates. The database constructed from this survey will be updated as the complete Arabidopsis and rice genome sequences become available.
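One way to picture how multiple alignments yield an estimate of gene family size: sequences that overlap substantially and are nearly identical are assumed to come from the same gene, and the remaining distinct groups are counted. The sketch below illustrates that idea under assumed parameters (an identity cutoff meant to tolerate EST sequencing errors, plus a minimum number of overlapping aligned columns); it is not the procedure used to build the catalog, and `pairwise_identity`, `estimate_family_size` and the toy sequences are hypothetical.

```python
from typing import List, Tuple

GAP = "-"


def pairwise_identity(a: str, b: str) -> Tuple[float, int]:
    """Identity over aligned columns where both sequences have a residue.

    Returns (fraction of identical residues, number of overlapping columns).
    """
    overlap = same = 0
    for x, y in zip(a, b):
        if x != GAP and y != GAP:
            overlap += 1
            if x == y:
                same += 1
    return (same / overlap if overlap else 0.0), overlap


def estimate_family_size(aligned: List[str], min_identity: float = 0.97,
                         min_overlap: int = 50) -> int:
    """Count groups of aligned sequences linked by high identity (single linkage).

    Assumption (hypothetical parameters): two sequences belong to the same gene
    if they share at least `min_overlap` columns at >= `min_identity` identity.
    """
    parent = list(range(len(aligned)))

    def find(i: int) -> int:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(aligned)):
        for j in range(i + 1, len(aligned)):
            identity, overlap = pairwise_identity(aligned[i], aligned[j])
            if overlap >= min_overlap and identity >= min_identity:
                parent[find(i)] = find(j)  # merge the two groups
    return len({find(i) for i in range(len(aligned))})


# Toy gapped alignment (not data from the catalog): the first two reads
# overlap perfectly; the third differs from the first at 2 of 12 columns.
toy = [
    "ATGGCTTCAAGC----",
    "----CTTCAAGCTTGA",
    "ATGGCATCTAGC----",
]
print(estimate_family_size(toy, min_overlap=8))  # 2
```

Single linkage (via union-find) merges fragments transitively, so a read that bridges two non-overlapping reads joins them into one group. Fragments that never overlap stay separate, which tends to inflate the count, while paralogs more similar than the cutoff are merged, which lowers it; either way the result is an estimate to be checked against the alignments themselves.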