Genetic characterization of Liriodendron seed orchards with EST-SSR markers

Liriodendron tulipifera L. is a widespread, fast-growing pioneer tree species native to eastern North America. Commonly known as yellow-poplar, tulip tree, or tulip-poplar, the species is valued both ecologically and economically. It is perhaps the most commonly used utility hardwood in the USA, and is planted widely for reforestation and, in varietal forms, as an ornamental. Although most seedlings used for reforestation today derive from collections in natural populations, two known seed orchards, established in the 1960s and 1970s from plus-tree selections (i.e., superior phenotypes), have been used for local and regional planting needs in Tennessee and South Carolina. However, very little is known about the population genetics of yellow-poplar or the genetic composition of the existing seed orchards. In this study, 194 grafted yellow-poplar trees from a Clemson, SC orchard and a Knoxville, TN orchard were genetically characterized with 15 simple sequence repeat (SSR) markers developed from expressed sequence tags (ESTs). Of the 15 EST-SSR markers, 14 had a polymorphic information content (PIC) of at least 0.5. There was no significant difference between the Clemson and Knoxville orchards in average effective number of alleles (5.93 vs 3.95), observed and expected heterozygosity (Ho: 0.64 vs 0.58; He: 0.74 vs 0.70), Nei’s expected heterozygosity (0.74 vs 0.58), or Shannon’s Information index (1.84 vs 1.51). The larger Clemson orchard exhibited a significantly greater number of observed alleles than the Knoxville orchard (15.3 vs 7.4). Overall, substantial genetic diversity is captured in the Clemson and Knoxville orchards.


Introduction
Liriodendron tulipifera L., commonly known as yellow-poplar or tulip-poplar, is a widespread, fast-growing pioneer hardwood species of considerable economic value in the forests of eastern North America. Yellow-poplar is distributed predominantly east of the Mississippi River from the Gulf Coast to southern Canada (28° to 43° north latitude) [35]. According to the forest inventory analysis [11], as surveyed from 2006 to 2012, the total saw log volume of L. tulipifera on timberland in the United States was 25.9 billion cubic feet, with the majority (65%) located in the southeastern United States. The species is shade intolerant and highly competitive, growing faster than Acer rubrum L. (red maple) and Quercus rubra L. (northern red oak) seedlings under a variety of silvicultural understory treatments (Beckage and Clark 2003). Yellow-poplar is often seen as a pioneer species in old fields. As a component of 16 forest cover types, this species' degree of dominance has created differentiation between the ecological communities [46]. In addition, yellow-poplar is valued as a nectar source for honey production, as a source of wildlife food (mast), and as a large shade tree in urban plantings [3]. The wood of yellow-poplar is used in a diverse range of products, such as furniture, pallets, and framing construction, as well as pulp [12,41]. Chemical extracts from yellow-poplar wood or leaves have proven useful, such as sesquiterpenes, which have an anti-tumor effect and act as antifeedants against herbivores [27], and antimicrobial alkaloids [2].
L. tulipifera has been cultivated since 1663 [5] and is currently widely planted in eastern forests. Although seed orchards have been established to meet local or regional planting needs in the U.S.A. [6,36], the genetic diversity of Liriodendron seed orchards in relation to natural stands has not been studied. Because seed orchards are the bridge between breeding and silvicultural activities, the genetic diversity of tree seed orchards determines the genetic quality of future forest stands and forms the basis for further improving the management of genetic resources and for the genetic modification of cultivars to meet new environmental challenges. Thus, the lack of this information limits the utilization of these Liriodendron orchards in a tree improvement program.
The primary goal of our study was to determine the genetic composition and diversity in two Liriodendron seed orchards in the southeastern USA. The genus comprises two species, L. tulipifera and L. chinense. Although the two species separated 10 to 16 million years ago [32], they are quite similar morphologically and are cross fertile [26,34], and the hybrids exhibit heterosis [31,39]. Because the incomplete records suggest that the Clemson orchard may contain L. chinense or hybrids, we first used the sequence of a chloroplast gene, maturase K (matK), to discriminate the two Liriodendron species and their hybrids. We then investigated the genetic diversity and allele richness among selections of this unique native species in each orchard as a first step toward contrasting orchard-produced seedling diversity with natural diversity. We chose simple sequence repeat (SSR) markers (also called microsatellites) for the study because SSR markers are co-dominant, easily reproduced and scored, highly polymorphic, abundant throughout the genome, and have higher information content than isoenzyme and dominant markers [45].

Plant materials and DNA isolation
Fresh leaves of all Liriodendron trees (165) from the Clemson seed orchard and 31 trees from the Knoxville seed orchard were collected in the spring of 2013 and stored in plastic bags at -80°C prior to DNA isolation. All of these trees represented different clones, as validated by the SSR markers used in this study. Leaves from a Liriodendron tulipifera tree (accession number 70921 H) from the US National Arboretum (collected by Kevin Conrad) were also included in the study. Total genomic DNA was isolated from leaves using a CTAB protocol as described in [16] and suspended in TE buffer (Tris base 6.1 g/L, EDTA 0.37 g/L, pH 8). The quality and concentration of genomic DNA from individual plants were determined with a NanoDrop 3300 (Thermo Scientific, Wilmington, Delaware, USA) and by electrophoresis on 0.8% agarose gels.
The conditions for polymerase chain reaction (PCR) amplification of the maturase K (matK) gene were as follows: 5 minutes of initial denaturation at 94°C; 35 cycles of touch-down PCR with 30 seconds of denaturation at 94°C, 30 seconds of annealing at 60-50°C (first cycle at 60°C, then each subsequent cycle 1°C lower than the previous one until a 51°C annealing temperature was reached, followed by 25 cycles each with a 50°C annealing temperature), and 3 minutes of extension at 72°C; and a final extension at 72°C for 10 minutes. Before being sequenced with 1 μl of 10 μM forward or reverse primer, PCR products were cleaned with ExoAP mix (89 μl H2O + 10 μl of 5000 U/ml Antarctic Phosphatase + 1 μl of 20000 U/ml Exonuclease I) for 30 minutes in a reaction containing 1 μl of PCR product and 1 μl of ExoAP mix, followed by a heat inactivation step at 80°C for 15 minutes. An 834-bp segment of the maturase K gene from each tree was aligned with MUSCLE and curated with Gblocks, and a phylogenetic tree was built by maximum likelihood with PhyML (http://www.phylogeny.fr/) [7].
The maturase K gene sequences of L. tulipifera (GI: 5731451), L. chinense (GI: 7239759), and a hybrid (GI: 389955358) available in GenBank were included in the analysis.
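The touch-down annealing schedule described above can be written out cycle by cycle. The following is a minimal sketch, not the thermocycler program actually used; the function name and its parameters are hypothetical and only restate the temperatures and cycle counts given in the text.

```python
def touchdown_annealing_schedule(start_c=60, floor_c=50, total_cycles=35):
    """Return the annealing temperature (in °C) for every PCR cycle.

    Assumption: the temperature drops 1 °C per cycle from start_c and
    then stays at floor_c for all remaining cycles, as described in the text.
    """
    return [max(start_c - cycle, floor_c) for cycle in range(total_cycles)]


schedule = touchdown_annealing_schedule()

# Ten touch-down cycles (60 °C down to 51 °C) followed by 25 cycles at 50 °C.
touchdown_phase = schedule[:10]
plateau_phase = schedule[10:]
print(touchdown_phase)
print(len(plateau_phase), "cycles at", plateau_phase[0], "°C")
```

With the default arguments the list has 35 entries, which matches the 10 touch-down cycles plus the 25 cycles at 50°C stated above.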

L. tulipifera EST-SSR markers, PCR amplification, and allele sizing
Twenty simple sequence repeat (SSR) markers (also called microsatellites) were used to investigate the genetic composition of the Liriodendron seed orchards. These markers included seven expressed sequence tag (EST)-SSR markers (LT002, LT015, LT021, LT086, LT096, LT131, LT157) previously characterized by electrophoresis on 8% polyacrylamide gels [42] and thirteen new markers (LTCU19, LTCU40, LTCU51, LTCU53, LTCU125, LTCU139, LTCU142, LTCU143, LTCU145, LTCU150, LTCU151, LTCU152, LTCU154) mined from a comprehensive EST dataset [22]. PCR amplification for each marker was performed with genomic DNA of Liriodendron trees from the Clemson and Knoxville seed orchards and the US National Arboretum. For more cost-effective primer screening, an M13 tail (5'-CACGACGTTGTAAAACGAC-3') was added to the 5'-end of the forward primer of each marker pair in order to amplify the fragments using a complementary adapter with a fluorescent dye (6-FAM, VIC, NED, or PET) at its 5'-end (Applied Biosystems, Foster City, California, USA). Polymerase chain reactions were carried out in a 12.5-μl solution comprising approximately 75 ng DNA template, 0.052 U/μl Promega Taq DNA polymerase, 0.16 nM forward primer, 0.4 nM reverse primer, 0.4 nM fluorescent M13 primer, 0.24 mM of each dNTP, and 1.2× Promega PCR buffer. The PCR profile consisted of an initial denaturation at 94°C for 3 minutes, followed by 10 cycles of 1 minute at 94°C, 1 minute at the annealing temperature (Table 1), and 1 minute 15 seconds at 72°C, and then 35 cycles of 1 minute at 94°C, 1 minute at 58°C, and 1 minute at 72°C, with a final extension of amplified DNA at 72°C for 5 minutes.
An aliquot of 1.5 μl of PCR product was treated with 1.5 μl of 10-fold diluted ExoSAP-IT (Affymetrix Inc., Cleveland, OH, USA) at 37°C for 30 minutes and then at 80°C for 15 minutes to remove single-stranded primers, which might interfere with fragment analysis. After being diluted to 100 ng/μl, 1 μl of each sample was mixed with 0.1 μl of LIZ600 and 8.9 μl of Hi-Di Formamide, denatured at 95°C for 5 minutes, and then put on ice for 10 minutes before being separated on an ABI 3730 Genetic Analyzer. The dye set was DS-33 (6-FAM, VIC, NED, PET, and LIZ). Allele sizes were scored with GeneMapper 4.0 (Applied Biosystems, Foster City, California, USA). Functional annotation of EST-SSRs was performed by a homology search of reassembled ESTs against the non-redundant (nr) NCBI database using the BLASTx algorithm [1].

Data analysis
MICRO-CHECKER [38] was employed to check for potential genotyping errors arising from large allele drop-out and stuttering. Observed and expected heterozygosities and polymorphic information content (PIC) were calculated using Cervus 2.0 [25]. Deviations from Hardy-Weinberg equilibrium and the Shannon's Information index were calculated with GENEPOP (http://genepop.curtin.edu.au/, Raymond and Rousset 1995).
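The per-locus statistics named above follow standard textbook definitions, which can be illustrated with a short sketch. This is not the code used by Cervus or GENEPOP; the function name, the small-sample correction applied to He, and the toy genotypes (allele sizes 180 and 190) are assumptions made only for illustration.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Summarize one co-dominant locus from a list of (allele, allele) tuples."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]  # allele frequencies
    sum_sq = sum(p ** 2 for p in freqs)             # sum of p_i^2
    sum_4th = sum(p ** 4 for p in freqs)            # sum of p_i^4
    return {
        "Na": len(counts),                                   # observed alleles
        "Ne": 1.0 / sum_sq,                                  # effective alleles
        "Ho": sum(1 for a, b in genotypes if a != b) / n,    # observed heterozygosity
        "He_nei": 1.0 - sum_sq,                              # Nei's gene diversity
        # Assumed small-sample correction: 2n / (2n - 1).
        "He_unbiased": (2 * n / (2 * n - 1)) * (1.0 - sum_sq),
        "I": -sum(p * math.log(p) for p in freqs),           # Shannon's index
        # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2,
        # where the double sum equals (sum p_i^2)^2 - sum p_i^4.
        "PIC": 1.0 - sum_sq - (sum_sq ** 2 - sum_4th),
    }


# Hypothetical genotypes for four trees at one locus.
example = [(180, 190), (180, 180), (190, 190), (180, 190)]
stats = locus_diversity(example)
for name, value in stats.items():
    print(name, round(value, 4))
```

Orchard-level values such as those reported in this study would be averages of these per-locus quantities over all scored loci.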

Distinguishing between two Liriodendron species with maturase K (matK) sequence
It is not clear when L. chinense was first introduced to the U.S.A., but the two Liriodendron species hybridize readily [26] and efforts in crossing have been well documented [e.g., 31,34]. The two species are similar morphologically, except that L. chinense is smaller in stature and has larger, more deeply lobed leaves and smaller flowers. However, our attempt to tell these two species apart by morphology failed: the leaf shape varied depending on age (Supplementary figure S1) and the flowers were located too high for sampling. Molecular techniques including biochemical analysis [34], isozymes [14], and fingerprinting with random amplified polymorphic DNA (RAPD) [21] have been explored for the discrimination of Liriodendron species and their hybrids. In 2012, Zhang et al. reported an SSR marker that amplified a 190-bp fragment from L. chinense, a 180-bp fragment from L. tulipifera, and both the 190- and 180-bp fragments from hybrids. In this study, the matK sequence was employed. The matK gene is located within the intron of trnK and codes for a maturase-like protein involved in group II intron splicing [37]. The trnKUUU-matK region, ranging from approximately 2.2 kb (liverworts) to 2.6 kb (seed plants) in size, is universally present in land plants, and only a few exceptions of secondary loss or reorganization are known to date [40]. Because the matK gene evolves more rapidly than other plastid genes, it has become a valuable marker for lower-level phylogenetic reconstruction in systematic and evolutionary studies. The Clemson seed orchard contains 165 surviving trees. The matK sequence was amplified from each of the 165 trees in the Clemson orchard (Supplementary figure S2). When the amplicons were sequenced from both ends, an 834-bp segment of high quality was obtained for each tree, representing 55% of the full-length gene. Only eight nucleotides differed between the two Liriodendron species within the 834-bp segment.
As shown in Figure 1 and Supplementary figure S3, only trees #CU24 and #134 were not L. tulipifera. Their hybrid status was established by their having seven nucleotide differences from L. tulipifera and two nucleotide differences from L. chinense. These results confirm the record of hybrids being planted in the Clemson orchard. Thus, these two hybrids were excluded from the genetic composition analysis. Further, our study indicates that L. tulipifera, L. chinense, and their hybrids contain unique nucleotide compositions in the matK sequence that can be utilized in distinguishing the species and hybrids.
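The nucleotide-difference counts quoted above amount to a position-by-position comparison of aligned sequences. The sketch below illustrates only that counting step; it is not the MUSCLE/Gblocks/PhyML workflow used in this study, and the function names and the short toy sequences are hypothetical stand-ins for the aligned 834-bp matK segments.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x != y)


def differences_to_references(query, references):
    """Return the mismatch count between a query and each named reference."""
    return {name: count_differences(query, ref) for name, ref in references.items()}


# Toy 10-bp sequences (hypothetical; real input would be the aligned matK segments).
references = {
    "L. tulipifera": "ACGTACGTAC",
    "L. chinense": "ACGAACGTTC",
}
query_tree = "ACGAACGTAC"

between_species = count_differences(references["L. tulipifera"], references["L. chinense"])
per_reference = differences_to_references(query_tree, references)
print("differences between references:", between_species)
print("query vs references:", per_reference)
```

A tree whose sequence matches neither reference exactly, as with the two trees reported above, would show non-zero counts against both.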

Amplification of EST-SSR loci in Liriodendron
No evidence for large allele dropout was found for any of the 20 markers. Stuttering occurred in five markers (LT131, LT157, LTCU40, LTCU142, and LTCU143), and these five markers were excluded from further analyses. All of the remaining 15 markers were polymorphic in both the Clemson and Knoxville orchards.
The 20 markers were also tested on one L. tulipifera and one L. chinense tree from the US National Arboretum. Eleven loci were heterozygous and one locus failed in the L. tulipifera tree. PCR amplification for all 20 markers was successful in the L. chinense tree, although sizing in an ABI 3730 Genetic Analyzer failed for LT157 and LTCU142 due to stuttering. Fourteen loci were heterozygous in the L. chinense tree. This indicates a high frequency of transferability of L. tulipifera EST-SSR markers to L. chinense, supporting the previous findings of a 72.4% success rate by [42] and 82.1% by [43]. This is expected because EST-SSRs have generally demonstrated a high frequency of cross-species transferability despite less polymorphism compared to genomic SSRs [9,12,44]. Among the 194 L. tulipifera trees included in the study, the number of alleles per locus ranged from 3 to 26 (mean=13.0) (Table 2).
Many genomic resources, such as expressed sequence tag (EST) databases [15,22,23] and genomic DNA libraries [24], have been developed for L. tulipifera. Through these resources, several thousand putative SSR markers have been identified by in silico mining. However, only 345 L. tulipifera SSR markers have been tested for polymorphism by polyacrylamide denaturing gels [42,43]. Compared to other species, Liriodendron has lacked development of polymorphic and informative SSR markers. As a result, no genetic linkage maps of Liriodendron have been reported. This is in contrast with the species' ecological and economic value and phylogenetic position as a basal angiosperm.

Genetic composition of the L. tulipifera orchards
While only two loci (LT002 and LT015) deviated significantly from Hardy-Weinberg proportions in the Clemson population, 10 loci deviated in the Knoxville population (p<0.05) (Supplementary Tables S1 and S2). This may be due to the insufficient sample size from the Knoxville population. As shown in Tables 3 and 4 [...]. However, the differences were not statistically significant (p=0.05, t-test) except for the observed number of alleles. The different number of trees from the orchards included in the study, 163 from Clemson vs 31 from Knoxville, may be a contributing factor. This is the first report of the genetic composition of Liriodendron cultivated populations in North America, and it provides basic data on genetic diversity and allele richness among selections of this unique native species. [43] examined 27 trees from a cultivated population of L. tulipifera in Jurong, Jiangsu Province, China, with 39 polymorphic EST-SSR loci through electrophoresis in 6% polyacrylamide denaturing gels and visualization with silver nitrate staining. The number of alleles per locus ranged from three to 18, and the average Ho and He were 0.68 and 0.78, respectively. Compared to this cultivated population in China, the two [...]. However, these studies utilized either allozymes or amplified fragment length polymorphism (AFLP) markers, which usually have lower information content than SSR markers. None of the reported expected heterozygosities from these studies exceeded 0.29. Overall, substantial genetic diversity is captured in the Clemson and Knoxville seed orchards.

Conclusion
The data obtained in this study will be useful in future applications such as prediction of genetic gain and gene diversity in the seed orchards. Nei's genetic distance between the two orchards was 0.39, which was the lowest among all comparisons (Table 5). The L. chinense and L. tulipifera trees from the National Arboretum exhibited the largest genetic distance (1.17). The two orchards and the L. tulipifera sample from the US National Arboretum grouped together in the UPGMA dendrogram. The genetic distance of the hybrids in the Clemson orchard was closest to the Clemson orchard (0.50), followed by the Knoxville orchard (0.80) and L. chinense from the National Arboretum (0.88), and then by the L. tulipifera from the National Arboretum (1.17) (Figure 2). With a widespread range of distribution, L. tulipifera has adapted to many different ecological conditions and is one of the species becoming increasingly dominant in forests due to its quick response to increases in light reaching the forest floor and its rapid initial growth rate [8]. Its increasingly important role in forestry and wood products makes the study of Liriodendron of great interest.
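For readers unfamiliar with the distance measure reported above, the following sketch shows one common formulation, Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), computed from allele frequencies. The text does not state which variant of Nei's distance or which software was used, so this choice is an assumption; the function name and the toy frequencies are hypothetical.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance between two populations.

    Each population is a list with one dict per locus mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share the same non-empty set of loci")
    j_x = j_y = j_xy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        j_x += sum(p ** 2 for p in freq_x.values())
        j_y += sum(p ** 2 for p in freq_y.values())
        j_xy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())
    if j_xy == 0.0:
        return math.inf  # no shared alleles at any locus
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)


# Toy single-locus frequencies (hypothetical allele sizes 180 and 190).
orchard_a = [{180: 0.5, 190: 0.5}]
orchard_b = [{180: 1.0}]

d_same = nei_standard_distance(orchard_a, orchard_a)
d_diff = nei_standard_distance(orchard_a, orchard_b)
print(round(d_same, 4), round(d_diff, 4))
```

Identical allele frequencies give a distance of zero, and the value grows as the populations share fewer alleles, which is the sense in which the two orchards are described above as the closest pair.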
Our study provides a first look at the genetic diversity and allele richness among selections of this unique native species, and provides a foundation for further genetic and breeding exploration. The polymorphic markers developed in this study will serve as a resource enabling the future study of population dynamics and adaptive variation in Liriodendron.