A mini foxtail millet with an Arabidopsis-like life cycle as a C4 model system

Foxtail millet (Setaria italica) is an important crop species and an emerging model plant for C4 grasses. However, functional genomics research on foxtail millet is challenging because of its long generation time, relatively large stature and recalcitrance to genetic transformation. Here we report the development of xiaomi, a rapid-cycling mini foxtail millet mutant as a C4 model system. Five to six generations of xiaomi can be grown in a year in growth chambers due to its short life cycle and small plant size, similar to Arabidopsis. A point mutation in the Phytochrome C (PHYC) gene was found to be causal for these characteristics. PHYC encodes a light receptor essential for photoperiodic flowering. A reference-grade xiaomi genome comprising 429.94 Mb of sequence was assembled and a gene-expression atlas from 11 different tissues was developed. These resources, together with an established highly efficient transformation system and a multi-omics database, make xiaomi an ideal model system for functional studies of C4 plants. This study developed xiaomi, a mini foxtail millet mutant, as a C4 model plant that has a short life cycle and small stature. To further enhance its model plant function, xiaomi’s genome was sequenced and an efficient transformation system was established.


Articles
Nature PlaNts (Fig. 1f and Supplementary Table 1). Thus, five to six generations of xiaomi can be completed in growth chambers in one year. Because of the small size of xiaomi, a set of 1,296 plants can be grown on two planting racks in a three-dimensional space of 1.2 m × 0.55 m × 2.0 m, similar to the space needed to grow an equal number of Arabidopsis plants.
A mutation in the PHYC gene is causal for the characteristics of xiaomi. To identify the causative mutation(s) responsible for early heading of xiaomi, we crossed it with G1-a landrace with a heading date of ~75 DAS under the long-day conditions. All 9 F 1 plants exhibited the G1-like late-heading phenotype, and the resulting F 2 populations comprising 268 individual plants showed a segregation ratio of 3:1 (χ 2 = 0.318 < χ 2 0.05(1) = 3.841) for the G1-like late heading date to the xiaomi-like early-heading date, suggesting that the early-heading date of xiaomi was caused by recessive mutation at a single locus. Using 106 early flowering F 2 individuals, this locus was mapped to a 212-kilobase (kb) region on chromosome 9, which harbours 27 genes, according to the annotation of the xiaomi reference genome (Fig. 2a). Comparison between the xiaomi genome sequence and the Jingu21 genome sequence, which was generated by genome resequencing, revealed the presence of only a single mutation in the mapped region-a transversion from G to T in the coding region of the gene Si9g09200 in xiaomi, which encodes a putative PHYC protein ( Fig. 2a and Supplementary Table 2). Of the 77 early-heading F 2 individuals examined, all were the T/T homozygous mutants. By contrast, of the 256 late-heading individuals examined, 92 were the G/G WT homozygotes and the other 164 were G/T heterozygotes, reflecting a perfect association between the genotypes and phenotypes. This mutation resulted in the creation of a stop codon, causing a truncated protein accounting for approximately 71% of transcripts (transcript 1) of the gene (Fig. 2a-d and Extended Data Fig. 1). Interestingly, this mutation also led to alternative splicing responsible for a frameshift deletion  Fig. 1). On the basis of the prediction, the truncated proteins lack about two-thirds (peptide 1) or one-third (peptide 2) of the second Per-Arn-Sim (PAS) domain, and the entire histidine kinase A (phospho-acceptor) (HK) domain and histidine kinase, DNA gyrase B and HSP90-like ATPase (HD) domain 13 (Fig. 2d). We sequenced another mutant, named xiaomi-2, which also derived from Jingu21. The xiaomi-2 mutant showed an early-heading phenotype similar to that of xiaomi (Extended Data Fig. 2a-d). Sequence comparison of the PHYC locus revealed a single point mutation (T674A) in the first exon of PHYC in xiaomi-2 resulting in a change from a conserved leucine to histone (Extended Data Figs. 2e,f and 3). This single nucleotide polymorphism (SNP) is perfectly associated with phenotypic segregation of 82 early-heading (A/A genotype) and 84 late-heading (49T/A genotype and 35T/T genotype) M3 plants derived from an M2 heterozygous mutant. Together, these observations confirm that the early-heading phenotype resulted from the mutation at the PHYC locus.

Nature PlaNts
To understand how the mutation at the PHYC locus affects flowering time under the long-day conditions, we performed RNA-sequencing (RNA-seq) analysis using the second leaves from 30-DAS plants (approximately 10 d before xiaomi began heading), with WT leaves collected at 30 DAS as a control. We found that the expression levels of several genes orthologous to the Arabidopsis oscillator genes PSEUDO-RESPONSE REGULATORs (PRRs), PHYTOCLOCK 1 (PCL1) and GIGANTEA (GI), which are critical for photoperiodic flowering, were significantly altered in xiaomi (Supplementary Fig. 1 and Supplementary Table 4). As expected, a putative downstream photoperiod gene orthologous to Ghd7 showed about 95-fold decrease in expression level, whereas the putative flowering genes orthologous to EARLY HEADING DATE 1 (Ehd1), HEADING DATE 3a (Hd3a) and APETALA1 (AP1)/FRUITFULL (FUL)-like MADS box genes, respectively, exhibited significant increases expression level in xiaomi (Supplementary Fig. 1 and Supplementary Table 4). Ehd1 is a key transcriptional regulator in photoperiodic flowering pathway in rice, which could active the transcription of the florigen Hd3a in leaves. The Hd3a protein produced in leaves is then transported to shoot apical meristem, where it induces the expression of AP1/FUL-like MADS box genes. Overall, these observations suggested that the early-heading phenotype of xiaomi was caused by disruption of the photoperiodic pathways.
Assembly and annotation of the xiaomi genome. To facilitate the use of xiaomi as a model plant, a total of 41.54 Gb (94.78× coverage) high-quality single-molecule real-time (SMRT) subread sequences were generated and assembled into 429.45 Mb of scaffold sequences, with a contig N50 of 19.85 Mb (Supplementary Tables 5-7). Of these, 399.40 Mb of scaffold sequences were anchored to nine super-scaffolds (chromosomes) with 137.33 million Hi-C-based paired-end reads (Extended Data Fig. 4 (Table 1). k-Mer analysis suggests that the draft assembly covers approximately 98.10% of the entire genome ( Supplementary Fig. 2). The error rate of the assembly is about 0.001% (one error per 100 kb), as estimated by Illumina DNA short reads. Single-copy orthologue analysis showed that 97.78% of the 1,440 benchmarking universal single-copy orthologs (BUSCO) genes were completely covered by the xiaomi genome, with only 0.90% incomplete and 1.32% not assembled or annotated (Supplementary Table 9). Collectively, these results indicate that the xiaomi genome v.1.0 can be used as a gold-standard reference by the research community.
Using a combination of de novo prediction and homology-based comparison, a total of 237.28 Mb (55.19%) of the xiaomi genome sequences were annotated as repetitive elements (Table 1 and  Supplementary Table 10). We annotated 34,436 protein-coding genes using 671,853 full length non-chimeric (FLNC) isoform reads produced by PacBio RS II and approximately 1,054.5 million short RNA-seq reads produced by the HiSeq X Ten platform, and a combination of ab initio prediction and protein-homology-based searches (Table 1 and Supplementary Table 11), of which 32,743 (95.08%) were located in the nine pseudochromosomes (Supplementary Table 12). These genes were searched against Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Eukaryotic Orthologous Groups (KOG), TrEMBL and nr databases and compared with the annotation of Arabidopsis and rice to retrieve homologues with known functions and a total of 33,789 genes (98.12%) were annotated (Supplementary Table 13). In addition, we annotated 919 rRNA genes, 3,516 transfer RNA genes, 2,631 pseudogenes, 340 microRNA (miRNA) precursors, 28,260 long non-coding RNA (lncRNA) precursors, and 1,318 circular RNA (circRNA) precursors (Table 1).

Nature PlaNts
All the genomic and transcriptomic data are publicly accessible through our user-friendly database (http://sky.sxau.edu.cn/MDSi. htm). In this database, researchers can navigate the genome by chromosome coordinates, gene or transcript symbols, or by BLAST search against the xiaomi genome, coding sequences or peptide sequences.
Comparison of genome sequences from xiaomi and other foxtail millet varieties. Compared with the three previously released genome sequences from Yugu1 5 , Zhanggu 8 and TT8 9 varieties, the xiaomi genome showed the highest quality in terms of the genome coverage, contig N50 values, and contig and gap numbers (Supplementary Table 14 Table 18). Only 1,030 genes in xiaomi were absent in both Yugu1 and Zhanggu; these were considered to be xiaomi-specific (Supplementary Table 19). A marked phenotypic difference between xiaomi and Yugu1 is their susceptibility and resistance, respectively, to downy mildew ( Supplementary Fig. 3), but it is unclear whether this difference is associated with any of the detected variety-specific genes.

Construction of a dynamic gene-expression atlas for xiaomi.
To develop a reference gene-expression atlas for functional interpretation and investigation of gene function, we measured transcript levels in 11 diverse tissues representing the major organs over various developmental stages of xiaomi (see Methods for details). A total of 1,054.51 M raw reads (~30 M reads per sample) were produced and analysed. A total of 31,226 (90.68%) genes were expressed in Of these genes, 85 (0.25%), including one transcription initiation factor gene (Si3G07600) and two ubiquitin-conjugating enzyme coding genes (Si1G37980 and Si2G05250), were constitutively expressed in all assayed tissues (Supplementary Table 21). These genes would be useful for transcript normalization before comparative gene-expression analysis. Moreover, we identified 1,218 organor tissue-specific genes and 1,226 organ-or tissue-preferentially expressed genes (Supplementary Figs. 4b,c and 5 and Supplementary  Tables 22 and 23).

Establishment of an efficient
Agrobacterium-mediated genetic transformation system. To pave the way for functional genomics studies, we tested various factors to develop an Agrobacterium-mediated transformation protocol for xiaomi. We challenged mature seeds as a starting material for callus induction to avoid the costly need for growing plants if fresh tissues such as the young inflorescence or immature embryos are used for callus induction 14 . After a series of trials, we observed that primary calli were not suitable for use in transformation, mainly due to the soft texture. Following three rounds of subculture on an improved callus-induction medium (CIM), however, compact embryogenic calli were obtained (Fig. 5a). We used the green fluorescent protein (GFP) reporter gene to monitor Agrobacterium infection efficiency, as indicated by multiple green spots in the calli (Fig. 5b,c), and effectiveness for selecting out the transgenic callus (Fig. 5d,e). The regeneration ability of xiaomi was well maintained on the CIM during subculture (Fig. 5f). Roots of transgenic plants expressing GFP could be induced easily on the rooting medium and rooted plants survived well after transplanting to soil ( Fig. 5g-i). We compared two commonly used selectable markers neomycin phosphotransferase II (NPT-II) and hygromycin phosphotransferase (HPT), and obtained transformation efficiency ranging from 8.05% to 38.75%, with an average of 23.28% for NPTII, and from 3.08% to 16.67%, with an average of 8.72% for HPT (Supplementary Table 24). We then confirmed the presence of transgenes in primary putative transgenic plants (T0) by PCR using primers to amplify the GFP gene, UBI promoter, and HPT or NPTII selectable marker genes ( Fig. 5j and Supplementary Fig. 6). We also observed GFP expression in both    Table 25). We grew T1 plants representing eight transgenic events in pots and observed no obvious phenotypic differences from non-transformed xiaomi plants ( Supplementary  Fig. 7). Collectively, we demonstrated the transgenic nature of the plants generated by Agrobacterium-mediated transformation. Thus, we have established an efficient protocol that allows production of transgenic plants ready to transplant to soil in 2-3 months from Agrobacterium infection or 4-5 months from callus initiation from mature seeds (Fig. 6).

Discussion
Foxtail millet is an emerging C 4 model plant suitable for investigation of various biological phenomena that are absent in other model plants such as Arabidopsis and rice. In this study, we identified xiaomi, a mutant with reduced stature and short generation time, created a reference genome and a gene-expression atlas from vari-ous tissues, and developed a highly efficient transformation protocol. The results demonstrate the suitability of xiaomi as a model system to investigate C 4 grass biology and other important molecular mechanisms including, but not limited to, higher nitrogen use efficiency, abiotic and biotic stress responses, downy mildew resistance, domestication and evolution. Compared with C 3 species, C 4 plants usually show higher rates of photosynthesis as well as higher nitrogen and water efficiencies. The most productive crop species, such as maize, sorghum and sugarcane, are C 4 plants, and C 4 plants contribute to about a quarter of global primary biomass production despite comprising only 3% of all land plant species 15 . Due to such high productivity, introducing C 4 pathway genes into major C 3 crops such as rice seems to be a promising strategy to meet the growing demand for food production 16 . Towards implementation of such a strategy, it is essential to elucidate genetic and molecular mechanisms underlying the differentiation of C 3 and C 4 anatomical, physiological and biochemical features. Among C 4 plants, maize and sorghum are the major contributors to world food production, whereas sugarcane and switchgrass are major bioenergy plants. However, all these plants possess relatively large statures, large genomes, long life cycles, and are difficult to transform. About ten years ago, Brutnell  light, confirming transmission of transgene to progeny. All experiments were performed with eight independent biological repeats and at least six samples were tested for each biological repeat except j, which was performed with three repeats. Scale bars, 2 mm (a-e,h,k,l) and 2 cm (f,g,i).

Nature PlaNts
proposed green foxtail as a C 4 model plant, considering the advantages of its relatively short stature, simple growth requirements and rapid life cycle. Since then, great progress has been made in genome assembly, transformation technology and germplasm generation as well as mutant isolation and characterization in green foxtail 5,[17][18][19][20] . Compared with its wild progenitor green foxtail, foxtail millet is more suitable as a model plant. First, the seeds of foxtail millet are generally non-shattering and non-dormant, and easier to collect and germinate; second, foxtail millet has been widely cultivated for both human food and animal feed in the arid and semi-arid regions of the world, particularly in China and India, and would be easier to deploy for grain production. These characteristics, together with the short life cycle and small plant size, makes xiaomi an ideal model plant to accelerate research in millet and other C 4 plants.
We acknowledge that there remain limitations to the direct use of xiaomi for certain studies. For example, it may not be suitable for directly evaluating grain yield of foxtail millet cultivars in the field condition, although such traits may be dissected into individual yield-related components such as grain size, 1,000-grain weight and seed number per panicle. Nevertheless, some of the limitations may be at least partially overcome by growing xiaomi under short-day conditions or crossing xiaomi plants with WT plants to produce progeny for use in subsequent investigations of the traits of interest.
Other early flowering mutants, such as Xiaowei 21 in rice and Micro-Tom 22 in tomato, have been used for conducting large-scale indoor research. Similar to those of xiaomi, the phenotypic changes of Xiaowei were caused by the deficiency of a haem oxygenase involved in the biosynthesis of a phytochrome chromophore 21 . Indeed, accelerated flowering under noninductive photoperiods was also observed in the phyC mutants in Arabidopsis 23 and rice 24 . Thus, it is highly probable that xiaomi-like mutants can be created using any millet variety by editing the PHYC gene.
At present, around 10% of xiaomi genes are not captured in the tissues used for construction of the gene-expression atlas; nevertheless, the majority of these 'unexpressed' genes have homologues or orthologues in Arabidopsis and rice, suggesting that they would be expressed in other tissues, at other developmental stages, or under specific growth conditions, and additional RNA-seq analysis should enable construction of a more comprehensive gene-expression atlas in the future.
Transformability is an essential prerequisite for a model plant. Arabidopsis can be efficiently transformed by floral dipping, which has enabled its rapid adoption for basic research in plant biology worldwide. Recently, a similar approach (spike dip transformation) has been explored in green foxtail with reported success 19,25 . However, we were unable to recover any transgenic plant from xiaomi using this method. Transgenic plants were successfully produced using calli induced from immature embryos or inflorescences in both foxtail millet and green foxtail, although at low efficiency 26,27 . The disadvantage of using fresh tissues is that plants must be grown periodically to ensure a constant supply all year round, which is time consuming and costly. Thus, we attempted to use mature seeds as an explant source for callus induction. In repeated trials, we observed that the primary calli induced from mature embryos were not suitable for direct use in transformation, most probably due to their watery and soft nature. We then focused our efforts on the development of embryogenic calli by subculture and optimization of the infection and selection steps by monitoring expression of the reporter GFP. We also compared selectable markers and identified NPTII as an efficient marker for foxtail millet transformation. The transformation method established in this study has a 3.5-fold higher efficiency than the previously reported method 26 , and with further improvement should encourage broad adoption of xiaomi as a model for basic and applied research, especially in C 4 plants.

Methods
Plant materials and growth conditions. xiaomi was identified from an EMS-mutagenized M2 population of Jingu21, a variety of foxtail millet widely cultivated in North China for its good grain quality and high yield. The xiaomi mutant was maintained by self-pollination in the laboratory for ten generations, leading to a very low level of heterozygosity. Foxtail millets were grown in the experimental field in Taigu, Shanxi, China (37° 25′ 13″ N, 112° 35′ 26″ E). For indoor research, plants were grown in an auto-controlled growth chamber or culture room equipped with full spectrum (420-730 nm) LED light sources, under 28 °C:22 °C day:night cycle with a 14 h photoperiod and 350-700 μmol m −2 s −1 light intensity unless otherwise specified. To shorten the life cycle and reduce plant stature, we optimized growth conditions for xiaomi. In brief, xiaomi seeds were soaked in water overnight at room temperature and sown in a soil mix of nutrient soil, sandy soil and vermiculite (3:2:1, v/v/v) watered with B5 solution (water content approximately 25%, w/w). Plants were grown under 16 h photoperiod and watered to maintain 10%-15% water content.
For genome sequencing, the aboveground tissues, including leaves, stem and young panicle were collected from a single healthy xiaomi plant at the pollination stage for PacBio SMRT DNA sequencing. Young leaves from a single healthy xiaomi or WT plant were collected for genome resequencing.
For the expression atlas sequencing, 11 diverse tissues representing the major organ systems were collected, with 3 or 5 biological replications. These tissues were 3 d imbibed seeds (seed), 2-week-old whole seedling (seedling), root, stem, the top first fully extended leaf of 2-week-old seedling (leaf 1), the top second leaf of 30-day-old plants (leaf 2), flag leaf (leaf 3), the fourth leaf (leaf 4), immature panicle (panicle 1), panicle at pollination stage (panicle 2) and panicle at grain-filling stage (panicle 3). For seed germination, the surface-sterilized seeds were placed on Whatman no. 1 filter paper soaked with distilled water and cultured for 3 d, allowing them to germinate. For the 2 week seedling stage, the seeds were sown in soil and the whole seedlings (seedling) and the first immature leaves (leaf 1) were sampled at 2 weeks after germination. Leaf 2 is the top second leaf of 30 d xiaomi seedlings (10 d before heading). Samples of stem, leaf 3 (flag leaf), leaf 4 (the top fourth leaf) and panicle 3 were all collected at the grain-filling stage. Each biological replicate included at least five healthy xiaomi plants randomly selected from the field or auto-controlled growth chamber. All samples were immediately frozen in liquid nitrogen and stored until use.
Map-based cloning. We crossed xiaomi with the cultivar G1 to generate a F 2 mapping population. Using 45 recessive F 2 plants with the typical xiaomi-like early-heading phenotype, we firstly mapped the XIAOMI locus to a 5.45 Mb interval between the two indel markers, M3374 and M8819 on chromosome 9. We further developed 9 new markers within this interval and finally narrowed down the locus to a 212-kb region between two SNP markers, M5479 and M5690. A candidate gene was then identified by genome resequencing of and comparison of this region between Jingu21 and xiaomi. Sequences of all primers used in map-based cloning are listed in Supplementary Table 3. DNA and RNA isolation. For PacBio single-molecule sequencing, DNA was extracted from a single healthy xiaomi plant as described in the 'Preparing Arabidopsis Genomic DNA for Size-Selected ~20 kb SMRTbell Libraries' protocol (http://www.pacb.com/wp-content/uploads/2015/09/

Nature PlaNts
Shared-Protocol-Preparing-Arabidopsis-DNA-for-20-kb-SMRTbell-Libraries. pdf). For Illumina HiSeq sequencing, DNA was isolated from leaf tissues using cetyltrimethylammonium bromide methods 28 with modifications. About 100 mg young leaf was ground to a fine powder in liquid nitrogen. The powder was then placed in 2 ml microtubes containing 1 ml preheated 2% cetyltrimethylammonium bromide extraction buffer (adding 0.5% β-mercaptoethanol just before use) and incubated at 65 °C for 30 min. The samples were then centrifuged and the resultant supernatant was extracted with 800 μl chloroform: isoamyl alcohol (24:1, v/v). The supernatant DNA was transferred to a new microtube containing 800 μl cold isopropanol and 80 μl 3 mol l −1 sodium acetate to precipitate the DNA. The precipitate was dissolved in 100 μl ddH 2 O containing 10 ng μl -1 RNase and incubated at 37 °C for 30 min. Finally, the DNA was isolated using magnetic beads. The quality and integrity of extracted DNA was assessed with a Qubit Fluorometer (Life Technologies) and separated in 0.8% agarose gels. Total RNA was isolated with RNAprep Pure Plant Kit (Tiangen Biotech) or Plant RNA kit (OMEGA) according to the manufacturer's instructions. The integrity and quantity of extracted RNA were analysed on an Agilent 2100 bioanalyzer and by agarose gel electrophoresis.

Genome-sequencing library construction, PacBio SMRT and HiSeq sequencing.
The DNA libraries for PacBio SMRT sequencing were prepared following the PacBio standard protocols and sequenced on a Sequel platform by Biomarker Technologies. In brief, genomic DNA from a single xiaomi plant was randomly sheared to an average size of 20 kb, using a g-Tube (Covaris). The sheared gDNA was end-repaired using polishing enzymes. After purification, a 20-kb insert SMRTbell library was constructed according to the PacBio standard protocol with the BluePippin size-selection system (Sage Science) and sequences were generated on a PacBio Sequel (9 cells) and PacBio RS II (1 cell) platform by Biomarker Technologies.
Illumina HiSeq DNA libraries were made following standard protocols provided by Illumina. About 5 μg extracted DNA was fragmented randomly and DNA fragments of the desired length were gel purified. These DNA samples were end-repaired and ligated to the adapter and were then pooled, purified and amplified with primers compatible with the adapter sequences, and used to construct a 270-bp paired-end library. The library was sequenced on an Illumina HiSeq X Ten sequencing platform by Biomarker Technologies.
PacBio assembly, correction and validation. The single-molecule sequencing data were assembled following a hierarchical approach, with correction, assembly and polishing 29 . In brief, a subset of longer reads was selected as seed data and corrected through Canu 30 (v.1.5) and Falcon 31 (v.0.3.0). The error-corrected reads were assembled using Falcon and Canu. Since the Canu and Falcon assemblies each contained some regions that were missing from the other, the two initial assemblies were merged using Quickmerge v.0.2 (https://github.com/mahulchak/quickmerge) to produce a more contiguous assembly. Finally, the draft assembly was polished to obtain the final assembly. The first-round polishing adopted the quiver/arrow algorithm using SMS data with the 40 threads. The second polishing adopted the pilon algorithm (v1.22, https://github.com/broadinstitute/pilon) using Illumina HiSeq-sequencing data.
Hi-C library preparation, sequencing, and raw read processing. The Hi-C library was prepared as described previously 32 with minor modifications. Nuclear DNA was cross-linked in situ with formaldehyde, extracted and then digested with HindIII at 37 °C overnight. After digestion, the sticky ends were filled in, biotinylated and then ligated to each other randomly to form chimeric circles. Biotinylated DNA fragments were reverse cross-linked with proteinase K and purified by phenol extraction, followed by a phenol/chloroform/isoamyl alcohol extraction. Then, the purified DNA was sheared to a size of 300-700 bp with a Covaris S220 instrument (Covaris). The sheared DNA was end-repaired with T4 DNA polymerase. The biotin-tagged ligation products were isolated with MyOne Streptavidin C1 Dynabeads (Life Technologies). Bead-bound Hi-C DNA was amplified and purified for preparing the sequencing library. Finally, the Hi-C library was paired-end sequenced on an Illumina HiSeq X Ten platform.
The Hi-C reads were aligned to the draft assembly using the BWA aln algorithm 33 with default parameters, and the quality was then assessed using HiC-Pro v.2.8.0 (http://github.com/nservant/HiC-Pro). The invalid interaction pairs, including self-circle ligation, dangling ends, PCR duplicates and other potential assay-specific artefacts were discarded. The unique valid interaction pairs (non-redundant, true ligation products) were uniquely mapped onto the draft assembly contigs, which were grouped into 9 chromosome clusters, and scaffolded by Lachesis 32 using the following parameters: cluster min re sites = 52, cluster max link density = 2; cluster noninformative ratio = 2; order min n res in trun = 46; order min n res in shreds = 42.

Repeat annotation, gene prediction and functional annotation.
For the repeat annotation of the xiaomi genome, both structural predictions and de novo approaches were adopted. Specifically, the primary repeat library of xiaomi was built from the de novo approach using LTR_Finder 34  For predicting genes, a combination of ab initio-based approaches, homology-based methods and supporting PacBio isoform sequencing (Iso-seq) were used to conduct a comprehensive search for consensus gene sets. For ab initio-based gene prediction, five gene-finding programs, Genscan 40 44 (v.2006-07-28) were used to detect genes in the repeat masked xiaomi genome with the default parameters. For homology-based prediction, proteins previously annotated in A. thaliana, S. italica, O. sativa, S. bicolor and Z. mays were downloaded and mapped to the xiaomi genome using BLAST and homologous genes were identified using GeMoMa 45 (v.1.3.1). Newly generated xiaomi PacBio Iso-seq and RNA-seq data were directly mapped to the xiaomi genome and assembled by PASA 46 (v.2.0.2). Finally, the results obtained from the above approaches were integrated into a consensus gene set of xiaomi using EVM (v.1.1.1) with default parameters. These protein-coding genes were named using the following gene model nomenclature: Si (for S. italica) followed by the chromosome number and gene number on the chromosome, going from top to bottom in steps of 10. For example, the first and the second genes on chromosome 1 were named Si1G00010 and Si1G00020, respectively.
Pseudogene GeneWise (v.2.4.1) was used to predict the candidate gene structure on the basis of the homogenous alignments. We filtered the GeneWise results to retain only those with at least 95% coverage of the protein. The gene structures with frameshift mutations were considered to be candidate pseudogenes.
Genome quality evaluation. The quality of the xiaomi assembly was assessed by examining the alignment ratio of HiSeq short reads and the presence of well-conserved core eukaryotic genes. The short reads generated by the Illumina HiSeq platform were aligned to the xiaomi assembly using BWA (v.0.7.10-r789). To further evaluate the completeness of the xiaomi gene models, BUSCO 51 (v.2.0) analysis was undertaken with genome mode and embryophyta_odb9 dataset (http://busco.ezlab.org/datasets/embryophyta_odb9.tar.gz) as a reference. The embryophyte_db9 dataset contains 1,440 protein sequences and orthologous group annotations for major clades. The proportion of complete and partial core eukaryotic genes was assessed as a measure of the completeness of the xiaomi assembly.

RNA-sequencing library preparation, Iso-seq and HiSeq sequencing. For
Iso-seq, eight tissues, including seed, seedling, root, stem, young leaf (leaf 1), mature leaf (leaf 3), pollinated panicles (panicle 1) and panicles at the filling stage (panicle 3), were collected for RNA isolation. Equal amounts of total RNA from each tissue were pooled together to identify as many isoforms as possible. SMRTbell libraries were prepared according to the Iso-seq protocol using the Clontech SMARTer PCR cDNA Synthesis Kit and the BluePippin size-selection system (Sage Science). The first cDNA strand was synthesized using SMARTer PCR cDNA Synthesis Kit (Takara Biotechnology). After cycle optimization, large-scale PCR was performed to generate double-stranded cDNA for size selection on the BluePippin system (Sage Science). Then, another large-scale PCR was performed using the eluted DNA to generate more double-stranded cDNA. Re-amplified cDNA was purified, repaired and ligated with hairpin adapters. To minimize the bias that favours sequencing of shorter transcripts, multiple size-fractionated libraries (0-1, 0.5-1, 1-2, 1-3, 2-3 and 2-8 kb) were constructed according to the manufacture's instruction. Finally, a total of 15 SMRT cells were sequenced on a PacBio RS II platform.
For transcriptome atlas sequencing with the Illumina HiSeq X Ten platform, RNA-seq libraries were constructed using the NEBNext Ultra RNA Library Prep Kit for Illumina (no. E7770, New England BioLabs) according to the manufacturer's instructions. In brief, mRNA was purified from total RNA using NEBNext Poly (A) mRNA Magnetic Isolation Module (no. E7490, New England BioLabs) and fragmented into approximately 200-nt RNA short fragments. The fragmented mRNAs were then used as templates to synthesize the first-strand and the second-strand cDNAs. After end repair and adapter ligation, the products were selected by Agencourt AMPure XP beads (Beckman Coulter) and amplified to create a cDNA library by PCR. In total, 35 RNA-seq libraries were made from 11 different tissues with five biological replicates for leaf 2 and three biological replicates for others. All libraries were sequenced using an Illumina HiSeq X Ten platform by Biomarker Technologies.

Nature PlaNts
RNA-seq read processing, clustering analyses, Z-score and coefficient of variation expression analysis. Illumina RNA-seq reads of xiaomi were cleaned using Trimmomatic 52 (v.0.38) with parameters 'ILLUMINACLIP:TruSeq3-PE. fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:30 HEADCROP:10' . The clean reads were mapped to the xiaomi genome using HISAT2 53 (v.2.04) with default parameters. Gene-expression analysis and quantile normalization were conducted using R with the TPM (ref. 54 ). Genes with TPM > 0 in a given organ were considered to be expressed in that organ. Tissue-preferential and -specific genes were identified according to their TPM. Fold changes greater than ten between tissues showing the highest and the second highest expression levels were considered to be tissue-specific genes, whereas fold changes of at least 5 but no more than 10 were considered as tissue-preferentially expressed genes. Constitutively expressed genes were identified by coefficient of variation (CV) analysis. CVs ranged from 10.30% to 331.66%, representing the most stably expressed genes to the most differentially expressed genes. A gene with CV < 20% and a difference of no more than twofold between the highest and lowest levels of a gene transcript in any organ was considered to be constitutively expressed.
Hierarchical clustering analysis was performed using the pheatmap package (v.1.0.12) in R. Distance analysis was calculated using pairwise Pearson correlation.
Non-coding RNA isolation, library preparation, sequencing and sequence data analysis. The miRNAs were isolated using the EASYspin Plant microRNA Kit (RN4001, Aidlab Biotechnologies) according to the manufacturer's protocol. The miRNAs from the tissues for atlas analysis were mixed equally and used for library construction. The sequencing library was prepared using the NEB Multiplex Small RNA Library Prep Kit for Illumina kit (New England Biolabs) following the manufacturer's recommendations. In brief, the small RNAs were ligated with 3′ and 5′ SR adapters, reverse-transcribed and amplified. Amplified cDNA constructs between 140-160 bp in size were selected and sequenced using the Illumina HiSeq X Ten platform with single-end reads of 50 nucleotides. The raw reads were trimmed by removing adapter sequences and low-quality reads containing ploy-N, with adapter contaminants of less than 18 nt. Then, the high-quality clean reads of small RNA were mapped to the xiaomi reference sequence and miRNAs were identified using MiRDeep2 55 .
Strand-specific RNA-seq libraries for lncRNA and circRNA identification were generated using Ribo-Zero Magnetic Kit (Illumina) and NEBNext Ultra Directional RNA Library Prep Kit (no. E7420, New England BioLabs) following the manufacturer's recommendations. In brief, total RNA was treated with the Ribo-Zero Magnetic Kit to remove ribosomal RNA. After fragmentation, the rRNA-depleted RNA was reverse-transcribed using random primers, followed by the second-strand synthesis. The resulting double-stranded DNA was ligated to adapters after purification, end repair and ligation of a poly A tail. Subsequently, the cDNA was digested with uracil-specific excision reagent (USER) enzyme to degrade the cDNA strands containing U instead of T. The first-strand cDNA that preserved the direction of the RNA was amplified and the products were purified. Finally, the strand-specific cDNA library was sequenced using an Illumina HiSeq X Ten platform with 150-bp pair-end reads. The resulting directional paired-end reads were filtered and trimmed using Trimmomatic 52 (v.0.38). Then, the clean reads were mapped to the xiaomi genome sequence using HISAT2. To construct transcripts, the mapped reads were assembled de novo using Cufflinks (http:// cole-trapnell-lab.github.io/cufflinks/). The assembled transcripts were annotated using the xiaomi genome to identify protein-coding transcripts. After filtering of the protein-coding genes, lncRNAs were identified using the following parameters: (1) fragments per kilobase of transcript per million mapped reads ≥ 0.5; (2) the transcripts were longer than 200 bp. CircRNAs were identified essentially according to the method described by Memczak et al. 56 .

Identification of SNPs, small indels and PAVs.
We identified SNPs and small indels (length <100 bp) with MUMmer 57 (v3.23) (http://mummer.sourceforge.net/) between the xiaomi and Yugu1 genomes. In brief, the xiaomi pseudochromosome sequence was mapped to its corresponding Yugu1 pseudochromosomes with MUMmer, and then SNPs and indels were identified using Show-SNPs. PAVs were extracted by scanPAV 58 with default parameters. The resulting PAVs of up to 1,000 bp were filtered out as noise.
Agrobacterium-mediated genetic transformation, GFP fluorescence observation and molecular analysis. This method was developed following the protocol for mature seed-based transformation in rice 14 , with improvements to make it suitable for foxtail millet. For callus induction, palea and lemma of xiaomi mature seeds were mechanically removed to reduce potential contamination. The dehusked seeds were surface-sterilized in 70% (v/v) ethanol for 2 min, and then in 10% bleach containing 0.1% tween 20 for 20 min, and finally rinsed 5 times with autoclaved water. The sterilized seeds were transferred onto sterile paper towels to remove excess water. The seeds were placed on CIM (4 g l −1 CHU (N6) basal salt with vitamins, 30 g l −1 sucrose, 2 mg l −1 2,4-dichlorophenoxyacetic acid, 0.3 g l −1 casein acid hydrolysate, 2.8 g l −1 proline, 0.1 g l −1 myo-inositol, 0.1 mg l −1 6-benzylaminopurine, 8 g l −1 agar, pH 5.7). The seeds were incubated at 28 °C in the dark. After 8-10 weeks induction, the callus could be seen. To obtain high-quality, regenerable calli, the initially formed callus was divided into 2-3 mm pieces and transferred onto fresh CIM. After three rounds of subculture, the calli became yellowish and were ready for transformation.
The vectors pCAMBIA1305-GFP and p8-GFP, both harbouring the GFP gene as a reporter, were used for protocol development and method optimization. The HPT gene in pCAMBIA1305-GFP and NPTII gene in p8-GFP were tested for their effectiveness in selection of transformants. Both vectors were introduced into the Agrobacterium strain EHA105 by electroporation. The EHA105 cells were cultured in YEB medium (5 g l −1 beef extract, 5 g l −1 peptone, 1 g l −1 yeast extract, 5 g l −1 sucrose, 10 mM magnesium sulfate, pH 7.0) overnight until optical density at 600 nm (OD 600nm ) = 1.0. For infection and co-cultivation, the actively proliferating calli were infected with Agrobacterium cells (OD 600nm = 0.5) in the infection medium (0.44 g l −1 Murashige-Skoog salts, 1×B5 vitamins, 68 g l −1 sucrose, 36 g l −1 glucose, 1 g l −1 asparagine, 1 g l −1 casamino acids, 0.2 g l −1 cysteine, 2 mg l −1 2,4-dichlorophenoxyacetic acid, 200 μM acetosyringone, pH 5.2) for 5 min. The calli were then blotted on sterile filter paper and transferred onto infection medium solidified with 8 g l −1 agarose for 3 d co-cultivation at 22 °C in the dark.
Expression of GFP was monitored with a Leica M305FCA fluorescence stereo microscope equipped with a DMC6200 camera during co-cultivation, selection and shoot regeneration. Transgenic GFP plants were confirmed by PCR genotyping using the GFP, UBI, HPT and NPTII specific binding primers listed in Supplementary Table 3.
T-DNA identification of the transgenic xiaomi lines. We identified the T-DNA insertion sites in 13 independently transgenic xiaomi plants, including 7 pCAMBIA1305-GFP and 6 p8-GFP transgenic lines, by genome resequencing. Approximately 50 T1 transgenic young seedlings of each line were used for DNA extraction and sequencing. Approximately 12 Gb of data (~28× coverage) was obtained for each line. T-DNA insertion site(s) were identified using TDNAScan 59 . Primers used for PCR confirmation of insertion sites are listed in Supplementary Table 3.
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.