Microprocessor mediates transcriptional termination in long noncoding microRNA genes

MicroRNA (miRNA) play a major role in the post-transcriptional regulation of gene expression. Mammalian miRNA biogenesis begins with co-transcriptional cleavage of RNA polymerase II (Pol II) transcripts by the Microprocessor complex. While most miRNA are located within introns of protein coding genes, a substantial minority of miRNA originate from long noncoding (lnc) RNA, where transcript processing is largely uncharacterized. We show, by detailed characterization of liver-specific lnc-pri-miR-122 and genome-wide analysis in human cell lines, that most lnc-pri-miRNA do not use the canonical cleavage and polyadenylation (CPA) pathway, but instead use Microprocessor cleavage to terminate transcription. Microprocessor inactivation leads to extensive transcriptional readthrough of lnc-pri-miRNA and transcriptional interference with downstream genes. Consequently, we define a novel RNase III-mediated, polyadenylation-independent mechanism of Pol II transcription termination in mammalian cells.

inefficiently spliced RNA that lacks an internal 3 kb intron (Fig. 1c). The transcript size indicated that the 3′ end lay close to the pre-miR-122 hairpin, ~2.5 kb upstream of a previously identified polyadenylated 3′ end 27 (Fig. 1a). We did not detect any longer lnc-pri-miR-122 transcripts. Immunoprecipitation with an antibody directed against the m7G cap demonstrated that lnc-pri-miR-122 was capped, similar to GAPDH mRNA and in agreement with existing CAGE data 28 (Fig. 1d). However, quantitative RT-PCR (RT-qPCR) and northern analysis of pA-selected RNA indicated that lnc-pri-miR-122 was nonpolyadenylated, in contrast to GAPDH mRNA, but similar to U6 snRNA (Fig. 1e).
These results suggested that the lnc-pri-miR-122 3′ end is generated by Drosha cleavage and not by CPA. To characterize the role of the Microprocessor in lnc-pri-miR-122 3′ end formation, we compared the effects of siRNA-mediated knockdown in Huh7 cells of DGCR8 versus the CPA endonuclease CPSF-73. Depletion of both proteins was effective (Fig. 3c). RT-qPCR analysis of RNA isolated from nuclear chromatin, which is enriched in nascent transcripts 29 , was used to compare profiles across lnc-pri-miR-122 and the protein coding gene GAPDH. The level of each RT-qPCR product was normalized to the intron 1 product as a measure of basal nascent transcription, and is shown relative to control siRNA-treated cells. These nascent transcript profiles showed a clear increase in level downstream of the pre-miR-122 sequence in DGCR8-depleted but not CPSF-73-depleted cells (Fig. 3a). We found that Drosha depletion elicited readthrough transcription similar to DGCR8 depletion, while Dicer depletion had no effect (Supplementary Fig. 3a). This argues against indirect effects of miRNA depletion on termination and strongly suggests a direct role for the Microprocessor in lnc-pri-miR-122 transcription termination. In contrast, GAPDH nascent transcripts showed no effect of DGCR8 depletion but, as expected, showed an increased level downstream of the PAS following CPSF-73 depletion, indicating a strong termination defect (Fig. 3b). Nuclear run-on (NRO) analysis confirmed that DGCR8 knockdown caused a termination defect for lnc-pri-miR-122 but not GAPDH, while CPSF-73 knockdown inhibited termination in GAPDH but not lnc-pri-miR-122 (Fig. 3a,b). Pol II chromatin immunoprecipitation (ChIP) also confirmed transcriptional readthrough on lnc-pri-miR-122 following DGCR8 depletion (Supplementary Fig. 3b). In sum, we establish that the Microprocessor dictates lnc-pri-miR-122 3′ end formation and transcriptional termination, in contrast to GAPDH, which relies on the orthodox CPA complex.
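The two-step normalization described above (each amplicon relative to the intron 1 amplicon within a sample, then relative to control siRNA-treated cells) can be sketched as follows. This is a minimal illustration only: the function name, the amplicon labels and the input values are hypothetical, and the inputs are assumed to be linear-scale quantities rather than raw Ct values.

```python
def relative_nascent_levels(knockdown, control, reference="intron1"):
    """Normalize each amplicon to the reference amplicon within a sample,
    then express the knockdown sample relative to the control sample."""
    result = {}
    for amplicon, kd_value in knockdown.items():
        kd_norm = kd_value / knockdown[reference]          # within-sample normalization
        ctrl_norm = control[amplicon] / control[reference]
        result[amplicon] = kd_norm / ctrl_norm             # relative to control siRNA
    return result


# Hypothetical linear-scale quantities (arbitrary units), not data from the study.
control = {"intron1": 100.0, "hairpin": 80.0, "downstream": 5.0}
dgcr8_kd = {"intron1": 50.0, "hairpin": 40.0, "downstream": 10.0}
profile = relative_nascent_levels(dgcr8_kd, control)
```

With this scheme the reference amplicon is 1 by construction, so a value above 1 at a downstream amplicon indicates more nascent transcript past the hairpin than in control cells.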

CPA does not occur on lnc-pri-miR-122 transcripts
The transcriptional readthrough we observed following DGCR8 knockdown implied that lnc-pri-miR-122 does not switch to the efficient CPA mechanism of transcriptional termination when the Microprocessor mechanism is inhibited. To address this question directly, we pA-selected RNA from Huh7 cells with or without DGCR8 knockdown. Similarly to U6 snRNA, lnc-pri-miR-122 transcripts remained pA− even when their 3′ end formation was compromised by DGCR8 depletion. In contrast, GAPDH mRNA was strongly pA+ under both conditions (Fig. 4a). By next generation sequencing analysis of chromatin-associated RNA from Huh7 cells (chromatin RNA-seq), we found that DGCR8 depletion leads to extensive transcriptional readthrough for over 5kb downstream of the pre-miR-122 hairpin in lnc-pri-miR-122 (Fig. 4b), despite the presence of several consensus PAS.
These results indicated that Pol II transcribing lnc-pri-miR-122 fails to recognize PAS even when Microprocessor cleavage is inhibited. This surprising finding suggested that Pol II might be recruited to the endogenous lnc-pri-miR-122 promoter in a CPA refractory form. To investigate this further, we cloned lnc-pri-miR-122 under the control of the human immunodeficiency virus (HIV) long terminal repeat (LTR) promoter with or without pre-miR-122 hairpin deletion (Fig. 5a). The resulting wild type (WT) and deleted (Δ) plasmids were transfected into HeLa cells, which do not express endogenous lnc-pri-miR-122 ( Fig.  1b), together with a plasmid encoding the Tat transcriptional activator. We confirmed that mature miR-122 was expressed from the WT but not Δ plasmid (Fig. 5b). Unspliced and spliced lnc-pri-miR-122 transcripts generated from the WT plasmid were the same size as the endogenous transcripts in Huh7 cells (Fig. 5c), indicating that the site of 3′ end formation is independent of promoter or cell type. Deletion of the pre-miR-122 hairpin or depletion of Drosha or DGCR8 led to production of unspliced and spliced lnc-pri-miR-122 transcripts that migrated at a higher molecular weight than wildtype transcripts (Fig. 5d,e). pA fractionation indicated that lnc-pri-miR-122 RNA generated from the WT plasmid was pA−, similar to U6 snRNA, but became pA+ when DGCR8 was depleted, similar to GAPDH mRNA (Fig. 5f). The 3′ end of Δ RNA was mapped by 3′RACE to a PAS 91nt downstream of the pre-miR-122 hairpin (pA1, Fig 5a). Mutagenesis indicated that this PAS was necessary for 3′ end generation in Δ, but not WT, lnc-pri-miR-122 transcripts (Fig. 5g). This confirmed that in a plasmid context, lnc-pri-miR-122 3′ ends were generated by the Microprocessor, but in the absence of this processing, CPA occurred at a site downstream. 
Importantly, this was in contrast to the endogenous transcripts, where transcriptional readthrough occurred when Microprocessor cleavage was inhibited, and the PAS was not used (Fig. 3,4).

Microprocessor terminates transcription of most lnc-pri-miRNA
Having demonstrated that chromatin RNA-seq is an effective method of identifying defects in transcriptional termination following Microprocessor depletion (Fig. 4b), we extended this analysis to HeLa cells on a genome-wide scale. HeLa cells were chosen because they express a relatively high number of miRNA. We found that either Drosha or DGCR8 depletion, by siRNA treatment (Fig. 6e), resulted in transcriptional readthrough in most of the expressed lnc-pri-miRNA (Supplementary Table 1). MIR181A1HG and MIR17HG are shown as specific examples (Fig. 6a, Supplementary Fig. 4a), while metagene analysis showed a general termination defect with transcription extending more than 10 kb downstream of lnc-pri-miRNA following Microprocessor depletion (Fig. 6b). A few lnc-pri-miRNA did not show transcriptional readthrough following Microprocessor depletion (shaded area, Supplementary Table 1); MIRLET7BHG is shown as an example (Supplementary Fig. 4b). Importantly, Dicer knockdown did not affect lnc-pri-miRNA transcriptional termination, confirming that the effects of Microprocessor depletion are direct and not due to loss of mature miRNA (Supplementary Fig. 5).
In marked contrast, protein coding genes that harbor intronic pre-miRNA showed no termination defect following Microprocessor depletion. Rather, intronic sequence containing pre-miRNA was stabilized (higher reads) as shown for MCM7 (Fig. 6c), which has three miRNA (miR-25, 93 and 106b) in its penultimate intron. Presumably this selective intron stabilization was caused by the loss of Drosha-mediated co-transcriptional cleavage 8 . Metagene analysis revealed that Microprocessor activity generally had no effect on transcriptional termination for these protein coding miRNA genes (Fig. 6d). This implies a functional difference between the processing of miRNA in introns of protein coding transcripts, which use CPA-mediated termination irrespective of internal Microprocessor cleavage, in contrast to lnc-pri-miRNA transcripts, which rely on the Microprocessor for efficient termination. Chromatin RNA-seq analysis in Huh7 cells also identified transcriptional readthrough in lnc-pri-miRNA, but not protein coding pri-miRNA, following DGCR8 depletion. Although the pattern of pri-miRNA expression differs between Huh7 and HeLa, many of the same lnc-pri-miRNA were affected in both cell lines (Supplementary Table 2).
We also investigated the effect of Microprocessor depletion on levels of transcription at gene 5′ ends (TSS). While no change in nascent transcript level was observed for lnc-pri-miRNA genes, protein coding genes hosting pre-miRNA showed transcript reduction following DGCR8 and especially Drosha knockdown (Supplementary Fig. 6a,b). This suggests that the Microprocessor may have a positive influence on gene transcription, as previously noted 30 . It also suggests that transcriptional initiation differs at lnc-pri-miRNA and protein coding pri-miRNA gene promoters. As a control for the quality of our chromatin RNA-seq data, we showed that duplicate libraries display high correlation (Supplementary Fig. 7).

Microprocessor prevents transcriptional interference
We identified specific examples of lnc-pri-miRNA genes in which the transcriptional readthrough induced by Microprocessor depletion extended into a downstream protein coding gene, either in convergent or tandem orientation. We reasoned that such readthrough transcription might downregulate the invaded gene by a transcriptional interference mechanism 31 . For the tandem MIR17HG-GPC5 locus, Microprocessor depletion caused the MIR17HG transcript to extend over 20 kb, reading into GPC5 (Fig. 7a, Supplementary Fig.  8). Chimeric transcripts were readily detected, as was a substantial reduction in GPC5 exon 1 RNA levels (Fig. 7b, d, Supplementary Fig. 8). Both GPC5 mRNA and protein levels were more than 70% reduced (Fig. 7e,f), indicating a clear transcriptional interference effect caused by loss of Microprocessor-mediated termination. For the convergent OGFRL1-LINC00472 locus, loss of Microprocessor caused LINC00472 transcripts to read through into OGFRL1, again causing transcriptional down-regulation (Fig. 7c,d). OGFRL1 mRNA levels dropped 80% while protein levels were 50% lower (Fig. 7e,f). Possibly this protein has higher stability than GPC5. These data imply that convergent transcription can also induce gene inactivation, possibly by Pol II collision effects 32 . Notably Dicer depletion had no effect on GPC5 or OGFRL1 mRNA level (Fig. 7e), indicating that the effects of DGCR8 knockdown are due to transcriptional interference and not miRNA-mediated mRNA destabilization.

Most lnc-pri-miRNA remain pA− after Microprocessor depletion
Similar to endogenous lnc-pri-miR-122, Microprocessor-terminated lnc-pri-miRNA appears to be insensitive to the presence of cryptic PAS, invariably present within their gene and 3′ flanking regions. To further investigate the use of PAS in pri-miRNA, we performed nuclear pA+ and pA− RNA-seq in HeLa cells with or without DGCR8 knockdown. We found that the majority of lnc-pri-miRNA existed as predominantly pA− transcripts, and those that showed extensive readthrough upon loss of Microprocessor remained pA− (Supplementary Table 3). MIR17HG is shown as a specific example (Fig. 8a). It is remarkable that for these Pol II transcripts PAS remain opaque to RNA processing by the CPA complex. However, a few lnc-pri-miRNA utilize PAS to some extent, especially following Microprocessor inactivation. Thus MIRLET7BHG transcripts switched from mainly pA− to pA+ following Microprocessor depletion (Fig. 8b, Supplementary Table 3), and efficient termination occurred at a canonical PAS positioned immediately downstream of pre-miR-let7b. This is similar to the switch to CPA at a downstream PAS that we observed in ectopically expressed lnc-pri-miR-122 (Fig. 5), indicating that this distinction is biologically relevant.

Discussion
We have identified a CPSF-73 independent, Microprocessor-driven transcription termination mechanism for pri-miR-122 lncRNA. This results in the production of unstable nuclear unspliced and spliced lnc-pri-miR-122 transcripts with 3′ ends defined by Drosha cleavage (Fig. 1,2). By genome-wide analysis, we found that transcriptional termination by the Microprocessor is a feature shared with most other lnc-pri-miRNA genes in both HeLa and Huh7 cells (Fig. 6).
Previous evidence indicated that pri-miRNA are typical capped, polyadenylated Pol II transcripts. This is clearly established for protein coding genes containing intronic miRNA 8 , but is also true of the few lnc-pri-miRNA for which the 3′ end has been characterized, such as pri-miR-21 4,33 and C. elegans let-7 34 . Our genome-wide analysis confirmed that CPA does occur in a minority of lnc-pri-miRNA (Supplementary Table 3), but showed that the Microprocessor-mediated termination mechanism predominates (Supplementary Table 1,2). A previous study showed that Drosha processing of pre-miRNA hairpins can attenuate downstream transcription by providing an entry site for Xrn2. However, these experiments were carried out using plasmid constructs that lacked a PAS, and did not provide evidence that Microprocessor cleavage could actually replace CPA as a mode of transcriptional termination 9 . A role for Drosha cleavage at the HIV LTR in preventing productive transcription elongation in the absence of Tat also indicates that the Microprocessor can disrupt the transcription machinery 35 , but does not connect miRNA processing with transcriptional termination. In contrast, we have demonstrated that Microprocessor cleavage mediates transcriptional termination on endogenous pri-miRNA transcripts, and moreover that this is limited to lnc-pri-miRNA.
This departs from the clear current consensus that pri-miRNA are typical capped and polyadenylated transcripts (see recent review 5 ), a view derived from analysis of protein coding pri-miRNA and confirmed for protein coding genes by our genome-wide analysis.
The Microprocessor termination pathway adds to a short list of non-canonical mechanisms of Pol II transcriptional termination. Termination of histone mRNA does not involve polyadenylation, similar to pri-miR-122, but requires cleavage by CPSF-73, in common with CPA 36 . In yeast, the Nrd1-Nab3-Sen1 pathway terminates Pol II transcription of small nuclear (sn)RNA, small nucleolar (sno)RNA and cryptic unstable transcripts (CUTs) [37][38][39] , while in mammals the Integrator complex mediates transcriptional termination and subsequent processing of snRNA 40,41 . Importantly, both the Nrd1 and Integrator pathways are only used for termination on short transcripts. In contrast, the Microprocessor mechanism described here provides an alternative to CPA in terminating transcription several kilobases downstream of initiation.
The closest parallel to this Microprocessor-dependent termination pathway is in budding yeast, where Rnt1 cleavage leads to pA-independent termination of Pol II transcription when CPA fails, preventing readthrough transcription and subsequent transcriptional interference 20,22 . Rnt1-terminated transcripts are rapidly degraded, similar to lnc-pri-miR-122. Transcriptional termination following either Rnt1 cleavage or CPA occurs as a result of Rat1 or Xrn2 degradation of the nascent transcript downstream of the cleavage site 15,16,20 . However, we observed no effect of Xrn2 depletion on lnc-pri-miR-122 transcriptional termination (data not shown), in contrast to the role for Xrn2 in RNA degradation following Drosha cleavage of an intergenic clustered pri-miRNA 9 . The mechanism of Pol II termination following Microprocessor cleavage of lnc-pri-miR-122 may involve other nucleases or termination factors, such as Pcf11 18,42 or the mammalian ortholog of Sen1, Senataxin 43 . Of note, we observed variable levels of chromatin RNA-seq signal downstream of the pre-miRNA hairpin following DGCR8 knockdown among different Microprocessor-terminated lnc-pri-miRNA. For example, the amplitude of readthrough transcription is higher in MIR181A1HG and MIR17HG than lnc-pri-miR-122 ( Fig. 4b,6, Supplementary Fig. 4, Supplementary Table 1,2). The decrease in lnc-pri-miR-122 readthrough transcription may result from low-level Microprocessor-driven termination mediated by residual DGCR8 following siRNA transfection, with gene-specific differences possibly due to some pre-miRNA hairpins competing more effectively for the remaining Microprocessor. Alternatively, it is possible that other, non-CPA, mechanisms can displace transcribing Pol II from these genes.
Although Microprocessor-driven termination is a common feature of lnc-pri-miRNA, it is not universal. This raises the question of whether specific genetic features are necessary for Microprocessor-mediated termination to occur. We find that this mechanism can be used by lncRNA irrespective of whether the pre-miRNA is located in an exon or intron (Supplementary Table 4c), at a range of distances from the TSS (Supplementary Table 4a,b). It is possible that the efficiency of termination is affected by the Microprocessor cleavage event itself. For example, protein cofactors are known to assist in Microprocessor release of specific pre-miRNA 5 , while sequence features surrounding the pre-miRNA hairpin can influence the efficiency of processing 44 . Many Microprocessor-terminated lnc-pri-miRNA contain clustered pre-miRNA hairpins, which raises the interesting question of whether Microprocessor cleavage at a specific pre-miRNA drives transcription termination. As some transcriptional readthrough occurs following Microprocessor cleavage, similar to the continued Pol II transcription following cleavage at a PAS in CPA-dependent termination, it is not possible to precisely define which cleavage event drives termination based on our chromatin RNA-seq data. A recent study defined a 'Microprocessing index' (MPI) that shows variable cleavage efficiency for different pre-miRNA 45 . Of the 13 HeLa lnc-pri-miRNA detected in this study, 11 contain a pre-miRNA hairpin with MPI <-1.0 indicating efficient co-transcriptional processing, and 7 of these contain a hairpin with MPI <-3.0, indicating highly efficient processing 45 . Although the pool of lnc-pri-miRNA that we detect in HeLa is too small to draw statistically robust conclusions, this raises the possibility that rapid Microprocessor cleavage may be important for transcriptional termination.
We found that Microprocessor-driven transcriptional termination is used by most lnc-pri-miRNA (73%), but not protein coding pri-miRNA (Fig. 6, Supplementary Table 1,2), demonstrating a fundamental difference in RNA processing between lncRNA and protein coding transcripts. The Microprocessor mechanism has parallels to another CPA-independent mechanism of transcriptional termination used by the lncRNA MALAT1, where 3′ end formation and concomitant release of a small RNA is mediated by the tRNA biogenesis endonucleases RNase P and RNase Z 46 . However, lncRNA derived from bidirectional firing at protein coding gene promoters, known as upstream antisense (ua)RNA transcripts, terminate transcription by CPA and tend to use promoter proximal PAS 47,48 . The difference between pA− lnc-pri-miRNA transcripts as described in this study and CPA-competent lncRNA derived from antisense promoter activity may relate to promoter specificity. Pol II elongation complexes set up on a protein coding gene promoter may be CPA-responsive irrespective of promoter directionality. In contrast, lnc-pri-miRNA promoters may form a different type of Pol II elongation complex that is CPA-nonresponsive but Microprocessor-active. A role for the promoter is supported by our observation that most lnc-pri-miRNA, including lnc-pri-miR-122, do not use CPA even when the Microprocessor is depleted, instead showing extensive transcriptional readthrough and remaining pA− (Fig. 8a, Supplementary Table 3). In contrast, ectopically expressed lnc-pri-miR-122 uses Microprocessor cleavage to mediate transcriptional termination but switches to CPA when this mechanism is inhibited (Fig. 5). Therefore, the PAS downstream of pre-miR-122 can function, but not in the context of the endogenous gene. The biological relevance of this distinction is clear from our observation of a switch to CPA in MIRLET7BHG transcripts following Microprocessor depletion (Fig. 8b).
LncRNA genes are generally thought to be similar to mRNA genes, with similar chromatin profiles, transcriptional regulation and splice signals. However, lncRNA genes tend to have fewer and longer exons than mRNA genes, and there is a trend for less efficient splicing of lncRNA than of mRNA 49,50 . We find that CPA is inefficient on lnc-pri-miRNA (Supplementary Table 3). As splicing and CPA function to generate a stable cytoplasmic coding transcript, there would be little need for these processes to occur on most lncRNA.
Microprocessor-driven transcriptional termination occurs on lncRNA genes with either intronic or exonic miRNA (Supplementary Table 4c), suggesting that splicing in lncRNA transcripts is not functionally important. Possibly spliced transcripts generated from lncRNA genes such as pri-miR-122 are simply a by-product of transcription; as the splicing machinery is recruited co-transcriptionally, some default recognition and processing of splice sites occurs. LncRNA remain relatively uncharacterized, and it is possible that multiple classes of lncRNA may exist that use different mechanisms of RNA processing.
The use of Microprocessor-driven 3′ end formation by a subset of pri-miRNA genes raises the question of why these genes use this mechanism. A major consequence is the generation of pA− lnc-pri-miR-122 transcripts that are rapidly degraded (Fig. 1,2). This suggests that this mode of termination might have evolved to limit the production of spliced cytoplasmic pri-miRNA transcripts. The concurrent generation of a spliced mRNA that occurs from intronic coding miRNA genes may not be desirable for all miRNA genes. miR-122 expression is very high in liver cells, with an average of 66,000 copies per cell 23 , and the accumulation of a similar number of copies of the spliced host transcript might be problematic. The miRNA genes we identify as using the Microprocessor-mediated transcriptional termination pathway include others that can be highly expressed and are biologically important, such as the miR-17~92a (MIR17HG) cluster which is highly expressed in embryonic cells and cancer 51 .
Our genome-wide analysis identified a few examples of Microprocessor depletion leading to transcriptional interference with downstream genes in either tandem or convergent orientation (Fig. 7, Supplementary Fig. 8). For two such genes, we observed a strong reduction in mRNA and protein levels following DGCR8 knockdown, confirming that gene expression is affected. It is unclear how widespread this transcriptional interference is. It is likely that it will be influenced by the relative transcriptional activity of the upstream and downstream genes, the distance between them and their chromatin context, and that tissue-specific examples will exist. As Drosha expression changes in different tissues and during differentiation 52 , transcriptional interference may also differ in these situations.
In conclusion, we have identified a novel RNase III-mediated transcriptional termination pathway in mammalian cells. This Microprocessor mechanism provides an alternative to CPA, generating unspliced and spliced transcripts of multiple kb, and is specific to lncRNA. We propose a model in which Microprocessor-driven transcriptional termination of lnc-pri-miRNA prevents readthrough and interference with downstream genes. At the same time, a miRNA is produced, while the host transcript is not polyadenylated and so rapidly turned over in the nucleus (Fig. 8c). This may be valuable for highly expressed miRNA such as miR-122, allowing the cell to achieve high levels of the miRNA without concomitant generation of high levels of an unwanted host transcript.

PCR primers and siRNA sequences
See Supplementary

Plasmid Constructs
Construction of βwt (formerly labeled HIVβ) has been described previously 53 . The pri-miR-122WT construct was made by insertion of a genomic PCR fragment generated using the primers Pri122qF/Pri122qR on Huh7 genomic DNA. The resulting ~8 kb PCR fragment was ligated into a cloning vector prepared by long-range PCR amplification of βwt with primers Open_BetaqR/B10 qF using PrimeSTAR HS DNA polymerase (Takara). The QuikChange II XL site-directed mutagenesis kit (Stratagene) was used to generate the various mutants using the following primer sets: pri-miR-122Δ (DELTA mirqF/DELTA mirqR); pA1mt (pA1MTqF/pA1MTqR).

Cell culture and transfection of siRNA and plasmids
HeLa, Huh7 and HepG2 cells were maintained in DMEM supplemented with 10% fetal bovine serum. For Huh7 and HepG2 culture, 1% non-essential amino acids (Invitrogen) were also included in the culture media. RNAi was performed using Lipofectamine RNAiMAX (Invitrogen), with siRNA delivered at 30 nM final concentration. A second siRNA treatment was performed at 48 h, and cells were harvested 72 h after the first transfection. Lipofectamine 2000 (Invitrogen) was used to deliver 0.1 μg pri-miR-122 plasmid and 0.025 μg pTAT per well of a 6-well plate.

RNA isolation and northern blot
RNA was isolated using TRIzol reagent (Ambion) according to the manufacturer's instructions. Total RNA from human liver was purchased from Agilent Technologies (Cat. No. 540017). Northern blotting was carried out using standard procedures on equal molar quantities of RNA. Membranes were probed in Ultrahyb (Ambion) with random-primed 32P-labelled DNA fragments corresponding to nt 3077-3707 (exon probe) and nt 1563-2005 (intron probe) of pri-miR-122. A fragment corresponding to nt 685-1171 of γ-actin was used as a loading control. For miRNA northern blots, the small RNA fraction was isolated by dissolving total RNA in 300 μL of TE buffer and adding an equal volume of PEG solution (20% PEG 8000, 2 M NaCl). Samples were mixed and incubated for at least 30 min on ice, followed by centrifugation at 14,000g for 15 min and isopropanol precipitation of the supernatant. Small RNA were run on an 18% polyacrylamide (19:1) urea gel and analyzed as described before 54 using a 32P-end-labeled oligonucleotide complementary to miR-122.

Reverse transcription and real-time qPCR analysis
Total RNA was treated with DNase I (Roche) and reverse-transcribed using SuperScript Reverse Transcriptase III (Invitrogen) and random primers (Invitrogen). Real-time quantitative PCR (qPCR) was performed with 2x Sensimix SYBR mastermix (Bioline) and analyzed on a Corbett Research Rotor-Gene GG-3000 machine.

Nuclear-cytoplasmic fractionation
The procedure for isolating nuclear and cytoplasmic RNA has been described elsewhere 55 .

m7G cap selection
Capped nuclear RNA was immunoprecipitated using a mouse monoclonal antibody against the 5′-terminal m7G cap (Cat. No. 201001; SYSY Synaptic Systems) according to the manufacturer's protocol. RNA was analyzed by qPCR as described above and compared to 10% input.

PolyA+ and polyA− RNA separation
DNase I-treated nuclear RNA was incubated with oligodT magnetic beads (Dynabeads mRNA purification kit, Invitrogen) to isolate either polyA+ RNA, which was bound to beads, or polyA− RNA, which was present in the flowthrough after incubation. OligodT magnetic bead selection was performed twice to ensure pure polyA+ or polyA− populations. The polyA− RNA population was further processed with the Ribo-Zero Magnetic Kit (Human/Mouse/Rat, Epicentre) to deplete most of the abundant ribosomal RNA.

In vitro polyA tailing and 3′ RACE
To detect the 3′ end of pri-miR-122, nuclear RNA from Huh7 cells was first polyA-tailed in vitro using the polyA tailing kit (AM1350; Ambion) according to the manufacturer's protocol. Purified RNA was reverse-transcribed from an oligo(dT)24V anchor primer using SuperScript III (Invitrogen). A compatible linker primer and a sense primer (fwd1) were used for PCR. PCR fragments were gel purified and cloned into a TA cloning vector (StrataClone PCR cloning kit) for sequence verification.

RNA stability
Transcript half-life was estimated after actinomycin-D (5μg/ml) treatment of Huh7 cells. RNA was extracted from cells at different time points after addition of actinomycin-D using TRIzol. Transcript levels at the indicated time points were analyzed by RT-qPCR. The relative levels of expression of each transcript at different time points were plotted relative to the levels at time=0, which were set to 1.
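The time-course calculation can be sketched as below. Expressing levels relative to time=0 follows the description above; the half-life estimate assumes simple first-order decay fitted by least squares on log-transformed levels, which is an illustrative assumption and not a procedure stated in this section. Function names and example values are hypothetical.

```python
import math


def relative_to_time_zero(levels):
    """Express each measurement relative to the first (time=0) value, set to 1."""
    t0 = levels[0]
    return [value / t0 for value in levels]


def half_life_first_order(times, relative_levels):
    """Estimate half-life assuming exponential decay (assumption):
    fit ln(level) = intercept + slope * t, then t_half = ln(2) / -slope."""
    logs = [math.log(value) for value in relative_levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    return math.log(2) / -slope


# Hypothetical measurements at 0, 1, 2 and 3 h after actinomycin-D addition.
times_h = [0, 1, 2, 3]
rel = relative_to_time_zero([8.0, 4.0, 2.0, 1.0])
t_half = half_life_first_order(times_h, rel)
```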

Br-UTP nuclear run-on analysis
The Br-UTP NRO was carried out largely as described 30 followed by RT-qPCR analysis as described above. The primers used are listed in Supplementary Table 6.

Chromatin RNA isolation
The procedures for separating nuclear RNA into chromatin-associated and released fractions have been described before 56 . Chromatin-associated RNA from Huh7 cells was analyzed using RT-qPCR as detailed above. The primers used are listed in Supplementary Table 6.

ChIP analysis
ChIP analysis was carried out as previously described 15 . Each ChIP used 5 μg of antibody against total RNAPII (rabbit, N-20; sc-899; Santa Cruz Biotechnology). Immunoprecipitated DNAs were used as templates for qPCR.

RNA-sequencing
Chromatin-associated RNA was isolated as described above, except that tRNA was omitted from the solutions. Nuclear polyA− or polyA+ RNA were prepared as described above. RNA-seq was performed by the High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics, University of Oxford. RNA samples were ribodepleted using the Ribo-Zero rRNA removal kit (Human/Mouse/Rat, EpiCentre RZH110424). Libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina, v1.0 (cat # E7420) following the manufacturer's guidelines, except that our own 8 bp tags were used for indexing 57 . Libraries were sequenced on an Illumina HiSeq-2000 using 100 bp and 50 bp paired-end reads, v3 chemistry.

Bioinformatic analysis
Datasets-All human miRNAs and their genomic coordinates were obtained from miRBase release 20 58 . Annotation for protein coding genes (GRCh37) was obtained from Ensembl, release 74 59 . Annotation for long noncoding RNA (lncRNA) was obtained from GENCODE v7 catalog of human long noncoding RNA 49 .
We separated the miRNA into two groups: lnc-pri-miRNA (if the miRNA overlapped with an lncRNA) and protein coding pri-miRNA (if the miRNA overlapped with an Ensembl protein coding gene), based on the genomic coordinates and taking the strand orientation into account. miRNA harboring genes having length ≥200 bp were considered for this study. This resulted in 112 lnc-pri-miRNA and 967 protein coding pri-miRNA. An additional 18 lnc-pri-miRNA were added to this list which mapped to genes having 'lincRNA' as Ensembl gene biotype and gene length ≥200 bp.
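The strand-aware overlap rule used for this grouping can be sketched as follows. The record layout (dictionaries with `chrom`, `start`, `end`, `strand`), the 1-based inclusive coordinates and the example coordinates are assumptions for illustration; how a miRNA that overlaps both gene classes was handled is not stated here, so the sketch simply returns every label that applies.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify_mirna(mirna, lnc_genes, coding_genes, min_gene_length=200):
    """Label a miRNA by the same-strand host genes (length >= 200 bp) it overlaps."""
    def hosted_by(genes):
        return any(
            gene["chrom"] == mirna["chrom"]
            and gene["strand"] == mirna["strand"]          # strand orientation is respected
            and gene["end"] - gene["start"] + 1 >= min_gene_length
            and overlaps(mirna["start"], mirna["end"], gene["start"], gene["end"])
            for gene in genes
        )

    labels = []
    if hosted_by(lnc_genes):
        labels.append("lnc-pri-miRNA")
    if hosted_by(coding_genes):
        labels.append("protein coding pri-miRNA")
    return labels


# Hypothetical coordinates, for illustration only.
mir = {"chrom": "chr1", "start": 1500, "end": 1580, "strand": "+"}
lnc = [{"chrom": "chr1", "start": 1000, "end": 6000, "strand": "+"}]
coding = [{"chrom": "chr1", "start": 1200, "end": 9000, "strand": "-"}]
labels = classify_mirna(mir, lnc, coding)
```

In this example the coding gene overlaps the miRNA positionally but lies on the opposite strand, so only the lncRNA counts as a host.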
Genomic distribution of miRNA as belonging to protein coding genes and long non-coding genes was based on Ensembl gene biotype annotation ( Supplementary Fig.1).

Mapping of sequencing reads
Paired-end reads for each sample were mapped to the human genome reference assembly GRCh37/hg19 (build 37.2, Feb 2009) using the Bowtie2 alignment software 60 . Prior to alignment, the first 12 nucleotides were trimmed from all reads owing to the low quality of these bases. Uniquely mapped reads with no more than two mismatches were retained for further analysis. For nuclear polyA+ data, we filtered out reads with a genomically encoded stretch of 8 or more A residues at their 3′ ends. A statistical summary of read alignments can be found in Supplementary Table 5 for HeLa and Huh7 chromatin RNA-seq and HeLa nuclear polyA+ and polyA− RNA-seq.
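The read-level filters described above can be sketched as plain string operations. This is an illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, reads are assumed to be oriented in the sense of the transcript, and the mismatch count and uniqueness flag are assumed to be supplied by the aligner output.

```python
def trim_five_prime(seq, qual, n=12):
    """Remove the first n bases (and matching quality values) from a read."""
    return seq[n:], qual[n:]


def terminal_a_run_length(seq):
    """Length of the uninterrupted run of A residues at the 3' end of seq."""
    return len(seq) - len(seq.rstrip("A"))


def keep_polya_plus_read(seq, mismatches, uniquely_mapped,
                         max_mismatches=2, min_a_run=8):
    """Keep unique alignments with <= 2 mismatches and no 3' A-run of >= 8."""
    if not uniquely_mapped or mismatches > max_mismatches:
        return False
    return terminal_a_run_length(seq) < min_a_run


# Hypothetical reads, for illustration only.
trimmed_seq, trimmed_qual = trim_five_prime("ACGT" * 5, "I" * 20)
a_run = terminal_a_run_length("CCG" + "A" * 8)
```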

Calculation of metagene profiles
We used the Ensembl gene annotation to define transcription start and end sites. In-house Perl and Python scripts were used to compute metagene profiles. Metagene profiles (Fig. 6) for control siRNA, Drosha siRNA and DGCR8 siRNA in HeLa cells were calculated for a region from the start of the miRNA host gene to 1 kb upstream of the start of the next downstream gene. For this, read counts were normalized to total sequencing depth. The region extending from TSS to TES was scaled to 4 kb and the region from downstream of the TES to 1 kb upstream of the start of the next gene was scaled to 10 kb. Normalized read counts were plotted for each 10 bp bin.
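The scaling step can be sketched as follows: a 4 kb gene body and a 10 kb downstream region at 10 bp resolution correspond to 400 and 1000 bins. The sketch is not the in-house script; averaging coverage within bins, expressing depth per million mapped reads and averaging across genes are assumptions made for illustration, and all names are hypothetical.

```python
def scale_to_bins(coverage, n_bins):
    """Average a per-base coverage list into n_bins roughly equal-width bins."""
    length = len(coverage)
    bins = []
    for i in range(n_bins):
        start = i * length // n_bins
        end = max((i + 1) * length // n_bins, start + 1)   # never an empty window
        window = coverage[start:end]
        bins.append(sum(window) / len(window))
    return bins


def metagene_profile(genes, total_mapped_reads, body_bins=400, downstream_bins=1000):
    """genes: list of (gene_body_coverage, downstream_coverage) per-base lists.
    Returns the depth-normalized profile averaged over all genes."""
    scale = 1e6 / total_mapped_reads                       # per-million scaling (assumption)
    profile = [0.0] * (body_bins + downstream_bins)
    for body_cov, down_cov in genes:
        scaled = (scale_to_bins(body_cov, body_bins)
                  + scale_to_bins(down_cov, downstream_bins))
        for i, value in enumerate(scaled):
            profile[i] += value * scale
    return [value / len(genes) for value in profile]


# Tiny hypothetical example with 4 + 10 bins instead of 400 + 1000.
toy = metagene_profile([([2] * 8, [4] * 20)], 1_000_000,
                       body_bins=4, downstream_bins=10)
```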
To investigate the profile surrounding TSS upon Microprocessor depletion, we plotted normalized read count across a region of 1kb upstream and downstream of annotated TSS for Control siRNA, Drosha siRNA and DGCR8 siRNA in HeLa cells.
For Supplementary Tables 1 and 2, normalized read count (RPKM) was calculated over a region from the TES (3′ end) of the lnc-pri-miRNA to 1 kb upstream of the TSS of the next downstream gene, for miRNA-harboring lncRNA genes that are expressed (RPKM≥1) in HeLa and Huh7 cell lines.
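The readthrough-window calculation can be sketched with the standard RPKM definition (reads per kilobase of region per million mapped reads). The window helper is an assumption about how the coordinates might be handled: it uses inclusive genomic positions and, for minus-strand genes, mirrors the window to the lower-coordinate side. Names and numbers are hypothetical.

```python
def readthrough_region(tes, next_tss, strand, gap=1000):
    """Window from the host gene TES to `gap` bp upstream of the next gene TSS.
    Returns (start, end) in genomic order, or None if the genes are too close."""
    if strand == "+":
        start, end = tes, next_tss - gap
    else:
        start, end = next_tss + gap, tes
    return (start, end) if start <= end else None


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


# Hypothetical values, for illustration only.
window = readthrough_region(10_000, 31_000, "+")
value = rpkm(200, 20_000, 10_000_000)
```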

Classification of polyA+, polyA− and bimorphic transcripts
Classification of transcripts into polyA+, polyA− and bimorphic was done as described 62 .
Briefly, all expressed transcripts were classified as polyA+, polyA− and bimorphic predominant transcripts based on their relative abundance, calculated using BPKM (bases per kilobase of gene model per million mapped bases) in the polyA+ and polyA− sample for each condition (Control siRNA and DGCR8 siRNA). See Supplementary Table 3.
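The relative-abundance comparison can be sketched as below. The BPKM formula follows the definition given above; the two-fold cutoff is a placeholder chosen for illustration, since the actual decision rule is the one described in the cited reference and is not reproduced in this section.

```python
def bpkm(bases_on_gene, gene_length_bp, total_mapped_bases):
    """Bases per kilobase of gene model per million mapped bases."""
    return bases_on_gene * 1e9 / (gene_length_bp * total_mapped_bases)


def classify_transcript(bpkm_plus, bpkm_minus, fold=2.0):
    """Label a transcript from its abundance in the polyA+ and polyA- samples.
    `fold` is a hypothetical threshold, not a value taken from the study."""
    if bpkm_plus == 0 and bpkm_minus == 0:
        return "not expressed"
    if bpkm_plus >= fold * bpkm_minus:
        return "polyA+"
    if bpkm_minus >= fold * bpkm_plus:
        return "polyA-"
    return "bimorphic"


# Hypothetical abundances, for illustration only.
example = classify_transcript(bpkm_plus=0.5, bpkm_minus=6.0)
```

Running the classification separately for each condition (control siRNA and DGCR8 siRNA) would show whether a transcript changes class after knockdown.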

Modified images
Original images of gels, autoradiographs and blots used in this study can be found in Supplementary Data Set 1.
PolyA polymerase (PAP)-dependent 3′ end mapping using 3′ RACE. Position of 3′ RACE PCR product shown by gel fractionation and location of mapped 3′ end cleavage products are shown by red arrow heads on the pre-miR-122 hairpin structure (see Supplementary Fig. 2). c. Pri-miR-122 and GAPDH mRNA distribution were determined between cell fractions (WC denotes whole cell, N nuclear and C cytoplasmic) by RT-PCR using indicated primers. Western blot shows purity of nuclear and cytoplasmic fractions by use of cytoplasmic and nuclear specific protein antibodies. d. RNA stability following actinomycin D inhibition of transcription at various time points measured by RT-qPCR of lnc-pri-miR-122 transcripts versus GAPDH mRNA. RNA levels are expressed relative to the levels at time=0, which were set to 1. Huh7 cells were used in all experiments. Error bars represent s.d. of an average (n=3 independent experiments).
Microprocessor-driven termination in lnc-pri-miRNA generates miRNA and a pA− host transcript that is rapidly degraded. Lnc-pri-miRNA may be CPA-incompetent, leading to transcriptional readthrough and interference when Microprocessor cleavage is inhibited, or CPA-competent allowing effective transcription termination even in the absence of Microprocessor. Red thunderbolt with black fill depicts CPA-mediated cleavage. Blue thunderbolt with red fill depicts Microprocessor-mediated cleavage.