The mosaic genome of indigenous African cattle as a unique genetic resource for African pastoralism

Cattle pastoralism plays a central role in human livelihood in Africa. However, the genetic history of its success remains unknown. Here, through whole-genome sequence analysis of 172 indigenous African cattle from 16 breeds representative of the main cattle groups, we identify a major taurine × indicine cattle admixture event dated to circa 750–1,050 yr ago, which has shaped the genome of today’s cattle in the Horn of Africa. We identify 16 loci linked to African environmental adaptations across crossbred animals showing an excess of taurine or indicine ancestry. These include immune-, heat-tolerance- and reproduction-related genes. Moreover, we identify one highly divergent locus in African taurine cattle, which is putatively linked to trypanotolerance and present in crossbred cattle living in trypanosomosis-infested areas. Our findings indicate that a combination of past taurine and recent indicine admixture-derived genetic resources is at the root of the present success of African pastoralism.

Cattle play an important role across African economies and societies as a primary source of wealth 1,2 . They provide nutrition, manure and draught power, and are often used as payment of bride wealth 1,2 . Today, at least 150 indigenous cattle breeds have been recognized across the different agro-ecologies of the African continent 3 , each with unique phenotypic and adaptive characteristics 4,5 .
Previous studies 5,6 have indicated that the dispersion and diversity of African cattle followed the history and development of African pastoralism. It is understood that the humpless Bos taurus and the humped Bos indicus originated from domestications of distinct aurochs (Bos primigenius) subspecies with an ancestral divergence time of ~200,000 to less than 1 million yr ago [7][8][9][10] . The oldest uncontroversial evidence of domestic cattle in Africa dates back to circa 5750-4550 bc in Egypt's Western Desert at Nabta-Keseiba and circa 7000 bc in Kerma, Sudan 11 . These B. taurus cattle were introduced through North Africa and reached the western and eastern parts of the continent. They remained largely confined to the Saharan-Sahelian belt 12,13 until circa 4,000-3,000 yr ago, when they reached the Tilemsi Valley tributary of the Niger River in West Africa 14 , the Lake Turkana basin of East Africa 15,16 and the Horn of Africa 17 . The main arrival of B. indicus started around 700 ad along the Red Sea and the Indian Ocean coastal areas, at the outset of the Swahili civilization 18,19 (Fig. 1a), which subsequently led to crossbreeding between B. indicus and the already established African taurine cattle.
However, the timing of the taurine × indicine admixture event(s) and their impact on the development of African pastoralism remain unknown. Archeological evidence indicates that the development of sub-Saharan cattle pastoralism was a complex process that may not have proceeded as smoothly as its modern prevalence suggests 20,21 . In particular, climatic and infectious disease challenges (for example, bovine malignant catarrhal fever, East Coast fever, foot-and-mouth disease, Rift Valley fever and trypanosomosis) likely led to the patchy and delayed establishment of herding across East Africa 16,20,22 .
Today, the majority of African cattle are B. taurus × B. indicus humped populations of diverse phenotypes. They are classified as African Sanga (crossbred between Taurine and Zebu cattle), African Zenga (crossbred between Sanga and Zebu) and African Zebu 3,23 . The African Sanga, an Abyssinian word meaning bull, likely originated in North-East Africa with subsequent dispersion in the Central Lake Region and Southern Africa 14 . A few taurine populations found within the tsetse-belt in West Africa are the only pure African taurine cattle left on the continent 6,24 .
African humped cattle carry only taurine mitochondrial DNA haplotypes [25][26][27] . Y-chromosome microsatellite analyses indicate the presence of both indicine and taurine Y chromosomes on the continent 5,28 . Furthermore, autosomal genome-wide analyses show that African humped cattle carry a taurine background, with genetic contributions that differ among populations but vary little within a population [29][30][31] . These analyses suggest that selection played a role in shaping the B. taurus × B. indicus admixture proportion in African cattle, with admixture increasing diversity and providing new genetic resources for human and natural selection 32 . This may have facilitated dispersion and colonization of new habitats 33 . Several recent studies have addressed the effects of admixture and introgression among Bos species and have identified loci derived from donor species that contributed to the adaptation of recipient species [34][35][36] . However, admixture and introgression also have a cost, as they may reduce reproductive fitness due to genome incompatibility 37 .
Here, we generated whole-genome sequences of 114 cattle belonging to 12 indigenous African cattle populations and of two African buffalo. We combined these with the previously sequenced genomes of 58 cattle from four additional African populations 31,38 . These populations represent the main African cattle groups (Supplementary Note). Using this unique data set, we date a main taurine × indicine admixture event and assess the present genome ancestry of African cattle, supporting the view that a combination of these two ancestries is at the root of the success of African pastoralism.

The full data set, including 331 samples, was classified according to phenotype as follows: African Taurine (AFT) 3 , African Humped cattle (AFH) (including African Indicine (AFI) 3,31,39,40 , African Sanga (AFS) 31,40 , African Zenga (AFZ) 3 and Sheko), Eurasian Taurine (EAT) (including European Taurine (EUT) and Asian Taurine (AST)) and American-Australian-Asian Indicine (AAI) (including American-Australian Indicine (AMI) and Asian Indicine (ASI)) (Fig. 1, Table 3 and Extended Data Fig. 1).
Population structure and genetic diversity of African cattle. Population structure and relationships. To characterize the structure of the African populations, we performed principal component analysis (PCA) of the 331 animals (Fig. 2a). All AFH are positioned between EAT and AAI along eigenvector 1, which explains ~15% of the total variation. The AFT Muturu and N'Dama are close to EAT along eigenvector 1. Most AFH cattle cluster together regardless of breed, with only Ankole, Mursi and Sheko lying outside the main cluster, toward the AFT Muturu and N'Dama. The PCA results also show that Muturu and N'Dama, our representatives of the AFT population, are separated from the other cattle groups along eigenvector 2 (~2.5% of the total variation). Sheko is positioned close to AFH, as also reported in other studies 5,43 .
Genetic clustering analysis using ADMIXTURE 44 corroborates the pattern found in PCA (Fig. 2b and Extended Data Fig. 2). The genome-wide autosomal SNPs show reduced heterozygosity in taurine populations (0.0021 ± 0.0005 per base pair (bp)) compared with all other populations (0.0048 ± 0.0008 per bp). Heterozygosity is similarly high across AFH populations (0.0046 ± 0.0003 per bp), and AAI shows a higher level of heterozygosity than AFH (0.0052 ± 0.0014 per bp) (Extended Data Fig. 4). The degree of inbreeding measured by runs of homozygosity (ROH) shows that taurine populations, including Muturu and N'Dama, are more inbred than the other populations, whereas AAI shows an ROH distribution similar to that of AFH (Extended Data Fig. 5).
Genome-wide admixture signatures in African cattle. Evidence of intensive admixture across African cattle. To further analyze and quantify admixture levels in African cattle, we examined patterns of allele sharing using f3, D and f4-ratio statistics 45 . In the group-based analyses, we treated EAT and AAI each as a single group, given their genetic similarity relative to the African populations. In the f3 analysis, with EAT and AAI as proxies for unadmixed taurine and indicine cattle, respectively, only Muturu and N'Dama show no evidence of admixture. For the D statistics, which are more robust to population-specific drift, this is the case only for Muturu (Fig. 3a and Supplementary Tables 4 and 5). The positive f3 statistic in N'Dama is likely due to a recent population bottleneck and subsequent allele frequency changes through genetic drift 45 , as suggested by its high ROH counts and lengths (Extended Data Fig. 5). As Muturu shows no evidence of admixture (Supplementary Tables 4 and 5), we recalculated the f3 and D statistics using Muturu as a proxy for unadmixed taurine. The results were consistent with those obtained using EAT as the proxy (Supplementary Tables 4 and 5). The admixture proportions estimated by f4-ratio statistics

Dating taurine × indicine admixture across African cattle.
Having established the level of taurine × indicine admixture among African cattle, we then estimated when it occurred using the decay of admixture-induced linkage disequilibrium (LD). We first employed a single-pulse admixture model using ALDER. Across all AFH populations, excluding the Kenya Boran, admixture times range from 126.88 (Mursi) to 181.58 (Fogera) generations ago (mean 153.67) (Fig. 3c and Supplementary Table 7). Additionally, we analyzed our data using MALDER 46 to assess the possibility of multiple admixture events. After fitting a single-pulse model, MALDER did not detect an additional admixture event with sufficient significance. Moreover, the lower significance (Z-scores) and larger standard errors of the double-pulse fits compared with the single-pulse fits support the single-pulse admixture model for our data (Fig. 3d). When we combined AFH populations, excluding Ankole, Kenya Boran, Mursi and Sheko, we obtained a similar result (Supplementary Table 8).
Only the Kenya Boran shows a different admixture timing among the AFH populations, with a very recent admixture signal and similar significance for the single- and double-pulse model fits (Fig. 3d). These results support both recent and ancient admixture signals in the Kenya Boran (Extended Data Fig. 6). The Kenya Boran originates from the Ethiopian Boran 47,48 . After migrating from Ethiopia to Kenya, it underwent selection and improvement, including crossing with European taurine cattle, in the early twentieth century 47,48 . These recent crossbreeding events most likely correspond to the admixture signal (12.77 ± 12.96 generations ago) of the Kenya Boran (Extended Data Fig. 6). We also detect an ancient admixture signal (132.28 ± 13.60 generations ago) in the Sheko.
In N'Dama, we detected only a recent admixture signal (21.36 ± 2.50 generations ago) (Supplementary Table 7). Previous studies have shown that the N'Dama is composed of several subpopulations with varying degrees of indicine ancestry 5,24,49 . The N'Dama population here is from The Gambia, where an indicine ancestry has previously been documented 5,24,49 . Our results now provide a timescale for this recent admixture event.
We also performed GLOBETROTTER 50 analysis, based on haplotype sharing, as an alternative method to estimate admixture time. The 14 African cattle populations, excluding Muturu and N'Dama, show robust evidence of admixture (bootstrap P < 0.01) (Supplementary Table 9). In addition, admixture time estimates for the populations whose best-guess model is 'one-date' range from 94.85 to 158.08 generations ago, in agreement with the results from ALDER (Fig. 3e). The exceptions are the Kenya Boran and Kenana, whose best-guess model is 'multiple-dates' (Supplementary Table 9).
Selection signatures with an excess of taurine or indicine ancestry in African humped cattle. Our genome-wide analysis shows that all sampled African cattle breeds, except Muturu, have taurine and indicine ancestry, with little variation within a population. In such crossbreeds, a haplotype of either taurine or indicine ancestry may confer a relative adaptive advantage under particular selection pressures. Such haplotypes are therefore expected to increase in frequency in admixed African cattle populations over time.
We employed two approaches to identify such loci and haplotypes. We first explored ongoing selective sweeps using the integrated haplotype score (iHS). Taking the top 1% of windows ranked by the proportion of SNPs with |iHS| ≥ 2 (≥60.00%), we obtained a total of 496 windows of 50 kilobase (kb) length as candidates under selection (Extended Data Fig. 7a). The 494 protein-coding genes overlapping these windows show significant enrichment in 'defense response to bacterium' (GO:0042742) and 'keratinization' (R-BTA-6805567) (false discovery rate-adjusted P < 0.05) (Supplementary Table 10). These 496 windows have a lower average taurine ancestry (26.14%) than windows in the other iHS percentiles and than the whole genome (32.49%) (Extended Data Figs. 8 and 9). Moreover, the average taurine ancestry of these windows falls outside the empirical distribution generated by resampling (Extended Data Fig. 10). This indicates that the overall ancestry of these selected loci is more skewed toward indicine than that of the whole genome.
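The window ranking used for the iHS scan can be sketched in a few lines of Python. This is a minimal sketch, assuming standardized per-SNP iHS values have already been computed and that windows are non-overlapping 50-kb bins; the input format and the function names (`window_extreme_fraction`, `top_windows`) are hypothetical and do not reproduce the authors' pipeline.

```python
"""Sketch: rank 50-kb windows by the fraction of SNPs with |iHS| >= 2.

Assumption: per-SNP standardized iHS values are already available as
(chrom, pos, ihs) tuples; windows are non-overlapping, 1-based bins.
"""
import math
from collections import defaultdict

WINDOW = 50_000  # window length in bp (50 kb, as in the text)


def window_extreme_fraction(snps, threshold=2.0, window=WINDOW):
    """Return {(chrom, window_start): fraction of SNPs with |iHS| >= threshold}."""
    counts = defaultdict(lambda: [0, 0])  # [n_extreme, n_total] per window
    for chrom, pos, ihs in snps:
        start = (pos - 1) // window * window + 1  # 1-based start of the bin
        counts[(chrom, start)][1] += 1
        if abs(ihs) >= threshold:
            counts[(chrom, start)][0] += 1
    return {key: ext / tot for key, (ext, tot) in counts.items()}


def top_windows(fractions, top=0.01):
    """Return the top `top` fraction of windows, highest proportion first."""
    ranked = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * top))
    return ranked[:n_keep]
```

In practice, windows with very few SNPs would usually also be excluded before ranking, since a single extreme SNP can otherwise yield a proportion of 100%.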
We then inferred local ancestry across the genome using LOTER 51 and selected the top 0.5% windows with the highest taurine or indicine ancestry (Extended Data Fig. 7b). Of these 496 windows, 63 had also been identified in the previous iHS analysis and were further considered. After filtering out windows whose pairwise FST between the reference populations (EAT and AAI) was below the genome-wide level (<0.2296) and merging adjacent windows, 16 genomic regions were retained, of which three and 13 show an excess of taurine and indicine ancestry, respectively. Eleven of the regions with an excess of indicine ancestry have been identified as selection signals in previous African cattle studies (Table 1). To our knowledge, none of the regions with an excess of taurine ancestry has previously been reported as under selection in African cattle. The taurine and indicine excess regions overlap with nine and 51 protein-coding genes, respectively.
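The FST filter and the merging of adjacent windows into regions can be sketched as follows. This is a minimal illustration, assuming each candidate window is a (chromosome, 1-based start) pair of fixed 50-kb length and that a per-window EAT versus AAI FST value is already available; the function name and data layout are hypothetical.

```python
WINDOW = 50_000  # fixed window length in bp (assumption: 50-kb windows)


def merge_candidate_windows(windows, fst, fst_cutoff, window=WINDOW):
    """Filter windows by FST and merge touching windows into regions.

    windows: list of (chrom, start) candidate windows (1-based starts).
    fst: dict mapping (chrom, start) -> FST between the reference populations.
    Windows with FST below fst_cutoff are dropped (missing FST counts as 0).
    Returns a list of merged regions as (chrom, start, end), end inclusive.
    """
    kept = sorted(w for w in windows if fst.get(w, 0.0) >= fst_cutoff)
    regions = []
    for chrom, start in kept:
        end = start + window - 1
        last = regions[-1] if regions else None
        # Merge when on the same chromosome and touching/overlapping the last region.
        if last and last[0] == chrom and start <= last[2] + 1:
            regions[-1] = (chrom, last[1], max(end, last[2]))
        else:
            regions.append((chrom, start, end))
    return regions
```

With the genome-wide cutoff of 0.2296 quoted above, a call such as `merge_candidate_windows(candidates, fst_by_window, 0.2296)` would return the retained regions.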
The longest region, 600 kb in length, is observed on BTA7 (Table 1). It includes 12 significant windows with 92.05% average indicine ancestry, much higher than the 67.51% genome-wide average. Downstream of this region, we found three smaller regions of 150-, 200- and 50-kb length with high average indicine ancestry of 91.28%, 91.28% and 92.62%, respectively (Table 1). This cluster of four candidate regions spans 1.40 megabases (Mb) of BTA7 (49.75-51.15 Mb). It shows a reduced level of diversity within AFH and an increased level of genetic differentiation between AFH and EAT.

Fig. 3 | … EAT are randomly divided into two subgroups, EATa and EATb, and AFB is the outgroup. Blue and pink colors indicate taurine and indicine ancestries, respectively. c, Admixture times in generations are estimated by ALDER 70 with two reference populations, EAT (n = 103) and AAI (n = 56). The numbers of biologically independent animals used in this analysis for each breed are as follows: Afar (9), Ankole (10), Arsi (10), Barka (9), Butana (20), Ethiopian Boran (10), Fogera (9), Goffa (10), Horro (11), Kenya Boran (10), Kenana (13), Mursi (10), N'Dama (13), Ogaden (9) and Sheko (9). The data points are presented as estimated admixture times in generations ±s.e. Thick and thin horizontal bars represent ±1 s.e. and ±3 s.e., respectively. The Sheko is indicated in yellow. d, Admixture times in generations are estimated by both single- (left) and double-pulse (middle and right) models using MALDER 46 with two reference populations, EAT (n = 103) and AAI (n = 56). The numbers of biologically independent animals used in this analysis for each breed are identical to those of the ALDER analysis in c. The data points are presented as estimated admixture times in generations ±1 s.e. The y axis indicates the Z-score for each model fit. e, Comparison between estimates from the GLOBETROTTER analysis (x axis) and those from the ALDER analysis (y axis). The red line indicates y = x. The data points are presented as estimated admixture times in generations ±1 s.e. (horizontal and vertical bars). Standard errors were estimated by leave-one-chromosome-out jackknifing (ALDER) or by bootstrapping (GLOBETROTTER). The numbers of biologically independent animals used in each of the analyses for each breed are identical to those of the ALDER analysis in c. The Sheko is indicated in yellow.
The region with the highest taurine ancestry (61.34%) is 200 kb long (BTA11: 14.65-14.85 Mb) (Table 1). Like the BTA7 region, it shows reduced genetic diversity (Fig. 5). However, we observe an increased level of genetic differentiation between AFH and AAI, as well as extended haplotype sharing between EAT and AFH (Fig. 5). This region overlaps with seven protein-coding genes (Table 1), one of which (NLRC4) has been linked to the inflammatory response 59-61 and tick infestation 62 .

To identify loci under selection specifically in African taurine cattle, we performed population branch statistics (PBS) analysis 64 , comparing AFT and EAT with AAI as an outgroup. After filtering out windows with fewer than 10 SNPs, 1,239,021 autosomal windows remained (50-kb sliding windows with a 2-kb step). PBS values ranged from −0.1156 to 0.8341, with a mean of 0.0314. From the highest 0.1% PBS windows, we removed those with FST (AFT versus EAT) below 0.1 (Supplementary Fig. 1) and considered the remaining windows as candidate selection signals specific to AFT (Supplementary Table 11).
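PBS converts pairwise FST values into branch lengths, T = −ln(1 − FST), and isolates the length of the focal branch of a three-population tree. The sketch below computes the AFT branch with EAT as the sister population and AAI as the outgroup, as described above. It assumes per-window pairwise FST values are already available (the FST estimator used for PBS is not specified here) and is an illustration, not the authors' implementation.

```python
import math


def branch_length(fst):
    """Transform FST into a divergence-like branch length, T = -ln(1 - FST).

    Requires fst < 1; slightly negative FST estimates give negative T.
    """
    return -math.log(1.0 - fst)


def pbs(fst_focal_sister, fst_focal_outgroup, fst_sister_outgroup):
    """Population branch statistic for the focal population.

    Here: focal = AFT, sister = EAT, outgroup = AAI.
    PBS = (T_focal,sister + T_focal,outgroup - T_sister,outgroup) / 2
    """
    t_fs = branch_length(fst_focal_sister)
    t_fo = branch_length(fst_focal_outgroup)
    t_so = branch_length(fst_sister_outgroup)
    return (t_fs + t_fo - t_so) / 2.0
```

Because negative FST estimates yield negative branch lengths, PBS values slightly below zero, such as the observed minimum of −0.1156, are expected for windows with little differentiation.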
The strongest PBS signal (0.6740) overlaps with SDK1 on BTA25 (40,052,001-40,102,000), approximately 300 kb upstream of CARD11 (Fig. 6). In this region, FST values between AFT and EAT (FST = 0.5173) and between AFT and AAI (FST = 0.5308) are much higher than the genome-wide levels (FST = 0.1106 and 0.1825, respectively) (Fig. 6b). We observe an AFT haplotype pattern that is distinct from those of EAT and AAI and is present in some AFH breeds (Supplementary Figs. 2 and 3).

Discussion
In this study, we first highlighted the taurine × indicine admixture characteristics of 16 indigenous African cattle populations, 14 of them living in the Horn of Africa, the main entry point of Asian zebu on the African continent. Then, we identified and dated the main taurine × indicine admixture event, which has shaped today's genome of these crossbreeds, to around 150 generations ago. We also identified candidate selected regions in these admixed populations, including immune-response- and heat-tolerance-related genes in haplotypes of indicine origin and inflammatory-response-related genes in haplotypes of taurine origin. Last, we identified a locus of African taurine origin putatively linked to trypanotolerance. Together, these results support our hypothesis that the present success and dispersion of African pastoralism followed the arrival of indicine cattle and their crossbreeding with local taurine cattle.
Our estimation under a single-pulse admixture model dates the admixture of AFH to around 150 generations ago. Assuming a cattle generation time of 5-7 yr (refs. 65,66 ), this corresponds to about 750-1,050 yr ago, at the beginning of the second millennium ad (950-1250 ad). According to historical records, the arrival of Asian zebu along the Horn of Africa started earlier, around 700 ad, following the Islamization of the East African coast and the onset of the Swahili civilization 19 , in agreement with the earliest noncontroversial archeological evidence for African humped cattle in the Horn of Africa, dated to around the mid-first millennium ad 18 . Therefore, our results suggest that indicine cattle initially remained confined to the East African coastal areas for at least 2-3 centuries before crossing extensively with taurine cattle. Then, during the second millennium ad, the complex human history of the Horn of Africa, characterized by multiple human population movements and dispersals 67 as well as climatic fluctuations 16,68 , would have further shaped today's landscape of genome admixture in East African cattle. Interestingly, a previous study indicates an admixture event in two West African zebu populations around 500 yr ago 66 . This timing is consistent with our older East African date for taurine × indicine crossbreeding, which would have been followed by the movement of East African humped cattle along the Sahelian belt and crossbreeding with local taurine cattle in West Africa. The same study identified a more recent admixture event in the West African Borgou around 20 generations ago 66 , approximately the same time as the event identified here in the N'Dama from The Gambia. These more recent admixture events may have been linked to the rinderpest epidemics at the end of the nineteenth century 69 .
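The conversion from generations to calendar dates is simple arithmetic. As a sketch, the helper names are hypothetical, and the reference year of 2000 is our assumption; it reproduces the 950-1250 ad range quoted above from 150 generations and a 5-7-yr generation time.

```python
def generations_to_years(generations, gen_time_min=5, gen_time_max=7):
    """Return (min_years_ago, max_years_ago) for an admixture time in generations."""
    return generations * gen_time_min, generations * gen_time_max


def years_ago_to_calendar(years_ago, reference_year):
    """Convert 'years before reference_year' into a calendar year (ad)."""
    return reference_year - years_ago


# Usage: 150 generations -> (750, 1050) yr ago -> 1250 and 950 ad (reference year 2000).
low, high = generations_to_years(150)
dates = (years_ago_to_calendar(high, 2000), years_ago_to_calendar(low, 2000))
```

The same conversion applied to the N'Dama estimate (about 21 generations) gives roughly 105-150 yr ago, which is why it is discussed alongside late-nineteenth-century events.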
We cannot exclude the possibility that more ancient taurine × indicine admixture events contributed to the genetic composition of the AFH populations from the Horn of Africa. Indeed, haplotype sharing-based and LD-based admixture dating methods have limited power to detect admixture signals older than about 200 generations 50,70 . However, if such events occurred, their signals would likely have been erased by the more recent admixture identified here.
The ancestry of the selection signatures in AFH was found to be more skewed toward indicine than the genome-wide average. Domestic cattle are not native to the African continent; African taurine cattle originate from the Near East 3 , while indicine cattle were introduced into Africa after their domestication on the Indian subcontinent 3 . On reaching the African tropical environments, the Near East taurine cattle must have faced major environmental challenges. However, indicine cattle found across the tropical Indian subcontinent may have been better preadapted to African environments and, in particular, to its climatic characteristics 71 . These preadaptations would have facilitated indicine introgression into local inland taurine populations and the dispersion of crossbred animals. However, African livestock diseases (for example, trypanosomosis, bovine malignant catarrhal fever, East Coast fever and Rift Valley fever) would have represented major constraints to the dispersion of indicine × taurine crossbred cattle 22 . Here, the tolerance of African taurine cattle to trypanosomosis 4 as well as the resistance of indicine cattle to infestation with ticks and to heat stress have proven advantageous [72][73][74] .
Heat tolerance, a characteristic of zebu cattle 73,74 , is a candidate indicine preadaptation to climatic challenges. We found two heat shock protein genes (HSPA9 and DNAJC18) on BTA7, which were previously reported as candidate selected loci in African and Asian indicine cattle 30,[75][76][77] . We also found a water-reabsorption-related gene, GNAS, on BTA13. The protein encoded by GNAS mediates the signaling of the antidiuretic hormone arginine vasopressin (AVP) to aquaporin-2 (AQP2) water channels, contributing to the water conservation pathway of the kidney 78 . Considering the adaptation of Asian zebu cattle to arid environments 79 , we infer that the indicine haplotype of GNAS contributes to the local adaptation of AFH to the arid areas of the continent. Also, the immune-related genes on BTA7 (MATR3, MZB1 and STING1) and BTA3 (ATG4B (ref. 80 )) (Table 1) … ticks and tick-borne diseases, such as East Coast fever. STING1 is essential for DNA-mediated type I interferon production and host defense against DNA viral pathogens 81 , and therefore might confer some tolerance to viral infections such as Rift Valley fever and foot-and-mouth disease.
The identification of an autosomal taurine background in all African cattle leads us to expect a contribution of local taurine ancestry to environmental adaptation, and thus to the success of African cattle pastoralism. One example is the candidate region on BTA11, which overlaps with NLRC4 (ref. 59 ), a gene involved in the inflammatory response. It shows extensive haplotype sharing between AFH and taurine cattle (AFT and EAT). Considering the lack of EAT ancestry in AFH cattle, this haplotype likely originates from AFT. Its presence in AFH may have resulted from selection for better control of the inflammatory response following infection with diseases such as East Coast fever and Rift Valley fever 82,83 .
Similarly, across large areas of sub-Saharan Africa, cattle have been exposed to the challenge of trypanosomosis, a severe obstacle to livestock productivity in Africa 84 . African taurine cattle show tolerance to Trypanosoma spp. infection, controlling both the effects of infection (for example, anemia and weight loss) and the level of blood parasites 85 . Accordingly, we expect to detect selection signals in some of the humped cattle exposed to trypanosomosis challenges.
In our PBS analysis, a selection signature in AFT was found upstream of CARD11, which encodes a protein essential for T and B cell signaling in the innate and adaptive immune systems [86][87][88] . Importantly, CARD11 was reported as differentially expressed between the trypanotolerant N'Dama and the trypanosusceptible Kenya Boran 89 . We suggest that this candidate region plays a role in regulating CARD11 expression and contributes to the adaptation of AFT and AFH populations to trypanosomosis challenge. Accordingly, this taurine region is expected to be observed in crossbreeds (Sheko, Horro and Mursi) whose natural habitats are infested with tsetse flies 90,91 . However, because trypanotolerance is a complex quantitative trait [92][93][94] , the potential regulatory element upstream of CARD11 should be regarded as one of many genetic factors contributing to it. Notably, the windows within the highest 0.1% of PBS values include several genes (FAAP24 (ref. 95 ), WDR48 (ref. 96 ), LRRC8A (ref. 97 ) and IFNAR1 (ref. 98 )) related to anemia and immune response (Supplementary Table 11).
In conclusion, despite the environmental complexity of the African continent and the domestication of cattle outside it, domestic cattle are found today across all African agro-ecologies. The results presented here support the view that taurine × indicine admixture events, followed by selection on taurine and indicine ancestry across the genome, are at the root of the success of African cattle pastoralism. These findings are highly relevant in today's context of improving livestock productivity to meet the needs of growing human populations, with further crossbreeding of indigenous African cattle with exotic cattle recommended as one of the pathways to the continent's food security. A complete genome-level characterization of the unique adaptations of African cattle will open the door to sustainable crossbreeding programs combining local environmental adaptation with the higher productivity of exotic breeds.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-020-0694-2.

Methods

… (FC121-2001). Briefly, 1 μg of genomic DNA was fragmented using a Covaris Focused-Ultrasonicator and end-repaired. An 'A' base was added to the 3′ ends of the fragments, followed by Illumina adapter ligation. The product was size-selected for 400-500 bp, PCR-amplified and validated using the Agilent Bioanalyzer. Finally, the DNA was sequenced on the HiSeq2000 platform (Illumina) by Macrogen.
In addition to the newly generated sequence data, we used our previously published data for 53 commercial taurine 31,103,104 and 48 African 31 cattle, as well as publicly available data for 10 African taurine, 50 European taurine, 34 American-Australian zebu and 22 Asian zebu 105,106 . We generated genotype data following the 1000 Bull Genomes Project Run 8 guideline (17 October 2019) (http://www.1000bullgenomes.com/). We first examined per-base sequence quality of the raw sequence reads using FastQC v.0.11.8 (ref. 107 ) and removed low-quality bases and artifact sequences using Trimmomatic v.0.39 (ref. 108 ). The high-quality sequence reads were mapped against the bovine reference genome (ARS-UCD1.2) using bwa mem v.0.7.17 (ref. 109 ) with default parameters. We then used Samtools v.1.9 (ref. 110 ) to sort BAM files and create index files. For the mapped reads, potential PCR duplicates were identified using 'MarkDuplicates' of Picard v.2.20.2 (http://broadinstitute.github.io/picard). 'BaseRecalibrator' and 'PrintReads' of the Genome Analysis Toolkit (GATK) v.3.8 111 were used to perform base quality score recalibration (BQSR). The known variants file (ARS1.2PlusY_BQSR_v3.vcf.gz) provided by the 1000 Bull Genomes Project was used to mask known sites for all individuals except the two African buffalo (AFB). The before/after BQSR reports were checked by running 'AnalyzeCovariates' to ensure that base quality scores were corrected as expected. For the two AFB samples, we performed an initial round of variant calling on unrecalibrated data. We then performed BQSR by feeding the variants obtained from the initial variant calling as known sites to BaseRecalibrator and finally checked the convergence of base quality improvement.
To check the reliability of variant calls from the resequencing analysis, we additionally genotyped 69 cattle samples using the BovineSNP50 Genotyping BeadChip (Illumina). After filtering out SNPs with a GenCall score < 0.7, loci common to the SNP chip and the resequencing data were extracted, and genotype concordance between the two platforms was assessed. We also incorporated the genotype data of 45 samples from our previously published study 21 into this assessment to check the reliability of our current pipeline.
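Genotype concordance between the two platforms reduces to comparing calls at shared, non-missing loci. A minimal sketch, assuming that chip and sequencing genotypes have already been harmonized to the same reference and alternate alleles (strand and allele alignment of Illumina chip calls is a necessary step not shown) and coded as 0/1/2 alternate-allele dosages; the dictionary layout and function name are hypothetical.

```python
def genotype_concordance(chip, seq):
    """Compare chip and sequencing genotypes at shared, non-missing loci.

    chip, seq: dicts mapping (sample_id, snp_id) -> dosage (0, 1, 2) or None.
    Returns (n_compared, concordance_fraction); fraction is NaN if nothing overlaps.
    """
    shared = [
        key for key in chip.keys() & seq.keys()
        if chip[key] is not None and seq[key] is not None
    ]
    if not shared:
        return 0, float("nan")
    matches = sum(chip[key] == seq[key] for key in shared)
    return len(shared), matches / len(shared)
```

Reporting concordance separately for heterozygous calls is often informative, because low-coverage sequencing tends to miss heterozygotes more than homozygotes.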
Population differentiation and structure. For PCA, we used the Genome-wide Complex Trait Analysis (GCTA) 113 tool v.1.93.0 to estimate the eigenvalues and eigenvectors, incorporating genotype data from 331 individuals (excluding the two African buffalo). For admixture analysis, we performed LD-based pruning of the genotype data using PLINK v.1.9 (ref. 114 ) with the '--indep-pairwise 50 10 0.1' option, as recommended by the developer. ADMIXTURE v.1.3.0 (ref. 44 ) was run with K increasing from 1 to 10, where K is the assumed number of ancestral populations. The delta K method was used to choose the optimal K 115 . Genetic distances between cattle breeds were estimated with the FST estimator of Weir and Cockerham 116 using PLINK v.1.9 (ref. 114 ).
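GCTA-based PCA rests on a genomic relationship matrix (GRM) built from standardized genotypes, whose leading eigenvectors are the principal components. The NumPy sketch below illustrates the idea under stated assumptions: complete 0/1/2 genotype data with no missing calls, and the standardized cross-product used for every entry. GCTA computes the diagonal elements with a different formula, so its eigenvalues will differ somewhat; the function name is hypothetical.

```python
import numpy as np


def grm_pca(genotypes, n_pcs=2):
    """PCA from a standardized genomic relationship matrix (GCTA-like sketch).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts, no missing data.
    Returns (eigenvalues, eigenvectors) for the top n_pcs components,
    sorted from largest to smallest eigenvalue.
    """
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                    # allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardize each SNP
    grm = z @ z.T / z.shape[1]                  # average over SNPs
    vals, vecs = np.linalg.eigh(grm)            # symmetric matrix -> eigh
    order = np.argsort(vals)[::-1][:n_pcs]      # largest eigenvalues first
    return vals[order], vecs[:, order]
```

The proportion of variation explained by a component (for example, ~15% for eigenvector 1) can be approximated as its eigenvalue divided by the sum of all eigenvalues.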
Individual heterozygosity (theta) based on Felsenstein's substitution model 119 was estimated using ATLAS v.0.9 (ref. 120 ), which takes into account the sequencing depth and sequencing error at each locus. ROH were analyzed using VCFtools v.0.1.17 (ref. 102 ), discarding ROH segments shorter than 50 kb.
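For orientation, the two per-individual summaries can be approximated naively. The sketch below is not ATLAS's estimator: it simply counts heterozygous calls per callable base pair and ignores depth and sequencing error. It also assumes ROH segments have already been called (for example, by VCFtools) and only applies the 50-kb length filter before summarizing. Function names are hypothetical.

```python
def naive_heterozygosity(genotypes, callable_bp):
    """Heterozygous sites per callable base pair for one individual.

    genotypes: iterable of 0/1/2 dosages at variant sites (1 = heterozygous).
    callable_bp: number of bases with sufficient coverage to be called.
    Unlike ATLAS, this ignores sequencing depth and error.
    """
    n_het = sum(1 for g in genotypes if g == 1)
    return n_het / callable_bp


def summarize_roh(segments, min_length=50_000):
    """Drop ROH shorter than min_length bp; return (count, total_length_bp).

    segments: list of (start, end) coordinates in bp, end inclusive.
    """
    lengths = [end - start + 1 for start, end in segments]
    kept = [length for length in lengths if length >= min_length]
    return len(kept), sum(kept)
```

Summed ROH length divided by the autosomal genome length is a common inbreeding coefficient (FROH), which is one way the ROH patterns reported above can be compared across populations.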
Test for admixture and estimation of admixture proportion. We used f3, D and f4-ratio statistics to test and quantify admixture in African cattle, based on our variant calls (~17.7 million SNPs) and the linearly interpolated recombination map derived from a large US Department of Agriculture dairy cattle pedigree 121 . All statistics were computed using ADMIXTOOLS v.5.1 (ref. 45 ), with standard errors obtained from a block jackknife with a 5-cM block size, and Z-scores were calculated from these standard errors. The three statistics and their notations are given below. Note that EAT was replaced with Muturu when Muturu was used as the surrogate for the taurine source population in these statistics.
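The block jackknife turns per-block sums into a standard error and a Z-score. The sketch below is a simplified delete-one-block jackknife with equal block weights for a ratio statistic, sum(numerator)/sum(denominator). To our understanding, ADMIXTOOLS uses a weighted block jackknife, so its standard errors will differ; the function name and inputs are hypothetical.

```python
import math


def block_jackknife(numerators, denominators):
    """Equal-weight delete-one-block jackknife for a ratio statistic.

    numerators, denominators: per-block sums (e.g. one entry per 5-cM block).
    Returns (estimate, standard_error, z_score).
    """
    g = len(numerators)
    total_num, total_den = sum(numerators), sum(denominators)
    estimate = total_num / total_den
    # Re-estimate the statistic with each block left out in turn.
    loo = [(total_num - n) / (total_den - d) for n, d in zip(numerators, denominators)]
    mean_loo = sum(loo) / g
    variance = (g - 1) / g * sum((v - mean_loo) ** 2 for v in loo)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("inf")
    return estimate, se, z
```

Using blocks several centimorgans long keeps linked SNPs together, so the jackknife variance accounts for LD between nearby sites.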
$$f_3(X; \mathrm{EAT}, \mathrm{AAI})$$
The f3 statistic was used to test whether African cattle populations derive from admixture between two populations (EAT and AAI). X is the target African population of interest, and EAT and AAI are populations close to the source populations. A significantly negative f3 statistic is considered evidence of historical admixture in population X. In contrast, a positive value does not necessarily mean there is no admixture, because a high degree of drift specific to population X can mask the negative signal 45 .
$$D(\mathrm{EAT}, X; \mathrm{AAI}, \mathrm{AFB})$$
The D statistic was used to evaluate gene flow between cattle populations, with X again the target African population. With AFB as an outgroup, a significantly positive value indicates gene flow between EAT and AAI, whereas a significantly negative value indicates gene flow between X and AAI.
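Per-SNP estimators for the f3 and D statistics can be sketched from population allele frequencies. The f3 term includes the sample-size (heterozygosity) correction for the target population described for ADMIXTOOLS 45 , with n_c sampled chromosomes assumed equal across SNPs, and D is the ratio of summed numerators to summed denominators. ADMIXTOOLS additionally reports a normalized f3 and applies its own SNP handling, so this is an illustration rather than a reimplementation; the function names are hypothetical.

```python
import numpy as np

def f3_terms(a, b, c, n_c):
    """Per-SNP f3(C; A, B) with a sample-size correction for the target C.

    a, b, c : allele frequencies in A, B and target C
    n_c     : number of sampled chromosomes in C
    """
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    return (c - a) * (c - b) - c * (1.0 - c) / (n_c - 1)

def d_terms(w, x, y, z):
    """Per-SNP numerator and denominator of D(W, X; Y, Z)."""
    w, x, y, z = (np.asarray(v, dtype=float) for v in (w, x, y, z))
    numerator = (w - x) * (y - z)
    denominator = (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return numerator, denominator
```

For f3(X; EAT, AAI), the EAT, AAI and X frequencies are passed as a, b and c, and the genome-wide value is the mean of the per-SNP terms. For D(EAT, X; AAI, AFB), they are passed as w, x, y and z, and D is numerator.sum() / denominator.sum(); both can be fed to a block jackknife for standard errors.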
$$\alpha = \frac{f_4(\mathrm{EAT_a}, \mathrm{AFB}; X, \mathrm{AAI})}{f_4(\mathrm{EAT_a}, \mathrm{AFB}; \mathrm{EAT_b}, \mathrm{AAI})}$$
The f4 ratio (α) quantifies the mixing proportion of an admixture event as the ratio of two f4 statistics. We specified X as the target African population and AFB as an outgroup. EAT was randomly divided into two subgroups, EATa and EATb, so that EATb serves in the denominator as a population with 100% EAT ancestry. Under this specification, α is interpreted as the proportion of EAT ancestry in the target African population X.
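The f4 ratio can be sketched from allele frequencies as below. The random split of EAT individuals, the equal weighting of SNPs and the helper names are assumptions for illustration; ADMIXTOOLS computes these quantities with its own estimators and jackknife.

```python
import numpy as np

def random_split(individuals, rng):
    """Randomly split individuals into two halves (e.g., EAT -> EATa, EATb)."""
    shuffled = rng.permutation(np.asarray(individuals))
    half = shuffled.size // 2
    return shuffled[:half], shuffled[half:]

def f4(a, b, c, d):
    """f4(A, B; C, D) as the mean over SNPs of (a - b)(c - d)."""
    a, b, c, d = (np.asarray(v, dtype=float) for v in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))

def f4_ratio(eat_a, afb, x, aai, eat_b):
    """alpha = f4(EATa, AFB; X, AAI) / f4(EATa, AFB; EATb, AAI)."""
    return f4(eat_a, afb, x, aai) / f4(eat_a, afb, eat_b, aai)
```

If the allele frequencies of X are an exact 30:70 mixture of the EATb and AAI frequencies, the numerator is exactly 0.3 times the denominator and f4_ratio returns 0.3; real populations only approximate this, which is why the denominator uses a sister group of the taurine source rather than the source itself.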
Estimation of admixture time. The time of admixture was first estimated with ALDER v.1.03 (ref. 70 ), which infers admixture time from admixture-induced LD, using the default parameters with a minimum genetic distance (mindis) of 0.5 cM. For this, we used our variant calls (~17.7 million SNPs) and the linearly interpolated recombination map derived from a large US Department of Agriculture dairy cattle pedigree 121 . If a population derives from admixture between two source populations close to the reference populations, the pairwise LD in this population, weighted by the allele frequency differences between the reference populations, decays exponentially as a function of genetic distance. ALDER fits this decay and infers the admixture time from the decay rate of the fitted curve.
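The curve-fitting step can be sketched with SciPy's curve_fit under a single-pulse model, y(d) = A·exp(-n·d) + c, where d is genetic distance in Morgans and n is the number of generations since admixture. The weighted-LD values are assumed to have been computed already (ALDER performs that step itself), the 0.5-cM cut-off mirrors the mindis setting, and fit_single_pulse is a hypothetical name. Converting generations to years requires a generation time, which this section does not state.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(d_morgan, amplitude, n_generations, offset):
    # Single-pulse model: weighted LD decays as exp(-n * d), d in Morgans.
    return amplitude * np.exp(-n_generations * d_morgan) + offset

def fit_single_pulse(d_cm, weighted_ld, min_cm=0.5):
    d_cm = np.asarray(d_cm, dtype=float)
    y = np.asarray(weighted_ld, dtype=float)
    keep = d_cm >= min_cm  # ignore short distances, mirroring the mindis cut-off
    d = d_cm[keep] / 100.0  # cM -> Morgans
    y = y[keep]
    # Rough starting value for n: distance at which the excess LD drops to 1/e of its maximum.
    excess = y - y.min()
    below = np.nonzero(excess <= excess.max() / np.e)[0]
    d_e = d[below[0]] if below.size else d.max()
    p0 = (excess.max(), 1.0 / max(d_e, 1e-6), y.min())
    params, _ = curve_fit(exp_decay, d, y, p0=p0, maxfev=20000)
    return {"amplitude": params[0], "generations": params[1], "offset": params[2]}
```

The constant offset absorbs background LD that is not due to admixture, so only the decay rate is interpreted as an admixture date.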
We additionally used MALDER v.1.0 (ref. 46 ), a modified version of ALDER that allows for multiple admixture events, to compare the fit of single- and double-pulse admixture models to our data. For both ALDER and MALDER, we performed two analyses for each African cattle population using two sets of reference populations (EAT and AAI; Muturu and AAI). The fitted curves of the single- and double-pulse admixture models for Kenya Boran were checked visually using the 'nls' function in R. For all admixture time estimates, standard errors were obtained by leave-one-chromosome-out jackknifing.
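The leave-one-chromosome-out standard error can be sketched as a delete-one jackknife in which estimate_fn stands for any hypothetical estimator (for example, a decay-curve fit) applied to all chromosomes except the one left out. This unweighted form is an assumption; ALDER and MALDER may weight chromosomes differently.

```python
import numpy as np

def leave_one_chromosome_out_se(estimate_fn, data_by_chrom):
    """Delete-one jackknife SE, dropping one chromosome at a time.

    estimate_fn   : callable mapping {chromosome: data} to a scalar estimate
    data_by_chrom : dict of per-chromosome inputs
    """
    chroms = sorted(data_by_chrom)
    g = len(chroms)
    # Re-run the estimator once per chromosome, each time without that chromosome.
    loo = np.array([
        estimate_fn({c: data_by_chrom[c] for c in chroms if c != dropped})
        for dropped in chroms
    ])
    return float(np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2)))
```

For a simple mean estimator, this reproduces the familiar standard error of the mean, which is a convenient sanity check before plugging in a curve fit.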
In addition, we used GLOBETROTTER 50 on 14 African cattle populations (AFH) to estimate admixture time from haplotype sharing. GLOBETROTTER uses a coancestry curve, in which a measure of how often pairs of haplotypes separated by a genetic distance d come from each respective source population is plotted as a function of d (ref. 50 ). Under a single admixture event, the segments inherited from each source population are expected to have exponentially distributed lengths, which leads to an exponential decay of the coancestry curve 50 . GLOBETROTTER fits this curve to estimate the rate of the exponential decay, which provides an estimate of the admixture time 50 .
We specified the 14 African humped cattle populations as target populations and the non-African cattle populations as donor populations. Under this specification, target haplotypes can be copied only from donor haplotypes, not from other target haplotypes, which is recommended when the target populations share a similar admixture history 50 .
To reduce the computational load, we performed LD-based pruning of the phased data using PLINK v.1.9 (ref. 114 ) with the '--indep-pairwise 50 10 0.1' option. The known genetic map 121 was interpolated onto this reduced dataset, without interpolating across gaps larger than 50 kb. Using the LD-pruned loci for which recombination rates were available on the interpolated genetic map (~0.72 million SNPs), we performed the GLOBETROTTER analysis as follows: (1) we ran ten rounds of expectation-maximization iterations for BTA1, 2, 7 and 12 using ChromoPainter v.2 (ref. 122 ) with the '-in' and '-iM' switches to estimate the switch rate and global mutation rate parameters; (2) we averaged the parameters estimated in (1) over all individuals and chromosomes and used these averages as fixed values (-n 514.030 -M 0.005127882) in a second run of ChromoPainter v.2 (ref. 122 ) on all individuals; (3) we summed the 'chunk length' output from (2) across chromosomes using ChromoCombine to obtain a single 'chunk length' output; (4) we obtained ten painting samples for each target individual by running ChromoPainter v.2 (ref. 122 ) with the parameters averaged over all target individuals (-n 632.949 -M 0.006501492); (5) using the summed chunk lengths from (3) and the ten painting samples from (4), we ran GLOBETROTTER with the 'prop.ind: 1' and 'null.ind: 1' options; and (6) to assess the significance of the admixture evidence, we performed bootstrapping with 100 replicates using the 'prop.ind: 0' and 'bootstrap.date.ind: 1' options. The proportion of bootstrap replicates with an inferred admixture date between 1 and 400 generations was taken as evidence of detectable admixture 50 .
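The map interpolation step can be sketched with NumPy's interp. The sketch assumes sorted physical positions (bp) and cumulative genetic positions (cM) for a single chromosome, and it returns NaN for SNPs outside the map or between map markers more than 50 kb apart, which is one reading of "without interpolating across gaps larger than 50 kb"; interpolate_map is a hypothetical helper.

```python
import numpy as np

def interpolate_map(map_bp, map_cm, query_bp, max_gap_bp=50_000):
    """Linearly interpolate genetic positions (cM) for query SNPs on one chromosome.

    SNPs outside the map, or lying between map markers more than max_gap_bp
    apart, are returned as NaN instead of being interpolated.
    """
    map_bp = np.asarray(map_bp, dtype=np.int64)
    map_cm = np.asarray(map_cm, dtype=float)
    query_bp = np.asarray(query_bp, dtype=np.int64)
    cm = np.interp(query_bp, map_bp, map_cm)
    idx = np.searchsorted(map_bp, query_bp)        # first map marker >= query
    right = np.clip(idx, 0, map_bp.size - 1)
    left = np.clip(idx - 1, 0, map_bp.size - 1)
    exact = map_bp[right] == query_bp               # SNP sits on a map marker
    inside = (idx > 0) & (idx < map_bp.size)        # SNP lies between two markers
    small_gap = (map_bp[right] - map_bp[left]) <= max_gap_bp
    return np.where(exact | (inside & small_gap), cm, np.nan)
```

With map markers at 1,000, 21,000 and 121,000 bp, a SNP at 11,000 bp receives an interpolated position of 0.01 cM, whereas a SNP at 71,000 bp falls in a 100-kb gap and is left as NaN, so it would be dropped from the GLOBETROTTER input.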

Detection of selection signatures in African humped cattle.
To detect ongoing selection signatures in AFH genomes (n = 149), we used the integrated haplotype score (iHS) 123 as implemented in HAPBIN v.1.3.0 (ref. 124 ), with default settings except for the '-f 0.01' option. For each SNP, the ancestral allele was defined as the allele fixed in the AFB outgroup. The iHS values were grouped into 2% allele frequency bins and standardized within each bin. The proportion of SNPs with |iHS| ≥ 2 was then calculated in each nonoverlapping 50-kb window, and windows with fewer than 10 SNPs were removed. Windows in the top 1% of the empirical distribution of this proportion were considered candidate regions under selection.
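The standardization and window-scan steps can be sketched as follows. The sketch assumes that unstandardized iHS values and the corresponding allele frequencies (with alleles polarized by the AFB-defined ancestral allele) have already been exported from HAPBIN; the function names are hypothetical, and the within-bin z-score is the usual iHS standardization.

```python
import numpy as np

def standardize_ihs(unstd_ihs, freq, bin_width=0.02):
    """Standardize unstandardized iHS within allele-frequency bins (2% by default)."""
    unstd_ihs = np.asarray(unstd_ihs, dtype=float)
    n_bins = int(round(1.0 / bin_width))
    bins = np.minimum(np.floor(np.asarray(freq, dtype=float) / bin_width).astype(int), n_bins - 1)
    std = np.full(unstd_ihs.shape, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        vals = unstd_ihs[in_bin]
        if vals.size > 1 and vals.std() > 0:
            std[in_bin] = (vals - vals.mean()) / vals.std()
    return std

def window_extreme_proportion(pos, std_ihs, window_bp=50_000, min_snps=10, cutoff=2.0):
    """Proportion of SNPs with |iHS| >= cutoff in each nonoverlapping window."""
    win = np.asarray(pos) // window_bp
    std_ihs = np.asarray(std_ihs, dtype=float)
    props = {}
    for w in np.unique(win):
        vals = std_ihs[win == w]
        vals = vals[~np.isnan(vals)]
        if vals.size >= min_snps:  # drop windows with fewer than 10 SNPs
            props[int(w)] = float(np.mean(np.abs(vals) >= cutoff))
    return props

def top_windows(props, quantile=0.99):
    """Windows in the top 1% of the empirical distribution of proportions."""
    cutoff = np.quantile(list(props.values()), quantile)
    return sorted(w for w, p in props.items() if p >= cutoff)
```

Scoring windows by the proportion of extreme SNPs, rather than by a single maximum, favours regions where many linked SNPs show long haplotypes, as expected around a selected sweep.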
Local ancestry inference in African humped cattle. Using the genotype data phased for the iHS analysis, we performed local ancestry inference with the LOTER package 51 to infer taurine and indicine ancestry along the AFH genomes. We specified 103 EAT and 56 AAI individuals as reference populations, assuming that each haplotype of an admixed AFH animal is a mosaic of existing haplotypes from the two reference populations. Using LOTER, we first assigned each allele to taurine or indicine ancestry and calculated the frequency of assigned taurine ancestry within AFH. These frequencies were then averaged over nonoverlapping 50-kb windows. Among the windows in the highest or lowest 0.5% of the empirical distribution of averaged taurine ancestry, we additionally filtered out windows whose pairwise FST between the reference populations was below the genome-wide level (0.2296), to reduce false positives caused by admixture within each reference population. The remaining windows were considered candidate regions with an excess or deficiency of taurine ancestry. Given the history of indicine cattle on the Indian subcontinent and in the Americas, they may carry some taurine background, albeit at low frequencies [125][126][127] . However, this would not produce false positives. Rather, it could lead to a few false negatives, because LOTER could then select similar haplotypes from either reference population, which may mask an excess of a particular ancestry.
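The window averaging and filtering applied to the LOTER output can be sketched as below. It assumes a per-SNP frequency of taurine-assigned alleles in AFH and a precomputed per-window FST between the EAT and AAI reference panels (window_fst, keyed by window index); these inputs and the function name are hypothetical, while the 0.5% tails and the 0.2296 cut-off follow the text.

```python
import numpy as np

def ancestry_outlier_windows(pos, taurine_freq, window_fst, window_bp=50_000,
                             tail=0.005, fst_cutoff=0.2296):
    pos = np.asarray(pos, dtype=np.int64)
    taurine_freq = np.asarray(taurine_freq, dtype=float)
    # Average the per-SNP taurine-ancestry frequency within nonoverlapping windows.
    windows, inverse = np.unique(pos // window_bp, return_inverse=True)
    mean_anc = np.bincount(inverse, weights=taurine_freq) / np.bincount(inverse)
    low, high = np.quantile(mean_anc, [tail, 1.0 - tail])
    # Keep only windows where the two reference panels are well differentiated.
    fst = np.array([window_fst.get(int(w), np.nan) for w in windows])
    informative = np.nan_to_num(fst, nan=-np.inf) >= fst_cutoff
    excess = windows[(mean_anc >= high) & informative]
    deficit = windows[(mean_anc <= low) & informative]
    return excess.tolist(), deficit.tolist()
```

Applying the FST filter after selecting the tails, as the text describes, means that a poorly differentiated window is discarded rather than replaced by the next most extreme one.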

Detection of selection signatures in African taurine cattle.
To detect selection signatures in AFT after divergence from EAT, we used the population branch statistic (PBS) developed by Yi et al. 64 . For each 50-kb window with a 2-kb step, we calculated the PBS as follows:
$$\mathrm{PBS}_A = \frac{T_{AE} + T_{AO} - T_{EO}}{2}, \qquad T_{ij} = -\log\left(1 - F_{ST}^{ij}\right)$$
where $T_{ij}$ represents the estimated branch length between populations i and j based on the pairwise Weir and Cockerham 116 FST estimated by VCFtools v.0.1.17 (ref. 102 ). A represents the target population (AFT), while E and O represent the control population (EAT) and the outgroup (AAI), respectively. Conceptually, the PBS value of a population represents the amount of allele frequency change at a given locus since its divergence from the other two populations. With this statistic, we aimed to detect selection signatures in AFT cattle that arose after their ancestral migration into the African continent.
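The PBS formula can be evaluated directly from the three pairwise FST values of a window, as in the sketch below. Setting negative FST estimates to 0 before the log transform is an assumed convention that the text does not state, and the function names are illustrative.

```python
import numpy as np

def branch_length(fst):
    # T = -log(1 - FST); negative FST estimates are set to 0 (assumed convention).
    fst = np.clip(np.asarray(fst, dtype=float), 0.0, None)
    return -np.log(1.0 - fst)

def pbs(fst_ae, fst_ao, fst_eo):
    """PBS of target A given its FST with control E and outgroup O, and FST between E and O."""
    return (branch_length(fst_ae) + branch_length(fst_ao) - branch_length(fst_eo)) / 2.0
```

Because the function is vectorized, it can be applied to arrays of per-window FST values from VCFtools, and windows in the upper tail of the resulting PBS distribution mark lineage-specific allele frequency change in AFT.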
Annotation and functional enrichment analysis. The candidate regions were annotated using the ARS-UCD1.2 Gene Transfer Format file (.gtf) from Ensembl release 99 (ref. 128 ). For functional enrichment analysis of a candidate gene set, we used the statistical overrepresentation test in PANTHER v.15.0 (ref. 129 ) with GO-Slim Biological Process terms and REACTOME pathways 130 and default settings. A false discovery rate-adjusted P value of 0.05 was used as the threshold for statistical significance.
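A minimal version of an overrepresentation test with Benjamini-Hochberg FDR adjustment is sketched below. It uses a one-sided Fisher's exact test per term against a reference gene list; PANTHER's own implementation, annotation sets and defaults may differ, and overrepresentation and benjamini_hochberg are hypothetical helpers.

```python
import numpy as np
from scipy.stats import fisher_exact

def overrepresentation(candidates, reference, term_to_genes):
    """One-sided Fisher's exact test of each term against the reference gene list."""
    reference = set(reference)
    candidates = set(candidates) & reference
    results = []
    for term, genes in term_to_genes.items():
        genes = set(genes) & reference
        in_both = len(candidates & genes)
        cand_only = len(candidates - genes)
        term_only = len(genes - candidates)
        neither = len(reference - candidates - genes)
        _, p = fisher_exact([[in_both, cand_only], [term_only, neither]], alternative="greater")
        results.append((term, in_both, float(p)))
    return results

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted P values, in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest P value downwards.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

Terms whose adjusted value falls below 0.05 would then be reported as enriched, matching the significance threshold stated above.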
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The newly generated sequences for 114 African cattle and two African buffalo samples are available from the Sequence Read Archive (SRA) under BioProject accession number PRJNA574857. The publicly available sequences were downloaded from the SRA and China National GeneBank (CNGB) with the following project accession numbers: CNP0000189 (Achai, Bhagnari, Cholistani,

April 2020
Gene set enrichment analysis: PANTHER v15.0
Reference-based consensus sequence generation: bcftools v1.8

Data
The newly generated sequences for 114 African cattle and 2 African buffalo samples are available from Sequence read archive (SRA) with the Bioproject accession number PRJNA574857. The publicly available sequences were downloaded from SRA and China National GeneBank (CNGB) with following project accession numbers; CNP0000189 (Achai, Bhagnari, Cholistani, Dajal

Field-specific reporting

Life sciences study design

Sample size
A minimum sample size of 9 animals per breed (that is, 18 copies of each autosome) was chosen on the expectation that, for a biallelic locus with equal allele frequencies (50%), random fixation of one allele in a breed would have a sufficiently low probability (0.5 to the power of 18) to indicate a selection signal. No further considerations were used in determining sample size. We excluded first- and second-degree relatives from sampling on the basis of pedigree records or farmer interviews.
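The probability quoted above follows from simple arithmetic on the number of sampled chromosome copies, shown here in Python. The second quantity, the chance that the sample is monomorphic for either allele, is added only for comparison and is twice the first.

```python
n_animals = 9
n_copies = 2 * n_animals  # each diploid animal carries two copies of every autosome
p_fixed_given_allele = 0.5 ** n_copies  # all 18 sampled copies carry one specified allele
p_fixed_either_allele = 2 * p_fixed_given_allele  # sample monomorphic for either allele
print(n_copies, p_fixed_given_allele, p_fixed_either_allele)
# prints: 18 3.814697265625e-06 7.62939453125e-06
```

Both values are on the order of 10^-6, so apparent fixation of a common neutral allele in a sample of this size is unlikely to arise from sampling alone.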
Data exclusions
We did not exclude any data.

Replication
To assess the confidence of SNP identification, we performed SNP-chip genotyping on a subset of all samples (n = 114); genotyping was successful for all of them.
Randomization
No randomization was required, as no analyses involved selecting a subset of animals or informative SNPs.

Blinding
Blinding was not required, as no human participant was involved in our experiment or analyses.

Reporting for specific materials, systems and methods