Using big data in cattle practice

The concept of big data, associated data sources and analytics is becoming increasingly talked about both in society as a whole and within the livestock industry. This article provides a clinician-focused review of what big data means, how it is already influencing cattle farm and veterinary businesses, and where this may lead in the future. Cattle clinicians have a major role to play in making the best use of big data to improve animal health and efficiency.


What is big data?
Before a practitioner can embrace 'big data', it is important that they understand what the term really means. It has become a buzz-phrase over the past 10 years. A number of different definitions of big data have been provided, but a common theme among these is the 'Vs' of big data. Originally, 'four Vs' were often referred to, but over time the list has grown, and some sources refer to many more Vs when defining big data. The original four Vs are:
■ Volume: Most definitions of big data start with the fact that it needs to be 'big', and early definitions in particular tended to focus on data set size (especially in relation to conventional processing methods). Key problems here are that 'bigness' is inherently a subjective and relative concept (ie, what would be considered big data for the beef industry may be extremely small in absolute size compared to, say, data generated by social media interactions). It also tends to change over time as the capacity to collect and store data expands and associated costs fall.
■ Veracity: Data quality is a key consideration wherever data are analysed, but as the volume of data grows and is increasingly used to inform high-stake decisions, it becomes even more critical. This can be particularly problematic where data is being used for purposes other than that for which it was originally recorded, and where multiple sources of data are integrated for analysis. Data quality is discussed further in Box 1.
■ Velocity: This can either refer to the speed with which information is accumulated, or the speed with which it can be interrogated and analysed.
■ Variety: The ability to integrate multiple sources of data to provide additional insight is one of the key benefits of applying big data approaches.
Visualisation is also commonly mentioned, referring to the need for effective graphical ways to visualise large data sets in order for end users to be able to make use of them quickly and easily (see Hermans and others [2018] for a summary of some aspects relevant to dairy data). Value refers to the requirement for any use of big data to add value to a process to be useful. Other less commonly mentioned concepts include volatility (usually meaning the time for which data is considered useful before becoming obsolete), variability (the concept that, as well as coming from multiple sources, the nature of the data may change) and vulnerability (the organisational risks associated with storing big data, especially where it includes personal data). Examples of how some of these concepts of big data can be applied in dairy farming are shown in Fig 1.
Interestingly, more recent attempts to define big data have become wider, and the term is now often used to refer to situations where meaningful insight can be gained from amalgamating and analysing raw data. Much current activity in herd health and production management in dairy herds falls under this definition.
Box 2 provides a glossary of some common terms used in relation to big data. This article aims to provide the cattle practitioner with an accessible overview of how big data and associated ideas and technology may influence farm businesses and veterinary practice. For a more technically focused review, readers are referred to Wolfert and others (2017).

Box 1: Data quality
For as long as clinicians have been attempting to use herd data to monitor performance and health, it has been widely recognised that a certain level of data quality is required for meaningful analysis: 'garbage in, garbage out' is a common maxim.
In the context of cattle farming, data quality most commonly refers to how accurately a given set of data reflects the events that have occurred in real life. A classic example would be the accuracy of recording insemination events in a dairy herd. Where such events are under-recorded (ie, not all inseminations result in a record), the herd's submission rate (proportion of eligible cows inseminated every 21 days) will tend to be underestimated, while conception rate (proportion of serves leading to a pregnancy) is overestimated.
Where the degree of under-recording is high, this can produce results that appear unlikely to the clinician (eg, a conception rate over 60 per cent is unlikely in most circumstances).
When monitoring performance in an individual herd, this is critical to bear in mind, but it becomes even more important where 'big data' principles are used to calculate performance metrics across a large number of data sets for benchmarking or automated reporting. In such contexts, efforts to develop and apply methods to automate measurement of data quality are useful. There are a number of statistical techniques that can help to detect missing data (Hudson 2015, Hermans and others 2017). Some of these techniques are already implemented in software, but it is likely that this process will become more sophisticated and accurate in future.
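The effect of under-recording described in this box can be illustrated with a small worked example. The sketch below is illustrative only: the herd figures and the function name `apparent_fertility_metrics` are hypothetical, and it assumes that every insemination leading to a pregnancy is recorded, so that only unsuccessful serves go missing.

```python
def apparent_fertility_metrics(eligible_cows, inseminations, pregnancies,
                               missing_fraction):
    """Return (submission_rate, conception_rate) as calculated from records.

    missing_fraction is the proportion of unsuccessful inseminations that
    were never recorded. Assumption: successful inseminations are always
    recorded, because the resulting pregnancy is diagnosed later.
    """
    unsuccessful = inseminations - pregnancies
    recorded = pregnancies + unsuccessful * (1 - missing_fraction)
    submission_rate = recorded / eligible_cows
    conception_rate = pregnancies / recorded
    return submission_rate, conception_rate


# Hypothetical herd over one 21-day period: 100 eligible cows,
# 60 inseminated, 24 of those inseminations leading to a pregnancy.
for missing in (0.0, 0.5, 0.75):
    sr, cr = apparent_fertility_metrics(100, 60, 24, missing)
    print(f"{missing:.0%} of failed serves unrecorded: "
          f"submission rate {sr:.0%}, conception rate {cr:.0%}")
```

In this hypothetical herd, losing half of the failed serves moves the apparent submission rate from 60 to 42 per cent and the apparent conception rate from 40 to about 57 per cent; losing three-quarters pushes the conception rate above the 60 per cent level that would appear unlikely to the clinician.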

Farm Animals
Collecting big data in the cattle industry
Big data is pervading most aspects of industry and society, and the dairy and beef sectors are no different.

Dairy farms
The volume of data available on dairy farms has increased particularly rapidly with the relatively widespread adoption of on-animal sensor technology (most notably activity monitoring) and the advent of milking systems that can collect and store greater detail on the milking process (more obviously in robot systems, but also in conventional parlours).

Beef farms
The beef industry is interesting in that there is generally much less data-driven decision making on typical beef enterprises compared to dairy farms; however, there are a number of ways in which big data concepts are particularly applicable to beef farming, and the next decade may see more engagement of the beef sector with recording and using data. For example, combining statutory registration and movement data from the online British Cattle Movement Service (BCMS) database with medicines use data to estimate disease incidence can provide highly useful insight even where record keeping is relatively minimal (Hewitt and others 2018).

Analysing the data
It is important to remember that the increasing quantities of data being generated on many farms are worthless from a decision-making perspective unless they are analysed or processed in time to yield information that can then be used to make informed decisions.
As in other sectors, agricultural big data will have no real value without appropriate analytics (ZhongFu and others 2013). Historically, this process has been hampered by a failure of analytical techniques and computing power to keep pace with the scale of data collection, leaving decision makers 'data rich, information poor'. This has been resolved to a large extent in recent years, unlocking value from information in a manner that can support informed decisions (Tien 2013).

Machine learning
Machine learning has played a key role in this process, and has become a common term over the past decade. In many ways, machine learning is similar to existing statistical methods, in that it is a way of looking for patterns in data. A key characteristic of a machine learning approach is that its performance tends to improve when it is exposed to more data; some methods are based on understanding the way the human brain processes information. Very broadly, machine learning processes usually aim to make a prediction of an outcome based on many inputs (where the system is provided with cases of known outcome to 'learn' from; this is 'supervised' machine learning) or to identify data patterns or clusters where there is no outcome ('unsupervised' machine learning). Conventional statistical approaches (such as regression modelling, which has been widely applied in the field of dairy science for many decades) are sometimes considered as types of supervised machine learning.

Box 2: Big data jargon
■ Artificial intelligence: A branch of computer science dealing with simulating or using mechanisms from intelligent thought in people to solve problems.
■ Machine learning: A set of tools (usually considered to be a subset of artificial intelligence) most often used for classification, prediction or pattern recognition problems. Machine learning techniques are often defined by their ability to 'learn' (usually in the sense of modifying an algorithm) from data.
■ Algorithm: A sequentially defined set of operations that convert one or more inputs into one or more outputs. Many machine learning (see above) techniques result in algorithms designed to solve a particular problem (eg, classifying the raw output from multiple on-cow sensors into a yes/no output representing whether the cow is likely to be in oestrus). Algorithms can often be represented as flow charts.
■ Internet of (agri) things: This generally refers to the connection of devices to the internet, including devices not primarily used for internet access. In the wider world, this pertains to objects as simple as light switches, or as complex as cars. Within agriculture, objects could be tractors and other machinery or on-animal sensors.
■ Precision agriculture: The concept that farming outcomes can be improved by making measurements and decisions at a more granular (ie, precise) level of detail than has traditionally been the case. The phrase originated mostly in relation to arable farming, typically implying use of sensor technology to modify an activity (eg, using sensors mounted on farm machinery or a drone to adjust chemical application rates for small subareas within fields, rather than at whole field level). More recently, terms such as 'precision livestock farming' and 'precision dairying' have become much more commonly used. In many ways, technologies such as robotic milking (which predates most arable applications by some years) are good examples of precision agriculture, but future applications could include scraping systems driven by image analysis, for example.
■ Disruptive technology: An innovation (either a physical product or a new data processing or analytical approach) which forces other market competitors to change their offering. This can be as a result of superior performance, reduced price or other differentiating factors.

Can big data add value?
There are many ways in which big data is already adding value to farm businesses, either through improving the performance or efficiency of a system, or by automating processes to reduce labour cost and improve consistency.
Use of activity monitoring for oestrus detection is a clear example here, whereby a number of big data-related concepts (on-animal sensor technology, amalgamation of data from multiple sources, data preprocessing and machine learning) come together in a product that can improve performance while also reducing labour costs (Fig 2).
While the cost-effectiveness of automatic oestrus detection systems has been reported in the veterinary literature (van Asseldonk and others 1999), there are also examples where investment in sensor systems has failed to result in any tangible improvements in animal health, productivity or farm profitability. One possible explanation for this is that farmers may not fully use the potential of sensor systems (Steeneveld and Hogeveen 2015), but it is important to see the potential benefits in the context of the wider farm system. For example, improvements in submission rates as a result of activity monitoring are likely to be much smaller where there is limited space for cows to express heat.

Deciding whether to invest in sensor technology
For farmers, the decision to invest in sensor technology to support decision making will depend largely on the perceived cost-benefit of the system (Lima and others 2018). This is an area where the veterinary practitioner can play a critical role, acting as an independent adviser who can review the evidence and help decide if a particular system is likely to be suitable for the purpose intended. It should be remembered that new technology is always 'competing' for a limited farm investment budget, so money spent in this area cannot be used for other improvements, which in some cases might be expected to give better returns. This can be challenging, as there is often limited good-quality evidence on which to base expectations of a system (Box 3). In part, this is because improvements will vary a great deal between different farms; so, for example, a study comparing different heat detection systems would need to measure a very large number of herds over a prolonged period of time, making it expensive and difficult to carry out.
The nature of machine learning also makes it hard to gather evidence on what to expect from a particular technology, as the algorithms are inherently updatable, and work best when 'trained' on additional new data to improve accuracy. In many cases, the expected performance of a system will therefore tend to increase over time. Vets also have a role in helping farmers to get the best value from any systems which they have invested in; for example, by helping set up and use alert lists or thresholds within the system or by advising on environmental or management changes to allow technology to work better.

Value to the veterinary business
As traditional income streams continue to be eroded in farm animal practice, there is a growing need for vets to deliver active herd health programmes to their clients, and to change the emphasis from being a reactive ambulatory practitioner to a proactive advice-oriented consultant (Down and others 2012).
Big data and associated analytics have the potential to aid with this transition by helping the practitioner to make use of current data to predict different outcomes that would be expected in light of different decisions. Examples of 'predictive biology' already exist in the literature and will rapidly become the norm with advances in on-farm technologies, scientific approaches and data handling capabilities (Green and others 2016). These predictive models are especially powerful when they provide probabilistic outputs, which allow decisions made to reflect the attitude to risk of the decision maker.
Despite the rapid growth in the availability of biosensors and our ever-improving ability to apply big data analytics to the resulting data, the decision making and corresponding interventions remain largely the responsibility of the veterinarian and farm team, as the automated treatment of cows remains largely unfeasible at present. This is another important area requiring engagement from the veterinary practitioner who is well placed to provide insight into what the results of data analysis mean, as well as providing evidence-based advice with respect to possible interventions. It is likely that decision support systems will play an increasing role in this process as they become integrated into sensor systems, but the vet will always have an important role to play in terms of quality control and herd health advice.

Robotic milking systems
Robotic voluntary milking systems provide a good example of existing big data in action on dairy farms. Robotic milking set-ups are increasingly common, and often include a number of on-animal and in-line sensors as part of the system. These typically capture real-time data at a very high level of detail (eg, raw activity or rumination data in small time units recorded by on-cow sensors, or multiple measurements of milk flow and concentrate intake during each milking session). These data are then preprocessed (eg, by smoothing activity data to average over a longer time-period and reduce the 'noise' in the signal). These data are often accessible to the user through graphs or other visualisations, allowing evaluation of relatively detailed information. In many cases, multiple sources of data are then aggregated and a machine learning-derived algorithm applied to detect deviation from expectation (eg, where rumination, activity, yield and temperature data are combined to generate an alert list of cows which should be examined for signs of ill health; see Fig 2). Much of this system is equally applicable outside of a voluntary milking context, and it seems likely that uptake of this approach in herds milked through conventional parlours will increase in future.
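The preprocessing and alerting steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any commercial system works: the activity counts are invented, the function names `rolling_mean` and `deviation_alerts` are hypothetical, and a simple fixed threshold (three standard deviations from a baseline period) stands in for the machine learning-derived algorithm that a real product would use.

```python
from statistics import mean, stdev


def rolling_mean(values, window):
    """Trailing moving average; early points use whatever data is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed


def deviation_alerts(values, baseline_len, threshold=3.0):
    """Return indices after the baseline period where a value lies more
    than `threshold` standard deviations from the baseline mean."""
    baseline = values[:baseline_len]
    mu, sd = mean(baseline), stdev(baseline)
    return [i for i in range(baseline_len, len(values))
            if abs(values[i] - mu) > threshold * sd]


# Hypothetical activity counts for one cow in consecutive time blocks;
# the last four blocks include a burst of raised activity.
activity = [50, 54, 47, 52, 49, 53, 48, 51, 50, 52, 90, 95, 88, 51]

smoothed = rolling_mean(activity, window=3)          # reduce 'noise'
alerts = deviation_alerts(smoothed, baseline_len=10)  # flag deviation
print("Time blocks flagged for attention:", alerts)
```

Note that smoothing delays the return to baseline: the final block is still flagged although its raw count is normal, which is one of the trade-offs involved in choosing a smoothing window.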

Web-based products
Sensors are not the only area where big data are influencing dairy management: there are an increasing number of examples where other sources of data are amalgamated and analysed (Table 1). There are a number of commercially available web-based products where multiple sources of data for a herd are amalgamated, processed and visualised. For example, data from a milk recording organisation may be combined with BCMS data and/or event data entered by farm staff to generate charts showing how a particular performance metric changes over time, and how this benchmarks against other similar herds. This process is often dependent on different agencies allowing access to their data via an automated process; this is becoming increasingly common in the dairy industry. It has also been important in arable farming, where a large number of commercial and open-source platforms exist to facilitate data exchange. This 'babel' of competing alternative data structures and data exchange systems can sometimes be problematic, and a common data schema (eg, a consistent set of event definitions) could facilitate data exchange while minimising loss of information.
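To make the idea of a common data schema more concrete, the sketch below translates records from two sources into one shared set of event definitions. Everything in it is hypothetical and chosen purely for illustration: the `HerdEvent` record, the source names, the event codes and the ear tag number do not correspond to any real product or database format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HerdEvent:
    """Hypothetical common event record shared by all data sources."""
    animal_id: str
    event_date: date
    event_type: str  # one of the common event definitions below


# Hypothetical source-specific codes mapped onto common event definitions.
SOURCE_CODE_MAPS = {
    "herd_software": {"AI": "insemination", "SERVE": "insemination",
                      "CALVE": "calving"},
    "movement_db": {"BIRTH_DAM": "calving", "OFF": "movement_off"},
}


def to_common_event(source, animal_id, iso_date, code):
    """Translate one source record; return None if the source code has
    no common definition (ie, information that the schema would lose)."""
    event_type = SOURCE_CODE_MAPS[source].get(code)
    if event_type is None:
        return None
    return HerdEvent(animal_id, date.fromisoformat(iso_date), event_type)


raw_records = [
    ("movement_db", "UK000000100001", "2018-12-08", "BIRTH_DAM"),
    ("herd_software", "UK000000100001", "2018-03-01", "SERVE"),
    ("herd_software", "UK000000100001", "2018-03-05", "FOOT_TRIM"),
]

translated = (to_common_event(*record) for record in raw_records)
events = sorted((e for e in translated if e is not None),
                key=lambda e: e.event_date)
for event in events:
    print(event.event_date, event.animal_id, event.event_type)
```

The unmapped `FOOT_TRIM` record is silently dropped here, which shows why the breadth of the agreed event definitions determines how much information survives the exchange.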
Challenges and opportunities for the future of big data
A number of existing projects and near-market or early-life products are likely to influence the use of big data in cattle farming in the near future. A number of industry initiatives aiming to amalgamate data from multiple sources (eg, the Livestock Industry Data Exchange Hub pilot project) are already running. Such projects have the potential to make it easier for practitioners to access data relating to the animals under their care, although a number of obstacles (including data protection issues) may hinder this. Legal issues around data sharing and protection can also be important at the level of individual veterinary businesses, especially in view of the newly enacted General Data Protection Regulation (GDPR). Details of this regulation are outside the scope of this article, so readers are referred to Read (2018) for a review of GDPR from a general veterinary perspective, but it is worth noting that animal data are not covered by the GDPR.

Box 3: Sensor technology
Sensors are a major source of big data in cattle farming and represent an area where these techniques are currently most widely applied. Most sensors that are used to measure physiological or behavioural parameters on cattle farms focus on the detection/monitoring of mastitis and fertility (oestrus), with a growing number also being marketed for the detection of lameness and metabolic conditions.
■ Mastitis: Electrical conductivity is the most commonly reported sensor, followed by sensors to detect milk colour and certain enzymes, such as haptoglobin, l-lactate dehydrogenase and N-acetyl-β-d-glucosaminidase.
■ Oestrus: The most commonly used measure is cow activity using pedometers, activity meters or three-dimensional (3D) accelerometers. Other fertility-related sensors include those that record milk progesterone levels, mounting behaviour and body temperature.
■ Lameness: Sensors include pedometers and activity meters, 3D accelerometers, force-plates and video cameras (with output analysed by computer vision). For these systems to add value they must be able to detect the milder presentations of lameness, which are the ones most likely to be missed by farmers (Leach and others 2010). The ability of any of the locomotion sensors currently available to detect subtle manifestations of lameness remains uncertain (Van Nuffel and others 2015).
■ Metabolic conditions: This includes sensors that measure the pH of rumen fluid, rumen/ear canal temperature, milk butterfat or β-hydroxybutyrate concentrations, rumination frequency and body condition score. The use of these sensors is less prevalent, perhaps due to the complex nature of many metabolic disorders and less validation of their clinical application within the veterinary literature.
It is difficult to compare the performance of the various sensor types because of the large variation in reported performance, gold standards, test scales, and algorithms used. Of the more commonly evaluated sensors, reported sensitivities and specificities have ranged between 70 and 91 per cent and 87 and 98 per cent, respectively, for conductivity sensors, and between 80 and 90 per cent sensitivity with greater than 90 per cent specificity for heat detection via pedometers/accelerometers (Rutten and others 2013). While this is useful information, specificity (the proportion of true negative outcomes correctly classified) in particular can be difficult to interpret in practice, especially where the prevalence of the outcome is low. Positive predictive value (PPV) (the probability that a positive test result is correct) tends to be more intuitive to interpret and apply, and studies reporting this are highly valuable (Holman and others 2011), although it is important to remember that, as PPV is related to the prevalence of the outcome event, this will vary from herd to herd. A growing number of products are being marketed that combine multiple sensors (eg, accelerometers combined with real-time location and rumination time). Such combinations potentially provide a very powerful aid to decision making, but as with any investment it is crucial to appraise potential value added against the investment required.
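The dependence of positive predictive value on prevalence, noted in Box 3, follows directly from the standard definition of PPV and can be shown with a short calculation. In the sketch below the sensitivity (80 per cent) and specificity (95 per cent) are illustrative values chosen to sit within the ranges quoted for heat detection, and the prevalence figures are hypothetical; the function name is likewise an assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive alert is a true positive.

    PPV = true positives / (true positives + false positives), where
    both terms are expressed as proportions of all animals tested.
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical sensor, three hypothetical prevalences of the
# outcome (eg, proportion of cows truly in oestrus when tested).
for prevalence in (0.02, 0.05, 0.20):
    ppv = positive_predictive_value(0.80, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With these inputs, an identical sensor gives a PPV of 80 per cent when one in five animals has the outcome, but less than 50 per cent when only one in 20 does, so most alerts would then be false positives despite the high specificity.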

Centralised data hubs
At an on-farm level, the emergence of open-source (ie, freely available) platforms allowing different farm software products to communicate with each other is also likely. These have the potential to make integration of data from multiple sources (eg, from herd management software, milking plant software and a milk recording organisation) more straightforward, in turn making it simpler for the clinician to analyse data and add value to a farm business. Fig 3 shows an example of data transfer between on-farm systems. Both the centralised 'data hub' concept and increased exchange of information between on-farm systems also have the potential to open new doors for research based on routinely collected data.

Syndromic surveillance
Availability of centralised data hubs also creates opportunities for syndromic surveillance. This process uses real-time data to assist in early detection of diseases by looking for clusters of events or measurements that deviate from the expected norm. A classic example is the use of frequency of search engine queries for early detection of human influenza outbreaks (Ginsberg and others 2009). As syndromic surveillance represents a relatively low-cost method compared to many conventional approaches, it has attracted some interest at national level, for example in detection of exotic disease incursion (Marceau and others 2014, Dórea and Vial 2016). However, this approach could also be taken at veterinary practice level; for example, by identifying farms where measured outcomes (such as conception rate or milk yield) deviate from what would be expected based on previous data both from that individual herd and from others in the practice. Big data also has the potential to transform research into health, welfare and production on dairy herds. This is true both at individual herd level (eg, by using multiple sensing systems to add value to research on behaviour and disease in facilities such as the Centre for Dairy Science Innovation at the University of Nottingham) and across multiple herds, where data amalgamation techniques and machine learning have the potential to unlock increasing value from routinely recorded data.

Potential threats to cattle practice income
As with most disruptive innovations, the advent of big data and associated technologies presents both an opportunity and a threat to cattle practitioners. A clear potential threat to practice income is the improvement in technologies associated with reproductive management in dairy herds. Routine fertility visits have been a core source of fee income from dairy herds for at least the past two decades, but as technology to improve oestrus detection becomes more effective and less costly it is likely that the quantity of veterinary time required will be reduced. This could result from improvements to existing technology (such as the ongoing improvements in activity monitor systems), appearance of new products (such as in-line milk progesterone monitoring), or the increase in accessibility of existing products (through improved ease of use and falling cost). This, along with a number of other trends discussed in more detail by Statham and others (2013), places increased emphasis on practice business models generating income from delivery of herd-level advice and consultancy. Big data offers opportunities to enhance and streamline this type of service (Table 2). The increased availability of algorithmic decision support tools (both at individual animal and at herd level) is also likely to be a positive for clinicians, at least in the short- to medium-term, where they will augment an individual's clinical decision making and allow practitioners to give more evidence-based (and possibly probabilistic) advice. Further into the future, it is possible that such systems could come to represent a threat to veterinary income by partially supplanting the clinician's role.

Conclusion
The big data revolution is already having an influence on the dairy industry, as it has in other sectors. The timing is currently ideal for the veterinary profession to ensure that they are at the forefront of these developments, taking the opportunities created by big data and helping farmers maximise value from investment in technology.
It is important to remember that this does not generally require a clinician to possess 'big data' skills as such; but a skill set including understanding of the underlying biology alongside knowledge of epidemiology gives cattle vets the potential to make a big difference to the amount of value unlocked by these changes. The cattle clinician has a key role in adding value to big data by interpreting and applying insights alongside detailed understanding of the goals and context of a specific farming enterprise to improve animal health and productivity.

Declaration of competing interests
Chris Hudson reports personal fees from Zoetis, outside the submitted work. Jasmeet Kaler reports grants and collaboration with Intel, and collaborations with Hewlett Packard Enterprise/DXC Technology, BT, Dunbia and Farm Wizard outside the submitted work.

1. Data quality is a key consideration when using big data, and under-recording of (especially unsuccessful) insemination events is perhaps the commonest problem with dairy reproductive data. What effect would this have on apparent submission and conception rates?
a. Submission rate and conception rate would both be artificially high
b. Submission rate would be artificially high and conception rate artificially low
c. Submission rate would be artificially low and conception rate artificially high
d. Submission rate and conception rate would both be artificially low
2. Which of the following do you think represents the best example of an algorithm used in dairy herd management?
a. Interpretation of raw activity monitor data to generate alerts for cows in heat
b. Automatic collation of multiple sources of data for herd benchmarking
c. Use of a smartphone app to record disease event data
3. A sensor device with a low sensitivity but high positive predictive value would . . .
a. Detect most of the true outcome events but often generate false positive alerts
b. Detect most of the true outcome events and generate few false positive alerts
c. Detect relatively few of the true outcome events and generate few false positive alerts
d. Detect relatively few of the true outcome events and often generate false positive alerts