A study on decision-making of food supply chain based on big data

As more and more companies capture and analyze huge volumes of data to improve supply chain performance, this paper develops a big data harvest model that uses big data as inputs to make more informed production decisions in the food supply chain. By introducing a Bayesian network method, this paper integrates sample data and finds cause-and-effect relationships between data to predict market demand. Then a deduction graph model that translates product demand into processes and divides processes into tasks and assets is presented, together with an example of how big data in the food supply chain can be combined with the Bayesian network and the deduction graph model to guide production decisions. Our conclusions indicate that the analytical framework has vast potential for supporting decision making by extracting value from big data.


Introduction
Data have now been woven into every sector of the global economy. Companies focus on capturing relevant information from multiple sources, such as suppliers and customers, to obtain a much clearer and more complete picture of existing business processes (Tien 2013). Big data analytics helps companies identify opportunities and requirements for new products and find new kinds of services by integrating large amounts of trading, real-time and historical information. Now, a complementary trend is under way. Information is multiplying and being shared more widely around the world, which provides the basis for advanced analysis of big data and enables new applications, such as the smartphone app that tells commuters when the next bus will arrive. This tendency carries profound significance for companies, governments, and individuals.
These developments have changed the operation management of the food supply chain beyond recognition. As companies capture, store, search, share and analyze huge volumes of data, radical customization and novel business models will be the new hallmarks of competition.
Systems Engineering Society of China and Springer-Verlag Berlin Heidelberg 2017, J Syst Sci Syst Eng
Therefore, the application of big data in the food supply chain has been receiving increasing attention. Taylor & Fearne (2006) regard big data as a prerequisite for the development of a more synchronized approach to demand and activity analysis for the food supply chain. Tien (2012) pointed out that big data analytics is a key support technology for implementing mass customization in food production, such as nano-modified foods and nano-additives. Anica-Popa (2012) indicates that data sharing in the food supply chain will improve food quality and safety. All the analyses mentioned above show that the importance of big data to the food supply chain cannot be denied.
As a result, faced with exploding data, companies need not only skills but also new perspectives on how big data helps solve problems in the food supply chain. In this paper, we propose a big data harvest model for the food supply chain. Our intent is to develop a decision-support tool that converts data into insights to make more informed strategic decisions. The paper is organized as follows. First, the value of big data in the food network is described and a big data harvest model is developed. The model is then tested with an example. Finally, our conclusions and findings are discussed.

Value of Big Data in Food Network
A food supply chain is a changing system of organizations, people, activities, information, and resources engaged in the production, processing, distribution and disposal of food, moving a product from farms to consumers (Yu & Nagurney 2013). Every year, $14 trillion of food is produced, packaged, and sold worldwide through a series of transactions between suppliers, retailers and customers. It is estimated that $120 billion to nearly $150 billion in value per year could be achieved through the use of big data in food consumption. Big data has been applied to the production, packaging, sale, and use of food products (see Table 1, Source of Big Data). Suppliers track a large amount of useful data to better understand how customers evaluate food and feed that information back into the food design process. Retailers can use big data to segment consumer types, carry out precision marketing and cultivate customer loyalty. Customers also have broader access to a massive amount of food information, allowing more informed decisions. For example, customers can learn the price of food ahead of time to decide what to buy.
What is more, it has been estimated that hundreds of billions in value per year could be enabled by the use of big data in food logistics. These data are captured by governments, transportation operators, individuals, and third-party data providers (see Table 1, Source of Big Data). One of the largest potential benefits can be obtained by using big data to enhance the ability to deliver and adapt to customers in real time. Another benefit is that companies can optimize every process step, from procurement to production to marketing, by uncovering new insights hidden within the data.
• Smarter and faster decision-making;
• Delivering the optimal experience for the customers;
• Cost and privacy concerns;
• Technical challenges;
• Extent and quality of the available data
Generally speaking, there are five main ways to leverage big data in a food network. They give insight into opportunities and challenges and have implications for how organizations will have to be designed, organized, and managed.
(1) Creating transparency: As big data in the food network becomes more available across sectors, transparency of data drives transformation, increases productivity and leads to informed decision making.
(2) Enabling experimentation to identify anomalies, detect fraud and improve performance: Big data, much of it unstructured or machine-generated, needs to be collected, integrated and analyzed in real time to discover anomalies and fraud, which helps organizations improve operations and develop services.
(3) Micro-segmentation to customize actions: Big data makes it possible to work through various streams of customer data to define increasingly fine segments and to target marketing precisely at customers' needs.
(4) Replacing/supporting decision making and data analysis with automated algorithms: Big data analytics and visualization with automated algorithms allow organizations to find unknown patterns in the food network in a time-efficient and cost-effective manner.
(5) Innovating new business models, products, and services: Using vast amounts of data provides new perspectives that can fuel innovation in food products and services, such as offering clues about how customers will behave.

Big Data Harvest Model
Although the potential value of big data is tremendous, it is extremely hard for existing analytics to analyze a high volume (and variety) of data in real time and produce useful information (Tien & Goldschmidt-Clermont 2009). Although many data techniques might help managers produce a lot of information, they are unfocused and hence inefficient. It is therefore imperative to provide an analytical framework that structures and links various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analyzed can be gained.

Figure 1 Decision making based on big data in the food supply chain (diagram labels: Supply; Demand; Food supply chain management; Demand forecasting; Choosing products; Translating products demand into production process; Dividing process into tasks and assets; Chain coordination and continuous evaluation; Big data in food chain)
Therefore, we propose a better analytic infrastructure to make use of the available big data to gain competitive advantages in food network management (see Fig. 1). Firstly, we identify the products that could meet future markets from big data analytics; then, we translate product demand into processes and divide processes into tasks and assets; finally, we meet the market demand through chain coordination and continuous evaluation.
For the first part, there are many methods, such as the Delphi method, time series analysis and regression analysis, to predict the market demand for food. These methods mainly use historical data to forecast market demand, but market demand depends on a variety of complex factors, including service quality, consumer groups and government policy. Moreover, these factors can be obtained from big data. If we take these factors into consideration, we can improve the precision of prediction to ensure product success. Bayesian networks can make effective use of all available data, diagnose what causes high preference and incorporate expert knowledge by representing the relationships among a set of variables (Heckerman et al. 1995, Jensen 1996). We therefore use Bayesian networks, which link various streams of data in the food chain, to predict market demand. Anderson et al. (2004) regard Bayesian network methodology as the implementation mechanism for causal modelling and build a Bayesian network model of customer service satisfaction. Corney (2000) applies Bayesian networks to a typical food design problem, and the results show that they are powerful tools to aid consumer preference modelling from a combination of data and expertise. Further applications of Bayesian networks in food production include food security, food risk and consumer behaviours (Stein 2004, Albert et al. 2008, Van 2004).
The structure of a Bayesian network is a directed graphical model in which nodes represent random variables of interest and directed arcs represent direct causal or influential relations between nodes (Pearl 1986). Each node $X$ has a probability distribution $P(X \mid \pi(X))$ which expresses the uncertainty of the interdependence of the variables, where $\pi(X)$ is the parent set of $X$ ($\pi(X)$ is empty if the node $X$ has no parents).
Therefore, together with the independence assumption, for a Bayesian network consisting of $n$ nodes $(X_1, X_2, \ldots, X_n)$, we can factor the joint probability distribution as:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi(X_i))$$
In particular, production decisions will be provided by calculating and analyzing the Bayesian network, which is set up based on the big data in the food supply chain. An analytical framework is presented in Fig. 2.
In the second part, for the product demand analyzed in the first part, we probe into an analytic technique that translates product demand into processes and divides processes into tasks and assets. Li (1994) proposes an analytic technique called the deduction graph model that allows firms to incorporate their own competence sets with those of other firms. It provides a sequence of optimized expanding processes in a visual way by linking different competence sets from various sources (Li et al. 1999). Although this approach has not been adopted in the big data analytics area, we have developed it so that it provides the right analytic capabilities to help firms produce a detailed process design to enhance food supply chain innovation.
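The factorization of the joint distribution over parent sets can be illustrated with a minimal Python sketch. The two-node network (`organic` influencing `preference`), the variable names and all probabilities below are hypothetical values chosen for illustration; they are not taken from the paper's data.

```python
def joint_probability(assignment, parents, cpt):
    """Multiply P(X_i | parents of X_i) over all nodes for one full assignment."""
    prob = 1.0
    for node, value in assignment.items():
        # Look up the row of the conditional probability table that matches
        # the values taken by this node's parents (empty tuple if no parents).
        parent_values = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][parent_values][value]
    return prob


# Hypothetical structure: organic -> preference
PARENTS = {"organic": (), "preference": ("organic",)}

# Hypothetical conditional probability tables
CPT = {
    "organic": {(): {True: 0.3, False: 0.7}},
    "preference": {
        (True,): {"high": 0.8, "low": 0.2},
        (False,): {"high": 0.4, "low": 0.6},
    },
}

p = joint_probability({"organic": True, "preference": "high"}, PARENTS, CPT)
print(round(p, 4))  # P(organic) * P(high | organic) = 0.3 * 0.8
```

Because each factor only needs the values of a node's parents, the table for a node grows with the number of its parents rather than with the size of the whole network.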

Numerical Examples
Our aim is to develop an optimization model that extracts value from big data to improve food supply chain performance, and that can also help incorporate the capabilities and information (big data) of group decision makers to maximize big data benefits. The following sections describe the detailed application of the proposed analytic approach in a food company.

Construction of a Bayesian Network
Consider a food company that is keen to explore how to use big data to acquire potential value and enhance its supply chain performance. The company managers forecast market demand using Bayesian networks. A brief description of the steps is given below.
The first step is data collection. As the source and foundation of forecasting is always purchasing behavior, search records, and comments on social networks, there is no doubt that "Big Data" can have a significant influence on customers' preferences (Li et al. 2015). In order to select appropriate sample data from the big data in the food supply chain, the company, drawing on prior knowledge of the food market, identifies and describes the factors that affect market demand with advice from experts and decision makers. These factors mainly include food attributes and the chemical and physical properties of the product related to these attributes (Wolters & Van Gemert 1990). Once these factors are determined, they become the nodes $(X_1, X_2, \ldots, X_n)$ in the Bayesian network. Based on these factors, the company collects data on $m$ representative consumers, where each consumer record contains a value assignment for each factor.
The second step is to pre-process the sample data. The values of the factors need to be discretized, using a clustering algorithm or hierarchical categories, before modelling so that the propagation and inference algorithms in the following steps can be applied.
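As a rough illustration of the discretization step, the sketch below maps continuous factor values to a small number of levels. It uses equal-width binning as a simple stand-in for the clustering or hierarchical categorization mentioned in the text; the function name and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical measurements of one factor for five consumers
sugar_content = [2, 4, 9, 10, 6]
print(equal_width_bins(sugar_content, 3))
```

A clustering-based discretization would replace the fixed bin edges with edges learned from the data, but the output has the same form: one discrete level per consumer and factor.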
The third step is to build a Bayesian network. Building a Bayesian network includes two parts. One is to identify the network structure. The other is to determine the conditional probability tables. A selection of search algorithms that can be used in learning Bayesian networks is shown in the accompanying table. Figure 3 presents part of a Bayesian network for organic food preference, though it will be more complex in practice (Cene & Karaman 2015). It is built using the K2 algorithm (Cooper & Herskovits 1992): given the data D processed in the previous steps, a Bayesian network is set up that maximizes the joint probability of the network structure and D.
The fourth step is to forecast the market demand. A Bayesian network is a bi-directional inference method in which inputs can predict outputs and vice versa (Lu et al. 2009). Given the values of the observed nodes, the company calculates the probability distribution of the target nodes to predict the demand for food or diagnose the likely causes of a perfect product. Then the company identifies what kinds of food can satisfy most customers' preferences and have vast potential for future development.
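To make the structure-learning part of the third step more concrete, the sketch below computes the logarithm of the K2 metric of Cooper & Herskovits (1992) for one node and one candidate parent set. This is the quantity that the K2 search compares when deciding whether to add a parent. The data set, the variable names and the `arity` mapping are hypothetical, values are assumed to be already discretized to integers 0..r-1, and the greedy search loop itself is omitted.

```python
from collections import Counter
from math import lgamma, log


def log_k2_score(data, child, parents, arity):
    """Log K2 metric of `child` given a candidate list of `parents`.

    data  : list of dicts mapping variable name -> integer level
    arity : dict mapping variable name -> number of levels r
    """
    r = arity[child]
    # N_ij: number of records with parent configuration j
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    # N_ijk: number of records with parent configuration j and child level k
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for config, total in n_ij.items():
        # log[(r - 1)! / (N_ij + r - 1)!], written with log-gamma for stability
        score += lgamma(r) - lgamma(total + r)
        for k in range(r):
            score += lgamma(n_ijk[(config, k)] + 1)  # log(N_ijk!)
    return score


# Hypothetical records: preference follows the organic label exactly
DATA = [{"organic": 1, "prefers": 1}] * 8 + [{"organic": 0, "prefers": 0}] * 8
ARITY = {"organic": 2, "prefers": 2}

with_parent = log_k2_score(DATA, "prefers", ["organic"], ARITY)
without_parent = log_k2_score(DATA, "prefers", [], ARITY)
print(with_parent > without_parent)  # adding the parent improves the score
```

Parent configurations that never occur in the data contribute a factor of one to the metric, so iterating only over observed configurations leaves the score unchanged.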
The fifth step is sensitivity analysis, which is an important basis for decision-making. It determines the variable that has the greatest influence on consumer preference, meaning that consumers are more willing to purchase if satisfaction with that variable is high. Mutual information (MI) is used for sensitivity analysis. MI is a measure of the dependence between two random variables and is more suitable for sensitivity analysis in Bayesian networks (Nicholson & Jitnah 1998). It is the reduction in uncertainty of $X$ due to knowing $Y$, and vice versa. The MI between two variables $X$ and $Y$ is given by:
$$I(X, Y) = \sum_{X, Y} p(X, Y) \log \frac{p(X, Y)}{p(X)\, p(Y)}$$
where $p(X, Y)$ is the joint probability distribution function, and $p(X)$ and $p(Y)$ are the marginal probability distribution functions of $X$ and $Y$ respectively. $I(X, Y)$ represents the influence of $X$ on $Y$. The larger the value of $I(X, Y)$, the greater the effect of $X$ on $Y$. The variables are then ranked by importance according to the value of $I(X, Y)$, and variables with a higher rank should be given more attention and real-time control in the production processes.
Moreover, the Bayesian network can be further updated to respond to changing market demand. When new data are obtained, the company can continuously refine the Bayesian network by modifying some local part of it, so that the company is able to quickly change existing running processes to satisfy customer requirements.
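The ranking of variables by mutual information can be sketched in a few lines of Python. The joint distributions below are hypothetical, and the base-2 logarithm (MI measured in bits) is an assumption, since the text does not state the base.

```python
from math import log2


def mutual_information(joint):
    """MI in bits from a dict mapping (x, y) -> joint probability p(x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        # Accumulate the marginal distributions p(x) and p(y).
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Zero-probability cells contribute nothing and are skipped.
    return sum(
        p * log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Hypothetical joint distributions of two factors with consumer preference
JOINTS = {
    "price": {
        ("low", "high"): 0.4, ("low", "low"): 0.1,
        ("high", "high"): 0.1, ("high", "low"): 0.4,
    },
    "packaging": {
        ("plain", "high"): 0.25, ("plain", "low"): 0.25,
        ("fancy", "high"): 0.25, ("fancy", "low"): 0.25,
    },
}

# Rank factors from most to least informative about preference
ranking = sorted(JOINTS, key=lambda name: mutual_information(JOINTS[name]), reverse=True)
print(ranking)
```

In this made-up example packaging is independent of preference, so its MI is zero and price is ranked first.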

Deduction Graph Model
Specifically, through the Bayesian network analysis the company identifies five different types of foods that will satisfy most customers' preferences. The identified products are A, B, C, D and E. The company also identifies the features of the foods and the relevant production processes (raw materials, machines, skills and so on) needed to manufacture the five foods, i.e. a, b, c, d, e, f, g, h, i, j, k, l, m and n, each representing a unique required production process.
Different types of foods require different production processes. Table 3 shows the production processes needed to make each product. For example, producing C requires d, h, m and n.

Table 3 Different production processes required by products ("√" means required)
a b c d e f g h i j k l m n
Having identified the required production processes for the different products, the factory managers of both departments are asked to point out the existing production processes available in departments A and B. The existing production processes of department A ($S_A$) are identified as c, d and e, whereas the existing production processes of department B ($S_B$) are a, b and f. A quick analysis shows that neither department has all the production processes required to produce the five newly identified foods. Thus, to make foods that require new production processes, a department should purchase the production processes from other departments or expand its existing production processes. The selling price for production processes in each department is estimated in Table 4. For example, the selling price for production process c in department A is 1 unit, and for production process f in department B it is 1.5 units. Based on the selling prices, the expanding cost for department A is shown in Table 5(a), and for department B in Table 5(b). The expanding cost for acquiring new production processes takes into account time, labour, energy, funds and so on. There are also compound nodes, such as d ^ e and a ^ b. In order to produce the new foods, the needed production processes will be obtained by learning from existing production processes or by purchasing from other departments directly.
Based on the above analysis, the two manufacturing departments should focus on different product families. From the learning costs of the production processes, we can see that department A is more suitable for manufacturing A, B and C, whereas department B should be responsible for producing D and E. Table 4 shows the foods to be produced in departments A and B. In Table 6, A, B and C are denoted as X1, X2 and X3 respectively, whereas D and E are denoted as Y1 and Y2. The possible revenue for different product mixes is listed in Table 7.
For instance, if department A makes food X1 and department B makes food Y1, the possible profit earned by A is 4.5 and the possible profit earned by B is 3.
The assumption is that both departments are willing to collaborate. They are ready to communicate to achieve the entire maximum profit.

The Competence Network
A production process network can clearly depict the possible ways of expanding production processes to manufacture new foods (Li 1999). The network developed in this case contains compound nodes and considers a cyclical situation. Fig. 4(a) shows the expanding process of department A to produce X1, X2 or X3, and Fig. 4(b) shows the expanding process of department B to produce Y1 or Y2 based on its current production processes a, b and f. Each node represents a production process. An arc shows that there is a connection between two nodes; for example, a→c means that production process c can be learned from production process a. There is no arc between d and m, denoting that learning d from m or m from d is almost impossible. The number on an arc is the cost of obtaining the production process. There are also compound nodes, such as d ^ e and a ^ b. A compound node can only be used when its component nodes have been obtained. In order to produce the new foods, the needed production processes will be obtained by learning from existing production processes or by purchasing from other departments directly. For example, in Figure 4(a), production process f can be learned from production processes e, c and d ^ e at a cost of 2.5, 2 and 1, respectively. Department A can also purchase production process f from department B at a cost of 1.5. Also, e→f→g→i shows a learning sequence: the learning process starts from e, learns f, then learns g, and then learns i from g. The final objective of the production process network is to use optimization to find the best sequence with the highest profit in food production.
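The choice between learning and purchasing can be illustrated with a small cost-relaxation sketch. It is not the paper's optimization model: it only computes, for each production process, the cheapest way to obtain it from a set of owned processes. The costs for f are the ones quoted above for Figure 4(a); the arc f→g with cost 1.0 is a hypothetical value added to show a learning sequence. Treating the cost of a compound node as the sum of its components' costs is a simplifying assumption that can overcount shared prerequisites.

```python
import math


def cheapest_costs(owned, learn_edges, purchase, compounds):
    """Cheapest cost of obtaining every node, starting from the owned processes.

    learn_edges : dict (source, target) -> learning cost
    purchase    : dict node -> purchase price from another department
    compounds   : dict compound node -> tuple of component nodes
    """
    nodes = set(owned) | set(purchase) | set(compounds)
    for src, dst in learn_edges:
        nodes.update((src, dst))
    for parts in compounds.values():
        nodes.update(parts)
    cost = {n: (0.0 if n in owned else math.inf) for n in nodes}
    changed = True
    while changed:  # repeat until no cost can be lowered any further
        changed = False
        for node in nodes:
            options = [cost[node], purchase.get(node, math.inf)]
            if node in compounds:
                # A compound node is usable once all its components are obtained.
                options.append(sum(cost[p] for p in compounds[node]))
            options += [cost[s] + w for (s, d), w in learn_edges.items() if d == node]
            best = min(options)
            if best < cost[node]:
                cost[node] = best
                changed = True
    return cost


costs = cheapest_costs(
    owned={"c", "d", "e"},                      # department A's processes
    learn_edges={("e", "f"): 2.5, ("c", "f"): 2.0, ("d^e", "f"): 1.0,
                 ("f", "g"): 1.0},              # f -> g cost is hypothetical
    purchase={"f": 1.5},                        # buying f from department B
    compounds={"d^e": ("d", "e")},
)
print(costs["f"], costs["g"])
```

With these numbers, learning f from the compound node d ^ e (cost 1) is cheaper than purchasing it from department B (cost 1.5), because department A already owns both d and e.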

Network Flow Approach
The example problem can be formulated as a linear mixed 0-1 optimization model (Li 1999). However, as the size of the problem increases, mixed 0-1 programming does not run quickly enough. Kim & Hooker (2002) indicate that a minimum-cost flow problem is well suited to mixed 0-1 programming and can be solved better and faster, with its advantage increasing with problem size. So we translate the deduction graph model into a minimum-cost flow problem to find an optimal solution.
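For readers unfamiliar with minimum-cost flow, the following sketch solves a tiny instance with the successive shortest path method, using Bellman-Ford to find each augmenting path. It is a generic textbook solver and not the paper's formulation of the deduction graph; the four-node network, its capacities and its costs are hypothetical.

```python
def min_cost_flow(n, edges, source, sink, demand):
    """Cost of sending `demand` units from source to sink, or None if infeasible.

    edges: list of (u, v, capacity, cost) with non-negative costs.
    """
    # Residual graph: each entry is [target, capacity, cost, index of reverse edge]
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total, sent = 0.0, 0
    while sent < demand:
        # Bellman-Ford shortest path in the residual graph
        dist = [float("inf")] * n
        dist[source] = 0.0
        prev = [None] * n
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for idx, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        updated = True
            if not updated:
                break
        if prev[sink] is None:
            return None  # the remaining demand cannot be routed
        # Find the bottleneck capacity along the path
        push, v = demand - sent, sink
        while v != source:
            u, idx = prev[v]
            push = min(push, graph[u][idx][1])
            v = u
        # Update residual capacities along the path
        v = sink
        while v != source:
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        sent += push
        total += push * dist[sink]
    return total


# Hypothetical network: node 0 = existing processes, 1 and 2 = intermediate
# processes, 3 = required process; each tuple is (u, v, capacity, cost).
EDGES = [(0, 1, 1, 1.0), (0, 2, 1, 1.5), (1, 3, 1, 1.0), (2, 3, 1, 0.5), (1, 2, 1, 0.2)]
print(min_cost_flow(4, EDGES, 0, 3, 2))
```

Specialized network algorithms of this kind exploit the graph structure directly, which is the reason a flow formulation can scale better than a general mixed 0-1 solver.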
Four assumptions in the network flow approach are specified as follows:
Assumption A1 All the departments can list all related information, e.g., production processes and associated prices.
Assumption A2 All the departments would like to collaborate with each other.
Assumption A3 A department can freely purchase required production processes at listed prices from other departments.
Assumption A4 All the departments are of benefit to a company.
Let $S$ be the set of a department's existing production processes, $T$ the set of production processes required for products, and $I$ the set of intermediate production processes. We define a directed graph $G = (V, E)$. Given a node $i$ in $G = (V, E)$, $r(i, j)$ is the arc ...
Moreover, the deduction graph model can be expanded to solve multi-level food quality problems, in which multiple levels of proficiency are considered for each food. For example, the food X1 may have several quality levels designated by $X1_1$ (Normal), $X1_2$ (Good) and $X1_3$ (Excellent). Likewise, the foods X2, X3, Y1 and Y2 may have several quality levels (i.e., $Y1_1$ (Normal), $Y1_2$ (Good), $Y1_3$ (Excellent)). Each level of food quality may lead to different results in terms of reputation, intensity of government supervision and customer satisfaction. In this way, the proposed model can help us select a feasible path, so that the expansion from the initial production processes to the final products (the five identified foods) can be reached at the lowest cost and the optimum proficiency of food quality. Furthermore, this model can be developed into an optimization approach that incorporates the information, skills, services and products (big data) of group decision makers to reap the maximum overall profit. It also works well in cyclic situations and can be used to analyze efficient control of information transmission in the food network.

Discussion
Big data analytics in the food network makes it possible to discover needs and create value, which has implications for how organizations will have to be designed, organized and managed. Hence, we develop a big data harvest model that links large amounts of data to create a coherent picture of a particular problem: it first identifies the products that can meet future markets from big data and then identifies the production processes required to produce those products.
On the one hand, compared with other analytical approaches, Bayesian networks have a number of features that make them suitable for demand forecasting. The results indicate that Bayesian networks are valuable tools for representing the relationships among a set of variables from a combination of big data and expertise in the food supply chain. Through Bayesian network analysis, the food company can build a Bayesian network for food preference, find the types and features that a food product must have in order to be preferred, and decide what to produce.
On the other hand, once the company identifies the types of foods that can meet future markets, it must next translate product demand into production processes and divide processes into tasks and assets. We develop the deduction graph model so that it provides the analytic capabilities needed to help firms produce a detailed process design. The results indicate that the deduction graph model can effectively help the food company select the product produced by each department and combine the departments' respective production processes to make such products and maximize their profits. The results also indicate that the network flow approach can be used to find the optimal solution of the deduction graph model with fast specialized algorithms. The optimal solution is that department A produces X3 and department B produces Y2, with corresponding profits of 4.5 and 1.5, respectively.

Conclusion
In this paper, we propose a big data harvest model that converts data into insights to gain competitive advantages in food supply chain management. The purposes of this study are twofold. One is to use big data in the food supply chain as inputs to make production decisions. The other is to apply the deduction graph model to translate product demand into processes and divide processes into tasks and assets.
Firstly, a Bayesian network can integrate the prior information and sample information in the food supply chain and find cause-and-effect relationships between data to effectively predict market demand and direct food production.
Secondly, the results indicate that a deduction graph model is capable of incorporating the production processes of departments to achieve profit maximization. In order to find the optimal solution, the deduction graph model can be translated into a minimum-cost flow problem.
We have only illustrated the application framework of using big data to make more informed production decisions in the food supply chain. When a company plans and implements the framework, it is necessary to provide technological support such as information-gathering techniques and Bayesian network inference techniques. What is more, the application of big data in other areas of the food supply chain should be addressed through further research.