Harvesting Big Data to Enhance Supply Chain Innovation Capabilities: An Analytic Infrastructure Based on Deduction Graph

Today, firms can access big data (tweets, videos, click streams, and other unstructured sources) to extract new ideas or understanding about their products, customers, and markets. Thus, managers increasingly view data as an important driver of innovation and a significant source of value creation and competitive advantage. To get the most out of big data (in combination with a firm’s existing data), a more sophisticated way of handling, managing, analysing and interpreting data is necessary. However, there is a lack of data analytics techniques to help firms capture the innovation potential afforded by data and gain competitive advantage. This research aims to address this gap by developing and testing an analytic infrastructure based on the deduction graph technique. The proposed approach provides an analytic infrastructure for firms to incorporate their own competence sets with those of other firms. Case study results indicate that the proposed data analytic approach enables firms to utilise big data to gain competitive advantage by enhancing their supply chain innovation capabilities.

Many countries are now pushing for a digital economy, and 'big data' has become an increasingly fashionable term. Wong (2012) states that the key to gaining competitive advantage in today's rapidly changing business environment is the ability to extract helpful business insights from big data. Being able to use big data allows firms to outperform their competitors (Oh, 2012). For example, retailers can potentially increase their operating margins by 60 percent by tapping into hidden values in big data (Werdigier, 2009). Although considerable capital and time must be invested in building a big data platform and technologies, the long-term benefits of big data for creating competitive advantage are vast (Terziovski, 2010). Many researchers point out that firms can better understand customers' preferences and needs by leveraging data available in loyalty cards and social media (Bozarth et al., 1998; Tsai et al., 2013).
Huge potential value remains uncovered in big data. As Manyika et al. (2013) indicate, 300 billion dollars of potential annual value can be generated in US healthcare if organisations or governments can capture big data's value. Moreover, the commercial value of personal location data around the world is estimated to be 600 billion dollars annually (Davenport and Harris, 2007; LaValle et al., 2010). Different industries can gain different benefits, but big data can also generate value across sectors (Mishra et al., 2013).
The announcement of big data as the national priority task in supporting healthcare and national security by the White House in 2010 further emphasizes the essential role of big data as a national weapon (Mervis, 2012).
Currently, a variety of analytics techniques, including predictive analytics, data mining, case-based reasoning, exploratory data analysis, business intelligence, and machine learning, could help firms mine unstructured data, e.g. to understand customers' preferences and needs. However, the applications of existing techniques are limited (Tsikriktsis, 2005; Cohen et al., 2009). Wong (2012) points out that the existing techniques for big data analytics are, in general, likely to be mechanistic. Additionally, many researchers point out that big data analytic techniques to aid the development of new products are relatively underemphasised (Ozer, 2011; Cheng et al., 2013; Manyika et al., 2013).
Clearly, there is a lack of analytical tools and techniques to help firms generate useful insights from data to drive strategy or improve performance (Yiu, 2012; Manyika et al., 2013). How, then, could operations managers harvest big data to enhance supply chain innovation and to make better fact-based strategic decisions? Arlbjørn et al. (2011) state that supply chain innovation is a change within a supply chain network, supply chain technology, or supply chain process (or a combination of these) that can take place in a company function, within a company, in an industry or in a supply chain in order to enhance new value creation for the stakeholder. Many researchers have pointed out that supply chain innovation is a vital instrument for improving the performance of a supply chain and that it can provide firms with great benefits (Flint et al., 2005; Krabbe, 2007). For example, it can significantly improve customer response times, lower inventories, shorten time to market for new products, improve the decision making process, and enable full supply chain visibility. Wong (2012) and Manyika et al. (2013) state that big data provides an avenue for firms to improve their supply chain operations and innovation. With big data, firms can extract new ideas or understanding about their products, customers, and markets, which are crucial to innovation. However, the main challenge for managers is to identify an analytic infrastructure that could harvest big data to support firms' innovation capabilities.
Analytics is the practice of using data to generate useful insights that can help firms make better fact-based decisions with the ultimate aim of driving strategy and improving performance (Wong, 2012). This paper seeks to develop and test an analytic infrastructure for a firm to incorporate its own competence sets with those of other firms. A firm's competence set (i.e. an accumulation of ideas, knowledge, information, and skills) is vital to its innovation capabilities (Yu and Zhang, 1993; Li, 1997; Chen, 2001; Schmenner and Vastag, 2006; Mishra and Shah, 2009). This research addresses the situation in which a firm is willing to harvest (i.e. from big data) and incorporate the competence sets of others so that its innovation capabilities can be expanded.
To assist our understanding of harvesting big data to enhance innovation, this study will propose an analytics infrastructure for managing supply chain competence sets. Further, it will demonstrate how the proposed approach could be applied in a fast moving consumer fashion industry to assist managers to generate new product ideas, and identify the required competence sets to produce products in the most cost effective ways. Finally, the strength of the proposed approach, its limitations, and research implications of this work will be examined.

CHALLENGES IN BIG DATA HARVEST
Ohlhorst (2012) describes big data as data of immeasurable size, where the scale is too varied and the growth too quick for conventional information technologies to deal with efficiently. In the year 2000, only 800,000 petabytes (PB) of data were stored in the world (IBM, 2013). This number is expected to reach 35 zettabytes (ZB) by 2020 (Wong, 2012; Yiu, 2012). The explosion of data makes it difficult for traditional systems to store and analyse it (Huddar and Ramannavar, 2013; Zhan et al., 2014). Furthermore, there are many different types of data, such as texts, weblogs, GPS location information, sensor data, graphs, videos, audio and other online data (Forsyth, 2012). These varieties of data require different equipment and technology to handle and store (Bughin et al., 2010). Moreover, data has become complex because the variety has shifted from traditional structured data to more semi-structured and unstructured data from search indexes, emails, log files, social media forums, sensor data from systems, and so on (Mohanty et al., 2013).
The challenge is that traditional analytic technologies cannot deal with this variety (Zikopoulos and Eaton, 2012; Zhan et al., 2014). Eighty percent of data is now unstructured or semi-structured and is almost impossible to analyse (Syed et al., 2013). However, in the digital economy, a firm's success will rely on its ability to draw insights from the various kinds of data available to it, both traditional and non-traditional. The ability to analyse all types of data will create more opportunity and more value for an enterprise (Dijcks, 2013; IBM, 2013).
On top of the variety, huge amounts of data are generated every second, and increasing amounts of data have a very short life (Xu et al., 2013). This situation increases the demand on businesses to respond and make decisions in closer to real time (Minelli, 2012). A review of the literature (Cohen et al., 2009; Zikopoulos and Eaton, 2011; Huddar and Ramannavar, 2013) shows that various existing techniques, e.g. Hadoop and MapReduce, are available to managers to harvest big data. Apache Hadoop is an open-source software framework that allows users to easily use a distributed computing platform. It is capable of dealing with large amounts of data in a reliable, efficient and scalable manner. Its reliability is enhanced by maintaining multiple working copies of data and redistributing the work of failed nodes. Hadoop can process data in parallel to increase speed, and it is highly scalable because it can handle PB-level data (Lam, 2010). Moreover, massive data processing applications can be run on Apache Hadoop, which provides high reliability and high fault tolerance to applications (Vance, 2009). MapReduce is a programming model for dealing with large-scale data sets. It supports parallel computing and can be run on Hadoop. It is used for distributing the processing of large data sets across multiple servers (Dean and Ghemawat, 2008).
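The MapReduce programming model can be illustrated with a minimal single-machine sketch in Python. It only mimics the map, shuffle and reduce phases in memory and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample comments are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for doc in documents:
        # On a real cluster each document (or split) would be mapped on a
        # different node; here the loop runs sequentially on one machine.
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


# Hypothetical customer comments standing in for a large data set.
comments = ["light frame", "light lens", "frame too heavy"]
counts = word_count(comments)
```

In a distributed setting the map and reduce functions stay the same in spirit, while the framework takes care of splitting the input, moving the grouped pairs between servers and recovering from node failures.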
However, it is extremely hard for existing analytics to analyse a high volume (and variety) of data in real time and produce useful information (Bisson et al., 2010). Although such techniques might help managers to produce a lot of information, they are unfocused, and hence inefficient (McAfee and Brynjolfsson, 2012). A lot of effort and time is needed to sort out the information generated and to identify what is relevant and viable. What is required is an analytic infrastructure that can structure and relate various bits of information to the objectives being pursued. Therefore, instead of just generating vast amounts of information using existing software, managers need techniques to structure and link various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analysed can be gained. There are several sophisticated analytic techniques, such as the connectance concept (TAPS), influence diagrams, cognitive mapping, and induction graphs, that managers could apply to build a visual representation of the problem being analysed (see Figure 1). Burbidge's connectance concept (Burbidge, 1984) enables managers to create a network of variables based on 'cause and effect' relationships. Recently, the vast Burbidge database has been computerised via the Tool for Action Plan Selection (TAPS) by a team of researchers at Cambridge University (Tan and Platts, 2003; Tan and Platts, 2004). It has two basic functions. The first is to connect different variables, tools or objectives and show the relationships between them clearly (Tan and Platts, 2004). The second is to create a whole view of the action plan: knowing the different sequences for achieving the target helps managers to choose a suitable action. This tool has been adopted by many companies to solve manufacturing problems.
In the big data environment, there is an explosion of data and information, and big data analytics can identify the relevant variables or competence sets and classify them into different groups to enrich the TAPS network. However, although TAPS indicates how actions can affect objectives, it is a qualitative technique that is unable to quantify the potential impact of each connectance.
The influence diagram is one of the most widely known and used cause-effect diagrams in operations management (Shachter, 1986; Smith, 1989; Guezguez et al., 2009). It is a systematic technique for identifying the possible root causes of a problem by breaking it down into components, and for showing the direction of each effect. An influence diagram attempts to represent all causal relationships in a manner that is unambiguous and probabilistic (Cobb and Shenoy, 2008). Cognitive mapping is used to explore and structure problems (Buzan, 1982). It allows an individual to acquire, store, recall, and decode information about the relative locations and attributes of phenomena in their everyday environment. It uses only text to build complex networks, which may have several foci (Fransoo and Wiers, 2006; Georgiou, 2009). Both influence diagrams and cognitive mapping are useful techniques for helping managers to visually understand 'as is' problems. However, both techniques lack the analytical capability to process vast volumes of data.
Induction graphs are a generalization of decision trees (Zighed and Rakotomalala, 2000). In a decision tree, the classification decision is made from the root towards the leaves, with no possibility of returning from a node to a lower or higher level node in the tree. Induction graphs enable users to introduce links between nodes at different levels and thus compose a graph structure. This method is now widely used in data exploration, such as knowledge retrieval from data, which is also called data mining (Huyet and Paris, 2004).
Overall, because they are general-purpose, these analytic mapping infrastructures are not necessarily optimised for the decision making task. For example, Burbidge's connectance concept and the influence diagram focus only on qualitative relationships, while an induction graph might lead to a complicated decision problem that is difficult to solve. Cognitive mapping, in turn, might result in overly complex models since it allows the development of multiple foci (see Figure 1).

THE PROPOSED BIG DATA ANALYTIC INFRASTRUCTURE
Thus, a much better analytic infrastructure is needed to help managers make better use of the available big data to gain competitive advantage. Instead of just generating vast amounts of information using existing software, what managers need are techniques to structure and link various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analysed can be gained. For example, having identified from big data analysis the products that could meet future markets, how could managers identify the competence sets required to develop the new products? What managers need is an analytic infrastructure that uses big data as input to make more informed strategic decisions. Li (1997) proposed an analytic technique called the deduction graph model that allows firms to incorporate their own competence sets with those of other firms. It presents an optimised expansion sequence visually by linking different competence sets from various sources (Li et al., 2000). Although this approach has not been adopted in the big data analytics area, we believe it provides the right analytic capabilities to help firms harvest big data to enhance supply chain innovation.
The deduction graph model proposed by Li (1997) illustrates the competence set expansion process vividly. It is an optimisation model for combining a firm's competence sets with those of others (Yu and Zhang, 1992; Li, 1997; Li et al., 2000). This analytic infrastructure builds a deduction graph from the starting node (Sk) to the ending node (Tr) through the intermediate nodes (I). It then uses 0-1 integer programming to obtain the optimised solution.
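To make the idea of a 0-1 selection over a deduction graph concrete, the following minimal Python sketch enumerates every 0-1 assignment of arcs on a tiny graph and keeps the cheapest one that connects the starting skill to the target skill. The graph, the skill names and the costs are hypothetical, and exhaustive enumeration is used here in place of an integer programming solver; it is not the formulation of Li (1997) and only works for very small examples.

```python
from itertools import product

# Hypothetical example: starting skill "a" (Sk), intermediate skills "b", "c" (I),
# target skill "t" (Tr). Each arc (src, dst) carries an assumed learning cost.
START = {"a"}
TARGET = {"t"}
ARCS = {
    ("a", "b"): 2.0,
    ("a", "c"): 4.0,
    ("b", "c"): 1.0,
    ("b", "t"): 5.0,
    ("c", "t"): 1.0,
}


def reachable(start, chosen_arcs):
    """Return all skills that can be acquired from `start` using only chosen arcs."""
    known = set(start)
    changed = True
    while changed:
        changed = False
        for src, dst in chosen_arcs:
            if src in known and dst not in known:
                known.add(dst)
                changed = True
    return known


def solve(start, target, arcs):
    """Try every 0-1 vector x over the arcs and keep the cheapest feasible one."""
    arc_list = list(arcs)
    best_cost, best_arcs = float("inf"), None
    for x in product((0, 1), repeat=len(arc_list)):
        chosen = [arc for arc, x_i in zip(arc_list, x) if x_i == 1]
        if not target <= reachable(start, chosen):
            continue  # infeasible: the target skill is not acquired
        cost = sum(arcs[arc] for arc in chosen)
        if cost < best_cost:
            best_cost, best_arcs = cost, chosen
    return best_cost, best_arcs


best_cost, best_arcs = solve(START, TARGET, ARCS)
```

For this toy graph the cheapest expansion learns b from a, c from b and t from c at a total cost of 4, which is lower than the two more direct routes. A solver-based formulation would express the same choice with one binary variable per arc.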
Li's deduction graph is an efficient mathematical method. It provides a learning network by connecting the related competence sets, and then uses optimisation programming to find optimal solutions for acquiring the needed skills. It can provide more alternative process sequences to solve a problem. [...] competence sets. In this way, the proposed analytic infrastructure can overcome the limitations of Li's (1997) deduction graph model and offer much potential value to companies. Operationalising the proposed framework involves a two-step process: data management and data analytics. This paper focuses mainly on data analytics.
Step one: Data management
First of all, it is essential for organisations to understand what information they need in order to create as much value as possible. This is because some valuable company data are created and captured at high cost, but most of them are ultimately ignored. Thus, it is important in the big data management stage to meet the bulk storage requirements for experimental databases, array storage for large-scale scientific computations, and large output files (Sakr et al., 2011).
Data requirements could be different due to different organisations' needs and problems.
Then, a number of data pre-processing techniques, including data cleaning, data integration, data transformation and data reduction, can be applied to remove noise and correct inconsistencies in the data sets. After that, data mining techniques can be used to help managers generate useful information, including internal skills (I), existing competence sets (Sk), needed competence sets (Tr) and the relevant skills, as well as the learning cost data for a specific issue. All of this information is essential for the development of deduction graph models in step two.
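The four pre-processing steps named above can be sketched in plain Python as follows. The records, field names and cleaning rules are hypothetical; in particular, dropping records with missing values is only one possible cleaning policy, and a real pipeline would normally rely on dedicated data tools.

```python
# Hypothetical raw records from two sources: a sales log and customer feedback.
sales = [
    {"customer_id": 1, "product": "Sunglasses ", "price": "120"},
    {"customer_id": 2, "product": "3d glasses", "price": None},    # missing price
    {"customer_id": 1, "product": "Sunglasses ", "price": "120"},  # duplicate
]
feedback = {1: "likes light frames", 2: "wants better 3D effect"}


def clean(records):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(rec)
    return cleaned


def integrate(records, feedback_by_customer):
    """Data integration: attach feedback to each record via customer_id."""
    return [
        {**rec, "feedback": feedback_by_customer.get(rec["customer_id"], "")}
        for rec in records
    ]


def transform(records):
    """Data transformation: normalise product names and convert prices to floats."""
    return [
        {**rec, "product": rec["product"].strip().lower(), "price": float(rec["price"])}
        for rec in records
    ]


def reduce_columns(records, keep):
    """Data reduction: keep only the attributes needed for the analysis."""
    return [{name: rec[name] for name in keep} for rec in records]


prepared = reduce_columns(
    transform(integrate(clean(sales), feedback)),
    keep=("product", "price", "feedback"),
)
```

The output of such a pipeline is a small, consistent table that data mining techniques can then work on to extract skills, competence sets and learning costs.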
Step two: Data analytics
Data analytics involves data interpretation and decision making. We use the deduction graph model in this step, which illustrates the competence set expansion process vividly (Li, 1999).
The internal skills (I), existing competence sets (Sk), needed competence sets (Tr) and the relevant skills, as well as the learning cost data, can all be acquired from step one via data mining. The harvested data serve as inputs to the deduction graph, a mathematical model that can be built to address a particular problem. Managers can then apply the deduction graph to visualise the expansion process and use LINGO software to obtain the optimal solution. Moreover, a knowledge network (which we call a competence network) is developed, allowing managers to see the various options for achieving their goals, and optimisation programming can then be used to help them find the optimal one. The competence network also provides alternative paths to achieve a set goal. Thus, the more options the owner has for expanding its manufacturing process, the easier it is to make optimal decisions.

A CASE STUDY
A case study was conducted to evaluate the applicability of the proposed approach in [...]. In particular, Company SPEC determines the preferences of its customers by analysing their registered information and recent shopping history from its data warehouse. Company SPEC has more than 6 million registered customers, and their shopping history changes all the time. Moreover, the company gathers feedback from its customers about their preferences. In order to identify each eyeglasses product and generate new product ideas, the company collected data from different sources, such as videos, photos, numbers of comments and numbers of followers, from the most popular websites (e.g. eBay, Amazon) using web crawler, web page cleaning and HTML parsing technologies. It is worth mentioning that this information forms part of the vast amounts of data that people produce and share every second. For example, on Facebook alone, users send 10 billion messages including photos and videos per day, click the "share" button 4.5 billion times and upload 350 million new pictures every day (Thibeault and Wadsworth, 2014). Moreover, most of the information is unstructured data (e.g. photos, videos or social media), which means it cannot easily be put into tables. Furthermore, taking Twitter posts as an example, data quality and accuracy are less controllable. Thus, in order to harvest great value from big data, the trustworthiness of the data is a significant issue that Company SPEC needs to address.
Currently, Company SPEC is able to analyse the available data using existing data mining techniques. The aim is to harvest the available unstructured data to serve as ideas for new product innovation and operations improvement. This approach, however, can yield fragmented pieces of information about the eyeglasses products. For example, when considering a new product, managers might get different answers from customer feedback, website information and user comments. The management was unable to combine (i.e. make sense of) the isolated groups of processed information to create a coherent understanding of potential new product development ideas or trends. As a result, the management team was not confident that its current approaches to extracting understanding from big data were appropriate to assist future decision making.
The proposed big data analytic infrastructure is not just a combination of conventional big data techniques and the deduction graph model. It was employed to assist managers in Company SPEC in making effective use of big data to support decision making (e.g. development of future products) as well as to improve supply chain operations. It is based on real company data and overcomes the information connectivity problem. It uses different conventional big data techniques to harvest useful information from big data, for example Apache Mahout for machine learning algorithms in business, Tableau for big data visualisation, Storm for real-time computation and InfoSphere for big data mining and integration. The deduction graph can then be used to combine the useful information gathered to support managers in making a comprehensive decision on a particular issue. In particular, Company SPEC is keen to explore how to use the value of big data to enhance the competence sets of its manufacturing departments (i.e. to further strengthen product innovation capabilities etc.).
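As an illustration of the HTML parsing step mentioned above, the following Python sketch extracts a product title and its comment and follower counts from a page fragment using the standard library html.parser module. The page markup, the class names and the counts are hypothetical and do not reflect the actual websites or tools used by the company; a real crawler would also need to fetch and clean the pages first.

```python
from html.parser import HTMLParser


class ProductStatsParser(HTMLParser):
    """Collect the text of elements whose class attribute is listed in FIELDS."""

    FIELDS = {"title", "comments", "followers"}  # assumed class names

    def __init__(self):
        super().__init__()
        self.stats = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.FIELDS:
            self._current = css_class

    def handle_data(self, data):
        if self._current and data.strip():
            self.stats[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


# Hypothetical fragment of a product page.
page = """
<div class="product">
  <h2 class="title">Active shutter 3D glasses</h2>
  <span class="comments">128</span>
  <span class="followers">4510</span>
</div>
"""

parser = ProductStatsParser()
parser.feed(page)
stats = {
    "title": parser.stats["title"],
    "comments": int(parser.stats["comments"]),
    "followers": int(parser.stats["followers"]),
}
```

Records of this kind, collected across many products and sites, are the sort of semi-structured input that the later data mining step would work on.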
The following sections describe the detailed application of the proposed analytic approach in Company SPEC.

Company SPEC Manufacturing Processes
Company SPEC employs more than 200 employees, and its annual turnover is about 33 million renminbi. The firm has two main manufacturing departments: A and B. The case was championed by the Chief Executive Officer (CEO). Both factory managers of departments A and B, and the manager of the information management department (Manager C), also took part in the testing process.
Analysis of existing big data (by the information management department) indicated that 5 different types of glasses will satisfy most customers' preferences and have vast potential for future development. Table 2 shows the competences needed to make a specific product. For example, producing active shutter 3D glasses requires active 3D technology (d), an infrared receiver system (h), triple flash skill (m) and liquid crystal panel technology (n). So far, Company SPEC has extracted much new understanding of customers' needs and market potential from the gathered big data. However, managers were not able to make use of the various bits of analysed information to make informed decisions. The CEO [...] judgement or induce new supply chain innovation capabilities'. The comments were echoed by both factory managers. The next step of the process is to utilise the deduction graph model to create a competence network in order to better utilise the information generated from the big data analysis.

The Competence Network
A competence network can vividly express the possible means of expanding a competence set to manufacture new products (Li, 1997). The network developed in this case contains compound nodes and considers a cyclical situation. Figure 3(a) shows the expansion process of department A to produce X1, X2, or X3, and Figure 3(b) shows the expansion process of department B to produce Y1 or Y2 based on its current skills a, b, and f. Each node represents a competence set or skill. An arc shows that there is a connection between two nodes; for example, a → c means that skill c can be learned from skill a. There is no arc between d and m, meaning that it is almost impossible to learn d from m or m from d. The number on an arc is the cost of obtaining the skill. There are also compound nodes, such as d^e and a^b. A compound node can only be used when all of its component nodes have been obtained. In order to produce the new products, the needed skills are obtained by learning from existing skills or by purchasing directly from the other department. For example, in Figure 3(a), skill f can be learned from skill e, c, or d^e at a cost of 2.5, 2, and 1, respectively. But A can also purchase skill f from department B at a cost of 1.5. Also, e → f → g → i shows a learning sequence in which the learning process starts from e, learns f, then learns g, then learns skill i from g. The final objective of the competence network is to use optimisation software to find the best sequence with the highest profit.
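The options for obtaining skill f quoted above can be compared with a short Python sketch. The four options and their costs are taken from the text; the set of skills currently held by department A is an assumption made purely for illustration, and the comparison looks at a single skill in isolation rather than solving the whole network.

```python
# Ways to obtain skill f in Figure 3(a), as described in the text:
# (skills that must already be held, cost, description).
OPTIONS_FOR_F = [
    ({"e"}, 2.5, "learn f from e"),
    ({"c"}, 2.0, "learn f from c"),
    ({"d", "e"}, 1.0, "learn f from compound node d^e"),
    (set(), 1.5, "purchase f from department B"),
]


def cheapest_option(held_skills, options):
    """Return the lowest-cost option whose required skills are all held.

    A compound node such as d^e is usable only if every component is held.
    """
    feasible = [opt for opt in options if opt[0] <= held_skills]
    if not feasible:
        return None
    return min(feasible, key=lambda opt: opt[1])


# Assumption for illustration only: department A currently holds skills a and c.
held_by_a = {"a", "c"}
required, cost, description = cheapest_option(held_by_a, OPTIONS_FOR_F)
```

Under this assumed starting set, purchasing f from department B at a cost of 1.5 is cheaper than learning it from c at a cost of 2. If A already held both d and e, the compound node d^e would become the cheapest route at a cost of 1. The full optimisation has to weigh such choices jointly, since the cost of first acquiring prerequisite skills also counts.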
Based on the developed network, the challenge faced by Company SPEC can be formulated as a linear programming problem. A deduction graph will be generated to support the model in finding the optimal solution.
Finally, the above objective equations and constraints were set up in LINGO to obtain the optimised solution. The solution is that department A should produce X2, and department B should produce Y1. The result is shown in Table 7. For department A, skill i is learned from c, skill f is bought from department B, and skill g is learned from f. For department B, skill c is bought from department A and skill j is learned from skill c. The expanded deduction graph is shown in Figures 4(a) and 4(b).

Product                  X2         Y1
Needed competence sets   f, i, g    j, c, b

[...] of Company SPEC about the application process of the analytic infrastructure. The applicability of the proposed approach was evaluated based on the criteria of feasibility, usability, and utility (Platts, 1994).

Feasibility
The infrastructure was evaluated as feasible overall. The CEO stated that the information required for this analytic method was appropriate: it is real and relevant.
The timing of the application process was appropriate; it took only half a day to finish.
Manager C pointed out that one of the many benefits attributed to the analytic process was the insight gained into the problem that was being modelled. He further said that "in the process of model building, we are forced to ask questions that may never have been asked and to examine the information generated from big data analysis". However, Manager B felt that a longer time scale would be needed if they would like to look into the model in more depth and examine the developed competence network in detail.

Usability
The infrastructure application was rated as quite clear. The outcome of the testing shows that the combination of data mining techniques with the deduction graph was useful. The participants felt that the proposed structure and network rules provided useful guidance for building a competence network model. In terms of ease of use, the analytic infrastructure was rated as easy to understand. The CEO commented that 'the mathematical property takes some time to understand but okay afterward'. Manager A felt that the process was very useful for developing a 'competence network' from a firm perspective. He further pointed out that 'each of us sees the factory operations through a unique set of lenses that is determined by our personal experiences, and our capabilities. Thus, none of us, as part of a functional group, have a good understanding of the competence sets entirely'. All the participants agreed that the process was appropriate and they had high confidence in the decision reached.

Utility
The utility of the proposed infrastructure was rated highly by the managers. The CEO commented that the proposed technique and process help Company SPEC to make use of the information generated from big data to offer new insights into product development and operations improvement. This method can be widely used in manufacturing operations, and the CEO has great interest in continuing to apply the proposed analytic infrastructure in Company SPEC.
In general, all participants felt that the process provided a structured approach for decision making and that the competence network helped to illustrate the competence set expansion process vividly. The application of the analytic infrastructure in Company SPEC indicates that the method has high feasibility, utility and usability. As the case was conducted in an eyeglasses company, it indicates that the approach is feasible for application in manufacturing settings. The CEO described it as 'a road map that provides many alternative ways to arrive at the destination'. The feedback also highlighted a number of research issues that remain to be addressed.

DISCUSSION
This section discusses the results of this work, and the wider implications for managers and academe. The findings are grouped and evaluated under two main areas: the value of an analytic infrastructure; and the value of competence network.

The Value of an Analytic Infrastructure
Big data analysis is far too frequently carried out relatively informally and generates vast amounts of 'isolated' information. Managers might spend significant time making sense of the analysed information. This often occurs in an ad hoc way based on the manager's past experience. This is understandable; faced with complexity, and the need to act, managers will tend to seek the comfort of the known (Tan and Platts, 2003). An analytic infrastructure provides a mechanism for combating this tendency. Our research shows that managers liked the structured approach, which enables them to develop a visual decision path that captures the logic behind the variety of decisions made over the course of the competence set analysis process. Although the specific problem might be unique, they felt reassured that the approach to addressing it was well known. The CEO commented that with the analytic infrastructure, Company SPEC can utilise the full potential value offered by big data analysis.

The Value of Competence Network
The main finding of this research has been the development and testing of an analytic infrastructure. This method combines deduction graph and data mining techniques, and the combination can overcome the shortcomings of both. Existing data mining techniques are useful for discovering unknown information, but they cannot fully address supply chain problems. Although such techniques might help managers to produce a lot of information, they are unfocused, and hence inefficient. A lot of effort and time is needed to sort out the information generated and to identify what is relevant and viable. Therefore, instead of just generating vast amounts of information using existing data mining software, managers need a better approach to structure and link various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analysed can be gained. The proposed analytic infrastructure shows the interrelationships of different competence sets visually, so decision-makers have a clear view of the expansion of the competence sets. The analytic infrastructure supports decision making efficiently by offering managers more alternative choices and suggesting the optimal expansion process for incorporating a company's own competence sets with those of others.

CONCLUSIONS
Thus, although the term 'big data' is not new, the application of big data in supporting supply chain operations is a relatively new area (Cecere, 2013; Zhou et al., 2014). The case study results indicated that the proposed approach enables Company SPEC to: a) gain new product development ideas; and b) understand how different sub-firms or departments could work together to optimise the manufacturing processes and to produce new products in the most cost effective way.
We have demonstrated how the proposed infrastructure gives integrated support throughout the process, providing more comprehensive functionality than the existing data mining or deduction graph model approaches discussed earlier in this paper. The deduction graph model captures and interrelates different competence sets, providing a comprehensive view of the firm's capabilities for strategic analysis. It provides a proven way of eliciting and quantifying the relationships necessary to use the information harvested from big data. Using this analytic infrastructure, managers can model different supply chain operations and product development decisions and use the results to aid supply operations strategy decisions as well as to enhance innovation capabilities.
To our knowledge, this is the first attempt to incorporate big data analytics and apply it in a synergistic fashion with the deduction graph technique. The evidence provided in this paper reveals the promise of this combinatorial approach, which we believe is worth further developmental efforts from big data and supply chain operations management scholars.
While the proposed approach is potentially useful, there are a number of research issues that remain to be addressed. Ongoing refinement and improvement is a fundamental component of valid research. First and foremost, the approach should be tested on a wide variety of product designs and supply chains in order to determine its general applicability. The second issue involves the assumption that each decision maker can freely exchange information and is willing to purchase and sell competencies at prescribed prices. The last issue is that the mathematical approach to obtaining the optimal results is quite complex and tedious. Thus, future research should be carried out (for example, to develop software) to simplify the deduction graph computation.

Acknowledgement:
The authors would like to thank Miss Fan Chen for her help with the case study data collection and analyses. The authors also thank the Nottingham University Business School Spark Fund for supporting the research.
Also, should be less or equal than 6 and are integers.