A multi-objective hyper-heuristic based on choice function

Hyper-heuristics are emerging methodologies that perform a search over the space of heuristics in an attempt to solve difficult computational optimization problems. We present a learning selection choice function based hyper-heuristic to solve multi-objective optimization problems. This high level approach controls and combines the strengths of three well-known multi-objective evolutionary algorithms (NSGAII, SPEA2 and MOGA), utilizing them as the low level heuristics. The performance of the proposed learning hyper-heuristic is investigated on the Walking Fish Group test suite, which is a common benchmark for multi-objective optimization. Additionally, the proposed hyper-heuristic is applied to the vehicle crashworthiness design problem as a real-world multi-objective problem. The experimental results demonstrate the effectiveness of the hyper-heuristic approach when compared to each low level heuristic run on its own, as well as to other approaches, including an adaptive multi-method search, namely AMALGAM.


Introduction
Most real-world problems are complex. Due to their (often) NP-hard nature, researchers and practitioners frequently resort to problem tailored heuristics to obtain a reasonable solution in a reasonable time. Hyper-heuristics aim to raise the level of generality by searching over the space of heuristics instead. Generally, there are two recognized types of hyper-heuristics: (i) heuristic selection methodologies: (meta-)heuristics to choose (meta-)heuristics, and (ii) heuristic generation methodologies: (meta-)heuristics to generate new (meta-)heuristics from given components. A selection hyper-heuristic framework manages a set of low level heuristics and chooses one to be applied at any given time using a performance measure for each low level heuristic. The interest in selection hyper-heuristics has been growing in recent years. However, the majority of research in this area has been limited to single-objective optimization.
A limited number of studies on selection hyper-heuristics have been introduced for multi-objective problems (see Table 1). Burke et al. (2003b) presented a multi-objective hyper-heuristic based on tabu search (TSRoulette Wheel), applying it to space allocation and timetabling problems. Veerapen et al. (2009) described another hyper-heuristic approach comprising two phases, applying it to multi-objective traveling salesman problems. McClymont & Keedwell (2011) used a Markov chain-based learning selection hyper-heuristic (MCHH) for solving a real-world water distribution network design problem. A hyper-heuristic approach based on a multi-objective evolutionary algorithm, NSGAII (Deb & Goel, 2001), was proposed in (Gomez & Terashima-Marín, 2010, 2012). NSGAII learned to choose from a set of rules representing a constructive heuristic for 2D irregular stock cutting. In (Furtuna et al., 2012), a multi-objective hyper-heuristic for the design and optimization of a stacked neural network is proposed. That approach is based on NSGAII combined with a local search algorithm (a Quasi-Newton algorithm). Rafique (2012) presented a multi-objective hyper-heuristic optimization scheme for engineering system design problems, in which a genetic algorithm, simulated annealing and particle swarm optimization are used as low-level heuristics. de Armas et al. (2011) and Miranda et al. (2010) described a representation scheme to be used in hyper-heuristics for multi-objective packing problems. Kumari et al. (2013) presented a multi-objective hyper-heuristic genetic algorithm (MHypGA) for a multi-objective software module clustering problem. In MHypGA, different selection, crossover and mutation operators of genetic algorithms are incorporated as low-level heuristics. Vázquez-Rodríguez & Petrovic (2013) proposed a multi-indicator hyper-heuristic for multi-objective optimization. This approach was based on multiple rank indicators taken from NSGAII (Deb & Goel, 2001), IBEA (Zitzler & Künzli, 2004) and SPEA2 (Zitzler et al., 2001). León et al. (2009) proposed a hypervolume-based hyper-heuristic for a dynamic-mapped multi-objective island-based model. Bai et al. (2013) proposed a multiple neighbourhood hyper-heuristic for a two-dimensional shelf space allocation problem. The proposed hyper-heuristic was based on a simulated annealing algorithm.
Different frameworks have been proposed for mixing a set of existing algorithms applied to different problems, such as an adaptive multi-method search (AMALGAM) (Vrugt & Robinson, 2007; Raad et al., 2010; Zhang et al., 2010) and a multi-strategy ensemble multi-objective evolutionary algorithm (Wang & Li, 2010).
None of the above have used multi-objective evolutionary algorithms (MOEAs), with the exception of (Gomez & Terashima-Marín, 2010; Vrugt & Robinson, 2007; Rafique, 2012), and none have studied the standard multi-objective test problems, except (McClymont & Keedwell, 2011; Vrugt & Robinson, 2007; León et al., 2009; Vázquez-Rodríguez & Petrovic, 2013). Moreover, none of the previous hyper-heuristics make use of the components specifically designed for multi-objective optimization that we introduce. This paper highlights the need for scientific study at the interface of multi-objective evolutionary algorithms and hyper-heuristics. We focus on an online learning selection choice function based hyper-heuristic for solving continuous multi-objective optimization problems. It is hybridized with multi-objective evolutionary algorithms: it controls and combines the strengths of three well-known MOEAs (NSGAII (Deb & Goel, 2001), SPEA2 (Zitzler et al., 2001), and MOGA (Fonseca & Fleming, 1998)). The choice function was successful when used as a selection method for single-objective optimization (Kendall et al., 2002). To the best of our knowledge, no work has been reported in the literature that utilizes the choice function as a selection method within a hyper-heuristic framework for multi-objective optimization.
Table 1 (partial)
Component(s) | Application domain \ test problems | Reference(s)
... | ... | Kumari et al. (2013)
Hypervolume | Dynamic-mapped island-based model | León et al. (2009)
Particle Swarm Optimization, Adaptive Metropolis Algorithm, Differential Evolution | Water resource problems \ a number of continuous multi-objective test problems | Vrugt and Robinson (2007), Raad et al. (2010), Zhang et al. (2010)
Memory Strategy, Genetic and Differential Operators | Dynamic optimization problems \ a number of continuous multi-objective test problems | Wang and Li (2010)
Genetic Algorithm, Simulated Annealing, Particle Swarm Optimization | Engineering system design problems \ a number of classical multi-objective test problems | Rafique (2012)
Simulated Annealing | Shelf space allocation | Bai et al. (2013)
Our hyper-heuristic for multi-objective optimization addresses the research areas of multi-objective evolutionary algorithms and hyper-heuristics. Section 2 discusses each of these areas. The rest of the paper is organized as follows. Section 3 provides the details of the proposed hyper-heuristic framework for multi-objective optimization. The empirical results comparing our approach to the well-known multi-objective evolutionary algorithms that are used as the low level heuristics are presented in Section 4. The comparisons of our multi-objective hyper-heuristic to other approaches over benchmark test problems and a real-world problem are presented in Sections 5 and 6, respectively. Section 7 summarizes the work and discusses possible future research directions.

Multi-objective Optimization
A multi-objective optimization problem (MOP) comprises several objectives, which need to be minimized or maximized depending on the problem. Many techniques for multi-objective optimization are presented in the literature. In one family, the a posteriori approach, a search is first conducted to find solutions for the objective functions. Following this, a decision process selects the most appropriate solutions, often involving a trade-off. Examples of this methodology are multi-objective evolutionary algorithms (MOEAs), whether non-Pareto-based or Pareto-based. Pareto-based evaluation is a method used to evaluate the quality of MOP solutions. In Pareto-based methods, all objectives are simultaneously optimized by applying Pareto dominance concepts. The idea behind the dominance concept is to generate a preference between MOP solutions, since the decision maker provides no information regarding objective preferences. Tan et al. (2002) and Coello et al. (2007) present a more formal definition of Pareto dominance.
Definition 1: A vector $u = (u_1, \ldots, u_k)$ is said to dominate another vector $v = (v_1, \ldots, v_k)$ (denoted by $u \preceq v$) according to $k$ objectives if and only if $u$ is partially less than $v$, i.e., $\forall i \in \{1, \ldots, k\}: u_i \le v_i \;\wedge\; \exists i \in \{1, \ldots, k\}: u_i < v_i$.
In the literature, various features for multi-objective optimization test problems are presented. Those features are designed to make the problems difficult enough to examine algorithmic performance. Examples of these features are deception (Goldberg, 1987; Whitley, 1991), multimodality (Horn & Goldberg, 1995), noise (Kargupta, 1995), and epistasis (Davidor, 1991). Moreover, other features of test problems are suggested in (Deb, 1999), such as multimodality, deception, isolated optima and collateral noise. These features can cause difficulties for evolutionary optimizers in terms of converging to the Pareto optimal front (POF) and maintaining the population diversity. Furthermore, some characteristics of the POF such as convexity or non-convexity, discreteness, and non-uniformity could cause difficulties in terms of the population diversity (Zitzler et al., 2000).
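Returning to Definition 1, the dominance test can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that every objective is minimized, and the helper names `dominates` and `non_dominated` are ours rather than part of any cited method.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (assumption: all objectives minimized)."""
    if len(u) != len(v):
        raise ValueError("objective vectors must have the same length")
    # u must be no worse than v in every objective ...
    no_worse = all(ui <= vi for ui, vi in zip(u, v))
    # ... and strictly better in at least one objective.
    strictly_better = any(ui < vi for ui, vi in zip(u, v))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For instance, among the vectors (1, 3), (2, 2), (3, 1) and (3, 3), only (3, 3) is dominated (by (2, 2)), so the remaining three form the non-dominated set.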
Typically, a test suite should include different test problems which cover a wide range of the characteristics and features mentioned previously. However, it is impractical to have a test suite that incorporates all possible combinations of features. The test suites most commonly employed as benchmark multi-objective problems in the MOEA literature are the ZDT test suite (Zitzler et al., 2000), the DTLZ test suite (Deb et al., 2002), the WFG test suite (Huband et al., 2006) and, more recently, LZ09 (Li & Zhang, 2009).
The Walking Fish Group's (WFG) test suite (Huband et al., 2006) consists of nine test problems (see Table 2). The benchmark problems fully satisfy the earlier recommendations. The WFG problems are designed only for real valued parameters with no side constraints, which makes them easy to analyze and implement. The WFG dataset is seen as a common choice for most MOEA researchers (Huband et al., 2006). Unlike most multi-objective test suites, such as ZDT (Zitzler et al., 2000) and DTLZ (Deb et al., 2002), the WFG test suite is more powerful, since its instances have features that are distinct from those in the other suites. Therefore, the WFG test suite is used as our benchmark dataset in this study.

Multi-objective Evolutionary Algorithms (MOEAs)
Many EA researchers would argue that evolutionary algorithms are more suitable methods for dealing with multi-objective optimization problems (Coello et al., 2007; Deb, 2001; Anderson et al., 2007; Bäck, 1996; Deb & Goldberg, 1989; Fonseca & Fleming, 1998; Zhang & Li, 2007; Miranda et al., 2010) because of their population-based nature, which means they can find Pareto-optimal sets (trade-off solutions) in a single run, allowing a decision maker to select a suitable compromise solution.
MOGA was proposed by Fonseca & Fleming (1993). It uses a Pareto ranking scheme, i.e. each solution in the current population is given a rank based on its dominance (Veldhuizen & Lamont, 2000; Landa-Silva, 2003). A modified version of this algorithm has been proposed in (Fonseca & Fleming, 1998). This version employs restricted sharing between solutions that have the same rank: the distance between two solutions is computed and compared to the key sharing parameter $\sigma_{share}$. While MOGA is efficient and easy to implement, its fitness sharing method prevents two vectors that have the same value in the objective space from existing simultaneously, unless the fitness sharing is genotypic-based.
NSGAII (Deb & Goel, 2001) is a non-explicit building block MOEA that incorporates the concept of elitism (Coello et al., 2007; Deb, 2005). The solutions compete, then each solution is ranked and sorted based on its Pareto-optimal level. Genetic operators are applied to generate a new group of children, who are then merged with the parents in the population (Coello et al., 2007). Furthermore, a niching method based on crowding distance is used during the selection process in order to maintain a diverse Pareto front (Zhang & Li, 2007). Although NSGAII is efficient, it still has some drawbacks. It is unable to generate a Pareto optimal set in some regions of the search space, particularly in unpopulated regions (Coello & Pulido, 2001). In addition, its search bias appears strongly as the number of objectives rises (Jaszkiewicz, 2001). In other words, the algorithm seems to be biased towards some regions of the search space.
SPEA2 (Zitzler et al., 2001) incorporates a fine-grained fitness assignment strategy which considers, for each individual, the number of individuals that it dominates and the number by which it is dominated. It uses a nearest neighbor density estimation technique in order to increase the efficiency of the search. SPEA2 also improves the archive truncation method, replacing the average linkage method used in its predecessor SPEA (Zitzler & Thiele, 1999) with one that guarantees the preservation of boundary points. Experimental results show that SPEA2 performs well in terms of diversity and distribution as the number of objectives increases. In addition, it significantly outperforms its predecessor.

Studies on the Comparison of MOEAs
Most MOEAs employ common strategies in their search process. However, they differ in the way that they apply these strategies. MOGA classifies the solutions based on the ranking scheme using linear or exponential interpolation and applies the sharing scheme in the objective space. NSGA uses dummy fitness values assigned to the solutions and applies the sharing scheme in the decision variable space (Veldhuizen & Lamont, 2000). NSGA with elitism performs as well as SPEA (Zitzler et al., 2000).
In the literature, some studies have compared the performance and quality of MOEAs against each other. A comparison of SPEA2, NSGAII and MOGA on the ZDT4 and ZDT6 problems (Zitzler et al., 2000) was presented in (Watanabe et al., 2002). With respect to the ratio of non-dominated individuals metric (RNI), NSGAII performs better than the others on ZDT4. However, SPEA2 outperforms MOGA and NSGAII for the same metric on ZDT6. The authors concluded that SPEA2 has an advantage in accuracy over NSGAII, while NSGAII is superior to SPEA2 in finding widely spread solutions. In (Khare et al., 2003), another comparative study of NSGAII, SPEA2 and PESA on four test problems (DTLZ1, DTLZ2, DTLZ3 and DTLZ6) (Deb et al., 2002) with 2-8 objectives was carried out. Three performance metrics were used, covering the convergence and diversity of the obtained non-dominated set and the running time that a MOEA requires. SPEA2 performs better than NSGAII in terms of convergence for a small number of objectives. However, both perform similarly for a higher number of objectives. SPEA2 and NSGAII perform well with respect to diversity, but they have some difficulties in the closeness of the obtained non-dominated set to the POF. In comparison, PESA (Liu et al., 2007) performs very well in converging to the true front, but it fails in diversity and requires a longer computational time as the number of objectives increases. NSGAII, however, requires less run time as the number of objectives increases. Another comparative study between NSGAII and SPEA2 on the WFG test problems (Huband et al., 2006) with 24 real-valued parameters and different numbers of objectives was presented in (Bradstreet et al., 2007). For two objectives, NSGAII is superior to SPEA2 on the WFG test problems with respect to the epsilon metric and the hypervolume (SSC). In contrast, SPEA2 outperforms NSGAII on all WFG problems except WFG3 for three objectives with respect to the same two metrics. We can note from the last two studies that the number of objectives can affect the performance of an algorithm. SPEA2 works well with a high number of objectives for WFG and a low number of objectives for DTLZ. The opposite is true for NSGAII. We can also observe from these comparative studies that one algorithm can perform better than the others with respect to a specific metric on a certain problem, while a different algorithm performs better with respect to another metric on the same problem. Moreover, the performance of a multi-objective algorithm can vary with the number of objectives that it is able to cope with effectively. All these observations suggest an advantage in combining different algorithms under a hyper-heuristic framework for multi-objective optimization, in order to exploit the strengths of the algorithms and avoid their weaknesses.

Selection Hyper-heuristics
In a hyper-heuristic approach, different heuristics or heuristic components can be selected, generated or combined to solve a given computationally difficult optimization problem in an efficient and effective way. The task of the high level strategy is to guide the search intelligently and adapt according to the success or failure of the low level heuristics, or combinations of heuristic components, during the search process. Hyper-heuristics are sufficiently general and modular search methods, enabling reuse of their components for solving problems from different domains (Qu & Burke, 2009). The focus of this study is selection hyper-heuristics, which perform a search using two successive stages (Özcan et al., 2008): (meta-)heuristic selection and acceptance. An initial solution (or a set of initial solutions) is iteratively improved using the low level (meta-)heuristics until some termination criteria are satisfied. During each iteration, the (meta-)heuristic selection decides which low level (meta-)heuristic will be employed next. After the selected (meta-)heuristic is applied to the current solution/s, an acceptance method decides whether to accept the new solution/s or not. The low level (meta-)heuristics in a selection hyper-heuristic framework are in general human designed heuristics which are fixed before the search starts.
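The two-stage loop described above (heuristic selection followed by acceptance) can be sketched as follows. This is a minimal single-point illustration, not the framework proposed in this paper: the toy cost function, the two perturbative heuristics, the simple random selection and the only-improving acceptance are all assumptions chosen for brevity.

```python
import random
from typing import Callable, List

Heuristic = Callable[[float, random.Random], float]


def selection_hyper_heuristic(
    initial: float,
    heuristics: List[Heuristic],
    select: Callable[[int, random.Random], int],
    accept: Callable[[float, float], bool],
    iterations: int,
    rng: random.Random,
) -> float:
    """Iteratively select a low level heuristic, apply it, then accept or reject."""
    current = initial
    for _ in range(iterations):
        index = select(len(heuristics), rng)         # stage 1: heuristic selection
        candidate = heuristics[index](current, rng)  # apply the chosen heuristic
        if accept(current, candidate):               # stage 2: acceptance
            current = candidate
    return current


# Toy problem (hypothetical): minimize cost(x) = x^2 with two perturbative moves.
def cost(x: float) -> float:
    return x * x


def small_step(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-0.1, 0.1)


def large_step(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-1.0, 1.0)


def simple_random(n: int, rng: random.Random) -> int:
    return rng.randrange(n)


def only_improving(current: float, candidate: float) -> bool:
    return cost(candidate) < cost(current)
```

Note that the selection and acceptance functions never inspect the solution representation beyond its cost, which mirrors the separation between the high level strategy and the low level heuristics.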
Usually, in a selection hyper-heuristic framework, there is a clear separation between the high level hyper-heuristic approach, also referred to as a strategy, and the set of low-level heuristics or heuristic components. It is assumed that there is a domain barrier between them (Burke et al., 2003a). The purpose of the domain barrier is to increase the generality of a hyper-heuristic, so that it can be applied to a new problem without changing the framework: only a new set of problem-related low-level heuristics needs to be supplied. The barrier allows only problem domain independent information to flow from the low level to the high level, such as the fitness/cost/penalty value measured by an evaluation function, indicating the quality of a solution (Hussin, 2005). Low level heuristics, or heuristic components, are the problem domain specific elements of a hyper-heuristic framework. Hence they have access to any relevant information, such as candidate solution/s.
Most of the existing selection hyper-heuristics are based on perturbative low level heuristics and favor single-point based search. Cowling et al. (2002) investigated the performance of different hyper-heuristics, combining different heuristic selection methods with different move acceptance methods, on a real world scheduling problem. Simple Random, Random Descent, Random Permutation, Random Permutation Descent, Greedy and Choice Function were introduced as heuristic selection methods. The authors utilized the following deterministic acceptance methods: All-Moves accepted and Only Improving moves accepted. The hyper-heuristic combining Choice Function with All-Moves acceptance performed the best. In (Kendall et al., 2002), the choice function heuristic selection method adaptively ranks the low-level heuristics ($h_i$) using Eq. 1:
$$f(h_i) = \alpha f_1(h_i) + \beta f_2(h_j, h_i) + \delta f_3(h_i) \qquad (1)$$
where $f_1$ measures the individual performance of each low level heuristic, $f_2$ measures the performance of pairs of low level heuristics invoked consecutively ($h_j$ being the heuristic invoked immediately before $h_i$), and $f_3$ is the elapsed CPU time since the heuristic was last called. Both $f_1$ and $f_2$ support intensification while $f_3$ supports diversification. The parameter values for $\alpha$, $\beta$ and $\delta$ are changed adaptively based on a reinforcement learning strategy. The choice function based hyper-heuristic was also applied to nurse scheduling and sales summit scheduling. That study shows that the choice function hyper-heuristic makes effective use of low level heuristics, due to its ability to learn the dynamics between the solution space and the low level heuristics and so guide the search process towards better quality solutions. An increasing number of studies show the success of choice function heuristic selection (Özcan et al., 2008; Gibbs et al., 2011; Burke et al., 2012). In these studies, a choice function is used for solving single-objective problems. Unlike these studies, we introduce a selection hyper-heuristic framework based on a choice function which is modified to deal with multi-objective optimization problems, together with a mechanism to rank low level heuristics for heuristic selection. More on selection hyper-heuristics, including an overview of hyper-heuristic components, different frameworks and application areas, can be found in the literature.

A Selection Hyper-heuristic Framework for Multi-objective Optimization
The design of the framework for our multi-objective hyper-heuristic is motivated in two ways. Firstly, there is no existing algorithm that excels across all types of problems. In the context of multi-objective optimization, no single MOEA has the best performance with respect to all performance measures on all types of multi-objective problems. Some comparison studies of MOEAs which emphasize this idea are presented in Section 2.2. Secondly, hybridizing or combining different (meta-)heuristics/algorithms into one framework could yield promising results compared to the (meta-)heuristics/algorithms used alone. We aim to gain an advantage by combining different algorithms in a hyper-heuristic framework for multi-objective optimization, benefiting from the strengths of the algorithms whilst avoiding their weaknesses.
The idea of hybridizing a number of algorithms (heuristics) into a selection hyper-heuristic framework is straightforward. However, many design issues related to the development of hyper-heuristics for multi-objective optimization require attention if such a framework is to be applicable and effective. One design issue, choosing appropriate low-level heuristics, is a challenging task. Many questions arise: which heuristics (algorithms) are suitable for multi-objective optimization problems, and are a priori or a posteriori approaches more suitable? As the aim of a hyper-heuristic is to raise the level of generality, an a posteriori approach is more suitable. Unlike a priori approaches, a posteriori approaches such as MOEAs do not need objective preferences or weights to be set prior to the search process. We agree with many researchers (Coello et al., 2007; Deb, 2001; Anderson et al., 2007; Bäck, 1996; Deb & Goldberg, 1989; Fonseca & Fleming, 1998; Zhang & Li, 2007; Miranda et al., 2010) that evolutionary algorithms are more suitable methods for multi-objective optimization problems because of their population-based nature. This means that they can find Pareto optimal sets (trade-off solutions) in a single run, which allows a decision maker to select a suitable compromise solution (with respect to the space of the solutions). In the context of multi-objective hyper-heuristics, the decision maker could be a selection method that decides which low level heuristic is best to select at each decision point. Following the MOEA studies presented in Section 2.2, we choose three well-known multi-objective evolutionary algorithms (NSGAII (Deb & Goel, 2001), SPEA2 (Zitzler et al., 2001), and MOGA (Fonseca & Fleming, 1998)) to act as low level heuristics. Although NSGAII, SPEA2 and MOGA are no longer considered state of the art, they are still viewed as a good baseline for MOEA research. They incorporate much of the known MOEA theory (Veldhuizen & Lamont, 2000), which makes them appropriate for investigating their combination under the multi-objective hyper-heuristic framework. As one of our aims is to benefit from the strengths of the low level heuristics and avoid their weaknesses, the features of NSGAII, SPEA2 and MOGA enable us to investigate this aspect.
Another design issue related to the development of hyper-heuristics for multi-objective optimization is the selection mechanism. As a selection hyper-heuristic relies on an iterative process, the question arises of how to choose an appropriate heuristic effectively at each decision point. In the single-objective case, this criterion is easy to determine by measuring the quality of the solution, such as the objective/cost value, and time. However, this is more difficult when tackling a multi-objective problem. The quality of the solution is hard to assess, as many different criteria must be considered, such as the number of non-dominated individuals and the distance between the non-dominated front and the POF. As we aim to keep the framework simple, we do not employ problem specific information such as the number of objectives or the nature of the solution space. We focus instead on the performance of the low level heuristics. Thus, high performing heuristics are chosen more frequently for intensification. Diversification is also taken into account, and the choice function heuristic selection provides a balance between intensification and diversification. As mentioned earlier, measuring the quality of the solution for multi-objective problems requires us to assess different aspects of the non-dominated set in the objective space. According to Tan et al. (2002), no single MOEA excels in all performance measures. We employ a learning mechanism based on different measures, using a ranking scheme to provide feedback about the quality of the solutions. We do not aim to choose a heuristic that performs well with respect to all measures. We aim to select a heuristic that performs well in most measures. The learning mechanism that is employed in our multi-objective hyper-heuristic is described in Section 3.1.
In this study, we present a selection choice function based hyper-heuristic for multi-objective optimization, denoted as HH_CF. The choice function heuristic selection acts as the high level strategy, which adaptively ranks the performance of three low level meta-heuristics and decides which one to call at each decision point during the search process. All-Moves is employed as a deterministic acceptance strategy, meaning that we accept the output of each low level heuristic whether it improves the quality of the solution or not.

Choice Function and a Ranking Scheme for Multi-objective Optimization
The HH_CF framework imposes the domain barrier concept. No problem specific information is exchanged between the high level hyper-heuristic and the low level meta-heuristics. However, the framework enables us to maintain relevant information on how each low level meta-heuristic performs. In order to provide feedback regardless of the multi-objective problem, four performance metrics are maintained, as shown in (Tan et al., 2002). The high level strategy selects one low level heuristic at each decision point according to the information obtained from the feedback mechanism. Note that the three low level heuristics operate in an encapsulated way. Each approach has its own characteristics, as described in Section 2.1, but they share the same population.
Four performance metrics are employed in the proposed HH_CF framework as a feedback mechanism:
• Algorithm effort (AE) (Tan et al., 2002): measures the computational effort of an algorithm to obtain the Pareto optimal set. Its range is [0, ∞). A smaller value of AE indicates better performance.
• Ratio of non-dominated individuals (RNI) (Tan et al., 2002): evaluates the fraction of non-dominated individuals in the population. Its range is [0, 1]. RNI = 1 indicates that all individuals in a given population are non-dominated.
• Size of space covered (SSC), the so-called S metric or hypervolume (Zitzler & Thiele, 1999): evaluates the size of the objective function space covered by the solutions around the POF. Its range is [0, ∞). A higher value of SSC indicates better performance.
• Uniform distribution of a non-dominated population (UD) (Srinivas & Deb, 1994): evaluates the distribution of non-dominated individuals over the POF. Its range is [0, 1]. A higher value of UD indicates better performance.
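Two of the metrics listed above are simple enough to sketch directly. The following Python fragment computes RNI and, for the special case of two objectives, a hypervolume-style SSC. It is a sketch under stated assumptions: all objectives are minimized, the hypervolume is measured against a user-supplied reference point, and the function names are ours. AE and UD are omitted because their exact formulas are not given in this section.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Pareto dominance under the assumption that all objectives are minimized."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def rni(population: List[Tuple[float, ...]]) -> float:
    """Ratio of non-dominated individuals: a value in [0, 1]."""
    if not population:
        raise ValueError("population must not be empty")
    non_dominated = [
        p for p in population if not any(dominates(q, p) for q in population)
    ]
    return len(non_dominated) / len(population)


def hypervolume_2d(
    front: List[Tuple[float, float]], reference: Tuple[float, float]
) -> float:
    """Area dominated by the front and bounded by the reference point (2 objectives)."""
    # Only points strictly better than the reference point contribute.
    points = sorted(p for p in front if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_f2 = reference[1]
    for f1, f2 in points:  # sweep in increasing order of the first objective
        if f2 < best_f2:   # dominated points add no new area
            area += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

With the population (1, 3), (2, 2), (3, 1), (3, 3), three of the four individuals are non-dominated, giving RNI = 0.75; against the reference point (4, 4), the three non-dominated points cover an area of 3 + 2 + 1 = 6.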
These metrics are chosen as they have been commonly used for MOEAs to measure different aspects of the final population (Tan et al., 2002). In addition, they do not require prior knowledge of the POF. This means that our framework is suitable for tackling any given problem in future studies. The performance metrics (AE, RNI, SSC, UD) (Tan et al., 2002) are used as performance indicators for each low level heuristic. They serve the high level online learning mechanism which guides the search to determine which low level heuristic should be selected. Since the performance metrics provide values that are in different scalar units, it is not trivial to combine those values into a single score which can be used to select the best heuristic. Therefore, we use a ranking scheme in order to select the best performing heuristic at each step. This ranking scheme is simple and flexible. It enables us to incorporate any number of heuristics and even performance indicators.
We propose the following choice function heuristic selection, based on a two-stage ranking scheme ($f_1$), to be used as part of the selection hyper-heuristic for multi-objective optimization:
$$CF(h) = \alpha f_1(h) + f_2(h) \qquad (2)$$
Eq. 2 differs from Eq. 1 as it is adjusted to deal with a given multi-objective optimization problem, but their goal is the same: measuring the overall performance of a low level heuristic. In Eq. 2, $CF(h)$ is the overall score of each heuristic $h$. $f_1(h)$ embeds the ranking scheme and is used for intensification. $f_2(h)$ is the number of CPU seconds elapsed since the heuristic was last called. This provides an element of diversification, by favoring those low level heuristics that have not been called recently. $\alpha$ is a large positive value (e.g. 100). It is important to strike a balance between the $f_1(h)$ and $f_2(h)$ values, so that they are on the same scale. The low level heuristic with the highest value of $CF(h)$ is the heuristic that is applied.
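The scoring and selection step can be sketched as below. The sketch assumes that Eq. 2 has the form $CF(h) = \alpha f_1(h) + f_2(h)$, as the description of its terms suggests; the function names, the dictionary-based interface and the example values are illustrative assumptions only. Ties are broken randomly.

```python
import random
from typing import Dict, Optional


def choice_function(f1: float, f2: float, alpha: float = 100.0) -> float:
    """Assumed form of Eq. 2: alpha * f1 (intensification) + f2 (diversification)."""
    return alpha * f1 + f2


def select_heuristic(
    f1_scores: Dict[str, float],
    seconds_since_last_call: Dict[str, float],
    alpha: float = 100.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Return the heuristic with the highest CF value, breaking ties at random."""
    rng = rng if rng is not None else random.Random()
    scores = {
        h: choice_function(f1_scores[h], seconds_since_last_call[h], alpha)
        for h in f1_scores
    }
    best = max(scores.values())
    tied = [h for h, score in scores.items() if score == best]
    return rng.choice(tied)
```

For example, with $f_1$ values of 4, 6 and 4 and elapsed times of 30, 0.5 and 12 seconds for NSGAII, SPEA2 and MOGA respectively, the scores are 430, 600.5 and 412, so SPEA2 is selected. When all $f_1$ values are equal, the heuristic that has waited longest wins.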
If multiple heuristics have the same score, one of them is chosen randomly. Unlike the ranking scheme in (Vázquez-Rodríguez & Petrovic, 2013), which ranked heuristics based on their probabilities against the performance indicators using a mixture of experiments, our ranking scheme operates in two successive stages. Fig. 1 illustrates how the ranking scheme uses four performance metrics to rank three heuristics, denoted as $h_1$-$h_3$. In the first stage, each heuristic is ranked with respect to each performance indicator separately. For this purpose, an $N \times M$ matrix is used, where $N$ and $M$ are the number of heuristics and performance metrics, respectively. The rankings of the heuristics for each metric are recorded as a column of the matrix, the best and worst ranks being 1 and $N$, respectively. If two heuristics have the same performance, both are assigned the same rank. In the second stage, all heuristics are ranked according to how often they achieve the best rank, that is 1, in the first stage. This ranking is denoted as $Freq_{rank}(h)$ for each heuristic. In Fig. 1, $h_2$ has the best final rank after the second stage of ranking.
We do not only look for the heuristic that has the best performance; we also aim to have a large number of non-dominated individuals. For each heuristic, the rank of the frequency count of the best rank is therefore added to its RNI rank, as given in Eq. 3. $f_1(h)$ represents the performance of an individual heuristic $h$. It is structured to favor the low level heuristic that performs best with respect to as many of the metrics used in the system as possible. For example, $h_2$ has the highest $f_1(h)$ in Fig. 1, and if the time elapsed since the last invocation is the same for all heuristics, then $h_2$ will be selected, since it will have the highest score.

Figure 1: An example of how $f_1$ is computed based on the two-stage ranking process, given three low level heuristics, denoted as $h_1$, $h_2$ and $h_3$. The heuristics are ranked using four performance metrics (AE, RNI, SSC and UD) in the first stage. The ↓ and ↑ show whether heuristics are ranked in decreasing or increasing order for the associated metric.
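Because the equation for $f_1$ is not reproduced in this text, the following is only a sketch under a stated assumption: the two ranks are summed and subtracted from $2(N+1)$, so that a smaller rank sum yields a larger score. Any decreasing function of the rank sum would illustrate the same idea, and the function name is hypothetical.

```python
def f1_score(freq_rank, rni_rank, n_heuristics):
    """ASSUMED form: 2 * (N + 1) - (Freq_rank + RNI_rank); larger is better."""
    return 2 * (n_heuristics + 1) - (freq_rank + rni_rank)


# With N = 3 heuristics, the score ranges from 2 (both ranks 3) to 6 (both ranks 1).
best_possible = f1_score(1, 1, 3)
worst_possible = f1_score(3, 3, 3)
```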

The Multi-objective Hyper-heuristic
Algorithm 1 shows how a greedy algorithm is applied initially to determine the best low level heuristic to apply in the first iteration (steps 2-6). All three low level heuristics are run (step 3). Then, the three low level heuristics are scored and ranked using Eq. 3, and their choice function values are computed using Eq. 2 (steps 4 and 5). The low level heuristic with the highest choice function value is selected (step 6) to be applied as the initial heuristic (step 8). Then, for all low level heuristics, the ranking mechanism is updated (step 9). The choice function values are also computed and updated (step 10). According to the updated choice function values, the low level heuristic with the highest choice function value is selected to be applied in the next iteration (step 11). This process is repeated until the stopping condition is met (steps 7-12). Note that the greedy algorithm is applied only once at the beginning of the search, in order to determine which low level heuristic to apply first. After that, only one low level heuristic is selected at each iteration.
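The control flow described above can be sketched as follows. This is a simplified illustration, not the listing of Algorithm 1: the low level heuristics are arbitrary callables, the ranking scheme is passed in as a hypothetical function `f1_from_outputs`, the choice function is assumed to be $\alpha f_1 + f_2$, and wall-clock time from `time.perf_counter` stands in for CPU seconds. Every output is accepted (All-Moves).

```python
import random
import time


def run_hh_cf(heuristics, population, f1_from_outputs, n_iterations, alpha=100.0):
    """heuristics: name -> callable(population) returning a new population."""
    # Greedy start: run every heuristic once on the initial population.
    outputs = {name: h(population) for name, h in heuristics.items()}
    last_called = {name: time.perf_counter() for name in heuristics}

    def pick():
        now = time.perf_counter()
        f1 = f1_from_outputs(outputs)  # ranking-based score per heuristic
        cf = {n: alpha * f1[n] + (now - last_called[n]) for n in heuristics}
        best = max(cf.values())
        return random.choice([n for n in cf if cf[n] == best])

    current = pick()
    trace = []
    for _ in range(n_iterations):
        # Apply the selected heuristic; its output is always accepted.
        population = heuristics[current](population)
        outputs[current] = population
        last_called[current] = time.perf_counter()
        trace.append(current)
        # Update the scores and select the heuristic for the next iteration.
        current = pick()
    return population, trace


# Toy usage: the "population" is a number and a larger value counts as better.
toy_heuristics = {"a": lambda x: x + 1, "b": lambda x: x + 2, "c": lambda x: x + 3}
toy_result, toy_trace = run_hh_cf(toy_heuristics, 0, lambda outs: dict(outs), 5)
```

In the toy run the heuristic with the largest increment always has the highest score, so it is selected at every decision point; with real metrics the elapsed-time term lets neglected heuristics return.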

Performance Comparison of HH CF and Low level Meta-heuristics
A set of experiments using the WFG test suite (Huband et al., 2006) is conducted to compare the performance of each individual multi-objective meta-heuristic (NSGAII, SPEA2 and MOGA) run on its own with that of the proposed HH CF selection hyper-heuristic, which combines the three multi-objective meta-heuristics. Although NSGAII and SPEA2 have previously been applied to the WFG test suite in (Bradstreet et al., 2007), we repeat the experiments, including MOGA, under our own experimental settings.

Experimental Settings and Performance Evaluation Criteria
The nine test problems of the WFG suite (WFG1-WFG9) each have 24 real parameters, consisting of four position parameters and 20 distance parameters, and two objectives. We agree with (Zitzler et al., 2000) that two objectives are enough to represent the essential features of multi-objective optimization problems and to demonstrate the significance of the proposed approach. All settings for the test suite are fixed to those proposed in the previous studies (Zitzler et al., 2000; Huband et al., 2006). According to (Voutchkov & Keane, 2010; Chow & Regan, 2012), an algorithm could reach better convergence by 6,250 generations. Therefore, HH CF was terminated after 6,250 generations. That is, HH CF runs for a total of 25 iterations. In each iteration, one low level heuristic is applied and executed for 250 generations with a population size of 100. The secondary population of SPEA2 is set to 100. The execution time is about 10-30 minutes, depending on the given problem. In order to make a fair comparison, each low level heuristic used in isolation is also terminated after 6,250 generations. For all three low level heuristics, the simulated binary crossover (SBX) operator is used for recombination and a polynomial distribution for mutation (Deb & Agrawal, 1995).
For the WFG problems, 30 independent trials were run for each algorithm, each with a different random seed. The crossover and mutation probabilities were set to 0.9 and 1/24, respectively. The distribution indices for crossover and mutation were set to 10 and 20, respectively. For the SSC measure, the reference points for WFG problems with $k$ objectives were set to $r_i = (0, i \times 2)$, $i = 1, \dots, k$ (Huband et al., 2006). The distance sharing $\sigma$ for the UD metric and MOGA was set to 0.01 in the normalized space. These settings were used for SSC and UD both as feedback indicators in the ranking scheme of HH CF and as performance measures for the comparison. All algorithms were implemented with the same common sub-functions using Microsoft Visual C++ 2008 on an Intel Core2 Duo 3GHz/2G/250G computer.
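To make the role of the reference point concrete, the following sketch computes the hypervolume (the quantity SSC measures) for a two-objective minimization problem. It is a generic textbook sweep, not the implementation used in the experiments, and the sample points and reference point are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both objectives minimized)."""
    # Only points strictly better than the reference point contribute.
    candidates = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in candidates:  # sweep in increasing order of the first objective
        if f2 < best_f2:  # the point is non-dominated among those seen so far
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


example_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
example_area = hypervolume_2d(example_front, ref=(4.0, 4.0))
```

A larger area means that the front covers more of the objective space relative to the reference point, which is why a higher SSC value is treated as better.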
Comparing the quality of solutions is more complex for multi-objective optimization than for single-objective problems. The number of non-dominated individuals should be maximized and the distance of the non-dominated front to the POF should be minimized, i.e. the resulting non-dominated set should be distributed as uniformly as possible and converge well toward the POF. For this reason, we use three performance metrics, RNI, SSC and UD, to assess different aspects of the quality of the approximation sets. These performance metrics are also used to measure the effectiveness of the ranking scheme in HH CF, as they are employed as feedback indicators for the low level heuristics. In addition, we use a t-test to compare the average performance of a pair of algorithms with respect to a metric averaged over 30 trials. The hypotheses are as follows:

H0: the performances of a pair of algorithms have the same mean
H1: the performances of a pair of algorithms have different means

We assume two independent samples, unequal variance and a one-tailed distribution with a 95% confidence level. We aim to reject the null hypothesis and accept the alternative hypothesis, and thus demonstrate that our proposed choice function based hyper-heuristic HH CF is statistically different from the three low level heuristics (NSGAII, SPEA2 and MOGA) when they are used in isolation. We use the following notation. Given two algorithms P (left) and Q (right), denoted P-Q, the + (−) sign indicates that P performs better (worse) than Q on average and that this performance difference is statistically significant. The ∼ sign indicates that both algorithms deliver a similar performance. n/a means that the t-test is not applicable because the two samples are completely equal.
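A test for two independent samples with unequal variances is commonly carried out as Welch's t-test. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; it assumes that this is the variant intended, and it leaves the comparison against the one-tailed critical value at the 5% level to a t-table or a statistics package. It returns `None` when both samples have zero variance, which corresponds to the n/a case.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_p, sample_q):
    """Return (t statistic, degrees of freedom), or None if the test is undefined."""
    n_p, n_q = len(sample_p), len(sample_q)
    # Squared standard errors of the two sample means.
    se2_p = variance(sample_p) / n_p
    se2_q = variance(sample_q) / n_q
    if se2_p + se2_q == 0:
        return None  # no variability in either sample: the test is not applicable
    t = (mean(sample_p) - mean(sample_q)) / sqrt(se2_p + se2_q)
    df = (se2_p + se2_q) ** 2 / (se2_p ** 2 / (n_p - 1) + se2_q ** 2 / (n_q - 1))
    return t, df


# Hypothetical metric values from two algorithms over five trials each.
example_t, example_df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```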

Results
NSGAII, SPEA2, MOGA and HH CF are tested on the nine WFG test problems under the experimental settings described in the previous section. Table 3 summarizes the average and standard deviation value pairs for each algorithm with respect to RNI, SSC and UD over 30 trials. For all performance metrics, a higher value indicates a better performance. For WFG1, HH CF has a higher RNI value than MOGA but a lower value than NSGAII and SPEA2, and it has the highest SSC and UD values among the methods. The same holds for WFG5 and WFG6. For WFG2 and WFG3, HH CF has an RNI value similar to that of MOGA and lower than the others. With respect to SSC, HH CF has higher values than SPEA2 and MOGA and values similar to NSGAII. However, HH CF has the highest UD value among the methods. For WFG4 and WFG7, HH CF has the lowest (worst) RNI value and the highest UD value. With respect to the SSC metric, HH CF has a higher value than MOGA and values similar to those of NSGAII and SPEA2. For WFG8 and WFG9, HH CF has the lowest value with respect to the RNI and SSC metrics, and the highest value with respect to the UD metric.
These performance results with respect to RNI, SSC and UD are also displayed as box plots in Figs. 2, 3 and 4 in order to provide a clear visualization of the distribution of the simulation data over the 30 independent runs. The results of the statistical t-test comparing our proposed HH CF and the three low level heuristics (NSGAII, SPEA2 and MOGA), when used in isolation, for the three performance metrics (RNI, SSC and UD) are given in Table 4. We note that HH CF and the other algorithms are statistically different in the majority of cases.

In Fig. 2, NSGAII and SPEA2 perform better than the others and produce the highest value of RNI for all datasets. This performance variation is statistically significant, as illustrated in Table 4. Moreover, NSGAII and SPEA2 perform the same across all benchmarks with respect to RNI. However, HH CF and MOGA produce relatively low values for this metric. HH CF performs significantly better than MOGA on two instances (WFG1 and WFG5), and vice versa on two instances (WFG8 and WFG9). For the rest of the instances, they deliver the same performance. This indicates that HH CF performs badly according to the RNI metric and produces a lower number of non-dominated solutions than the other algorithms, except for MOGA.

From Fig. 3, it can be seen that HH CF has the highest uniform distribution (UD) value across all test problems. This indicates that HH CF is superior to the other algorithms on all WFG instances in terms of the distribution of non-dominated individuals over the POF. This performance variation is statistically significant, as illustrated in Table 4: HH CF performs significantly better than the other methods on all nine instances of WFG.
In Fig. 4, the performance of HH CF for SSC is better than that of SPEA2 and MOGA across all test problems except WFG9. HH CF performs significantly better than SPEA2 and MOGA on eight instances of WFG (see Table 4). HH CF also performs significantly better than NSGAII on three instances (WFG1, WFG5 and WFG6), as illustrated in Table 4.
Although the average SSC values of HH CF and NSGAII are similar on WFG2, WFG3, WFG4 and WFG7, HH CF performs slightly but significantly better than NSGAII on three of these instances (WFG2, WFG4 and WFG7) (see Tables 3 and 4). For WFG8 and WFG9, HH CF does not perform well compared to the others, except MOGA: it performs significantly worse than NSGAII and SPEA2 but significantly better than MOGA, as shown in Table 4.
We note from the above results that HH CF performs worse than the low level heuristics used in isolation with respect to the RNI metric, and it produces a low number of non-dominated solutions for most of the WFG problems. However, HH CF performs very well with respect to the UD metric and produces non-dominated solutions that are distributed more uniformly over the POF than those of the other methods. With respect to the SSC metric, HH CF also performs better than the others on most of the WFG problems and produces non-dominated solutions with high diversity that cover a larger proportion of the objective space, except for WFG8 and WFG9, where it failed to converge towards the POF. As WFG8 and WFG9 have a significant bias feature, HH CF may have difficulties coping with bias.
Generally, HH CF produces competitive results across most of the WFG problems with respect to two of the three performance metrics (SSC and UD). Although HH CF obtains a low number of solutions, it produces very good solutions in terms of diversity and convergence when compared to the low level heuristics used in isolation. HH CF can benefit from the strengths of the low level heuristics. Moreover, it has the ability to intelligently adapt to calling combinations of low level heuristics.
To understand how the HH CF could obtain these results, we analyze the behavior of the low level heuristics in the next sub-section.

Behavior of Low Level Heuristics
We compute the average heuristic utilization rate, which indicates how frequently a given low level heuristic is chosen and applied during the search process across all runs, in order to see which low level heuristic is used more frequently. The results are presented in Fig. 5. The average heuristic utilization rate of NSGAII is at least 44% and is the highest among all low level heuristics for each problem, except WFG5, for which SPEA2 is chosen most frequently with a utilization rate of 55.72%. This explains why HH CF has either a similar or relatively better convergence to the POF for most of the test problems when compared with NSGAII. It also indicates that NSGAII performs best among the low level heuristics on most of the WFG problems. We hypothesize that HH CF therefore prefers NSGAII, which is chosen more frequently than the other low level heuristics. Our result is consistent with the result in (Bradstreet et al., 2007), which shows that the best performance on the WFG test functions with two objectives is achieved by NSGAII. The performance of MOGA is not that good on the WFG test suite, so it is invoked relatively infrequently during the search process, and mainly because of the diversification factor $f_2$ in Eq. 2. However, MOGA still influences the performance of HH CF negatively, in particular with respect to the RNI metric. This is due to the fact that MOGA does not have any archive mechanism or preserving strategy to maintain the non-dominated solutions during the search. The average utilization rate of MOGA is the highest for WFG8 (10.16%) and WFG9 (22.40%). This utilization rate explains why HH CF is the worst performing approach in terms of RNI on these problems. HH CF also faces some difficulty in terms of convergence while solving WFG8 and WFG9.
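The utilization rate is a simple frequency count over the recorded sequences of selected heuristics. The sketch below assumes that each run is stored as a list of heuristic names, one per decision point, and that all runs have the same length, so that pooling the counts equals averaging the per-run rates. The traces shown are hypothetical.

```python
from collections import Counter


def utilization_rates(traces):
    """Percentage of decision points at which each heuristic was applied."""
    counts = Counter(name for trace in traces for name in trace)
    total = sum(counts.values())
    return {name: 100.0 * count / total for name, count in counts.items()}


# Two hypothetical runs with four decision points each.
example_traces = [
    ["NSGAII", "NSGAII", "SPEA2", "MOGA"],
    ["NSGAII", "SPEA2", "SPEA2", "NSGAII"],
]
example_rates = utilization_rates(example_traces)
```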
In order to see the effect of each chosen low level heuristic on the performance of HH CF, we investigate the performance of the low level heuristics with respect to the RNI, SSC and UD metrics at the twenty five decision points during the search process. We observe that some problems follow a specific pattern of invoking the low level heuristics during the search, and each problem has its own pattern. For example, for WFG3, NSGAII is invoked and executed for the first seven consecutive decision points. Then SPEA2 is invoked for the next four decision points, followed by one iteration of MOGA. Then NSGAII is chosen for the rest of the search. More of these patterns are illustrated in Fig. 6.

Figure 5: The average heuristic utilization rate for the low level heuristics (NSGAII, SPEA2 and MOGA) in HH CF on the WFG test suite

In order to analyze these results, we divide the WFG instances into four categories based on the performance of HH CF compared to the three low level heuristics used in isolation with respect to RNI, SSC and UD, as listed below:

1. WFG1, WFG5 and WFG6:
• RNI: Better performance than MOGA and worse than NSGAII and SPEA2
• SSC: The best performance among NSGAII, SPEA2 and MOGA
• UD: The best performance among NSGAII, SPEA2 and MOGA
2. WFG2 and WFG3:
• RNI: Similar performance to MOGA and worse than NSGAII and SPEA2
• SSC: Better performance than SPEA2 and MOGA and similar to NSGAII
• UD: The best performance among NSGAII, SPEA2 and MOGA
3. WFG4 and WFG7:
• RNI: The worst performance among NSGAII, SPEA2 and MOGA
• SSC: Better performance than SPEA2 and MOGA and similar to NSGAII
• UD: The best performance among NSGAII, SPEA2 and MOGA
4. WFG8 and WFG9:
• RNI: The worst performance among NSGAII, SPEA2 and MOGA
• SSC: The worst performance among NSGAII, SPEA2 and MOGA
• UD: The best performance among NSGAII, SPEA2 and MOGA

Figure 6: The average RNI, SSC and UD values versus decision point steps for selected benchmark problems (WFG3, WFG4 and WFG5). Each step in the plot is associated with the most frequently selected low level heuristic across 30 trials.

For each category described above, except the last one, we have selected a sample problem to visualize the low level call patterns: WFG5 for the first category, WFG3 for the second category and WFG4 for the third category. For the last category, no specific pattern has been observed. The three selected problems have different problem features in terms of separability and modality (Huband et al., 2006). The average RNI, SSC and UD values versus decision points for the selected benchmark problems (WFG3, WFG4 and WFG5) are shown in Fig. 6. Each step in the plot is associated with the most frequently selected low level heuristic across 30 trials. Since we employ All-Moves as the acceptance strategy, some moves are accepted even if they worsen the solution quality.
There is strong empirical evidence in the literature showing that the number of iterations influences the performance of a learning hyper-heuristic (Burke et al., 2012). Here, we observe from Fig. 6 that if the experiments were performed with fewer iterations, say a run was terminated after twelve decision points instead of twenty five, then the algorithm would have ended up with worse SSC and UD values for the sample benchmark functions WFG3, WFG4 and WFG5. It is clear from Fig. 6 that MOGA produces a worse solution with respect to RNI during the search and that this solution is accepted, influencing the performance of HH CF. However, some worsening moves could still lead to better solutions at the end of the search process with respect to a certain metric, as is the case for HH CF with respect to the UD metric. SPEA2 produces low quality solutions in terms of the distribution along the POF, but this helps the search to escape from a local optimum and obtain better solutions at the end. This is also true with respect to the SSC performance indicator. In addition, we note that HH CF has an advantage over MOGA and outperforms the three MOEAs with respect to the distribution of non-dominated individuals over the Pareto optimal front. It also has an advantage over NSGAII in terms of convergence, in that it performs better than all other methods on some problems while performing better than or similar to NSGAII on the other problems. However, HH CF does not have an advantage over NSGAII and SPEA2 with respect to the number of non-dominated individuals in the population, where it performs poorly because of MOGA's effect. It is worth noting that a small number of iterations is not sufficient for the learning heuristic selection method to distinguish the well performing low level heuristics from the poorer ones. This is clear from Fig. 6, where increasing the number of iterations improves the SSC and UD values for the selected problems.
It can be concluded that our choice function hyper-heuristic can benefit from the strengths of the low level heuristics. It can also avoid their weaknesses, but only partially, as the poor performance of MOGA harms the performance of HH CF with respect to the RNI metric by producing a low number of non-dominated solutions. We could avoid this by employing a move acceptance strategy other than All-Moves. A non-deterministic acceptance strategy could accept worsening moves within a limited degree and help improve the quality of the solutions. Nevertheless, HH CF has the ability to intelligently adapt to calling combinations of low level heuristics.

Performance Comparison of HH CF to the Other Multi-objective Approaches
Experiments are conducted to examine the performance of our proposed HH CF compared to two other multi-objective approaches: a random hyper-heuristic (HH RAND) and the adaptive multi-method search (AMALGAM) (Vrugt & Robinson, 2007). In HH RAND, we employ simple random selection instead of the choice function selection that is used in HH CF. Neither a ranking scheme nor a learning mechanism is embedded into HH RAND. HH RAND uses the same three low level heuristics as HH CF. The hypervolume (SSC) (Zitzler & Thiele, 1999) and the generational distance (GD) (Van Veldhuizen & Lamont, 1998) metrics are used to compare the performance of the multi-objective hyper-heuristics in this set of experiments. GD measures the distance (convergence) of the approximation non-dominated front to the POF. A smaller value of GD is more desirable, as it indicates that the approximation non-dominated front is closer to the POF. We use a t-test for the average performance comparison of the algorithms, and the results are discussed using the same notation as in Section 4.1.
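As an illustration of the GD metric, the sketch below uses the commonly cited form in which the Euclidean distance from each obtained point to its nearest point on a reference front is squared, summed, square-rooted and divided by the number of obtained points. Whether this exact variant was used in the experiments is an assumption, and the sample fronts are hypothetical.

```python
from math import dist, sqrt


def generational_distance(front, reference_front):
    """GD with exponent 2 (ASSUMED form): sqrt(sum of squared nearest distances) / n."""
    squared = [
        min(dist(point, ref) for ref in reference_front) ** 2 for point in front
    ]
    return sqrt(sum(squared)) / len(front)


# A front that lies exactly on the reference front has a GD of zero.
example_reference = [(0.0, 1.0), (1.0, 0.0)]
example_gd = generational_distance([(0.0, 1.0), (1.0, 0.0)], example_reference)
```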
In order to keep the computational cost of the experiments at an affordable level, all the methods were executed for 25,000 function evaluations with a population size of 100 and 250 generations in each run. Both HH CF and HH RAND are executed for 2,500 function evaluations at each iteration. Depending on the given problem, one run of HH CF or HH RAND takes about 5-12 minutes. The other parameter settings of AMALGAM are identical to those used in (Vrugt & Robinson, 2007). We used the Matlab implementation of AMALGAM obtained from the authors via personal communication. We implemented a C++ interface between AMALGAM and the WFG test suite's C++ code. All other experimental settings are the same as in Section 4.1.
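The budget figures above are mutually consistent if each individual is evaluated once per generation, which is an assumption made here for the purpose of the check:

$$100 \times 250 = 25{,}000 \text{ evaluations per run}, \qquad \frac{2{,}500}{100} = 25 \text{ generations per iteration}, \qquad \frac{25{,}000}{2{,}500} = 10 \text{ iterations per run}.$$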

Results
The performance values of HH CF and the other methods with respect to the performance metrics SSC and GD on the WFG problems are summarized in Table 5. For each performance metric, the average and standard deviation values are computed. These results are also displayed as box plots in Figs. 7 and 8 in order to visualize the distribution of the simulation data over the 30 independent runs. The results of the statistical t-test comparing our proposed HH CF and the other multi-objective approaches for the two metrics (SSC and GD) are given in Table 6. The results show that HH CF performs better than the other algorithms in most cases. As expected, HH CF achieves better coverage and diversity than HH RAND according to both metrics. This is due to the learning mechanism used in HH CF, which adaptively guides the search towards the POF. Interestingly, HH RAND performs better than AMALGAM according to the hypervolume metric, except on WFG9, and this difference is statistically significant on all eight of those instances (see Table 6). For the GD metric, HH RAND performs significantly better than AMALGAM on three instances (WFG1, WFG6 and WFG7), performs similarly to AMALGAM on WFG5, and performs significantly worse than AMALGAM on the remaining instances.
Compared to AMALGAM, HH CF performs better with respect to convergence and diversity for most of the WFG problems. According to the SSC metric, HH CF produced non-dominated solutions that cover a larger proportion of the objective space than those of AMALGAM on all WFG problems except WFG9. In Table 6, HH CF performs significantly better than AMALGAM on eight instances of WFG, while AMALGAM performs significantly better than HH CF on WFG9. The superiority of HH CF in the SSC metric is due to the stronger selection mechanism and the effective ranking scheme, which rely on choosing a heuristic with the best SSC value at the right time (decision point) to guide the search toward more of the space around the POF. The box plots in Fig. 7 support this result.
According to the GD metric, HH CF is superior to AMALGAM on most of the WFG problems, as reported in Table 5 and displayed as box plots in Fig. 8. In Table 6, HH CF performs significantly better than AMALGAM on five instances out of nine (WFG1, WFG2, WFG5, WFG6 and WFG7) for the GD metric. Again, this result is due to the online learning selection mechanism and the ranking scheme in HH CF. The ranking scheme maintains the past performance of the low level heuristics using a set of performance indicators that measure different aspects of the solutions. During the search process, the ranking scheme creates a balance between choosing the low level heuristics and their performance according to a particular metric. This balance enhances the performance of the algorithm, yielding better solutions that converge toward the POF as well as being distributed uniformly along it. However, AMALGAM performs significantly better than HH CF on the other four instances for GD (see Tables 5 and 6). This might be because of features of these problems that make it difficult for HH CF to converge toward the POF or that slow down its convergence, such as the bias in WFG8 and WFG9 and the multimodality of WFG4. It is worth reporting that AMALGAM has the better performance according to both metrics, SSC and GD, on WFG9. Table 6 shows that AMALGAM performs significantly better than the other methods on this one instance.
For each problem, we computed the 50% attainment surface for each algorithm from the 30 fronts obtained after 25,000 function evaluations. Fig. 9 shows the POF and the 50% attainment surface of each algorithm. HH CF shows good convergence and uniform distribution for most datasets. It seems clear that HH CF has converged well on the POF in WFG1 and WFG2 when compared to the other algorithms. Moreover, HH CF produced solutions that covered larger proportions of the objective space compared to the other algorithms. AMALGAM has poor convergence on most problems. It has fewer solutions, with poor convergence, on WFG2. AMALGAM has no solutions over the middle-lower segments of the POF for WFG3, WFG5, WFG6, WFG7 and WFG8, and no solutions over the upper-middle segments of the POF for WFG4.
It can be concluded that the above results demonstrate the effectiveness of HH CF, both in its ability to intelligently adapt to calling combinations of low level heuristics and in outperforming the other approaches for multi-objective optimization considered here.

Performance of HH CF on the Multi-objective Design of Vehicle Crashworthiness
Further experiments are conducted on a multi-objective real-world problem, namely the design of vehicle crashworthiness (Liao et al., 2008), to evaluate the performance of our choice function based hyper-heuristic (HH CF). The same performance evaluation criteria and algorithms are used as described in the previous section. In addition, the performance of HH CF is compared to NSGAII (Deb & Goel, 2001). The motivation behind applying HH CF to this problem is to see its performance on a real-world problem and to measure the level of generality that it can achieve.

Problem Description and Formulation
In the automotive industry, crashworthiness is a very important issue when designing a vehicle. Crashworthiness design of real-world vehicles involves the optimization of a number of objectives, including the head injury criterion, chest acceleration, chest deflection, etc. However, some of these objectives may be, and usually are, in conflict with each other, i.e. an improvement in one objective value leads to deterioration in the values of the other objectives. Liao et al. (2008) presented a multi-objective design for the vehicle crashworthiness problem with three objectives. The mass of the vehicle is the first design objective, and an integration of collision acceleration between $t_1 = 0.05$ s and $t_2 = 0.07$ s in the full frontal crash is the second objective function. The toe-board intrusion in the 40% offset-frontal crash is tackled as the third objective. The second and third objectives are constructed from the two crash conditions to reflect the extreme crashworthiness and are formulated as quadratic basis functions, while the vehicle mass is formulated as a linear function, as follows:

$$\text{Mass} = 1640.2823 + 2.3573285\,t_1 + 2.3220035\,t_2 + 4.5688768\,t_3 + 7.7213633\,t_4 + 4.4559504\,t_5 \tag{4}$$

$$A_{in} = 6.5856 + 1.15\,t_1 - 1.0427\,t_2 + 0.9738\,t_3 + 0.8364\,t_4 - 0.3695\,t_1 t_4 + 0.0861\,t_1 t_5 + 0.3628\,t_2 t_4 - 0.1106\,t_1^2 - 0.3437\,t_3^2 + 0.1764\,t_4^2 \tag{5}$$

(6)

So, the multi-objective design of vehicle crashworthiness problem is formulated as:

where $x = (t_1, t_2, t_3, t_4, t_5)^T$ (7)
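The two objective functions whose coefficients appear in the text (Eqs. 4 and 5) translate directly into code, as sketched below. The third objective is not included because its equation is not given here. The function names are hypothetical, and the design vector is passed as a five-element sequence.

```python
def mass(t):
    """Vehicle mass (Eq. 4), linear in the five design variables."""
    t1, t2, t3, t4, t5 = t
    return (1640.2823 + 2.3573285 * t1 + 2.3220035 * t2 + 4.5688768 * t3
            + 7.7213633 * t4 + 4.4559504 * t5)


def a_in(t):
    """Integrated collision acceleration (Eq. 5), quadratic in the design variables."""
    t1, t2, t3, t4, t5 = t
    return (6.5856 + 1.15 * t1 - 1.0427 * t2 + 0.9738 * t3 + 0.8364 * t4
            - 0.3695 * t1 * t4 + 0.0861 * t1 * t5 + 0.3628 * t2 * t4
            - 0.1106 * t1 ** 2 - 0.3437 * t3 ** 2 + 0.1764 * t4 ** 2)


# Evaluate both objectives at an arbitrary illustrative design point.
example_design = (1.0, 1.0, 1.0, 1.0, 1.0)
example_objectives = (mass(example_design), a_in(example_design))
```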

Experimental Settings
We performed 30 independent runs for each comparison method using the same parameter settings as provided in (Liao et al., 2008), with a population size of 30 and 50 generations in each iteration. In order to make a fair comparison, we repeated the NSGAII experiments conducted in (Liao et al., 2008) under the same environment. All methods were run for 75,000 function evaluations. The distance sharing $\sigma$ for the UD metric and MOGA was arbitrarily set to 0.09 in the normalized space. These settings were used for UD both as a feedback indicator in the ranking scheme of HH CF and as a performance measure for the comparison. As the true Pareto front is unknown, we use the best approximation found by combining the results of all considered methods in place of the true Pareto front for the GD metric. For the SSC measure, the reference points in our experiments for $k$ objectives are set as $r_i = z_i^{nadir} + 0.5\,(z_i^{nadir} - z_i^{ideal})$, $i = 1, \dots, k$ (Li & Landa-Silva, 2011). Other experimental settings are the same as those used in Section 4.1. All experiments are performed on an Intel Core2 Duo 3GHz/2G/250G computer.
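A reference point of this kind can be derived from a front as sketched below. The sketch assumes that all objectives are minimized, that the ideal and nadir points are estimated as the component-wise minimum and maximum of the combined non-dominated front, and that the reference point lies half of that range beyond the nadir point. The function name and the sample front are hypothetical.

```python
def reference_point(front, margin=0.5):
    """r_i = nadir_i + margin * (nadir_i - ideal_i), estimated from `front`."""
    n_objectives = len(front[0])
    ideal = [min(p[i] for p in front) for i in range(n_objectives)]
    nadir = [max(p[i] for p in front) for i in range(n_objectives)]
    return [
        nadir[i] + margin * (nadir[i] - ideal[i]) for i in range(n_objectives)
    ]


# Hypothetical two-objective front: ideal = (1, 1) and nadir = (4, 4).
example_ref = reference_point([(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)])
```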

Results
An initial set of experiments is performed to observe how the performance of our approach varies with the number of decision points, denoted as $N_{DP}$. $N_{DP}$ depends on other parameters, such as the number of function evaluations and the number of generations. During the initial experiments, the number of objective function evaluations is fixed at 1,500 for each stage, which starts at each decision point. At each decision point, the chosen MOEA is executed for a fixed number of generations and with a fixed population size, set to 50 and 30, respectively. HH CF is tested using two different values for $N_{DP}$: 25 and 50.
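Assuming one function evaluation per individual per generation, the per-stage budget and the total budgets for the two tested values of $N_{DP}$ follow directly:

$$30 \times 50 = 1{,}500 \text{ evaluations per decision point}, \qquad 25 \times 1{,}500 = 37{,}500, \qquad 50 \times 1{,}500 = 75{,}000.$$

The second total matches the 75,000 function evaluations stated in the experimental settings.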
The performance of HH CF for the different values of $N_{DP}$ with respect to the performance metrics (RNI, SSC, UD and GD) on the vehicle crashworthiness problem is summarized in Table 7. The average values across 30 trials are used. We observe that the choice of $N_{DP}$ does not influence the performance of HH CF with respect to RNI. The best SSC and GD average values are obtained when $N_{DP}$ is 50. However, HH CF obtains the best average value for UD with an $N_{DP}$ of 25. As fifty decision points produce better solutions overall, these results are used for HH CF when comparing its performance to the other approaches.
The mean performance comparison of HH CF, HH RAND, AMALGAM and NSGAII based on the performance metrics SSC and GD for the vehicle crashworthiness problem is provided in Table 8. The distribution of the simulation data over the 30 independent runs for the comparison methods with respect to these performance metrics is visualized in Fig. 10. For SSC, a higher value indicates a better performance, while for GD a lower value indicates a better performance. Tables 8 and 9 show that HH CF has a higher average hypervolume (SSC) than HH RAND and NSGAII, while AMALGAM performs best on this metric. With respect to GD, HH CF is superior to all comparison methods. The performance difference between HH CF and the other methods is statistically significant (see Table 9). HH CF performs significantly better than NSGAII in both metrics. In addition, HH CF achieves better coverage and diversity than HH RAND according to both metrics. This result is consistent with the result in Section 4. The results demonstrate the effectiveness of the learning multi-objective hyper-heuristic approach when compared to one without a learning mechanism. This is understandable, as it has been observed that the learning mechanism successfully guides the search towards the POF. Interestingly, HH RAND performs better than AMALGAM according to the GD metric. However, HH RAND performs worse than AMALGAM according to the SSC metric. In summary, HH CF performs the best considering convergence and diversity, producing better solutions that converge towards the POF compared to all comparison methods. Although HH CF produces acceptable solutions with respect to the hypervolume, it performs worse than AMALGAM on this metric. Generally, the results demonstrate the potential of HH CF for solving this type of problem.

Remarks and Conclusion
Studies on hyper-heuristics for multi-objective optimization are scarce. For the first time, a general selection hyper-heuristic framework for multi-objective optimization has been proposed. This framework is motivated by two observations: (i) no existing algorithm excels across all types of problems, and (ii) there is empirical evidence showing that hybridizing or combining different (meta-)heuristics/algorithms can yield improved performance compared to (meta-)heuristics/algorithms run on their own. Hyper-heuristic frameworks generally impose a domain barrier, which separates the hyper-heuristic from the domain implementation and the low level heuristics in order to provide a higher level of abstraction. The domain barrier does not allow any problem specific information to be passed to the hyper-heuristic during the search process. We designed our framework in this same modular manner. One of the advantages of the proposed framework is its simplicity. The proposed framework is highly flexible and its components are reusable. It is built on an interface which allows other researchers to write their own hyper-heuristic components easily. Even the low level heuristics can be easily changed if required. If new and better performing components are found in the future, the software can be easily modified to include those components for testing. A simple choice function is employed, for the first time, as a (high level heuristic) selection mechanism to deal with multi-objective optimization problems. The choice function adaptively ranks the performance of three low level meta-heuristics, deciding which one to call at each decision point. All-Moves is employed as the acceptance strategy, meaning that we accept the output of each low level heuristic whether it improves the quality of the solution or not.
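The control flow described above (select a low level heuristic at each decision point, apply it, and accept its output unconditionally) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: the scoring expression, the `alpha` weight and the `evaluate` feedback callback are hypothetical stand-ins for the choice function and ranking scheme, and the low level heuristics are treated as opaque callables so that no problem specific information crosses the domain barrier.

```python
def run_hyper_heuristic(heuristics, population, evaluate, decision_points, alpha=1.0):
    """Selection hyper-heuristic loop with All-Moves acceptance (sketch).

    heuristics: dict mapping a name to a callable(population) -> population.
    evaluate:   hypothetical feedback callable(population) -> float,
                where a higher value means better performance.
    alpha:      hypothetical weight on past performance.
    """
    performance = {name: 0.0 for name in heuristics}
    last_used = {name: 0 for name in heuristics}
    history = []
    for step in range(1, decision_points + 1):
        # Assumed score: weighted past performance plus the number of
        # decision points since the heuristic was last called.
        chosen = max(
            heuristics,
            key=lambda name: alpha * performance[name] + (step - last_used[name]),
        )
        # All-Moves: the output is accepted whether or not it improves.
        population = heuristics[chosen](population)
        performance[chosen] = evaluate(population)
        last_used[chosen] = step
        history.append(chosen)
    return population, history
```

With `alpha` set to zero, only the elapsed-time term remains and this sketch cycles through the heuristics in turn; larger values of `alpha` favour the heuristic that most recently produced good feedback.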
In our multi-objective hyper-heuristic framework, a learning process is an essential component for guiding the heuristic selection method while it decides on the most appropriate heuristic to apply at each step of the iterative approach. We employed four performance metrics (algorithm effort (AE), ratio of non-dominated individuals (RNI), size of space covered (SSC) and uniform distribution of a non-dominated population (UD)) to act as an online learning mechanism that provides knowledge of the problem domain to the selection mechanism. These metrics do not require prior knowledge of the POF, which means that our framework is suitable for tackling any given real-world problem in the future.
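As an illustration of why such metrics need no knowledge of the POF, the ratio of non-dominated individuals can be computed from the current population alone. The sketch below assumes that all objectives are minimized and that individuals are given as tuples of objective values; the function names are illustrative.

```python
def dominates(a, b):
    """True if `a` Pareto-dominates `b` (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def rni(population):
    """Ratio of non-dominated individuals in `population`."""
    if not population:
        raise ValueError("population must be non-empty")
    non_dominated = [
        p for p in population
        if not any(dominates(q, p) for q in population)
    ]
    return len(non_dominated) / len(population)
```

An RNI of 1 means that every individual in the population is non-dominated, while values near 0 mean that most of the population is dominated by a few individuals.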
The performance metrics are integrated into a ranking scheme that we introduced in this study. Our ranking scheme sorts the low level heuristics in descending order according to how highly each one ranks relative to the other heuristics.
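One plausible reading of this ranking idea is sketched below: for each metric, find the heuristic or heuristics with the best value, count how many metrics each heuristic wins, and sort the heuristics by that count in descending order. This is an assumption made for illustration rather than the exact scheme defined earlier in the paper; in particular, the direction of each metric (lower AE is better, higher RNI, SSC and UD are better) and the tie handling are assumptions, and the numbers in the usage example are made up rather than taken from the experiments.

```python
# Assumed direction of each metric: True means a higher value is better.
HIGHER_IS_BETTER = {"AE": False, "RNI": True, "SSC": True, "UD": True}


def rank_heuristics(scores):
    """Order heuristics by the number of metrics on which they rank best.

    scores: dict mapping heuristic name -> dict of metric name -> value.
    Returns heuristic names, best first; ties keep their input order.
    """
    wins = {name: 0 for name in scores}
    for metric, higher in HIGHER_IS_BETTER.items():
        values = {name: scores[name][metric] for name in scores}
        best = max(values.values()) if higher else min(values.values())
        for name, value in values.items():
            if value == best:
                wins[name] += 1
    # sorted() is stable, so equal win counts preserve input order.
    return sorted(scores, key=lambda name: wins[name], reverse=True)


# Illustrative values only; these are not results from this study.
example = {
    "NSGAII": {"AE": 0.5, "RNI": 0.9, "SSC": 0.7, "UD": 0.4},
    "SPEA2": {"AE": 0.6, "RNI": 0.8, "SSC": 0.8, "UD": 0.5},
    "MOGA": {"AE": 0.4, "RNI": 0.3, "SSC": 0.5, "UD": 0.3},
}
ordering = rank_heuristics(example)
```

Because the function only iterates over whatever names appear in `scores`, adding a further low level heuristic requires no change to the ranking code.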
The ranking scheme is simple and flexible and enables us to incorporate any number of low level heuristics. Three well-known multi-objective evolutionary algorithms (NSGAII, SPEA2 and MOGA) are incorporated into the multi-objective choice function hyper-heuristic framework to act as the low level meta-heuristics. Although NSGAII, SPEA2 and MOGA are not considered state-of-the-art MOEAs, they are still viewed as baseline MOEAs. They incorporate much of the known MOEA theory, which makes them suitable for investigating their hybridization within our multi-objective hyper-heuristic framework.
Our multi-objective choice function based hyper-heuristic (HH CF) is tested on both benchmark test problems, i.e. the WFG test suite, and a real-world application, i.e. the multi-objective design of vehicle crashworthiness, with two and three objectives respectively. The experimental results demonstrate the effectiveness and potential of the proposed approach in solving continuous multi-objective optimization problems. On the WFG test suite, HH CF outperforms the low level heuristics (NSGAII, SPEA2 and MOGA) when each is used in isolation, as well as two other multi-objective hyper-heuristics: a random hyper-heuristic, HH RAND, and the adaptive multi-method search AMALGAM. HH CF performs well in terms of the distribution of non-dominated individuals along the POF and obtains competitive results in terms of converging towards the POF. Moreover, this observation is further supported by empirical evidence obtained from testing HH CF against NSGAII, HH RAND and AMALGAM on the multi-objective vehicle crashworthiness design problem as a real-world problem. HH CF is superior to HH RAND and NSGAII for solving this problem in terms of convergence and diversity. In addition, HH CF outperforms AMALGAM according to generational distance. HH CF still produces solutions of acceptable quality with respect to hypervolume (SSC), but it does not perform better than AMALGAM on this metric. This could be due to the dimensionality of the problem, as HH CF beats AMALGAM in this metric on the two-objective problems, whereas the reverse is true for the three-objective problem.
Generally, the results reported in this study demonstrate the effectiveness of the learning multi-objective hyper-heuristic approach when compared to methodologies with no learning mechanism. This is understandable, as it has been observed that the learning mechanism successfully guides the search process towards the POF. Moreover, the experimental results show that our multi-objective choice function based hyper-heuristic can exploit and combine the strengths of multiple low level heuristics. The superiority of HH CF is due to its online learning heuristic selection mechanism and the effective ranking scheme. The ranking scheme maintains the past performance of the low level heuristics using a set of performance indicators that measure different aspects of the solutions. During the search process, the ranking scheme creates a balance between choosing the low level heuristics and their performance according to a particular metric. This balance enhances the algorithm's performance, yielding better solutions that converge towards the POF and are distributed uniformly along it. Unfortunately, HH CF cannot fully avoid the weaknesses of the low level heuristics: the poor performance of MOGA affects the performance of HH CF with respect to the ratio of non-dominated individuals (RNI), causing fewer non-dominated solutions to be generated compared to NSGAII and SPEA2. This might be overcome by employing another move acceptance strategy instead of All-Moves. Future work is needed to investigate the performance of our choice function based hyper-heuristic when employing an alternative move acceptance strategy that accepts worsening moves only to a limited degree and so helps improve the quality of the solutions. This is not a trivial process. It requires elaboration of existing methods and their usefulness in a multi-objective setting.
The framework in which HH CF is used for managing a set of multi-objective meta-heuristics offers interesting potential research directions in multi-objective optimization. There is strong empirical evidence showing that different combinations of heuristic selection and acceptance methods in a selection hyper-heuristic framework yield different performances in single-objective optimization (Asta et al., 2013). More multi-objective optimizers, and even heuristic selection methods adapted from previous research in single-objective optimization, could be incorporated into our multi-objective hyper-heuristic framework. The proposed framework has so far tackled continuous multi-objective optimization problems. Our aim is to test the level of generality of our framework further over a wide range of multi-objective problems, including combinatorial, discrete and dynamic problems.