Grammatical Evolution Hyper-Heuristic for Combinatorial Optimization Problems

Designing generic problem solvers that perform well across a diverse set of problems is a challenging task. In this work, we propose a hyper-heuristic framework to automatically generate an effective and generic solution method by utilizing grammatical evolution. In the proposed framework, grammatical evolution is used as an online solver builder, which takes several heuristic components (e.g., different acceptance criteria and different neighborhood structures) as inputs and evolves templates of perturbation heuristics. The evolved templates are improvement heuristics, which represent a complete search method to solve the problem at hand. To test the generality and the performance of the proposed method, we consider two well-known combinatorial optimization problems: exam timetabling (Carter and ITC 2007 instances) and the capacitated vehicle routing problem (Christofides and Golden instances). We demonstrate that the proposed method is competitive, if not superior, when compared to state-of-the-art hyper-heuristics, as well as bespoke methods for these different problem domains. In order to further improve the performance of the proposed framework we utilize an adaptive memory mechanism, which contains a collection of both high quality and diverse solutions and is updated during the problem solving process. Experimental results show that the grammatical evolution hyper-heuristic, with an adaptive memory, performs better than the grammatical evolution hyper-heuristic without a memory. The improved framework also outperforms some bespoke methodologies, which have reported best known results for some instances in both problem domains.




I. INTRODUCTION
Combinatorial optimization can be defined as the problem of finding the best solution(s) among all those available for a given problem [1]. These problems are encountered in many real world applications such as scheduling, production planning, routing, economic systems and management [1]. Many real world optimization problems are complex and very difficult to solve. This is due to the large, and often heavily constrained, search spaces, which make their modeling (let alone solving) a very complex task [2]. Usually, heuristic methods are used to solve these problems, as exact methods often fail to obtain an optimal solution in a reasonable time. The main aim of heuristic methods, which provide no guarantee of returning an optimal (or even near-optimal) solution, is to find a reasonably good solution within a realistic amount of time [3], [4]. Meta-heuristic algorithms provide a high level control strategy for effective navigation of the search space. A vast number of meta-heuristic algorithms, and their hybridizations, have been presented to solve optimization problems. Examples of meta-heuristic algorithms include scatter search, tabu search, genetic algorithms, genetic programming, memetic algorithms, variable neighborhood search, guided local search, GRASP, ant colony optimization, simulated annealing, iterated local search, multi-start methods and parallel strategies [3], [4].
Given a problem, an interesting question that comes to mind is:

Which algorithm is the most suitable for the problem at hand and what are the optimal structures and parameter values?
The most straightforward answer to the above question might be to employ trial-and-error to find the most suitable meta-heuristic from the large variety of those available, and then employ trial-and-error again to determine the appropriate structures and parameter values. While this approach seems reasonable, the computational time involved makes it impractical in many real world applications. Many bespoke meta-heuristic algorithms that have been proposed over the years are manually designed and tuned, focusing on producing good results for specific problem instances. These manually designed algorithms (customized by the user and not changed during problem solving) are problem specific, i.e., they are able to obtain high quality results for just a few problem instances, but usually fail on other instances, even of the same problem, and cannot be directly applied to other optimization problems. Of course, the No Free Lunch Theorem [5] states that a general search method does not exist, but this does not mean that we cannot investigate more general search algorithms to explore the limits of such an algorithm [6][7][8].
Numerous attempts have been made to develop automated search methodologies that are able to produce good results across several problem domains and/or instances. Hyper-heuristics [6], meta-learning [9], parameter tuning [10], reactive search [11], adaptive memetic algorithms [12] and multi-method approaches [13] are just some examples. The performance of any search method critically depends on its structures and parameter values [6]. Furthermore, different search methodologies, coupled with different structures and parameter settings, may be needed to cope with different problem instances or different problem domains [9], [10]. A search may even benefit from adapting as it attempts to solve a given instance. Therefore, the performance of any search method may be enhanced by automatically adjusting its structures or parameter values during the problem solving process. Thus, the ultimate goal of automated heuristic design is to develop search methodologies that are able to adjust their structures or parameter values during the problem solving process and work well, not only across different instances of the same problem, but also across a diverse set of problem domains [6], [9], [10].
Motivated by these aspects, particularly the hyper-heuristic framework [6], in this work we propose a grammatical evolution hyper-heuristic framework (GE-HH) to generate local search templates during the problem instance solving process, as depicted in Fig. 1. The evolved templates represent a complete local search method, which contains the acceptance criteria of the local search algorithm (to determine a way of escaping from local optima), the local search structures (neighborhoods), and their combination. The GE-HH operates on the search space of heuristic components, instead of the solution space. In addition, GE-HH also maintains a set of diverse solutions, utilizing an adaptive memory mechanism which updates the solution quality and diversity as the search progresses. We choose grammatical evolution to search the space of heuristic components due to its ability to represent heuristic components and its ability to avoid the problem of code bloat that is often encountered in traditional genetic programming. Our objectives are:
- To design an automatic algorithm that works well across different instances of the same problem and also across two different problem domains.
- To merge the strengths of different search algorithms in one framework.
- To test the generality and consistency of the proposed method on two different problem domains.
The performance and generality of the GE-HH is assessed using two well-known NP-hard combinatorial optimization problems: examination timetabling (Carter [14] and ITC 2007 [15] instances) and the capacitated vehicle routing problem (Christofides [16] and Golden [17] instances). Although both domains have been extensively studied by the research community, the reasons for choosing them are twofold. Firstly, they represent real world applications and the state-of-the-art results, we believe, can still be improved. Currently, a variety of algorithms have achieved very good results for some instances. However, most methodologies fail on generality and consistency. Secondly, these two domains have been widely studied in the scientific literature and we would like to evaluate our algorithm across two different domains that other researchers have studied. Although our intention is not to present an algorithm that can beat the state of the art, but rather one that can work well across different domains, our results demonstrate that GE-HH is able to update the best known results for some instances.
The remainder of the paper is organized as follows: the generic hyper-heuristic framework and its classification are presented in Section II. The grammatical evolution algorithm is presented in Section III, followed by our proposed GE-HH framework in Section IV. The experimental results and result comparisons are presented in Sections V and VI, respectively. Finally, discussions and concluding remarks are presented in Sections VII and VIII.

II. HYPER-HEURISTICS
Meta-heuristics are generic search methods that can be applied to solve combinatorial optimization problems. However, to find high quality solutions, meta-heuristics often need to be designed and tuned (as do many classes of algorithms, including those in this paper) and they are also often limited to one problem domain or even just a single problem instance. The desire for a solution methodology that is independent of the problem domain serves as one of the main motivations for designing hyper-heuristic approaches [6], [18].
Recently, significant research attention has been focused on hyper-heuristics. Burke et al. [6] defined hyper-heuristics as "an automated methodology for selecting or generating heuristics to solve hard computational search problems."
One possible hyper-heuristic framework is composed of two levels, known as high and low level heuristics (see Fig. 2).
The high level heuristic is problem independent. It has no knowledge of the domain, only the number of heuristics that are available and (non-domain) statistical information that is allowed to pass through the domain barrier. Only the lower part of the framework has access to the objective function, the problem representation and the low level heuristics that have been provided for the problem. During the problem solving process, the high level strategy decides which heuristic is called (without knowing what specific function it performs) at each decision point in the search process. Unlike meta-heuristics, hyper-heuristics operate over a search space of heuristics, rather than directly searching the solution space.

Fig. 2. A generic hyper-heuristic framework
The low level heuristics correspond to a pool of candidate problem dependent, human-designed heuristics, or components of existing heuristics, which operate directly on the solution space for a given problem instance. Based on their past performance, heuristics compete with each other through learning, selection or generating mechanisms at a particular point to construct or improve a solution for a given problem instance.
The fact that the high level strategy is problem independent means that it can be applied to different problem domains with little development effort. Indeed, one of the goals of hyper-heuristics is to raise the level of generality of search methodologies and to build systems which are more generic than other methods [6].
Burke et al. [6] classified hyper-heuristics along two dimensions, based on the nature of the heuristic search space and the source of feedback during learning (see Fig. 3). The nature of the heuristic search space can either be heuristics to choose heuristics or heuristics to generate heuristics. Heuristics can be called from a given pool of heuristics. For example, Burke et al. [19] used tabu search with reinforcement learning as a heuristic selection mechanism to solve nurse rostering and timetabling problems. Heuristics can also be generated by combining existing heuristic components. For example, Burke et al. [20], [21] employed genetic programming to evolve new low level heuristics to solve the bin packing problem.
The nature of the heuristic search space can be further classified depending on the type of low level heuristics as either constructive or perturbative. Constructive based hyper-heuristics start with an empty solution, and select low level heuristics to build a solution step by step. Perturbation based hyper-heuristics start with an initial solution and, at each decision point, select an appropriate improvement low level heuristic to perturb the solution. Based on the employed learning methods, two subclasses are distinguished: on-line or off-line.

Fig. 3. A classification of hyper-heuristic approaches, according to two dimensions: (i) the nature of the heuristic search space and (ii) the source of feedback during learning [6].
In on-line hyper-heuristics, the learning takes place during the problem solving. Examples of on-line approaches include those based on genetic algorithms [22], tabu search [19], and local based search [23]. In off-line hyper-heuristics, learning occurs during a training phase before solving other problem instances; examples include those based on genetic programming [20] and learning classifier systems [24]. Recently, GE was utilized in [21] as an off-line heuristic builder to solve the bin packing problem. Our work differs from [21] in that we use GE as an on-line solver builder, and it is a much more general methodology that is able to address two problem domains and produce best known results. In addition, the GE in [21] has been specifically designed for, and tested on, the bin packing problem only (i.e., the grammar has been specifically designed for the bin packing problem only).
Our proposed GE-HH framework can be classified as an online generational hyper-heuristic. In this respect, it is the same as a genetic programming hyper-heuristic which generates heuristics. Genetic programming hyper-heuristics have been utilized to solve many combinatorial optimization problems including SAT [25], [26], scheduling [27] and bin packing [20], [28]. A recent, and comprehensive, review on hyper-heuristics is available in [29].
Most of the proposed genetic programming based hyper-heuristic approaches, however, are constructive heuristics. Their general limitation is that they are tailored to solve specific problems (e.g., SAT, bin packing, and TSP) using a restricted constructive heuristic component. This limitation restricts their ability to cope with different problem domains without redevelopment (e.g., redefining the terminals and functions). In addition, previous genetic programming based hyper-heuristics were only applied to one single domain, which raises the question of to what extent they will generalize to other domains. Motivated by the above, this work proposes an improvement based hyper-heuristic using grammatical evolution. The proposed framework takes several heuristic components (e.g., acceptance criteria and neighborhood structures) as input and automatically generates a local search template by selecting the appropriate combination of these heuristic components. The differences between our approach and the previous genetic programming based hyper-heuristics in the literature are:
1. The proposed framework generates a perturbation local search template rather than constructive heuristics.
2. The proposed framework is not tailored to a particular problem domain, i.e., it can be applied to several domains (the user only needs to change the neighborhood structures when applying it to another problem domain).
3. The proposed framework utilizes an adaptive memory mechanism to maintain solution diversity.

III. GRAMMATICAL EVOLUTION
Grammatical evolution (GE) [30] is a variant of genetic programming (GP) [31]. It is a grammar based GP that can evolve a variable-length program in an arbitrary language. Unlike GP, GE uses a linear genome representation rather than a tree. The clear distinction between the genotype and phenotype in GE allows the evolutionary process (e.g., crossover) to be performed on the search space (variable length linear genotypes) without needing to tailor the diversity-generating operator to the nature of the phenotype [30], [31]. The mapping process from genotype to phenotype to generate a program is governed by a grammar which contains domain knowledge [30]. The grammar is represented in Backus Naur Form (BNF). The program is generated by using a binary string (genome) to determine which production rule in the BNF definition will be used. The GE general framework is composed of three procedures: a grammar, a search engine and a mapper procedure (see Fig. 4).

A. The BNF Grammar
GE utilizes BNF to generate the output program [30], [31]. A suitable BNF grammar must be defined when solving a problem, and the definitions vary from one problem to another. The BNF grammar can be represented by a tuple <T, N, S, P>, where T is the terminal set, N is the set of non-terminals, S is the start symbol (a member of N) and P is a set of production rules. If more than one production rule is available for a particular non-terminal, the choices are delimited with the '|' symbol. Below is an example of a BNF grammar (adopted from [30]):

T = {Sin, Cos, Tan, Log, +, -, /, *, X, (, )} // set of terminals
N = {expr, op, pre-op, var} // set of non-terminals
S = <expr> // starting symbol

and P can be represented as // production rules

(1) <expr> ::= <expr><op><expr> (0)
             | (<expr><op><expr>) (1)
             | <pre-op>(<expr>) (2)
             | <var> (3)
(2) <op> ::= + (0) | - (1) | / (2) | * (3)
(3) <pre-op> ::= Sin (0) | Cos (1) | Tan (2) | Log (3)
(4) <var> ::= X (0)

B. The Search Engine
GE uses a standard genetic algorithm as its search engine [30].
A candidate solution (genotype or chromosome) is represented by a one-dimensional variable-length string array. Each gene in a chromosome is called a codon. Each codon is an 8-bit binary number (see Fig. 5). The codon values are used in the mapper procedure to determine which rule is selected for a non-terminal symbol when it is converted [30] (see Section III-C). The GA starts with a population of chromosomes, which are randomly generated. The fitness of each chromosome is calculated by executing its corresponding program. The fitness function varies from one domain to another. GA operators (selection, crossover, mutation and replacement) are then applied. At each generation, the evolved solutions (children) from the crossover and mutation operators are evaluated by converting them into their corresponding programs via the mapper function.
If the fitness of a new solution is better than that of the worst solution in the population, it replaces it. The process is repeated until a stopping condition is satisfied (e.g., a number of generations).
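The search engine described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the binary tournament selection, the one-point crossover with independent cut points, the bit-flip mutation rate, and the convention that higher fitness is better are all assumptions made for the example. In GE the fitness would come from executing the mapped program; here it is passed in as an arbitrary callable.

```python
import random
from typing import Callable, List

Chromosome = List[int]  # variable-length list of 8-bit codons (0..255)


def decode_codons(bits: str) -> Chromosome:
    """Split a binary string into 8-bit codons and convert each to an integer."""
    usable = len(bits) - len(bits) % 8  # drop an incomplete trailing codon
    return [int(bits[i:i + 8], 2) for i in range(0, usable, 8)]


def tournament(pop: List[Chromosome], fitness: Callable[[Chromosome], float],
               rng: random.Random, k: int = 2) -> Chromosome:
    """Binary tournament selection (an assumed selection scheme)."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """One-point crossover with independent cut points, so length may change."""
    cut_a = rng.randint(1, len(a))  # keep at least one codon from parent a
    cut_b = rng.randint(0, len(b))
    return a[:cut_a] + b[cut_b:]


def mutate(c: Chromosome, rng: random.Random, rate: float) -> Chromosome:
    """Flip each bit of each codon independently with probability `rate`."""
    out = []
    for codon in c:
        for bit in range(8):
            if rng.random() < rate:
                codon ^= 1 << bit
        out.append(codon)
    return out


def steady_state_step(pop: List[Chromosome],
                      fitness: Callable[[Chromosome], float],
                      rng: random.Random, rate: float = 0.01) -> None:
    """Create one child and replace the worst individual if the child is better."""
    child = mutate(crossover(tournament(pop, fitness, rng),
                             tournament(pop, fitness, rng), rng), rng, rate)
    worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
    if fitness(child) > fitness(pop[worst]):
        pop[worst] = child
```

Because a child only ever replaces the current worst individual, and only when it is strictly better, the worst fitness in the population never decreases from one step to the next.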

C. The Mapper Procedure
The mapper function converts the genotype into a phenotype (i.e., a program). The function takes two inputs, the binary string (genotype) and the BNF grammar [30]. The conversion from genotype to phenotype is carried out using the following rule:

Rule = (codon integer value) MOD (number of rules for the current non-terminal)

The mapper function begins by mapping the starting symbol into terminals. It converts each codon to its corresponding integer value. Assume we have the above BNF grammar (see Section III-A) and genotype (see Fig. 5). First of all, convert all codon values to integers (with reference to Fig. 5, this will be 220, 203, 17, 3, 109, 215, 104, 30). Then, starting from the starting symbol, apply the mapping rule to convert the leftmost non-terminal into a terminal until all non-terminals have been converted into terminals. The genotype-to-phenotype mapping process of the above BNF grammar and the solution (genotype) is illustrated in Table 1. The mapper begins (see Table 1) with the starting symbol <expr>, and then reads the first codon (220). The starting symbol <expr> has four production rules to select from (see Section III-A). Following the mapping rule, the codon value and the number of production rules are used with the modulo function to decide which rule to select, i.e., 220 MOD 4 = 0, which means we select the first production rule (<expr><op><expr>). Since this production rule is not a complete expression (it has at least one non-terminal), the rule is applied again. The process continues from the leftmost non-terminal in the current production rule. Continuing with <expr><op><expr>, take the next codon value (203); the next production rule will be (203 MOD 4 = 3) <var><op><expr>. Since <var> has only one choice, <var> will be replaced by X and the expression becomes X<op><expr>. Continuing with the same mapping rule until all non-terminals are converted to terminals, the complete expression will be X-X.
During the conversion process, it is possible that not all codons are used or, conversely, that all codon values have been used but not all non-terminals have been replaced by terminals. In the case where all non-terminals have been replaced with terminals but not all codon values have been used, the mapper process simply ignores the remaining codons. If all codon values have been used but the expression is still invalid, a wrapper procedure is invoked. The wrapper procedure re-reads the codon values from left to right for a predefined number of iterations. If the wrapper procedure has finished but a complete expression is still not available, the genotype is given the lowest fitness value.
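The mapping and wrapping steps can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' code: the grammar below is the classic symbolic-regression grammar commonly used to illustrate GE, with the four <expr> alternatives and the single <var> alternative that the walkthrough above relies on; the names `GRAMMAR`, `map_genotype` and `max_wraps` are hypothetical. As in the walkthrough, a non-terminal with a single alternative is expanded without reading a codon.

```python
from typing import Dict, List, Optional

# Assumed example grammar; alternatives are listed in rule-index order.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [
        ["<expr>", "<op>", "<expr>"],            # (0)
        ["(", "<expr>", "<op>", "<expr>", ")"],  # (1)
        ["<pre-op>", "(", "<expr>", ")"],        # (2)
        ["<var>"],                               # (3)
    ],
    "<op>": [["+"], ["-"], ["/"], ["*"]],
    "<pre-op>": [["Sin"], ["Cos"], ["Tan"], ["Log"]],
    "<var>": [["X"]],
}


def map_genotype(codons: List[int], start: str = "<expr>",
                 max_wraps: int = 2) -> Optional[str]:
    """Expand the leftmost non-terminal until only terminals remain.

    Returns None when the expression is still incomplete after the codons
    have been re-read `max_wraps` times; the caller would then assign the
    lowest fitness to this genotype.
    """
    symbols = [start]
    used = 0                                # codons consumed so far
    budget = len(codons) * (max_wraps + 1)  # total reads allowed, incl. wrapping
    while True:
        # Locate the leftmost non-terminal, if any is left.
        idx = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
        if idx is None:
            return "".join(symbols)         # unused codons are simply ignored
        choices = GRAMMAR[symbols[idx]]
        if len(choices) == 1:
            rule = 0                        # single choice: no codon is read
        else:
            if used >= budget:
                return None                 # wrapping exhausted: invalid individual
            rule = codons[used % len(codons)] % len(choices)
            used += 1
        symbols[idx:idx + 1] = choices[rule]


print(map_genotype([220, 203, 17, 3, 109, 215, 104, 30]))  # X-X
```

With the codon values from the walkthrough, only the first four codons are consumed (220, 203, 17 and 3) and the remaining four are ignored. A genotype such as `[0]` keeps selecting the first <expr> alternative, never terminates within the wrapping budget, and is therefore reported as invalid.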

IV. THE GRAMMATICAL EVOLUTION HYPER-HEURISTIC FRAMEWORK
In this section we present the grammatical evolution hyper-heuristic (GE-HH) framework. Then, we introduce the adaptive memory mechanism, hybridizing it with GE-HH.

A. The Proposed Framework
It is well established that the efficiency of any problem solver relies on its ability to explore regions of the search space, which is strongly influenced by its structures and parameter values [7], [10], [12]. Therefore, the performance of any search methodology can potentially be enhanced by automatically adjusting its structures and/or parameter values. In this work, we propose a grammatical evolution hyper-heuristic (GE-HH) framework that generates a different local search template (problem solver) to suit the given problem instance. The proposed framework takes several basic heuristic components as input and generates a local search template by combining these basic components. The process of combining heuristic components is carried out automatically. Thus, the benefit of this framework is not only to generate different local search templates by combining basic heuristic components, but also to discover new kinds of heuristics, without relying on human intervention.
As we mentioned earlier (Section III), there are three essential procedures in the grammatical evolution algorithm: a grammar, a search engine and a mapper function. Our search engine (genetic algorithm) and the mapper function are implemented as in the original algorithm [30]. The BNF grammar, which is problem dependent, must be defined in order to suit the problem at hand. Generally, the design of the BNF grammar, which decides which production rule will be selected, has a significant impact on the output, i.e., the programs. In our GE-HH framework, the basic heuristic components are represented by BNF. To design a complete BNF grammar one needs to carry out the following steps [30]:
- Determine the terminals, non-terminals and starting symbol.
- Design the BNF syntax, which may have problem specific function(s).
In this work, three different heuristic components (acceptance criteria (Ac), neighborhood structures (Ns) and neighborhood combinations (Nc)) are used as basic elements of the BNF grammar. We have selected these three components because they are recognized as crucial components in designing problem solvers [3], [18]. These are explained as follows:
1. The acceptance criterion (Ac) decides whether to accept or reject a solution. A number of acceptance criteria have been proposed in the literature and each one has its own strengths and weaknesses. The strength of one acceptance criterion can compensate for the weakness of another if they can be integrated into one framework. In this work, we have employed several acceptance criteria. The acceptance criteria that are used in our GE-HH framework have been widely used in the literature [3], [6], [18], [29], and are presented below.

Improving or equal only: The generated solution is accepted if its objective value is equal to or better than the previous one. The local search template that uses this acceptance criterion will be executed for a pre-defined number of iterations. In this work, we have experimentally set the pre-defined number of iterations to 100 non-improvement iterations [18].

AM
All Moves: All generated solutions are accepted without taking into consideration their quality. This criterion can be seen as a mutational operator which aims to diversify the search. The local search template that uses this acceptance criterion will be run for a pre-defined number of iterations. In this work, we have experimentally set the pre-defined number of iterations to 50 [18].

SA
Simulated Annealing: A move to a neighbor of the current solution is always accepted if it improves (or is equal to) the current objective value. However, non-improving moves are accepted based on a probability acceptance function, R < exp(-δ/t), where R is a random number in [0, 1] and δ is the change in the objective value. The ratio of accepted moves to worse solutions is controlled by a temperature t, which gradually decreases by β during the search process. In this work, β = 0.85 and the initial temperature t is 50% of the value of the initial solution, as suggested in [32], [33]. The local search template that uses the SA acceptance criterion is terminated when t = 0.

EMC
Exponential Monte Carlo: Improving solutions are always accepted. Worse solutions are accepted with a probability of R < exp(-δ), where R is a random number in [0, 1] and δ is the change in the objective value. The probability of accepting worse solutions will decrease as δ increases [34]. The local search template that uses this acceptance criterion will be run for a pre-defined number of iterations. In this work, we have experimentally set the pre-defined number of iterations to 100.

RR
Record-to-Record Travel: A move to a neighbor solution is always accepted if it improves (or is equal to) the current objective value. Worse solutions are accepted if the objective value is less than R + D, where R is initialized to the value of the initial solution and D is a deviation. In this work, we set D = 0.03, and R is updated at every iteration to equal the value of the current solution. The local search template that uses the RR acceptance criterion is repeated until the stopping condition is met, set to 100 iterations [3].

GD
Great Deluge: Improving solutions are always accepted. A non-improving solution is accepted if its objective value is less than the level, which is initially set to the value of the initial solution. The value of the level is gradually decreased by β, where β = (f(initial solution) - estimated lower bound) / number of iterations. In this work, we set the number of iterations to 1000. The local search template that uses the great deluge acceptance criterion will terminate when the level is equal to, or less than, the best known solution found so far [3], [33].

NV
Naive acceptance: Accepts all improving moves. Non-improving moves are accepted with 50% probability. The local search template that uses this acceptance criterion is executed for a pre-defined number of iterations (100 iterations) [35].

AA
Adaptive Acceptance: Accepts all improving moves. Non-improving moves are accepted according to an acceptance rate, which is updated during the search. Initially, the acceptance rate is set to zero. However, if the solution cannot be improved for a certain number of non-improvement iterations (i.e., 10 consecutive non-improvement iterations), then the acceptance rate is increased by 5%. Whenever a solution is accepted, the acceptance rate is reduced by 5%. The local search template that uses this acceptance criterion will be run for a pre-defined number of iterations, experimentally set in this work as 100 iterations [35].
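Most of the acceptance criteria listed above reduce to a one-line test on the current and candidate objective values. The following Python sketch illustrates several of them under explicit assumptions: the problem is a minimization problem, δ is taken as candidate minus current (so worsening moves have δ > 0 and are accepted with probability exp(-δ/t) or exp(-δ)), and all function and parameter names are hypothetical. Iteration limits, cooling schedules and the adaptive acceptance rate are left to the surrounding search loop.

```python
import math
import random


def accept_improving_or_equal(current: float, candidate: float) -> bool:
    """Improving or equal only."""
    return candidate <= current


def accept_all_moves(current: float, candidate: float) -> bool:
    """AM: every candidate is accepted regardless of quality."""
    return True


def accept_sa(current: float, candidate: float, t: float,
              rng: random.Random) -> bool:
    """SA: worse moves pass with probability exp(-delta / t)."""
    delta = candidate - current
    if delta <= 0:
        return True
    return t > 0 and rng.random() < math.exp(-delta / t)


def accept_emc(current: float, candidate: float, rng: random.Random) -> bool:
    """EMC: worse moves pass with probability exp(-delta)."""
    delta = candidate - current
    return delta <= 0 or rng.random() < math.exp(-delta)


def accept_rr(current: float, candidate: float, record: float,
              deviation: float = 0.03) -> bool:
    """RR: worse moves pass if they stay below record + deviation."""
    return candidate <= current or candidate < record + deviation


def accept_gd(current: float, candidate: float, level: float) -> bool:
    """GD: worse moves pass if they stay below the current level."""
    return candidate < current or candidate < level


def gd_decay(f_initial: float, lower_bound: float,
             iterations: int = 1000) -> float:
    """Amount by which the great deluge level is lowered at each iteration."""
    return (f_initial - lower_bound) / iterations


def accept_naive(current: float, candidate: float, rng: random.Random) -> bool:
    """NV: worse moves pass with probability 0.5."""
    return candidate < current or rng.random() < 0.5
```

Keeping every criterion behind the same kind of boolean test is one way to let a generated template swap the acceptance rule without touching the neighborhood logic.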
2. The second heuristic component that is used in our GE-HH framework is the neighborhood structures (Ns), or move operators. The aim of any neighborhood structure is to explore the neighborhood of the current solution, i.e., to generate a neighboring solution. The neighboring solution is generated by performing a small perturbation or changing some attribute(s) of the current solution. The neighborhood structures are critical in the design of any local search method [36]. Traditionally, each neighborhood structure has its own characteristics (weaknesses and strengths); thus, several types of neighborhood structures may be needed to cope with changes in the problem landscape as the search progresses. In this work, we have employed several neighborhoods which are problem dependent. The descriptions of the neighborhood structures that have been used in our work, which differ from one domain to another, are presented in the problem description sections (see Sections V-B4 and V-C4).
3. The third heuristic component employed in our framework is the neighborhood combinations/operators (Nc). The aim of the neighborhood combinations/operators is to combine the strengths of two or more neighborhood structures into one structure. Such a combination has been shown to be very efficient in solving many optimization problems [37]. The benefit of such an idea was first demonstrated using strategic oscillation in tabu search [38]. Recently, Lu et al. [37] conducted a comprehensive analysis to assess the performance of neighborhood combinations within several local search methods (tabu search, iterated local search and the steepest descent algorithm) in solving university course timetabling problems. Their aim was to answer why some neighborhood structures can produce better results than others and what characteristics constitute a good neighborhood structure. They concluded that the use of neighborhood combinations can dramatically improve local search performance. Other works which have also studied the benefit of using neighborhood combinations include [39], [40], [41]. In this work, three kinds of neighborhood combinations/operators are used [37], [40], [18], which are described below.

Nc: Description

+ (Neighborhood Union): Involves the moves that can be generated by using two or more different neighborhood structures. For example, consider two different neighborhoods N1 and N2, whose union can be represented as N1∪N2 or N1+N2. The union move yields the solution obtained by consecutively applying N1 followed by N2 and then calling the acceptance criterion to decide whether to accept or reject the generated solution. Besides combining the strengths of different neighborhoods [37], such a combination might help the search escape from disconnected regions of the search space, which may not happen when using N1 alone. For example, in exam timetabling, the single move neighborhood structure, which moves one exam from one timeslot to another, might lead the search into a disconnected region, because exams that clash with another exam in every other timeslot often cannot be moved at all [42]. Thus, combining a single move neighborhood with another neighborhood, e.g., swapping two exams, can help to find a clash-free timeslot for the selected exam to be moved to. The same issue can also be observed in capacitated vehicle routing problems when using a single move neighborhood that moves a customer from one route to another.

Random Gradient: A neighborhood structure is repeatedly applied until no improvement is possible. This is followed by applying other neighborhood structures. For example, consider two different neighborhoods N1 and N2 combined by the random gradient operator. The local search template will keep applying N1 as long as the generated solution is accepted by the local search acceptance criterion. When no improvement is possible, the local search template stops applying N1 and restarts from the local optimum obtained by N1, but with neighborhood N2 [6], [18].

T-R-S (Token-Ring Search): The neighborhood structures of the generated template are consecutively applied one after another until the end of the sequence. When the generated template moves to the next neighborhood structure in the sequence, it restarts from the local optimum obtained by the previous neighborhood structure. If the generated template reaches the end of the sequence, it restarts the search from the first neighborhood in the sequence using the local optimum obtained by the last neighborhood structure in the sequence [37], [40], [43]. In this work, the token-ring search is set as the default in all generated local search templates (there is no special symbol for it in the BNF grammar). Note that if there is no operator between neighborhood structures, e.g., N1 N2, each neighborhood is applied only once. For example, if we have N1 N2 N3, the local search template will apply N1 once, then move to N2, which will also be applied once, and then move to N3. This is because there is no combination operator between these neighborhood structures.
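The three combination schemes described above can be sketched in Python as follows. This is a minimal illustration only, not the implementation used in this work: the integer toy solution, the `cost` function, the improvement-only `accept` rule and the names `union_move`, `random_gradient`, `token_ring`, `step3` and `step1` are all assumptions introduced for the example. In particular, the sketch applies N2 once after the random gradient phase of N1; whether N2 is also iterated is not fixed by the description.

```python
from typing import Callable, List

Solution = int  # toy solution representation (assumption)
Move = Callable[[Solution], Solution]


def cost(x: Solution) -> int:
    """Toy penalty cost (assumption): distance from zero."""
    return abs(x)


def accept(current: Solution, candidate: Solution) -> bool:
    """Improvement-only acceptance (assumption); GE-HH evolves this choice."""
    return cost(candidate) < cost(current)


def union_move(x: Solution, n1: Move, n2: Move) -> Solution:
    """Neighborhood union: apply N1, then N2, then call acceptance once."""
    candidate = n2(n1(x))
    return candidate if accept(x, candidate) else x


def random_gradient(x: Solution, n1: Move, n2: Move) -> Solution:
    """Apply N1 while its result is accepted, then try N2 from that point."""
    while True:
        candidate = n1(x)
        if not accept(x, candidate):
            break
        x = candidate
    candidate = n2(x)
    return candidate if accept(x, candidate) else x


def token_ring(x: Solution, moves: List[Move], cycles: int = 1) -> Solution:
    """Apply each neighborhood once, in order, wrapping around every cycle."""
    for _ in range(cycles):
        for move in moves:
            candidate = move(x)
            if accept(x, candidate):
                x = candidate
    return x


def step3(x: Solution) -> Solution:
    """Toy neighborhood N1 (assumption): subtract 3."""
    return x - 3


def step1(x: Solution) -> Solution:
    """Toy neighborhood N2 (assumption): subtract 1."""
    return x - 1
```

Starting from the toy solution 10, the union move evaluates only the combined result of both neighborhoods, whereas the random gradient exhausts `step3` before handing its local optimum to `step1`.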
After determining the basic elements of the BNF grammar, we now need to specify the starting symbol (S), terminals (T), non-terminals (N) and the production rules (P) that will represent the heuristic components. These are as follows:

Acceptance criteria production rules. Number of choices available for Ac = 8.

(3) <Lc>: LST configuration production rules.

Number of choices available for Nb = 1 to n. Note that n represents the number of neighborhood structures that are used for each problem domain (see Sections V-B4 and V-C4).

Neighborhood combination production rules. Number of choices available for Nc = 2.
The above BNF grammar is valid for every local search template (LST) for both problem domains in this work. This is because each local search template (LST) has different rules and characteristics. Finding the best BNF grammar for every local search template (LST) would be problem dependent, if not problem instance dependent. Please note that not all local search templates will improve the solution, because the employed acceptance criteria might accept worse solutions with a certain probability. For example, a local search that uses the all moves acceptance criterion (AM) will accept any solution that does not violate any hard constraints, regardless of its quality.

The programs in our GE-HH represent local search templates or problem solvers. The local search template starts with an initial solution and then iteratively improves it. The initial solution can be generated randomly or by using heuristic methods (see Sections V-B3 and V-C3). Please note that the initial solution generation method is not a part of the GE-HH.

In this work, we use two fitness functions. The first one, penalty cost, is problem dependent, and is used by the inner loop of the generated local search template in deciding whether to accept or reject the perturbed solution (see Sections V-B and V-C for more details about the penalty cost). The second fitness function is problem independent and measures the quality of the generated program (local search template) after executing it. At every iteration, if the generated programs are syntactically correct (all non-terminals can be converted into terminals), the programs are executed and their fitness is computed from their output. In this work, the fitness function of the generated programs is calculated as a percentage of improvement (PI). Assume $f_1$ is the fitness of the initial solution and $f_2$ is the fitness of the solution after executing the generated program; then $PI = |(f_1 - f_2)/f_1| \times 100$.

With all the GE-HH elements (grammar, search engine, mapper procedure and fitness function) defined, the proposed GE-HH framework is carried out as depicted in Fig. 6.
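As a small worked illustration of the program fitness, the sketch below computes the percentage of improvement, assuming PI is the absolute relative change between the initial fitness f1 and the final fitness f2, expressed as a percentage. The function name `percentage_improvement` and the guard against a zero initial fitness are assumptions added for the example.

```python
def percentage_improvement(f1: float, f2: float) -> float:
    """Percentage of improvement of a generated program (assumed form).

    f1: penalty cost of the initial solution.
    f2: penalty cost after executing the generated local search template.
    """
    if f1 == 0:
        # Assumption: a zero-cost initial solution cannot be improved,
        # and the ratio would be undefined.
        raise ValueError("initial fitness f1 must be non-zero")
    return abs((f1 - f2) / f1) * 100.0
```

For instance, a program that reduces the penalty cost from 200 to 150 has a PI of 25, so programs can be ranked by PI regardless of the problem domain that produced the penalty costs.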

B. Hybrid Grammatical Evolution Hyper-heuristic and Adaptive Memory Mechanism
Traditionally, previous hyper-heuristic frameworks that have been proposed in the literature operate on a single solution [6], [18], [29]. Single solution based perturbative hyper-heuristics start with an initial solution and iteratively move from the current solution to another one by applying an operator such as 2-opt. Although single solution based methods have been widely used to solve several kinds of problems, it is accepted that pure single solution based methods are not well suited to fine tuning for large search spaces and heavily constrained problems [44], [45]. As a result, single solution based methods have been hybridized with other techniques to improve their efficiency [45]. Generally, it is widely believed that a good search methodology must have the ability to exploit and explore different regions of the solution search space rather than focusing on a particular region. That is, we must address the problem of exploitation vs. diversification, which is a key feature in designing efficient search methodologies [44].
In order to enhance the efficiency of the GE-HH framework and to diversify the search process, we hybridize it with an adaptive memory mechanism. This method has been widely used with several meta-heuristic algorithms such as tabu search, ant colonies, genetic algorithms and scatter search [46]. The main idea is to enhance diversification by maintaining a population of solutions. An example is the reference set in scatter search [46], which includes a collection of both high quality and diverse solutions.
In this work, the adaptive memory mechanism (following the approach in [47], [48]) contains a collection of both high quality and diverse solutions, which are updated as the algorithm progresses. The size of the memory is fixed (equal to the number of acceptance criteria, which is 8). Our adaptive memory works as follows:
- Generate a set of diverse solutions. The set of solutions can be generated randomly or by using a heuristic method. In this work, the solutions are generated using a heuristic method (see Sections V-B3 and V-C3).
- For each solution, associate a frequency matrix which will be used to measure solution diversity. The frequency matrix saves the frequency of assigning an object (exam or customer) to the same location. For example, in exam timetabling, the frequency matrix stores how many times an exam has been assigned to the same timeslot, whilst in the capacitated vehicle routing problem it stores how many times a customer has been assigned to the same route. Fig. 7 shows an example of a solution and its corresponding frequency matrix. The frequency matrix is initialized to zero. We can see five objects (represented by rows) and five available locations (represented by columns). The solution on the left of Fig. 7 can be read as follows: object 1 is assigned to location 1, object 2 is assigned to location 3, etc. The frequency matrix on the right side of Fig. 7 can be read as follows: object 1 has been assigned to location 1 twice, to location 2 three times, to location 3 once, to location 4 four times and to location 5 once; and so on for the other objects.
- If any solution is improved by the GE-HH framework, we update the frequency matrix.
- Calculate the quality and the diversity of the improved solution. In this work, the quality represents the penalty cost, which calculates the number of soft constraint violations (see Sections V-B and V-C). The diversity is measured using entropy information theory, equations (1) and (2) [47], [48], where:
  - $e_{ij}$ is the frequency of allocating object i to location j;
  - m is the number of objects;
  - $\varepsilon_i$ is the entropy for object i.
- Add the new solution to the adaptive memory by considering the solution quality and diversity.
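The bookkeeping behind the frequency matrix and an entropy-based diversity measure can be sketched as below. This is only an illustration under stated assumptions: it uses a normalized Shannon entropy per object, with each row of the frequency matrix normalized by its own total, and averages the per-object entropies to obtain one value per solution. Equations (1) and (2) may normalize differently, and the names `update_frequency`, `object_entropy` and `solution_diversity` are hypothetical.

```python
import math
from typing import List


def update_frequency(freq: List[List[int]], assignment: List[int]) -> None:
    """Record one more occurrence of each object at its current location.

    assignment[i] is the (0-based) location of object i, e.g. the timeslot
    of exam i or the route of customer i.
    """
    for obj, loc in enumerate(assignment):
        freq[obj][loc] += 1


def object_entropy(row: List[int]) -> float:
    """Normalized Shannon entropy of one row of the frequency matrix.

    Assumption: frequencies are turned into proportions of the row total,
    and the result is scaled to [0, 1] by log(number of locations).
    """
    total = sum(row)
    if total == 0 or len(row) < 2:
        return 0.0
    h = 0.0
    for count in row:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h / math.log(len(row))


def solution_diversity(freq: List[List[int]]) -> float:
    """Average entropy over all objects (assumed aggregation)."""
    return sum(object_entropy(row) for row in freq) / len(freq)
```

Under this sketch, an object that has always been assigned to the same location contributes an entropy of 0, while an object spread evenly over all locations contributes 1, so a higher average indicates a more diverse assignment history.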
Fig. 8 shows the hybrid GE-HH framework with an adaptive memory mechanism. Algorithm 1 presents the pseudo-code of GE-HH. The algorithm starts by generating a set of initial solutions for the adaptive memory mechanism (see Sections V-B3 and V-C3) and defining the BNF grammar (see Section IV-A).
It then initializes the genetic algorithm parameters and creates a population of solutions by assigning a random value between 0 and 255 for each chromosome gene (codons) [30].
For each solution (chromosome) in the population, the corresponding program is generated by invoking the mapping function. In order to ensure that there is no duplication in the generated program (i.e., the program does not have two consecutive operators), the program is checked by the edit function. For example, if the generated program is SA: N1N2++N2+N4, with consecutive ++ operators, the edit function will remove one of the + operators and the program will be SA: N1N2+N2+N4. One solution from the adaptive memory mechanism is then selected, to which the generated programs are applied. The adaptive memory is then updated.
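The chromosome initialization and the edit step can be sketched as follows. This is a simplified illustration, assuming programs are plain strings and that the edit function only needs to collapse a run of identical operator symbols into one; the function names `random_chromosome` and `edit_program` and the default operator set are assumptions, not the authors' code.

```python
import random
import re
from typing import List


def random_chromosome(length: int, rng: random.Random) -> List[int]:
    """Create a chromosome whose codons are random integers in [0, 255]."""
    return [rng.randint(0, 255) for _ in range(length)]


def edit_program(program: str, operators: str = "+") -> str:
    """Collapse consecutive repeats of the same operator symbol.

    Assumption: operators are single characters listed in `operators`.
    """
    pattern = "([" + re.escape(operators) + r"])\1+"
    return re.sub(pattern, r"\1", program)
```

Applied to the program from the example above, `edit_program("SA: N1N2++N2+N4")` returns `"SA: N1N2+N2+N4"`, leaving single operators and neighborhood symbols untouched.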
Subsequently, the genetic algorithm is executed for a pre-defined number of generations. At every generation, offspring are generated by applying selection, crossover and mutation. The generated offspring (programs) are then executed. If an offspring is better than the worst chromosome, it is added to the population and the adaptive memory mechanism is updated.

V. EXPERIMENTAL RESULTS
In this section, we evaluate the proposed GE-HH and compare it with state-of-the-art hyper-heuristics and other search methodologies.

A. GE-HH Parameters Setting
In order to find appropriate parameter values for GE-HH, we utilize the Relevance Estimation and Value Calibration method (REVAC) [49]. REVAC is a steady state genetic algorithm that uses entropy theory to determine the parameter values for algorithms. Our aim is not to find the optimal parameter values for each domain, but to find generic values that can be used for both domains. To use the same parameter settings across instances of both domains, we tuned GE-HH for each domain separately and then averaged the values obtained by REVAC over all tested instances. In order to have a reasonable tradeoff between solution quality and the computational time needed to reach good quality solutions, the execution time for each instance is fixed to 20 seconds. The number of iterations performed by REVAC is fixed at 100 (see [49] for more details). For each domain, the average values over all tested instances for each parameter are recorded. Then, the averages of these per-domain values are set as the generic values for GE-HH. The parameter settings of GE-HH that have been used for both domains are listed in Table 2.

B. Problem Domain I: Exam Timetabling Problems
Exam timetabling is a well-known NP-hard combinatorial optimization problem [50] and is faced by all academic institutions. The exam timetabling problem can be defined as the process of allocating a set of exams into a limited number of timeslots and rooms so as not to violate any hard constraints and to minimize soft constraint violations as much as possible [51]. In this work, we carried out experiments on the most widely used un-capacitated Carter benchmarks (Toronto b type I in [51]) and also on the recently introduced exam timetabling dataset from the 2007 International Timetabling Competition, ITC 2007 [15].

1) Test Set I: Carter Uncapacitated Datasets
The Carter datasets have been widely used in the scientific literature [14], [51]. They are un-capacitated exam timetabling problems where room capacities are ignored. The constraints are shown in Table 3.

Soft Constraints S1Carter: Conflicting exams (with common enrolled students) should be spread as far apart as possible to allow sufficient revision time between exams for students.
The quality of a timetable is measured based on how well the soft constraints have been satisfied. The proximity cost is used to calculate the penalty cost (equation 3) [14]:

$$\frac{1}{S}\sum_{k=1}^{m-1}\sum_{l=k+1}^{m} w_i \, s_{kl} \qquad (3)$$

where:
- $w_i = 2^{|4-i|}$ is the cost of scheduling two conflicting exams $e_l$ and $e_k$ (which have common enrolled students) $i$ timeslots apart, if $i = |t_l - t_k| < 5$, i.e., $w_0 = 16$, $w_1 = 8$, $w_2 = 4$, $w_3 = 2$ and $w_4 = 1$; $t_l$ and $t_k$ are the timeslots of exams $e_l$ and $e_k$, respectively;
- $s_{kl}$ is the number of students taking both exams $e_k$ and $e_l$, if $i = |t_l - t_k| < 5$;
- $m$ is the number of exams in the problem;
- $S$ is the number of students in the problem.

Table 4 gives the characteristics of the un-capacitated exam timetabling benchmark problem (Toronto b type I in [51]), which comprises 13 real-world derived instances.
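The proximity cost can be illustrated with a short Python sketch. It follows the weights exactly as stated in the text (a weight of 2 to the power 4 minus the timeslot gap, for gaps below 5) and divides the total by the number of students. The data layout (a dictionary of timeslots per exam and a dictionary of common-student counts per exam pair) and the name `proximity_cost` are assumptions made for the example.

```python
from typing import Dict, Tuple


def proximity_cost(
    timeslot: Dict[str, int],
    common_students: Dict[Tuple[str, str], int],
    num_students: int,
) -> float:
    """Carter-style proximity cost of a timetable (illustrative sketch).

    timeslot[e]              : timeslot index assigned to exam e.
    common_students[(k, l)]  : students taking both exams k and l,
                               each unordered pair listed once.
    num_students             : total number of students S.
    """
    total = 0
    for (ek, el), s_kl in common_students.items():
        gap = abs(timeslot[ek] - timeslot[el])
        if gap < 5:
            # Weight as stated in the text: w_i = 2^(4 - i) for i < 5.
            total += (2 ** (4 - gap)) * s_kl
    return total / num_students
```

With exams A, B and C in timeslots 0, 1 and 6, only the pair (A, B) is close enough to be penalized, so 10 common students contribute 8 × 10 = 80 to the total before division by S.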

2) Test Set II: ITC 2007 Datasets
The second dataset was introduced in the second International Timetabling Competition, ITC 2007, aiming to facilitate a better understanding of real world timetabling problems and to reduce the gap between research and practice [15]. It is a capacitated problem and has several hard and soft constraints (see Tables 5 and 6, respectively).
The objective function from [15] is used (see equation 4). The ITC 2007 problem has 8 instances. Table 7 shows the main characteristics of these instances.

3) Problem Domain I: Initial Solutions
As mentioned in Section IV-A, GE-HH starts by initializing the adaptive memory mechanism, which contains a population of solutions. In this work, we employ hybrid graph coloring heuristics [52] to generate an initial population of feasible solutions for both the Carter and the ITC 2007 instances. The three graph coloring heuristics we utilize are:
- Least Saturation Degree First (SD): exams are ordered dynamically, in ascending order, by the number of remaining timeslots.
- Largest Degree First (LD): exams are ordered, in decreasing order, by the number of conflicts they have with all other exams.
- Largest Enrolment First (LE): exams are ordered, in decreasing order, by the number of students enrolled.
The solution construction method starts with an empty timetable and applies the hybridized heuristics to select and assign the unscheduled exams one by one until all exams have been scheduled. To select an exam, the hybridized heuristic (SD+LD+LE) first sorts the unscheduled exams in non-decreasing order of the number of available timeslots (SD). Those with equal SD evaluations are then arranged in non-increasing order of the number of conflicts they have with other exams (LD), and those with equal LD evaluations are then arranged in non-increasing order of the number of student enrolments (LE). The first exam in the final order is assigned to the timetable. The exam is assigned to a random timeslot in which it has no conflict with the exams that have already been scheduled (in the case of ITC 2007, the exam is also assigned to the best-fitting room), ensuring that all hard constraints are satisfied. If some exams cannot be assigned to any available timeslot, we stop the process and start again. Although there is no guarantee that a feasible solution can be generated, for all the instances used in this work we were always able to obtain a feasible solution.
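The SD+LD+LE ordering amounts to a lexicographic sort with three keys, which can be sketched in Python as below. The dictionaries holding the number of available timeslots, conflicts and enrolments per exam, and the function name `order_exams`, are assumptions for illustration; in a full construction method the number of available timeslots would be recomputed after every assignment, since SD is dynamic.

```python
from typing import Dict, Iterable, List


def order_exams(
    unscheduled: Iterable[str],
    available_slots: Dict[str, int],
    conflicts: Dict[str, int],
    enrolment: Dict[str, int],
) -> List[str]:
    """Order exams by SD (ascending), then LD and LE (both descending)."""
    return sorted(
        unscheduled,
        key=lambda e: (available_slots[e], -conflicts[e], -enrolment[e]),
    )
```

The first exam in the returned list is the one to schedule next; ties on the number of available timeslots are broken by the number of conflicts, and remaining ties by enrolment.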

4) Problem Domain I: Neighborhood Structures
The neighborhood structures that we employed in the GE-HH framework for both Carter and ITC 2007, which are commonly used in the literature [42], are as follows:
Nbe1: Select one exam at random and move it to any feasible timeslot-room.
Nbe2: Select two exams at random and swap their timeslots (if feasible).
Nbe3: Select two timeslots at random and swap all their exams.
Nbe4: Select three exams at random and exchange their timeslots at random (if feasible).
Nbe5: Move the exam causing the highest soft constraint violation to any feasible timeslot.
Nbe6: Select two exams at random and move them to other random feasible timeslots.
Nbe7: Select one exam at random, select a timeslot at random (distinct from the one that was assigned to the selected exam) and then apply the Kempe chain neighborhood operator.
Nbe8: Select one exam at random, select a room at random (distinct from the one that was assigned to the selected exam) and then move the exam to the room (if feasible).
Nbe9: Select two exams at random and swap their rooms (if feasible).
Note that neighborhoods Nbe8 and Nbe9 are applied to the ITC 2007 datasets only, because they consider rooms. The neighborhood solution is accepted if it does not violate any hard constraints. Thus, the search space of GE-HH is limited to feasible solutions only.
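To illustrate how a neighborhood move is restricted to feasible solutions, the sketch below implements a move in the spirit of Nbe2 (swap the timeslots of two random exams) for the un-capacitated case. The only hard constraint checked is that conflicting exams do not share a timeslot. The data layout and the names `is_feasible` and `swap_two_exams` are assumptions, not the implementation used in this work.

```python
import random
from typing import Dict, Set


def is_feasible(timeslot: Dict[str, int], conflicts: Dict[str, Set[str]]) -> bool:
    """True if no two conflicting exams share a timeslot."""
    return all(
        timeslot[a] != timeslot[b] for a in conflicts for b in conflicts[a]
    )


def swap_two_exams(
    timeslot: Dict[str, int],
    conflicts: Dict[str, Set[str]],
    rng: random.Random,
) -> Dict[str, int]:
    """Swap the timeslots of two random exams; keep the move only if feasible."""
    e1, e2 = rng.sample(sorted(timeslot), 2)
    candidate = dict(timeslot)
    candidate[e1], candidate[e2] = timeslot[e2], timeslot[e1]
    # Reject the move if it breaks a hard constraint.
    return candidate if is_feasible(candidate, conflicts) else timeslot
```

Because an infeasible candidate is discarded and the current timetable is returned unchanged, repeated application of the move never leaves the feasible region.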

C. Problem Domain II: Capacitated Vehicle Routing Problems
The capacitated vehicle routing problem (CVRP) is a well-known and challenging combinatorial optimization problem [53]. The CVRP can be defined as the process of designing a least cost set of routes to serve a set of customers [53]. In this work, we test GE-HH on two sets of benchmark capacitated vehicle routing problem datasets. These are the 14 instances introduced by Christofides [16] and the 20 large scale instances introduced by Golden [17].

The CVRP can be represented as an undirected graph $G(V, E)$, where $V = \{v_0, v_1, \ldots, v_n\}$ is a set of vertices which represents a set of fixed locations (customers) and $E = \{(v_i, v_j) : v_i, v_j \in V, i < j\}$ represents the arcs between locations (customers). $E$ is associated with non-negative costs or travel times defined by the matrix $C = (c_{ij})$, where $c_{ij}$ represents the travel distance between customers $v_i$ and $v_j$. Vertex $v_0$ represents the depot, which is associated with $m$ vehicles of capacity $Q_1, \ldots, Q_m$ that start their routes $R_1, \ldots, R_m$ there. The remaining vertices $v_1, \ldots, v_n$ represent the set of customers; each customer $v_i$ requests $q_i$ goods and has a service time $\delta_i$. The aim is to find a set of tours that do not violate any hard constraints and minimize the distance. The hard constraints that must be respected are:
- Each vehicle starts and ends at the depot.
- The total demand of each route does not exceed the vehicle capacity.
- Each customer is visited exactly once by exactly one vehicle.
- The duration of each route does not exceed a global upper bound.
The cost of each route is calculated using (5) [53], and the cost of one solution is calculated using (6). The two sets of benchmark problems that we have considered in this work have similar constraints and objective functions. However, the complexity, instance sizes and customer distributions are different from one set to another.
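A possible way to evaluate routes and check the hard constraints listed above is sketched below. It assumes, as is common for these benchmarks, that the cost of a route is the Euclidean travel distance along the depot-to-depot path plus the service times of its customers, and that the cost of a solution is the sum of its route costs; the exact forms of (5) and (6) may differ. The coordinate-based distance and the names `route_cost`, `solution_cost` and `route_is_feasible` are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
DEPOT = 0  # index of the depot vertex v0 (assumption)


def route_cost(
    route: List[int], coords: Dict[int, Point], service: Dict[int, float]
) -> float:
    """Travel distance of depot -> customers -> depot plus service times."""
    path = [DEPOT] + route + [DEPOT]
    travel = sum(
        math.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
        for a, b in zip(path, path[1:])
    )
    return travel + sum(service[c] for c in route)


def solution_cost(
    routes: List[List[int]], coords: Dict[int, Point], service: Dict[int, float]
) -> float:
    """Total cost of a solution: sum of its route costs."""
    return sum(route_cost(r, coords, service) for r in routes)


def route_is_feasible(
    route: List[int],
    coords: Dict[int, Point],
    service: Dict[int, float],
    demand: Dict[int, float],
    capacity: float,
    max_duration: float,
) -> bool:
    """Check the capacity and route duration constraints for one route."""
    load_ok = sum(demand[c] for c in route) <= capacity
    return load_ok and route_cost(route, coords, service) <= max_duration
```

Representing a route as the list of its customers, with the depot added implicitly at both ends, guarantees by construction that every vehicle starts and ends at the depot.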

1) Test Set I: Christofides Datasets
The first set comprises 14 instances and was introduced by Christofides [16]. The main characteristics of the problem are summarized in Table 8. The instance size varies from 51 to 200 customers, including the depot. Each instance has a capacity constraint. Instances 6-10, 13 and 14 also have a maximum route length restriction and non-zero service times. The problem instances can be divided into two types: in instances 1-10 the customers are randomly located, whilst in instances 11-14 the customers are in clusters.

2) Test Set II: Golden Datasets
The second CVRP dataset involves 20 large scale instances presented by Golden [17] (see Table 9). The instances have between 200 and 483 customers, including the depot. Instances 1-8 have route length restrictions.

3) Problem Domain II: Initial Solutions
For both the Christofides and the Golden instances, the initial population of feasible solutions is constructed utilizing the savings algorithm [54].

4) Problem Domain II: Neighborhood Structures
The neighborhood structures that we employ in GE-HH for both the Christofides and the Golden instances are the most common ones used to solve capacitated vehicle routing problems in the literature. They are as follows:
Nbv1: Select one customer at random and move it to any feasible route.
Nbv2: Select two customers at random and swap their routes.
Nbv3: Select one route at random and reverse a part of the tour between two selected customers.
Nbv4: Select three customers at random and exchange their routes at random.
Nbv5: Select one route at random and perform the 2-opt procedure.
Nbv6: Perform the 2-opt procedure on all routes.
Nbv7: Select two distinct routes at random and swap a portion of the first route with the first portion of the second route.
Nbv8: Select two distinct routes at random and from each route select one customer. Swap the adjacent customer of the selected one for both routes.
Nbv9: Select two distinct routes at random and swap the first portion with the last portion.
Nbv10: Select one customer at random and move it to another position in the same route.
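Two of the intra-route moves listed above are simple list manipulations, sketched here for illustration: reversing the part of a tour between two positions (the elementary step behind Nbv3 and 2-opt) and relocating a customer within its route (Nbv10). Routes are assumed to be Python lists of customer indices without the depot, and the names `reverse_segment` and `relocate` are hypothetical.

```python
from typing import List


def reverse_segment(route: List[int], i: int, j: int) -> List[int]:
    """Return a copy of the route with positions i..j (inclusive) reversed."""
    return route[:i] + route[i:j + 1][::-1] + route[j + 1:]


def relocate(route: List[int], src: int, dst: int) -> List[int]:
    """Return a copy of the route with the customer at src moved to dst."""
    moved = route[:src] + route[src + 1:]
    moved.insert(dst, route[src])
    return moved
```

Both functions return new lists, so the current route is left intact if the resulting neighbor is later rejected by the acceptance criterion or violates a hard constraint.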
The neighborhood solution is accepted if it does not break any hard constraints. Thus, the search space of GE-HH is limited to feasible solutions only.

VI. COMPUTATIONAL RESULTS AND COMPARISON
To assess the benefit of incorporating an adaptive memory mechanism in GE-HH, for each domain, we have carried out two sets of experiments. The first one compares the performance of the grammatical evolution hyper-heuristic with an adaptive memory (GE-HH) and the grammatical evolution hyper-heuristic without an adaptive memory (GE-HH*) using the same parameter values and computational resources. The second test compares and analyses the performance of GE-HH against state-of-the-art hyper-heuristics and bespoke methods. For both experimental tests, we report the best, average, standard deviation and average time over 51 independent runs with different random seeds. By executing 51 runs, instead of 50, we can easily calculate the median value without the need for interpolation. The aim of executing the proposed hyper-heuristic framework for 51 runs is to get more information and a good indication of the algorithm's consistency and generality, as it is highly recommended in the literature to have more than 30 runs in statistical analysis of algorithm performance [3]. The results represent the cost of soft constraint violations. In addition, we also report, for each instance, the percentage deviation from the best known value found in the literature, calculated as follows:

$$\Delta(\%) = \frac{best_{GE\text{-}HH} - best^{*}}{best^{*}} \times 100 \qquad (7)$$

where $best_{GE\text{-}HH}$ is the best result obtained over 51 independent runs by GE-HH and $best^{*}$ represents the best known value found in the literature.
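The reporting statistics can be illustrated with the short sketch below. It assumes that the percentage deviation is the relative gap between the best result of GE-HH and the best known value, expressed as a percentage (so that 0 means the best known value is matched), and it uses Python's standard `statistics` module for the median. The function name `percentage_deviation` is an assumption.

```python
import statistics
from typing import Sequence


def percentage_deviation(best_found: float, best_known: float) -> float:
    """Relative gap (in %) between a result and the best known value."""
    return (best_found - best_known) / best_known * 100.0


def summarize_runs(costs: Sequence[float]) -> dict:
    """Best, average, median and standard deviation over independent runs."""
    return {
        "best": min(costs),
        "average": statistics.mean(costs),
        "median": statistics.median(costs),
        "std": statistics.stdev(costs),
    }
```

With an odd number of runs such as 51, the median is simply the 26th smallest cost, which is why no interpolation between two middle values is needed.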
We evaluate the performance of GE-HH by considering the following three criteria:
- Generality: We define generality as the ability of GE-HH to work well, not only across different instances of the same problem, but also across two different problem domains.
- Consistency: This is the ability of GE-HH to produce stable results when executed several times for every instance. Typically, consistency is one of the most important criteria in evaluating any algorithm. This is because many search algorithms have a stochastic component, which leads to different solutions over multiple runs even if the initial solution is the same. We measure the consistency of GE-HH based on the average and the standard deviation over 51 independent runs.
- Efficiency: This is the ability of GE-HH to produce good results that are close to or better than the best known value in the literature. We measure the efficiency of GE-HH by reporting, for each instance, the best result and the percentage deviation, see ∆(%) in (7), from the best known results in the literature.
For all tested instances, except the ITC 2007 problem instances, we compare the GE-HH results with the state of the art in terms of solution quality rather than computational time. This is because researchers use different computer resources, which makes the comparison difficult, if not impossible [39], [55]. Therefore, we set the number of generations as the termination criterion. As for the ITC 2007 datasets, the organizer provided benchmark software to determine the allowed execution time [15]. We have used this software to determine the execution time using our computer resources (i.e., 10 minutes). We have given extra time to GE-HH, due to the use of the adaptive memory (i.e., 10.83 minutes). As a result, the execution time of our method is within the range of those published in the literature.

A. Problem Domain I: Computational Results on Exam Timetabling Problems
1) Test Set I: Carter Uncapacitated Datasets
Table 10 lists, for each instance, the best, average, standard deviation and average time obtained by GE-HH and GE-HH*.
From Table 10, one can clearly see that GE-HH outperforms GE-HH* across all instances. Furthermore, both the best and average results obtained by GE-HH are better than those of GE-HH* on all instances. We can also see that, on twelve of the thirteen instances, the standard deviation of GE-HH is lower than that of GE-HH*. However, GE-HH* requires less computational time than GE-HH. This is mainly due to the use of a population of solutions and the diversity updating mechanism in the GE-HH framework. The results reveal that the use of the adaptive memory mechanism has an effect on the ability of GE-HH to produce good quality and consistent results over all instances.
We compare the performance of GE-HH against hyper-heuristics and other bespoke methods (see Table 11).
Table 12 shows the comparison of the best and average results of GE-HH and other hyper-heuristic methods. We also report, for each instance, the percentage deviation (∆(%)) from the best result obtained by other hyper-heuristics and the instance ranking. As can be seen from Table 12, GE-HH finds better solutions than the other hyper-heuristic methods for 7 out of 13 instances, obtains the second best results for 5 instances and obtains the third best result for the remaining instance (Rye-s-93).
Table 13 presents, for all instances, the best, average, percentage deviation (∆(%)) and instance ranking of GE-HH, along with a comparison with respect to the best known results (shown in bold) in the literature obtained by bespoke methods. It can be seen that, even though GE-HH does not obtain the best solutions for all instances, overall it obtains competitive results, especially when considering the percentage deviation (∆(%)) from the best known value found in the literature.

Results in Tables 12 and 13 demonstrate that, across all instances, GE-HH outperforms other hyper-heuristic methodologies and obtains competitive results compared to other bespoke methods. Except for instance Ute-s-92 (ranked 6), the instance ranking varies between 2 and 4. Also, the percentage deviation indicates that GE-HH results are very close to the best known results. This demonstrates that GE-HH is able to generalize well over a set of problem instances rather than only producing good results for one or more of the problem instances.

Note: TP (13): total penalty of 13 instances. TP (12): total penalty of 12 datasets (excluding Pur-s-93-I). TP (11): total penalty of 11 datasets (excluding Pur-s-93-I and Rye-s-93). "*" means the GE-HH result is better than other methods. "-" indicates no feasible solution has been found. Best results are highlighted in bold. ∆*(%): the percentage deviation of the average value with regard to the best known results.

Note: TP (13): total penalty of 13 instances. TP (12): total penalty of 12 instances (excluding Pur-s-93-I). TP (11): total penalty of 11 instances (excluding Pur-s-93-I and Rye-s-93). "-" means no feasible solution has been found. Best results in the literature are highlighted in bold. ∆*(%): the percentage deviation of the average value with regard to the best known results.

2) Test Set II: ITC 2007 Datasets
The first set of experiments presents a comparison between GE-HH and GE-HH*, as well as the results of GE-HH without the extra computational time (GE-HH**), i.e., with the computational time fixed to be the same as for GE-HH*. The best, average, standard deviation of the results and the average time are reported in Table 14. It can be seen that, across all instances, GE-HH outperforms GE-HH* and GE-HH** (in most cases), not only on solution quality, but also on the average and the standard deviation. Comparing the results of GE-HH* with GE-HH**, the results demonstrate that GE-HH** outperforms GE-HH* on five out of eight instances. The average and standard deviation of GE-HH** are better than those of GE-HH* for all tested instances. The results demonstrate the importance of incorporating the adaptive memory mechanism within GE-HH, as well as implying that GE-HH is more general and consistent.
We now compare the performance of GE-HH with the best available results in the literature, which are divided into two groups (see Table 15): ITC 2007 winners (Table 16) and post-ITC 2007 methods (Table 17: hyper-heuristic and bespoke methods). In addition, we also included the results of GE-HH** in the comparison to assess its ability to produce good quality solutions compared to the ITC 2007 winners as well as the post-ITC 2007 methods. It is clear from Tables 16 and 17 that GE-HH is the overall best. The presented results demonstrate that GE-HH not only generalizes well over a set of problem instances, but also produces much higher quality solutions. One can also see that GE-HH** outperformed the ITC 2007 winners on 7 instances and the post-ITC 2007 methods on 4 out of 8 tested instances (see Tables 16 and 17).

B. Problem Domain II: Computational Results on Capacitated Vehicle Routing Problems
1) Test Set I: Christofides Datasets
The experimental results of GE-HH and GE-HH* are reported in Table 18, where for 4 out of 14 instances GE-HH achieved better results than GE-HH* (with ties on 7 instances). The average results obtained by GE-HH on all instances are better than those of GE-HH* and the standard deviation is relatively small (it varies between 0.00 and 0.93). Even though GE-HH did not outperform GE-HH* across all instances, the standard deviation reveals that GE-HH generalized well over all instances. Overall, the results imply that hybridizing the adaptive memory mechanism with GE-HH has made a significant improvement.
We compare the experimental results of GE-HH with the best available results in the literature in Table 19. To the best of our knowledge, only two hyper-heuristics have been tested on the Christofides instances (first and second methods in Table 19) and both report the percentage deviation only. Due to the large number of bespoke methods that are available in the literature, we have only considered those that have produced the best known results and some recently published methods. The considered methods are classified into single solution based and population based methods (see Table 19). Table 20 shows the comparison of GE-HH against hyper-heuristic methods in terms of percentage deviation from the best known results. We can see that, for 9 instances, GE-HH matches the best known results in the literature and, for 4 instances, GE-HH produces better quality solutions (ranked first) when compared to other hyper-heuristics. The computational results of GE-HH compared to other bespoke methods are presented in Table 21, where for 9 out of 12 instances GE-HH has obtained the best known results. For the remaining instances, the percentage deviation is between 0.11% and 1.9% and the instance ranking varies between 2 and 4. According to this result, GE-HH is competitive with the presented bespoke methods. Considering generality, GE-HH is able to produce good results across all instances and the percentage deviation is relatively small.

2) Test Set II: Golden Datasets
The computational results of GE-HH and GE-HH* are tabulated in Table 22. The presented results clearly show that GE-HH outperformed GE-HH* across all instances. Furthermore, the average and standard deviation of GE-HH are much better than those of GE-HH*, again indicating that the adaptive memory mechanism has a big impact on performance and generality.
In order to assess the performance of GE-HH, the results of GE-HH are compared with the best available results in the literature. Again, due to the very large number of methods that have been tested on the Golden instances, only those that produced the best known results and a few recent methods are considered, as shown in Table 23. To the best of our knowledge, only one hyper-heuristic (first method in Table 23) has been tested on the Golden instances. Table 24 gives the comparison results. From Table 24, one can see that GE-HH reached the best known results for 4 out of 20 instances. For the other instances, the percentage deviation is between 0.17% and 0.68% and the instance ranking varies between 2 and 5. Compared to the hyper-heuristic method (first method in Table 24), GE-HH is able to obtain better solutions on 14 instances. When compared with bespoke methods, GE-HH reached the best known results for 4 instances. For the remaining 16 instances, GE-HH produces results that are competitive with other bespoke methods and very close to the best known value (percentage deviation). It should be noted that bespoke methods are specifically designed to produce the best results for one or more instances, whilst GE-HH is able to obtain a much higher level of generality across all instances.

- The development of a GE-HH framework that automatically generates templates of perturbation heuristics, demonstrating that the strengths of different search algorithms can be merged into one hyper-heuristic framework.
- The integration of an adaptive memory mechanism, which contains a collection of high quality and diverse solutions, within a hyper-heuristic framework, which obtained consistent results, generalized across different problem domains and produced high quality solutions that are either competitive with or (in some cases) better than other bespoke methods.
- The development of a hyper-heuristic framework which can be easily applied to different problem domains without much effort (i.e., the user only needs to change the neighborhood structures).
Experimental results have demonstrated the effectiveness and the generality of this method on very well established benchmarks. In future work, we intend to investigate the effectiveness of integrating GE-HH into the HyFlex framework (a benchmark framework for cross-domain heuristic search), which was recently introduced [88,89].
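The idea of evolving templates of perturbation heuristics from a grammar can be illustrated with a toy genotype-to-phenotype mapping in the style of standard grammatical evolution: each integer codon selects a production for the leftmost non-terminal using the modulo rule, and the genome is reused (wrapped) when it runs out. This is only a sketch under stated assumptions. The grammar below, its symbol names (`improving_only`, `swap`, `two_opt`, and so on), the wrapping limit, and the choice to consume a codon only when more than one production exists are all hypothetical and are not the grammar or settings used in this work.

```python
# Hypothetical toy grammar: an acceptance criterion applied to a
# combination of neighborhood structures (all names are placeholders).
GRAMMAR = {
    "<template>": [["<acceptance>", "(", "<moves>", ")"]],
    "<acceptance>": [["improving_only"], ["simulated_annealing"], ["great_deluge"]],
    "<moves>": [["<move>"], ["<move>", "+", "<moves>"]],
    "<move>": [["swap"], ["shift"], ["two_opt"]],
}


def decode(genome, grammar, start="<template>", max_wraps=2):
    """Map a list of integer codons to a template string.

    Expands the leftmost non-terminal first. A codon is consumed only when
    the non-terminal has more than one production (an assumption of this
    sketch). Returns None if the codons, including wraps, are exhausted.
    """
    symbols = [start]
    output = []
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:
            output.append(symbol)  # terminal symbol: emit as-is
            continue
        options = grammar[symbol]
        if len(options) == 1:
            choice = options[0]
        else:
            if used >= limit:
                return None  # mapping failed: no codons left
            codon = genome[used % len(genome)]  # wrap around the genome
            choice = options[codon % len(options)]
            used += 1
        symbols = list(choice) + symbols
    return " ".join(output)


print(decode([1, 1, 0, 0, 2], GRAMMAR))
```

In a hyper-heuristic setting, each decoded template would then be interpreted as a local search configuration and evaluated on the problem instance; changing the problem domain would, under this sketch, only require changing the `<move>` productions.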

TABLE 3
No student can sit more than one exam at the same time.

TABLE 5
No student can sit more than one exam at the same time.
H2 ITC2007: There must be a sufficient number of seats to accommodate the exams being scheduled in a given room.
H3 ITC2007: The length of exams assigned to each timeslot should not violate the timeslot length.
H4 ITC2007: Some sequences of exams have to be satisfied, e.g., Exam_B must be scheduled after Exam_E.
H5 ITC2007: Room related hard constraints must be respected, e.g., Exam_B must be scheduled in Room 3.

TABLE 9 GOLDEN INSTANCES
Columns: Datasets, Customers, Capacity, Max. tour length, Service time
we consider an individual comparison, GE-HH is able to obtain better solutions on instances 8, 12, 11, 6, 7 and 2 compared to Mc7, Mc8, Mc9, Mc10, Mc11, and Mc12, respectively. Furthermore, only Mc10 reported results for the Pur-s-93 and Rye-s-93 instances, while Mc7 and Mc11 reported results for the Rye-s-93 instance (we suspect this is due to the complexity and inconsistencies in these instances).

TABLE 10
The time represents average time in minutes. Best results in the literature are highlighted in bold. Bold italic indicates that both methods produce the same result.

TABLE 14
Note: GE-HH: with the adaptive memory mechanism. GE-HH*: without adaptive memory. GE-HH**: with adaptive memory but with the computational time fixed to the same value as GE-HH* (10 minutes). Times represent average time in minutes. Best results are highlighted in bold.

TABLE 15
HH result is better than other methods. "-" indicates no feasible solution has been found. Best results are highlighted in bold. ∆*(%): the percentage deviation of the average value with regard to the best known results.

TABLE 18
Note: GE-HH: with the adaptive memory mechanism. GE-HH*: without adaptive memory. Time represents average time in minutes. Best results are highlighted in bold.

TABLE 19
Note: HH: hyper-heuristic methods. NON-HH: bespoke methods. LS: local search methods. POP: population based methods.

TABLE 20
Note: '*' indicates that the obtained result is the same as the best known result. BK: best known results in the literature. "-" indicates no feasible solution has been found. Best results are highlighted in bold. ∆*(%): the percentage deviation of the average value with regard to the best known results.

TABLE 21 RESULTS OF GE-HH COMPARED TO BESPOKE METHODS
Note: '*' indicates that the obtained result is the same as the best known result. "-" indicates no feasible solution has been found. Best results are highlighted in bold.

TABLE 22 RESULTS OF GE-HH COMPARED TO GE-HH*
Note: GE-HH: with the adaptive memory mechanism. GE-HH*: without adaptive memory. Time represents average time in minutes. Best results are highlighted in bold.

As shown throughout this work, in both problem domains (exam timetabling and capacitated vehicle routing), GE-HH obtained competitive results, and on some instances better results, when compared against the best existing methods in the literature. GE-HH updated the best known results for some instances in both domains. In both domains, GE-HH outperformed previously proposed hyper-heuristic methods. We note that, for both domains, the standard deviation is relatively small. Also, the percentage deviation demonstrates that, in both domains, GE-HH results are very close to the best known. This positive result indicates that GE-HH is efficient, consistent, and generalizes well over both domains. In our opinion, this is due to the following. (i) The capability of GE-HH to deal with different problem instances by evolving different local search templates during the problem solving process. By evolving different local search templates, GE-HH can easily adapt to any changes that might occur during problem solving. (ii) Since some problem instances are very difficult to solve and have many local optima, GE-HH struggles to obtain good quality solutions without getting stuck in local optima. Therefore, by incorporating the adaptive memory mechanism, GE-HH is more effective in diversifying the search by exploring different regions.
Overall, the benefit of the proposed method is its ability to find the best solver from the supplied pool of solvers (local search acceptance criteria) as well as the best configuration for the selected solver. This alleviates the question of which solver one should use and what the best configuration for it is. Furthermore, it does not rely on complicated search approaches to find out how to generate a local search template. Rather, it provides a general mechanism regardless of the nature and complexity of the problems. It is simple to implement, and can be easily applied to other domains without significant effort (i.e., users only need to change the set of neighborhood structures).

VIII. CONCLUSIONS
In this work, we have proposed a new improvement based hyper-heuristic framework for combinatorial optimization problems. The proposed framework employs a grammatical evolution algorithm (GE-HH) to search the space of basic heuristic components. These components, namely a set of acceptance criteria, neighborhood structures, and neighborhood combinations, are represented by a grammar definition. The proposed framework takes these heuristic components as input and evolves several templates of perturbation heuristics during problem solving. The performance of GE-HH is enhanced by hybridizing it with an adaptive memory mechanism, which contains a set of high quality and diverse solutions. To demonstrate the generality, consistency, and efficiency of the proposed framework, we have tested it on two different and challenging problem domains, exam timetabling and capacitated vehicle routing benchmark problems, using the same parameter settings. The results demonstrate that GE-HH produces highly competitive solutions, if not better, and generalizes well across both problem domains. The main contributions of this work are: