A survey of search methodologies and automated system development for examination timetabling


Abstract. Examination timetabling is one of the most important administrative activities that takes place in all academic institutions. In this paper we present a critical discussion of the research on exam timetabling in the last decade or so. This last ten years has seen a significantly increased level of research attention for this important area. There has been a range of insightful contributions to the scientific literature both in terms of theoretical issues and practical aspects. The main aim of this survey is to highlight the new trends and key research achievements that have been carried out in the last decade. We also aim to outline a range of relevant important research issues and challenges that have been generated by this body of work.
We first define the problem and discuss previous survey papers. Within our presentation of the state-of-the-art methodologies, we highlight hybridisations and recent new trends concerning neighbourhood structures, which are motivated by raising the generality at which search methodologies can operate. Summarising tables are presented to provide an overall view of these techniques. We also present and discuss some important issues which have come to light concerning the public benchmark exam timetabling data. Different versions of problem datasets with the same name have been circulating in the scientific community for the last ten years and this has generated a significant amount of confusion. We clarify the situation and present a re-naming of the widely studied datasets to avoid future confusion. We also highlight which research papers have dealt with which dataset. Finally, we draw upon our discussion of the literature to present a (non-exhaustive) range of potential future research directions and open issues in exam timetabling research.

Introduction
Timetabling problems arise in various forms including educational timetabling (e.g. [34]), nurse scheduling (e.g. [22]), sports timetabling (e.g. [81]) and transportation timetabling (e.g. [101]). They have represented a challenging and important problem area for researchers across both Operational Research and Artificial Intelligence since the 1960s. Recent years have seen an increased level of research activity in this area. This is evidenced (among other things) by the emergence of a series of international conferences on the Practice and Theory of Automated Timetabling (PATAT) [20,21,28,46-48], and the establishment of a EURO (European Association of Operational Research Societies) working group on automated timetabling (see http://www.asap.cs.nott.ac.uk/watt). Burke, Kingston and de Werra [34] (2004) gave a definition of general timetabling which covers many cases: a timetabling problem is a problem with four parameters: T, a finite set of times; R, a finite set of resources; M, a finite set of meetings; and C, a finite set of constraints. The problem is to assign times and resources to the meetings so as to satisfy the constraints as far as possible.
Among the wide variety of timetabling problems, educational timetabling is one of the most widely studied from a practical viewpoint. It is one of the most important and time-consuming tasks which occur periodically (e.g. annually, quarterly, etc.) in all academic institutions. The quality of the timetables produced has a great impact on a broad range of stakeholders including lecturers, students and administrators (see [134,140]). Variants of educational timetabling include school timetabling (class-teacher scheduling), university course timetabling, exam timetabling, faculty timetabling and classroom assignment. It has been observed that course and exam timetabling are relatively close problems [143], although very significant differences do exist [108]. This survey will concentrate on examination timetabling.
An excellent survey of examination timetabling was published in 1986 [51] and an insightful follow up paper appeared in 1996 [53]. However, a significant number of research papers in the area have been published since 1996. This paper will concentrate upon the research that has appeared since the publication of [53]. The last decade has seen the establishment of a collection of benchmark exam timetabling problems [55] which have been used by many of the examination timetabling research papers that have appeared since 1996. Moreover, there has been some confusion in the literature caused by the existence of different benchmark problem datasets with the same names. This paper aims to eradicate such confusion by presenting a definitive re-naming of the sets and by clarifying the situation over which papers dealt with which problems.

Examination Timetabling Problems
Exam timetabling problems can be defined as the assignment of a set of exams E = {e_1, e_2, ..., e_m} to a limited number of ordered timeslots (time periods) T = {t_1, t_2, ..., t_n}, with seating capacities C = {C_1, C_2, ..., C_n} in each timeslot, subject to a set of constraints. The complexity and the challenge presented by timetabling problems arise from the fact that a large variety of constraints, some of which contradict each other, need to be satisfied in different institutions ([26,54]). In the timetabling literature, constraints are usually categorised into two types, hard constraints and soft constraints, which are explained below:

- Hard constraints cannot be violated under any circumstances (mainly due to physical restrictions). For example, conflicting exams (i.e. those which involve common resources such as students) cannot be scheduled simultaneously; that is, if D_ij is the number of students enrolled in both exams i and j, and x_i ∈ T is the timeslot to which exam i is assigned, then

  x_i ≠ x_j, ∀ i, j ∈ E, i ≠ j and D_ij > 0.

Another example is that the number of students taking exams in a timeslot cannot exceed the total seating capacity available in that timeslot; that is, if s_i is the number of students in exam i ∈ E, then

  ∑_{i ∈ E : x_i = t} s_i ≤ C_t, ∀ t ∈ T.

A timetable which satisfies all of the hard constraints is usually said to be feasible.

- Soft constraints are desirable but not absolutely critical. In practice, it is usually impossible to find feasible solutions that satisfy all of the soft constraints. Soft constraints vary (and sometimes conflict with each other) from one institution to another in terms of both their types and their importance ([26]). The most common soft constraint in the exam timetabling literature is to spread conflicting exams as evenly as possible throughout the examination session so that students have enough revision time between exams.
An example of another soft constraint, which may conflict with this, is to schedule all the large exams as early as possible to allow enough time for marking. The quality of timetables is usually measured by checking to what extent the soft constraints are violated in the solutions generated.
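To make these definitions concrete, the following sketch evaluates a candidate timetable. The data layout and function names are our own illustrative assumptions, not taken from any of the surveyed systems: it counts violations of the two hard constraints above, and computes the common proximity soft constraint using the weights popularised by Carter, Laporte and Lee [55].

```python
# Illustrative evaluation of an exam timetable (hypothetical data layout).
# exams: dict exam -> set of enrolled students
# slots: dict exam -> assigned timeslot index (0-based, ordered)
# capacity: total seats available in each timeslot

def conflicts(exams, i, j):
    """D_ij: the number of students enrolled in both exams i and j."""
    return len(exams[i] & exams[j])

def hard_violations(exams, slots, capacity):
    """Count violations of the two hard constraints."""
    v = 0
    items = sorted(exams)
    # 1. Conflicting exams must not share a timeslot.
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, j = items[a], items[b]
            if slots[i] == slots[j] and conflicts(exams, i, j) > 0:
                v += 1
    # 2. Seating demand per timeslot must not exceed capacity.
    demand = {}
    for i in items:
        demand[slots[i]] = demand.get(slots[i], 0) + len(exams[i])
    v += sum(1 for t in demand if demand[t] > capacity)
    return v

def proximity_penalty(exams, slots):
    """Penalise conflicting exams scheduled close together: weight
    2**(5 - gap) for gaps of 1..5 timeslots, per conflicting student,
    following Carter, Laporte and Lee [55]."""
    items = sorted(exams)
    cost = 0
    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            i, j = items[a], items[b]
            gap = abs(slots[i] - slots[j])
            if 1 <= gap <= 5:
                cost += conflicts(exams, i, j) * 2 ** (5 - gap)
    return cost
```

In practice the proximity cost is usually normalised by the total number of students; that step is omitted here for brevity.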
Due to the large variety of problems presented and investigated, it would be neither practical nor beneficial to present a comprehensive list of all the hard and soft constraints that occur in timetabling research. We list some of the key hard and soft constraints for exam timetabling in Table 1 and Table 2, respectively. We believe that these cover most of the constraints that have appeared in the literature. It can be observed that they can be roughly grouped as time related (No. 1 in Table 1 and Nos. 1-7 in Table 2) or resource related (No. 2 in Table 1 and Nos. 8-11 in Table 2). Most of the survey papers reviewed in Section 1.2 present lists of constraints in exam and general timetabling. The hard constraints listed in Table 1 and the first soft constraint in Table 2 are those that have been most widely covered by the research in the literature.

Table 1. Primary Hard Constraints

1. No exams with common resources (e.g. students) assigned simultaneously.
2. Resources for exams need to be sufficient (i.e. the size of each exam needs to be below the room capacity, and there need to be enough rooms for all of the exams).
We will begin this critical review of the research area by overviewing a number of surveys that have appeared in the literature since the 1960s. Many of these papers cover educational timetabling in general and thus include discussions of examination timetabling in addition to other discussions.

Table 2. Primary Soft Constraints

1. Spread conflicting exams as evenly as possible, or not in x consecutive timeslots or days.
2. Groups of exams required to take place at the same time, on the same day, or at one location.
3. Exams to be consecutive.
4. Schedule all exams, or the largest exams, as early as possible.
5. Ordering (precedence) of exams needs to be satisfied.
6. Limited number of students and/or exams in any timeslot.
7. Time requirements (e.g. exams (not) to be in certain timeslots).
8. Conflicting exams on the same day to be located nearby.
9. Exams may be split over similar locations.
10. Only exams of the same length can be combined in the same room.
11. Resource requirements (e.g. room facilities).

Previous Surveys on Educational Timetabling
An early survey by Miles [113] in 1975 provided a useful bibliography of early developments in computer aided timetabling. Another well-known early survey, by Schmidt and Strohlein [144] in 1979, included more than 200 references and covered almost all the work on timetabling before 1979. de Werra, in 1985 [68], introduced various mathematical (graph theoretical) models and briefly overviewed methods for class-teacher and course timetabling based on graph colouring and network flow methods. The author noted that exam timetabling and course scheduling were similar to each other, although there were differences between them. In 1997 [69], the same author incorporated some timetabling requirements into restricted graph colouring models and reviewed some mathematical programming formulations.
Carter in 1986 [51] presented a review of the early research on practical applications of examination timetabling in several universities. He reviewed a variety of graph heuristics and pointed out that none of the algorithms/packages had been implemented in more than one institution. There was no standard data on which comparisons could be carried out. Also, measures of a problem's difficulty did not exist. In 1996, Carter and Laporte [53] updated the above survey to summarise the algorithmic approaches from 1986 to 1996. The criterion for inclusion was that a method should either be tested on real data or implemented in a real world application. They categorised the methods into four types: cluster methods, sequential methods, generalised search (meta-heuristics) and constraint based techniques. They observed that the approaches implemented in practice were relatively simple variants of different methods and only addressed a subset of the constraints in the problems. The authors concluded by suggesting that timetabling researchers should report test results on benchmark problems to gain a better understanding of the various approaches taken in exam timetabling. As we will see later in this paper, this is what has happened since 1996.
Burke et al constructed a questionnaire on exam timetabling in 1996 [26] and sent it to 95 British universities, of which 56 replied. The issues covered included:

- the structure of the problems (i.e. size, complexity and constraints, etc.),
- how the problems were solved, and
- the objective of the timetabling problem (i.e. what constitutes a good solution).
The resultant data was analysed to provide information on the constraints involving exams, students, departments, timeslots and rooms. In addition to the 13 constraints originally listed in the questionnaire, another 19 constraints were provided by the universities, demonstrating that in reality there is a wide variety of requirements among different institutions. It was found that just 21% of the universities used some form of computational help. Where timetables were constructed manually, half of the institutions did not base their solution on the previous year's timetable, requiring a workload of many months. The paper suggested some appropriate properties of automated timetabling systems that could be utilised in practice. It also provided some insight into the pertinent issues that impacted upon real world exam timetabling at the time. In 1997, Burke et al [30] presented a brief introduction to automated university exam timetabling research, concentrating on techniques for university timetabling which were popular at the time.
Bardadym in 1996 [11] considered different issues in computer-aided management systems for timetabling. He discussed problems, requirements, data representations and mathematical models. Solution methods from the 1960s to the 1990s were also overviewed mainly covering heuristics, meta-heuristics and algorithmic tools for integration in decision support systems. Meta-heuristics and interactive timetabling were seen as the new wave of computer-aided timetabling systems. Open issues for future timetabling research were also discussed. Wren [161] in 1996 illustrated a useful and interesting link between scheduling, timetabling and rostering by studying an example of the Travelling Salesman Problem. He concluded that the similarity between timetabling and staff rostering may lead to successful cross-fertilisation on different types of problems. Indeed, recent research (as we shall see later) has provided some evidence to support this.
Schaerf in his 1999 survey [143] looked at the formulations of school, course and exam timetabling and declared that it is difficult to make a distinction between the latter two. Based on the definitions of variants of these problems, solution techniques particularly from artificial intelligence were classified and reviewed. Possible future directions were presented including specific techniques, standardization, approximability, the design of a powerful constraint language, and the combination of, and comparisons against different techniques.
Burke and Petrovic in 2002 [44], and in a follow up paper in 2004 [123] presented overviews of recent research conducted on university (course and exam) timetabling that had been carried out in their group including hybrid evolutionary algorithms, meta-heuristics, multi-criteria approaches, case-based reasoning techniques and adaptive approaches. An outline of research on sequential, clustering, constraint based techniques and meta-heuristic methods was also provided. Future directions highlighted knowledge based systems and approaches which aim to raise the generality of timetabling systems.
An article by Burke, Kingston and de Werra [34] (2004) discussed the application of graph coloring methods to timetabling. The authors considered class-teacher, course, exam and sports timetabling. This paper highlights the role that graph coloring methods have played in the timetabling literature over the last 40 years or so. Indeed, it points out that graph colouring ideas are incorporated in several modern hybrid meta-heuristic techniques.
Reviews concerning specific techniques on timetabling have also appeared which have reflected the speed at which timetabling research developments have been made. Burke and Landa Silva [35] in 2004 reviewed memetic algorithm methods that have been proposed to solve scheduling and timetabling problems. A number of issues concerning the design of memetic algorithms have been discussed. These include the domain knowledge that has been incorporated to deal with infeasibility, local search design concerning intensification mechanisms and the balance between genetic search and local search. The paper also draws attention to the design of self-adaptive memetic algorithms as an area of future timetabling and scheduling research. Multi-objective meta-heuristic techniques have also been reviewed by Landa Silva, Burke and Petrovic [102] (2004) for problems including educational timetabling. The paper covers recent multiobjective techniques including multi-phased approaches and multi-criteria evolutionary techniques. Issues that are discussed include problem formulations, problem domain knowledge and strategies used in local search.
From the above brief discussion, we can see that there are a number of excellent surveys in the literature concerning different issues that have impacted upon exam timetabling research. We also provide, in Appendix A, a list of PhD theses that have appeared over the years in which extensive reviews of specific aspects of course and exam timetabling have been carried out. However, there is no comprehensive review which deals with the large body of exam timetabling research that has appeared in the last decade. This body of work includes a number of state-of-the-art approaches and has introduced a wide variety of diverse and successful methodologies. This paper aims to build on Carter and Laporte's 1996 survey [53] to provide a modern discussion of the methods and techniques that have been developed for this important problem. With this in mind, we will not discuss in detail the work that appeared before 1996. We aim to keep our bibliography of examination timetabling papers and our classification tables (see later on) up to date on the following web page: http://www.asap.cs.nott.ac.uk/resources/data.shtml. We would be very grateful if authors could contact us as new papers appear in order to regularly update this public resource. Although we have covered all the relevant papers of which we are aware, we may have inadvertently omitted relevant papers that have already appeared. If so, we apologise and would welcome the opportunity to add them to the information available at our web site.
Although not specifically a survey paper, McCollum in 2007 [108] discussed a number of papers in both course and exam timetabling with the aim of setting a research agenda to bridge the gap between timetabling research and practice. McCollum identified the development of timetables in institutions as a multi-phase procedure, and presented different real world modelling processes for exam and course timetabling. Of particular interest are the research challenges highlighted for both exam and course timetabling.
Schaerf and Di Gaspero in 2007 [142] raised some particularly interesting and important issues in timetabling research. They discussed measurability and reproducibility in university (course and exam) timetabling. In addition to highlighting the importance of these issues when conducting timetabling research, the authors discussed practices that can contribute to improvement on both aspects. Indeed, the international timetabling competition in 2002 set a standard for course timetabling. Our observations on the benchmark exam timetabling data below, and the construction of the benchmark dataset at http://www.asap.cs.nott.ac.uk/resources/data.shtml, further underline the importance of establishing standard scientific comparisons in timetabling research.
In addition to the surveys that have appeared over the years, an online bibliography was prepared by Kingston [98] in 1995 and includes more than 1000 references on automated timetabling.
The next sub-sections review the timetabling systems, languages and tools that were developed in the last decade. Furthermore, issues on timetabling models and complexity are also discussed. Section 2 will then review the timetabling research that has appeared. We have classified it according to the different techniques which have been investigated. For each technique reviewed, a corresponding summarising table is presented in Appendix B. In Section 3, we summarise all the work on three sets of benchmark exam timetabling problems, and clarify some issues on the consistency of the benchmark dataset introduced by [55]. Finally, we conclude the paper by summarising the research trends and presenting some future directions.

Timetabling Systems
Over the years, a number of timetabling systems for both course and exam timetabling have appeared in the literature. However, many of them (especially before 1996) were specially developed for, and implemented at, particular institutions [53].
Hansen and Vidal [91] (1995) presented a nationwide exam timetabling system which was reported to have been in use since 1992 to solve the problems of centralised planning of both oral and written examinations for 248 high schools in Denmark. The complex problem with a variety of objectives was described and solved by a four-phase process dealing with different objectives using different techniques and heuristics. Some issues including the preparation of data and the scheduling of exams were discussed. Experiences during the development of the system and other practical issues including the maintenance of centralised information and communications were also discussed [90].
Lim et al [104] (2000) developed a timetabling system, which was a 3-tier client/server application, for both the course and exam timetabling problems at the National University of Singapore. The problems and the overall manual process were described. In the exam portion of the system, exams were weighted by three measures and assigned to timeslots one by one using constraint propagation via an arc consistency algorithm. The timetables were generated by the system in a much shorter time and compared favourably with the manually generated ones from the previously used procedure. Ho, Lim and Oon [94] (2001) further developed the system by using a Tabu Search algorithm which employed four move operators to improve the solutions obtained. The Push Forward Insertion heuristic, which has been employed in vehicle routing problems, was used to help spread the exams across the timetable. Real problems were used to test the approach. The authors of [72] (2001) developed a timetabling system to deal with both the course and exam timetabling problems at the Athens University of Economics and Business. Firstly, an Integer Programming method was developed based on the MPCODE and XPRESS-MP packages. This approach was employed to assign groups of courses to groups of timeslots for the course timetabling problem. Based on the course timetables, the initial exam timetables were built and adjusted repeatedly by a heuristic approach which dealt with a number of constraints. This provided good and feasible solutions with minimum effort.

Timetabling Languages and Tools
Over the years, timetabling researchers have employed some general packages (such as ECLiPSe for constraint logic programming [7]) to build timetabling systems. However, some packages and languages which are specialised in timetabling have also appeared to support representation and comparison in timetabling research. This is a goal which (if it could be achieved) would undoubtedly benefit timetabling research. However, it is fair to say that the various attempts to suggest such standards have not yet been built upon and adopted by the community.
Burke, Kingston and Pepper [33] (1998) presented general requirements (generality, completeness and practicability) for building a standard data format for general timetabling problems based on set theory and logic. Examples were given to show how common constraints could be modelled using this data format. The objective was to provide an open way of comparing results and exchanging data in timetabling research. The authors of [153] (1999) developed a language to specify constraint satisfaction problems so that constraint programming systems can be easily implemented for exam timetabling problems. The aim was to build a high level system which abstracted away the details as much as possible so that end users, without knowledge of either constraint programming or the host language, could focus on the problem specific information. A real exam timetabling problem was used as the example for building the constraint satisfaction system. Reis and Oliveira [132] (2001) proposed a language, called UniLang, which used a list of synonyms to naturally represent data, various constraints, quality measures and solutions for general university timetabling problems. Eight classes of sub-problems in timetabling were defined and many examples were presented to illustrate the proposed language. The language was converted into constraint logic programming in ECLiPSe [7] for different problems.

Di Gaspero and Schaerf [76] (2003) introduced a software tool called EASYLOCAL++ for the implementation of a family of local search algorithms (Hill Climbing, Simulated Annealing and Tabu Search) on general timetabling problems. It represented an object-oriented framework consisting of a hierarchy of abstract classes which take care of different aspects of local search. The main characteristics of the tool were reusability and generality, which were illustrated using examples from school timetabling, course timetabling and exam timetabling problems. It was employed to develop a family of Tabu Search methods in [75].
De Causmaecker et al [66] (2002) discussed how the Semantic Web can be used in timetabling. They studied, layer by layer, how this technology can be applied to interpret problem specific knowledge in timetabling using XML. An upper level timetabling ontology was presented to demonstrate the ability to support the fast development of applications on different timetabling problems, whose constraints and resources can be easily identified.
Chand [57] (2005) proposed a constraint based general model where constraints are grouped as domain, spread and CountResource, based upon which timetabling data and constraints can be transferred into a relational database. Examples of exam timetabling data by Burke, Elliman and Weare [27] and course timetabling data by Goltz and Matzke [77] were presented using the model proposed. The author declared that the format can be extended to include other constraints and can be applied to different languages. The author also provided a brief review of the relevant work on modelling timetabling data. Ranson and Ahmadi (2007) [128] briefly reviewed and analysed the limitations of the existing timetabling languages/standards. Based on the STTL language, which was developed for school timetabling [99] by Kingston, they designed a flexible language-independent timetabling model incorporating the features of object-oriented programming and the UML language.

Models and Complexity Issues
Over the last ten years, important issues concerning the models and the complexity of timetabling problems have been discussed in the literature. However, so far, there is still no universally accepted complete model, and not much further development work has been carried out on this topic. The main area of research on complexity issues has been school timetabling [143].
Cooper and Kingston [60] (1996) represented timetabling problems using a timetabling specification language called TTL. They proved that timetabling problems are NP-complete in five independent ways which actually occur in practice. Prospects for overcoming these special cases in real problems were discussed.
de Werra, Asratian and Durand [70] (2002) studied the complexity of some variants of class-teacher timetabling problems. A simple model of the problem was first given, followed by the extension where the classes were partitioned into groups. The authors showed that when there was a teacher giving lectures to three groups of classes, besides giving lectures to individual groups, the problem is NP-complete. A polynomial procedure to find a timetable (of a certain number of timeslots) based on network flows was given for the problems where there were at most two groups of classes.

Examination Timetabling Approaches/Techniques
There has been a significant amount of exam timetabling research in the last ten years. We note that many of the successful methodologies that have appeared in the literature represent hybridisations of a number of techniques. Thus the classification, where possible, is by the main technique. However, several of the methodologies could have appeared in two or more of the classifications.

Graph Based Sequential Techniques
The paper by Welsh and Powell [155] in 1967 represented a very important contribution to the timetabling literature. It built a bridge between graph colouring and timetabling which led to a significant amount of later research on graph heuristics in timetabling (e.g. [109,110]). In exam timetabling problems, the exams can be represented by vertices in a graph, and the hard constraints between exams are represented by edges between the vertices. The graph colouring problem of assigning colours to vertices, so that no adjacent vertices have the same colour, then corresponds to the problem of assigning timeslots to exams. The different soft constraints (such as those listed in Table 2) need to be considered separately and evaluated to provide a measure of solution quality.
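This correspondence is straightforward to express in code. The sketch below is illustrative only (the enrolment data layout is our own assumption): it builds the conflict graph from student enrolment lists, since each student's set of exams forms a clique of pairwise-conflicting vertices.

```python
from itertools import combinations

def conflict_graph(enrolments):
    """Build the conflict graph: vertices are exams, and an edge joins
    any two exams that share at least one student (a hard constraint).
    enrolments: dict student -> set of exams taken by that student."""
    edges = set()
    for exams in enrolments.values():
        # Each student's exams conflict pairwise, i.e. form a clique.
        for i, j in combinations(sorted(exams), 2):
            edges.add((i, j))
    return edges
```

Any proper colouring of this graph, with colours read as timeslots, yields a clash-free timetable with respect to the conflict constraint.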
The basic graph colouring based timetabling heuristics are constructive methods that order the exams by how difficult they are to schedule. The exams are then assigned, one by one, to the timeslots. A broad range of ordering strategies and their modified variants appear in the timetabling literature [51]. We list, in Table 3, some of the widely employed ordering strategies, most of which are based upon graph colouring heuristics. A random ordering method has also been employed in the literature to introduce randomness into hybrid approaches and to provide comparisons.
Graph based heuristics which underpinned simple constructive methods played a very important role in the early days of timetabling research [51]. Although originally presented as techniques (albeit simple ones) in their own right, they are still being employed and adapted within hybridised methods in the current research literature. Their great strength is that they can provide reasonably good results within a small amount of computational time and are very easy to implement. They are often used to construct initial solutions, or to build good portions of solutions, before improvement techniques are applied (see more details in the following sections).

Table 3. Widely Studied Ordering Strategies for Examination Timetabling Problems

- Saturation Degree [15]: increasingly by the number of timeslots available for the exam in the timetable at the time
- Largest Degree [16]: decreasingly by the number of conflicts the exam has with the other exams
- Largest Weighted Degree [55]: the same as Largest Degree, but weighted by the number of students involved
- Largest Enrolment [160]: decreasingly by the number of enrolments for the exam
- Random Ordering: order the exams randomly
- Color Degree [55]: decreasingly by the number of conflicts the exam has with those scheduled at the time

A recent article by Burke, Kingston and de Werra [34] overviewed graph colouring techniques for general timetabling. In a particularly influential paper, Carter, Laporte and Lee [55] in 1996 studied the first five ordering strategies in Table 3 on real and randomly generated exam timetabling problems. Largest cliques, which are the largest subgraphs in which each of the vertices is adjacent to all of the others, were used to build initial solutions, from which graph colouring heuristics and backtracking techniques were employed to construct complete solutions. The idea is that the size of the largest clique determines the least number of timeslots required for the problem. The results indicated that none of the heuristics outperformed any of the others over all of the problems tested. Another important contribution of this work is the introduction of a set of 13 exam timetabling problems, which became standard benchmarks in the field. They have been widely studied and used by different approaches over the years (see Table 6). We call this the University of Toronto data and discuss it further in Section 3.1. In 2001, Carter and Johnson [52] investigated sub-graphs which are sufficiently dense (almost cliques) on 11 of these instances.
They observed that in real exam timetabling problems it was usually the case that there were many largest cliques, and that the almost cliques mechanism can potentially extend and improve the above approach.
Burke, Newall and Weare [43] in 1998 studied the effect of introducing a random element into the employment of graph heuristics (Saturation Degree, Color Degree and Largest Degree in Table 3) by developing two variants of selection strategies: (1) tournament selection that randomly chooses one from a subset of the first exams in the ordered list; and (2) bias selection that takes the first exam from an ordered list of a subset of all of the exams. These simple techniques, when tested on three of the Toronto datasets, improved the pure graph heuristics with backtracking in terms of both the quality and diversity of the solutions.
Burke and Newall in 2004 [40] investigated an adaptive ordering strategy which dynamically ordered the exams during problem solving in an iterative process. It was observed that a fixed pre-defined heuristic (employed as a measure of difficulty) in a traditional sequential strategy (as shown in Table 3) does not always perform well over the full range of problems. A heuristic modifier was designed to update the ordering of the exams according to the experience obtained, in previous iterations, with regard to the difficulty of assigning them. Extensive experiments were carried out on 11 of the Toronto datasets, and on another benchmark (which we call the Nottingham data, see Section 3.2). This approach was shown to be simple and effective (comparable with, and occasionally better than, state-of-the-art approaches [55,75]). The method is not dependent on the initial ordering of exams.
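The core of such an adaptive strategy can be sketched in a few lines: exams that failed to be assigned have a heuristic modifier added to their difficulty score, pushing them earlier in the next ordering. This is a hypothetical sketch; the simple additive `bump` update is illustrative and is not the modifier of [40]:

```python
def reorder_adaptively(exams, difficulty, modifier, failed, bump=1.0):
    """One iteration of an adaptive ordering sketch: exams that could
    not be assigned last time get their heuristic modifier increased,
    so they appear earlier (hardest-first) in the next ordering."""
    for exam in failed:
        modifier[exam] = modifier.get(exam, 0.0) + bump
    return sorted(exams,
                  key=lambda e: difficulty[e] + modifier.get(e, 0.0),
                  reverse=True)
```

Because the modifier persists across iterations, an exam that repeatedly proves difficult keeps climbing the ordering.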
Fuzzy logic was employed by Asmuni et al [8] in 2005 to order the exams to be scheduled based on graph coloring heuristics on the Toronto datasets. The idea is that when ordering the exams by how difficult they are, fuzzy functions can be used to give an appropriate evaluation. It was seen that different fuzzy functions need to be used on different problems to obtain the best results. In [9], a fuzzy system was developed to build a new evaluation function based on a series of rules to evaluate the quality of timetables where multiple criteria were involved. The approach was further improved by tuning the fuzzy rules and better results were obtained.
Corr et al [62] developed a neural network from which a measure of the difficulty of assigning exams during the timetable construction can be obtained by recursively inputting the updated solution construction states. The objective is to adaptively assign the most difficult exams at an early stage of solution construction. The neural network was trained by storing the states of timetable construction (feature vectors) using three graph heuristics. The work has demonstrated the feasibility of employing neural network based methods as an adaptive and generally applicable technique on timetabling problems.
Due to the limitations of constructive methods, where early assignments may lead to situations where no feasible timeslots are available for exams later on in the construction process, backtracking was studied (e.g. [55]). This process unassigns the early conflicting exams to allocate the current ones. A look ahead technique was also studied in [38] to address this issue (for more details see the discussion on memetic algorithms in Section 4.2).
As mentioned earlier, techniques which hybridise graph heuristics with other methods are still appearing in the most modern exam timetabling research literature. The employment of graph heuristics within hyper-heuristics is discussed in Section 2.6.

Constraint Based Techniques
Constraint logic programming [85,92] and constraint satisfaction techniques [14] have their origins in Artificial Intelligence research. Such methods have attracted the attention of researchers in timetabling due to the ease and flexibility with which they can be employed for timetabling problems. Exams are modelled as variables with finite domains. Values within the domains (representing the timeslots and rooms) are assigned to the variables sequentially to construct solutions to the problems. Early research focused on finding feasible solutions (i.e. satisfying all hard constraints). Brailsford, Potts and Smith [14] in 1999 introduced various search methods for constraint satisfaction problems and demonstrated that this technique could be applied to optimisation problems.
Constraint based techniques are usually computationally expensive due to the fact that the number of possible assignments increases exponentially with the number of variables. On their own, they cannot usually provide high quality solutions on complex optimisation problems compared with the state-of-the-art approaches [14]. Different heuristics and techniques are usually integrated to reduce the time complexity of solving practical problems (see [53,156]). For example, the labelling strategy, in which heuristics are usually employed, indicates the order in which the variables are to be instantiated.
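A minimal sketch of such a labelling strategy, combining a smallest-domain-first variable ordering with forward checking, is given below. The representation (exams as variables, timeslot sets as domains) follows the description above, but all names are illustrative:

```python
def label(exams, domains, conflicts, assignment=None):
    """Backtracking labelling sketch: instantiate the exam with the
    smallest remaining domain first; forward checking prunes the
    chosen slot from the domains of conflicting, unassigned exams."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(exams):
        return assignment
    exam = min((e for e in exams if e not in assignment),
               key=lambda e: len(domains[e]))
    for slot in sorted(domains[exam]):
        pruned, consistent = [], True
        for n in conflicts[exam]:
            if n not in assignment and slot in domains[n]:
                domains[n].discard(slot)
                pruned.append(n)
                if not domains[n]:
                    consistent = False  # forward check: a domain was wiped out
        if consistent:
            assignment[exam] = slot
            if label(exams, domains, conflicts, assignment):
                return assignment
            del assignment[exam]
        for n in pruned:  # undo the pruning before trying the next slot
            domains[n].add(slot)
    return None
```

This finds feasible solutions only; the soft-constraint optimisation discussed in the surveyed papers is layered on top of such a model.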
David [65] (1998) applied constraint satisfaction techniques to model an exam timetabling problem in a French school, the Ecole des Mines de Nantes. Time complexity was crucial, thus partial solutions were first obtained. Based on these partial solutions, particular local repairing strategies were employed successively to obtain complete solutions and make improvements. The approach was run several times with different initial assignments to reduce the chance of missing good solutions. It was employed successfully in the school and can usually generate solutions within one second. The authors of [131] (1999) developed an examination timetabling system based on ECLiPSe [7], which is a Prolog based platform for developing various extensions in constraint logic programming. A set of hard and soft constraints in the problem was built into a constraint satisfaction model, where set variables were employed and handled by the libraries in ECLiPSe. Its application to random data and a large real exam timetabling problem at the University of Fernando Pessoa in Porto demonstrated the efficiency of the model. Merlot et al [111] (2003) employed constraint programming in a similar way to that of [13], using OPL [93], an optimisation programming language, to produce initial solutions. Then a Simulated Annealing and a hill climbing method (see Section 2.3 below) were used to improve the solutions. Variables (exams) were ordered by the sizes of their domains (available timeslots) and scheduled into the earliest timeslots one by one. The pure constraint programming approach obtained the best result for one of the Toronto datasets. The overall hybrid approach was tested on problems at the University of Melbourne, two variants of the Toronto instances and the Nottingham data (see Section 3). This approach obtained the best results reported in the literature on several instances of the Toronto and Nottingham datasets at the time.

Duong and Lam [67] (2004) also employed constraint programming to generate initial solutions for a Simulated Annealing methodology for the exam timetabling problems at HoChiMinh City University of Technology. Backtracking and forward checking were employed to reduce the searching effort. The labelling strategy dynamically ordered the variables (exams) by a number of factors such as the size of the domain and the number of students.
In summary, recent research on constraint based techniques has focused on hybridisations with other techniques. The labelling strategy is usually integrated with different problem specific heuristics for variable ordering and is crucial to the success of the method. The development of some powerful constraint programming systems/languages (e.g. ECLiPSe [7], CHIP [146], OPL [93]) has significantly supported the construction of complete exam timetabling systems in real world applications. However, only particular problems at different institutions have been tackled with this approach in the literature. Also, the comparisons (with one exception, see below) have been against manually produced solutions. No comparisons have been made between constraint based techniques and other state-of-the-art approaches on the same problems, except for the paper by Merlot et al [111], which presents a method that is evaluated on the Toronto and Nottingham data. It is worth noting, though, that this method can produce the best results in the literature on some of these benchmark problems.

Local Search Based Techniques
Local search based techniques [125] (e.g. Tabu Search, Simulated Annealing and their variants) and Evolutionary Algorithms (see Section 2.4) are often classified as meta-heuristics [31,88]. Local search methods are a family of general techniques which solve problems by iteratively moving from an incumbent solution to a solution in its neighbourhood. Different neighbourhood structures and move operators within the search space distinguish the different local search techniques. The search is guided by a defined objective function, which is used to evaluate the quality of the generated timetables.
These techniques represent a large part of the work that has appeared in the last decade [53]. They have been applied to a variety of timetabling problems, mainly because different constraints can be handled relatively easily. The performance and efficiency of these techniques are highly dependent upon the parameters and search space properties (e.g. connectivity, ruggedness), thus a lot of domain knowledge is usually built in to deal with specific problems. A large number of variants and combinations have been investigated. We will first deal with Tabu Search.

Tabu Search
Tabu Search [86,87] explores the search space while forbidding the re-visiting of recent moves, which are kept in a tabu list. Tabu moves may, however, be selected if they generate the best solution obtained so far, by using an aspiration strategy. Otherwise, the search moves to other solutions, even if they are worse than the incumbent solution, with the aim of escaping from local optima. Parameters need to be fine-tuned in designing the approach and this is very much dependent on the problem in hand. Such parameters include the tabu list length and the stopping criteria, among others.
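The mechanics just described (tabu list, aspiration criterion, best-of-neighbourhood moves) can be sketched generically; the tenure and iteration limit below are illustrative parameter values, not settings from any cited paper:

```python
def tabu_search(init, neighbours, cost, iters=100, tenure=7):
    """Minimal Tabu Search sketch (minimisation). `neighbours(sol)`
    yields (move, solution) pairs; a move stays tabu for `tenure`
    iterations unless it satisfies the aspiration criterion of
    beating the best solution found so far."""
    current = best = init
    tabu = {}  # move -> first iteration at which it is allowed again
    for it in range(iters):
        candidates = [(m, s) for m, s in neighbours(current)
                      if tabu.get(m, 0) <= it or cost(s) < cost(best)]
        if not candidates:
            break
        move, current = min(candidates, key=lambda ms: cost(ms[1]))
        tabu[move] = it + tenure  # make this move tabu for a while
        if cost(current) < cost(best):
            best = current
    return best
```

In an exam timetabling setting, a move would typically be the reassignment of an exam to a different timeslot, and the cost function a weighted sum of soft constraint violations.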
Di Gaspero and Schaerf [75] (2001) carried out a valuable investigation of a family of Tabu Search based techniques whose neighbourhoods focused on the exams contributing to the violations of hard or soft constraints. Exhaustive and biased selection strategies were also studied. The length of the tabu list is dynamic and the cost function is adaptively set during the search. The authors experimentally demonstrated that the adaptive cost function and the effective selection of violation-related neighbourhoods were key features of the approach. In 2002, Di Gaspero [74] improved the approach by employing multiple neighbourhoods based on a token-ring search which circularly employs recolor (change single exam) and shake (swap groups of exams) moves, followed by kickers (change sequences of single exams) to further improve the solutions obtained. The technique extended the idea of diversifying the search away from local optima.
White and Xie [157] (2001) developed a four-stage Tabu Search called OT-TABU, where solutions were gradually improved by considering more constraints at each stage, for the exam timetabling problem at the University of Ottawa. In addition to the recency short term memory, a frequency long term memory was also used to record the frequency of the most active moves in the search history. The size of the long term memory was set by analysing the number of less important exams in the problem. In [158] (2004) this approach was extended where both of the tabu lists could be dynamically relaxed (emptied) after a certain length of search time with no improvement. This approach compared favourably to those in [55] and [75] on the Toronto data. The authors experimentally showed that employing long term memory can significantly improve Tabu Search on real-world problems.
Paquete and Stutzle (2002) [121] developed a Tabu Search methodology for exam timetabling where ordered priorities were given to the constraints. The constraints were considered in two ways: (1) one constraint at a time, starting from the highest priority, where ties were broken by considering the lower priority constraints; and (2) all the constraints at a time, starting from the highest priority. The second strategy obtained better results, while the first strategy was more consistent. The length of the tabu list was adaptively set by considering the number of violations in the solutions. It was observed that the length of the tabu list needed to be increased with the size of the problems.

Simulated Annealing
Simulated Annealing [1] is motivated by the natural annealing process [2]. The idea is to search a wider area of the search space at the beginning of the process by accepting worse moves with a higher probability, which is gradually decreased as the search continues. A temperature is used within a cooling schedule to control the probability of the acceptance of worse moves in the search. Many parameters need to be tuned in Simulated Annealing including the initial and final temperatures, and the cooling factor in the cooling schedule. These parameters affect the performance and success of this approach.
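A minimal sketch of the acceptance mechanism with a geometric cooling schedule is shown below; the temperatures and cooling factor are illustrative values of exactly the kind of parameters the text notes must be tuned:

```python
import math
import random

def simulated_annealing(init, neighbour, cost, t0=10.0, t_end=0.01,
                        alpha=0.95, rng=random):
    """Simulated Annealing sketch (minimisation) with a geometric
    cooling schedule: worse moves are accepted with probability
    exp(-delta / T), which shrinks as the temperature T falls."""
    current = best = init
    t = t0
    while t > t_end:
        cand = neighbour(current, rng)
        delta = cost(cand) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = cand
            if cost(current) < cost(best):
                best = current
        t *= alpha  # geometric cooling
    return best
```

For timetabling, `neighbour` would perform a move such as reassigning an exam or, as in the work discussed below, a Kempe chain swap.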
Thompson and Dowsland [152] (1998) carried out valuable work to develop a two-stage approach where feasible solutions from the first stage were fed into a Simulated Annealing process in the second stage to improve soft constraint satisfaction. As different objectives were dealt with in different stages in turn, the solutions from the early stage may be poor, and thus a backtracking technique was proposed. Dowsland [78] also observed that the way in which the neighbourhood was defined, the importance of objectives and the difficulty of objectives significantly affected the process. Based on their work in [151], the authors further investigated the Kempe chain neighbourhood, where chains of exams, rather than individual exams, were moved. This gave more flexibility to enable the movement of large, difficult exams within the timetable. They concluded that the most important factors in Simulated Annealing were the cooling schedule and the way neighbourhoods were defined and sampled. The authors reported that the developed exam timetabling system had been used successfully at Swansea University since 1993.
Bullnheimer [17] (1998) discussed how a model for Quadratic Assignment Problems was adapted to formulate a small scale practical exam timetabling problem at the University of Magdeburg. The models enabled the university administrators to control how much the conflicting exams need to be spaced out. Simulated Annealing was employed where two sets of neighbourhood structures (moving the timeslots of exams and moving single exams) were studied. However, the details of the parameters in the algorithm were not given.
Merlot et al (2003) [111] employed a Simulated Annealing approach initialised by constraint programming techniques (see Section 2.2 on "Constraint Based Techniques") and followed by hill climbing to further improve the solution. A modified Kempe chain neighbourhood was employed. The best results at the time for several of the benchmark instances were achieved by this hybrid approach. Indeed, the method still has some of the best known results. The authors suggested that methods combining solution construction with local search will dominate the future of exam timetabling research.
Duong and Lam [67] (2004) employed Simulated Annealing on the initial solutions generated by constraint programming for the exam timetabling problem at HoChiMinh City University of Technology. A Kempe chain neighbourhood was employed in the Simulated Annealing, whose cooling schedule was set experimentally. The authors noted that when limited time is given, it is crucial to tune the components of Simulated Annealing to the specific problems to be solved.
Burke et al [19] (2004) studied a variant of Simulated Annealing called the Great Deluge algorithm [80]. The search accepts worse moves as long as the decrease in quality is below a certain level, which is originally set as the quality of the initial solution and is gradually lowered by a decay factor. The decay factor and an estimate of the desired quality represent the parameters of this approach. The authors noted that such parameters can be pre-defined by users, who are usually not experts on meta-heuristics. The initial solutions, however, need to be feasible to calculate the decay factor, so a Saturation Degree heuristic was run a number of times, from which the best solutions were employed as the starting points. This Great Deluge approach was superior to a Simulated Annealing method developed by the authors. It was shown to be effective and generated some of the best results on the Toronto and Nottingham datasets when compared with other approaches ([55,75]). Comprehensive experiments were also carried out to analyse the trade-off between time and solution quality on problems of different sizes. The approach was further studied in [39], where it was initialised by the adaptive ordering method in [40].

Other Local Search Based Techniques
Recently, along with the study of different ways of escaping from local optima in local search based techniques, some researchers have turned to investigating the effect of designing different neighbourhoods, and have obtained some success on timetabling problems. This demonstrated that not only the method of search, but also the structure of the neighbourhood, has a significant impact on the search algorithms. For example, Kempe chain neighbourhood structures, as mentioned above, were investigated by a number of researchers in exam timetabling (see [56,64,111]). The idea is that chains of conflicting exams are swapped between timeslots. The reasons why this neighbourhood structure worked well were analysed in [152]. Other approaches concerned multiple neighbourhood structures [74]. Compared with standard moves on single exams, this brought more flexibility to the navigation of search spaces for different problems.
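A sketch of a Kempe chain move on the conflict-graph representation is given below: moving one exam between two timeslots drags the whole connected chain of conflicting exams with it, so feasibility is preserved. The representation and names are illustrative:

```python
from collections import deque

def kempe_chain_move(timetable, conflicts, exam, target_slot):
    """Kempe chain move sketch: moving `exam` to `target_slot` drags
    along the connected chain of conflicting exams lying in the two
    timeslots, so no new conflicts are created."""
    source_slot = timetable[exam]
    chain, frontier = {exam}, deque([exam])
    while frontier:  # breadth-first traversal of the chain
        e = frontier.popleft()
        for n in conflicts[e]:
            if n not in chain and timetable[n] in (source_slot, target_slot):
                chain.add(n)
                frontier.append(n)
    new_timetable = dict(timetable)
    for e in chain:  # swap every chained exam between the two slots
        new_timetable[e] = (target_slot if timetable[e] == source_slot
                            else source_slot)
    return new_timetable
```

Because the whole chain is swapped at once, large exams that a single-exam move could never relocate can be moved without violating hard constraints.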
Abdullah et al [3] in 2007 developed a large neighbourhood search based on the methodology of improvement graph construction originally developed by Ahuja and Orlin [6] for different optimisation problems. Instead of just considering traditional pair-wise exchange based operators, a tree-based neighbourhood structure was designed to carry out cyclic exchanges among all of the timeslots. The approach provided the best results on a number of the Toronto dataset problems at the time of publication. However, a large amount of computational time was needed. It was further developed in [4], where the improvement moves were kept in a tabu list, for capacitated exam timetabling problems (Toronto c in Section 3.2).
Another technique concerning different neighbourhoods is Variable Neighbourhood Search (e.g. [89,114]). This approach systematically varies a number of neighbourhood structures. The aim is to escape from local optima by switching from the search space defined by one neighbourhood to another. However, not much work has been done in exam timetabling using this approach. Burke et al [25] (2006) investigated variants of Variable Neighbourhood Search and obtained the best results in the literature on some of the problems in the Toronto datasets. The results were further improved by using a standard Genetic Algorithm to intelligently select a subset of neighbourhoods. The latter approach has a strong link to the work in hyper-heuristics and indicated promising directions for developing general approaches that operate on neighbourhoods rather than directly on solutions. In hyper-heuristics, Variable Neighbourhood Search has also been employed, where graph heuristics rather than neighbourhoods were searched [126]. We present more details in Section 2.6 on "Hyper-heuristics".
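The systematic switching between neighbourhood structures can be sketched in its simplest descent form (Variable Neighbourhood Descent); here each neighbourhood is modelled, for illustration, as a function returning the best neighbouring solution:

```python
def variable_neighbourhood_descent(init, neighbourhoods, cost):
    """Variable Neighbourhood Descent sketch (minimisation): on
    improvement, restart from the first neighbourhood; otherwise
    switch to the next one, escaping optima that are local to a
    single neighbourhood structure."""
    current, k = init, 0
    while k < len(neighbourhoods):
        candidate = neighbourhoods[k](current)
        if cost(candidate) < cost(current):
            current, k = candidate, 0  # improvement: back to first neighbourhood
        else:
            k += 1  # no improvement: try the next neighbourhood structure
    return current
```

The returned solution is a local optimum with respect to every neighbourhood in the list, which is the key property motivating the approach.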
In addition to designing different neighbourhood structures within local search based techniques, some researchers have also looked into how iterative techniques can help in solving complex problems. In Iterated Local Search [106], the search restarts after a certain criterion is met. The motivation is to explore more areas of the search space within a short time. It was first applied to the graph coloring problem [122] in 2002.
Caramia, Dell'Olmo and Italiano [49] (2001) developed a fine-tuned local search method where a greedy scheduler assigned exams into the least possible number of timeslots and a penalty decreaser improved the timetable without increasing the number of timeslots. When no improvement could be made, the number of timeslots was increased gradually by a penalty trader. The process was repeated employing a permutation technique to reassign the priorities of exams. This approach still holds the best results reported in the literature on several instances of the Toronto datasets.
Casey and Thompson [56] (2003) investigated a Greedy Randomised Adaptive Search Procedure (GRASP) [133], a relatively new technique in timetabling. In GRASP, a local search algorithm is restarted iteratively after local optima are reached, based on initial solutions generated by a greedy approach. In [56], the initial solution in each iteration was generated by a modified Saturation Degree, where one exam from the first n (experimentally set between 2 and 6) exams in the ordering was assigned to the timetable. Backtracking was employed in conjunction with a tabu list to prevent indefinite cycling. A limited form of Simulated Annealing, with a high starting temperature and fast cooling, was used in the improvement phase. Kempe chain moves were employed on exams that particularly contributed to the cost. The approach, applied to the Toronto datasets, reported competitive results on some of the instances at the time.
In summary, during the last decade, local search based techniques have been very heavily studied and have obtained a marked level of success on timetabling. All of the work discussed above was either tested on benchmark data or implemented in real applications. Different ways of accepting the moves (i.e. moving strategies, acceptance strategies and selection strategies) were studied to enable escape from local optima. However, one significant drawback of these approaches is the effort required to tune the parameters for specific problems to get high quality solutions.

Evolutionary Algorithms
Genetic algorithms have been the most studied Evolutionary Algorithms in terms of exam timetabling research. In particular, hybridisations of genetic algorithms with local search methods (sometimes called memetic algorithms) have led to some success in the field.
Genetic Algorithms represent an analogy with the evolutionary process in nature, manipulating and evolving populations of solutions within the search space (see [88,129,141]). Solutions are coded as chromosomes and are evolved by a reproduction process using crossover and mutation operators, with the aim of obtaining better and better solutions through a series of generations. Parameters and operators in Genetic Algorithms need to be defined and set properly, making the approach (usually) more complicated than local search based methods. The search strategy in Genetic Algorithms is fundamentally different from the local search based approaches discussed above, in the sense that several solutions (a population of solutions) are dealt with at once, rather than just one solution being improved through a series of iterations.
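A bare-bones Genetic Algorithm skeleton of the kind discussed in this section is sketched below; tournament selection and elitism are used for illustration, the crossover and mutation operators are supplied by the caller, and all parameter values are illustrative:

```python
import random

def genetic_algorithm(pop, fitness, crossover, mutate, gens=50,
                      elite=2, rng=random):
    """Bare-bones Genetic Algorithm sketch (minimisation) with
    tournament parent selection and elitism; crossover and mutation
    operators are supplied by the caller."""
    for _ in range(gens):
        pop = sorted(pop, key=fitness)
        next_pop = pop[:elite]  # elitism: the best survive unchanged
        while len(next_pop) < len(pop):
            p1 = min(rng.sample(pop, 3), key=fitness)  # tournament of 3
            p2 = min(rng.sample(pop, 3), key=fitness)
            next_pop.append(mutate(crossover(p1, p2, rng), rng))
        pop = next_pop
    return min(pop, key=fitness)
```

As the work surveyed below shows, the choice of chromosome representation (direct timetables versus encoded construction strategies) is at least as important as this skeleton itself.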
Corne, Ross and Fang [61] in 1994 provided a brief survey on using Genetic Algorithms in general educational timetabling and addressed some issues and future prospects. One contribution of the work showed that the use of direct representation in Genetic Algorithms was incapable of dealing with certain problem structures in some specially generated graph coloring problems. In 2003 Ross, Hart and Corne [138] updated the above work on evolutionary representations and algorithms used on various kinds of timetabling problems.
Ross, Corne and Terashima-Marin [136] (1996) showed that transition regions exist in solvable timetabling problems by experimenting upon specially generated graph coloring problems of different connectivity and homogeneity. The authors indicated that the study can assist the understanding of how different algorithms perform on complex timetabling problems. In 1998 Ross, Hart and Corne [137] provided further evidence for the weakness of the use of direct coding in Genetic Algorithms. They observed the failure of a number of (evolutionary and non-evolutionary) approaches to solve special classes of graph coloring problems and suggested that Genetic Algorithms should search for algorithms rather than actual solutions. Indeed hyper-heuristics (where a set of low level heuristics is searched by a high level algorithm -see more details in Section 2.6) do exactly this.
Terashima-Marin, Ross and Valenzuela-Rendon [148] in 1999 designed a clique-based crossover operator on timetabling problems that were transformed into graph coloring problems. Different recombination strategies were tested in the reproduction process to preserve the cliques of the parents in their offspring. They pointed out the same problem with direct representation in Genetic Algorithms as discussed above in [137], and suggested alternatives for future work. In [149], they also studied the penalty function on both random and real timetabling problems employing Hardness Theory, which predicts where the hardest instances are within timetabling problems. However, they observed that adding this measure is not helpful in guiding Genetic Algorithms toward promising areas of the search space. Based on the above work, they investigated non-direct coding in Genetic Algorithms [150], where solution construction strategies and heuristics, rather than the actual solutions, were coded (e.g. configurations of constraint satisfaction methods, orderings of nodes being assigned and heuristics dealing with constraints). Promising results obtained by this approach on the Toronto datasets indicated the potential of non-direct representations in Genetic Algorithms.
Erben [83] (2001) developed a grouping Genetic Algorithm where appropriate encoding, specially designed crossover and mutation operators, and fitness functions were studied. Genes were grouped for each color in graph coloring problems (which model the exam timetabling problems with only hard constraints).
Although the results were not competitive with the best reported, the approach requires less computational time than some of the methods in the literature. It also studied, from a different perspective, the important issue of representations in designing Genetic Algorithms.
Sheibani [145], in 2002, built a special mathematical model and developed a standard Genetic Algorithm for solving exam timetabling problems in training centers with the objective of maximising the intervals between the exams. An activity-on-arrow network was employed to estimate the closeness between exams, which was used in the fitness function in the Genetic Algorithm.
Wong, Cote and Gely [159] (2002) discussed some issues concerning their implementation of a Genetic Algorithm for solving an exam timetabling problem at the Ecole de Technologie Superieure, which was modelled as a Constraint Satisfaction problem. Tournament selection was used to select parents, and repairing strategies were incorporated into mutation to produce better candidates. In 2005, Cote, Wong and Sabourin [64] investigated a bi-objective evolutionary algorithm with the objectives of minimising timetable length and spacing out conflicting exams. Two local search operators (a classic Tabu Search and a simplified Variable Neighbourhood Descent), instead of recombination operators, were employed to deal with hard and soft constraints. The approach obtained competitive results on a number of benchmark problems against some of the methods in the literature (e.g. [55,41,111]). The paper also provided a review of all the state-of-the-art approaches on the Toronto datasets at the time.
Ulker, Ozcan and Korkmaz [154] (2007) developed a Genetic Algorithm where Linear Linkage Encoding was used as the representation method. Different crossover operators were investigated in conjunction with this representation on benchmark graph coloring and exam timetabling problems with hard constraints (Toronto variant a). Promising results indicated that this encoding with appropriate genetic operators was a viable search methodology.

Memetic Algorithms
Memetic algorithms [116] are an extension of Genetic Algorithms whose basic idea is that individuals in a population can be improved during their lifetime (i.e. within a generation). This is often implemented by employing local search methods (in the form of hill climbing or repairing strategies, etc) on individual members of a population between generations. Burke and Landa Silva [35] discussed a number of issues concerning the design of memetic algorithms for scheduling and timetabling problems. Recent research ideas and future directions on this topic were presented.
The ability to explore the search space by employing a population based method, and to exploit regions of it by using local search, enables such methods to deal effectively with large complex problems. However, there is usually a price to pay in terms of the computational time required. Also, the right balance between exploration and exploitation needs to be established [35]. There exist a number of in-depth studies on memetic algorithms concerning the structures of the search space and different ways of hybridising over a range of combinatorial optimisation problems (e.g. see [119,120]).
Burke, Newall and Weare [41] (1996) developed a Memetic Algorithm where light and heavy mutation operators were employed to reassign single exams and sets of exams, respectively. Neither of these mutations on their own provided substantial improvement in solution quality. Hill climbing was used to improve the individuals and thus the quality of the timetables, although a larger amount of computational time was required. Another contribution of this paper was the introduction of a new set of benchmark exam timetabling problems (named the Nottingham data and described in Section 3.2). These have been widely used by a number of researchers in later work (see Tables 8 and 13). The same authors also investigated the effects of diversity in initial populations in memetic algorithms [42] (1998). To generate a high level of diversity in the initial population, randomness was introduced by using different selection strategies on graph heuristics (see [43]). Three diversity measures were also developed to study the trade-off between quality and diversity. It was shown that the study of diversity in initialisation offered great potential benefits for memetic algorithms. Burke and Newall [38] (1999) presented a heuristic methodology for decomposing an exam timetabling problem into a series of sub-problems. They used the memetic algorithm of [41] to address the sub-problems. This approach is discussed in more detail in the section on decomposition (Section 2.7).

Ant Algorithms
Ant Algorithms [71,112] belong to the family of population based techniques. They simulate the way ants search for the shortest route to food by laying pheromone along the way. The shortest trails accumulate stronger levels of pheromone over a period of time. In the algorithm, each ant is used to construct a solution, and information gained during the search is maintained as pheromone, which is used to help generate solutions in the next stage. In exam timetabling, Ant Algorithms represent relatively recently explored techniques and have not been particularly widely studied. Relevant work does, however, exist on graph coloring problems [63], where the frequency of the colors assigned to the vertices during solution construction was employed as the pheromone.
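The construct-then-reinforce cycle described above can be sketched as follows. The instance, parameters and update rule are toy assumptions of ours; real Ant Colony Systems add heuristic visibility information, local search and more sophisticated trail updates:

```python
import random

# Pheromone tau[exam][slot] rewards exam-to-slot assignments that appeared
# in good solutions; ants sample assignments in proportion to pheromone.
CONFLICTS = {(0, 1), (1, 2), (0, 2)}       # hypothetical conflict pairs
N_EXAMS, N_SLOTS, N_ANTS, N_ITER = 3, 3, 5, 30

def cost(tt):
    return sum(1 for i, j in CONFLICTS if tt[i] == tt[j])

def construct(tau, rng):
    """One ant builds a timetable, choosing slots in proportion to pheromone."""
    return {e: rng.choices(range(N_SLOTS), weights=tau[e])[0]
            for e in range(N_EXAMS)}

rng = random.Random(1)
tau = [[1.0] * N_SLOTS for _ in range(N_EXAMS)]
best = None
for _ in range(N_ITER):
    ants = [construct(tau, rng) for _ in range(N_ANTS)]
    ants.sort(key=cost)
    if best is None or cost(ants[0]) < cost(best):
        best = ants[0]
    # evaporate, then reinforce the iteration-best solution
    tau = [[0.9 * w for w in row] for row in tau]
    for e, s in ants[0].items():
        tau[e][s] += 1.0 / (1 + cost(ants[0]))
print(cost(best))
```

Evaporation (the 0.9 factor) prevents early assignments from dominating forever, while the reinforcement term biases later ants towards assignments seen in good timetables.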
Naji Azimi [117] in 2004 implemented an Ant Colony System and compared it with Simulated Annealing, Tabu Search and a Genetic Algorithm under a unified framework for solving systematically designed exam timetabling problems. Initial solutions for the Ant Colony System were generated heuristically and improved by local search afterwards. The results, analysed over the running time, indicated that the Ant Colony approach performed the best (although not on all of the problems) and that Tabu Search showed the highest level of improvement upon the randomly generated initial solutions. Three variants of hybridisation of Tabu Search and the Ant Colony method were then studied in [118]. It was observed that the hybrid approaches worked better than either single algorithm, and that the sequential Ant Colony System followed by Tabu Search obtained the best results. However, only randomly generated data was used to test these algorithms.
Dowsland and Thompson [79] (2005) developed Ant Algorithms based on the graph coloring model studied in [63] for solving the Toronto a version of the exam timetabling problem without soft constraints (i.e. to find the lowest number of timeslots). Extensive experiments were carried out to measure the performance of the algorithm with different configurations. These include the initialisation methods (i.e. recursive Largest Degree and Saturation Degree), trail calculations, three variants of fitness functions and different parameter settings. The results obtained were competitive with others on the same dataset. It was also observed that the initialisation methods had a significant influence on the solution quality. Extensions of the algorithm to incorporate other constraints (i.e. time windows, seating capacities and second-order conflicts) were also discussed.
Eley [82] (2007) compared two modified ant algorithms based on the Max-Min ant system for course timetabling [147], and the ANTCOL algorithm for graph coloring problems [63]. It was observed that the simple ant system ANTCOL outperformed the Max-Min ant system when both algorithms were hybridised with a hill climber. The author also concluded that adjusting parameters can considerably improve the performance of ant systems.

Artificial Immune Algorithms
Malim, Khader and Mustafa [107] (2006) studied three variants of Artificial Immune Systems (a Clonal Selection Algorithm, an Immune Network Algorithm and a Negative Selection Algorithm) and showed that the algorithms can be tailored for both course and exam timetabling problems. However, the results presented in this paper were found (after publication) to stem from an error in the code and, as such, are invalid.
In summary, evolutionary methods (particularly evolutionary hybrids) have been very effective in providing high quality solutions to exam timetabling problems. Recent research has discussed encoding issues for problem structures that direct encodings are not capable of capturing. This has opened up a new research direction in Evolutionary Algorithms and has led to some of the initial work on Hyper-heuristics (see Section 2.6). Multi-criteria techniques also form an important research direction in the area of Evolutionary Algorithms for exam timetabling problems. More details are discussed in the next section.
Ant Algorithms (see [79,117]) and Artificial Immune Algorithms [107] have been applied to exam timetabling with some initial observations. As relatively new techniques, they show some potential and should attract more attention in the exam timetabling domain.

Multi-Criteria Techniques
In the majority of algorithms/approaches on timetabling, weighted costs of violations of different constraints are summed and used to indicate the quality of the solutions. However, in real world circumstances, the constraints are often considered from different points of view by the different parties involved in the timetabling process [53]. A simple sum of costs over different constraints cannot always accommodate such situations. Multi-criteria techniques have been studied recently in timetabling with the aim of handling different constraints more flexibly by considering a vector of constraints instead of a single weighted sum. In multi-criteria techniques, each criterion can be considered to correspond to a constraint, which has a certain level of importance and is dealt with individually. In some approaches, multiple stages have been employed to deal with different objectives. Landa, Burke and Petrovic [102] provide a review of a large number of scheduling and timetabling applications which employ multi-criteria techniques.
Colijn and Layfield (1995) [58] applied a multi-stage approach to the exam timetabling problem at the University of Calgary. In the first stage, individual exams and whole sets of exams in timeslots were moved to reduce the number of students sitting two exams in a row. In the second stage, students taking three or four exams in a row were considered using a similar process. The authors also considered cases where timetables have to be modified in unforeseen circumstances [59] in the second stage of the approach, which was a highly interactive process within a visual interface where exams could be moved, added or removed from the timetables.
Burke, Bykov and Petrovic [18] (2001) developed a two-stage multi-criteria approach dealing with nine criteria in exam timetabling problems (e.g. room capacity, proximity of exams, time and order of exams, etc). In the first stage, Saturation Degree was used to generate a set of feasible solutions, where each criterion was dealt with individually. The second stage then heuristically improved these solutions with respect to all criteria simultaneously. A multi-criteria method called Compromise Programming [163] was used, where the quality of a solution was evaluated by its distance from an ideal point representing an optimal solution with respect to all criteria. This technique was further studied in [124] by Petrovic and Bykov based on the Great Deluge algorithm [19]. A reference point provided by users was used to draw a trajectory in the criteria space. The criteria weights could be dynamically changed to guide the search, starting from random points, towards the reference point and, ultimately, the ideal point in the criteria space. However, the initial weights needed to be set were dependent on the problems, and the search was not guaranteed to converge. Published results from [41] were used as the reference points of the approach and the final results were better on some of the benchmark problems tested. These approaches provided the flexibility for timetablers to obtain desired solutions by managing the weights of different constraints.
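The core of Compromise Programming, evaluating a solution by its distance in criteria space to an ideal point, can be illustrated as follows. The criteria values, weights and number of criteria below are hypothetical, not the nine criteria of [18]:

```python
# Weighted L_p distance between a solution's criteria vector and the ideal point.
def compromise_distance(costs, ideal, weights, p=2):
    return sum(w * abs(c - i) ** p
               for c, i, w in zip(costs, ideal, weights)) ** (1 / p)

ideal = [0.0, 0.0, 0.0]     # best conceivable value on each criterion
sol_a = [4.0, 1.0, 2.0]     # e.g. (proximity cost, room overflow, order violations)
sol_b = [2.0, 2.0, 2.0]
w = [1.0, 1.0, 1.0]

# The solution closer to the ideal point is preferred, even though both
# have the same simple weighted sum of costs (7.0 vs 6.0 here differ, but
# the distance ranking need not agree with the sum ranking in general).
print(compromise_distance(sol_a, ideal, w) > compromise_distance(sol_b, ideal, w))  # True
```

Changing the weights `w` shifts which constraint violations dominate the distance, which is how such approaches let timetablers steer the search towards their own priorities.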

Hyper-heuristics
Meta-heuristic development for examination timetabling is affected by a dependence upon parameter tuning and upon the way domain knowledge is embedded (i.e. the hard coding of hard and soft constraints). Some of the most effective techniques on the benchmark data in the literature are meta-heuristics. However, most of these methods represent a tailor-made approach for one particular problem (in this case, exam timetabling). Such methods usually work poorly on, or are not capable of dealing with, other problems. Indeed, it can be the case that such methods do not even work consistently across other exam timetabling problem instances. Often, parameter tuning can play a significant role, and the effort of tuning parameters to fit new problems can be as difficult as that of developing new approaches. This well-known issue has led a number of researchers to develop new technologies aimed at operating at a higher level of generality.
Hyper-heuristics are motivated by such observations and are attracting an increased level of research attention. The term can be seen as representing heuristics that choose heuristics, i.e. a search space of heuristics is the focus of attention rather than a search space of solutions (as is the case with most implementations of meta-heuristics [29,135]). The aim is to develop more general approaches rather than to beat the fine-tuned and problem specific approaches which often require much effort on the tuning of parameters and are usually only appropriate for specific problems.
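The "heuristics to choose heuristics" idea can be sketched with a deliberately trivial problem: the high level search maintains credit scores for a set of low level heuristics and selects among them, never manipulating solutions directly. The toy objective, operators and reward scheme below are illustrative assumptions of ours, not any published hyper-heuristic:

```python
import random

# Three hypothetical low level heuristics operating on an integer "solution".
def h_increment(x): return x + 1
def h_decrement(x): return x - 1
def h_halve(x):     return x // 2

LOW_LEVEL = [h_increment, h_decrement, h_halve]

def objective(x):
    return abs(x - 7)          # toy objective: reach the value 7

def hyper_heuristic(x, steps=200, seed=0):
    rng = random.Random(seed)
    scores = [1.0] * len(LOW_LEVEL)        # running credit for each heuristic
    for _ in range(steps):
        # high level choice: roulette-wheel selection over heuristics
        i = rng.choices(range(len(LOW_LEVEL)), weights=scores)[0]
        y = LOW_LEVEL[i](x)
        if objective(y) <= objective(x):   # simple acceptance criterion
            scores[i] += 1.0               # reward heuristics that helped
            x = y
        else:
            scores[i] = max(0.1, scores[i] - 0.5)
    return x

print(objective(hyper_heuristic(100)))
```

The two high level components, the selection method (here roulette wheel over credit scores) and the acceptance criterion (here accept non-worsening moves), are exactly the components that studies such as [12] vary and compare.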
As mentioned above in Section 2.4.1, Ross, Hart and Corne [137] suggested that a Genetic Algorithm might be successfully employed in searching for a good algorithm rather than for specific solutions. In [150], Terashima-Marin, Ross and Valenzuela-Rendon investigated using Evolutionary Algorithms to search for solution construction strategies.
Ahmadi et al [5] in 2003 developed a Variable Neighbourhood Search to find good combinations of parameterised heuristics for different exam timetabling problems. Permutations of the low level heuristics (i.e. seven exam selection, two timeslot selection and three room selection heuristics) and their associated parameters (weights) were employed to construct solutions.
Ross, Marin-Blazquez and Hart [139] in 2004 developed a general steady state Genetic Algorithm to search within a simplified search space of problem-state descriptions to construct solutions. The Genetic Algorithm searched over heuristics rather than actual solutions. Three different fitness functions were tested. The descriptions of the problem state (corresponding to heuristics) were experimentally studied with respect to these fitness functions. Promising results on both the benchmark course and exam timetabling problems demonstrated valuable potential research directions for this approach across a range of problems.
Kendall and Hussin [96,97] in 2005 investigated a Tabu Search hyper-heuristic based on the work in [32], where both moving strategies and constructive graph heuristics were employed as low level heuristics. The algorithm was tested on exam timetabling problems from the MARA University of Technology [95] and it was shown to produce better results than solutions that were generated manually.
Burke et al [23,45] investigated employing Case-Based Reasoning (see [103]), a knowledge based technique, as a heuristic selector for solving both course and exam timetabling problems. In [45] (2006), knowledge discovery techniques were employed to discover the most relevant features used in evaluating the similarity between problem solving situations. The objective was to choose the best heuristics from the most similar previous problem solving situation to construct good solutions for the problem in hand. The issue of defining the similarity between exam timetabling problems has also been studied in [24] in terms of choosing the best problem solving method. In [23] (2005), different ways of hybridising the low level graph heuristics (with and without CBR) were compared for solving the Toronto datasets. It was shown that employing knowledge based techniques, rather than randomly/systematically hybridising heuristics, in a hyper-heuristic framework produced good results. Yang and Petrovic [162] (2005) employed Case-Based Reasoning to choose graph heuristics to construct initial solutions for the Great Deluge algorithm and obtained the best results reported in the literature for several Toronto instances at the time. Attribute graphs studied in [36] were employed to model the constraints in the problems so that previous problems with similar constraints could be retrieved and the most appropriate graph heuristics reused to solve the problem in hand.
Burke et al [37] (2007) investigated using Tabu Search to find sequences of graph heuristics to construct solutions for timetabling problems. Different numbers of low level graph heuristics were studied in this graph based hyper-heuristic to adaptively assign the most difficult exams at different stages of solution construction. It was observed that the greater the number of intelligent low level heuristics, the better the performance may be. However, the size of the search space grows significantly, so computational time may become an issue. The results on both course and exam timetabling problems were competitive with the best state-of-the-art approaches reported in the literature and demonstrated the simplicity and efficiency of this general approach. Qu and Burke [126] further investigated the effect of employing different high level search algorithms (i.e. Steepest Descent, Tabu Search, Iterated Local Search and Variable Neighbourhood Search) in the unified graph based hyper-heuristic framework for exam timetabling. Experimental results demonstrated that the choice of high level method employed upon the search space of graph heuristics was not crucial. The characteristics of the neighbourhood structures and search space were analysed. It was shown that the exploration over the large solution space enabled the approach to obtain good results on both the exam and course timetabling problems.
Bilgin, Ozcan and Korkmaz [12] (2007) analysed seven heuristic selection methods and five acceptance criteria within a hyper-heuristic by conducting an empirical study on both benchmark functions and exam timetabling problems. They concluded that different combinations of selection methods and acceptance criteria worked well on different problems, although some combinations worked slightly better than others on the instances tested.
Ersoy, Ozcan and Uyar [73] (2007) studied hyper-heuristic approaches where three hill climbers were applied in different orderings within a memetic algorithm. During the memetic algorithm, individuals were evaluated to keep track of the violations of each constraint type in the benchmark Toronto dataset. The approaches were compared with self-adaptive memetic algorithm hyper-heuristics with different heuristic selection and acceptance criteria. It was shown that the memetic algorithm hyper-heuristic with a single hill climber at a time performed the best among all the approaches tested.
In summary, various strategies and methodologies have been employed as the high level selection methods in a hyper-heuristic framework to choose appropriate low level heuristics. These low level heuristics might be either construction or improvement heuristics. Such methods are laying the foundations of methodologies to automatically design and adapt timetabling heuristics. This has led to some work on analysing the search space of heuristics (rather than of solutions) with the goal of fundamentally understanding the search processes which underpin this new perspective on timetabling research [126].

Decomposition/Clustering Techniques
The idea of decomposition is that large problems are broken into small sub-problems, for which optimal or high quality solutions can be obtained by relatively simple techniques, as the search spaces of the sub-problems are significantly smaller than that of the original problem [50]. Although it has had some success [38], decomposition in timetabling has not attracted as much attention as might be expected because of two drawbacks. Firstly, early assignments may lead to later infeasibility, which was also a problem encountered in constructive methods in the early days of timetabling research. Secondly, globally high quality solutions may be missed as certain soft constraints cannot be evaluated when the problems are decomposed. The clustering methods studied in early timetabling research [53] can be seen as decomposition approaches in the sense that the exams are decomposed into conflict-free or low-conflict groups. Another way of decomposing the problems is by finding the largest clique in the graphs. This was studied by Carter, Laporte and Chinneck [54] (1994) and employed in their later work [55] (1996). Carter and Johnson [52] (2001) improved the approach by assigning the exams in all of the almost-cliques, as they potentially represent the most difficult exams.
Burke and Newall [38] in 1999 investigated a decomposition approach by using sequential heuristics to assign the first set of n exams which were measured as the most difficult ones by graph colouring heuristics (i.e. Color Degree, Largest Degree, Saturation Degree; see Table 3). Backtracking and look-ahead techniques were employed to avoid making early assignments which might lead to later infeasibilities. The exams assigned in previous stages were fixed and the sub-problem at each stage was solved by the Memetic Algorithm developed in [41]. The algorithm dramatically reduced the time required and also produced high quality solutions on the Toronto and Nottingham data. At the time of publication, this paper had some of the best results on the capacitated benchmark problems (Toronto c in Section 3.2). The decomposition technique was actually independent of the memetic timetabling algorithm which was used on each of the decomposed subsets.
Lin [105] (2002) developed a multi-agent algorithm where problems were divided into sub-problems and solved by each agent locally. A broker was used to solve the remaining schedules including those that were de-allocated from local schedules. The global solutions were obtained by aggregating all the schedules generated by the agents and the broker. Both the Toronto data and randomly generated exam timetabling problems were tested and compared with the method of [137]. The approach worked well on sparsely scheduled problems but less well on dense problems.
Qu and Burke [127] in 2007 investigated an approach where exams were adaptively decomposed into two sets (difficult and easy) according to how difficult they were to schedule in previous iterations of the solution construction. The complexity of the problem was thus reduced, as two smaller search spaces were involved, while the overall quality of the timetables was still considered. The small portion of difficult exams identified by the approach was found to make a significant contribution to the costs of the timetables generated. The approach obtained the best result on one of the problem instances at the time of publication (see Toronto c in Section 3.2).
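A minimal sketch of this kind of adaptive difficulty-based decomposition might look as follows. The conflict graph and the promotion rule are hypothetical assumptions of ours, not the instances or exact mechanism of [127]: exams that cannot be placed without conflict in one greedy construction pass are promoted to a "difficult" set and scheduled first in the next pass.

```python
# Hypothetical conflict pairs (the graph is 2-colourable, but the initial
# greedy ordering fails to find a conflict-free assignment).
CONFLICTS = {(0, 2), (1, 3), (2, 3)}
N_SLOTS = 2

def neighbours(e):
    return [b if a == e else a for a, b in CONFLICTS if e in (a, b)]

def construct(order):
    """Greedy pass: place each exam in the first conflict-free slot."""
    tt, unplaced = {}, []
    for e in order:
        for s in range(N_SLOTS):
            if all(tt.get(n) != s for n in neighbours(e)):
                tt[e] = s
                break
        else:
            unplaced.append(e)
    return tt, unplaced

difficult, easy = [], [0, 1, 2, 3]
for _ in range(3):                           # a few adaptive iterations
    tt, unplaced = construct(difficult + easy)
    if not unplaced:
        break
    # promote hard-to-schedule exams to the front of the next pass
    difficult = unplaced + difficult
    easy = [e for e in easy if e not in unplaced]
print(len(unplaced))  # 0: all exams placed once the difficult exam leads the order
```

On this toy instance the first pass leaves exam 3 unplaced; once it is scheduled first, the second pass succeeds, which is the essence of letting earlier construction failures re-order later attempts.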

Examination Timetabling Benchmark Data
The high level of research interest in examination timetabling has led to the establishment of a variety of different benchmark problems which have been widely studied. The established benchmarks, with variants of standard defined measures, have provided a basis for meaningful scientific comparisons and the exchange of research achievements. However, there has been some confusion in the literature due to the circulation of two different versions of eight of these benchmark problems (from the University of Toronto datasets). One of the goals of this paper is to eradicate this confusion by establishing new names for each of the different versions. This, of course, means that we actually have 21 problems that have been studied in the literature (rather than 13). Another aim of this section of the paper is to summarise which of the methods that have appeared in the literature are the best on these benchmarks. This is particularly important given the confusion mentioned above.

University of Toronto Benchmark Data
Carter, Laporte and Lee [55] in 1996 introduced a set of 13 real-world exam timetabling problems from three Canadian high schools, five Canadian universities, one American university, one British university and one university in Saudi Arabia. Over the years, they were widely employed as testbeds in exam timetabling research. As mentioned above, there has been an issue concerning the circulation of different sets under the same name. This is discussed at length below.
To indicate the density of the conflicting exams in each of the instances, a Conflict Matrix C was defined, where each element c_ij = 1 if exam i conflicts with exam j (i.e. they have students in common), and c_ij = 0 otherwise. The Conflict Density is the ratio of the number of elements of value "1" to the total number of elements in the conflict matrix.
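For illustration, the conflict matrix and conflict density can be computed from enrolment data as follows (toy data of our own; real instances involve thousands of students and exams):

```python
# Hypothetical student -> set of exams taken
enrolments = {
    "s1": {0, 1},
    "s2": {1, 2},
    "s3": {0, 3},
}

n_exams = 4
C = [[0] * n_exams for _ in range(n_exams)]
for exams in enrolments.values():
    for i in exams:
        for j in exams:
            if i != j:
                C[i][j] = 1      # exams i and j share at least one student

# Conflict Density: ratio of "1" entries to all entries of the matrix
density = sum(map(sum, C)) / (n_exams * n_exams)
print(density)  # 0.375 (6 conflicting ordered pairs out of 16 entries)
```

Note that the matrix is symmetric by construction, and the diagonal is zero since an exam does not conflict with itself.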
Two variants of objectives were defined for the original dataset:
- to minimise the number of timeslots needed for the problem (graph coloring), named Toronto a in Table 5; and
- to minimise the average cost per student, named Toronto b in Table 5.
For the first objective, the aim is to find feasible timetables of the shortest length. For the second objective, an evaluation function was defined to calculate the cost of the timetables generated. For students sitting two exams s timeslots apart, a cost was assigned using proximity values w_s, i.e. w_1 = 16, w_2 = 8, w_3 = 4, w_4 = 2 and w_5 = 1. The aim is to space out the conflicting exams within a limited number of timeslots. The authors also introduced seven real world applications with side constraints (i.e. maximum room capacity per timeslot, pre-assigned exams, maximum number of exams per timeslot, no x exams in y timeslots, etc). This objective was modified later and tested by a number of approaches (see below).
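The proximity cost can be illustrated directly. The timetable and enrolment list below are hypothetical; the weights w_1..w_5 are those defined above:

```python
from itertools import combinations

# Penalty for a pair of a student's exams scheduled s timeslots apart
W = {1: 16, 2: 8, 3: 4, 4: 2, 5: 1}

def average_cost(timetable, enrolments):
    """Average proximity cost per student (pairs > 5 slots apart cost 0)."""
    total = 0
    for exams in enrolments.values():
        for a, b in combinations(exams, 2):
            s = abs(timetable[a] - timetable[b])
            total += W.get(s, 0)
    return total / len(enrolments)

timetable = {0: 0, 1: 1, 2: 4, 3: 9}               # exam -> timeslot
enrolments = {"s1": [0, 1], "s2": [0, 2], "s3": [0, 3]}
print(average_cost(timetable, enrolments))  # 6.0 = (16 + 2 + 0) / 3
```

Student s1's exams are adjacent (cost 16), s2's are four slots apart (cost 2) and s3's are more than five slots apart (cost 0), which is precisely what the weights are designed to encourage: spreading out conflicting exams.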
Over the years, however, two versions of the data were circulated and tested by different approaches. To distinguish the data tested and to build a standard benchmark for future use in timetabling, we have carefully examined the data that has appeared in two different forms under the same name for eight of these benchmark problems. We list the characteristics of these two versions of the data in Table 4. We have post-fixed "I" and "II", respectively, to the circulated datasets to distinguish between them. The post-fix "I" has been used for the problem instance which we believe has appeared most often in the literature. For the problem instances with post-fix "II", some confusion occurred as three of the instances (car91 II, car92 II and pur93 II) have inconsistencies in the number of enrolments (i.e. a different number of enrolments is defined in the two data files for each instance, see Table 4). Later in this section we attempt to cast light on the question of which technique has been applied to which version of these instances in the literature.
To avoid any further confusion, the definitive versions of these datasets are available at http://www.asap.cs.nott.ac.uk/resources/data.shtml (together with all the other datasets discussed in this paper).
Burke, Newall and Weare [41] in 1996 modified the objective of the six real world problems introduced in [55] by considering the maximum room capacity per timeslot and adjacent exams on the same day. In 1998 [42], timeslots in the problems were distinguished by setting three timeslots a day from Monday to Friday and one timeslot on Saturday. The objective is to minimise the number of students sitting two consecutive exams on the same day and overnight. These two variants are named Toronto c and Toronto d in Table 5. Terashima-Marin et al in 1999 [150] modified the dataset by assigning, to each problem instance, an estimated number of timeslots and, to each timeslot, an estimated maximum number of seats/capacity. This variant is named Toronto e in Table 5.
Table 5. Variants of the Toronto Benchmark Datasets.

Variant    | Characteristics                   | Objective
Toronto a  | graph coloring                    | to minimise the number of timeslots needed
Toronto b  | un-capacitated with cost          | to space out conflicting exams within a limited (fixed) number of timeslots
Toronto c  | capacitated with cost             | to minimise students sitting two exams in a row on the same day
Toronto d  | capacitated with modified cost    | same as above, and to minimise students sitting two exams overnight
Toronto e  | estimated capacity and timeslots  | to minimise students sitting two adjacent exams on the same day

The approaches developed and tested on the different variants of the Toronto datasets over the years are listed in Table 6 (ordered by the year in which the work was published). The values in "()" following the variants of the data give the number of problem instances tested by the corresponding approaches. Most of the work did not specify the exact characteristics of the data tested, and in many of the papers it is impossible to determine which version (I or II) of the data was tested (for the eight problematic instances). We have attempted, in Table 10 (by contacting the authors), to clarify which versions of the datasets were used in each paper. If the entries are written in italics, we are not absolutely sure that the information with respect to this issue is correct. Otherwise, we have had the situation confirmed by the authors concerned.

University of Nottingham Benchmark Data.
Burke, Newall and Weare [41] in 1996 also introduced the 1994 exam timetabling data from the University of Nottingham as a benchmark. It was later used by a number of researchers to test different approaches. Table 7 presents the characteristics of the dataset. We know that 23 is the least possible number of timeslots due to the limitations on the room capacity. The objective is to minimise the number of students sitting two consecutive exams on the same day. The data can be downloaded from http://www.asap.cs.nott.ac.uk/resources/data.shtml. In [42] (1998), the above problems were further constrained by modifying the objective function to also consider consecutive exams overnight. In Table 7 we highlight these variants as Nottingham a and b. Table 8 presents the approaches applied to these datasets and the University of Melbourne datasets (see Section 3.3 below) in the literature.

University of Melbourne Benchmark Data
Merlot et al [111] introduced exam timetabling datasets from the University of Melbourne at the PATAT conference in 2002. Two datasets were introduced. For these datasets, there were two timeslots on each of the five working days, and the capacity of each session varied. The availability of sessions for some of the exams was restricted. In one problem instance, this meant that no feasible solutions existed, so an alternative dataset was created which allowed feasible solutions. These datasets can also be downloaded from http://www.asap.cs.nott.ac.uk/resources/data.shtml. The Melbourne datasets are summarised in Table 9.

Results on the Benchmark Problems
As mentioned above, a large number of papers have been published which work with the datasets discussed above. In addition, the difficulties surrounding the publication of the Toronto datasets have led to some confusion over which methods were tackling which problems. Tables 10-12 attempt to clarify this. They list all of the methods which have addressed the Toronto problems and they attempt to illustrate which methods used which problems. This has been a difficult task and the authors would welcome additional information, which we will use to keep an updated version of the table at http://www.asap.cs.nott.ac.uk/resources/data.shtml. We would like to add more methods as they appear. Table 13 presents the results from different approaches applied to the Nottingham datasets a and b (see Table 7) and the Melbourne datasets I and II (see Table 9) in the literature.
Tables 10-13 also illustrate which of the methods are most effective in terms of solution quality. The very best results are presented in bold. We have not listed computational times for the following reasons. Firstly, many of these papers do not report the relevant times. Secondly, comparisons across very different platforms over the years are impossible. Thirdly, examination timetabling is a problem which is almost always tackled weeks or months before the timetable will be used. As such, it is definitely not a time critical problem and there are many real world scenarios where it would be perfectly reasonable to leave an algorithm running overnight or even over a weekend.

Table 10. Results in the literature on the two versions of the Toronto Dataset b (see Table 5). Values in italics indicate that we are unsure about the accuracy in terms of the versions of the datasets used. Values in bold represent the best results reported. "-" indicates that the corresponding problem is not tested or a feasible solution cannot be obtained.

In the following section we will draw upon the above discussion to highlight a number of conclusions and to present some ideas for future research which are generated by these conclusions. It is worth noting that McCollum [108] and Burke et al [22] outline some future research directions in University timetabling and nurse rostering, respectively. There is, of course, some synergy between the issues discussed in those papers and the issues alluded to here.

Future Research Directions
We will now outline some overall messages from our analysis of the literature and how these may influence future research directions. The following discussion presents a (non-exhaustive) list of future research directions.

(1) Meta-heuristic Development:
Meta-heuristics have attracted the most research attention in exam timetabling research. They will continue to attract attention in the near future across a number of different perspectives. In particular, it is important to explore these research directions within the context of real world problem solving environments.
In addition to a comprehensive treatment of Tabu Search, Simulated Annealing, Genetic Algorithms and various hybrids, some new exam timetabling techniques have recently been presented. For example, GRASP and Iterated Local Search build on a similar idea of exploring wider areas of the search space by using a multi-start greedy search technique to reduce the risk of being stuck in local optima. Variable Neighbourhood Search escapes from local optima by switching between the search spaces defined by different neighbourhood structures. Large neighbourhood search achieves this by extending the flexibility of moves within the search space. In summary, these techniques extend the idea of helping the search to escape from local optima in a variety of ways and have made promising progress on a wide range of exam timetabling problems.
The development of these new techniques has opened up a wide variety of new research directions such as exploring alternative neighborhood structures, new multi-start techniques, hybridisation issues, alternative operators and many others. One of the key research goals is to provide an appropriate balance between exploration and exploitation in search algorithms.
Extensive study is also required to understand how to determine appropriate parameter settings for meta-heuristic methods. The determination of suitable initialisation methods and in-depth analysis of the effects of initialisation on a range of meta-heuristics is another important exam timetabling research topic. Theoretical issues (such as phase transition) and multi-criteria techniques represent other important directions in meta-heuristic exam timetabling research. Evolutionary methods and other population based techniques represent a significant proportion of the meta-heuristic literature on exam timetabling. There are many research directions generated by considering the hybridisation of meta-heuristic methods, particularly between population based methods and other approaches. A study of encoding issues represents a new and promising direction in both evolutionary algorithm and hyper-heuristic research.

(2) Raising the Level of Generality of Search Methodologies:
More general and adaptive techniques have been explored and present promising future directions in establishing more generic search systems (which range across timetabling and other search problems).
Hyper-heuristics are concerned with searching for appropriate heuristics rather than concentrating on the problem specific details of actual solutions, which have been the focus of traditional search algorithms. This opens up a new direction of research and offers much potential in both practical applications and theoretical study. Adaptive techniques have also recently emerged, where information collected during problem solving is used to guide the search. Some work has been carried out on knowledge based techniques, where experience from previous problem solving drives the search. Further investigation of knowledge based techniques has the promise to underpin the development of fundamentally more general approaches. The goal is to deal automatically with different problems in a dynamic way so that extra human effort is not needed to fine-tune the approach.
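The distinction between searching the heuristic space and the solution space can be sketched as follows. This is a minimal, hypothetical selection hyper-heuristic; the two low-level heuristics, the scoring scheme and the toy instance are all illustrative assumptions rather than a published method.

```python
import random

# Toy instance: 6 exams, 3 timeslots; clashing pairs must differ.
conflicts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]

def cost(slots):
    return sum(1 for a, b in conflicts if slots[a] == slots[b])

# Two hypothetical low-level heuristics the hyper-heuristic chooses from.
def reassign_random(slots, rng):
    s = list(slots)
    s[rng.randrange(len(s))] = rng.randrange(3)
    return s

def swap_two(slots, rng):
    s = list(slots)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def hyper_heuristic(solution, low_level, iters=500, seed=1):
    """Selection hyper-heuristic: search over which low-level heuristic
    to apply next (scored by past success) rather than over solutions
    directly."""
    rng = random.Random(seed)
    scores = [1.0] * len(low_level)
    best = solution
    for _ in range(iters):
        # Roulette-wheel choice in proportion to accumulated scores.
        i = rng.choices(range(len(low_level)), weights=scores)[0]
        candidate = low_level[i](best, rng)
        if cost(candidate) < cost(best):
            best = candidate
            scores[i] += 1.0                        # reward improvement
        else:
            scores[i] = max(0.1, scores[i] * 0.99)  # decay on failure
    return best

start = [0] * 6
final = hyper_heuristic(start, [reassign_random, swap_two])
```

The adaptive scoring is the essential point: the method learns online which low-level heuristic is currently productive, rather than requiring a human to pre-select or order the heuristics for each problem.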

(3) Understanding Search Spaces:
We do not yet fully understand why and how search methods work on complex problems. An analysis of heuristics/techniques concerning the nature of search spaces could be beneficial. It is generally accepted that little is known about the nature of search spaces, especially for complex real-world problems such as exam timetabling. A deeper understanding of search spaces and fitness landscape analysis [130] offers the possibility of understanding why certain algorithms work well on certain problems (or even instances) and yet work poorly on others. This is not restricted to exam timetabling but covers a much broader remit of problems.

(4) Inter-disciplinary:
Hybridisations of different techniques have been very widely investigated in recent exam timetabling research.
Although different authors have favoured different approaches, it has been observed that hybrid approaches are usually superior to pure algorithms. For example, all of the recent work on constraint based techniques represents hybridisations with other techniques (see Section 2.2). However, in most cases, methodologies are hybridised in a sequential way rather than being efficiently integrated. More work needs to be done not simply to combine different methodologies but to integrate them meaningfully and efficiently. Such research should draw upon research theme 4.2.3. For example, in memetic algorithms, local search is used co-operatively after each generation. In a hyper-heuristic, one approach is that low level heuristics are searched and combined adaptively during problem solving. Further in-depth analysis and investigation can underpin the design and development of more powerful techniques.
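The memetic pattern mentioned above, in which local search refines every offspring produced by the genetic operators, can be sketched as follows. The toy conflict-cycle instance, the hill climber and all parameter values are illustrative assumptions, not a specific published algorithm.

```python
import random

# Toy instance: exams 0..4 in a conflict cycle, 3 timeslots available.
conflicts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
N_EXAMS, N_SLOTS = 5, 3

def cost(slots):
    return sum(1 for a, b in conflicts if slots[a] == slots[b])

def local_search(sol):
    """First-improvement hill climbing over single-exam reassignments."""
    improved = True
    while improved:
        improved = False
        for i in range(N_EXAMS):
            for s in range(N_SLOTS):
                trial = sol[:i] + [s] + sol[i + 1:]
                if cost(trial) < cost(sol):
                    sol, improved = trial, True
    return sol

def memetic(pop_size=8, gens=20, seed=2):
    """Minimal memetic algorithm: genetic operators plus local search
    applied to every offspring (the integrated refinement step)."""
    rng = random.Random(seed)
    pop = [local_search([rng.randrange(N_SLOTS) for _ in range(N_EXAMS)])
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_EXAMS)
            child = a[:cut] + b[cut:]           # one-point crossover
            if rng.random() < 0.3:              # light mutation
                child[rng.randrange(N_EXAMS)] = rng.randrange(N_SLOTS)
            children.append(local_search(child))  # memetic refinement
        pop = parents + children
    return min(pop, key=cost)

best = memetic()
```

The point of the sketch is the placement of `local_search` inside the generational loop: the local and evolutionary components interact every generation, rather than being run one after the other as in a purely sequential hybrid.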

(5) Closing the Gap between Theory and Practice:
New benchmark examination timetabling problems have been formulated and thoroughly tested. Reformulations of these problems will better reflect the constraints of real world environments.
Recent state-of-the-art approaches in exam timetabling research have carried out comparisons on the benchmark problems (see Section 3) that have appeared over the last ten years. This has led to fundamental developments in exam timetabling research. However, these problems still represent simplified versions of the problem. In the wider context of scheduling research, there has been much recent debate about the "gap between theory and practice". The same is true for exam timetabling research. A major research direction is to explore the wide range of research issues that are opened up by considering the high levels of complexity that are generated by real world problems [108]. In addition, there is still no widely accepted universal data format or standard timetabling language. The establishment of quality measures by standard techniques on both solution quality (objective functions) and computational time for exam timetabling problems also requires much work and is crucial in conjunction with the formation of benchmarks. The requirement for the development of automatic tools to support timetabling staff and save significant development time still exists. To encourage such development, we are building up an archive where benchmark exam timetabling problems are collected, together with a categorised, updated timetabling bibliography (after 1995). We welcome contributions to this exam timetabling archive. It is held at http://www.asap.cs.nott.ac.uk/resources/ETTPbibliography.shtml.

Summary
In summary, it is possible to draw a number of conclusions from an in-depth survey of the examination timetabling literature of the last ten years. Firstly, there has been a significant number of research successes in that time. Secondly, the current state of the art provides a strong platform for a range of important research directions. Thirdly, future research requires a particular emphasis on the complexity of real world issues, and this requires the establishment of more benchmarks that are drawn from real world problems. Fourthly, raising the level of generality of decision support systems (including for exam timetabling) represents an emerging theme. Finally, it is worth noting that successful papers in exam timetabling have been authored by researchers from a range of disciplinary backgrounds, particularly at the interface of Operational Research and Artificial Intelligence. Such interdisciplinary collaboration is crucial to scientific progress in the area. It is clear from this analysis of the literature that the future of exam timetabling research is inter-disciplinary.

The tables summarise the development of the lines of investigation discussed above. The content of the tables is ordered by year of publication to represent the development of related techniques over the years. In the tables, the term "practical" indicates that the corresponding work was tested/implemented on real world problems. "-" indicates that the corresponding properties were not presented in the paper. "Toronto", "Nottingham" and "Melbourne" refer to the benchmark problems described in Section 3.

Reference                  | Techniques                                   | Problem             | Notes
Burke & Newall 1999 [38]   | Sequential methods to partition the problems | Toronto, Nottingham | Sub-problems solved by Memetic Algorithms
Lin 2002 [105]             | Multi-agent algorithm                        | Toronto, random     | Aggregate schedules from agents and a broker
Qu & Burke 2007 [127]      | Adaptive ordering and decomposition          | Toronto             | Exams iteratively partitioned into two sets of different difficulty