A VARIABLE NEIGHBORHOOD SEARCH ALGORITHM WITH REINFORCEMENT LEARNING FOR A REAL-LIFE PERIODIC VEHICLE ROUTING PROBLEM WITH TIME WINDOWS AND OPEN ROUTES

. This paper studies a real-life container transportation problem with a wide planning horizon divided into multiple shifts. The trucks in this problem do not return to depot after every single shift but at the end of every two shifts. The mathematical model of the problem is ﬁrst established, but it is unrealistic to solve this large scale problem with exact search methods. Thus, a Variable Neighbourhood Search algorithm with Reinforcement Learning (VNS-RLS) is thus developed. An urgency level-based insertion heuristic is proposed to construct the initial solution. Reinforcement learning is then used to guide the search in the local search improvement phase. Our study shows that the Sampling scheme in single solution-based algorithms does not signiﬁcantly improve the solution quality but can greatly reduce the rate of infeasible solutions explored during the search. Compared to the exact search and the state-of-the-art algorithms, the proposed VNS-RLS produces promising results.


Introduction
Research on the Vehicle Routing Problem (VRP) can be tracked back to the truck dispatching problem proposed by Dantzig and Ramser [1]. It is defined as, starting from a depot, a number of vehicles with capacity constraints are to be routed to service a set of customers with demands and return to the depot. Each customer is visited only once. In the scheduling network of VRP, all the routes are Hamiltonian Cycles (close routes). The most common objectives in VRPs are minimizing the number of vehicles used (or routes) and minimizing the total travel cost (distance/time). After decades of study, VRP has become one of the mostly investigated combinatorial optimisation problems in operational research, leading to a large number of variants with different features [2], e.g. Vehicle Routing Problem with Time Windows (VRPTW), Vehicle Routing Problem with Pickups and Deliveries (VRPPD), Periodic Vehicle Routing Problem (PVRP), Open Vehicle Routing Problem (OVRP) and so on [3,4].

Basic VRP Variants
Based on the basic definition of VRP, in VRPTW, customers' demands are associated with time constraints. The time of servicing a customer must be within a specific time interval, and all vehicles must return to the depot before the end of the planning horizon given [5]. VRPTW is the basic model for many other more complicated VRP variants. In this section, we review the variants which are most relevant to our study.
In VRPPD, customers' demands are divided into two categories: pick up shipments from a source and deliver shipments to a destination. Various constraints on pickup and deliver points lead to diverse VRPPD variants [2]. If the depot is the only one pickup point while the customers are delivery points, or all shipments are picked up from customers and delivered to the depot, the problem is classified as a One-to-Many-to-One problem. Whilst customers can be both pickup and delivery points, it is a Many-to-Many problem. In One-to-One problems, one customer's pickup demand is another customer's delivery demand. Furthermore, if consolidation is allowed when picking up, it is called a Less-than Truckload Transportation problem, otherwise it is a Full Truckload Transportation (FTL) problem [6].
In some real-life problems, the planning horizon is long and divided into multiple periods/shifts, therefore notated as Multi-Period Vehicle Routing Problems (MPVRP) [7]. The vehicles must return to the depot every shift in MPVRP. Especially, if each customer has a specific visiting frequency within the planning horizon, this type of MPVRP is called Periodic Vehicle Routing Problem (PVRP). Each customer can be visited more than once in PVRP, and a scheduling solution is usually represented as a set of visiting day (shift) combinations for customers. PVRP may occur in grocery distribution, soft drink industry, waste collection and so on [8].
The earliest OVRP is proposed by Eppen and Schrage [9], where a fleet collect goods from the central depot and deliver them to a number of geographically scattered customers. The main characteristic of OVRP is that its routes are Hamiltonian paths (open routes) rather than cycles [10]. This characteristic reflects the reality that many companies in real-world do not own a fleet but instead hire external carriers, e.g. third party logistic providers and private vehicles, to service customers. Those hired vehicles do not need to return to the depot after servicing all customers assigned. The routes in OVRP terminate at the last customer serviced [11].

Extended VRP Variants and Solution Methods
After decades of study, both exact and approximate algorithms have been intensively investigated for diverse VRP variants. However, due to the NP-hard feature of VRPs [12], exact methods are more suitable to small or medium size VRPs [3]. On the other hand, in approximate approaches (or heuristics), metaheuristics have shown powerful performance in solving big-scale and complex VRPs.
The first exact method for OVRP is proposed in [13], but only on small and medium size instances (less than 151 customers and 14 vehicles). Tarantilis et al. [10,14] propose two local search metaheuristics with annealing based threshold accepting criterion and list based threshold accepting criterion respectively. Both algorithms outperform previous approaches for OVRP. Two years later, a Recordto-Record Travel algorithm is adopted as the acceptance criterion, and the associated algorithm generates better solutions than 11 previous algorithms for OVRP [11]. Two metaheuristic algorithms based on Tabu Search for OVRP can be found in [15,16]. A Static Move Descriptor is proposed in [17] to speed up the evaluation in best improvement search. The associated algorithm produces the best results on benchmarks of OVRP, however, it is not applicable to OVRP with Time Windows (OVRPTW) where time constraints are considered.
The first introduction of the OVRPTW is in [18]. In the proposed solution methodology, an insertion-based construction heuristic employs an improved IM-PACT criterion [19] to select customer to be inserted in to the routes. Since then, a large number of metaheuristics are developed for OVRPTW. A Variable Neighbourhood Search (VNS) algorithm for OVRPTW can be found in [20], which outperforms the IMPACT approach. An Ant Colony Optimization (ACO) algorithm and a Genetic Algorithm are also applied to the problems of relatively small size [21,22].
In a specific case of OVRPTW, the routes start from geographically scattered customers to pick up goods. After visiting the customers assigned, vehicles return to the central depot to unload the goods collected. The routes in this case are open at the starting point, which is opposite to the standard OVRP. An Adaptive Large Neighbourhood search algorithm is proposed for this Reverse-OVRPTW in [23]. School Bus Routing Problem is another special case of OVRP, where its morning and afternoon routing problems can be addressed as the same problem. Every morning, school buses send students to the school from pickup stops, while in the afternoon students are sent back to the stops in the reverse order of the morning routes [24]. In addition, Liu and Jiang [25] and Brito et al. [26] study the VRP with both open and close routes.
The first research on Periodic Vehicle Problem with Time Windows (PVRPTW) is in [27]. Considering travel time, capacity, duration and time windows, a construction heuristic followed by a Tabu Search (TS) is proposed. An improved TS method adopting the Forward Time Slack [28] is later proposed to further reduce the route [29]. An ACO model for PVRPTW can be found in [30] where a multiple pheromone information matrix is designed. The flexibility of customer visiting date in PVRPTW further increases the number of parameters in this ACO algorithm and the complexity of this methodology. More solutions for PVRPTW can be found in [31,32,33].
An Open Periodic Vehicle Routing Problem (OPVRP) model is introduced in [34]. The vehicles are not obliged to return to the depot at the end of each day in a planning horizon of multiple days. In the proposed solution construction heuristic, a k-means clustering algorithm is used to assign customers to routes, without considering the capacity limit. A feasibility procedure would then fix the infeasible assignments. This approach can quickly generate a feasible solution for small size problems. However, it would be inefficient for larger problems or when the depot is not in the geographic center. Time windows are not considered in this model.
Vidal et al. [35] propose a unified hybrid genetic search framework (UHGS) aiming to provide a general-purpose solver for diverse VRP variants. It produces the results better than or close to the state-of-the-art algorithms on benchmarks. However, as a genetic algorithm, it is shown that the computation time sharply increases when facing MPVRP. This is due to the issue of shift and route partition in the chromosome/genotype (solution representation), which is hard to handle for the huge number of possible positions of shift and route delimiters. This solution partition problem is tackled as a shortest path problem in UHGS. The long computation time impedes the application of UHGS to large-scale MPVRP. More metaheuristics for VRPs have been reviewed in [36,37] Most research formulate VRPs with connected network, where the connected nodes represent customers, demands, services and so on. The weight of an edge connecting two nodes represents the cost of a vehicle traveling from a node to the other. This method is easy to apply to the problems with simple service activities. However, in some real-life problems, the service activities are complex and hard to be simplified or combined. Bai et al. [38] use the loading and unloading nodes pair to represent a transportation task, converting a container transportation problem into a Set-Covering problem. The problem is then solved with an exact algorithm. This method is not suitable to those problems with many container terminals. [39] integrate loading, traveling and unloading activities into a task node for a pickup and delivery problem. This task node-based model simplifies the model for reallife complex problems, and is applicable to a large number of classic node-based algorithms, thus has been used in various VRPs research [40,41]. This paper addresses a real world container transportation problem, which shares some common features with the classic OVRP and PVRP. The mathematical problem model and an illustrative example are presented in Section 2. The proposed solution methodology is introduced in Section 3. In addition to an urgency level-based constructive heuristic, an improvement metaheuristic with reinforcement learning is also developed. Section 4 presents the experiment results and analysis on benchmark instances. The conclusions of this paper are presented in Section 5.

Problem Definition & Mathematical Model
The problem studied in this paper is a real-world inter-dock container transportation problem at Ningbo Port, which is the fifth largest port in the world. Everyday a fleet of 100 trucks transport commodities (containers) between nine container terminals (see Figure 1). Each commodity consists of a number of containers. The containers are picked up from source terminals and delivered to destination terminals, satisfying the time constraints for the commodities and drivers. Previous research on the truck scheduling in Ningbo Port can be retrieved via https://sites.google.com/nottingham.ac.uk/port-management. Bai et al. [38] have proposed a scheduling scheme based on PVRPTW for this problem. Since the size and operational cost of the fleet are fixed at the Ningbo Port, the port manager expects to further decrease the empty-load travel distance of the fleet, optimizing the utility of trucks. To this end, the current scheduling scheme is revised by reducing the empty-load travel from and to the depot, which is introduced below. This problem is a One-to-One VRPPD with FTL. The trucks used are identical and each truck can carry only one container at a time due to its capacity, thus consolidation is not allowed.
The commodities to be transshipped are usually issued several days before their shipment deadlines, which brings a long scheduling horizon. There are a few emergent commodities that are also submitted in practice, and they are either inserted to the current schedule or rejected, which is not discussed in this study. Each working day in the scheduling horizon is divided into two shifts (i.e. day and night shifts with 12 hours per shift) obliging the related regulations on drivers' working hours specified in Labour Law. In the first shift (day shift) everyday, trucks depart the depot with a list of tasks to complete. When all tasks assigned are completed, the trucks would park at the last task destinations, or at the first task source terminals of the second shift (night shift) as long as the trucks can arrive there before the end of the day shift. Then, at the beginning of the night shift, a new group of drivers are assigned their trucks at specified terminals (shift change). In the night shift, after completing all assigned tasks trucks must go back to the depot for maintenance and preparation of the next day. The travel of trucks heading to the next task source is empty-load, with no container transported.
Drivers take shift change in the middle of a working day at different terminals, instead of one specific depot. It can be found that, here the routes in each shift are Figure 1. Nine container terminals' locations in Ningbo Port, taken from Google Maps [42]. open as those in the OVRP. More precisely, the scheduled routes of day shifts are open as they do not finish at the depot, while the routes of night shifts are reverse open routes. The difference between this problem and the School Bus Routing Problem is that, in our problem, the routes in night shifts are not the same as the routes in day shifts in a reversed order.
The typical scheduling horizon in Ningbo Port spans from 2 to 4 days (4-8 shifts). The first shift of every working day is denoted as an odd-indexed shift, while the second shift is even-indexed. In such a multi-shift problem, the number of containers in each commodity can be seen as the visiting frequency of customers in PVRP. However, the model of PVRP cannot be directly applied due to the distinct practical constraints in this problem. In this paper, the Ningbo Port container transportation problem is modeled as a PVRPTW with Open Routes (PVRPTW-O). To the best of our knowledge, this is the first time PVRPTW-O is introduced in the literature.
In previous VRPPD research, the shipment loading and unloading time is usually ignored or simplified as they are small and/or identical. However, because of the limitation of cranes at terminals, the time of loading and unloading account for a considerable proportion of the total service time in PVRPTW-O. Besides, the service time are different at different terminals in PVRPTW-O, thus cannot be ignored or simplified. In our mathematical PVRPTW-O model, the method of [39] is adopted. The time of loading and unloading activities are combined with the traveling time, defining the task nodes with different service time. In the proposed model, a task node is composed of the loading at the source terminal, unloading at the destination terminal, loaded travel from the source to the destination, and the associated time window of the task.
Artificial depots (W ) are introduced to connect an odd-indexed shift (S odd ) to the following even-indexed shift (S even ). In S odd , artificial nodes are termination depots, while they become the starting depots in the following S even . When a truck travels from S odd to S even , the first source terminal of a route in S even serves as an artificial depot, if it can be reached before the end of S odd . Otherwise, the last destination terminal of the route in S odd will be the artificial depot.
An example of one day schedule (two consecutive shifts of an odd-indexed and an even-indexed) is presented in Figure 2. The fleet size is five (i.e. K = 5), corresponding to five routes. Take the first route as an example, two and three tasks are completed in the odd-indexed and the even-indexed shifts, respectively. The lines directly connecting starting depots and termination depots are empty routes, which means no task is completed on them. In this example, each shift has two empty routes. The artificial depots in W are either the last destinations on Shift 1 routes or the first sources on Shift 2 routes. The number of artificial depots visited is decided by the number of terminals at where shift-changes happen. In this example, two trucks change shift at the fourth artificial depot. The notations used are introduced in Table 1. Every scheduled route starts at a starting depot, connects a number of task nodes (tasks to be completed) and ends at a termination depot. It is notable that, some of these routes are empty, which means no task is completed on them.

S
The set of time-continuous working shifts, which can be divided into oddindexed shifts (S odd ) and even-indexed shifts (Seven).
[ai,bi] Time window for node i. The time window for node 0 is zero at the boundary of a shift. If a truck arrives at the source of i early, it has to wait until ai.
A Set of arcs. Each arc(i,j) represents that node j is immediately serviced/visited after servicing/visiting node i.

cij
The cost of empty load travel from node i to node j. When both i and j are task nodes, cij is the distance from the destination of i to the source of j. Otherwise, it is the distance between a terminal and the depot.

tij
The travel time from node i to node j. When both i and j are task nodes, tij is the travel time from the destination of i to the source of j. Otherwise, it is the travel time between a terminal and the depot. The time for servicing node i, which includes the loading time, transportation time (from pick-up source to delivery destination) and unloading time. The service time of a depot is zero.

W
Set of Artificial Depots. This set of nodes are introduced to represent the termination depots in S odd or starting depots in Seven. W refers to all possible physical terminal locations, but which artificial depots are visited vary in different solutions. Thus, an artificial depot may not be visited or may be visited more than once in W . The use of artificial depots guarantees the constraints on truck driver working time being satisfied.

Ti
The arrival time at node i.

Bi
The time to begin the service of node i.
x s ij A binary decision variable for nodes i, j ∈ N ∪ W . Its value is 1 if arc(i,j) is included in the solution in shift s, otherwise is 0. i, j ∈ W is not allowed The problem can be formally defined as follows: Subject to: x s vw = 0, ∀v ∈ W, w ∈ W, s ∈ S In odd-indexed shifts (∀s ∈ S odd ): j∈N \{0}∪W x s 0j = K, ∀s ∈ S odd (12) i∈N w∈W In even-indexed shifts (∀s ∈ S even ): x s 0j = 0, ∀j ∈ N \{0} ∪ W, s ∈ S even (16) w∈W j∈N x s wj = K, ∀s ∈ S even (17) i∈N \{0}∪W The objective of this problem is to minimize the total travel distance (TD), eq. (1). In classic VRPs, this objective is the secondary objective. However, the operational cost of the fleet in this real-life problem is fixed, thus, minimizing the travel distance become the only one objective. Since the loaded travel distance is fixed in each instance, the objective actually is equivalent to minimizing the empty-load travel distance.
Constraints (2) and (3) denote that every task node can be visited exactly once and all the tasks must be visited. Constraint (4) means the in-degree of a node equals its out-degree, specifying that a task may only be serviced after the previous task is completed. Constraints (2)-(4) together make sure arcs of over more than one shift are unacceptable. Constraint (5) is the arrival time at a task node. Constraint (6) defines the beginning time of servicing a task node. This time is calculated by the arrival time plus the waiting time at the source of a task. Constraints (5) and (6) enforce the correct temporal successive relationship between consecutive nodes. Constraints (7) and (8) are the time window constraints of each shift, while constraint (9) represents the time constraint on each task. The domain of the respective decision variable is defined by constraints (10) and (11). Especially, constraint (11) prohibits the travel between two artificial depots.
The starting and termination depots in odd-indexed shifts and even-indexed shifts are different. Constraints (12) and (14) represent that K trucks leave the physical depot (0) at the beginning of an odd-indexed shift, and they would stop at artificial depots at the end of the shift. Constraint (13) represents that no truck returns to the physical depot in odd-indexed shifts. Constraints (16) -(18) place the reverse restraints in even-indexed shifts. Constraint (15) defines the shift change from an odd-indexed shift to the following even-indexed shift on artificial depots, where the in-degree of each artificial terminal in S odd equals its out-degree in the following S even .
From this integer programming model, we can find that this problem is nonlinearly constrained and with a huge solution space. It is no doubt that the nonlinear constraints can be converted to linear ones, but leads to a more complex model, which is discussed in Section 4.2. The size of the solution space of PVRPTW-O is decided by the length of the scheduling horizon (|S|), the fleet size (K) and the number of tasks (n). Since the number of total routes in a solution could be up to |S| · K, while the number of permutations of tasks is n!, the size of the search space is |S| · K · n!. In real-life scenarios, n in the whole planning horizon could be larger than 1,000. This model is a basic framework for PVRPTW-O, in future extended research more specific constraints (e.g. subtour elimination constraints) can be added to further improve the problem formulation or expressing it in different forms. It is worth noting that, empty routes in S even do not mean their travel distances are zero. The cost of empty route can be zero, only when the connected artificial node represents the physical depot.

Motivitions
We use the CPLEX solver on both real-life and artificial benchmark instances, and the results (see Section 4.2) show that it is unrealistic to address the PVRPTW-O with exact methods. Heuristics and metaheuristics are thus investigated in this study for PVRPTW-O. A large number of metaheuristic algorithms have obtained promising results in VRPs, including Population-Based and Single Solution-Based metaheuristics, e.g. Evolutionary Algorithms (EA) [43], Population-Based Incremental Learning [44], Simulated Annealing [45], TS [46] and VNS [47].
Population-based approaches evolve a population of solutions during the search, showing a powerful performance for highly constrained and multi-objective VRPs [48]. However, PVRPTW-O is challenging to population-based approaches due to its high dimensional solution structure and large problem scale. The tasks in PVRPTW-O must be indexed from three dimensions in the solution (indexes of the shift, the route and the position in the route). This structure increases the difficulty of operations between solution individuals, e.g. the above-mentioned solution partition problem in Genetic Algorithms. In addition, when the problem size and population size are big, the computation time for the large population would be very long. Thus, population-based methods are not suitable to this large scale high-dimensional problem.
Single solution-based metaheuristics (or Local Search) use different strategies and neighbourhood operators to explore the solution space iteratively, while only one solution is updated in each iteration. Apart from straightforward operators, e.g. Swap and Insertion, many delicately designed neighbourhood operators are developed and applied in various metaheuristics, such as λ-opt [49], or-opt [50], Cyclic Transfers [51], 2-opt* [52], CROSS-exchange [53] and so on. More successful single solution-based approaches can be found in [54].
VNS systematically changes neighbourhood operators to explore search space and shows excellent performance in VRPs [55]. However, in real-life large-scale and highly constrained instances, the basic VNS structure often shows not powerful enough and promoted by being combined with other mechanisms [56]. To this end, Reinforcement Learning is introduced to VNS in our study. Reinforcement Learning (RL) is an adaptive learning and decision-making scheme, based on a probability distribution sampled over the candidate set. The probability distribution is continuously updated according to the reinforcement feedback from the learning environment [57].
RL scheme can be used in Adaptive Operator Selection (AOS), guiding the search process [58]. Credit Assignment mechanism and Selection Rule are the two essential components in AOS. According to the historical performance of operators, credit assignment mechanism updates the invoked probability (or weight) of candidate operators during the search, e.g. increasing or reducing the probability with elaborately designed reward function [59]. Then, the operators to be executed are chosen with a Selection Rule. Roulette Wheel scheme and Pursuit Algorithm [60] are the two commonly used selection rules. The former chooses operators on the basis of their probability distribution and the latter always selects the operator with the largest probability/weight. AOS has been widely used in metaheuristics. In population-based metaheuristics, credit assignment mechanism calculates the feedback values based on the quality of solution population [61,62]. To single solution-based approaches, RL and AOS has also shown outstanding performance in VRPPD and other VRP variants, e.g. in Adaptive Large Neighbourhood Search [23,63]. Some associated methodologies produce the best results on the benchmarks of classic VRP variants [64,65]. More applications of adaptive learning in single solution-based heuristics can be found in [58,66].
In metaheuristics, Sampling can be used to measure the fitness landscape surrounding a solution, providing guidance for the search. It shows powerful performance in Evolutionary Computation [67]. Soria Alcaraz et al. [68] propose an Evolvability Metric of Sampling, where two scalars are proposed as the indicators in the credit assignment mechanism during evolution. One indicator is the probability of the offspring having higher or equal fitness comparing to their parents, while the other indicator is the mean fitness of all sampling offspring. The proposed EA approach with Sampling produces promising results in three classic optimization problems. Whether Sampling also works in single solution-based heuristics and PVRPTW-O is worthy to be investigated further.

Algorithm Framework
A VNS algorithm with Reinforcement Learning and Sampling (VNS-RLS) is proposed in this paper, see the framework in Algorithm 1. The initial feasible solution Sol is constructed with a heuristic in Step 1. Then, at each VNS iteration, Shaking generates a new feasible solution Sol as the starting solution by perturbing Sol. In Step 2.2, with Sol , Sampling initializes a set of weights (WS) for all the operators in the neighbourhood operator set (NS). With the invoked probability distribution of operators, which is generated according to WS, Local Search explores better solutions iteratively (Step 2.3 ). These steps are repeated until the objective value has not been improved by 0.01% after a predefined number of iterations.

Algorithm 1 The VNS-RLS framework
Input: The set of tasks to be assigned and the scheduling horizon.

Start
Step 1: Produce an initial feasible solution Sol with a constructive heuristic. // Sec. 3.2.
Step 2: while Terminate condition is not met do Step 2. Eight improved neighbourhood operators, which require relatively less computation time, are adopted in VNS-RLS. These operators are devised considering the specific solution structure of the PVRPTW-O. The solutions of classic VRPs are 2-dimensional. However, as mentioned before, the solution of PVRPTW-O has one more dimension of shift. Previous research has shown that properly using operators with diverse degrees of perturbation can significantly improve search performance [47]. To systematically increase the diversification of the search in this tightly constrained problem, we specify the level of perturbation for each operator. The eight improved operators work at different levels (intra-route, intra-shift and inter-shift) with diverse levels of perturbations, details explained in Table 2.
Considering the time constraints in PVRPTW-O, the directions of the operated strings of tasks in the routes are kept in all the eight neighbourhood structures to reduce the constraint violation. Table 2. Operations of the eight operators within NS in VNS-RLS.

Operator Description
Intra-Shift 2-opt* Execute 2-opt* exchange [52], where the two chosen routes are randomly selected from the same shift.
Inter-Shift 2-opt* Execute 2-opt* exchange, where the two chosen routes are randomly selected from different shifts. Intra-Route Or-opt A string of task nodes is repositioned in the original route [50].

Intra-Shift Or-opt
A string of task nodes is repositioned from one route to another route in the same shift.
Inter-Shift Or-opt A string of task nodes is repositioned from one route to another route in a different shift. Intra-Route CROSS Swap two strings of task nodes in the same route. [53] Intra-Shift CROSS Swap two strings of task nodes from different routes in the same shift.
Inter-Shift CROSS Swap two strings of task nodes from two routes which are from different shifts.

Initial Solution Construction
A large number of constructive heuristics have been developed for VRPs. Saving Algorithm [69] always selects the operation which brings the largest travel distance saving. However, it shows worse performance on asymmetrical Capacitated Vehicle Routing Problem and cannot control the number of vehicles used [70]. The later constructive algorithms, e.g. Cluster First-Route Second [71], Route First-Cluster Second, Sweep Algorithm [72] and so on, perform better on instances where customers are geographically clustered around the depot. More constructive heuristics for VRPs can be found in [73].
Based on the above-mentioned methods, Solomon [5] proposes four constructive heuristics for VRPTW, while the Insertion-Based heuristic outperforms the other three. Insertion-based heuristics can be easily integrated with diverse insertion selection strategies. In addition, by adjusting the insertion strategy applied during the construction procedure, the number of vehicles used can be better controlled. At Ningbo Port, the nine container terminals are neither clustered nor uniformly located around the depot, and the triangle rules do not apply in its transportation network due to the real traffics, which make the other classic construction heuristics inapplicable or poor-performance under the circumstances. Considering the number of trucks in this problem is limited (100), an Urgency Level-based Insertion constructive Heuristic (ULIH) is thus developed for PVRPTW-O.
In ULIH, the solution construction is done shift-by-shift. Because the number of containers to be transshipped is large every day in Ningbo Port, to complete all the tasks before their deadlines, the tasks are classified into two categories to specific shifts: mandatory and optional. For each shift, the mandatory tasks will be assigned first. The urgency level of a task varies along with the change of the operating shift. In brief, to a feasible task i of shift s, if it must be completed no later than s, then it is mandatory to shift s; otherwise, it is optional. This classification is executed before the solution construction. Tasks are inserted into the scheduled routes, which means being assigned to trucks. When no feasible insertion available to existing routes, a new route will be created, meaning the number of trucks used (k) is increased by one. Considering the big tasks (with long service time) are harder to assign in scheduling, the biggest task is selected first in the newly created route.
Another strategy often used in practice is that tasks should be assigned as early as possible to avoid leaving too many unassigned tasks to later shifts. In real-life, there may be new tasks submitted to the scheduling system in real-time, so leaving more free time slots and trucks for later shifts can also improve the system reliability. Thus, after all mandatory tasks being assigned, optional tasks will be inserted until no feasible insertion or free truck is available (k ≤ K).
The insertion selection tactic determines the solution quality and constructive speed of insertion-based heuristics. Some criteria have been developed to select the best candidate and insertion position for routing problems [18,19], however, the greedy selection strategies dramatically increase the computation time in large problems. To reduce the computation time, some researchers randomly select the next insertion or narrow the selection range in each iteration [64]. Three insertion selection tactics are proposed in ULIH, which are introduced below: • Greedy Tactic. All unassigned tasks and insertion positions on all routes are evaluated, and the insertion which brings the lowest empty-load travel distance increase would be executed. This greedy tactic generates tighter routes but longer evaluation time is required. • First-Feasible Tactic. In this case, all tasks are sorted according to their closeness to their deadlines. This tactic always inserts the first task in the unassigned task set (closest to deadline) into its first feasible position. This strategy can construct routes more quickly, but sacrificing the solution quality as more trucks are required, and some mandatory tasks may not be assigned due to the lack of trucks. When K is small, this tactic should be used cautiously. • One-Route Tactic. Considering all the unassigned tasks, the insertion with the lowest empty-load travel distance increase on the latest created route execute with this tactic. This is a greedy insertion tactic on the newest route. It may cost more computation time than First-Feasible tactic, but produce tighter routes accordingly. This tactic trades off construction speed and solution quality. These three tactics are tested and compared on benchmark instances, and the comparison results are presented in Section 4.3.

Shaking
Shaking is often used in VNS algorithms to escape from the local optimum and diversify the search. The commonly used Shaking schemes include: randomly invoking neighbourhood operators, designing specific Shaking Operators and so on. To jump to farther areas from the current search region, in VNS-RLS, four neighbourhood operators which bring larger changes are employed in Shaking. Inter-Shift 2-opt*, Intra-Shift 2-opt*, Inter-Shift Or-opt and Inter-Shift CROSS are sequentially used to generate the starting solution of Local Search. Each operator is executed once, and the solution generated is passed to the following operator as the seed solution. Note that the new solution generated will be accepted in Shaking as long as it is feasible, even if its quality is worse. The aim of Shaking is to increase diversification and avoid search cycle, rather than myopic solution improvement.

Sampling of Neighbourhood
Sampling of surrounding environment can provide guidance for the search and has improved the search performance in many population-based algorithms. The impact of sampling on the search trajectory of single solution-based algorithms is investigated in our study. A sampling scheme is applied before the Local Search phase in VNS-RLS.
Based on the perturbed solution Sol generated by Shaking, every candidate neighbourhood operator NS i is sampled, producing d sampling solutions (Sol spl ij ∈ Sol spl i , j ∈ {1, ..., d}). In VNS-RLS, search with more sampling iterations explores the neighbourhood environment more thoroughly, however at the cost of a higher computation time. Our preliminary experiment results show that, when the number of samples (d) for each operator is higher, the performance of the algorithm is more stable with a lower standard deviation, while the computation time increases drastically. However, when d > 50, further improvement is not significant. In the experiments, no significant change on solution quality is observed along with the increment of d. To balance the evaluation time and the sampling approximation accuracy, each operator is thus sampled 50 times (d = 50). The obtained sampling solutions are then measured, while the feedback will be used to generate the initial weights of operators WS, providing an initial search direction to Local Search.
Three scalars (eqs. (19) to (21)) are defined to measure the sampling solutions. To operator NS i , the first scalar E a i indicates the probability of finding a solution which is not worse than the current solution. E b i presents the average solution quality of the solutions generated. ρ is a parameter for balancing the impact of E a i and E b i , which is set to 10 empirically. In the preliminary experiments, it is observed that more than half of the neighbourhood moves produce infeasible solutions. To reduce the infeasible moves in the search, a third indicator E c i is proposed in VNS-RLS, to indicate the probability of obtaining a feasible solution with operator NS i .
For each operator NS i , the overall evaluation value (E i ) is a combination of the above three metrics. Based on the preliminary experiment results, the weights of indicators are set as α = 0.5, β = 0.2 and γ = 0.3.
The weight (W i ) determines the probability of a neighbourhood operator NS i being invoked in the Local Search, stored in WS. To each operator, the initial W i is calculated according to the feedback of Sampling (E i ), see eq. (23). The sum of the initial W i of the P operators is 1. Let E min be the minimal E i among all operators, an operator with larger E i would have a larger initial W i . To avoid some of the weights being too small and the associated operators never being invoked, a minimal initial weight W min is adopted, which is empirically set to 5% in VNS-RLS.

Local Search with Reinforcement Learning
Algorithm 2 Local Search(Sol, Sol ,WS) Input: Current best solution Sol, starting solution Sol and the initial weight set WS (initial Wi).
Step 1: Generate the invoked probability of each operator based on its weight (Wi). In VNS-RLS, the Local Search is an iterative search with reinforcement learning, see Algorithm 2. Firstly, the initial invoked probabilities are distributed to operators according to W i . Using the Roulette Wheel scheme as the selection rule, in Step 2, one neighbourhood operator is selected and applied once to the current solution (Sol ), generating a new solution (Sol ). In Move or Not, Sol is evaluated to decide whether it is accepted as the new current solution. W i is then updated according to the evaluation feedback. These steps are repeated until no improvement is obtained after a predefined number (L max ) of evaluations. Based on the preliminary experiments, L max is set to 150.
In the Move or Not phase, a Record-to-Record Travel scheme [74] is used as the acceptance criterion. When Sol has better solution quality than Sol, or the difference of the solution quality between Sol and Sol is less than a predefined value (DEVIATION ), Sol will be accepted as new Sol . In our algorithm, DE-VIATION is set to 2 based on preliminary experiments. In practice, less than 2 km increase in total travel distance is acceptable.
The weight of NS i is updated during the search. In the literature, most learning indicators focus on the solution quality [64], while the feasibility of search attracts less attention. When updating the weight, a feasibility indicator (FI) is employed in VNS-RLS, which is just the feasibility of the newly generated solution. To reduce the computing time in calculating the updated weights, a simple Credit Assignment Mechanism is employed. If Sol is infeasible or cannot be accepted due to its low solution quality, a penalty of 10% reduction of weight will be applied to NS i . Otherwise, if the quality of Sol is better than the current best solution Sol, the weight will be increased 10% as a reward. The penalty and reward rates are set based on the preliminary experiments. More discussion on adaptive weight adjustment can be found in [57].
In our study, the contribution of the feasibility indicator and different components in VNS-RLS are analyzed and discussed, results presented in Section 4.5.

Benchmark Instances
Bai et al. [38] extract 15 instances from the real-life Ningbo Port dataset. In different instances, the planning horizons are 4, 6 or 8 shifts, respectively, of 384 up to 1073 tasks. An artificial instance set including 17 instances is created as well, with different feature combinations on time window tightness (Tight/Loose), workload balance at terminals (Balanced/Unbalanced) and planning horizons of 4 and 8 shifts. Especially, a very large instance with more than 2000 tasks is created as well. The detailed features of instances are presented in Table 3 and Table  4. The time window of a task, defined by the time it becomes available and the deadline it must be delivered to its destination terminal, varies from 1-2 hours, up to 6 shifts. Comparing to classic VRP benchmarks [5,75], it is easy to notice that the number of tasks in the Ningbo Port benchmark is quite big. To systematically test the performance of VNS-RLS on diverse instances, we have generated two more datasets scaled down to 25% and 50% respectively by extracting tasks from the complete dataset, keeping the original features (available at http://www.cs.nott.ac.uk/˜pszrq/benchmarks.htm).

Exact Approach
In Section 2, the new mathematical model of PVRPTW-O is established. CPLEX is employed to solve it, adopting the default parameter setting. Because CPLEX is a MILP/MIQP solver, some modification should be made to the original model in Section 2 to adapt to the solver. In the original model, the constraints (6) - (8) are nonlinear, which are reformulated to the linear form as follows, The original constraint (6) is linearized to three constraints (24)-(26), correspondingly, the constraint (5) is reformulated to (27). Here wait j is a variable that has been introduced to linearize the function max, and M is a parameter with very large value. In constraints (28)-(30), the slack ij are variables to force T j to be larger than the assigning B i . In constraint (28), only the assigned B i is positive value from this formulation. The non-assigning B i will be deducted by M so that they are negative. Correspondingly, T j is required to be larger than the summation of slack ij in constraint (30). Note that there is only one slack ij in i that is not zero (according to constraints (2) and (28)), so i slack ij equals i max(slack ij ). Instead of intuitively formulating constraint (30) to T j ≥ slack ij for all i, j ∈ N , the current formulation reduces the number of constraints in the model from |N | · |N | to |N |. Constraints (31) and (32) replace the original constraints (7) and (8). Adding extra cutting plane constraints to the problem model (e.g. subtour elimination constraints) might reduce the scope of search, however, we do not add them in order to avoid increasing the memory required for loading the model in CPLEX.
In the preliminary experiments, the solver runs on a machine of 16 cores (2.6 GHz), 16 GB memory and 24 hours running time limit. The solver found feasible solutions for 7 out of 16 the 25% scaled down artificial instances and 2 out of 15 real-life instances, while on the other instances ran out of memory. Therefore, we increase the memory resource to 100 GB while the same running time limit is kept in the later experiments. The results are reported in Table 5 and Table 6. In these tables, the objective value of a solution is converted into Heavy-Loaded Distance Rate (HLDR), which is widely used by logistic companies in practice, see eq. (33). Note that objective (1) is equivalent to (33).
By using HLDR, the problem becomes a maximization problem. In Tables 5  and 6, NF means no solution is found. Bound is the upper bound of HLDR obtained by CPLEX, while Time is the actual runtime. + indicates that the memory is insufficient to generate a feasible solution, and * represents that the search reached the memory limit while a feasible solution is obtained.  It can be found that, given a large amount of computing resources, it is still difficult to obtain optimal solutions on all the 25% scaled down instances. Except TU4-7, on other instances, the gaps between the obtained solutions and the upper bounds are quite large. On NP6-4 and NP8-4 no feasible solution or upper bound were found by CPLEX, due to memory overflow. It is no doubt that there must be other exact methods may find better solution than CPLEX on this benchmark. However, it still can be concluded that, exact search would be time and resource-consuming in this large scale and nonlinearly constrained problem. The results obtained on the 25% instances provide the upper bounds of the optimal solutions, which can be used to estimate the performance of our proposed heuristic algorithms.

Construction Scheme Selection
Completing all tasks on time with the limited vehicle resource is the primary job to logistic companies. Currently, considering the fleet owned by the company and outsourced vehicles, the truck resource in the Ningbo Port is relatively sufficient. But when the number of tasks is large, efficiently using the limited trucks becomes critical. In Section 3.2, three insertion selection tactics are proposed for solution construction. Nine different combinations of tactics are tested on the 100% benchmark to choose the recommended heuristics. Comparison results are presented in Table 7. One-Route One-Route 58.54% 620 Among these nine heuristics in Table 7, the three tactics are applied to mandatory tasks and optional tasks, respectively. The runtime is obtained on a PC with i7 CPU 3.2GHz and 6 GB memory. It can be found that, heuristics 1 and 7 obtain the best solution quality (the highest average HLDR), while the runtime are significantly higher than the other heuristics. Besides, when Greedy is used on mandatory tasks and First-Feasible is applied to optional tasks (heuristic 2), the fastest constructive speed is obtained, while the obtained average HLDR is not the worst. Applying First-Feasible to both mandatory and optional tasks (heuristic 5) does not bring the fastest heuristic.
The details of constructed solution quality is presented in Tables 8 and 9. It can be found that applying the Greedy heuristics to optional tasks (heuristics 1, 4 and 7) obtains significantly better objective value than those of First-Feasible (heuristics 2, 5 and 8). The HLDR of tactic One-Route group (heuristics 3, 6 and 9) is between the above-mentioned two groups. However, to mandatory tasks, no tactic shows obvious better performance than the others. It can be concluded that, the tactic used on optional tasks is the key issue in devising a constructive heuristic. This is because, in practice, optional tasks always account for a major proportion of total unassigned tasks in one shift, and also mandatory tasks have less available insertion options due to their closeness to their deadlines. On real-life instances, 72% tasks in a shift are optional on average, while the proportion on artificial instances is affected by instance type. On loose instances, the optional task rate can be 100% in every shift, while there may be around 50% mandatory tasks on tight instances. Therefore, the tactic for optional tasks has a greater influence on solution construction. These constructive heuristics have not shown bias on specific features of instances. Overall, the obtained HLDR values span from 46.79% to 69.93%, higher than 50% on most instances. It should be noted that, two pairs of heuristics (1,7) and (3,9) have constructed the same solutions while heuristics 7 and 9 spend less computation time. It is because of that, when the candidate tasks are few, if the existing routes are too tight to insert other tasks, Greedy tactic can only assign a task to the newest route. The same thing occurs to tactic One-Route in the same case, but its computation time is less since it only evaluate one route. As there are few mandatory tasks which are always assigned first in each shift and no randomization in this heuristic, applying tactic Greedy or One-Route to mandatory tasks may generate the same partial routes. In this case, if the tactics applied to optional tasks are the same, their final routes (solutions) generated would be the same as well.
In metaheuristics, the quality of initial solutions does not always directly determine the quality of final improved solutions. Our study also finds that better Table 9. HLDR of initial solutions constructed with the nine construction heuristics (on 100% artificial instances). Best obtained HLDR are in bold. initial solutions cannot guarantee better final solutions in VNS-RLS. Considering all the nine heuristics can produce feasible solutions on the benchmark instances, in normal cases, the construction heuristic 2 is recommended, as it has in general the fastest construction speed. The two main constraints of PVRPTW-O are the time constraints and the limited trucks. To evaluate and identify effective heuristics in tight constrained scenario, we compare the performances of the nine constructive heuristics on the largest instance of 2,614 tasks, results presented in Table 10. It can be found that, when tactic First-Feasible is applied to optional tasks (heuristics 2, 5 and 8), the quality of solutions is poor (lower HLDR and more violations) with around 23 hours computation time. Heuristics 1 and 7 produce the same best solutions, while heuristic 7 takes less computation time (7 hours less). Heuristic 7 is thus chosen for this kind of very large instances. Other solutions with acceptable quality are generated when tactic Greedy or One-Route is applied to optional tasks (heuristics 3, 4, 6 and 9).

Impact of the Sampling
To evaluate the contribution of Sampling introduced in Section 3.4, a variant of VNS-RLS without the Sampling scheme and with equal initial weights for all candidate operators is implemented. This variant of VNS with Reinforcement Learning (VNS-RL) is tested on benchmark instances of different sizes. The categorized results of 30 runs on 25% scaled down instances are presented in Appendix  Tables 15 and 16. It can be summarized that VNS-RLS performs better than VNS-RL in both the best found and average HLDR on 19/31 of all instances, but more evaluations were conducted. On 26/31 instances, both VNS-RLS and VNS-RL obtain better results (higher HLDR) than CPLEX with previously mentioned configuration, obtaining feasible solutions on all instances while CPLEX fails on 13 instances.
Results of large 100% instances are presented in Appendix Tables 17 and 18. In 16/31 instances, solutions with better best-found and average HLDR are found by VNS-RLS. In [38], the HLDR for the relaxed problem removing the unloaded travels from and to the depot are calculated as the upper bounds of the benchmark. Since PVRPTW-O can be relaxed to the same form, those upper bounds are also adopted in this study. In the experiments, the gaps between the obtained solutions and upper bounds are less than 5% on 14 instances. Especially, on the instances of NP4-2, NP4-5, NP6-4, NP8-4 and TU8-8, the objective values of solutions generated are very close to the upper bounds.
However, the contribution of Sampling on improving the objective value is not significant. The t-test results on VNS-RLS and VNS-RL (see Tables 11 and 12) show that the improvement is significant on only 25% instances. This observation indicates that, by sampling on the search trajectory of single solution-based algorithm, the information collected may not sufficiently reflect the neighbourhood environment and provide valuable guidance for the search. In other words, limited Sampling points in single solution-based approaches is not so helpful in improving objective value comparing to its impact in evolutionary approaches. The further discussion on the impact of Sampling indicators is presented in Section 4.5. Table 11. T-test between VNS-RLS and VNS-RL on 25% scaled down and 100% real-life instances. (Y means significantly different, while N means not.) The same observation is found on the medium size 50% scaled down instances (see Appendix Tables 19 and 20). VNS-RLS outperforms VNS-RL on 16 out of Table 12. T-test between VNS-RLS and VNS-RL on 25% scaled down and 100% artificial instances.
instances and higher stability (smaller standard deviation) is obtained on 17 instances, which is not significant better. In this single solution-based algorithm, no significant solution quality improvement is achieved by adopting Sampling.

Feasibility Indicators & Feasible Solution Space Exploration
In Sampling and Local Search, feasibility measure is employed aiming to improve the search efficiency. The hypothesis is that, by using the feasibility indicators, the infeasible solutions explored during the search will decrease. To validate this hypothesis, the feasibility indicators in Sampling (E c ) and Local Search (FI) are removed respectively leading to three new algorithm variants (VNS-RLS-NoEc, VNS-RLS-NoFI, VNS-RL-NoFI). The detailed experiment results are also presented in Appendix Tables 16 -19. Comparing the best and average solutions, VNS-RLS-NoFI produces the most best results on medium and large instances, while VNS-RLS-NoEc outperforms the other variants on small instances.
We record the amount of infeasible solutions explored over every 10,000 evaluations during the search and present the trend of infeasible solutions explored by the five algorithm variants. Figure 3 presents the typical results of two instances. It is easy to see that, without Sampling, both VNS-RL and VNS-RL-NoFI explore much more infeasible solutions than the other three variants throughout of the search. The infeasible rates of the three VNS-RLS variants decrease at the beginning of search and stay at a low level afterwards, showing the effects of Sampling on reducing infeasible neighbourhood moves. At each VNS iteration, by estimating the surrounding environment of the starting solution, Sampling leads Local Search to the search regions with high feasibility. Using feasibility indicators in both Sampling and Local Search, VNS-RLS achieves the lowest infeasible rate dring the search. In addition, in Figure 3, the two lines of VNS-RL and VNS-RL-NoFI are shorter than the other three VNS-RLS variants, indicating the two variants converge earlier.
It can be concluded that, both Sampling and Feasibility indicator can reduce the infeasible rate of neighbourhood moves, while Sampling brings a greater impact, obtaining better solutions within the same computational time. Sampling, even without the feasibility indicator E c , can significantly decrease the infeasible rate. This indicates that the other two indicators (E a and E b ) apt to guide the search to feasible solution regions, meaning the high quality neighbourhood has higher feasibility. The use of feasibility indicator in Sampling and Local Search contributes to a higher probability of exploring feasible neighbourhood moves during the search.
It should be noted that, the lowest infeasible rate does not necessarily contribute to the best objective function value. The removal of feasibility indicators may also increase the flexibility and diversification of the search, thus improves the search performance, e.g. on the LB8-1 100% instance, VNS-RLS-NoFI is the only one variant which finds the solution with objective value 87.49%. It is almost 2% higher than the other four variants. Overall, VNS-RLS-NoEc and VNS-RLS-NoFI produce the most best-found solutions and average solutions in all the five variants.

Comparison with the general VNS and State-of-the-art Algorithms
PVRPTW-O is a new model in the literature, thus there is not many existing algorithms applied to it directly. A general VNS algorithm (VND) [55] with the same operators as VNS-RLS is implemented to be compared. Besides, considering the similarity between PVRP and PVRPTW-O, two state-of-the-art methodologies for PVRP, namely RVNS [31] and FVNS [8], are also modified (e.g. initial solution construction, side constraints consideration and problem specific parameter setting) and applied to PVRPTW-O for comparison in our research. Both RVNS and FVNS use the VNS framework as well. Apart from the different operators used (i.e. different shaking operators and neighbourhood structures), the main difference between them is that the shaking operators are randomly selected in RVNS but applied in a fixed sequence in FVNS. In addition, due to the time constraints and multiple periods in PVRPTW-O, algorithms for Open-VRP and Open-VRPTW are either inapplicable or not suitable for PVRPTW-O, thus they are not compared in this study. Chen et al. [33] develop an adaptive large neighbourhood search algorithm (VD-ALNS) which has large perturbation in the search.
-0.19% -0.15% -0.04% -0.07% -0.06% -0.16% -0.13% -1.10% -0.08% -0.37% -0.14% The comparison results are presented in Table 13, which gives the deterioration comparing to the results of VNS-RLS-NoEc. Generally, our algorithm outperforms the other four algorithms on most categories of instances on both the solution quality and stability. VND produces significantly better results than RVNS and FVNS, while RVNS and FVNS virtually tie. It indicates that the customized neighbourhood operators in VNS-RLS contribute much to the improvement of algorithm performance in this problem, since RVNS and FVNS use different neighbourhood operators. The operators in VNS-RLS bring systematical exploration and perturbation at different levels, cooperating with the proposed reinforcement learning scheme and the effective indicators, thus exploration and exploitation are better balanced in the search. The solution deterioration is smaller on tight and unbalance instances (TU), which shows the stronger search ability of VNS-RLS on the larger search space caused by the loose time windows and balanced workload. However, RVN and FVNS show better stability on tight instances (TB4, TB8, TU4 and TU8) than VNS-RLS. In terms of VD-ALNS, it shows lower stability than VNS-RLS. Except LB4 and LB8 instances, VNS-RLS outperforms VD-ALNS on vast majority of other instances. The larger perturbation in VD-ALNS achieves more solution improvement on loose and balance instances.

Comparison with the Previous Scheduling Scheme
Comparing the PVRPTW-O model with the study of [38,41], the biggest difference is that the shift-change between day-shift and night-shift can occur at any terminal in PVRPTW-O. The open routes in each shift may bring the growth of HLDR to the fleet. Table 14 presents the increase of HDLR obtained by applying PVRPTW-O, where the values are the differences of HLDR between the best-found results on both models. It can be found that the use of PVRPTW-O increases the utility of vehicle to varying degrees, while the improvement on artificial instances is larger. This scheduling scheme can further save operating cost and promote efficiency for the company.

Conclusions
A Periodic Vehicle Routing Problem with Time Windows and Open Routes (PVRPTW-O) derived from a real-life container transportation problem is investigated in this paper. A node-based mathematical model is established by combining service activities into task nodes. This problem model has a huge solution space with nonlinear constraints. The study shows that, comparing to the previous scheduling schemes, the utility of vehicles is increased in the PVRPTW-O model. The experiment results on both real-life and artificial benchmarks indicate that it is unrealistic to address this problem with exact methods even with a large amount of computation resources, thus a Variable Neighbourhood Search algorithm with Reinforcement Learning and Sampling (VNS-RLS) for PVRPTW-O is developed accordingly.
In VNS-RLS, an urgency level-based insertion heuristic is devised to construct initial feasible solutions. After classifying tasks according to their urgency levels (mandatory and optional), nine different insertion selection tactic combinations are investigated and compared. Experiment results show that the tactic applied to optional tasks mainly determines the performance and effectiveness of the heuristic. When the resource of trucks is sufficient, applying the First-Feasible tactic to optional tasks is of the fastest speed. In the case of extremely large amount of tasks, One-Route tactic on mandatory tasks and Greedy tactic on optional tasks is recommended to produce better solutions within a less computation time.
In the improvement phase, a Sampling scheme is proposed to investigate the neighbourhood of the starting solution, providing an initial search direction for the local search with reinforcement learning. Experiment results show that the contribution of Sampling to solution improvement is not significant. This indicates that, Sampling is not as powerful as it is in evolutionary algorithms due to search region changes significantly in single solution-based metaheuristics. In addition, to reduce the number of infeasible solutions generated in the search, a feasibility indicator is used in both phases of Sampling and Local Search. It is found that Sampling greatly reduces the infeasible solutions generated, while the feasibility indicators make further contribution as well. Compared to the results of exact method and solution upper bounds, the proposed algorithms generate promising results with strong search ability within large search spaces.
On several instances, the gaps between the obtained objective values and upper bounds are still large. Considering that the operators used in VNS-RLS bring relatively small changes to solutions, advanced techniques with balanced exploration and exploitation are worthy of further investigation in future work. In addition, more practical objectives and constraints in real-life problems, such as the balance of workload and carbon emission reduction, may also be introduced to the PVRPTW-O model.