Multi-behaviors coordination controller design with enzymatic numerical P systems for robots

Membrane computing models are parallel and distributed natural computing models, often referred to as P systems. This paper proposes a novel multi-behavior coordination controller model using enzymatic numerical P systems for autonomous mobile robot navigation in unknown environments. An environment classifier is constructed to identify different environment patterns in maze-like environments, and a multi-behavior coordination controller is constructed to coordinate the behaviors of the robots in different environments. Eleven sensory prototypes of local environments are presented to design the environment classifier, which needs to memorize only rough information, in order to cope with poor obstacle clearance and sensor noise. A switching control strategy and a multi-behavior coordinator are developed without detailed environmental knowledge or a heavy computational burden, avoiding local minimum traps and oscillation problems and adapting to unknown environments. A serial behavior control law is also constructed, on the basis of Lyapunov stability theory, for each specialized environment, to realize stable navigation and avoid actuator saturation. Moreover, both the environment classifier and the multi-behavior coordination controller are amenable to the addition of new environment models or new behaviors, thanks to the modularity of the hierarchical architecture of P systems. Simulations of wheeled mobile robots show the effectiveness of this approach.


Introduction
P systems (PS) are bio-inspired parallel distributed computing models [40,48]. Many variants of P systems have been introduced, inspired by biological phenomena such as the functioning and inter-cellular communication of cells and neurons [50,38]. The computing power and complexity aspects of these models have been studied extensively [2,11,26,36,58]. Moreover, membrane computing models with a parallel distributed architecture and membrane creation, deletion and division operations can generate exponential workspace, and these variants can solve computationally hard problems, i.e., NP-complete and PSPACE-complete problems, in polynomial or even linear time [2,11,44,37].

1 School of Electrical Engineering, Southwest Jiaotong University, Chengdu, China, Email: 491098063@qq.com; School of Computer Science, Jubilee Campus, Wollaton Road, Nottingham NG8 1BB, United Kingdom, Email: ferrante.neri@nottingham.ac.uk.
In recent years the use of membrane computing models to solve real-life problems has also gained interest, especially engineering problems [34]. Some variants such as spiking neural P systems [39,66] have been used for fault diagnosis of power systems [70,71,52] and for image processing [53]. Spiking neural P systems are an important paradigm of membrane computing that combines membrane systems with spiking neural networks [17,18,6,16,21]. Numerical P systems have also been used in robotics [8,41,61], and tissue P systems in image segmentation [10,45]. Another important application of P systems is solving optimization problems [69]; important studies include parameter optimization in manufacturing [68] and combinatorial optimization [71].
Among the many variants of P systems, Numerical P Systems (NPS) [49] and Enzymatic Numerical P Systems (ENPS) [61] are among the most successful, due to their high performance in robot control applications [7,8,41,63,13], especially for modular and complex tasks of autonomous mobile robots (AMR); see also [12,42,65,47,3]. The success of NPS and ENPS in robotics is due to the inherently parallel and distributed nature of these P systems, together with their powerful numerical computation capability [43].
One of the most fundamental problems in robotics is to obtain a path for the robot from the starting point to the goal [59,62]. When a robot moves in a complex and unknown environment, it faces many possibilities. To reach the goal by overcoming various difficulties is the main challenge. These problems can be solved by recognizing the environment patterns, planning a path, positioning and executing the navigation safely and efficiently [55,64,72,29]. The concept of controllers based on numerical P systems was introduced in [8] and has been further discussed in [63] to design controllers of autonomous mobile robots using ENPS and to solve simple navigation tasks. This work investigates the navigation of the robots in more complex environments by means of controllers based on ENPS. We aim at studying ENPS controllers for autonomous robots which can identify multiple environment prototypes and coordinate the behaviors of the robots within them.
In this study, an environment classifier and a novel multi-behavior control approach based on ENPS are proposed to enhance the reactive navigation performance of the AMR. The novelty of this approach lies mainly in three aspects: (1) 11 prototypes of comprehensive topological maps describing the local environments are considered together to design the classifier for the environment identification module; (2) a multi-behavior coordination membrane controller (MBCMC) is presented for the behavior coordinator module; (3) a serial control algorithm is developed to guide the AMR to avoid obstacles, reach the target, follow walls, etc.
In order to reduce the impact of sensor noise and poor obstacle clearance, the membrane classifier is designed on the basis of a "binarized rough model" to produce the precise desired environment pattern, which is used as the input of the behavior coordinator module. The behavior coordinator uses an enzymatic numerical P system to integrate specific behaviors through a well-thought-out local path planning strategy (i.e., path planning in an unknown or partially unknown environment), without a large memory footprint or a heavy computational burden. The specific behavior control algorithm is designed on the basis of Lyapunov stability theory to produce the precise desired velocity. Furthermore, the effectiveness of the introduced control approach is verified on a simulated AMR.
The remainder of this article is organized as follows. Section 2 describes Multi-Behaviors Dynamic Selection Problems (MBDSP). In Section 3, we depict the proposed behavior based membrane controller in detail for solving MBDSP of AMR reactive navigation. Section 4 presents simulation results. Conclusions are drawn in Section 5.

Multi-Behaviors Dynamic Selection Problems and ENPS
Autonomous robots are capable of self-judgment and independent navigation in an unknown environment. We describe the AMR mechanical system and the MBDSP in the following sections.

AMR description and Problem Statement
In this study, the AMR mechanical system schematic, shown in Figure 1(a), consists of two actuated wheels and a back unpowered universal wheel. The passive wheel does not affect the degrees of freedom of the kinematic model, and the motion is subject to the nonholonomic constraint

ẏ · cos θ − ẋ · sin θ = 0. (1)

The posture of the AMR in the global coordinate frame XOY is represented by the Cartesian coordinate vector with three degrees of freedom p = (x, y, θ)^T. The positive direction of θ is anti-clockwise and gives the heading angle of the robot. The motion of the AMR is determined by the linear velocity v and the angular velocity ω, denoted by the vector V = (v, ω)^T. Note that the two wheels are driven by independent torques from two DC motors, the radius of each wheel is r, and the distance between the two driving wheels is 2R. It is assumed that the AMR mass center is located at O_c and that the wheels are non-deformable. The kinematic model of the AMR can then be written as

ẋ = v cos θ, ẏ = v sin θ, θ̇ = ω, with v = (v_r + v_l)/2, ω = (v_r − v_l)/(2R), (2)

where v_r and v_l are the linear velocities of the right and left wheel, respectively; see [51].
Now we discuss what Multi-Behaviors Dynamic Selection Problems (MBDSP) are. Imagine that a robot wants to reach some destination in an unknown environment. At first, the robot follows the planned path and avoids any obstacle blocking the path. If the obstacle is very large, it may decide to travel along the periphery of the obstacle. Many unknown situations can therefore appear in front of the robot, and it must be able to handle its movements safely and effectively. Hence a group of distinct behavior modes is needed to coordinate the robot at each time instant. This is the so-called Action Selection Problem in robotic reactive navigation [46], which we refer to as the MBDSP. Reactive navigation is one of the most challenging problems in AMR. Behavior-based systems have proved to be very responsive to unknown environments, and the performance of reactive navigation relies greatly on the behavior selection mechanism module. Moreover, several aspects of the MBDSP deserve particular attention: (1) Behavior control law model: current controllers usually implement sense-plan-act processing separately and do not deliberately consider a unified kinematic control law model across different behaviors, while robots need to wander freely not only in mazes but also in outdoor and indoor unknown environments. (2) Control architecture mode: current action-based architectures are not clear about how to design an architecture that allows dynamic switching among different types of behavior (such as reactive or reflective behavior) selection strategies. (3) Multi-behavior coordination mode: an AMR can very easily fall into a local minimum trap when reactive navigation has no prior knowledge of the complex environment; this can also be caused by the first two factors, but an excellent coordinator prevents these faults.
Hence, the dynamic switching strategy, the subdivision of different types of behaviors and the design of the corresponding control laws are introduced. Furthermore, the behaviors usually needed for an AMR to wander freely in an unknown environment (including outdoor, indoor and maze environments) are defined as follows:
* Self rotation;
* ...;

Related Work
The world's first intelligent mobile robot, Shakey [20], was developed at the Stanford Research Institute (SRI) in the late 1960s. Since then, more and more advanced modern control approaches have been proposed and successfully applied to AMR in industrial contexts [63]. These control approaches can be classified into the following categories according to the underlying control theory: (1) artificial potential field (APF) [23], (2) vector field histogram methods [30], (3) virtual target approaches [67], (4) dynamic window approaches [19], (5) fuzzy logic control (FLC) [25], (6) neural network methods [56], (7) Bug methods [28], and many others. Among the various local or reactive navigation methods, some problems persist, such as local minimum traps, complex scenarios and lack of prior knowledge.
The well-known traditional APF [23] and its extended methods [24,25,72] are suitable for low-level on-line control in dynamic environments and have low processing needs, but they suffer from local minima [72], which requires global knowledge of the environment at a higher layer. The Bug family methods [15,28] are inspired by a bug's behavior of crawling along an obstacle. These approaches are well known for local navigation with minimal sensing, as well as for short run times, short planned paths, simple algorithms and good performance. However, their performance depends on the shape of the obstacles in the environment, and they need some global visual information. Moreover, Bug family algorithms usually ignore the robot's practical constraints (e.g., kinematic or dynamic constraints). FLC is one of the most fundamental methods and is widely used to coordinate the numerous basic tasks involved in path planning of behavior-based robots. Many FLC approaches combined with other complementary techniques have been developed to solve mobile robot navigation problems in obstacle avoidance [25,27], path tracking [4] and behavior coordination [22,60]. Although FLC rules offer a possible implementation of human knowledge and experience that does not require a precise analytical model of the environment, they cannot obtain the optimal solution and mostly fail when dealing with trap situations and complex scenarios [31].
AMR behavior-based reactive navigation usually involves many aspects, such as environment identification, control structure, dynamic behavior selection strategies, the robot's physical setting, etc. Studies of the MBDSP [31,32,33,35,51] usually emphasize one or two of these aspects, while the other properties are simplified or ignored.
In this study, most of these aspects are carefully considered in order to obtain the desired behaviors for the corresponding environment models and to reduce the influence of local minimum traps in complex unknown environments. Unlike APF and Bug family methods [15,24,72], which largely ignore the robot's physical characteristics, in this paper the kinematic behaviors are designed via Lyapunov theory in accordance with the robot's characteristics, making them suitable for indoor and outdoor environments. The design of specialized behavior control laws is beneficial for multi-behavior coordination. This study also uses an enzymatic numerical P system to improve the parallel computation performance of the environment classifier and the behavior coordinator; the computations are thus flexible and compatible with reactive navigation. For comparison, some studies emphasizing the advantages of parallelized models are reported in [57,54,1].

Enzymatic numerical P systems
ENPS are naturally distributed and parallel computing models in which numerical variables store information. The evolution rules in each membrane region can iterate simultaneously according to their activation conditions and transmit information between the nodes (membranes). A standard ENPS is defined as follows [61]:
1. m ≥ 1 is the degree of the system (the number of membranes);
2. H is an alphabet of membrane labels;
3. µ is a membrane structure;
4. Var_i is the set of variables from membrane i, and Var_i(0) gives the initial values of these variables;
5. E_i is a set of enzyme variables from membrane i, i.e., E_i ⊂ Var_i;
6. Pr_i is the set of programs (rules) in membrane i, each composed of a production function and a repartition protocol, which take the following two forms.
• Enzymatic form: Pr_j,i = (F_j,i(x_1,i, ..., x_{k_i},i), e_j,i, c_j,1|v_1 + ... + c_j,n_i|v_{n_i}), where F_j,i is the production function of the j-th program; k_i is the number of variables in membrane i; c_j,1|v_1 + ... + c_j,n_i|v_{n_i} is the repartition protocol; and n_i is the number of variables contained in membrane i plus the number of variables contained in the children and parent membranes of i. The computed value is distributed to the variables v_1, ..., v_{n_i} according to their corresponding coefficients c_j,1, ..., c_j,n_i at time t. • Non-enzymatic form, which is just like the standard NPS: Pr_j,i = (F_j,i(x_1,i, ..., x_{k_i},i), c_j,1|v_1 + ... + c_j,n_i|v_{n_i}). Inspired by the catalyzing reactions of biological enzymes, the enzymatic action in the ENPS model selects the valid rules. We illustrate how an ENPS works in Figure 2, where membrane M_1 contains four variables x_11[2], x_21[4], x_31[1] and e_11[5], one production function x_11 * x_31 + x_21 (e_11 →) and one repartition protocol 1|x_11 + 2|x_31. In this case e_11 > min(x_11, x_21, x_31), i.e., the enzyme value exceeds the smallest involved variable value, which indicates that the reaction can take place. The production function is computed as F = 2 * 1 + 4 = 6; then the "unitary portion" q is obtained by dividing F by the sum of the repartition coefficients, q = 6/3 = 2, and is distributed to the variables v_1, ..., v_{n_i} according to their corresponding coefficients at time t + 1. So in this case the new values are x_11 = q * 1 = 2 and x_31 = q * 2 = 4.
ENPS have flexible computing features. Owing to the hierarchical membrane structure, with multiple rules per region, enzyme variables can be used for conditional transmembrane transport and can decide the direction in which the rules evolve. The active rules are performed simultaneously inside their membranes, unnecessary rules are not carried out, and the results are distributed in a globally uniform way. The computing power of ENPS and the efficiency of the membrane structure representation for designing robotic behaviors have been investigated in [61] and [8], respectively.

Design of Environment Classifier and Behavior Coordination Controller

Design of Environment Classifier
In order to respond with the appropriate behavior, the AMR must first know the relationship between its current status and the local environment. The output of the environment prototype serves as the essential environmental feature for navigation, so unnecessary details need not be stored or processed.

Local environment prototype:
Based on our understanding of outdoor and indoor navigation, there are ten cases [56] for a robot, such as following a left-side wall, wandering in an open area, crossing a corridor or meeting a right-side obstacle, etc. Figure 3 lists these ten cases. Before the various local environments can be classified from sensor data, the robot's sensor features must be defined. In order to reduce the cost of the sensor device, the e-puck has only eight infra-red (IR) distance sensors around its body, as shown in Figure 4(a). Figure 4(b) shows the layout of the sensors IR_1...8 and their probing directions, viewed from the top of the robot. The values from the 8 IR sensors are grouped into (G_0, G_1, ..., G_6) as the robot meets an obstacle or follows a wall. For instance, in Figure 4(b), the values of the groups (G_4, G_5) and (G_1, G_2) will be bigger than those of the other groups under the left-wall and right-side-obstacle conditions, respectively (a bigger value means a smaller distance to the obstacle). Figure 5 shows the 11 sensory patterns registered for the entire prototype environment, which correspond to the 10 cases of maximum possibility according to the assumptions in Figure 3; NO represents that there is no obstacle in the environment.

Environment classifier design based on ENPS:
In this paper, we propose a local environment classifier based on ENPS to quickly identify the sensory patterns when the AMR is surrounded by obstacles. Fast and accurate environment classification is beneficial for responding with the appropriate behavior. As shown in Figure 6, the environment classifier is designed as a membrane system with a hierarchical membrane structure containing four membranes. The inner membrane Compute Environment Model is used to match the 11-case environment model. According to the sensor data, it has 11 variables, where s_j = (sensor_j − p_ij)^2, (j = 1···8), represent the 8 infrared sensor match errors.
The p_ij (i = 1···11) represent the 11 cases of environment patterns in Figure 5. The enzyme E_c[v_in] takes the threshold input value v_in as its initial value and is used to decide whether the rules Pr_i_j, sensor_j should be executed, according to the values of the variables s_1...8.
Rule Pr_i_j, sensor_j is executed when sensor_j is matched with the i-th environment pattern patt_i_j successfully, and the variables sum_i[0] and Sum_all[0] are assigned the value 1 simultaneously.
Again, sum_i is used to store the number of successful matches of the i-th pattern (i = 1···11), where a larger value represents a higher match degree. Then the numbers are sorted from big to small. The variable Sum_all stores the total number of successful matches over the 11 sensory patterns, which is further used to implement accelerated sorting instead of a traditional sorting method (such as bubble sort, shell sort, etc.).
The inner membrane Find Out Several Possible Pattern is designed to find the several most likely patterns, using nine variables, where
(1) S_aver[0] is the average number of total successful matches, computed through rule Pr_1, pattern_i (i.e., the first program for pattern i = 1, 2, ..., 11);
(2) the variable pat_i[0] represents the difference between sum_i and S_aver[0], obtained through the rule Pr_2, pattern_i;
(3) the enzyme E_aver[0] is combined with pat_i in Pr_3, pattern_i to verify whether this rule is applicable or not.
The execution of this rule means that this sensory pattern is a matching environment prototype and the next rules are applicable. The enzyme E_max[0] is set to 9 in Pr_3, pattern_i. Since the pattern variable sum_i must be less than 9, the rule Pr_4, pattern_i can be applied, and the enzymes E_pat_i[0] and E_patt_i[0] are set to the pattern value sum_i. The pattern sum variable M_sum[0] also accumulates one copy of sum_i. Then the rule Pr_5, pattern_i is executed, and the initial value 1 of the variable num_i[1] is accumulated into the sum variable Num_sum[0], which works as a counter.
The innermost membrane Find Out Optimal Model has two variables. The average variable S_pat_i[0] is assigned the number of the group patterns whose values are bigger than S_aver in membrane Find Out Several Possible Pattern. So S_pat_i must be larger than S_aver, and, combined with the enzyme E_patt_i, it decides whether rule Pr_2 can be activated. It can also find the optimal pattern, and the most possible result, the i-th pattern, is stored as the output in No. Note that the enzyme EH in the skin membrane Output Environment Model No must be assigned double the value of i. For instance, if the most possible pattern occurs at i = 1 and EH only receives one part of the value of i, then Pr_2, main in the skin membrane cannot be activated, because the initial value of the variable C_T is 1, and the computation cannot finish. This ensures that the rule Pr_2, main in the skin membrane is executed and the computation terminates. Meanwhile, the variable Out_no in the skin membrane collects the output result of the computation.

Dynamic Multi-Behavior Coordination
In order to explore complex and unknown environments, the AMR not only needs to be promptly reactive, but must also act safely and smoothly. Moreover, the AMR must be able to break away from local minimum traps and finally arrive at the goal. This section describes how these behaviors are coordinated by a dynamic selection mechanism. It should be noted that the control law design for all behaviors in this article is described in detail in the Appendix.

Multi-behavior coordination strategy
The proposed flow chart of dynamic multi-behavior selection is depicted in Figure 7. In Figure 7, Flag = 1 means the AMR is moving towards the goal until some "obstacles" are detected, where d_gr is the distance between goal and robot, defined as d_gr = sqrt((x_g − x_r)^2 + (y_g − y_r)^2), with (x_g, y_g) and (x_r, y_r) the coordinates of the goal and the robot, respectively. This distance should keep shrinking while the robot runs towards the goal; if, in contrast, it is growing, the obstacle avoidance or wall following mode is operating and the robot has moved away. The AMR can determine the exact status relationship between itself and the obstacle through the environment classifier at once.
The "obstacles" can be grouped as (1) obstacle cases, i.e., FW, LS, RS, (2) wall follow cases, i.e., LW, RW, HW, LC, RC (corridor walking is also classified under this case), and (3) dead end cases, i.e., TS, DE. The AMR might fall into a trap while avoiding an obstacle or following a wall. In order to resolve the local minimum problem, the AMR must work out the positional relationships among goal, obstacle, wall and robot, and must also check whether the distance d_gr is minimal and what kind of obstacle is around. For instance, if some obstacle or wall is located at the right side of the AMR according to the environment model case, the goal is located at the left side of the robot, and d_gr is minimal, then the AMR should enter the goal reaching mode. On the other hand, if the goal and the obstacle are located on the same side of the AMR, then even if d_gr is minimal, the goal reaching mode cannot be activated. Another such situation arises when, in order to get out of a maze, the robot has just passed the wall (obstacle) it was following.
In this work, a dynamic multi-behavior selection strategy is constructed to speed up behavior coordination by parallel processing. The corresponding coordination controller based on a P system is shown in Figure 8.

Multi-behavior coordination membrane controller design:
MBCMC is shown in Figure 8. It is designed by using a P system with a hierarchical membrane structure containing eight membranes. The skin membrane Main has 27 variables.
(1) C_i = (input_no − i)^2, i = 1, 2, ..., 11 are the environment case variables. These variables have the initial value (input_no − i)^2, where input_no is one of the 11 patterns output by the environment model membrane classifier.
(2) A_gr1 [input_angle] and A_gr2 [−1 * input_angle] are the angle variables and take the input value θ = θ_R − θ_G as initial value, which describes the positional relationship between robot and goal; θ < 0 means that the goal is located at the left hand of the robot, and vice versa (Figure 17(b)).
( ) All of the output variables have the initial value 0; when some behavior is triggered, the corresponding variable value is set to 1. (5) The variable T_h[1] is the threshold variable with initial value 1, and D_min[0] is the minimal-distance variable with initial value 0.
(6) The enzyme variables E_a[0] and E_T[0] have the initial value 0; when E_T is not equal to 0, the controller terminates. (2) Enzyme E_c has initial value 1 and is used to decide whether the 11 rules Pr_i=1,...,11, case should be executed or not, according to the environment case variables C_i=1,...,11. For instance, if input_no = 9, then the initial value of C_11 = (9 − 11)^2 = 4, hence enzyme E_c < C_11, so rule Pr_11, case cannot be activated. On the contrary, for rule Pr_9, case the initial value of C_9 = 0, hence E_c > C_9, and rule Pr_9, case is executed. Moreover, Com_de is set to 1, the U-turn behavior is selected, E_T is set to 1 and the controller ends.
The inner membrane Judge Distance If Minimal has two variables.
(1) The enzyme E_min_dist has the input value input_min_dist, which is the minimum of all the distances recorded between the robot and the goal.
(2) The variable D_cur has the input value input_cur_dist as initial value, which is the current distance between the robot and the goal. Together, these variables decide whether the rules Pr_1, dist_min and Pr_2, dist_min should be applied. If D_cur < E_min_dist, both rules are activated and the minimal-distance variable D_min[0] is set to 1. Meanwhile, the minimal current distance D_cur is collected as the output variable Output_dist_min[0]. The inner membrane Select Goal Reaching Case has two enzyme variables, i.e., E_sr_no[0] and E_gr_no[0], with initial value 0. This membrane is activated by the enzyme E_no = 1 in case 11 (no obstacle around the robot).
Rule Pr_1, not_any is used to judge whether the robot has made any reverse movement away from the goal; if so, the value of D_min equals 0, meaning the robot has just left the obstacle or wall case. The rule Pr_2, not_any is activated when E_sr_no = 1, and Com_sr_no is set to 1; the robot then implements the rotation mode, turning left or right according to the previous behavior case. For instance, the robot turns left when the previous case was left obstacle avoidance or left-side wall following, and vice versa. This mode helps the AMR to find the wall or obstacle surface again while running around the maze, or to avoid a U-shaped obstacle. Rule Pr_3, not_any sets the enzyme E_gr_no equal to 1 when D_min = 1 (movement trending towards the goal). Then E_gr_no = 1 activates the rule Pr_4, not_any, the robot enters the goal reaching mode, E_T is set to 1 and the controller ends.
The inner membrane Select Obstacle Avoidance Case has two enzyme variables, E_lr_ob[0] and E_RS_ob[0], and one sub-membrane. This membrane is activated by the enzyme E_ob = 1 in cases 6, 7 and 8 (front, left or right obstacle).
Rule Pr_1, obstacle is also used, like rule Pr_1, not_any in membrane Select Goal Reaching Case, to judge whether the robot has made any reverse movement away from the goal, and rule Pr_2, obstacle, Pr_3, obstacle or Pr_4, obstacle is activated according to the values of C_6, C_7 and C_8. For instance, if C_8 = 0, then the rule Pr_4, obstacle is executed, Com_right_ob is set to 1 and the robot implements right-side obstacle avoidance; moreover, E_T is set to 1 and the controller ends. On the other side, rule Pr_5, obstacle assigns E_RS_ob = 1 when D_min = 1 (movement trending towards the goal), and it activates the rules in the sub-membrane Judge Robot State Obstacle.
Judge Robot State Obstacle has 5 enzyme variables and 4 common variables.
The rule Pr_1, obstacle_rs is activated when C_6 = 0 (case 6: the obstacle is located in front of the robot), and both Com_front_ob and E_T are set to 1; the AMR computes the obstacle avoidance at once, without any further analysis. Rule Pr_2, obstacle_rs is activated in case 7: two contributions are assigned to O_left and one contribution to O_right, so O_left = −1 + 2 = 1 and O_right = −1 + 1 = 0 (meaning the obstacle is located on the left side of the robot). Rule Pr_3, obstacle_rs works in the same way to represent an obstacle on the right side. Both rules Pr_4, obstacle_rs and Pr_5, obstacle_rs are used to judge whether the goal is located on the left or the right side of the robot, according to the variables A_gr1 and A_gr2.
The enzyme E_og can obtain contributions from rules Pr_2,...,5, obstacle_rs. If E_og is large enough (2 in this controller), it activates rules Pr_6, obstacle_rs and Pr_7, obstacle_rs. Both rules are used to judge whether the obstacle and the goal are located on opposite sides of the robot.
If rule Pr_8, obstacle_rs or Pr_9, obstacle_rs is activated, both Com_gr and E_T are set to 1, and the AMR computes the goal reaching. On the contrary, if the rules Pr_10,...,13, obstacle_rs are executed because the obstacle and the target are located on the same side of the robot, Com_left_ob or Com_right_ob is set to 1, and the AMR continues obstacle avoidance despite being closer to the goal.
The operating mechanism of the inner membrane Select Wall Follow Case and its sub-membrane Judge Robot State Wall for wall following is similar to that of obstacle avoidance. To restrict the length of the paper, we do not describe it further.

Simulation Results
In this section, the performance of the proposed environment model classifier and the dynamic multi-behavior coordination controller is verified through Matlab simulation. Furthermore, simulation in the Webots robot simulation environment is used to test the performance of mobile robot navigation in different environment models. All simulations are conducted on a PC with a 2.8 GHz CPU, 4 GB RAM, MATLAB 7.4 and Windows 7. The e-puck robot has 8 infrared sensors with a maximum IR value of 4096. The robot is 70 mm in diameter and 55 mm in height, with 2 stepper motors.

Performance metrics
A good selection of metrics is very important for assessing control performance. The autonomous robot should reach the target safely and smoothly, in minimum time and over the shortest distance. In this section we introduce the metrics used to evaluate the performance of robot motion methods.
1)Time to reach the goal: T tog is the total time to approach to the goal. Less time means better performance.
2)Path length: L tog is the total length of the path from start point to goal point. Shorter length is desirable for better performance. L tog can be represented by equation (4).
3)Minimum distance overruns: D min obs used to measure and record the number of times the sensor value of any channel is less than the minimum safety distance D sa f e from obstacle. D min obs can be represented by equation (5).
4)Mean distance to obstacles: D mean obs is the mean value of the distance between the obstacle and the robot's sensors in each execution cycle of the entire walking process. Higher values means the walking will be safe. 5)Number of collisions: N coll is the number of times the robot hit an obstacle. The number of the collisions also indicates a degree of safety. 6)Mission failures: N f ail is the number of times the mission failed to reach the end. The more times the task fails, the worse the algorithm adaptability. 7)Number of oscillation: T osc is the frequency of change towards the forward direction of the robot during walking, three consecutive switching clocks towards the forward direction are one oscillation behavior, such as left-right-left, or right-left-right. The smaller the value, the smoother will be the walk of the robot. T osc can be obtained by equation (6).
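The path-length and oscillation metrics above can be computed directly from a logged trajectory. The sketch below assumes a list of (x, y) positions for L_tog and a list of 'L'/'R' heading-switch events for T_osc; the function names and encodings are ours, while the three-switch oscillation rule (e.g. left-right-left) follows the text.

```python
import math

def path_length(points):
    """L_tog: sum of Euclidean distances between consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def oscillation_count(turns):
    """T_osc: count triples of consecutive direction switches such as
    left-right-left ('L','R','L') or right-left-right ('R','L','R')."""
    osc = 0
    for a, b, c in zip(turns, turns[1:], turns[2:]):
        if a != b and b != c and a == c:   # e.g. L-R-L or R-L-R
            osc += 1
    return osc
```

For instance, a 3-4-5 step from (0, 0) to (3, 4) contributes a path length of 5, and the turn sequence L, R, L, R contains two overlapping oscillation triples.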

Simulation for Environment Classifier
Since the navigation environment is usually unpredictable, complex and partially unknown, a single environment model membrane classifier can hardly handle the whole task. If a single membrane classifier (SMC) is used, it must have a complex structure with many internal parameters to cope with navigation in a complex environment. Therefore, a multi-membrane classifier (MMC) (in this paper, of two or three modules) is employed to identify the environment model with good fault tolerance. Since the MMC is composed of SMC modules, each covering a specific local environment, it can quickly and easily find good local solutions. The simulations use the e-puck robot with its 8 surrounding infrared sensors, as shown in Figure 4(a). To reduce the impact of sensor noise, the sensor values are filtered with a given threshold before being sent to the membrane classifier: all values smaller than 70 are ignored. At the same time, to simplify the environment identification model, once a sensor's value is greater than 70 (close to an obstacle), that channel is activated and set to 1; otherwise it is set to 0.
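The thresholding step can be sketched as follows; the function name is our own, while the threshold of 70 and the 1/0 channel activation follow the text.

```python
def binarize_ir(readings, threshold=70):
    """Convert the 8 raw IR readings into the binary activation pattern
    fed to the membrane classifier: a value above the threshold (obstacle
    close to that sensor) activates the channel (1), anything else is 0."""
    return [1 if r > threshold else 0 for r in readings]
```

The resulting 8-bit pattern is what gets matched against the 11 environment patterns encoded in the classifier modules.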
Using the aforementioned information, three kinds of SMC can be constructed (Figure 9 shows the binary encoding of the C_1, C_2 and C_3 modules). Rows 1, ..., 11 of the binary encoding of the C_1, C_2 and C_3 modules represent the 11 environment patterns shown in Figure 5; the last, all-zero row represents no obstacle around the robot. Figure 10 shows the actual paths taken by SMC and MMC, where SMC uses C_1 and MMC uses C_1 and C_2. There are several local minimum traps in Figure 10(a) and (b). MMC breaks away from both local minimum traps and reaches the destination successfully, as shown in Figure 10(a) and (b). But SMC cannot break away from the local minimum trap at point (A) in Figure 10(a), and sometimes cannot break away from trap (A) or (B) in Figure 10(b): SMC alternately judges the environment patterns, switching constantly between cases 2 and 7 on reaching the edge of the trap. The environment pattern changes to case 3 (hall way) on reaching the bottom of the neck trap, but if the robot is wider than the hall-way spacing it cannot pass through. Under the same obstacle configuration as in Figure 10, we changed the start and goal positions and ran the simulation ten times. For the same navigation task, Table 1 lists the performance of SMC and MMC, where N_module is the number of modules and N_fail the number of failures. SMC falls into the trap in Figure 10(a) every time, so its N_fail is 10, while MMC identifies the environment better and moves away from the trap successfully, with N_fail equal to 0. T_tog is the total execution time; since MMC has a lower elapsed time, it performs better than SMC. L_tog is the total path length; SMC has a longer path than MMC because environment model identification errors make it walk duplicate paths. The number of collisions N_coll indicates how safe the navigation is.
The results in Table 2 show that MMC performs better than SMC. In addition, unlike in [56], where a "5-by-1" modular neural network (M-NN) environment classifier had to replace a single NN classifier to achieve successful navigation in Figure 10(a), this paper needs only two kinds of modules (C_1, C_2) to achieve the same task. As shown in Table 2, the performance difference between MMC sizes (2 and 3 modules) for Figure 10(b) is not obvious, so the number of modules to use depends on the specific local environment. Furthermore, NN environment classifiers need a large number of samples for training: about 3000 ultrasonic patterns are needed to train the NN classifiers of [56], and 50,000 samples with 4.5 hours of training for the navigation reservoirs of [5]. In contrast, SMC or MMC based on ENPS requires no training process, and its environment models are simple to initialize.

Simulation for Multi-Behavior Coordinator
Several behavior coordination schemes are used for comparison, such as the fuzzy logic approach [35], the expert fuzzy cognitive map (FCM)-based approach [31], the fuzzy discrete event systems (FDES)-based approach [22], and the optimized modular NN approach [56]. Throughout the simulations, the MMC uses module C_1 in Test I, modules (C_1, C_2) in Tests II and III, and modules (C_1,...,C_3) in Test IV. The cruising speed of the robot is set to 30 mm/s, the minimum safety distance to D_safe = 24 mm, and the control cycle to 50 ms. The following simulation tests are carried out to validate the proposed approach.
Test I: Unstructured environment: Figure 11 shows the different navigation trajectories produced by the proposed MBCMC and the NN controller [56]. In the environment of Figure 11, multiple obstacles of different shapes and sizes are randomly distributed, and three different start/goal pairs are set up in an unstructured environment with the same obstacle layout; the contrast experiments are shown in Figure 11(a)-(f). Considering the randomness of autonomous robots exploring an unknown external environment, each start/goal condition is simulated ten times, and the average of each performance metric over the ten runs is taken as the navigation performance. Table 2 compares the performance metrics of the three groups of experiments: the MBCMC method performs better than the NN approach, with a smoother trajectory, less time overhead, and fewer oscillations.
Test II: G-shape and snail-shape environment: Figure 12 shows the expected results of MBCMC. In [35], a fuzzy logic controller for robot navigation was developed that adopts an actual-virtual target to escape from local minima by accumulating a sum of turning angles.
If the sum of turning angles along the way is near 0°, the robot decides to go toward the real target. If the total turning angle is negative, the robot makes a counterclockwise motion to compensate for it at the opening point. Since the sum of turning angles is −360° at point (B) in Figure 12(a), the robot of [35] does not execute goal reaching; it turns counterclockwise and continues following the wall back to the opening point. MBCMC instead switches the control scheme and runs toward the goal directly. Thus, even after breaking away from the G-shape obstacle, the controller of [35] spends more time and travels a longer distance than MBCMC to reach the goal point.
The snail shape in Figure 12(b) is a more complicated trap than the G-shape, and the method of [35] requires the corridors of the snail to be wide enough: after encountering the first wall (left or right side), its robot follows that wall and then breaks away from the snail-shape obstacle successfully. But the snail-shape environment in this paper has a very narrow corridor with a dead end.
Figure 12. Escape from a G-shape and a snail-shape environment: (a) G-shape; (b) snail shape.
Hence, it affects the definition of the virtual target [35] and the event weights of the expert-FCM graph [31], and the robot falls into a trap at dead end (A), as shown in Figure 12(b). Since hall way and dead end are among the general definitions of environment patterns in this paper, MBCMC can identify those cases. Moreover, the wall following method is also modified by equation (33) to suit corridor environments. The results of Figure 12(b) show that the robustness of the proposed approach is better than that of the approaches in [35,31], whether the corridor is wide or narrow.
Test III: Building environment: The robot starts in room 1 and navigates to the goal in room 2. Figure 13 shows that both MBCMC and M-NN [56] start in room 1, cross the narrow corridor, and arrive at turning point (A). MBCMC implements a self-rotation strategy according to the environment model and targets room 2 as the goal. But the M-NN robot [56] fails to enter through the "door" at (A), because it is confused by the corridor module and the left-turn module (which use competitive coordination). The robot of [56] can break away from the dead end (B) in Figure 13, but it spends more time reaching the goal than the proposed approach.
Test IV: Maze environment: The performance of MBCMC is examined in environments similar to those of [22], with more complex maze-like traps, shown in Figure 15. Figure 15(b) shows similar navigation scenarios with the robot moving in a maze environment with irregular obstacles. The FDES-based approach [22] employs supervisory control theory of fuzzy discrete event systems to model and control several navigation tasks of a mobile robot. Two deliberative behaviors ("Go to Target" (GT) and "Route Follow" (RF)) and three reactive behaviors ("Wall Follow" (WF), "Avoid Obstacle" (AO) and "Avoid Dead ends" (AD)) are weighted through FDES and navigate the robot to the final target successfully. In that method, target seeking is based on following a series of intermediate sub-targets (waypoints): GT is used for path optimization and aims to find the next nearest waypoint, and RF navigates the robot through the waypoints, so the robot can trace a collision-free path of optimal distance toward the actual target in maze-like environments. However, in [22] the start and end points are identified and the waypoints are given manually, whereas the robot in this paper knows only the start and goal points and identifies the surrounding unknown environment accurately via MMC. With reasonable behaviors chosen dynamically by MBCMC, the AMR can walk out of the maze safely. Figure 15 shows the traveled trajectories in these environments, where MBCMC and the FDES-based approach follow similar paths. Also, Figure 16 depicts the yaw angle between the robot direction and the goal, the environment pattern, the left and right wheel driving torques, and the robot speed of MBCMC in the complex maze-like environment of Figure 15. Figure 14 depicts the autonomous robot's trajectories in different structured/unstructured maze environments using the proposed MBCMC approach; these environments contain many different types of local minimum traps.
Yet the robot does not fall into the traps, and its trajectory is smooth while maintaining a safe distance from obstacles, showing that MBCMC adapts well to navigation tasks in different complex environments and finds a near-optimal path to the goal. Table 3 compares the performance of the proposed MBCMC with existing approaches in different kinds of environments. Besides the trajectory being smoother and safer than with the other methods, the oscillation count of MBCMC is also the smallest. This is due to the behavior selection strategy and computational efficiency of the membrane controller, as well as the detailed design of each behavior.

Conclusions
In this paper, a simple and effective environment pattern membrane classifier is constructed based on the parallel distributed computing models known as ENPS. It identifies eleven environment patterns and can build or modify environment modules quickly. The proposed MMC and MBCMC provide robust and successful navigation with a smooth path in different types of environments. The proposed bio-inspired controllers are validated on simulated mobile robots, and comparisons with a neural network controller and a fuzzy logic controller have been provided. The proposed approach eases the design of behavior-based hybrid control architectures.

Appendix

Multi-Behavior Design
In order to adapt to the local environment, the AMR's reflective and reactive behaviors are designed according to the physical characteristics of the robot.

Goal reaching is a behavior that orders the robot to move from its current position to a destination received from the deliberative layer above; it is described in Figure 17. The current position of the AMR with respect to the goal is expressed in polar coordinates, where d is the distance between the goal point G_1 and the AMR's current point O_c, d_s is the safety (deceleration) distance of goal reaching, and θ is the heading error between the current robot heading θ_R and the goal direction θ_G, θ = θ_R − θ_G. With the robot safety distance and AMR speed limit as depicted in Figure 17, the kinematic equations [14] of the AMR are

ḋ = −v cos θ, θ̇ = −ω + (v sin θ)/d (7)

where v and ω denote the robot's linear and angular velocities, respectively. First, select the positive definite Lyapunov candidate function

V_gr = V_gr^1 + V_gr^2 = (1/2)θ^2 + (1/2)d^2 (8)

whose time derivative is

V̇_gr = θθ̇ + dḋ. (9)

Substituting (7) into (9), we obtain

V̇_gr = θ(−ω + (v sin θ)/d) − vd cos θ (10)

The linear velocity control law is chosen as

v = v_max S(d) cos θ (11)

where v_max is the maximum linear speed and the S function is defined as

S(d) = d/d_s if d ≤ d_s; S(d) = 1 if d > d_s. (12)

The variable d_s in (12) decides the deceleration distance of the AMR while reaching the goal: when the robot is far from the goal (d > d_s), the AMR approaches the target as fast as possible, and it begins to slow down when nearing the desired target. Substituting (11) into the second part V̇_gr^2 of (10), since v_max > 0 and d > 0, one gets the negative semidefinite function

V̇_gr^2 = −v_max S(d) d cos^2 θ ≤ 0. (13)

Then, substituting (11) into the first part of (10), V̇_gr^1 is rewritten as

V̇_gr^1 = θ(−ω + v_max S(d) sin θ cos θ / d). (14)

The control law of angular velocity is proposed as

ω = kθ + v_max S(d) sin θ cos θ / d (15)

where k is a proportional constant, k > 0. Substituting (15) into (14) results in another negative semidefinite function V̇_gr^1 = −kθ^2 ≤ 0.
Hence, one can conclude that the first derivative of the Lyapunov function is negative semidefinite,

V̇_gr = −kθ^2 − v_max S(d) d cos^2 θ ≤ 0 (16)

with V̇_gr = 0 only when d = 0 and θ = 0. The proposed controller guarantees that v ≤ v_max for all t ≥ 0 and drives the states d(t) and θ(t) asymptotically to zero. In addition, the goal reaching strategy, being based on a Lyapunov function that uses the target distance information, has good portability: it is reused to design the control laws of obstacle avoidance, wall following, and corridor walking through a unified "virtual target". Moreover, the control laws take into account the safety distance near the target and the maximum speed of the AMR, which yields better efficiency and safety.
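A minimal sketch of the goal-reaching control laws follows. The functional forms v = v_max·S(d)·cos θ and ω = kθ + v·sin θ/d reflect our reading of the derivation above (the extracted equations are incomplete), and the parameter values are illustrative, with v_max matching the 30 mm/s cruising speed used in the simulations.

```python
import math

def s_function(d, d_s):
    """Deceleration profile S(d): slow down linearly inside the
    deceleration radius d_s, full speed outside it."""
    return d / d_s if d <= d_s else 1.0

def goal_reaching_control(d, theta, v_max=0.03, d_s=0.05, k=1.5):
    """Return (v, omega) driving the distance d and heading error theta
    toward zero. v is bounded by v_max by construction, and the Lyapunov
    derivative -k*theta^2 - v_max*S(d)*d*cos(theta)^2 is <= 0."""
    v = v_max * s_function(d, d_s) * math.cos(theta)
    omega = k * theta + (v * math.sin(theta) / d if d > 0 else 0.0)
    return v, omega
```

Note that because v is proportional to S(d), the v·sin θ/d term in ω stays bounded as d shrinks below d_s, which avoids an angular-rate blow-up near the goal.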

Obstacle avoidance:
This controller is responsible for avoiding obstacles that may appear randomly while the AMR is moving toward the target or following a wall. It is a reflective action and is designed with the goal reaching method described above. When an obstacle is detected, we suppose that a dynamic goal appears ahead of the robot's motion direction to lead it around the obstacle smoothly. In Figure 18, an obstacle is near the robot and the environment classifier judges it to be on the robot's right side; thus a new "virtual target" is set on the robot's left side to make the AMR turn left and prevent a collision. First, the appropriate sensor IR_i must be selected (several sensors may detect the obstacle simultaneously, so a suitable one must be chosen to define the reference direction of the "virtual target"). Since the obstacle is on the right side of the robot, the first non-zero-valued sensors (indicating obstacle detection) are sought from the front-left sensor IR_6 to the front-right sensor IR_3 (IR_6 → IR_7 → IR_8 → IR_1 → IR_2 → IR_3); if the obstacle were on the left side, the search would run from IR_3 to IR_6 (IR_3 → IR_2 → IR_1 → IR_8 → IR_7 → IR_6). In this case, IR_1 is found as the candidate sensor. Let d_max be the maximum measurable distance of the sensor, d_safety the safety turning distance of the robot, and d_IR1 the distance reading of IR_1. The virtual pressure radius is R_v = d_max − d_IR1, and d_os is the distance between O_o and O_s, where O_o is the contact point of the ray of sensor IR_1 with the obstacle surface; in fact, d_os = d_IR1 − d_safety. The virtual turn angle θ_T of the robot is then given by equation (17), and the direction of the new goal O_new is parallel to the line segment O_r O_s.
Thus, the desired new goal position is

(x_g, y_g) = (x_r + k d_w cos θ_New, y_r + k d_w sin θ_New) (18)

where d_w is the desired distance of the new goal ("virtual target"), k is a proportional coefficient, and θ_New is the orientation of the new goal O_new, attached to the robot platform centroid O_c, measured from the horizontal axis (OX). Moreover, θ_oa, the obstacle avoidance angle between the current robot heading line and the straight line connecting O_new and O_c, is obtained from equation (19), where θ_R is the orientation of the robot measured from the horizontal axis (OX) and θ_IR1 is the mounting angle of sensor IR_1 in the robot's local coordinate frame (u, O_c, v); hence θ_e is equivalent to the angle of the sensor vector IR_1 relative to the horizontal axis (OX). The kinematic equations of the AMR for obstacle avoidance are then

ḋ_w = −v_oa cos θ_oa; θ̇_oa = −ω_oa + (v_oa sin θ_oa)/d_w (20)

where v_oa and ω_oa denote the robot's linear and angular velocities for obstacle avoidance, respectively. Select the Lyapunov candidate function

V_oa = (1/2)θ_oa^2 + (1/2)d_w^2. (21)

Equation (18) defines a local new goal ahead of the robot's motion direction; in effect it converts the repulsive force field of the obstacle into a gravitational field of the "virtual target". Obviously, as the robot-obstacle distance decreases, θ_T grows and the new goal shifts rapidly to the opposite direction. Then deviation control signals for the angular velocity ω_oa and linear velocity v_oa are generated by the goal reaching method:

v_oa = κ_oa d_w cos θ_oa; ω_oa = k_oa θ_oa + κ_oa sin θ_oa cos θ_oa (22)

where k_oa and κ_oa are proportional coefficients; the reaction capability of the controller is regulated by a proper pre-definition of these constants.
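The virtual-target placement of equation (18) and the control law of equation (22) can be sketched as follows; the helper names and default parameter values are our own illustrative choices.

```python
import math

def virtual_target(x_r, y_r, theta_new, d_w=0.1, k=1.0):
    """Equation (18): place the new goal ("virtual target") at distance
    k*d_w from the robot position (x_r, y_r) along direction theta_new."""
    return (x_r + k * d_w * math.cos(theta_new),
            y_r + k * d_w * math.sin(theta_new))

def obstacle_avoidance_control(d_w, theta_oa, k_oa=1.5, kappa_oa=0.5):
    """Equation (22): deviation control toward the virtual target.
    The resulting Lyapunov derivative is
    -kappa_oa*cos(theta_oa)^2*d_w^2 - k_oa*theta_oa^2 <= 0."""
    v_oa = kappa_oa * d_w * math.cos(theta_oa)
    omega_oa = k_oa * theta_oa + kappa_oa * math.sin(theta_oa) * math.cos(theta_oa)
    return v_oa, omega_oa
```

Because v_oa is proportional to d_w, the sin θ_oa/d_w term of the kinematics cancels exactly in ω_oa, which is why (22) contains no division by d_w.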
Hence, one can conclude that the derivative of the Lyapunov function (21) is negative semidefinite:

V̇_oa = d_w ḋ_w + θ_oa θ̇_oa = −κ_oa cos^2 θ_oa d_w^2 − k_oa θ_oa^2 ≤ 0. (23)

In this way, the proposed obstacle avoidance controller avoids obstacles stably.

Figure 19. State description for wall following

Wall following is the robot's ability to follow a wall. This ability complements the other reflective behaviors in narrow spaces and corridors. The basic objective of wall following is the generation of a reference trajectory parallel to a wall (or to a larger obstacle being skirted). If the distance and angular information between the robot and the wall are available, the goal reaching method can be used to design the wall following controller. In [14,9], the wall surface followed by the robot is assumed to be parallel or perpendicular, and the coordinates (X, O, Y) are used to simplify the design. Here we consider the more practical situation of Figure 19, where the wall is arbitrarily arranged. The angle θ_rw between the robot's motion direction and the line parallel to the wall can be computed from the angles between the infrared sensor rays and the wall surface. In Figure 19, the environment classifier judges the wall to be on the right side of the robot, so the right-side sensor groups (G_1, G_2, G_3) of Figure 4(b) are used to compute θ_rw; in this case, G_1 is used to get θ_G1. G_1 has two sensors, IR_1 and IR_2. θ_IR1 is the angle between the ray of IR_1 and the wall surface, and L_1 and L_2 are the lengths of the lines from the centroid O_c to the wall surface through sensors IR_1 and IR_2, respectively: L_1 = d_IR1 + R and L_2 = d_IR2 + R, where d_IR1 and d_IR2 are the readings of IR_1 and IR_2 and R is the robot's radius.
Hence, θ_IR1 is given by

θ_IR1 = arctan(L_3/L_4); L_3 = L_2 sin θ_12; L_4 = L_1 − L_2 cos θ_12 (24)

where θ_12 is the angle between sensors IR_1 and IR_2 and θ_01 is the angle between sensor IR_1 and the robot's motion direction. Then θ_G1 is represented by

θ_G1 = θ_IR1 − θ_01. (25)
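Equations (24)-(25) can be sketched as follows; the function name and the atan2 form (used instead of a plain arctan to keep the quadrant correct) are our choices.

```python
import math

def wall_angle(d_ir1, d_ir2, R, theta_12, theta_01):
    """Equations (24)-(25): recover theta_G1 from two neighboring IR
    readings. d_ir1, d_ir2 are the sensor readings, R the robot radius,
    theta_12 the angle between the two sensor rays, and theta_01 the
    angle between sensor IR_1 and the robot's motion direction."""
    L1 = d_ir1 + R                      # centroid-to-wall distance via IR_1
    L2 = d_ir2 + R                      # centroid-to-wall distance via IR_2
    L3 = L2 * math.sin(theta_12)        # equation (24)
    L4 = L1 - L2 * math.cos(theta_12)
    theta_ir1 = math.atan2(L3, L4)
    return theta_ir1 - theta_01         # equation (25)
```

As a sanity check, with equal readings through two rays 90° apart, the triangle is isosceles and θ_IR1 comes out as 45°.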