Migration Threshold Tuning In The Deterministic Dendritic Cell Algorithm

In this paper we explore the sensitivity of the migration threshold parameter in the Deterministic Dendritic Cell Algorithm (dDCA), one of the four main types of Artificial Immune System. This is with a view to the future construction of a DCA augmented with Deep Learning. Learning mechanisms are absent in the original DCA although tuneable parameters are identified which have the potential to be learned over time. Proposed in this paper is the necessary first step towards placing the dDCA within the context of Deep Learning by understanding the maximum migration threshold parameter. Tuning the maximum migration threshold determines the results of the signal processing within the algorithm, and here we explore a range of values. We use the previously explored Ping Scan Dataset to evaluate the influence of this key parameter. Results indicate a close relationship between the maximum migration threshold and the signal values of given datasets. We propose in future to ascertain an optimisation function which would learn the maximum migration threshold during run time. This work represents a necessary step towards producing a DCA which automatically interfaces with any given anomaly detection dataset.


Introduction
The Dendritic Cell Algorithm (DCA) is an Artificial Immune System (AIS) based on the function and behaviour of the dendritic cells of the human immune system. It is driven by a concept termed the danger theory, which postulates that the human immune system has the ability to discriminate between 'safe' and 'dangerous' contexts [18], informally known as 'danger signals'. The danger theory is in opposition to the classical self-nonself theory of discrimination of antigen proteins via their structure and origin. In the danger theory, and indeed in the DCA, antigen is classified through correlation of context with danger signals and not via examination of the structure of the antigen proteins. The DCA is inspired by this model, as are other similar danger based algorithms [20] [16]. Details of the function of the DCA are given in Section 2. This paper is motivated by a comparison study performed by Lau & Lee in 2018 [17], where a direct comparison between the DCA and an artificial neural network (ANN) is performed. This is an innovative application of the DCA, applied as a monitor for human behaviour in carrying out tasks in a VR/AR CUBE setup. The study's results indicate that the DCA can produce similarly good performance on this task. They also indicate that the DCA has a distinct advantage over ANNs when lengthy training periods are required. ANNs have enjoyed a resurgence in popularity in the guise of 'Deep Learning'. This extends the traditional ANN by adding an optimisation function (frequently gradient descent) to the signal inputs, multiple nodes in multiple layers and a discrimination technique such as softmax to aggregate classification. The implication is that Deep Learning based on ANNs can now tackle 'Big Data' in a computationally feasible manner. Ease of implementation, facilitated through TensorFlow and Keras, has further increased its popularity beyond the machine learning community.
Widespread application of this technique and the automation of the selection of inputs have heightened the learning capacity of these techniques, which are now particularly popular in image processing. Given the direct comparison in Lau & Lee [17], we postulate that if the DCA can be directly compared to an ANN, there must be properties of the DCA which make transforming the algorithm into a Deep Learning framework possible, at least in principle.
The DCA dispenses with a lengthy training period in favour of expert learning to map input streams. However, multiple authors have attempted to automatically map inputs to the algorithm, as reviewed in Chelly & Elouedi [3]. This already suggests a step towards incorporating a dynamic learning component into this algorithm, akin to the first stage in deep learning. However, there are numerous "back-end parameters" in the DCA which have not been subjected to the same automation processes. In this paper we identify a particular parameter which is ideally suited to parameter tuning. Furthermore, Elisa et al. [5] apply k-means clustering to the output of the algorithm to refine the discrimination features of the algorithm. A significant improvement is shown in this re-imagined DCA architecture when applied to a standardised intrusion detection dataset, highlighting the importance of the state-change based discrimination performed by the algorithm and, in turn, of the "back-end parameters". We focus on examining the assignment of the migration threshold across the agents in the DCA's population, to highlight the importance of this parameter's influence on classification accuracy. This is the most basic experiment possible to investigate learning within this algorithm, while maintaining the DCA's key advantage of dispensing with the requirement for a lengthy training period.
The main contribution of this paper is to assess the impact of tuning the maximum migration threshold parameter on the algorithm's classification accuracy. Section 2.1 describes the major features of the algorithm and formal analysis of the algorithm is reviewed in Section 2.2. A rationale for exploring the sensitivity of the maximum migration threshold parameter is given in Section 3. A preliminary experiment including learning the migration threshold parameter is shown in Section 4 comparing a range of values. We conclude the paper by suggesting how this modification can be extended to further enhance the DCA towards a Deep Learning DCA framework. For a comprehensive review of the DCA see [3], and the original DCA is described in detail in [6].

Fig. 1: Schematic representation of the processing by a single population member, demonstrating the migration process, from [6].

Algorithm Overview
As an algorithm, the DCA was first presented in 2005 [7] as an anomaly detector in the style of a population based algorithm. Individual DCs in a population are transformed from an immature state to either 'mature' or 'semi-mature', depending upon the type of 'signal' they have encountered throughout their defined lifespan. Expert knowledge couples data streams to the DC population through a rough categorisation of the streams into 'safe' or 'danger'. The stream data is processed by individual DCs through a simple weighted sum equation. The output of this weighted sum increments internal values, either the 'mature' or 'semi-mature' indicator. The data collection window for each DC is determined by a lifespan limit, termed in the literature the 'migration threshold' [9]. Upon migration a DC is classed as either 'mature' or 'semi-mature' via the application of a linear threshold or by simply labelling the cell based on which of the mature or semi-mature variables has the larger value, as demonstrated in Figure 1. Classification cannot be performed with the DCA without an orthogonal data stream termed 'antigen', which is a representation of the items to be classified.
Each member of the set of antigen in a dataset is termed an 'antigen type', each of which will have multiple instances. This decoupling of the data allows for data correlation within the DCA. The first real-world test for the DCA involved similar experiments, though monitoring the information and system calls of an individual host rather than a network [8]. Each system call per process is captured in this dataset, and the process ID associated with each system call forms the antigen types in this dataset. This same dataset is used to experiment with the maximum migration threshold in this paper.
In the population each DC agent samples signals from the signal stream and antigen from the antigen stream within a dynamic lifespan. The sampling duration of a DC is controlled by its migration threshold, assigned upon creation of the immature DC. An internal variable of an immature DC, termed in the literature 'csm' [10], is incremented in proportion to the strength of signal experienced by the DC. Once the 'csm' variable has a greater value than the assigned migration threshold, the DC is removed from the sampling pool and presented for analysis. The secondary analysis phase of the algorithm counts, for each antigen type, the percentage of DCs which are mature versus semi-mature. This ratio returns a value between 0 and 1 for each antigen type, with values closer to 1 indicating an anomaly, and is referred to as the 'mean context antigen value' or MCAV. Once all data is sampled, a final value for each antigen type is calculated and a linear threshold is applied. A range of values for this threshold can be used to create ROC curves from the DCA output.
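The analysis phase described above can be illustrated with a minimal sketch. It assumes that each presentation is recorded as a hypothetical (antigen type, context) pair and that the MCAV is the fraction of presentations of an antigen type made by mature cells; the function names `mcav` and `classify` are illustrative and are not taken from any published DCA implementation.

```python
from collections import defaultdict


def mcav(presentations):
    """Return the mean context antigen value per antigen type.

    presentations: iterable of (antigen_type, context) pairs, where context
    is 'mature' or 'semi-mature' (a hypothetical record format).
    """
    mature = defaultdict(int)
    total = defaultdict(int)
    for antigen_type, context in presentations:
        total[antigen_type] += 1
        if context == "mature":
            mature[antigen_type] += 1
    # Fraction of presentations in the mature context, between 0 and 1.
    return {a: mature[a] / total[a] for a in total}


def classify(scores, threshold=0.5):
    """Apply a linear threshold to the MCAV of each antigen type."""
    return {
        a: ("anomalous" if score > threshold else "normal")
        for a, score in scores.items()
    }
```

Calling `classify` with a range of `threshold` values, rather than a single one, gives the points from which a ROC curve could be drawn.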

Theoretical Research and Formal Specification
The prototype version of the DCA was first presented by Greensmith et al. in 2005 [7], with the full version published in 2006 [9]. While implementable details were attempted, the algorithm's function and behaviour were obfuscated by the complex agent based framework used to implement the DCA and by the more than twenty potentially tuneable parameters. Two approaches were taken to 'demystify' this algorithm and to increase its applicability. The first approach was to reduce the number of tuneable parameters to two in the Deterministic DCA (dDCA) [11], leaving population size and the range of migration thresholds across the population. Dynamic antigen buffers and sigmoidal functions for weighted sum inputs were removed, and MCAV was replaced with a real valued metric, $K_\alpha$.
The dDCA as a simplified algorithm has proven popular for implementation and assisted in some of the earliest theoretical research for the DCA [14]. Aside from the simplification of the algorithm, a key motivator for the development of the dDCA was to provide a 'stripped down' version of the algorithm in order to build in new components and to add in stochastic elements individually. This has not happened to any great extent, though the dDCA has become a studied and applied algorithm in its own right (e.g. [15]), as detailed in the review in [3]. Further theoretical analysis of the DCA is performed in Oates et al. [19], Stibor et al. [21] and Gu et al. [13], which analysed the DCA as a set of linear classifiers, without analysing the impact of the antigen stream.
The second attempt to clarify the algorithm is motivated by the inconsistencies in DCA implementations. Ambiguity surrounding signal mapping, the use of inappropriate datasets and direct comparison with unsuitable supervised learning techniques motivated the work of Greensmith & Gale in 2017 [12], in which a formal specification of the dDCA in Haskell is presented. Haskell is a purely functional language, where the specification becomes the implementation; therefore, if the specification is verified as correct, then the implementation is also correct. This research shows that the input to the DCA is stream data, and not necessarily 'feature vectors', and that the 'antigen stream' must be de-coupled from the signal streams in order for the algorithm to be effective. This verified version of the dDCA is used in the experiments in Section 4. If the learning process for "back-end" tuning is possible, the Haskell specification will be extended to ensure correct future implementation of this component.

Cell Migration Control in the DCA
It came as a surprise to past reviewers that there is no explicit learning process, optimisation or local search operator in the DCA. The assumption in the literature is that an AIS must behave like any other evolutionary algorithm. It is thought that it must converge upon a solution, like clonal selection, or at least engage in a training process akin to supervised learning techniques in AIS, including negative selection [2]. However, the DCA has no such facility, and the deterministic DCA even less so, as it dispenses with the random elements included in the original variant.
The obvious approach to introducing learning is to replace the requirement to use expert knowledge to decide how signals from the signal stream are mapped to the categories of safe and danger, as has indeed been widely performed in the DCA literature and reviewed in [3], including the use of fuzzy systems, rough set theory and PCA. A second approach is the optimisation of the weights in the signal processing equation, which introduces a training phase for the DCA, as performed by Elisa et al. [4] through the use of a genetic algorithm. This is in contrast to the work presented in [17], which determined that the lack of a lengthy training period is precisely where the DCA has an advantage over ANNs.
There are other aspects of the DCA which can be augmented with some form of learning capability outside of the paradigm of requiring a training period. Two tuneable parameters with optimisation potential are population size and the assignment of the maximum migration threshold of the DC population. The maximum migration threshold is tested for its sensitivity in this paper through a preliminary investigation into the link between the parameter and the algorithm's performance. Optimisation of this parameter during the algorithm's runtime may be able to enhance its performance, though this must be done in an incremental manner.
The migration threshold is important in the DCA as it determines the exact set of signal and antigen instances processed within an individual DC throughout the run-time duration of the algorithm. It controls the length of time a DC remains in the sampling pool before being presented for analysis. A migration threshold is assigned to each DC upon the initialisation of the algorithm. In the dDCA, each DC is given a specific value of migration threshold which is calculated in proportion to an overall maximum migration threshold, set as a user-definable parameter, as $DC_{mt} = f(Max_{mt} / NumCells)$. For example, if there are 5 DCs in the sampling pool and a maximum migration threshold of 10 is assigned, the DCs' migration thresholds are assigned by a simple modulus function as shown in Table 1.
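Since Table 1 is not reproduced here, the following sketch shows only one plausible reading of the assignment, under the assumption that thresholds are spread evenly across the population in steps of the maximum migration threshold divided by the number of cells. The function `assign_thresholds` and the even spacing are assumptions for illustration, not the specification given in [12].

```python
def assign_thresholds(max_mt, num_cells):
    """Spread migration thresholds evenly over the population.

    Assumption: cell i (0-based) receives (i + 1) * max_mt / num_cells,
    so the last cell receives exactly max_mt.
    """
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    step = max_mt / num_cells
    return [step * (i + 1) for i in range(num_cells)]


# Five cells and a maximum migration threshold of 10, as in the example above.
thresholds = assign_thresholds(10, 5)
```

Under this assumption the five cells receive thresholds of 2, 4, 6, 8 and 10, so the cells sample over windows of different lengths.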
The migration threshold is applied to a parameter termed 'csm', which is incremented through summation of the danger and safe signals collected at each signal sampling iteration. Once the value of 'csm' exceeds that of $DC_{mt}$, the cell is removed from the sampling pool and presents its data for the analysis phase. At this point, the DC is destroyed, and a new DC is created with an identical $DC_{mt}$ and added to the sampling pool. This process is specified formally in [12].
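The migration cycle of a single cell can be sketched as follows. This is a simplified illustration rather than the formal specification in [12]: the `Cell` class, the `run_cell` function, and the use of unweighted danger and safe totals to decide the context are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    threshold: float            # this cell's migration threshold
    csm: float = 0.0            # running sum of danger + safe signals
    danger_total: float = 0.0   # assumption: unweighted danger accumulator
    safe_total: float = 0.0     # assumption: unweighted safe accumulator
    antigen: list = field(default_factory=list)


def run_cell(threshold, stream):
    """Feed (antigen_type, danger, safe) tuples to one cell slot.

    Returns a list of (sampled_antigen, context) presentations, one per migration.
    """
    cell = Cell(threshold)
    presented = []
    for antigen_type, danger, safe in stream:
        cell.antigen.append(antigen_type)
        cell.csm += danger + safe
        cell.danger_total += danger
        cell.safe_total += safe
        if cell.csm > cell.threshold:
            # Label the cell by whichever accumulator is larger.
            context = "mature" if cell.danger_total > cell.safe_total else "semi-mature"
            presented.append((list(cell.antigen), context))
            # The migrated cell is replaced by a new one with an identical threshold.
            cell = Cell(threshold)
    return presented
```

A cell that has not exceeded its threshold when the stream ends presents nothing in this sketch; how such cells are handled at the end of a run is left open here.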
We commence the investigation into parameter tuning in the dDCA by first ascertaining the sensitivity of the algorithm to variation in the migration threshold of the individual cells, controlled by tuning the master maximum migration threshold parameter. The dDCA is useful for this task as it allows for a high degree of reproducibility and traceability of data within the algorithm. An initial specific set of parameter values is chosen with a view to exploring optimisation techniques in future research.

Ping Scan Dataset
Ten datasets are created, originally for [9] and used for the sensitivity analysis in [8], based on performing a series of ICMP Ping Scans on a medium scale university network. This data is designed specifically to assess the parameters of the DCA, and not necessarily to capture all of the nuances of network intrusion detection. Given that the experiments relate to a DCA parameter, the use of this dataset is justified in this case. The generated data captured the processes involved during the scan to form the antigen and measured the network attribute of packets per second sent from the machine instigating the scan. The danger signal is the number of packets per second sent, normalised into a range of 0-100. The safe signal is the inverse rate of change of the number of packets per second sent, also normalised into a range of 0-100. Summary statistics of the signal data for the ten datasets (S1-S10) are shown in Tables 2, 3 and 4, including the duration of the monitored session in Table 4.
As part of this data capture exercise, antigen data is also captured. While over 25 processes were active during the scan duration, four 'processes of interest' were identified as making over 100 system calls for the duration of each scan. These are the bash process, which is the terminal from which the scan is instigated; the nmap process, used to instigate the scan; the pts pseudo-terminal slave process, which is a helper process for the nmap process; and the sshd process, which was used to log into the Linux terminal from which the data was collected. We expect the results of the experiments to indicate the nmap and pts processes as anomalous and the sshd and bash processes as normal, as indicated in the results in [8]. In previous experiments where the anomaly score is given as the mean context antigen value (MCAV), a coarse threshold of 0.5 is applied to discriminate between the normal and anomalous processes, as in [8].

Experiments
A control experiment is performed with the dDCA using the standard parameters of a population size of 100 and $Max_{mt}$ set at 100. All other settings are as detailed in [11] and [12]. The results for each dataset are shown in Table 5. Five values of $Max_{mt}$ are chosen on a logarithmic scale to examine the link between the dataset and this parameter: 1, 10, 100, 1000 and 10000. Given the ranges of the data, this covers the smallest window possible, ensuring that the cells will migrate each iteration. The maximum value of 10000 exceeds the total signal amount for each dataset, ensuring that each cell will only migrate once. For the sake of completeness, we also test the average signal sum across all 10 datasets, which is 3165. We also test a normalised version of this value which takes into account the duration of each dataset, resulting in a $Max_{mt}$ value of 73. Results of these experiments are shown in Table 6, as mean values per process of interest and the related standard deviation. A more detailed presentation of individual results per process is given in Figure 2. We expect correct classification to produce values of below 0.5 for bash and sshd, and above 0.5 for nmap and pts.
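The effect of the extreme values can be illustrated with a small sketch that counts how many sampling iterations a cell needs before it migrates. It assumes, as reported in the Discussion, that every signal instance carries a combined danger and safe value of 100, and that migration requires 'csm' to strictly exceed the threshold; the function `steps_to_migrate` is hypothetical and is not part of the dDCA specification.

```python
def steps_to_migrate(threshold, per_step_signal=100.0):
    """Count sampling iterations until csm strictly exceeds the threshold.

    Assumption: each iteration adds the same combined signal value.
    """
    if per_step_signal <= 0:
        raise ValueError("per_step_signal must be positive")
    csm = 0.0
    steps = 0
    while csm <= threshold:
        csm += per_step_signal
        steps += 1
    return steps


# Thresholds below the per-iteration signal all behave identically.
small = [steps_to_migrate(t) for t in (1, 10, 73, 99)]
large = steps_to_migrate(10000)
```

Under these assumptions every threshold below 100 leads to migration after a single iteration, while a threshold of 10000 would require more iterations than a dataset whose total signal sum is below 10000 can supply.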

Discussion
The results clearly show that the maximum migration threshold parameter is important for the dDCA, producing marked changes in classification performance. The result that identical MCAVs are obtained for all values under 100 was initially surprising, and assumed to be a fault in the experimental test harness. Thorough investigation of this phenomenon was performed as a result, and we are confident that this is a genuine observation and not due to a bug in the dDCA code or in the test harness. Upon analysis, we see that for each signal instance a combined value of 100 is present. This means that, for cells with a migration threshold of less than 100, all cells in the population migrate, making parameter variability in this range immaterial. This is a useful observation for future guidance on setting this parameter. As the parameter increases above 1000, there is a deterioration in the discrimination of the anomalous processes, though changes in the discrimination of the normal processes were not significant. This is most pronounced with the value of 10000, in which no migration occurs until all signal instances are processed, as shown in Figure 2. These results indicate that this parameter and the migration thresholds of individual DCs

Conclusions
The contribution of this paper is that it shows the sensitivity of the migration threshold parameter in the dDCA on a wider range of data than in previous experiments with the dDCA. Deterioration of classification is shown with excessively large migration thresholds, and lower limits are shown to be related to the current system signal values. Therefore this represents a small but important step towards implementing a learning mechanism in the DCA independent of an initial training phase. The results suggest that the maximum migration threshold is likely to benefit from an optimisation technique either based on the expected input for the algorithm or, more importantly, during run time. This can be achieved via a lightweight local search operator and we hope to explore this in subsequent studies. We have not studied the distribution of the migration thresholds across the population: a uniform distribution is used in this paper, and we do not know how the results would be affected if, for example, a Gaussian distribution were used in its place.
In this paper we have ignored the potential influence of the number of cells in the population. Therefore a multi-objective optimisation approach to tune both key parameters of the dDCA may be beneficial in ascertaining their optimum values for any given dataset. We are aware that there may also be nuances of this particular dataset which are influencing the results, and therefore we would seek to replicate this study on a different dataset, for example using the KDD99 dataset as a starting point. The central goal is to move the DCA towards a Deep Learning style framework, and understanding the influence of the migration threshold is just one component which contributes to this aim. An integrated approach examining both front and back-end parameters in a dependent fashion would be the intended trajectory of future work on this algorithm. There is also the potential to run multiple DCA instances in parallel in a similar multilayered fashion to an ANN. A combination of these techniques will be needed to achieve the aim of creating a Deep Learning DCA.