A multi-aperture optical flow estimation method for an artificial compound eye

An artificial compound eye (ACE) is a bio-inspired vision sensor that mimics a natural compound eye (typical of insects). Through its multiple apertures, the artificial eye can visualize a large field of the outside world. Like any imaging system observing moving objects, the ACE is subject to optical flow, i.e., the apparent motion of the objects it visualizes. This paper proposes a method to estimate optical flow from the multiple images captured by the apertures (multi-aperture images). Starting from descriptor-based initial optical flows, a unified global energy function is presented that incorporates the information of all apertures and simultaneously recovers their optical flows. The energy function imposes a compound flow-field consistency assumption along with the brightness constancy and piecewise smoothness assumptions. This formulation efficiently binds the flow fields in time and space and enables view-consistent optical flow estimation. Experimental results on real and synthetic data demonstrate that the proposed method recovers view-consistent optical flows across multiple apertures and outperforms other optical flow methods on multi-aperture images.

A powerful machine vision tool is the artificial compound eye [50], which allows a broad view of the scene. Natural compound eyes are widespread among insects and crustaceans, and one of their merits is sensitivity to moving objects. This outstanding property stems from both their high temporal resolution and their structure of multiple ommatidia (singular: ommatidium, the basic unit of a compound eye that provides a picture element to the brain). In general, a moving object is captured by a small number of ommatidia; therefore, the load on the processing system can be very low and the processing speed very fast, since only a small amount of data needs to be processed. As a bionic system intended to inherit the merits of natural compound eyes, the artificial compound eye (ACE) is increasingly investigated [6,44,50]. However, few works have addressed the practical use of ACE cameras [50], such as motion estimation. Although much literature on the 2-dimensional motion estimation method called optical flow [17] and the 3-dimensional motion estimation method called scene flow [45] has appeared in recent years, motion estimation for ACEs has scarcely been studied. There are mainly two limitations on optical flow estimation with an artificial compound eye. On the one hand, ACEs have multiple apertures, most of which contain only one pixel each. The image resolution obtained by such an ACE is limited by the number of apertures [12], and the interval between apertures is much larger than the pixel interval in a traditional camera. The relatively low resolution and large aperture interval heavily restrict the accuracy of the optical flow result; in addition, the sparse sampling of the scene limits the use of the optical flow information in other high-level computer vision tasks, such as semantic segmentation and object recognition.
This problem can be addressed by integrating a pixel array into each aperture. An aperture of the ACE then captures a small image, and the output of the ACE is an image array, so the image resolution of an ACE with the same number of apertures is greatly improved. However, the image array captured by the ACE incurs a second problem: since the ACE is often small [50], the image of each aperture covers only a small region of the scene, so estimating the optical flow of each aperture independently suffers from the aperture problem [9]. In this paper, we mainly address this second problem for a specific artificial compound eye, the electronic cluster eye (eCley).
The eCley, inspired by the wasp parasite Xenos peckii, has been widely researched for super-resolution imaging [6]. In this work, the eCley has a lenslet array of 17 × 13 apertures, and the size of the optics module is 6.8 mm × 5.2 mm × 1.4 mm (length × width × height). The height of the eCley optics module is thus about one third of that of a single-aperture optics module with similar resolution. With such a compact size, the eCley can be widely applied in fields such as bio-medical imaging, document analysis, fingerprint identification, and micro-robot navigation. However, although the eCley was proposed several years ago, only a few researchers have adopted this kind of camera for computer vision. This is partially because the optical axes of adjacent apertures of the eCley have a small offset to obtain a larger field of view (FOV), which leads to oblique incidence in the marginal apertures. As shown in Fig. 1, for a chessboard held parallel to the eCley, the image captured by the central aperture (image in the cyan box) has little distortion, while the images captured by the marginal apertures suffer from oblique distortion (images in the red and yellow boxes). The oblique distortion complicates subsequent computer vision tasks [52]; therefore, standard vision algorithms cannot be transferred to the eCley directly with accurate results. Moreover, the biggest challenge in using the eCley is that the small size and small FOV of each aperture image leave little context to support correspondence-field inference. The image captured by the eCley is an image array whose adjacent aperture images partially overlap. This multi-aperture structure gives rise to a stereo problem; depth estimation methods for the eCley can be found in [52,51]. Due to the short focal length and baseline of the eCley, the stereo disparity between aperture images is negligible when objects are located more than 86 mm away from the camera.
Therefore, this paper assumes that the moving objects are far enough away, so that motion estimation with the eCley camera becomes a multi-aperture optical flow problem. However, traditional optical flow methods are based on single-aperture image sequences. The small FOV of each aperture image leads to inaccurate optical flow, especially near the image border, and to inconsistent optical flow between adjacent aperture images. For example, four adjacent aperture images at time t (left) and time t + 1 (right) are shown in Fig. 2(b); the color coding of [2] used to visualize optical flow is shown in Fig. 2(a), and the optical flow of each aperture image sequence, estimated independently with the DeepFlow method [33,49], is shown on the right of Fig. 2(c). The optical flow of the bottom-right aperture image cannot be estimated accurately by DeepFlow because of the aperture problem. Moreover, although the four aperture images capture the same motion boundary, their estimated flows at that boundary are inconsistent. To address the optical flow problem of the eCley, we first introduce an oblique distortion rectification method to rectify the image array and estimate the parallax between adjacent aperture images. Then, based on the rectified image array sequences, the DeepMatching method [33] is used to obtain the initial optical flow. However, the initial optical flow of an aperture image estimated with DeepMatching uses only the context of the corresponding aperture image sequence, and refinement with a traditional variational method yields less accurate results because of the aperture problem; see the right image of Fig. 2(c). Therefore, to obtain accurate optical flow, a multi-aperture optical flow method is proposed to refine the flow fields: a unified variational framework over all images simultaneously refines the optical flows of all aperture images.
In contrast to traditional variational optical flow methods, which use only the brightness constancy and piecewise smoothness assumptions, our model combines the information of multiple apertures and further imposes a compound flow-field consistency assumption that couples the corresponding flow fields of adjacent images. The multi-aperture image information is thus used to define a global energy function. The nonconvex energy function is minimized by solving the associated Euler-Lagrange equations and following a coarse-to-fine warping strategy.
Experiments on synthetic and real data demonstrate that our method achieves more accurate and consistent results than optical flow methods based on a single aperture.
The rest of the paper is organized as follows. Related work is presented in Section 2. Section 3 describes our multi-aperture optical flow method. Experiments are presented in Section 4. Finally, conclusions are drawn in Section 5.

Related work
Since ACEs are bio-inspired instruments, bio-inspired models of optical flow computation and the use of optical flow signals for movement control are important aspects of mimicking their biological prototype. Franceschini et al. [13] studied the visual behavior and neural networks of airborne insects with compound eyes and designed an artificial compound eye with an array of elementary motion detectors (EMDs) to avoid obstacles. Inspired by insects, Zufferey et al. [59] proposed a simple obstacle-avoidance control strategy based on optical flow. Pericet-Camara et al. [31] designed a lightweight artificial elementary eye to extract local optical flow fields. Bračun et al. [3] used the functional subnetwork approach to model the neural system of an ACE for visual motion recognition. Although the above works focus on motion detection with ACEs and on processing based on the motion information, the basic unit of those ACEs contains only an EMD, which is quite different from the eCley used in this paper. The basic imaging unit (the aperture) of the eCley contains multiple pixels that form an image of the scene within its FOV. An aperture image of the eCley therefore captures much more information (such as color and environmental context) than the EMDs of the above ACEs, and constitutes a local image of the scene. The camera most related to the eCley is the light field (LF) camera, which puts a microlens array behind the main lens. Owing to its ability to capture much richer information than a traditional camera, the LF camera has been widely researched in computer vision, for tasks such as depth estimation [58,7,19], motion deblurring [41,25], material recognition [46], super-resolution imaging [11], and scene flow estimation [24]. Since the captured light field information mainly comes from the microlens structure, the eCley can also be considered a kind of LF camera without a main lens.
Widely used LF cameras capture multiple aperture images with small offsets, whereas adjacent aperture images of the eCley have much larger offsets. For a pixel in an aperture image, the large offset means that corresponding pixels can be found only in a few neighboring aperture images. Therefore, methods designed for LF images [20], such as epipolar plane image (EPI) methods and angular patch methods, do not perform well here.
Bio-inspired models for local velocity estimation have been widely researched [15,40,38,9]. These models mimic the neural circuits of the cortical areas known as the primary visual cortex (V1) and the medio-temporal area (MT) with a two-layer feed-forward model. The V1 layer uses a bank of oriented filters (e.g., Gabor filters or Gaussian derivatives [9]) to obtain responses tuned to the speed and direction of motion. The responses of the V1 layer are then pooled and transferred to the MT layer; finally, the output responses are obtained by transforming the pooled responses through a non-linear function. To compute the optical flow of objects with large motion, a multi-scale approach is adopted. Although this two-layer model resembles the visual system, the optical flow it produces is worse than the results of modern computational methods (e.g., variational methods). A possible reason is that the visual system has a far more complicated processing pipeline and can use a much larger region of the scene to infer contextual information. By contrast, convolutional neural networks (CNNs), which use many more layers to extract local features and global abstract descriptors, have greatly improved optical flow accuracy [10,14,1,8]. However, CNN-based optical flow methods need large amounts of training data to learn their parameters, and such training data for ACE images are currently unobtainable.
Apart from CNN-based methods, the variational method is another widely researched approach to optical flow. After 35 years of development, current variational methods achieve highly accurate results. Brox et al. derived a variational formulation that improves optical flow by imposing a coarse-to-fine warping technique for large displacements and using two nested fixed point iterations to optimize the global energy function [4,29]. Zach et al. improved the Horn-Schunck model by imposing a robust L1 data term and total variation (TV) regularization [55]; the energy function is minimized by an alternating optimization strategy, see also [39,34,35,36,37]. Subsequently, Wedel et al. further improved the TV-L1 optical flow algorithm by performing a structure-texture decomposition of the images and integrating a median filter into the numerical scheme [48]. Sun et al. gave a thorough analysis of what has made recent advances possible [42,43], systematically analyzing the energy function, the optimization, and modern implementation practices, and proposed a method called "Classic+NL" that further improves accuracy. The above methods rely on a multi-scale variational framework for large displacements. Although the multi-scale strategy is the key step that allows a variational framework to estimate large displacements, fine motion structures with large displacements cannot always be correctly estimated. Recently, several algorithms have addressed this problem by going beyond the variational framework and incorporating additional feature correspondence information [5,53,49,32]. Unfortunately, due to the small size and small FOV of each aperture image, estimating the optical flow of an aperture image independently often yields inaccurate results caused by the aperture problem.
Since an object near the border of an aperture image will be close to the image center of one of the neighboring apertures, incorporating the neighboring aperture images is a promising way to alleviate the aperture problem and improve the accuracy of the optical flow results.

Multi-aperture optical flow method
Our goal is to simultaneously recover the optical flows of an image array captured by a multi-aperture camera, the eCley. However, as shown in Fig. 1, the oblique incidence of the marginal apertures leads to oblique distortion, which makes the parallaxes of adjacent aperture images inconsistent. Therefore, in this section, we first introduce a method to rectify the image array and estimate the parallaxes (Section 3.1), and then present the optical flow method (Section 3.2).

Preprocessing of eCley image array
To address the distortion caused by oblique incidence, this section introduces a rectification method that rectifies all the aperture images of the eCley simultaneously. Since the eCley is an array of optical channels, it can be considered a multi-aperture camera in which the incident angle of the marginal apertures is oblique. The parameters of the marginal apertures obtained by a traditional calibration method [57] will not be accurate because of the oblique incidence. Therefore, based on the fixed system parameters and incident angles, we propose to deduce the rectification parameters of the other apertures from those of the central aperture. The rest of this section gives the details of our rectification method.
As in standard calibration, we use a chessboard as the reference pattern. The chessboard is parallel to the eCley lens and placed far enough away that the disparity is negligible. The captured chessboard image is shown in Fig. 1; each aperture image captures a part of the chessboard. Since an aperture image has a small size and small FOV, and the optical axis of the central aperture is perpendicular to the imaging sensor plane, the image of the central aperture is considered undistorted. Therefore, the size of the black and white blocks in the central aperture image can be used as a reference for rectifying the other aperture images.
To obtain the reference block size in the central aperture image, the corners of each block need to be extracted. In this paper, a Canny edge detector is applied to obtain the edges, and a least-squares (LS) fit is performed over these edges to obtain multiple straight lines. The corner points are then extracted as the intersection points of these lines, and the average size of the white and black blocks is used as the reference block size. Since all the black and white blocks of the chessboard have the same size, the blocks in the undistorted images will be the same size as those in the central aperture image. Therefore, for each aperture image, the block size and corner points can be obtained by the same procedure as for the central aperture. The corner point closest to the aperture image center is chosen as a reference point, and each corner point is then assigned an undistorted coordinate according to the block size of the central aperture image. Based on the original and undistorted coordinates of the corner points, the Random Sample Consensus (RANSAC) method is used to estimate the rectification parameters, and the aperture image is rectified with these transformation parameters.
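As a sketch of this corner-based estimation step, the rectification parameters can be fitted robustly with RANSAC. The affine parameterization below is an assumption for illustration (the paper does not specify the exact transformation model), and all thresholds and iteration counts are illustrative:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src -> dst (both N x 2 arrays)."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src; A[0::2, 2] = 1.0   # rows for the x-coordinate
    A[1::2, 3:5] = src; A[1::2, 5] = 1.0   # rows for the y-coordinate
    p, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return p.reshape(2, 3)                  # [[a, b, tx], [c, d, ty]]

def ransac_affine(src, dst, iters=200, thresh=1.0, seed=0):
    """RANSAC: fit on minimal 3-point samples, keep the model with the most
    inliers, then refit on all inliers of the best model."""
    rng = np.random.default_rng(seed)
    best_inl = None
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        pred = src @ M[:, :2].T + M[:, 2]
        inl = np.linalg.norm(pred - dst, axis=1) < thresh
        if best_inl is None or inl.sum() > best_inl.sum():
            best_inl = inl
    return fit_affine(src[best_inl], dst[best_inl])
```

Here `src` holds the distorted corner coordinates and `dst` the undistorted coordinates assigned from the reference block size; outlier corners (e.g., from false line intersections) are rejected by the inlier test.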
Because the reference point is not fixed, the parallax between rectified adjacent images must be re-estimated. For a rectified aperture image, the undistorted corner points can be used to estimate the translation parameters. Since there is an inherent parallax between adjacent aperture images (see [28] for more details about the eCley), a corner point has a rough corresponding point in the neighboring image, and the exact corresponding point is searched for around it. The translation parameters are then obtained by averaging the offsets of these corner points. For each aperture image (except the marginal ones), the 4-neighboring aperture images are used to obtain 4 pairs of translation parameters, which can be considered the inherent parallax between the rectified adjacent images. Fig. 3 shows the aperture images and the merged image of the chessboard before and after rectification. The oblique distortion is largely removed in the rectified aperture images, and the merged image based on the re-estimated parallax has much less aliasing than the merged image built from the original aperture images. Since rectification requires interpolation, the marginal aperture images after rectification are blurrier than the central image, and the merged image is blurry at the image border.
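The search-and-average step can be sketched as follows. This is a minimal SSD patch search around the rough correspondence; the patch and search-window sizes are assumptions for illustration, not the paper's values:

```python
import numpy as np

def refine_offset(img_a, img_b, pt, rough, patch=4, search=3):
    """Refine a rough correspondence by SSD search in a small window.
    pt: (row, col) corner in img_a; rough: rough (row, col) guess in img_b."""
    r, c = pt
    ref = img_a[r - patch:r + patch + 1, c - patch:c + patch + 1]
    best, best_rc = np.inf, rough
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = rough[0] + dr, rough[1] + dc
            cand = img_b[rr - patch:rr + patch + 1, cc - patch:cc + patch + 1]
            ssd = float(((ref - cand) ** 2).sum())
            if ssd < best:
                best, best_rc = ssd, (rr, cc)
    return best_rc

def parallax(img_a, img_b, corners, rough_offsets):
    """Average the per-corner offsets to get the translation (parallax)
    between two rectified neighboring aperture images."""
    offs = []
    for pt, rough in zip(corners, rough_offsets):
        m = refine_offset(img_a, img_b, pt, rough)
        offs.append((m[0] - pt[0], m[1] - pt[1]))
    return np.mean(offs, axis=0)
```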

Optical flow initialization
To handle large displacements in optical flow, many works blend a matching approach with a variational method [5,53,49,32], which largely boosts optical flow performance. This paper also takes advantage of a matching approach to obtain the initial optical flow. Since DeepMatching is a quasi-dense matching method tailored to the optical flow problem, for each aperture image sequence I_i(x, y, t) and I_i(x, y, t + 1), the DeepMatching method of [49] is adopted to obtain the initial optical flow w_i^init of the aperture image. Due to the small FOV of the aperture, the captured image frequently shows a textureless or weakly textured pattern, which makes the initial optical flow less accurate. To refine it, we propose a variational optical flow model that incorporates the optical flow of adjacent apertures to improve the result.

The variational optical flow model
Based on the variational formulation of optical flow, the total energy function we aim to minimize is

$$E(u,v) = \sum_{i} E_i(u,v), \qquad (1)$$

where $E_i(u,v)$ is the energy function of aperture $A_i$ and is a weighted sum of three terms,

$$E_i(u,v) = E_i^{Data} + \alpha\, E_i^{Smooth} + \beta\, E_i^{Compound}, \qquad (2)$$

where $E_i^{Data}$ is the data term, $E_i^{Smooth}$ is the smoothness term, and $E_i^{Compound}$ is the compound term that encourages the flows of adjacent images to be similar; $\alpha$ and $\beta$ are the corresponding weights. If $\beta = 0$, $E_i(u,v)$ reduces to the energy function of a traditional optical flow model. With $\mathbf{x} = (x, y, t)^{T}$ and $\mathbf{w} = (u, v, 1)^{T}$, $E_i^{Data}$ imposes the brightness constancy and gradient constancy assumptions:

$$E_i^{Data} = \int_{\Omega} \Psi\big(|I_i(\mathbf{x}+\mathbf{w}) - I_i(\mathbf{x})|^2 + \gamma\,|\nabla I_i(\mathbf{x}+\mathbf{w}) - \nabla I_i(\mathbf{x})|^2\big)\, d\mathbf{x}, \qquad (3)$$

where $\nabla = (\partial_x, \partial_y)^{T}$ denotes the spatial gradient and $\Psi$ is a penalty function. As illustrated in [42,43], among the three common penalty functions, the quadratic penalty $\Psi(x^2) = x^2$, the Charbonnier penalty $\Psi(x^2) = \sqrt{x^2 + \varepsilon^2}$, and the Lorentzian penalty $\Psi(x^2) = \log(1 + \frac{x^2}{2\delta^2})$, the Charbonnier penalty performs best. The Charbonnier penalty is therefore chosen in this paper, with $\varepsilon$ a small positive constant that keeps the function convex. The smoothness term $E_i^{Smooth}$ imposes a piecewise smoothness assumption to resolve the ambiguities of low-texture regions and yield a smooth flow field. It penalizes the total variation of the flow field:

$$E_i^{Smooth} = \int_{\Omega} \Psi\big(|\nabla_3 u|^2 + |\nabla_3 v|^2\big)\, d\mathbf{x}, \qquad (4)$$

where $\nabla_3 = (\partial_x, \partial_y, \partial_t)^{T}$. If only two frames are available, $\nabla_3$ is replaced by the spatial gradient $\nabla$ (since $\partial_t = 1$). With a modern optimization method, a traditional optical flow model with the two terms above can yield accurate results [4]. In our eCley images, however, each aperture image spans only a small FOV and has a small size, which may lead to less accurate results due to the lack of texture information. As shown in Fig. 2(c), the optical flows of adjacent aperture images estimated with the DeepFlow method are not consistent, especially at the motion boundary.
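A minimal numerical sketch of the data term with the Charbonnier penalty is given below. It uses nearest-neighbor warping instead of the bilinear interpolation a real implementation would use, and the `gamma` and `eps` values are illustrative assumptions:

```python
import numpy as np

def charbonnier(x2, eps=1e-3):
    """Robust convex penalty Psi(x^2) = sqrt(x^2 + eps^2); grows linearly
    for large residuals, unlike the quadratic penalty."""
    return np.sqrt(x2 + eps ** 2)

def data_term(I1, I2, u, v, gamma=0.5):
    """Brightness + gradient constancy cost of a flow field (u, v),
    evaluated with nearest-neighbor warping for simplicity."""
    H, W = I1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xw = np.clip(np.round(xs + u).astype(int), 0, W - 1)
    yw = np.clip(np.round(ys + v).astype(int), 0, H - 1)
    gy1, gx1 = np.gradient(I1)
    gy2, gx2 = np.gradient(I2)
    bc = (I2[yw, xw] - I1) ** 2                                  # brightness constancy
    gc = (gx2[yw, xw] - gx1) ** 2 + (gy2[yw, xw] - gy1) ** 2     # gradient constancy
    return charbonnier(bc + gamma * gc).sum()
```

The Charbonnier penalty keeps the cost convex in its argument while down-weighting outliers relative to the quadratic penalty, which is why it is preferred here.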
Fortunately, adjacent images partially overlap and can provide spatial correspondence information that improves the consistency of the optical flow results. Among the areas corresponding to the border area of an aperture image in its adjacent images, at least one will be close to the center of the corresponding image. The moving object in that corresponding area is therefore less likely to move out of the aperture image and more likely to have an accurate flow boundary. Thus, this paper further imposes a compound term $E_i^{Compound}$ to improve the optical flow; the left image of Fig. 2(c) shows that the proposed compound term gives a more consistent and accurate result. The compound term assumes that corresponding pixels of adjacent images have the same flow field, and is expressed as

$$E_i^{Compound} = \int_{\Omega} \sum_{j \in Ne(i)} \delta(\mathbf{x}_j)\, \Psi\big(|\mathbf{w} - \mathbf{w}_j|^2\big)\, d\mathbf{x}, \qquad (5)$$

where $Ne(i)$ denotes the 4-nearest neighboring apertures of aperture $A_i$; $\mathbf{x}_j$ denotes the pixel in aperture $A_j$ corresponding to $\mathbf{x}$, and $\mathbf{w}_j$ is its displacement vector; $\delta(\mathbf{x}_j)$ is an indicator function that is 1 if $\mathbf{x}_j$ exists and 0 otherwise; and $\Psi(|\mathbf{w} - \mathbf{w}_j|^2)$ measures the difference between the flow of pixel $\mathbf{x}$ and that of the corresponding pixel $\mathbf{x}_j$.
A pixel $\mathbf{x}$ in aperture $A_i$ generally has at least two corresponding pixels among the apertures in $Ne(i)$. With a uniform weight over all corresponding pixels, the flow vector of $\mathbf{x}$ tends toward the mean of their displacement vectors; an erroneous flow vector at a corresponding pixel located near the border of an adjacent image would then pull the flow vector of pixel $\mathbf{x}$ toward the erroneous value. To reduce the effect of corresponding border pixels, a weight function $g(\mathbf{x}_j)$ is added to the compound term. A pixel closer to the image border receives a smaller weight, and the weight of pixel $\mathbf{x}_j$ depends only on its position in the corresponding image. We therefore use a Gaussian function of the pixel's position to model its relative importance.
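The position-dependent weighting can be sketched as below; the Gaussian width `sigma_frac` is an assumed parameter, since the paper does not state its value:

```python
import numpy as np

def border_weight(shape, sigma_frac=0.5):
    """Gaussian weight g(x_j): pixels near the image center get weight ~1,
    pixels near the border are progressively down-weighted."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0        # image center
    sy, sx = sigma_frac * H, sigma_frac * W      # per-axis Gaussian widths
    return np.exp(-(((ys - cy) / sy) ** 2 + ((xs - cx) / sx) ** 2) / 2.0)
```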

Minimization
With the preprocessing method of Section 3.1, the parallax between aperture images is known, so for a pixel $\mathbf{x}$ in aperture $A_i$ the corresponding pixels $\mathbf{x}_j$ are known. Although a new compound term is added, it uses the same penalty function, so the total energy function remains differentiable. According to the calculus of variations, a minimizer of eq. (2) must fulfill the associated Euler-Lagrange equations with homogeneous Neumann boundary conditions. For better readability we follow the abbreviations defined in [4], where $t$ is replaced by $z$. Since the energy function is the same for every aperture, the index $i$ is omitted for simplicity.
The Euler-Lagrange equations of eq. (2) are

$$\Psi'\big(I_z^2 + \gamma(I_{xz}^2 + I_{yz}^2)\big)\,\big(I_x I_z + \gamma(I_{xx} I_{xz} + I_{xy} I_{yz})\big) - \alpha\,\mathrm{div}\big(\Psi'(|\nabla_3 u|^2 + |\nabla_3 v|^2)\,\nabla_3 u\big) + \beta \sum_{j \in Ne(i)} \delta(\mathbf{x}_j)\, g(\mathbf{x}_j)\, \Psi'(|\mathbf{w}-\mathbf{w}_j|^2)\,(u - u_j) = 0,$$
$$\Psi'\big(I_z^2 + \gamma(I_{xz}^2 + I_{yz}^2)\big)\,\big(I_y I_z + \gamma(I_{yy} I_{yz} + I_{xy} I_{xz})\big) - \alpha\,\mathrm{div}\big(\Psi'(|\nabla_3 u|^2 + |\nabla_3 v|^2)\,\nabla_3 v\big) + \beta \sum_{j \in Ne(i)} \delta(\mathbf{x}_j)\, g(\mathbf{x}_j)\, \Psi'(|\mathbf{w}-\mathbf{w}_j|^2)\,(v - v_j) = 0. \qquad (8)$$

Since the initial optical flow from DeepMatching still suffers from the aperture problem, the variational refinement follows a standard coarse-to-fine warping scheme. After a pyramid of images is constructed, the flow is estimated from the coarsest to the finest level, and at each level a fixed point iteration is used to compute $\mathbf{w}$. Let $\mathbf{w}^k = (u^k, v^k, 1)^{T}$, $k = 0, 1, \ldots$, be the displacement vector at iteration $k$. At the coarsest level, $\mathbf{w}^0$ is initialized with the DeepMatching result $\mathbf{w}^{init}$, and each finer level is initialized with the final result of the previous level. At each step $k+1$, $\mathbf{w}^{k+1}$ is the solution of eq. (8) with the image terms evaluated at $\mathbf{w}^{k+1}$. To remove this nonlinearity, the increments $du^k = u^{k+1} - u^k$ and $dv^k = v^{k+1} - v^k$ are introduced and first-order Taylor expansions are used:

$$I_z^{k+1} \approx I_z^k + I_x^k\, du^k + I_y^k\, dv^k, \qquad I_{xz}^{k+1} \approx I_{xz}^k + I_{xx}^k\, du^k + I_{xy}^k\, dv^k, \qquad (9)$$
$$I_{yz}^{k+1} \approx I_{yz}^k + I_{xy}^k\, du^k + I_{yy}^k\, dv^k. \qquad (10)$$

With eqs. (9)-(10), the first equation in eq. (8) becomes linear in the unknown increments, and the second equation can be written in a similar way. To remove the remaining nonlinearity of $\Psi'$, we follow Brox et al. [4] and use a second, inner fixed point iteration, initialized with $du^{k,0} = 0$, $dv^{k,0} = 0$. At each inner step $l+1$, a linear system in the increments $du^{k,l+1}$, $dv^{k,l+1}$ is solved, in which the linearized data constraints take the form $I_x^k\big(I_z^k + I_x^k\, du^{k,l+1} + I_y^k\, dv^{k,l+1}\big) + \gamma\, I_{xx}^k\big(I_{xz}^k + I_{xx}^k\, du^{k,l+1} + I_{xy}^k\, dv^{k,l+1}\big) + \gamma\, I_{xy}^k\big(I_{yz}^k + I_{xy}^k\, du^{k,l+1} + I_{yy}^k\, dv^{k,l+1}\big)$. For the compound term in the first equation of eq. (8), we further assume $du_j^{k,l+1} = 0$ and use $u_j^{k-1}$ in place of $u_j^k$, so that no new unknown variables are added. The final linear system is then solved by the successive over-relaxation (SOR) method [54].
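The SOR step can be sketched for a generic linear system; in the actual solver the matrix is the sparse system assembled from the linearized equations, so the small dense system below is only a stand-in:

```python
import numpy as np

def sor(A, b, omega=1.5, iters=500, x0=None):
    """Successive over-relaxation for A x = b (A must have a nonzero
    diagonal). omega in (0, 2); omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for _ in range(iters):
        for i in range(n):
            sigma = A[i] @ x - A[i, i] * x[i]        # off-diagonal contribution
            x[i] += omega * ((b[i] - sigma) / A[i, i] - x[i])
    return x
```

For symmetric positive definite systems such as the ones arising here, SOR converges for any relaxation factor in (0, 2); the over-relaxation typically accelerates convergence over plain Gauss-Seidel.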
The minimization proceeds from the coarsest to the finest level, the finest level being the original image resolution. At each image level, the outer loop is iterated k times, and in each outer loop the inner loop iterates l times to estimate the increment of the displacement vector. After each outer iteration we further apply a 5 × 5 weighted median filter (WMF) to remove outliers while preserving motion boundaries [42,43]. The pseudo code of our method is shown in Algorithm 1, where the function f(w_r^k) transfers the displacement vector w_r^k to the next finer scale as an initialization.
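The outlier-removal step can be sketched with a plain (unweighted) 5 × 5 median filter over each flow component; the paper uses the weighted median filter of [42,43], so this is a simplification:

```python
import numpy as np

def median_filter_flow(u, size=5):
    """Median-filter one flow component to suppress outliers while roughly
    preserving motion edges (unweighted stand-in for the paper's WMF)."""
    r = size // 2
    pad = np.pad(u, r, mode='edge')          # replicate borders
    out = np.empty_like(u, dtype=float)
    H, W = u.shape
    for y in range(H):
        for x in range(W):
            out[y, x] = np.median(pad[y:y + size, x:x + size])
    return out
```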

Experimental results
This section presents the experimental evaluation of the proposed method. All experiments are performed on a laptop with an Intel Core i5-3210M CPU clocked at 2.5 GHz and 4 GB RAM. To evaluate the proposed method, four optical flow methods, DeepFlow [33], Classic+NL [43], LDOF [5], and EpicFlow [32], are used for comparison. Classic+NL is a baseline variational optical flow method. DeepFlow, EpicFlow, and LDOF use dense descriptor matching and interpolation to estimate the initial flow, followed by variational optimization. In the proposed method, the initial optical flow is obtained in the same way as in DeepFlow. Accurate descriptor matching and interpolation can largely improve the initial flow and thereby yield a more accurate optical flow after optimization. Among the variational results (with or without descriptor matching) submitted to the Sintel dataset, EpicFlow ranks 17th, DeepFlow 20th, Classic+NL 43rd, and LDOF 50th. The parameters of DeepFlow, Classic+NL, LDOF, and EpicFlow are kept the same as in the corresponding papers. The evaluation is based on the Middlebury dataset, the Sintel dataset, and real data captured by the eCley; on the benchmark datasets with ground truth, performance is measured with the standard metrics, the average angular error (AAE) and the average endpoint error (EPE). All optical flow images are visualized with the color coding of [2].
In our method, all parameters of the initial optical flow estimation stage are kept the same as in [33]. At the variational refinement stage, since most motions in the Middlebury dataset are small while the Sintel dataset contains large motions, the downsampling factors of the coarse-to-fine step are 0.8 for Middlebury and 0.5 for Sintel. For the number of outer iterations k at each image level, Fig. 4 gives the AAE and EPE of four images and their averages for different k. To better visualize the results, the AAE and EPE at each iteration are compared with those of the initial flow (iteration = 0), and the figure shows the resulting AAE and EPE ratios. After one iteration, both the EPE and the AAE decrease significantly thanks to the coarse-to-fine scheme. As the iterations increase, most AAE and EPE values decrease gradually and then begin to increase. The average AAE and EPE are smallest at k = 3, so we use k = 3 in our experiments. The inner iteration based on the SOR method uses l = 100. Figs. 5-8 give the AAE and EPE on the Middlebury and Sintel data for different α and β, with the data divided into image arrays as described in Section 4.1. Good results on the Sintel data are obtained at α = 0.5, β = 1. For the Middlebury data, α = 0.9 gives the best result, while the smallest AAE is obtained at β = 1.2 and the smallest EPE at β = 0.8; to balance AAE and EPE, we choose the intermediate value β = 1.

Results on benchmark datasets
Middlebury dataset The evaluation on the Middlebury benchmark is performed on 7 frame pairs that have ground truth optical flow. To construct an image array like the images captured by the eCley, each frame is divided into a set of partially overlapping sub-images. More concretely, the RubberWhale, Hydrangea, and Dimetrodon sequences, with a resolution of 584 × 388 pixels, are divided into 7 × 11 sub-images of 96 × 96 pixels; the Grove2, Grove3, Urban2, and Urban3 sequences, with a resolution of 640 × 480 pixels, are divided into 5 × 7 sub-images of 160 × 160 pixels; the overlap between neighboring sub-images is half the sub-image FOV. The ground truth optical flow images are divided in the same way. Each sub-image pair is extracted from the original image pair at the same position in the two frames and is thus a local image pair of the original pair. Since the ground truth describes the displacement of each pixel of the current frame in the following frame, the displacement of each pixel in a sub-image is the same as in the original image; the optical flow extracted at the same position can therefore be considered the ground truth of each sub-image. The original image, the ground truth optical flow, and the divided image array and ground truth array are shown in Fig. 9. Table 1 gives the AAE and EPE of the five optical flow methods, and the visualized results are shown in Fig. 10. The percentages in brackets in Table 2 indicate the improvement rate of our method; a negative percentage means our method is worse than the corresponding method. Our method achieves the best result among the five methods except on the Dimetrodon sequence.
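The division scheme can be reproduced exactly: with a sub-image size of 96 pixels and a stride of half the sub-image FOV (48 pixels), a 584 × 388 frame yields the 7 × 11 grid stated above.

```python
import numpy as np

def split_overlapping(img, sub=96):
    """Divide an image into sub x sub sub-images whose neighbors overlap
    by half the sub-image FOV (stride = sub // 2)."""
    H, W = img.shape[:2]
    stride = sub // 2
    tiles = []
    for y in range(0, H - sub + 1, stride):
        row = [img[y:y + sub, x:x + sub] for x in range(0, W - sub + 1, stride)]
        tiles.append(row)
    return tiles
```

The same routine with `sub=160` on a 640 × 480 frame reproduces the 5 × 7 grid used for the Grove and Urban sequences.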
Sintel dataset The Sintel dataset and its ground truth are divided in a similar way. The resolution of a Sintel image is 1024 × 436 pixels. Due to the large motions in the Sintel images, we use a relatively high sub-image resolution (200 × 200). Fig. 11 gives the visualized results on the divided image arrays, and Fig. 12 shows sub-image results for visual comparison. In Fig. 11, Classic+NL and LDOF are much worse than the other three methods. Our method performs similarly to DeepFlow, and both outperform EpicFlow. As shown in Fig. 12, for a motion boundary near the image border, our method is able to estimate the boundary correctly and performs better than DeepFlow. In addition, the compared methods estimate the optical flow on single image sequences; Table 3 therefore compares their results on the original benchmark data with our result on the multi-aperture images. The results indicate that the proposed method on the small sub-image arrays achieves an optical flow accuracy comparable to that of these methods on the original image sequences, although the aperture problem is more severe for the sub-image arrays than for the original images.

Results on real data
In this section, we use real-world sequences captured by eCley. The eCley image used in this section contains 13 × 13 apertures. Each aperture image after rectification has a resolution of 101 × 101 pixels, and the parameters of our method on the eCley images are kept the same as on the Sintel dataset. The first sequence (Fig. 13) involves the vertical motion of an arm and a hand in a static scene. As shown in Fig. 13, the Classic+NL method is able to estimate the motion boundary, but the corresponding flow fields across adjacent aperture images are not consistent; DeepFlow, LDOF and EpicFlow use descriptor matching to estimate the initial optical flow and produce more consistent results, but they cannot accurately estimate the motion boundary; our method estimates the motion boundary and keeps the flow fields consistent between adjacent images. In addition, since EpicFlow uses edge information for interpolation, its optical flows on textureless sub-images are much worse than those of our method and the other three methods. The second sequence is shown in Fig. 14.

[Fig. 11 caption: Visualized optical flow results on the Sintel datasets. In the error maps, blue means no or tiny error, white means small error, and orange indicates large error.]

Although our method imposes a compound term to improve the result, at each iteration an aperture image only needs the optical flow fields of the current aperture image and its adjacent images from the last iteration. Therefore, the estimation of the optical flow of one aperture image does not depend on the other apertures' results from the current iteration and can be accelerated with multithreaded processing. From the experimental results on the Middlebury datasets, the Sintel datasets and the real data, the proposed method performs better on the sub-image array structure. However, there are also failure cases:
the proposed method can only handle motion contained within an aperture image; if the motion crosses aperture images, the method fails to obtain an accurate flow. As shown in Fig. 15, an accurate optical flow cannot be estimated for the bottom-left image sequence because of the cross-image motion. Since a large motion that crosses images leaves the moving object without matching points, future work will focus on fast-moving objects that cross multiple aperture images.
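The per-aperture update scheme described above — each aperture reads only the previous iteration's flows of itself and its four adjacent apertures — can be sketched as a Jacobi-style parallel loop (an illustrative Python sketch; `update_fn` stands in for the paper's variational update, which is not reproduced here):

```python
from concurrent.futures import ThreadPoolExecutor

def refine_flows(flows, update_fn, n_iters=5):
    """Jacobi-style refinement of per-aperture flow fields.

    `flows` is a rows x cols nested list of per-aperture flow fields.
    Each aperture's flow is recomputed from its own field plus the
    neighboring apertures' fields of the *previous* iteration, so all
    apertures within one iteration can be updated in parallel.
    """
    rows, cols = len(flows), len(flows[0])

    def neighbors(prev, r, c):
        # 4-connected adjacent apertures, clipped at the array border.
        return [prev[i][j]
                for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= i < rows and 0 <= j < cols]

    for _ in range(n_iters):
        prev = flows  # snapshot: every update reads only the last iteration
        with ThreadPoolExecutor() as pool:
            futs = {(r, c): pool.submit(update_fn, prev[r][c],
                                        neighbors(prev, r, c))
                    for r in range(rows) for c in range(cols)}
        flows = [[futs[r, c].result() for c in range(cols)]
                 for r in range(rows)]
    return flows
```

Because `update_fn` only sees the snapshot `prev`, the per-aperture tasks carry no cross-iteration data races, which is what makes the multithreaded acceleration noted above possible.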

Conclusion
In this paper, we proposed a multi-aperture optical flow method for an ACE, eCley. We first introduced an image rectification method for eCley. Based on the multi-aperture configuration, all apertures are used to simultaneously recover the optical flows of the multi-aperture system and keep the flow consistent across apertures. In addition, the method is based on a variational framework, which imposes a compound flow field consistency assumption along with the brightness constancy and