Model-based rate-distortion optimized video-based point cloud compression with differential evolution

The Moving Picture Experts Group (MPEG) video-based point cloud compression (V-PCC) standard encodes a dynamic point cloud by first converting it into one geometry video and one color video and then using a video coder to compress the two video sequences. We first propose analytical models for the distortion and bitrate of the V-PCC reference software, where the models' variables are the quantization step sizes used in the encoding of the geometry and color videos. Unlike previous work, our analytical models are functions of the quantization step sizes of all frames in a group of frames. Then, we use our models and an implementation of the differential evolution algorithm to efficiently minimize the distortion subject to a constraint on the bitrate. Experimental results on six dynamic point clouds show that, compared to the state-of-the-art, our method achieves an encoding with a smaller error to the target bitrate (4.65% vs. 11.94% on average) and a slightly lower rate-distortion performance (on average, the increase in Bjøntegaard delta (BD) distortion is 0.27, and the increase in BD rate is 8.40%).


Introduction
A static point cloud is a representation of a three-dimensional object, where in addition to the spatial coordinates of a sample of points on the surface of the object, attributes such as color, reflectance, transparency, and normal direction may be used. A dynamic point cloud consists of several successive static point clouds. Each point cloud in the sequence is called a frame. Point clouds are receiving increased attention due to their potential for immersive video experience applications such as virtual reality, augmented reality, and immersive telepresence.
To get a high-quality representation of a three-dimensional object as a point cloud, a huge amount of data is required. To compress point clouds efficiently, the Moving Picture Experts Group (MPEG) launched in January 2017 a call for proposals for point cloud compression technology. As a result, two point cloud compression standards are being developed: video-based point cloud compression (V-PCC) [1] for point sets with a relatively uniform distribution of points and geometry-based point cloud compression (G-PCC) [2] for sparser distributions. In this paper, we focus on V-PCC for dynamic point clouds. In V-PCC, the input point cloud is first decomposed into a set of patches, which are independently mapped to a two-dimensional grid of uniform blocks. This mapping is then used to store the geometry and color information as one geometry video and one color video. Next, the generated geometry video and color video are compressed separately with a video coder, e.g., H.265/HEVC [3]. Finally, the geometry and color videos, together with metadata (occupancy map for the two-dimensional grid, auxiliary patch, and block information) are multiplexed to generate the bit stream (Fig. 1 [1]). In the video coding step, compression is achieved with quantization, which is determined by a quantization step size or, equivalently, a quantization parameter (QP). Given a set of $M$ quantization step sizes $\{q_0, \ldots, q_{M-1}\}$ and a dynamic point cloud consisting of $N$ frames, an optimal encoding can be obtained by determining for each frame $i$ ($i = 1, \ldots, N$) the geometry quantization step size $Q_{g,i} \in \{q_0, \ldots, q_{M-1}\}$ and color quantization step size $Q_{c,i} \in \{q_0, \ldots, q_{M-1}\}$ that minimize the distortion subject to a constraint $R_T$ on the total number of bits. This can be formulated as the multi-objective optimization problem
$$\min_{\mathbf{Q}_g, \mathbf{Q}_c} \left[ D_g(\mathbf{Q}_g, \mathbf{Q}_c),\ D_c(\mathbf{Q}_g, \mathbf{Q}_c) \right] \quad \text{s.t.} \quad R(\mathbf{Q}_g, \mathbf{Q}_c) = \sum_{i=1}^{N} \left( R_{g,i}(\mathbf{Q}_g, \mathbf{Q}_c) + R_{c,i}(\mathbf{Q}_g, \mathbf{Q}_c) \right) \le R_T, \tag{1}$$
where $\mathbf{Q}_g = (Q_{g,1}, \ldots, Q_{g,N})$, $\mathbf{Q}_c = (Q_{c,1}, \ldots, Q_{c,N})$, $D_g$ and $D_c$ are the geometry and color distortions, and $R_{g,i}(\mathbf{Q}_g, \mathbf{Q}_c)$ and $R_{c,i}(\mathbf{Q}_g, \mathbf{Q}_c)$ are the number of bits for the geometry and color of the $i$th frame, respectively. In practice, problem (1) is scalarized as follows.
$$\min_{\mathbf{Q}_g, \mathbf{Q}_c} D(\mathbf{Q}_g, \mathbf{Q}_c) = \omega D_g(\mathbf{Q}_g, \mathbf{Q}_c) + (1 - \omega) D_c(\mathbf{Q}_g, \mathbf{Q}_c) \quad \text{s.t.} \quad R(\mathbf{Q}_g, \mathbf{Q}_c) \le R_T, \tag{2}$$
where $\omega \in [0,1]$ is a weighting factor that sets the relative importance of the geometry and color distortions. As the number of possible solutions is $M^{2N}$, solving the problem with exhaustive search is not feasible when $M$ or $N$ is large, because computing the distortion and the number of bits requires encoding and decoding the point cloud, which is very time consuming. In this paper, we solve the rate-distortion optimization problem (2) by first developing analytical models for the distortion and bitrate and then applying a metaheuristic based on differential evolution (DE) [4] to the analytical models. There is a need for new models as the existing ones [5,6,7] are not suitable for the rate-distortion optimization problem (2). Note also that the V-PCC standard does not give any solution to problem (2). In the latest MPEG V-PCC test model [8], for example, the QPs for the geometry and color are selected manually: one chooses the QPs of the first frame, and the QP values of the following frames are set according to some fixed rules (e.g., by using the same values for the low delay configuration).
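To give a feeling for why exhaustive search is ruled out, the following minimal sketch counts the candidate encodings when each frame has one geometry and one color step size. The function name is hypothetical, and the value 52 is simply the number of HEVC QP values mentioned in Section 3.

```python
def num_candidate_encodings(num_steps, num_frames):
    """Number of ways to pick one geometry and one color step per frame."""
    # 2 * num_frames independent choices, each among num_steps values.
    return num_steps ** (2 * num_frames)


# Growth of the search space with the number of frames for 52 step sizes.
for frames in (1, 2, 4, 8):
    print(frames, num_candidate_encodings(52, frames))
```

Since every candidate would require a full encode and decode of the point cloud, even the four-frame case is far outside what can be enumerated.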

Related Work
Only a small number of works [5,6,7] have proposed rate and distortion models for point cloud compression. In [5], the focus is on the point cloud library (PCL) platform [9] for the compression of static point clouds. This platform uses an octree decomposition for geometry compression and JPEG for color compression. Analytical models that describe the relationship between the encoding parameters (the maximum octree level and the JPEG quality factor) and the color distortion $D_c$ and bitrate $R$ are derived with statistical analysis. Let $L$ be the maximum octree level and let $J$ be the JPEG quality factor. The color distortion $D_c$ is modeled as a function of $L$ and $J$ with three model parameters. The bitrate, on the other hand, is modeled through its logarithm: $\ln R$ is written as a sum of three terms with three model parameters. Then, the models are used to formulate the rate-distortion optimization problem as a constrained optimization problem, and an interior point method is applied to solve it. In [6], a similar approach is applied to V-PCC for dynamic point clouds. First, distortion and rate models for the geometry information and color information are derived as follows:
$$D_g = \alpha_g Q_{g,1} + \delta_g, \qquad D_c = \alpha_c Q_{g,1} + \beta_c Q_{c,1} + \delta_c, \qquad R_g = \gamma_g Q_{g,1}^{\theta_g}, \qquad R_c = \gamma_c Q_{c,1}^{\theta_c},$$
where $\alpha_g$, $\delta_g$, $\alpha_c$, $\beta_c$, $\delta_c$, $\gamma_g$, $\theta_g$, $\gamma_c$, $\theta_c$ are model parameters. Then, an interior point method is used to minimize the weighted sum of the distortions subject to a constraint on the total number of bits. One limitation of this work is that the distortion and rate models are functions of the quantization steps of the geometry and color information of the first frame only. Thus, these models are only suitable when the quantization steps of the following frames are set according to the default settings of the V-PCC test model and are not appropriate for the general rate-distortion optimization problem (2). In [7], a point cloud is partitioned into seven regions such that the first six regions correspond to the six patches with the largest area in the six projection planes, and the seventh region consists of all other patches. Then, the geometry and color quantization steps corresponding to each region are optimized separately using the analytical models in [6].

Rate and Distortion Models
In this section, we propose new analytical distortion and rate models for V-PCC. For both the geometry distortion and color distortion, we used the symmetric point-to-point distortions based on the mean squared error (MSE) [10]. Moreover, for the color information, we considered only the Y (luminance) component. To compute the actual values of the distortion and bitrate, we used the latest V-PCC test model (TMC2 v12.0) [8], where the encoder settings were modified such that the QPs of the frames can be chosen arbitrarily. Note that TMC2 v12.0 relies on the HEVC Test Model Version 16.20 (HM16.20) [11] to compress the geometry and color videos. In HEVC, the set of QP values is {0, …, 51}, which corresponds to quantization step sizes {0.625, …, 224}. We encoded four frames of the point cloud using the low delay configuration with group of pictures (GOP) structure IPPP.
Table 1. Dependency between the first frame and the second frame for the basketballplayer point cloud. Encoding is with the low delay configuration of [8].
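The correspondence between QP values and quantization step sizes can be sketched as follows. The sketch assumes the conventional rule that the step size doubles every six QP values, starting from the six-entry base table 0.625, 0.6875, 0.8125, 0.875, 1, 1.125; this assumption reproduces the endpoints 0.625 and 224 quoted above. The function name is hypothetical.

```python
# Base step sizes for QP = 0..5 (assumed conventional table).
BASE_STEPS = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qp_to_step(qp):
    """Map a QP in {0, ..., 51} to its quantization step size."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # The step size doubles each time the QP increases by 6.
    return BASE_STEPS[qp % 6] * 2 ** (qp // 6)


# Full set of candidate step sizes {q_0, ..., q_{M-1}} with M = 52.
STEPS = [qp_to_step(qp) for qp in range(52)]
print(STEPS[0], STEPS[-1])
```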

Distortion Models
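In [6], the geometry distortion $D_g$ and color distortion $D_c$ are modeled as functions of the geometry and color quantization step sizes of the first frame ($Q_{g,1}$ and $Q_{c,1}$, respectively) according to
$$D_g = \alpha_g Q_{g,1} + \delta_g, \qquad D_c = \alpha_c Q_{g,1} + \beta_c Q_{c,1} + \delta_c,$$
where $\alpha_g$, $\delta_g$, $\alpha_c$, $\beta_c$, and $\delta_c$ are model parameters. In this paper, we extend this model by including the quantization step sizes of all frames. For simplicity, we assume that the number of frames $N$ is equal to 4. To study the effect of the quantization in the first frame on the distortion in the second frame, we fixed the quantization steps of the second frame and varied those of the first frame. Table 1 shows that the effect of the quantization step of the first frame on the distortion of the second frame is very small for both geometry and color. We observed the same phenomenon for the other frames. Consequently, we propose the following distortion models for the $i$th frame, in which the distortion depends only on the quantization steps of that frame:
$$D_{g,i} = \alpha_{g,i} Q_{g,i} + \delta_{g,i}, \qquad D_{c,i} = \alpha_{c,i} Q_{g,i} + \beta_{c,i} Q_{c,i} + \delta_{c,i}, \tag{5}$$
where $\alpha_{g,i}$, $\delta_{g,i}$, $\alpha_{c,i}$, $\beta_{c,i}$, and $\delta_{c,i}$ are model parameters.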
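As an illustration of how the parameters of a linear distortion model could be obtained from a few encodings, the sketch below fits a per-frame geometry model of the assumed form $D_{g,i} \approx \alpha_{g,i} Q_{g,i} + \delta_{g,i}$ by ordinary least squares. The function name and the sample values are hypothetical and synthetic; they are not measurements from the paper.

```python
def fit_line(q, d):
    """Least-squares fit of d ~ alpha * q + delta; returns (alpha, delta)."""
    n = len(q)
    mean_q = sum(q) / n
    mean_d = sum(d) / n
    sxx = sum((qi - mean_q) ** 2 for qi in q)
    sxy = sum((qi - mean_q) * (di - mean_d) for qi, di in zip(q, d))
    alpha = sxy / sxx
    delta = mean_d - alpha * mean_q
    return alpha, delta


# Synthetic (step size, distortion) pairs for one frame, generated from a
# known line so that the recovered parameters can be checked.
q_steps = [8.0, 16.0, 32.0, 64.0]
measured = [0.05 * q + 0.2 for q in q_steps]

alpha, delta = fit_line(q_steps, measured)
print(alpha, delta)
```

The color model has two explanatory variables, so it would need a two-variable least-squares fit, but the principle is the same.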

Rate Models
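As the number of bits of the first frame is only determined by its own quantization steps ($Q_{g,1}$, $Q_{c,1}$), it can be modeled as in [6]:
$$R_{g,1} = \gamma_{g,1} Q_{g,1}^{\theta_{g,1}}, \qquad R_{c,1} = \gamma_{c,1} Q_{c,1}^{\theta_{c,1}}, \tag{6}$$
where $\gamma_{g,1}$, $\theta_{g,1}$, $\gamma_{c,1}$, and $\theta_{c,1}$ are model parameters. To obtain the rate model for the second frame, we first ignore the impact of the first frame on the second frame and use the basic model
$$R_{g,2} = \gamma_{g,2} Q_{g,2}^{\theta_{g,2}}, \qquad R_{c,2} = \gamma_{c,2} Q_{c,2}^{\theta_{c,2}},$$
where $\gamma_{g,2}$, $\theta_{g,2}$, $\gamma_{c,2}$, and $\theta_{c,2}$ are model parameters. However, Table 1 shows that the number of bits of the second frame increases when the quantization steps of the first frame increase. To take this dependency into account, we update the model as
$$R_{g,2} = (\varphi_{g,(1,2)} \cdot Q_{g,1} + 1)\, \gamma_{g,2} Q_{g,2}^{\theta_{g,2}}, \qquad R_{c,2} = (\varphi_{c,(1,2)} \cdot Q_{c,1} + 1)\, \gamma_{c,2} Q_{c,2}^{\theta_{c,2}}, \tag{8}$$
where $\varphi_{g,(1,2)}$ and $\varphi_{c,(1,2)}$ are the impact factors of the first frame on the second frame. Similarly, we first assume that the numbers of bits of the third and fourth frames are independent of the quantization steps of the other frames and model them with the same basic power-law model, with their own parameters $\gamma_{g,i}$, $\theta_{g,i}$, $\gamma_{c,i}$, and $\theta_{c,i}$ ($i = 3, 4$). We then account for the dependency on the preceding frames by multiplying the basic models by a product of factors of the form $(\varphi_{g,(i,i+1)} \cdot Q_{g,i} + 1)$ for the geometry and $(\varphi_{c,(i,i+1)} \cdot Q_{c,i} + 1)$ for the color, which gives the rate models (11) and (12) of the third and fourth frames. Here $\varphi_{g,(i,i+1)}$ and $\varphi_{c,(i,i+1)}$ ($i = 2, 3$) are the impact factors of the $i$-th frame on the $(i+1)$-th one. Finally, we use (6), (8), (11) and (12) to build the rate model as
$$R = \sum_{i=1}^{4} \left( R_{g,i} + R_{c,i} \right). \tag{13}$$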
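The two ingredients of the rate model can be sketched as follows: a power-law fit of the basic model in the log-log domain, and a dependency factor of the form (impact factor × previous step + 1). The sketch assumes that this factor multiplies the basic model. All names and numerical values are hypothetical and synthetic, and the fitting procedure is one plausible choice rather than the one used to produce the reported results.

```python
import math


def fit_power_law(q, r):
    """Fit r ~ gamma * q**theta by linear regression on (ln q, ln r)."""
    x = [math.log(v) for v in q]
    y = [math.log(v) for v in r]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    theta = sxy / sxx
    gamma = math.exp(mean_y - theta * mean_x)
    return gamma, theta


def frame_rate(q, gamma, theta, q_prev=None, phi=0.0):
    """Basic power-law rate, optionally scaled by (phi * q_prev + 1)."""
    basic = gamma * q ** theta
    if q_prev is None:
        return basic  # first frame: no dependency on an earlier frame
    return (phi * q_prev + 1.0) * basic


# Synthetic rates generated from a known power law (not real measurements).
q_steps = [4.0, 8.0, 16.0, 32.0]
rates = [5000.0 * q ** -1.2 for q in q_steps]
gamma, theta = fit_power_law(q_steps, rates)
print(gamma, theta)

# Rate of a dependent frame for a hypothetical impact factor.
print(frame_rate(10.0, 100.0, -1.0, q_prev=20.0, phi=0.01))
```

With a positive impact factor, a coarser quantization of the previous frame increases the predicted rate of the current frame, which matches the trend reported for Table 1.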
Table 2 shows the QP settings used to compute the parameters of the distortion and rate models.

Optimization
To solve the rate-distortion optimization problem (2), we apply a DE variant to the analytical models derived in Section 3. Unlike the standard DE algorithm, this variant decreases the crossover rate with time and uses a random scaling factor. The decrease in crossover rate at runtime increases the exploitation pressure at the end of the run [12]. The randomization of the scaling factor is motivated by the experimental observation that a certain degree of randomization is beneficial [12].
FOR each iteration
  In the early part of the run, set the crossover rate to $CR = 0.9$; otherwise, set $CR = 0.1$.
  FOR $j = 1$ to $N_P$, where $N_P$ is the population size
    Step 1: Select randomly from the population three different agents $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ that are also different from $\mathbf{x}^{(j)}$.
    Step 2: Select randomly an index $r$ such that $1 \le r \le 2N$.
    Step 3: Compute a candidate new agent $\mathbf{y}^{(j)}$ as follows. For each $i \in \{1, \ldots, 2N\}$, choose a random number $r_i$ according to a uniform distribution in (0,1). Choose a scaling factor $F$ randomly in the interval $I$. If $r_i \le CR$ or $i = r$, then set $y_i^{(j)} = a_i + F \times (b_i - c_i)$; otherwise, set $y_i^{(j)} = x_i^{(j)}$.
    Step 4: If $D(\mathbf{y}^{(j)}) < D(\mathbf{x}^{(j)})$ and $R(\mathbf{y}^{(j)}) \le R_T$, note $j$.
  END FOR
  FOR $j = 1$ to $N_P$, replace $\mathbf{x}^{(j)}$ by $\mathbf{y}^{(j)}$ if $j$ was noted in Step 4. END FOR
END FOR
Select the agent from the population that gives the lowest distortion $D$ and round the components of this agent to the nearest values in the set $\{q_0, \ldots, q_{M-1}\}$.

Another way of solving problem (2) is to use conventional non-evolutionary constrained nonlinear optimization algorithms. However, when the problem is not convex, such algorithms are only guaranteed to find local minima and are very sensitive to the starting point of the algorithm (see Section 5).
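The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the distortion and rate functions are toy stand-ins for the analytical models, the crossover rate is assumed to switch from 0.9 to 0.1 halfway through the run, candidates are clipped to the range of allowed step sizes, and the initial agents are drawn so that they already satisfy the rate constraint. All names and constants are hypothetical.

```python
import random

# Allowed step sizes for QP 0..51 (assumed conventional table).
BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]
STEPS = [BASE[qp % 6] * 2 ** (qp // 6) for qp in range(52)]
Q_MIN, Q_MAX = STEPS[0], STEPS[-1]

N_FRAMES = 4
DIM = 2 * N_FRAMES   # one geometry and one color step per frame
R_TARGET = 800.0     # hypothetical bit budget


def distortion(x):
    """Toy stand-in for the distortion model D (hypothetical)."""
    return sum(x) / len(x)


def rate(x):
    """Toy stand-in for the rate model R (hypothetical)."""
    return sum(1000.0 / q for q in x)


def random_feasible_agent(rng):
    # Assumption: start from agents that satisfy the rate constraint.
    while True:
        x = [rng.uniform(Q_MIN, Q_MAX) for _ in range(DIM)]
        if rate(x) <= R_TARGET:
            return x


def differential_evolution(pop_size=50, iterations=200,
                           f_interval=(0.1, 0.9), seed=0):
    rng = random.Random(seed)
    pop = [random_feasible_agent(rng) for _ in range(pop_size)]
    for k in range(iterations):
        # Assumption: high crossover rate in the first half, low afterwards.
        cr = 0.9 if k < iterations // 2 else 0.1
        noted = {}
        for j, x in enumerate(pop):
            # Step 1: three distinct agents, all different from x.
            others = [m for m in range(pop_size) if m != j]
            a, b, c = (pop[m] for m in rng.sample(others, 3))
            # Step 2: index that always takes the mutated component.
            r = rng.randrange(DIM)
            # Step 3: build the candidate component by component.
            y = []
            for i in range(DIM):
                f = rng.uniform(*f_interval)  # random scaling factor
                if rng.random() <= cr or i == r:
                    v = a[i] + f * (b[i] - c[i])
                    y.append(min(max(v, Q_MIN), Q_MAX))  # clipping: assumption
                else:
                    y.append(x[i])
            # Step 4: note j if the candidate is better and feasible.
            if distortion(y) < distortion(x) and rate(y) <= R_TARGET:
                noted[j] = y
        # Replace the noted agents only after the whole population is visited.
        for j, y in noted.items():
            pop[j] = y
    best = min(pop, key=distortion)
    rounded = [min(STEPS, key=lambda s: abs(s - q)) for q in best]
    return best, rounded


best, rounded = differential_evolution()
print("D =", round(distortion(best), 3), "R =", round(rate(best), 1))
print("rounded steps:", rounded)
```

Because agents are only replaced by feasible candidates, the best agent before rounding always satisfies the rate constraint in this sketch. Rounding to the nearest allowed step sizes can move the solution slightly off the constraint.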

Experimental Results
We first study the accuracy of the proposed distortion and rate models. The bitrates and distortions were computed for the quantization steps obtained as solutions of the optimization problem (2) for a given target bitrate. In the DE algorithm, the number of iterations and the size of the population were set to 200 and 50, respectively. The interval $I$ for the scaling factor was [0.1, 0.9]. As in Section 3, we used the symmetric point-to-point distortions and considered only the luminance component. The weighting factor $\omega$ in (2) was set to 0.5. To compute the actual distortion and bitrates, we used TMC2 v12.0 [8] and encoded the first four frames of the point cloud for the IPPP GOP structure. Table 3 shows the results for six dynamic point clouds (longdress, redandblack, loot, soldier, queen, basketballplayer) [13,14]. The bitrates are expressed in kilobits per million points (kbpmp). We observe that the bitrates and distortions computed by our models have a high squared correlation coefficient (SCC) and a low root mean squared error (RMSE) with the actual values computed by encoding and decoding point clouds. This shows that our models are accurate. Table 4 compares the bit allocation accuracy of the proposed method to that of the method in [6]. The bit allocation accuracy is evaluated with the bitrate error (BE)
$$BE = \frac{|R_a - R_T|}{R_T} \times 100\%,$$
where $R_a$ and $R_T$ are the actual bitrate computed by the method and the target bitrate, respectively. The largest BE for the method in [6] was 42.15% (basketballplayer, 265 kbpmp), while the largest BE for the proposed method was only 15.28% (redandblack, 640 kbpmp). Moreover, the average BE for the method in [6] was 11.94%, while that of the proposed method was only 4.65%. Table 4 and Fig. 2 show that the rate-distortion performance of the proposed method is slightly lower than that of the method in [6]. Table 5 compares the time complexity of the proposed method to that of the method in [6]. The higher CPU time of the proposed method is mainly due to the pre-optimization step needed to determine the parameters of the models (11 encodings for the proposed method vs. three encodings for the method in [6]).
Table 5. CPU time on a laptop with a 2.7 GHz i7-7500U processor and 8 GB RAM.
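The accuracy measures used in this section can be computed as in the following sketch. It assumes that the bitrate error is the absolute deviation from the target bitrate relative to the target, expressed in percent, and that the SCC is the square of the Pearson correlation coefficient. The function names and sample values are hypothetical and are not taken from Tables 3 or 4.

```python
import math


def bitrate_error(actual, target):
    """Bitrate error in percent (assumed form: |R_a - R_T| / R_T * 100)."""
    return 100.0 * abs(actual - target) / target


def rmse(model, actual):
    """Root mean squared error between model and actual values."""
    n = len(actual)
    return math.sqrt(sum((m - a) ** 2 for m, a in zip(model, actual)) / n)


def squared_correlation(model, actual):
    """Square of the Pearson correlation coefficient."""
    n = len(actual)
    mean_m = sum(model) / n
    mean_a = sum(actual) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(model, actual))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_m * var_a)


# Hypothetical model predictions and measured bitrates in kbpmp.
model_rates = [100.0, 200.0, 300.0, 400.0]
actual_rates = [105.0, 190.0, 310.0, 395.0]

print(squared_correlation(model_rates, actual_rates))
print(rmse(model_rates, actual_rates))
print(bitrate_error(actual_rates[0], model_rates[0]))
```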
Finally, Fig. 3 illustrates how solving the optimization problem (2) with conventional non-evolutionary constrained nonlinear optimization algorithms can lead to poor solutions. Here the MATLAB implementation of the state-of-the-art interior point method in [16] was used with the starting point (2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5).
The data files used in the experimental results are available in [17].

Conclusion
We proposed analytical distortion and rate models for V-PCC that include the geometry and color quantization steps of all frames in a group of frames. Then, we used the models and a DE variant to efficiently select the quantization steps for a given target bitrate. Experimental results show that the proposed optimization technique allows a better rate control than the state-of-the-art. Rate control is critical in applications where the bandwidth is constrained. Our optimization technique can be easily extended to the case where the point cloud consists of more than one group of frames: we first determine the model parameters of the distortion and rate models for each group separately and then use DE to minimize the overall distortion subject to the constraint on the total number of bits. As further future work, we plan to apply our technique to GOPs of more than four frames and to the V-PCC random access configuration.

Table 3. Accuracy of the proposed rate and distortion models.