Global Rate-distortion Optimization of Video-based Point Cloud Compression with Differential Evolution

In video-based point cloud compression (V-PCC), one geometry video and one color video are generated from a dynamic point cloud. Then, the two videos are compressed independently using a state-of-the-art video coder. In the Moving Picture Experts Group (MPEG) V-PCC test model, the quantization parameters for a given group of frames are constrained according to a fixed offset rule. For example, for the low-delay configuration, the difference between the quantization parameters of the first frame and the quantization parameters of the following frames in the same group is zero by default. We show that the rate-distortion performance of the V-PCC test model can be improved by lifting this constraint and considering the rate-distortion optimization problem as a multi-variable constrained combinatorial optimization problem where the variables are the quantization parameters of all frames. To solve the optimization problem, we use a variant of the differential evolution algorithm. Experimental results for the low-delay configuration show that our method can achieve a Bjøntegaard delta bitrate of up to -43.04% and more accurate rate control (average bitrate error to the target bitrate of 0.45% vs. 10.75%) compared to the state-of-the-art method, which optimizes the rate-distortion performance subject to the test model default offset rule. We also show that our optimization strategy can be used to improve the rate-distortion performance of two-dimensional video coders.

Point clouds have received increased attention as they provide a better immersive experience, such as free viewpoint rendering, as well as mixing of natural and synthetic objects. However, this improved user experience comes at the cost of increased storage and bandwidth requirements as point clouds are typically represented by the geometry and color (texture) of millions of 3D points. In 2017, the Moving Picture Experts Group (MPEG) launched a call for proposals for point cloud compression [1]. As a result, three point cloud compression technologies were developed: surface point cloud compression (S-PCC) for static point cloud data, video-based point cloud compression (V-PCC) for dynamic content, and LIDAR point cloud compression (L-PCC) for dynamically acquired point clouds. Later, L-PCC and S-PCC were merged under the name geometry-based point cloud compression (G-PCC). In this paper, we focus on V-PCC [2], which has reached the Final Draft International Standard status. V-PCC can support many applications, including six degrees of freedom immersive media, virtual reality, augmented reality, and immersive real-time communication. It converts a dynamic point cloud into one geometry video and one color video (Fig. 1) and then uses a two-dimensional (2D) video coder (e.g., H.265/HEVC [3]) to compress the two video sequences independently. In the video coding step, compression is achieved with quantization, which is determined by a quantization step or, equivalently, a quantization parameter (QP) (Table I). In the latest MPEG V-PCC test model [4], the QPs for the geometry and color information are not optimized. For a given group of frames, one chooses the QPs for the first frame, and the QPs for the following frames in the same group are set according to a fixed difference (e.g., zero as the default value in the low-delay configuration). This approach has two major limitations.
First, given a target bitrate, it is unclear how the QPs for the first frame should be chosen. Second, the imposed relationship between the QPs of the first frame and those of the following frames may harm the rate-distortion performance. To address these limitations, we propose a general rate-distortion optimization framework for V-PCC, where we allow the QPs to take any value in the admissible set. Then, we use a variant of the differential evolution (DE) algorithm [5] to optimize the selection of the QPs for each frame in a group of frames. DE is an evolutionary algorithm that is easy to implement and that has been successfully applied to various global optimization problems in science and engineering [6]. Unlike many traditional optimization techniques, it can be used when the objective function is nondifferentiable and even when it is not given in closed form. While DE was initially introduced for continuous optimization problems, it can also be adapted [6] to combinatorial optimization problems.
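Given a dynamic point cloud consisting of $N$ frames, an optimal encoding is obtained by determining for each frame $i$ ($i = 1, \ldots, N$) the geometry quantization step $Q_{g,i} \in \{q_0, \ldots, q_{M-1}\}$ and color quantization step $Q_{c,i} \in \{q_0, \ldots, q_{M-1}\}$ that minimize the distortion subject to a constraint $R_T$ on the total bit budget. We formulate this problem as the multi-objective optimization problem

$$\min_{Q_g, Q_c} \left[ D_g(Q_g, Q_c),\; D_c(Q_g, Q_c) \right] \quad \text{s.t.} \quad R(Q_g, Q_c) \le R_T, \tag{1}$$

where $Q_g = (Q_{g,1}, \ldots, Q_{g,N})$, $Q_c = (Q_{c,1}, \ldots, Q_{c,N})$, $D_g(Q_g, Q_c)$ is the geometry distortion, $D_c(Q_g, Q_c)$ is the color distortion, and $R(Q_g, Q_c)$ is the number of bits allocated to the geometry and color information. Then we scalarize problem (1) as

$$\min_{Q_g, Q_c} \left[ \omega D_g(Q_g, Q_c) + (1 - \omega) D_c(Q_g, Q_c) \right] \quad \text{s.t.} \quad R(Q_g, Q_c) \le R_T, \tag{2}$$

where $\omega$ is a weighting factor that sets the relative importance of the geometry and color distortions. As the number of candidates is $M^{2N}$, finding an optimal solution with full search is not feasible when $M$ or $N$ is large.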
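To give a feel for the size of the search space, the following Python sketch evaluates the candidate count for one group of frames and a scalarized distortion. It assumes the scalarization is the usual convex combination of the two distortions; the function names and the example values (M = 52, N = 4, the distortion values) are hypothetical and chosen only for illustration.

```python
def weighted_distortion(d_g: float, d_c: float, omega: float) -> float:
    """Scalarized distortion, assumed here to be omega*D_g + (1-omega)*D_c."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * d_g + (1.0 - omega) * d_c


def search_space_size(m: int, n: int) -> int:
    """Number of candidates when each of the 2N variables takes one of M values."""
    return m ** (2 * n)


# Hypothetical sizes: M = 52 admissible steps per variable, N = 4 frames.
print(search_space_size(52, 4))            # 52**8, roughly 5.3e13 candidates
print(weighted_distortion(2.0, 4.0, 0.5))  # 3.0
```

Even for four frames, evaluating every candidate with a real encoder would be out of reach, which motivates a heuristic search.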

II. RELATED WORK
Rate-distortion optimization for V-PCC has not been studied sufficiently. In [7], statistical analysis is used to derive mathematical models for the rate and distortion under the offset constraint $QP_{g,i} = QP_{g,1}$ and $QP_{c,i} = QP_{c,1}$ for $i = 2, \ldots, N$, where $QP_{g,i}$ and $QP_{c,i}$ are the geometry QP and color QP for the $i$th frame in a group of $N$ frames, respectively. Specifically, the geometry distortion $D_g$ is modeled as $\alpha_g Q_{g,1} + \beta_g$, the color distortion $D_c$ is modeled as $\alpha_{gc} Q_{g,1} + \alpha_{cc} Q_{c,1} + \beta_c$, the geometry bitrate is modeled as $\gamma_g Q_{g,1}^{\theta_g}$, and the color bitrate is modeled as $\gamma_c Q_{c,1}^{\theta_c}$, where $\alpha_g, \beta_g, \alpha_{gc}, \alpha_{cc}, \beta_c, \gamma_g, \theta_g, \gamma_c, \theta_c$ are parameters for the models. Then, problem (2) is written as

$$\min_{Q_{g,1}, Q_{c,1}} \left[ \omega (\alpha_g Q_{g,1} + \beta_g) + (1 - \omega)(\alpha_{gc} Q_{g,1} + \alpha_{cc} Q_{c,1} + \beta_c) \right] \quad \text{s.t.} \quad \gamma_g Q_{g,1}^{\theta_g} + \gamma_c Q_{c,1}^{\theta_c} \le R_T \tag{3}$$

and solved using an interior point method. Experimental results show that the method has similar rate-distortion performance to exhaustive search (for the same offset constraint) but has a much lower time complexity. In [8], the method in [7] is first used to allocate the total bitrate between the geometry and color information. Then, a frame is partitioned into seven regions, and analytical models for the rate and distortion of each region are derived. Next, the geometry quantization steps for all regions are determined by minimizing the total geometry distortion subject to the allocated geometry bitrate. Finally, given the optimal geometry quantization steps, the color quantization steps for the regions are determined by minimizing the total color distortion subject to the allocated color bitrate. Experimental results for static point clouds show that the method allows more accurate rate control than the method in [7] but has lower rate-distortion performance. While the two methods are computationally efficient, their rate-distortion performance is limited by the fixed offset between the QPs of the first frame and those of the following ones.
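The model-based formulation of [7] can be illustrated with a small Python sketch. It evaluates the linear distortion models and power-law rate models described above and picks the best first-frame step pair. Note the assumptions: the parameter values are invented for illustration, the candidate step set is hypothetical, and a plain grid search is swapped in for the interior point method used in [7].

```python
from itertools import product

# Hypothetical model parameters, invented purely for illustration.
PARAMS = {
    "alpha_g": 0.02, "beta_g": 0.1,
    "alpha_gc": 0.01, "alpha_cc": 0.05, "beta_c": 0.5,
    "gamma_g": 400.0, "theta_g": -0.8,
    "gamma_c": 900.0, "theta_c": -1.1,
}


def model_distortion(q_g1, q_c1, p, omega=0.5):
    """Weighted sum of the linear geometry and color distortion models."""
    d_g = p["alpha_g"] * q_g1 + p["beta_g"]
    d_c = p["alpha_gc"] * q_g1 + p["alpha_cc"] * q_c1 + p["beta_c"]
    return omega * d_g + (1.0 - omega) * d_c


def model_rate(q_g1, q_c1, p):
    """Sum of the power-law geometry and color bitrate models."""
    return p["gamma_g"] * q_g1 ** p["theta_g"] + p["gamma_c"] * q_c1 ** p["theta_c"]


def best_first_frame_steps(steps, r_target, p, omega=0.5):
    """Grid search (not the interior point method of [7]) over step pairs."""
    feasible = [
        (model_distortion(q_g, q_c, p, omega), q_g, q_c)
        for q_g, q_c in product(steps, repeat=2)
        if model_rate(q_g, q_c, p) <= r_target
    ]
    if not feasible:
        return None
    _, q_g, q_c = min(feasible)
    return q_g, q_c


STEPS = [float(s) for s in range(1, 41)]  # hypothetical admissible steps
solution = best_first_frame_steps(STEPS, r_target=150.0, p=PARAMS)
print(solution)
```

Because only two variables are optimized, the search is cheap; the price is that every other frame in the group inherits the first frame's QPs.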
In [9], the offset constraint is lifted and DE is used to solve problem (2), where analytical models are used for the geometry and color rates and distortions. However, as the proposed analytical models lack accuracy, the rate-distortion performance is lower than that of [7].
The problem of finding the globally optimal QPs for a group of frames has been intensively studied for 2D video. By assuming a monotonicity condition, Ramchandran, Ortega, and Vetterli [10] show how an optimal solution can be found with a tree-based algorithm. However, the time complexity of the algorithm is too high, making it impractical [11]. A similar solution based on the Viterbi algorithm was proposed in [12]. However, to reduce the time complexity, a monotonicity assumption and a node clustering assumption that are not necessarily true are made. In [11], an analytical convex rate-distortion model that takes into account the dependencies between the frames is developed for HEVC. Then, a primal-dual algorithm is used to determine the optimal rates for each frame. Because the model is only an approximation of the operational rate-distortion function, the solution is not guaranteed to be optimal. Moreover, the optimization algorithm considers the problem as continuous, while the QPs are discrete. In [13], [14], [15], and [16], frames within a group of pictures are allocated different QPs according to their hierarchical layer. This approach, known as quantization parameter cascading, is motivated by the fact that frames in a lower layer have a greater impact on the overall rate-distortion performance than frames in a higher layer. However, this approach is suitable for hierarchical prediction structures only.
We conclude this section by noting that an extension of the existing 2D video rate-distortion optimization methods to V-PCC would be difficult.

III. PROPOSED OPTIMIZATION METHOD
To solve the optimization problem (2), we propose to use a variant of DE. Starting from a population of randomly selected solutions, DE generates for each solution an offspring by perturbing another solution from the population with a scaled difference of two randomly selected solutions from the population. If the offspring is a better solution than the parent, the parent is replaced by the offspring. This procedure is repeated for a given number of iterations. One of the advantages of DE is that it has only three control parameters: the population size $NP$, a scaling factor $\mu$ that scales the difference of the two randomly selected solutions, and a crossover rate $CR$ that controls the number of parents that may be replaced. In the following, we explain how we applied DE to the rate-distortion optimization problem (2). First, we rewrite the problem as

$$\min_{Q \in \{q_0, \ldots, q_{M-1}\}^{2N}} D(Q) \quad \text{s.t.} \quad R(Q) \le R_T, \tag{4}$$

where $Q = (Q_{g,1}, \ldots, Q_{g,N}, Q_{c,1}, \ldots, Q_{c,N})$ and $D(Q) = \omega D_g + (1 - \omega) D_c$. The algorithm operates on a population of vectors $X^{(1)}, \ldots, X^{(NP)}$ with $2N$ components each, and $F^{-1}$ maps such a vector back to the corresponding vector of quantization steps.

for k = 1 to n do
    if k < (2/3) n then
        Set the crossover rate CR to 0.9
    else
        CR = 0.1
    end if
    for j = 1 to NP do
        Step 1: Select randomly from the population three different vectors A, B, C that are also different from $X^{(j)}$.
        Step 2: Select randomly an integer r such that 1 ≤ r ≤ 2N.
        Step 3: Build a candidate vector $Y^{(j)}$ as follows. For each i ∈ {1, ..., 2N}, choose a random number $r_i$ according to a uniform distribution in (0.1, 0.9). Choose a scaling factor $\mu$ randomly in I.
        Step 4: If $D(F^{-1}(Y^{(j)})) < D(F^{-1}(X^{(j)}))$ and $R(F^{-1}(Y^{(j)})) \le R_T$, mark j.
    end for
    for j = 1 to NP do
        Replace $X^{(j)}$ by $Y^{(j)}$ if j was marked in Step 4.
    end for
end for
Output: Select the vector from the population that gives the lowest distortion D.

Our implementation differs from standard DE in three ways. First, we decrease the crossover rate $CR$ at runtime to increase the exploitation pressure. As shown in [17], DE frameworks are prone to stagnation. That is, the population remains diverse, yet the search fails to generate a solution that outperforms the best individual of the population. This stagnation can be mitigated by exploiting the search directions available in the DE population [18]. A reduction in the crossover rate $CR$ makes the offspring similar to the generating parent and thus exploits the available genotypes. Second, in accordance with [19], we select the scaling factor $\mu$ randomly to retain population diversity as the search progresses. Our experiments show that this randomization is beneficial. Finally, we use rounding inside the iterations to make the algorithm suitable for the combinatorial optimization problem (4).
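The loop structure described above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the candidate update in Step 3 is assumed to be the standard DE/rand/1 mutation with binomial crossover followed by rounding and clipping; the scaling factor is drawn once per candidate; vectors are taken to be integer indices into the admissible step set, so the mapping $F^{-1}$ is absorbed into the cost functions; and `toy_distortion` and `toy_rate` are hypothetical stand-ins for an actual encode and decode run.

```python
import random


def de_variant(distortion, rate, r_target, n_vars, m, init_pop,
               n_iter=75, mu_range=(0.1, 0.9), seed=0):
    """DE sketch with a CR schedule, random scaling factor, and rounding."""
    rng = random.Random(seed)
    pop = [list(x) for x in init_pop]          # assumed feasible on entry
    cost = [distortion(x) for x in pop]
    n_pop = len(pop)
    for k in range(1, n_iter + 1):
        cr = 0.9 if k < 2 * n_iter / 3 else 0.1
        marked = {}
        for j in range(n_pop):
            # Step 1: three distinct population members, none equal to j.
            others = [i for i in range(n_pop) if i != j]
            a, b, c = (pop[i] for i in rng.sample(others, 3))
            # Step 2: one component that is always taken from the mutant.
            r = rng.randrange(n_vars)
            # Step 3 (assumed rule): binomial crossover with rounding.
            mu = rng.uniform(*mu_range)
            y = list(pop[j])
            for i in range(n_vars):
                r_i = rng.uniform(0.1, 0.9)    # range as stated in Step 3
                if r_i < cr or i == r:
                    v = round(a[i] + mu * (b[i] - c[i]))
                    y[i] = min(max(v, 0), m - 1)   # clip to valid indices
            # Step 4: accept only better candidates that meet the budget.
            d_y = distortion(y)
            if d_y < cost[j] and rate(y) <= r_target:
                marked[j] = (y, d_y)
        # Replace marked parents only after the whole generation is built.
        for j, (y, d_y) in marked.items():
            pop[j], cost[j] = y, d_y
    best = min(range(n_pop), key=cost.__getitem__)
    return pop[best], cost[best]


def toy_distortion(x):
    """Hypothetical: a larger index means coarser quantization."""
    return float(sum(x))


def toy_rate(x):
    """Hypothetical: a larger index means fewer bits."""
    return sum(100.0 / (1 + xi) for xi in x)


init_rng = random.Random(1)
init = []
while len(init) < 20:                          # keep only feasible vectors
    cand = [init_rng.randrange(52) for _ in range(8)]
    if toy_rate(cand) <= 60.0:
        init.append(cand)

best_x, best_d = de_variant(toy_distortion, toy_rate, 60.0, 8, 52, init,
                            n_iter=30)
print(best_x, best_d, toy_rate(best_x))
```

Deferring the replacement until the end of each generation mirrors the marking scheme in Step 4, so all candidates of a generation are built from the same parent population.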

IV. EXPERIMENTAL RESULTS
In this section, we apply our optimization technique to V-PCC (Section IV-A) and H.264/AVC (Section IV-B). To assess the performance of our technique, we compute the Bjøntegaard delta (BD) rate and BD distortion [20]. We also use the bitrate error (BE) as a measure of the bit allocation accuracy. Here,

$$BE = \frac{|R_a - R_T|}{R_T} \times 100\%,$$

where $R_a$ is the bitrate of the method and $R_T$ is the target bitrate. For V-PCC, the bitrates are expressed in kilobits per million points (kbpmp). The source code of our method and the data files are publicly available from [21].
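As a small illustration of the accuracy measure, the following Python sketch computes the BE, assuming it is the absolute deviation from the target bitrate expressed as a percentage of the target. The function name and the example bitrates are hypothetical.

```python
def bitrate_error(r_actual: float, r_target: float) -> float:
    """Bitrate error in percent, assumed to be |R_a - R_T| / R_T * 100."""
    if r_target <= 0:
        raise ValueError("target bitrate must be positive")
    return 100.0 * abs(r_actual - r_target) / r_target


# Hypothetical bitrates in kbpmp: 101 achieved against a target of 100.
print(bitrate_error(101.0, 100.0))  # 1.0
```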

A. Rate-distortion optimization of V-PCC
We applied the proposed method to the V-PCC test model category 2 Version 12 [4], which relies on the High Efficiency Video Coding Test Model Version 16.20 to compress the geometry and color videos. We considered four dynamic point clouds (Soldier, Queen, Loot, and Longdress) [22], [23] and encoded their first four frames (i.e., N = 4) using the low-delay configuration with group of pictures (GOP) structure IPPP. In the DE algorithm, the population size NP was 50, the number of iterations n was 75, and the range I of the scaling factor was the interval [0.1, 0.9]. In the initialization step, a vector was included only if it satisfied the rate constraint, where the rate was computed according to the analytical model in [9]. For the geometry and color distortions, we used the symmetric point-to-point metrics [24] based on the root mean squared error. For the color distortion, we considered only the luminance component. The weighting factor ω was set to 1/2.
Table II compares the solutions found by our algorithm to those found by the state-of-the-art algorithm in [7]. Table III and Fig. 2 compare the rate-distortion performance of the proposed method to that of the method in [7]. For various target bitrates, Table III provides the actual bitrate, the distortion, the peak signal-to-noise ratio (PSNR) for the geometry information (PSNR G), the PSNR for the color information (PSNR C), and the BE. The PSNR was computed as in [24]. Fig. 2 compares the rate-distortion curves. The results show that our method outperforms the method in [7] in terms of rate-distortion performance and bitrate accuracy. For example, the BD-rate was up to -43.04% and the highest BE was only 1.52%, while it reached 25.72% for the method in [7]. Note that the method in [7] was shown to provide results comparable to exhaustive search subject to the test model offset constraint $QP_{g,i} = QP_{g,1}$, $QP_{c,i} = QP_{c,1}$, $i = 2, \ldots, N$.
However, since the method in [7] uses mathematical models to solve the optimization problem while the proposed method uses the actual distortion and rate functions, this improvement in rate-distortion performance comes at the cost of increased computation time (Table IV). Note that the number of times our method encodes the point cloud can be reduced by saving the values of the distortions and rates in Step 4 of the DE algorithm to avoid recomputing them if the same vector is considered again. For the method in [7], the optimization process is very fast, but the point cloud must be encoded three times in a pre-processing step to compute the parameters of the rate and distortion models.

B. Rate-distortion optimization of H.264/AVC
We also applied our optimization technique to the x264 implementation of H.264/AVC, using the video sequences RaceHorses and BQSquare. RaceHorses has a frame rate of 30 frames per second (fps), while BQSquare's frame rate is 60 fps. We encoded the first sixteen frames (N = 16) using GOP structure IPP···P. For x264, we used the --bitrate rate control option, which leads to variable QP values within and across frames. We give results for both the 1-pass and the 2-pass mode. We set the other encoder settings to the default values. The settings for DE were as in the previous section, except for n, which was equal to 500. Table V shows the results. Our optimization technique, which computes one QP for each frame, increased the average PSNR of the luminance component by 0.42 (resp. 0.12) dB in terms of BD-PSNR and decreased the average bitrate by 6.96% (resp. 9.48%) in terms of BD-rate compared with the 1-pass (resp. 2-pass) mode of the x264 bitrate rate control. Moreover, the average BE was only 0.16%, compared to 20.17% (resp. 6.23%) for the 1-pass (resp. 2-pass) mode.

V. CONCLUSION
We proposed a method to optimize the rate-distortion performance of V-PCC. Our method formulates the rate-distortion problem as a constrained combinatorial optimization problem where the optimization variables are the quantization step sizes for the compression of the geometry and color videos. A solution to the problem is computed with a variant of the DE algorithm. Experimental results show that the method outperforms the state-of-the-art interior point method in [7], which imposes a fixed offset between the quantization parameters for the first frame and those for the other frames in the same group of frames. Our method not only provides more accurate rate control but also achieves a BD-rate of up to -43.04% compared to the method in [7]. The performance of our method is expected to improve with more iterations: we used only 75 iterations, whereas the recommended number for such a population size and problem dimension is significantly larger [27]. Since the calculation of the actual distortion and rate requires the encoding and decoding of the point cloud, our method is only suitable for applications where the point cloud is encoded offline. Our method is very general and can be used with any video coder. For example, we showed that it can improve the rate-distortion performance of the x264 implementation of H.264/AVC. In this paper, the encoding was applied to only one group of frames of the point cloud. Extending our method to more than one group of frames is straightforward and left as future work. Our optimization framework may lead to large variations of the quantization step sizes within a group of frames, which may affect the subjective quality. This can be addressed by adding appropriate constraints on the variables in problem (4). Other future work will include trying more sophisticated variants of DE, which have proven to be effective in other applications [6].