Survival Regression Models With Dependent Bayesian Nonparametric Priors

Abstract We present a novel Bayesian nonparametric model for regression in survival analysis. Our model builds on the classical neutral to the right model of Doksum and on the Cox proportional hazards model of Kim and Lee. The use of a vector of dependent Bayesian nonparametric priors allows us to efficiently model the hazard as a function of covariates while allowing nonproportionality. The model can be seen as having competing latent risks. We characterize the posterior of the underlying dependent vector of completely random measures and study the asymptotic behavior of the model. We show how an MCMC scheme can provide Bayesian inference for posterior means and credible intervals. The method is illustrated using simulated and real data. Supplementary materials for this article are available online.


Introduction
The statistical analysis of the, potentially censored, survival time to an event has a long history. Often, estimates of the effects of observed covariates on the survival time distribution are key statistical quantities of interest. For example, information about white blood cells may be useful for the prediction of the time to death of leukemia patients. There are several standard regression models. The accelerated failure time (AFT) model takes into account the effect of a covariate by accelerating or decelerating over time its effect on the survival time (Buckley and James 1979). Alternatively, a parametric effect for the covariates can be combined with a nonparametric estimate of a baseline distribution of the survival time. The most popular example of this type of model is the semiparametric Cox (1972) model which has had a substantial impact in statistical and medical research, being introduced in one of the most cited statistical papers of all time (Ryan and Woodall 2005). The Cox regression model assumes proportional hazards (PH) and can be easily fitted with partial likelihood methods (Cox 1975). The combination of this inference method with the counting process formulation of the model (Andersen and Gill 1982) has led to extensions to stratified analysis, proportional intensity models, frailty models, and so on (Therneau and Grambsch 2000). The model also leads medical researchers to focus on differences in instantaneous risk (hazard) rather than mean or median survival as in common regression models. Under the PH assumption, the survival curves for any combinations of covariate values must have hazard functions that are proportional over time, that is, have constant hazard ratios. This is sometimes not realistic. For example, if a treatment effect is negative at the beginning of a study and positive by CONTACT Fabrizio Leisen fabrizio.leisen@gmail.com School of Mathematical Sciences, University of Nottingham, Nottingham NG7 2RD, UK. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. the end. Failing to account for this can lead to poor model fits, particularly in the tails of the survival distribution. Such problems can be addressed by including interactions with time or stratifying according to the treatment (Kalbfleisch and Prentice 2011). However, these approaches can lead to difficulties with interpretation of effects. Alternatively, the structure of the model can be changed. For example, the proportional odds (PO) model relaxes the PH assumption of a constant hazard ratio by assuming hazard functions such that this property holds only when the time goes to infinity (Cheng, Wei, and Ying 1995;Murphy, Rossini, and Van der Vaart 1997;Yang and Prentice 1999).
From the Bayesian perspective, the analysis of survival data was one of the first areas of application of Bayesian nonparametric techniques, see Doksum (1974) and Ferguson (1974), and Hjort et al. (2010) for a review. Popular priors include the beta process prior for the cumulative hazard function (Hjort 1990), the extended gamma process (Dykstra and Laud 1981), and the wide-class of neutral to the right (NTR) distributions (Doksum 1974).
In this article, we focus on the NTR model which assumes that the survival function of a survival time of interest Y, which is S(t) = P [Y > t], is given by where μ is a completely random measure (CRM) (Kingman 1967) for which μ(R + ) a.s.
= ∞ to ensure that the distribution of Y is supported in R + . As noted in Doksum (1974), such distribution is NTR in the sense that if F(t) = 1 − S(t) is the associated cumulative distribution function then are independent for every t 1 < · · · < t k . The structure is very general and includes the Dirichlet process (Ferguson 1973) and Beta-Stacy process (Muliere and Walker 1997) as special cases. The family of NTR distributions has desirable theoretical properties with survival data such as conjugacy for right-censored data (Ferguson and Phadia 1979) and posterior consistency at an optimal rate (Kim and Lee 2004). They are also a natural Bayesian nonparametric analogue of the widely used frequentist Kaplan-Meier estimator. The approach was extended to multiple samples by Epifani and Lijoi (2010) and Riva-Palacio and Leisen (2018) and to Cox regression modeling by Kim and Lee (2003). Their model assumes that the survival function, S X (t), for covariate value X = (X 1 , . . . , X m ) ∈ R m is modeled by Alternatively, Bayesian nonparametric regression survival models can be built by modeling the logarithm of the survival time using a dependent nonparametric prior. These allow for crossing survival and hazard curves and have straightforward interpretations. For example, the linear dependent Dirichlet process (LDDP) mixture (De Iorio et al. 2009) uses dependent Dirichlet processes (MacEachern 1999) and the linear dependent tailfree process (Jara and Hanson 2011) builds on Pólya tree priors. These approaches are reviewed by Hanson and Jara (2013). Other approaches to nonproportional hazards include Nieto-Barajas (2014) who introduced a semiparametric model based on increasing additive processes, Nipoti, Jara, and Guindani (2018) who proposed a partially proportional hazards model using cluster-dependent random hazards, and Fernández, Rivera, and Teh (2016) who modeled the hazard as a logistic transform of a sum of Gaussian processes.
In this article, we build a tractable Bayesian nonparametric regression model for survival data based on the class of NTR distributions. The model assumes that where f 1 , . . . , f d are known functions of the covariates X, β are unknown parameters and μ = (μ 1 , . . . , μ d ) is a vector of completely random measures (VCRMs). This extends the proportional hazards Cox regression model of Kim and Lee (2001) to allow for more general functions of the covariates and can be interpreted in a competing risks framework. VCRMs have proven to be a useful tool for inducing dependence in Bayesian nonparametric priors (see, e.g., Lijoi, Nipoti, and Prünster 2014;Camerlenghi, Dunson, et al. 2019). We term this model a generalized additive neutral to the right regression (GANTR) model. GANTR provides a flexible regression modeling approach within an NTR model. It allows for nonproportional hazards and leads to clustering of observations into subpopulations (associated with different causes in the competing risks interpretation) according to covariate values. The model can be seen as a generalization of the multiple-sample model of Riva-Palacio and Leisen (2018) to allow for stratification into unknown covariate dependent clusters/subsamples. The structure of the prior allows us to develop a posterior characterization and use it to construct an inference scheme that depends on the VCRM through its Laplace exponent. We concentrate on the class of compound random measures (Griffin and Leisen 2017), and develop both an MCMC method and empirical Bayes method for estimating the hyperparameters. Unlike AFT models, the NTR approach models the survival function directly which eases the interpretation of the overall model and, particularly, the covariate effects. The tractability of the GANTR model allows us to derive the likelihood of the regression coefficients and the hyperparameters of the VCRM and so implement fast maximum a posteriori (MAP) inference methods. We also find that the GANTR model leads to better fit of the data than the LDDP in both simulated and real data.
The outline of the article is as follows. In Section 2, we consider CRMs and VCRMs in more detail. In Section 3, we formally present the GANTR model. Section 4 develops a posterior characterization of the model and results necessary for the ensuing inference procedures. We present simulation and real data studies for our model in Section 5. Conclusions for the work are presented in Section 6. Proofs, further properties regarding asymptotic behavior, details regarding inference and further simulation studies are included in the supplementary materials. Code for our model is available in https://github.com/ alan7riva/GANTR.

Preliminaries
VCRMs are a key building block of our proposed Bayesian nonparametric model. This section will introduce some basic ideas and representations through Laplace functional transforms. We will focus on the compound random measure (CoRM) class of VCRMs which were recently introduced by Griffin and Leisen (2017).
Let Y be a complete and separable metric space with corresponding Borel σ -algebra Y and probability space ( , F, P). We denote by M Y the space of boundedly finite measures in the measure space (Y, Y) and the associated Borel σ -algebra by M Y .
In this article, we restrict attention to CRMs of the form are collections of random variables taking values in R + and Y, respectively. We will refer to w i as jump heights and Y i as jump locations. Such CRMs can be characterized by their Laplace transform where λ > 0 and ν is a measure on R + × Y such that R + ×A min{s, 1}ν(ds, dy) < ∞ for any bounded set A ∈ Y. The measure ν is usually called the Lévy intensity of μ. We denote the Laplace exponent of a CRM as ψ t where for t, See Kingman (1967) for a full review of CRMs. We say that a Lévy intensity ν is homogeneous if it can be written in the form where κ is a nonatomic measure on Y referring to the jump locations and ρ is a measure on R + referring to the jump heights. For example, we will use the homogeneous CRM with Lévy intensity where γ > 0 and α > 0 which is a particular case of the Gamma process, see Phadia (2015). We refer to this process as the Gamma CRM which is denoted Gamma(α, γ ). There is a natural generalization of CRMs to the multivariate setting.
The corresponding multivariate analogue of the Laplace transform (3) is The corresponding homogeneous case arises when ν(ds, dy) = ρ(ds)κ(dy). (2017) introduced a flexible class of VCRMs called compound random measures where the dependence structure of the vector is modeled in a constructive way.

Griffin and Leisen
Definition 3. Given a d-variate probability density function h and an univariate Lévy intensity ν we say that a d-variate VCRM μ is a compound random measure (CoRM) with score distribution h and directing Lévy measure ν if the d-variate Lévy intensity of μ is given by In Riva-Palacio and Leisen (2019), the existence of marginal first moments for the score distribution in a CoRM is shown to be sufficient for the integrability condition (5) to be satisfied; in this work we only consider CoRMs with such score distributions. Furthermore, Riva-Palacio and Leisen (2019) showed that CoRMs have an elegant interpretation in terms of discrete measures. Indeed, if μ is a homogeneous univariate CRM with Lévy intensity ν (ds, dy) and series representation An interesting example of a CoRM is defined by a LogNormal(m, ), score distribution, with mean vector m and covariance matrix . Such choice allows us to distribute the mass of the directing Lévy intensity across the d-dimensional space of the CoRM intensity. The Lévy intensity of a CoRM with such score distribution and Gamma directing Lévy measure is presented in the supplementary materials. In particular, we will use the following construction.
Definition 4. We say that a d-dimensional random variable Z is given a δ-LogNormal distribution if its probability density function is where LogNormal(z|m, ) is the probability density function of a multivariate lognormal distribution using the parameterization discussed at the end of Section 2, δ ∈ (0, 1], We can use the above as the score distribution in a CoRM with Gamma directing Lévy measure. Observe that when using a mixture for the score distribution of a CoRM the Lévy intensity is a sum of the Lévy intensities corresponding to the mixture components with the directing Lévy measure fixed. The δ-LogNormal is a d component mixture model. For small values of σ , the effect of the parameter δ is 2-fold. First, when used as the score distribution in a CoRM, it controls the dependence between dimensions of the VCRM. If δ = 1 the mass of the distribution is accumulated near the point 1 and the related CoRM has a Lévy intensity which accumulates mass near the identity axis. This CoRM is close to a completely dependent VCRM where all dimensions are almost surely equal. On the other hand, if δ → 0, the δ-LogNormal distribution accumulates mass near the points {e i } d i=1 and the related CoRM Lévy intensity accumulates mass near the axes in (R + ) d , which will be close to an independent entries VCRM. Values of δ ∈ (0, 1) will modulate between these distributions and VCRMs. Second, in the multiple-sample information setting where the regression functions f i (β, X k ) = 1 {X k,j =1} for i = 1, . . . , d, then as δ → 0 and σ → 0 the GANTR model is equivalent to an NTR model for each sample. While if δ = 1 and σ → 0 the GANTR model is equivalent to a NTR model. The parameter σ serves to diffuse the mass of the distribution. The modulating effect of δ decreases as σ increases so we chose a relatively small value of σ .

Survival Regression Model
The NTR process (Doksum 1974) for the survival function was defined in (1) as the exponential transform of a CRM μ for which μ(R + ) a.s. = ∞. We say that a random variable Y with this survival function has a NTR distribution, which is denoted with Lévy intensity ν c (ds, dt), where c are parameters of the Lévy measure, and f i : The GANTR model for a single observation can be seen as an NTR distribution conditionally on the covariates X i which allows us to use results about NTR processes with our model. For example, neutrality to the right as in Equation (2) is satisfied conditionally on the covariates, as in Proposition 3 of Riva-Palacio and Leisen (2018) Some previously proposed models arise as special cases of the GANTR model. If X i ∈ {0, 1} m such that X i,j = 1 if and only if the ith observation belongs to the jth sample, the multiplesample NTR model of Riva-Palacio and Leisen (2018) for m samples can be recovered by choosing f j (β, Kim and Lee (2003) arises when d = 1 and f 1 (β, X) = e β , X . Unlike the Cox NTR model, the GANTR model allows for nonproportional hazards. Indeed, if S X 1 (t) and S X 2 (t) are the survival functions at time t of GANTR distributed random variables Y 1 , Y 2 with respective covariates . . , d}, and, clearly, the survival functions for different covariates values can cross at any The GANTR model has been motivated as a flexible model of the effects of covariates on the survival function but it can also be viewed as a competing risks model (Prentice et al. 1978). We assume d independent latent causes for the event of interest and define the survival function for the jth cause with covariates X to beS The survival function for all risks across a sample is the GANTR model. The structure of models means that f j and μ j are not separately identified (although, their product is identified). This is not necessarily a problem for Bayesian inference if we are only interested in functions of these products (such as the survival function) and a prior can be placed on β and μ j . Alternatively, we can choose a parameterization which identifies each product (e.g., by fixing the value of f j (β, X 0 ) for a specific covariate value X 0 ). The survival function for all risks for a covariate X be reexpressed as where w j (β, X). Under the latter parameterization, the w j 's are weightings on each latent cause (which depends on covariates) and allow departures from a Cox proportional hazards models which occurs if w j (β, X)'s do not depend on X. Observations which have similar w j (β, X) will have similar survival curves and this allows us to define subpopulations which tend to have similar survival outcomes. We illustrate this idea in Section 5 using data from melanoma patients. This parameterization can be identified by fixing the value of f (β, X 0 ) at covariate value X 0 . The competing risk interpretation also leads to a simple simulation scheme for our model. We sampleỸ i as survival times according to the survival functionS j,X (t) and set Y = min{Ỹ 1 , . . . ,Ỹ d }.
The GANTR model with a CoRM chosen as the VCRM μ, can be represented as a conditional NTR model where, by substituting (6) The measure μ is a CoRM with the same directing Lévy measure as μ 1 , . . . , μ d and scores d i=1 f i (β, X)m i,j . In this form, the score distribution of the CoRM is marginally a random linear combination of the regression functions f 1 , . . . , f d .

Posterior Characterization
The form of the GANTR model allows us to derive an analytic expression for the posterior distribution given right-censored data. This result allows us to construct an inference scheme for the model as explained in Section 5. We assume that a sample of size n is observed and that there are (right) censoring times C 1 , . . . , C n which are iid and independent of the survival times Y 1 , . . . , Y n . We observe the time T i = min{Y i , C i } and the indicator variable be the survival data available for analysis. The k ≤ n order statistics (without repetition) of T 1 , . . . , T n are represented by T (1) < · · · < T (k) and define T ( . Initially, we derive the likelihood of right-censored data D in the GANTR model. We assume the following condition on the GANTR model.

Condition 1. the VCRM μ has a Lévy intensity ν(s, dy)ds such that η t (s) = ν (s, (0, t])
is differentiable in the sense that the partial derivative η t 0 (s) = ∂η t (s)/∂t| t=t 0 exists and, as s → ∞, η t (s) = O(exp(ks)) with k < This is a weak condition and is equivalent to requiring that the derivative of κ(t) exists in the homogeneous case.
In the following result we provide a convenient expression for the likelihood of β, the regression coefficients, and c, the hyperparameters of the Lévy intensity, in the GANTR model. We want to emphasize the dependence of the Lévy intensity on c so in the following proposition we use the particular notation ν c , η t,c and ψ t,c for the Lévy intensity, partial derivative as above and Laplace exponent, respectively. Proposition 1. Let D be survival data and assume a GANTR model with Condition 1. Let ψ t,c be the Laplace exponent associated to ν c , then the likelihood of β and c is given by The next theorem provides the posterior distribution of the model in (7) with a general VCRM and possibly right-censored data.
Theorem 1. Let D be survival data and assume a GANTR model with Condition 1. If f i > 0 for at least one i ∈ {1, . . . , d}, the posterior distribution of μ is the distribution of the random measure The above characterization can be seen as a conjugacy property where, similarly to NTR distributions (see, e.g., Ferguson and Phadia 1979), the posterior is updated to be GANTR model with (μ • ), furthermore this can be used to calculate the posterior mean of the survival function of a new time event Y with associated new covariate X , that is, E μ|D [S X (t)] = E μ|D [P [Y > t |μ, β, X ]]. Such a posterior mean, where we have integrated out the underlying VCRM μ, can be used for estimation purposes; its calculations is made explicit in the next corollary.
Corollary 1. In the setting of Theorem 1, we denotẽ Let S X (t) = P [Y > t |μ, β, X ] be the survival function of an GANTR distributed r.v. Y associated with a covariate vector X . Then . . , f d (β, X ) and ψ • is the Laplace exponent of μ • in Theorem 1.
Lemma 1 gives an analytic expression for integrals of the type that appear in both the posterior mean of Corollary 1 and likelihood function of Proposition 2.

Lemma 1. Let ν be a Lévy intensity associated to a d-variate
Lemma 1 provides a readily available estimatorŜ X , in Corollary 1, if the Laplace exponent associated to the underlying VCRM can be easily computed. Further theoretical properties of the GANTR model regarding asymptotic behavior are discussed in the supplementary materials. The results in this section are valid for GANTR models based on general nonhomogeneous VCRMs. In the rest of the article, however, we will work with homogeneous VCRMs consisting of CoRMs with LogNormal score distributions. Epifani and Lijoi (2010) discussed the flexibility of NTR models built using homogeneous VCRMs which is illustrated by the asymptotic properties of this subclass of the GANTR model, see the supplementary materials.

Simulation and Real Data Studies
In this section, we analyze a simulation study and two real survival datasets with the GANTR model and the LDDP (De Iorio et al. 2009) using the implementation in the R library "DPpackage" ). The first real dataset illustrates the identification of subpopulations with GANTR and the second dataset illustrates the performance of the GANTR model relative to the Cox regression model. The GANTR model uses a CoRM as the underlying VCRM with a δ-LogNormal score distribution with σ = 0.1 and a homogeneous Gamma directing Lévy process with parameters α and γ , whose intensity is given in (4), and κ(dy) = dy. There are also analysis of a further simulation study and a real survival dataset in the supplementary materials.
We consider two hybrid inference approaches. In both approaches, we first set (α, γ ). We have found that an effective approach is to use the MAP estimateĉ MAP = (α MAP ,γ MAP ) under the corresponding NTR model using the homogeneous Gamma CRM μ without considering covariates. In such NTR setting, a priori the mean survival is given byŜ(t) = E e −μ(0,t] = exp −γ log 1 + 1 α t . So we can assign priors on γ and α which reflect the rate of survival times in an exponential model. In particular we used a log-normal prior centered in n/ n i=1 T i and variance 0.001 for γ and a log-normal prior centered at one with variance 0.1 for the bone marrow data and 0.01 for the Kidney transplant data due to the rates in the different datasets. This centers the GANTR model around the data and captures the overall rate of the survival times while allowing small values of δ to indicate departure from the NTR model that does not consider covariate effects. The posterior distribution of δ and β, conditional on (α, γ ) or (α,γ ), can be calculated using the likelihood l (β, (δ,α,γ ); D) in (10). A closed form expression for the Laplace exponent of the δ-LogNormal CoRM is not available but a Monte Carlo estimate can be easily calculated using draws from the score distribution and the Laplace exponent of the directing gamma CRM (this can also be used for the calculation of the posterior mean survival curve in Corollary 1). The two inference approaches differ in how δ and β are inferred. First, a MCMC scheme (see the supplementary materials) can be used to draw samples from the posterior distribution of δ and β allowing Monte Carlo estimates of the posterior mean survival and credible intervals to be calculated. Alternatively, an MAP estimate of δ and β can be found using numerical optimization of the posterior distribution. We use the LFBGS routine of the Optim package in Julia (Mogensen and Riseth 2018). Details regarding evaluation of the likelihood gradient are given in the supplementary materials. This only involves one evaluation of Equation (11), in contrast to the MCMC approach, but at the expense of ignoring posterior uncertainty in the parameters. We will denote the MAP estimate of a generic parameter θ byθ MAP .

Simulated Example
We consider a competing risks example. For i = 1, . . . , n, there is a covariate Z i which normal distribution with mean 1 and variance 0.75 truncated to (0, 2) and there are three possible risks: (1−l) 1/5.3 where We(k, λ) represents a Weibull distributions with shape parameter k and scale λ and l ∈ (0, 1).
i and, otherwise, . This defines two subpopulations which are determined by whether (or not) the covariate is above the threshold value of 1. The parameter l controls the difference  Table 1. L 2 distance between estimated survival functions and true survival functions for z = 0.66 and z = 1.44 over evaluation meshes at every 1/60 between 0 and 3.5 for l = 0.1 and at every 1/50 between 0 and 5 for l = 0.9.
Simulation study l = 0.1 l = 0.9 L 2 distance with true survival z = 0. between the distributions for the two groups with the distributions becoming more similar as moves from 0 to 1. We generate two datasets of 200 observations with l = 0.1 and l = 0.9. We fit the GANTR model with regression functions f 1 (z, β) = 1 {z≤β} and f 2 (z, β) = 1 {z>β} where β controls the threshold between two subpopulations. For simplicity we took (α, γ ) = (1, 1). We use both the MAP and MCMC methods for inference. We ran the MCMC algorithm with 10,000 iterations, which essentially achieved convergence.
Results of fitting the GANTR and LDDP models to data generated with l = 0.1 are shown in Figure 1. Both models can clearly reconstruct the true survival curves with both inference methods for the GANTR model providing estimates that are closer to the true survival curves than the LDDP models. This visual impression is supported by the L 2 distance between a point estimate and the true curves in Table 1. In fact, the fitted survival curves are very similar for the full posterior inference and MAP estimation. Other aspects of inference are also similar, the MAP and posterior mean estimates of the threshold parameter β are 1.01 and 1.02, respectively (which are very close to the true value), and similarly, the corresponding estimates of δ are 0.18 and 0.19, respectively. The estimated value of δ is close to zero which indicates that the two groups are nearly independent. Table 1 also shows the L 2 distances between point estimates of the survival curve and its true value for data generated with l = 0.9. Again, both GANTR estimates have smaller L 2 distances than the LDDP model. Fitting the GANTR model using MCMC took about 2 hr for 200 observations and about 3 hr for 400 observations. On the other hand, MAP estimation took around 5 min for both 200 and 400 observations. The similarity of estimated survival curves and the shorter computation time leads us to only consider MAP estimation for the real data examples in the rest of this section. The supplementary materials include results from fitting the GANTR and LDDP models to simulated datasets with l = 0.1 with 400 observations and l = 0.9 with both 200 and 400 observations. Andersen et al. (2012) included a study of 205 patients with melanoma who had a tumor removed by surgery. The thickness of the tumor was a covariate of interest as an increase in the tumor's thickness is thought to increase the chances of death. The data were right-censored for 72% of the patients. Again we use a two-dimensional GANTR model.

Melanoma Survival Data
The lack of a straightforward way to stratify the patients into subpopulations according to tumor thickness motivated us to use a flexible regression model using two regression functions. There are many possible constructions for these regression functions such as univariate or multivariate splines (Denison et al. 2002) or Gaussian processes. We choose to use a Lagrange interpolator polynomial (see Friedberg, Insel, and Spence 2013). Let q = (q 1 , q 2 , q 3 , q 4 , q 5 ) be the five number summary of the observed tumor thickness values and L q,β be the Lagrange interpolator polynomial with knots 1] 4 are unknown parameters to define the regression functions: f 1 (z, β) = max min 1, L q,β (z) , 0 and f 2 (z, β) = max 1 − max L q,β (z), 0 , 0 . This leads to nonnegative regression functions which are constrained so that f 1 (z, β) + f 2 (z, β) = 1. These functions can be interpreted as weights on two subpopulations, as in (9), which are determined by whether (or not) f 1 (z, β) > f 2 (z, β). The parameter δ controls the sharing between these subpopulations (i.e., there is very little sharing if δ is close to 0). The fixed value of the last knot, (q 5 , 0), constrains f 2 (z, β) to be close to one (and f 1 (z, β) to be close to zero) for large values of z and so we can interpret μ 2 as the competing risk of patients with large tumor thicknesses and μ 1 as the competing risk of patients with small tumor thicknesses.
The MAP estimates of β imply that f 1 (z, β) crosses f 2 (z, β) = 1 at 3.3984 (shown in Figure 2) and the estimatedδ MAP = 0.000972 indicates little sharing of information. This suggests two subpopulations defined by the threshold tumor thickness of 3.3984 with substantially different survival curves in each subpopulation. This is illustrated by the estimated survival curves with different tumor thicknesses in Figure 3. The estimated survival curves for each subpopulation are illustrated by the curves for thicknesses 1.5 and 6.1 which show clear differences with a better prognosis for smaller tumor thicknesses. The estimated survival curve for thickness 3.4 (close to the threshold value) shows the smoothing induced by the model between the subpopulations. The presence of the two heterogeneous subpopulations detected by the GANTR model are supported by the plotted Kaplan-Meier curves and the LDDP mean fits for the survival curves, which are shown for comparison.

Kidney Transplant Data
We consider the Kidney transplant dataset from the survival analysis book of Klein and Moeschberger (2006) which is available in the R package "KMsurv" by Yan (2012). This dataset consists of 863 observations of which 723 were right-censored. There are two binary covariates: sex (male or female), and race (white or black), and age is treated as a continuous covariate. The combinations of race and sex can be used to divide the patients into four groups: (1) Male-White, (2) Male-Black, (3) Female-White, and (4) Female-Black. We consider a GANTR model with d = 4 and regression functions f mw (z) = e β 0,mw +β 1,mw z age 1 {Male-White} , f mb (z) = e β 0,mb +β 1,mb z age 1 {Male-Black} , f fw (z) = e β 0,fw +β 1,fw z age 1 {Female-White} , f fb (z) = e β 0,fb +β 1,fb z age 1 {Female-Black} .
The intercept coefficients β 0,mw , β 0,mb , β 0,fw , and β 0,fb account for the heterogeneity in the populations. The coefficients of the interactions between group and age β 1,mw , β 1,mb , β 1,fw , and β 1,fb account for differences in the effect of age between the groups. The White-Male subpopulation consists of 431 observations and the White-Female of 278 observations. The two other groups are much smaller with 92 observations in the Black-Male group and 59 observations in the Black-Female group; this restricts the ages for which Kaplan-Meier estimates can be provided. In the GANTR model, the estimated value ofδ MAP = 0.3731 which is indicative of the borrowing of information in the model's fit. The estimated values for the regression functions' parameters are presented in Table 2. We find that the Cox regression provides good fits for many ages. However, we find some discrepancies between the Cox regression model and both nonparametric models (white, male, age 65) and between the two nonparametric models (black, male, age 50). In both cases, the GANTR fit is much closer to the Kaplan-Meier than the Cox regression model or LDDP for black, males aged 50 ( Figure 4). Further fits are presented in the supplementary materials. , LDDP mean · · · · · ), thickness 3.4 (GANTR MAP ( ), LDDP mean ( · · · · · )), and thickness 6.1 (GANTR MAP , LDDP mean ( · · · · · )). The Kaplan-Meier fits of observations with thicknesses in the windows:

Conclusions
In a Bayesian nonparametric setting, we have introduced the GANTR model for possibly right-censored survival data. Our model generalizes the NTR models in a regression setting where nonproportional hazards are allowed. Our model can be interpreted in a competing risks framework. As a particular case of the GANTR, we can recover the multiple-sample models of Epifani and Lijoi (2010) and Riva-Palacio and Leisen (2018). The posterior characterization of the model was presented in Theorem 1 and asymptotic properties of the model are discussed in the supplementary materials. We presented two approaches to draw posterior mean estimators for the survival curve, where the vector of completely random has been integrated out. The first relies on an MCMC sampler and the second in an MAP procedure. Simulations studies provide evidence of the accuracy of our methodology and ease of interpretation. We also showed how these models can be used in real data studies to allow for nonproportional hazard effects and crossing survival functions, and to discover subpopulations with different survival curves.
The GANTR model relies on the random weights of the underlying VCRM to allocate mass on latent competing risks. A generalized additive model is applied to the competing risks where interpretable covariate effects, for example, Cox proportional hazards effects, are introduced in the regression functions. Thus, the model in the NTR setting focuses on the random weight structure of the underlying CRM. However, time varying effects, such as AFTs (e.g., Christensen and Johnson 1988), can be considered in an interpretable manner by focusing on the location component of the underlying CRM.
Another approach that could be considered in survival analysis is the frailty model. It generalizes the proportional hazards model by introducing a multiplicative random effect. Frailty models are often used to model clustered survival data, for example, arising in multi-center clinical trials. However, such heterogeneities can be modeled directly with the GANTR model in a multiple-sample framework given by the different clinical centers. As such, although it is possible to include a frailty term in our model to introduce a mixed effect, we preferred to focus on the multiple-sample interpretation of our model. Future work will be devoted to explore this research line.
We have not considered inference for left-censored or interval-censored observations. Previous approaches to this problem in the Bayesian nonparametric setting include Doss (1994) based on mixtures of Dirichlet processes, Jara and Hanson (2011) based on a linear dependent Poisson-Dirichlet process and Kim and Lee (2003) focused on NTR distributions from the focus of cumulative hazards. We believe that our approach could be extended to using the approach of Kim and Lee (2003) but we leave this problem to future work.
The GANTR approach leads to an analytic form for the marginal likelihood (integrating over the VCRM. This is an attractive feature which allows us to calculate MAP estimates of hyperparameters. This can be seen as an empirical Bayes approach which approximates the fully Bayesian approach that we also consider. Petrone et al. (2014) provided a discussion on empirical Bayesian methods, including asymptotic results. The use of MAP estimates for hyperparameters is becoming increasingly popular in Bayesian nonparametrics (see, e.g., Di Benedetto, Caron, and Teh 2017;Masoero et al. 2019), where the number of hyperparameters is usually small and well-informed by the data. This contrasts with flexible Bayesian nonparametric regression approaches which model the logarithm of the survival time using a dependent Dirichlet process. We believe that this allows the GANTR to be applied more easily to problems with many observations or covariates where MCMC samplers may mix slowly.

Supplementary Materials
Further asymptotic results for the GANTR model, details for CoRMs with LogNormal scores and Gamma directing Lévy measure, proofs of main results, details for simulation algorithm and likelihood gradient, additional simulation studies and additional fits for the kidney transplant study.