Estimating the risk–return profile of new venture investments using a risk-neutral framework and ‘thick’ models

This study proposes cascade neural networks to estimate the model parameters of the Cox–Ross–Rubinstein risk-neutral approach, which, in turn, explain the risk–return profile of firms at venture capital and initial public offering (IPO) financing rounds. Combining the two methods provides better estimation accuracy than risk-adjusted valuation approaches, conventional neural networks, and linear benchmark models. The findings are persistent across in-sample and out-of-sample tests using 3926 venture capital and 1360 US IPO financing rounds between January 1986 and December 2008. More accurate estimates of the risk–return profile are due to less heterogeneous risk-free rates of return from the risk-neutral framework. Cascade neural networks nest both the linear and nonlinear functional estimation form in addition to taking account of variable interaction effects. Better estimation accuracy of the risk–return profile is desirable for investors so they can make a more informed judgement before committing capital at different stages of development and various financing rounds.


Introduction
This study proposes cascade neural networks to estimate the model parameters of the Cox, Ross, and Rubinstein (1979) risk-neutral approach, which, in turn, explain the risk-return profile of firms at venture capital and initial public offering (IPO) financing rounds. Cascade neural networks nest both the linear and nonlinear functional estimation form. In this study, I combine the risk-neutral approach with the cascade neural network technique and compare the estimation accuracy with the risk-adjusted valuation approaches, conventional neural networks, and linear benchmark models.
Estimating the risk-return profile of privately held firms at various stages of development and financing rounds is difficult. The use of traditional risk-adjusted valuation techniques is problematic because conventional asset pricing models such as the Capital Asset Pricing Model (CAPM) rely on stock market trading data. However, privately held firms do not have a stock market listing. Also, the risk-return profile of new ventures changes as they advance through different stages of development and various financing rounds. Privately owned firms have no obligation to disclose information on the amount of capital injected and their valuation at financing rounds. Published estimates of risk-adjusted rates of return are rare. Young (1987, 1991) and Wetzel (1981) are among the few studies to publish estimates of the risk-adjusted rates of return for various stages of development.
To overcome the limitations associated with conventional risk-adjusted valuation approaches, the literature documents alternative ways of estimating the risk-return profile of venture capital and IPO financing rounds. One strand of the literature focuses on staged financing. Staged financing is a prominent feature as new venture firms advance through different development stages. In staged financing, investors have an option, but not an obligation, to commit capital at consecutive financing rounds. 1 Having option-like features enables investors to share risk with firms and minimise agency costs and information asymmetry, while retaining some control over the firm. These option-like features enable this study to estimate the risk-return profile with the help of the Cox-Ross-Rubinstein risk-neutral framework. This approach has several benefits. The risk-free rate of return is less subjective than estimates of the risk-adjusted rates of return for different stages of development and various financing rounds. 2 Therefore, the risk-neutral approach should have lower estimation errors because the risk-free rates of return are less heterogeneous compared with the risk-adjusted rates of return. These assertions are consistent with findings in the literature; for example, Seppä and Laamanen (2001). In their study, the estimation accuracy of the risk-neutral approach is, in general, better when compared to the risk-adjusted approach. 3 Ideally, the features likely to impact on the model parameters of the Cox-Ross-Rubinstein approach should be based on variables available to investors at financing rounds. For example, Seppä and Laamanen (2001) use the stages of development, the number of financing rounds prior to the current round, the amount of capital injected, and the length of time between the financing rounds.
Casamatta and Haritchabalet (2007) note that the level of investor syndication improves the screening process and thus helps to mitigate information asymmetry about firm value at financing rounds. Accordingly, Admati and Pfleiderer (1994) use the level of syndication to certify firm value and risk. Hanley (1993) uses the partial adjustment in the offer price between the filing of the preliminary and the final prospectus, to explain the mispricing at IPO financing rounds. Gompers and Lerner (2000) find a correlation between the valuation of private equity transactions and the performance of the aggregate stock market. Loughran and Ritter (2004), Lowry (2003), and Yung, Çolak, and Wei (2008) identify two stylised facts from the IPO literature. First, underpricing and issue volume are highly autocorrelated. Second, there is a positive correlation between the two series.
Traditionally, the literature uses multivariate linear regressions to estimate the risk-return profile. These studies rule out the possibility of nonlinearity. No previous study has applied artificial neural networks in the present context despite their widespread appeal as data-driven, universal function approximators in option pricing. 4 For example, Hutchinson, Lo, and Poggio (1994) use neural networks to 'learn' the Black-Scholes option pricing formula. In their analysis, artificial neural networks provide more accurate valuation estimates when the underlying asset pricing dynamics are unknown or when the option pricing formula cannot be solved analytically. We also know from the literature that the cascade neural network architecture can potentially provide better estimation accuracy. For example, Malik and Nasereddin (2006) report that cascade neural networks have smaller estimation errors than conventional artificial neural networks when estimating the gross domestic product of an economy.
Neither the application of the Cox-Ross-Rubinstein framework nor the use of cascade neural networks is novel. The contribution of this study lies in the combination of these two methods. This novel approach provides better estimation accuracy of the risk-return profile at financing rounds than benchmark models. Better estimation accuracy is desirable for investors so they can make a more informed judgement before committing capital at different financing rounds.
I use the Cox-Ross-Rubinstein approach to estimate the risk-neutral probability of an up-movement in firm value between consecutive financing rounds. It is surprising that the risk-neutral framework has not attracted more attention in the literature despite the benefits of relying on risk-free rates of return. They are less subjective and less heterogeneous than estimates of the risk-adjusted rates of return.
In this paper, I use previously untested cascade neural networks to estimate the Cox-Ross-Rubinstein model parameters. In the cascade architecture, the researcher does not enforce the hidden nodes. They are determined endogenously by the data, following the true spirit of neural network learning. Neural learning lets the data decide what the 'true' underlying relationship is between variables. I compare the estimation accuracy of the cascade neural networks to conventional neural networks and linear benchmark models. Linear regression models do not account for nonlinearity and variable interaction effects unless they are specified a priori. In this study, the neural networks leave the functional form unrestricted and let the data determine what the 'true' functional form is. 'Thick' models are neural networks that rely on different neuron connections, numbers of neurons, starting values at network initialisation, and trimmed mean estimates. 5 In contrast to prior studies, I perform out-of-sample tests to assess the risk-return estimation accuracy of the different approaches using ex ante and ex post values. Seppä and Laamanen (2001) only use in-sample comparisons to assess estimation accuracy. This limited approach could undermine the validity of their findings. Therefore, this study provides a true acid test of model performance by using unseen data, which are not part of the estimation set.
Moreover, I use a unique US data set of 3926 venture capital and 1360 IPO financing rounds of firms that obtained a listing on a US stock exchange between January 1986 and December 2008. This sample, therefore, exceeds both the absolute number of observations and the period under investigation in Seppä and Laamanen (2001). 6 My sample represents 39 industry sectors, using the 48-industry classification of Fama and French (1997). 7 In addition, this study segments the data into venture capital and IPO financing rounds to reflect their different risk-return profiles. This allows the present analysis to include previously untested variables relating to the partial adjustment of offer prices in Hanley (1993) and the level of syndication in Admati and Pfleiderer (1994).
Generally, my findings show that combining the risk-neutral approach with the cascade neural network methodology provides better estimation accuracy of the risk-return profile than risk-adjusted valuation approaches, conventional neural networks, and linear benchmark models. The findings are persistent across in-sample and out-of-sample tests.

The Cox-Ross-Rubinstein model
I use the Cox-Ross-Rubinstein method and the ex post firm value to derive the risk-return profile at financing rounds. The ex post firm value reflects the post-money valuation and includes the capital injected at each financing round. Firm value V_{t,s} at the beginning of stage s can increase to V_u or decrease to V_d over one period (t, T). The up-movement in firm value is defined as

u_s = V_u / V_{t,s},   (1)

provided that u_s > 1. The down-movement is d_s = 1/u_s and must satisfy 0 < d_s < 1. Together, u_s and d_s allow calculating the implied risk-neutral success probability of an up-movement p_s for each stage s as follows:

p_s = (e^(r_{f,s}(T − t)) − d_s) / (u_s − d_s),   (2)

where r_{f,s} denotes the continuously compounded risk-free Treasury-bill rate. Equation (2) implies that higher returns have lower risk-neutral probabilities. Firm value at the beginning of stage s is the discounted expectation under the risk-neutral probability distribution (p_s, 1 − p_s) of the firm value in the up or the down states at time T:

V_{t,s} = e^(−r_{f,s}(T − t)) [p_s V_u + (1 − p_s) V_d].   (3)

Discounting is at the risk-free rate of return over the period (t, T). The risk-neutral probability p_s is the proxy measure for risk, while the up-movement in firm value u_s is the proxy measure for return. The Cox-Ross-Rubinstein framework is useful in the present context. First, the model allows separating consecutive financing rounds into individual binomial steps. The voluntary disclosure requirement of pre-IPO financing rounds brings about the problem of incomplete information on capital transactions and firm valuations. Incomplete information for consecutive financing rounds makes the analysis of compound option pricing models unfeasible. Second, firms do not follow an identical financing pattern as they advance through different stages of development. Some firms are successful in raising sufficient finance to skip entire stages of development, whereas other firms require several injections of capital for a single stage of development.
The binomial framework can be adapted to suit any number of financing rounds. Third, the Cox-Ross-Rubinstein framework operates in discrete time steps. This is a necessary condition for the present analysis, because the valuation at consecutive financing rounds is only observable at discrete points in time.
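To make Equation (2) concrete, the sketch below computes the risk-neutral probability for a hypothetical financing round; the function name and inputs are my own illustration, not the paper's data, and the down-movement follows the text's assumption d = 1/u.

```python
import math

def crr_risk_neutral_p(u, rf, dt=1.0):
    """Risk-neutral probability of an up-movement under Cox-Ross-Rubinstein,
    with the down-movement defined as d = 1/u (as in the text).
    rf is the continuously compounded risk-free rate over a step of dt years."""
    if u <= 1.0:
        raise ValueError("u must exceed 1")
    d = 1.0 / u
    return (math.exp(rf * dt) - d) / (u - d)

# Hypothetical financing round: post-money value triples (u = 3)
# with a 5% continuously compounded risk-free rate over one year.
p = crr_risk_neutral_p(u=3.0, rf=0.05, dt=1.0)
```

Consistent with the text, a larger up-movement yields a lower risk-neutral probability: `crr_risk_neutral_p(5.0, 0.05)` is smaller than `crr_risk_neutral_p(2.0, 0.05)`.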
To validate the estimation accuracy of the Cox-Ross-Rubinstein framework, I calculate the ex ante return û_s for each stage s by re-arranging Equation (2). With d_s = 1/u_s, Equation (2) is quadratic in u_s, and the root greater than one gives

û_s = (e^(r_{f,s}(T − t)) + sqrt(e^(2 r_{f,s}(T − t)) − 4 p̂_s (1 − p̂_s))) / (2 p̂_s),   (4)

where p̂_s is the fitted risk-neutral probability. First, I have to estimate the value of p̂_s before I can calculate û_s. The fitted risk-neutral probability p̂_s comes from the fitted model parameters and the values of the independent variables obtained from the in-sample estimation. For the linear regression model, the regression coefficients are the fitted model parameters. In the case of the neural networks, the connection weights obtained during neural learning (training) are the fitted model parameters.
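Since the paper recovers the ex ante return by re-arranging Equation (2), the sketch below performs that inversion numerically under the text's assumption d = 1/u, by solving the implied quadratic and keeping the root above one. The function name and inputs are my own illustration.

```python
import math

def implied_u(p_hat, rf, dt=1.0):
    """Invert Equation (2) for the ex ante up-movement u, assuming d = 1/u.
    Substituting d = 1/u into p = (exp(rf*dt) - d)/(u - d) gives the quadratic
    p*u**2 - exp(rf*dt)*u + (1 - p) = 0; the economically relevant root is > 1."""
    a = p_hat
    b = -math.exp(rf * dt)
    c = 1.0 - p_hat
    disc = b * b - 4.0 * a * c
    return (-b + math.sqrt(disc)) / (2.0 * a)

# Round trip: the probability implied by u = 3 should map back to u = 3.
p = (math.exp(0.05) - 1.0 / 3.0) / (3.0 - 1.0 / 3.0)
u_hat = implied_u(p, 0.05)
```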
The independent variables used to estimate the risk-neutral probabilities at each financing round include the stage of development, the number of prior financing rounds, the length of time between financing rounds, and the level of investor syndication. The offer price and the partial adjustment in the offer price are two additional independent variables to explain the risk-neutral probabilities at IPO financing rounds. The return on the stock market, the aggregate average underpricing, and the aggregate number of initial public offerings control for market conditions. To validate the estimation accuracy of the Cox-Ross-Rubinstein framework, I repeat the calculations in Equations (2) and (4) with estimates of the risk-adjusted rates of return from Young (1987, 1991).

Data
Thomson Reuters' VentureExpert and New Issues database contain the data on venture capital and IPO financing rounds. VentureXpert provides the data on venture capital deals, including the post-money valuations, the capital injected, the number of venture capital investors, the industry classification of the firm, and the dates for each financing round. In addition, VentureXpert supplies the classification for the different stages of development. They are early stage, expansion, and later stage of development. Early-stage investments normally provide capital for the initial product development, manufacturing, sales and marketing. Investments at the expansion stage supply capital to expand the current operations of firms. Later-stage investments normally provide capital to firms with established products or services. This round typically constitutes the last source of funding before venture capital firms exit from their investment through a trade sale or IPO.
The New Issues database supplies the data on initial public offerings, including the offer price, the relative change between the actual offer price and the expected price from the preliminary prospectus offer price range. Thomson Reuters provides 3926 US pre-IPO venture capital financing rounds of 1360 venture capital-backed firms, which obtained a listing between January 1986 and December 2008. Jay Ritter's web page provides the monthly average underpricing and the number of initial public offerings. 8 Thomson Financial Datastream provides the equity market (Nasdaq) return and the risk-free rate of return. The estimates of the risk-adjusted rates of return at different stages of development are from Young (1987, 1991). The rates are 54.8% for early stage, 42.2% for expansion, and 35.0% for later stage of development. Table 1 presents the annual sample distribution of the sample financing rounds. Columns [2] and [3] report the number and the percentage of venture capital financing rounds by calendar year. The number of venture capital financing rounds increases from 1986 onwards, peaks in 1999, and then subsequently decreases again. Columns [4] to [9] report the number and the percentage of venture capital rounds across the early, the expansion, and the later stage of development. The expansion stage accounts for 42.46% of the total venture capital financing rounds, followed by the early stage (28.96%), and the later stage (28.58%) of development. The annual distributions of rounds across the different stages of development follow a similar pattern to that of the total venture capital financing rounds. Columns [10] and [11] list the number and the percentage of IPO financing rounds by calendar year. The distribution of the initial public offerings is similar to the venture capital financing rounds. Table 2 shows the sample distribution of financing rounds across the Fama and French (1997) 48-industry classification. Column [1] lists the industry. 
39 out of 48 industries (81.25%) have attracted venture capital investments and funding from initial public offerings.
Columns [2] and [3] report the total number and the percentage of venture capital financing rounds. Venture capital financing rounds show a high concentration in a few industries and reflect the characteristics of venture capital investments. The top five industries are Business Services (34.46%), Pharmaceutical Products (15.94%), Electronic Equipment (10.62%), Medical Equipment (9.32%), and Computers (5.30%). These sectors account for a combined total of more than 75% of all venture capital financing rounds. Columns [4] to [9] list the number and the percentage of the early stage, the expansion, and the later stages of development. Columns [10] and [11] state the number and the percentage of initial public offerings across industries. The concentration of initial public offerings follows a similar pattern to that of venture capital financing rounds. Table 3 lists the variables and Table 4 presents the summary statistics. The sample firms have a mean risk-neutral probability p of 33.38% for venture capital financing rounds and 28.73% for IPO financing rounds. In contrast, the mean risk-adjusted probability q is 45.05% for venture capital financing rounds and 39.42% for IPO financing rounds. The mean risk-adjusted rate of return is at least six times the risk-free rate of return across the financing rounds. The mean multiplier on firm value u is 3.5 and 4.9 for venture capital and IPO financing rounds, respectively. u has a high variation and reflects the high-growth potential of the sample firms across the different stages of development. In this study, the risk-neutral probability p and the risk-adjusted probability q are the proxy measures for risk, while the multiplier on firm value u between consecutive financing rounds is the proxy measure for return. A zero-one dummy variable captures the early stage of development of the previous financing round. This variable features only in the risk-return estimation of venture capital financing rounds. Observations are allocated at random to the training, validation, and test data sets for the neural networks.

Table 3. Variable definitions.

p: The Cox-Ross-Rubinstein risk-neutral success probability of an up-movement in firm value between consecutive financing rounds, as defined in Equation (2).
q: The Cox-Ross-Rubinstein risk-adjusted success probability of an up-movement in firm value between consecutive financing rounds. q is obtained by replacing the continuously compounded 5-year Treasury-bill rate in Equation (2) with the continuously compounded risk-adjusted rate of return from Young (1987, 1991) for the corresponding development stage.
u: The multiplier on firm value between two consecutive financing rounds. u is the post-money firm value at the current financing round divided by the post-money firm value at the previous financing round, as defined in Equation (1).
r_f: The continuously compounded 5-year Treasury-bill rate.
r: The continuously compounded risk-adjusted rate of return from Young (1987, 1991) for the corresponding development stage: 54.8% for early stage, 42.2% for expansion, and 35.0% for later stage of development.
Early: A dummy variable set to one if the firm is at an early stage of development at the prior venture capital financing round. Early stage is identified from the VentureXpert database.
Rounds: The total number of financing rounds of a firm prior to the current financing round.
Capital: The amount of capital (US$ million) raised at the current financing round.
Time: The time period in years between two consecutive financing rounds.
Offer price: The IPO price (US$) per share.
Change price: Hanley's (1993) partial adjustment in the offer price between the filing of the preliminary and the final prospectus, identified from Thomson Reuters' New Issues database.
Syndication: The number of venture capital investors at the previous financing round, identified from Thomson Reuters' VentureXpert.
Market return: The return on the Nasdaq index between two consecutive financing rounds.
IPO return: The equally weighted average IPO underpricing during the month of the current financing round, from Jay Ritter's web site.
Number of IPOs: The total number of initial public offerings during the month of the current financing round, from Jay Ritter's web site.

Notes: This table presents the definitions of the dependent and the independent variables. The risk-neutral success probability p and the risk-adjusted success probability q are the proxy measures for risk between consecutive financing rounds. The multiplier in the up-movement in firm value u is the proxy measure for return between consecutive financing rounds.

The mean offer price per share is US$13.03, and the mean relative price change between the actual IPO price and the expected price from the preliminary prospectus offer price range is 5.82%. The remaining variables capture equity market and new issues market conditions. The mean Nasdaq return between consecutive financing rounds is 18.04% for venture capital and 20.04% for IPO financing rounds. The mean underpricing of firms obtaining a stock market listing is similar for firms at venture capital or IPO rounds.
An average of 40.0 firms have obtained a stock market listing during the month of the current venture capital financing round, whereas the mean is 4.5 for IPO rounds. However, the data set used in this study does have limitations. All sample firms have obtained venture capital investments and gone through a successful IPO. Therefore, the sample firms are more likely to have an increase in value leading up to the IPO. This upward trend could potentially bias the findings of this study. A more balanced sample with decreasing firm value between financing rounds could overcome this bias. Unfortunately, privately held firms are more likely to disclose information on deals and valuation if firm value increases. It is much more common to conceal information on decreases in firm value between financing rounds. 9 Decreases in firm value could discourage future venture capital investments. Nevertheless, the VentureXpert database represents one of the best publicly available data sets.

Estimation models
This section provides an overview of the different models to estimate the Cox-Ross-Rubinstein model parameters. In this study, I use linear regression models, conventional multilayer perceptron (MLP) neural networks, and cascade neural networks. The MLP is a pure nonlinear estimation model, whereas the cascade neural network nests both the linear and nonlinear functional estimation form.

Linear regression (Linear)
A simple linear ordinary least squares regression is the first benchmark model:

p_s = β_0 + Σ_{i=1}^{i*} β_i x_{i,s} + ε_s,   (5)

where p_s is the risk-neutral probability for each financing round s, x_{i,s} is the set of explanatory variables i = 1, ..., i*, β_0 is the constant term, β_i are the regression coefficients, and ε_s is the error term. The Jarque and Bera (1980) (JB) test for normality and White's (1980) test for heteroskedasticity show that the regression residuals are not well-behaved. Variable transformations cannot alleviate the problem and hence the t-statistics use White's (1980) heteroskedasticity-consistent standard errors and covariances. The Lee, White, and Granger (1993) (LWG) test identifies neglected nonlinearity in the regression residuals. Therefore, since the functional form of the nonlinearity is unknown, this study resorts to artificial neural networks.
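As a sketch of this benchmark, the snippet below runs an ordinary least squares regression with White's heteroskedasticity-consistent (HC0) "sandwich" standard errors on simulated heteroskedastic data; the data and function name are my own illustration, not the paper's sample.

```python
import numpy as np

def ols_white(X, y):
    """OLS coefficients with White's (1980) heteroskedasticity-consistent
    (HC0) standard errors. X should include a column of ones."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    se = np.sqrt(np.diag(cov))
    return beta, se

# Simulated regression with noise variance that depends on the regressor.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 0.3 + 0.5 * x + rng.normal(scale=0.1 * (1.0 + x**2))
beta, se = ols_white(X, y)
```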

MLP neural network
The conventional MLP neural network (Rosenblatt 1961; Rumelhart, Hinton, and Williams 1986) is the second benchmark model. Neural networks detect patterns in the underlying data through interconnected processing elements. These processing elements (known as neurons) are arranged in layers. The number of neurons in the input layer corresponds to the number of input (independent) variables, whereas the number of neurons in the output layer corresponds to the number of output (dependent) variables. Between the input and output layers is the hidden layer, which also has neurons. The purpose of the hidden layer is to identify the nonlinear pattern and interaction effects between the input and output variables. Each neuron in the hidden layer and the output layer receives signals from other neurons, whereas the input layer neurons receive their signals from the input variables. The strength of the input signals from each neuron is stored in the connection weights. A nonlinear transfer function is then applied to the weighted sum of the inputs to form the output signal of a neuron. Accordingly, the MLP has the following form:

n_{k,s} = w_{k,0} + Σ_{i=1}^{i*} w_{k,i} x_{i,s},

N_{k,s} = T(n_{k,s}) = (e^(n_{k,s}) − e^(−n_{k,s})) / (e^(n_{k,s}) + e^(−n_{k,s})),

p̂_s = γ_0 + Σ_{k=1}^{k*} γ_k N_{k,s},

where T(n_{k,s}) is the tansig activation function, x_{i,s}, i = 1, ..., i*, are the input variables, and k* is the number of hidden neurons. A linear combination of the input variables x_{i,s} with the input weights w_{k,i} and the constant weight bias w_{k,0} forms the variable n_{k,s}. The activation function squashes n_{k,s} to take on a value N_{k,s} for each observation s. γ_k are the output weights and γ_0 is the weight bias of the output neuron. Neural learning (or training) determines the optimal value of the interconnection weights to minimise the estimation error between the input and output variables. Learning starts from initial randomised weights.
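The MLP functional form translates directly into code; the sketch below is a minimal forward pass with illustrative random weights (note that the tansig function is exactly the hyperbolic tangent).

```python
import numpy as np

def mlp_forward(x, W, w0, gamma, g0):
    """One-hidden-layer MLP with tansig (tanh) activations:
    n_k = w_k0 + sum_i w_ki * x_i,  N_k = tanh(n_k),
    output = g0 + sum_k gamma_k * N_k."""
    n = w0 + W @ x            # pre-activations, one per hidden neuron
    N = np.tanh(n)            # tansig squashing to (-1, 1)
    return g0 + gamma @ N

# Illustrative dimensions: i* = 4 input variables, k* = 3 hidden neurons.
rng = np.random.default_rng(1)
i_star, k_star = 4, 3
x = rng.normal(size=i_star)
W = rng.normal(size=(k_star, i_star))   # input weights w_ki
w0 = rng.normal(size=k_star)            # weight biases w_k0
gamma = rng.normal(size=k_star)         # output weights gamma_k
out = mlp_forward(x, W, w0, gamma, 0.1)
```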
The learning algorithm adjusts the weights repeatedly to minimise the difference between the output produced and the output desired of the dependent variable. In accordance with common practice, I divide the data into a training set, cross-validation set, and test set.
The purpose of the training set is to estimate the connection weights. The cross-validation set monitors the learning progress and terminates the training as soon as the estimation error increases, to avoid overfitting a model to the data. Finally, the test set evaluates the estimation performance on previously unseen data.
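A minimal sketch of this training/cross-validation scheme, using a toy one-parameter tanh model and my own simulated data rather than the paper's networks: training proceeds by gradient descent, the best weights seen on the validation set are retained, and training halts early if the validation error starts rising.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = np.tanh(1.5 * x) + rng.normal(scale=0.1, size=300)
idx = rng.permutation(300)
tr, va, te = idx[:180], idx[180:240], idx[240:]   # train / validation / test

w, lr = 0.0, 0.05
best_w, best_val = w, np.inf
for epoch in range(200):
    pred = np.tanh(w * x[tr])
    # Gradient of the training MSE with respect to the single weight w.
    grad = np.mean(2.0 * (pred - y[tr]) * (1.0 - pred**2) * x[tr])
    w -= lr * grad
    val = np.mean((np.tanh(w * x[va]) - y[va]) ** 2)
    if val < best_val:
        best_val, best_w = val, w        # keep the best weights seen so far
    elif val > 1.01 * best_val:
        break                            # validation error rising: stop early

# Final check on previously unseen data.
test_mse = np.mean((np.tanh(best_w * x[te]) - y[te]) ** 2)
```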
However, a challenge arises in the MLP architecture because the researcher needs to decide on the connectivity of the neurons in the hidden layer. Deciding on this connectivity directly affects the estimation performance. The cascade neural network architecture determines the connectivity of neurons from the data rather than having it enforced by the researcher.

Cascade neural network (Cascade)
I use the cascade neural network architecture as advocated in Fahlman and Lebiere (1990). In this architecture, the input variables are not only linked to the output variable through the hidden layer of squashed tansig functions, but also have direct linear links to the output variable: 10

p̂_s = β_0 + Σ_{i=1}^{i*} β_i x_{i,s} + Σ_{k=1}^{k*} γ_k N_{k,s},

where N_{k,s} = T(n_{k,s}) is the tansig transformation defined above. The cascade architecture therefore nests both the MLP and the linear model. This configuration allows for the possibility of combined linear and nonlinear functional components. Cascade neural networks are particularly useful in situations where there is no clear a priori expectation about the underlying functional form. In the conventional MLP neural network, the researcher needs to determine the number of hidden nodes and their connectivity to minimise the estimation error.
In the cascade architecture, the hidden nodes are determined by the data endogenously and not enforced by the researcher. Cascade learning starts off with no hidden neurons. The only connections are the direct ones between the neurons in the input and the output layer. Hidden nodes are added one at a time and the estimation error re-calculated. The cascade algorithm adds additional hidden nodes until no further improvement in the estimation performance takes place.
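The growth procedure can be sketched as follows. This is a deliberately simplified illustration in the spirit of cascade learning on my own simulated data: candidate tansig units receive the inputs and all previously installed hidden units (via the accumulating columns of the design matrix), output weights are refitted by least squares, and a unit is kept only if it lowers the validation error. The full Fahlman-Lebiere algorithm trains candidate units on the residual correlation instead of drawing them at random.

```python
import numpy as np

def fit_output(Z, y):
    """Least-squares output weights for design matrix Z (with intercept)."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = 0.5 * X[:, 0] + np.sin(2.0 * X[:, 1]) + rng.normal(scale=0.1, size=400)
tr, va = slice(0, 300), slice(300, None)

# Start with direct linear input-output links only (no hidden neurons).
Z = np.column_stack([np.ones(len(X)), X])
w = fit_output(Z[tr], y[tr])
best_val = np.mean((Z[va] @ w - y[va]) ** 2)
lin_val = best_val

# Add hidden units one at a time; each candidate sees the inputs and all
# previously installed hidden units, and is kept only if it helps.
for _ in range(10):
    v = rng.normal(size=Z.shape[1])      # random candidate input weights
    h = np.tanh(Z @ v)                   # new tansig hidden unit
    Z_new = np.column_stack([Z, h])
    w_new = fit_output(Z_new[tr], y[tr])
    val = np.mean((Z_new[va] @ w_new - y[va]) ** 2)
    if val < best_val:
        Z, w, best_val = Z_new, w_new, val
```

Because units are only kept when they improve the validation error, the grown network can never do worse than the purely linear starting model on the validation set, mirroring how the cascade nests the linear benchmark.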

Model performance
In this paper, I use the Hannan and Quinn (1979) information criterion (HQIFC) in the model building process to test the estimation accuracy. The HQIFC measure penalises the estimation error for the number of model parameters. More complex neural network models have a larger number of model parameters than the linear regression models. The HQIFC measure, therefore, allows for a better estimation comparison across different model complexities. I use Granger and Jeon's (2004) 'thick' modelling technique, which relies on trimmed mean estimates of repeatedly trained neural networks. This approach provides stable estimates across different architectures. In addition to the HQIFC measure, I also use traditional performance measures, including the sum of squared estimation errors (SSE), the mean squared error (MSE), and the coefficient of determination (R-squared).
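One common regression form of the Hannan-Quinn criterion can be computed as below; the exact formula used in the paper is not reproduced here, so treat this as an illustrative variant that captures the parameter penalty the text describes.

```python
import math

def hannan_quinn(sse, n, k):
    """Hannan-Quinn information criterion in a common regression form:
    ln(SSE/n) + 2*k*ln(ln(n))/n. Lower is better; the second term
    penalises the fit for the number of model parameters k."""
    return math.log(sse / n) + 2.0 * k * math.log(math.log(n)) / n

# A model with many more parameters must reduce the SSE by enough
# to justify them; a marginal improvement is penalised away.
simple = hannan_quinn(sse=10.0, n=200, k=3)
complex_ = hannan_quinn(sse=9.9, n=200, k=30)
```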

Variable significance testing
In this study, I perform variable significance testing to assess the relevance of the explanatory variables across all estimation models. In nonlinear relationships, the functional form between the explanatory and the dependent variables only requires that the conditional expectation varies with an increasing value of the independent variable. The approach to variable significance testing used in linear regression analysis is, therefore, not useful in detecting symmetric or periodic nonlinear functions. Instead, I analyse the impact of the explanatory variables on the sensitivity of the model fitness, as advocated in Refenes and Zapranis (1999). I use the HQIFC as the model-fitness sensitivity measure. An explanatory variable is significant only if its inclusion leads to an improvement in the HQIFC. I calculate the HQIFC sample variance by means of re-sampling with replacement (bootstrap) to obtain empirical probability density functions. Testing whether variable x_i is statistically significant takes the form of H0: HQIFC(x_i) = HQIFC against the alternative HA: HQIFC(x_i) < HQIFC and involves a t-test.
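An illustrative version of this bootstrap t-test, applied to simulated model-fitness improvements rather than actual HQIFC values, is sketched below; the function name and inputs are my own.

```python
import numpy as np

def bootstrap_t(delta, m=500, seed=0):
    """One-sample t-statistic for the mean of bootstrap-resampled fitness
    improvements `delta` (e.g. criterion without minus criterion with the
    variable). H0: mean improvement = 0; a large positive t suggests the
    variable significantly improves model fitness."""
    rng = np.random.default_rng(seed)
    means = np.array([rng.choice(delta, size=len(delta), replace=True).mean()
                      for _ in range(m)])
    return means.mean() / means.std(ddof=1)

# Simulated improvements with a small positive mean: the variable helps.
rng = np.random.default_rng(4)
delta = rng.normal(loc=0.05, scale=0.1, size=100)
t_stat = bootstrap_t(delta)
```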

Out-of-sample testing
I apply the 0.632 bootstrap method to validate the estimation error of the different models. This approach is based on Efron (1979, 1983).11 I estimate the in-sample estimation error $\hat{e}^2$ as the difference between the actual and the fitted values from the different model parameters and their functional approximations $f$,

$$\hat{e}^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ y_i - f(x_i; \hat{b}) \right]^2,$$

where $y_i$ is the actual value, $f(\cdot)$ is the estimated value of $y_i$ from the fitted regression parameters $\hat{b}$ and the independent variables $x_i$, using the sample length $n$ of the entire estimation set. The bootstrapping procedure involves drawing $n$ observations with replacement from the original sample of length $n$ and allocating these observations to the new estimation set $Q$. I use $Q$ to estimate the model parameters $\hat{b}$. Some of the observations in $Q$ will be repeated, while others will not have been picked. Unselected observations are allocated to the out-of-sample test data set. I then estimate the error $\hat{e}^{(0)}$ as the average squared error for those observations which appear in the test data set across $m$ bootstrap replications. To calculate the 0.632 bootstrap error $\hat{e}^{(0.632)}$, I take account of the in-sample bias adjustment,

$$\hat{e}^{(0.632)} = 0.368\,\hat{e}^2 + 0.632\,\hat{e}^{(0)}. \qquad (16)$$
The weighting of 0.632 and 0.368 comes from the probability of observations ending up in the estimation or the out-of-sample data sets. For example, a particular observation has a probability of $(1 - 1/n)$ of not being picked in a single draw for the estimation set. Therefore, for a large data set, the probability of ending up in the out-of-sample data set after $n$ draws with replacement is approximately

$$\left(1 - \frac{1}{n}\right)^{n} \approx e^{-1} \approx 0.368. \qquad (17)$$

It follows from Equation (17) that approximately 63.2% of the observations end up in the estimation data set for any one bootstrap replication. Unfortunately, the 0.632 error estimate does not follow a well-defined distribution. Therefore, I cannot test if $\hat{e}_{\tau}^{(0.632)}$ from model $\tau$ is significantly different from $\hat{e}_{\upsilon}^{(0.632)}$ of model $\upsilon$. Instead, I calculate the 0.632 bootstrap ratio (BR) to measure the 'thick' estimation errors relative to the ones obtained from the linear benchmark models. A BR value of less than one indicates a gain for the 'thick' models over the linear benchmark regressions. In addition to the BR measure, I use the SSE, the MSE, the root mean squared error (RMSQ), the mean absolute error (MAE), and the correlation coefficient (R) between the ex ante and ex post values.
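The 0.632 bootstrap procedure can be sketched directly from this description. The `fit` and `predict` callables are hypothetical stand-ins for the paper's estimation models; the resampling, test-set construction, and 0.368/0.632 weighting follow the text.

```python
import numpy as np

def bootstrap_632_error(fit, predict, X, y, m=100, seed=0):
    """0.632 bootstrap estimate of prediction error (a sketch).

    `fit(X, y)` returns fitted parameters and `predict(params, X)` returns
    fitted values; both are assumed stand-ins for the paper's models. Each
    replication draws n observations with replacement as the estimation set
    Q; observations never drawn form that replication's test set.
    """
    rng = np.random.default_rng(seed)
    n = len(y)

    # In-sample error e^2 from a fit on the full estimation set.
    params = fit(X, y)
    e_in = np.mean((y - predict(params, X)) ** 2)

    # Out-of-sample error e^(0) averaged over m bootstrap replications.
    out_errs = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)           # estimation set Q
        test = np.setdiff1d(np.arange(n), idx)     # never-drawn observations
        if test.size == 0:
            continue
        p = fit(X[idx], y[idx])
        out_errs.append(np.mean((y[test] - predict(p, X[test])) ** 2))
    e_out = np.mean(out_errs)

    # Equation (16): bias-adjusted combination of the two error estimates.
    return 0.368 * e_in + 0.632 * e_out
```

On average each test set holds roughly 36.8% of the observations, matching the weighting derived in Equation (17).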
I use the estimation models presented in this section to estimate the probabilities of an up-movement in firm value. An accurate approximation of the success probabilities is an important intermediary step. The fitted probabilities are used to calculate the ex ante returns from the Cox-Ross-Rubinstein model in Equation (4).

Findings
In this paper, I argue that estimating the Cox-Ross-Rubinstein model parameters with cascade neural networks provides better estimation accuracy of the risk-return profile than risk-adjusted valuation approaches, conventional neural networks, and linear benchmark models. The findings are persistent across in-sample and out-of-sample tests using 3926 venture capital and 1360 US IPO financing rounds between January 1989 and December 2008. However, the estimation error across the different performance measures remains relatively high. High estimation errors are consistent with common observations of new venture investments, which can have extreme outcomes in risk and return.

[Table 5 notes: The estimation models are linear regression (Linear) from Equation (5), MLP neural networks from Equation (8), and cascade neural networks (Cascade) from Equation (11). The in-sample diagnostics include the JB test of normality of residuals, the LWG test of nonlinearity, the HQIFC, the SSE, the MSE, and the coefficient of determination (R-squared). a Denotes probability value. b The number of trials for neglected nonlinearity out of 1000 experiments.]

Table 5 presents the in-sample model performances of the estimation models to explain the risk-neutral probabilities of an up-movement in firm value at financing rounds. The cascade neural network (Cascade) estimates are more accurate than those of the MLP and the linear benchmark model (Linear). This finding is not surprising, since the cascade neural networks nest both the linear and nonlinear estimation models.

In-sample performance
The outperformance of the cascade neural networks is compelling across all venture capital financing rounds (Panel A) and IPO financing rounds (Panel B). The HQIFC, the SSE, and the MSE have the smallest values, while the coefficient of determination (R-squared) has the highest values. The coefficients of determination have similar values to the R-squared reported in Seppä and Laamanen (2001).12 The MSE confirms that the partial adjustment in the offer price (Hanley 1993) improves the estimation accuracy if we compare the initial public offering financing rounds (Panel B) with the venture capital financing rounds (Panel A).
Both the Lee-White-Granger (LWG) and the JB tests indicate that the residuals are not well behaved across all estimation models. The LWG test indicates the presence of neglected nonlinearity in the residuals of the linear regression models. In Panel A, we can reject linearity outright, while in Panel B, 302 out of the 1000 randomly generated nonlinear combinations of the predictor variables are statistically significant in explaining the residuals of the linear benchmark model. The JB test rejects normality of the residuals across all estimation models. This result reflects the actual nature of the risk-return profile of venture capital and IPO financing rounds. The variation in the risk-return profile for this type of firm is very high.
[Table 6 notes: The estimation models are linear regression (Linear) from Equation (5), MLP neural networks from Equation (8), and cascade neural networks (Cascade) from Equation (11). a t-statistics are based on partial derivatives of the dependent and the independent variables, ∂y/∂x_i. The t-statistics use White's (1980) heteroskedasticity-consistent errors and covariances. The values of the regression intercepts are not reported. b Testing that the independent variable x_i is statistically significant is based on H0: HQIFC(x_i) = HQIFC against the alternative HA: HQIFC(x_i) < HQIFC and involves a t-test. The one-tailed p-values are calculated from empirical density functions and bootstrap analysis. *** denotes 1% significance level. ** denotes 5% significance level.]

Table 6 shows the results of the variable significance testing. The risk-neutral approach is consistent with the model predictions on the risk-return characteristics of venture capital and initial public offering financing rounds. Most variables in the estimation models are statistically significant at the 10% level. For those variables that are not statistically significant, variable deletion tests show a deterioration in the estimation accuracy when they are excluded. Therefore, these variables remain in the final model. Although the neural networks have fewer predictor variables, the estimates of these models are more accurate than the estimates of the linear regressions. The better performance of the cascade neural network is due to its ability to take account of variable interaction effects together with the nonlinear and linear functional form.
The significance of the variables is, by and large, consistent with the extant knowledge of the venture capital and IPO financing rounds. Knowing the statistical significance of the independent variables is an important intermediary step in estimating the ex ante probabilities and, in turn, the ex ante up-movement in the firm value.
Only the regression analysis allows statements about the direction of the relationship between the risk-neutral probabilities and the independent variables. Variable significance testing in the neural networks instead tests the null hypothesis that there is no underlying pattern and that the variable has wrongly entered the estimation model.
Early-stage investments have higher implied risk than firms at more advanced stages of development. More frequent financing rounds, an increasing number of investors, and larger amounts of capital injected are associated with lower risk. The risk also reduces with an increasing length of time between two consecutive financing rounds. Positive adjustments in the initial public offering prices between the preliminary and the actual offer price have smaller risk-neutral probabilities. Higher offer prices also have smaller risk-neutral probabilities. Consistent with the risk-neutral framework, the market return has a negative correlation with the risk-neutral success probabilities. The number of initial public offerings during the month of the current financing round also has a negative correlation with the risk-neutral success probabilities. The relationship between the IPO return of companies during the month of the current financing round and the risk-neutral success probabilities is inconclusive. The association is positive in the case of venture capital investments and negative in the case of IPO financing rounds. This reversed direction of the relationship could be an indication of an overspecified linear model. However, the variance inflation factors do not raise any concerns about multicollinearity. Table 7 presents the out-of-sample tests on the estimation accuracy of the success probabilities of an up-movement in firm value between two consecutive financing rounds, the proxy measure for risk.

Out-of-sample performance
Overall, the estimation errors of the risk-neutral framework are smaller than those of the risk-adjusted approach across the performance measures. There is only one exception, in which the correlation coefficient (R) shows a better fit for the risk-adjusted framework than for the risk-neutral framework. The majority of the performance measures which compare the different estimation models favour the cascade neural networks (Cascade) over the benchmark models. There are only three exceptions, in which the MLP provides more accurate estimates: the correlation coefficient (R) in the risk-neutral framework (in Panel A), and the 0.632 BR in the risk-neutral and the risk-adjusted frameworks (in Panel B). Table 8 presents the estimation accuracy of the up-movements in firm value based on the fitted probabilities for each of the estimation models. The up-movement in firm value is the proxy measure for return.
Overall, the risk-neutral framework provides more accurate estimates of the up-movement in firm value between consecutive financing rounds than the risk-adjusted approach. There are only two exceptions: the correlation coefficients between the ex ante and the ex post values for the risk-neutral framework are lower for the MLP and the cascade neural networks (Cascade) in Panel B. The majority of performance measures show that the cascade neural networks (Cascade) outperform their benchmark models. There are, again, only two exceptions to the rule: in Panel A, the Cascade model performs worse than the MLP model on the correlation coefficient (R) and the MAE.
The smaller estimation errors of the risk-neutral approach that relies on parameter estimation using cascade neural networks are likely to come from the lower heterogeneity in the risk-free rates of return than in the risk-adjusted rates of return, and from the estimation flexibility of cascade neural networks. Cascade neural networks nest both the linear and nonlinear functional estimation form. They also take account of any variable interaction effects without having to model them a priori.

[Table 7 notes: This table presents the out-of-sample estimation errors and comparison between the risk-neutral and the risk-adjusted framework. The risk-neutral success probability p and the risk-adjusted success probability q are the proxy measures for risk as defined in Table 3. The out-of-sample observations are randomly selected from 3926 venture capital financing rounds (Panel A) and 1360 IPO financing rounds (Panel B) between January 1986 and December 2008. The estimation models are linear regression (Linear) from Equation (5), MLP neural networks from Equation (8), and cascade neural networks (Cascade) from Equation (11). BR is the ratio of the 0.632 bootstrap (Equation (16)) estimates in relation to the estimates of the linear regression models. SSE is the sum of squared errors. MSE is the mean squared error. RMSQ is the root mean squared error. R is the correlation coefficient between the ex ante and the ex post probabilities.]

[Table 8 notes: This table presents the out-of-sample estimation errors and comparison between the risk-neutral and the risk-adjusted framework. The up-movement in firm value u is the proxy measure for return as defined in Table 3. The out-of-sample observations are randomly selected from 3926 venture capital financing rounds (Panel A) and 1360 IPO financing rounds (Panel B) between January 1986 and December 2008. The estimation models are linear regression (Linear) from Equation (5), MLP neural networks from Equation (8), and cascade neural networks (Cascade) from Equation (11). BR is the ratio of the 0.632 bootstrap (Equation (16)) estimates in relation to the estimates of the linear regression models. SSE is the sum of squared errors. MSE is the mean squared error. RMSQ is the root mean squared error. R is the correlation coefficient between the ex ante and the ex post up-movement in firm value.]
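The nesting property of the cascade architecture can be illustrated with a small forward-pass sketch. This is not the paper's exact Equation (11); the function and weight names are assumptions. Direct input-to-output weights carry the linear part, and each hidden unit receives the inputs plus all earlier hidden-unit outputs, which is how the network picks up interaction effects without modelling them a priori.

```python
import numpy as np

def cascade_forward(x, w_direct, hidden_units, v):
    """Forward pass of a small cascade network (a sketch).

    `w_direct` weights the bias and inputs directly at the output (the
    linear part). Each entry of `hidden_units` is the weight vector of one
    hidden unit over the bias, the inputs, and all earlier hidden outputs
    (the nonlinear, interaction-capturing part). `v` weights the hidden
    outputs at the output node. With zero hidden weights the model
    collapses to linear regression, so the cascade nests both forms.
    """
    z = np.concatenate([[1.0], np.asarray(x, dtype=float)])  # bias + inputs
    h = []
    for u in hidden_units:
        # Each unit sees the inputs and every previously built hidden unit.
        h.append(np.tanh(np.asarray(u) @ np.concatenate([z, h])))
    y_linear = float(np.asarray(w_direct) @ z)
    y_hidden = float(np.asarray(v) @ np.asarray(h)) if h else 0.0
    return y_linear + y_hidden
```

With no hidden units (or hidden weights of zero), the output is exactly the linear regression prediction; adding trained hidden units layers nonlinear corrections on top of it.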

Extensions
There are many ways to extend this study. My selection of the Cox, Ross, and Rubinstein (1979) model is crude. More sophisticated or alternative risk-neutral approaches may be available to improve the risk-return estimation accuracy. However, these techniques need to be able to overcome some of the challenges of using large samples. For example, the length of time between consecutive financing rounds differs, and new venture firms do not follow an identical sequential pattern of financing rounds to fund key development stages. More sophisticated estimation techniques could also improve the risk-return estimation accuracy. However, I do not claim that the 'thick' neural network models are the only alternative to linear regressions or indeed superior to other estimation techniques per se. Neural networks are appealing because they can approximate any functional form without theoretical guidance or prior knowledge. My analysis shows that cascade neural networks, which nest both the linear and nonlinear functional form, provide the most accurate estimates of the risk-return profile at financing rounds. However, more sophisticated neural network architectures or alternative estimation techniques are possible directions for future research. Some of these estimation techniques could also consider possible structural breaks between venture capital and initial public offerings. Some of the variable significance tests imply that the linear benchmark models are overspecified. Parsimonious 'thick' models with fewer independent variables provide more accurate forecasts.