NONLINEAR REGRESSION WITH R PDF
Nonlinear Regression with R is part of the Use R! book series. It approaches nonlinear regression modelling through the functionality available in R, which currently offers a wide range of functionality for nonlinear regression. The nonlinear regression model generalizes the linear regression model. The R function nls is used for estimating parameters via nonlinear least squares.
On Jan 1, Jan de Leeuw and others published Nonlinear Regression with R. This book is about nonlinear regression analysis with R, in particular, how to use the function nls() and related functions and methods. One chapter is devoted to model checking procedures for validating the assumptions underlying a nonlinear regression.
Consequently, it is a more general model than the nonlinear regression model, or, in other words, the nonlinear regression model is a submodel of the ANOVA model.
We consider two approaches for testing the hypothesis: the F-test and the likelihood ratio test. A large value of the test statistic indicates that the two models are far apart, meaning that they do not provide similar descriptions of the data, and consequently the more general ANOVA model is the more appropriate model.
Figure: Plot of the root length for a range of concentrations of ferulic acid using the dataset ryegrass.
What can we conclude from the test? We conclude that the four-parameter log-logistic model is appropriate for describing the relation between the response and the predictor in the dataset ryegrass. Like the F-test, the likelihood ratio test is a test for comparison of two nested models, where one model is more general than the other model. With Q denoting the likelihood ratio statistic, computed from the log-likelihood values (logLik) of the two models, the p-value is obtained as 1 - pchisq(Q, df).
The normality assumption can fail in several ways. A few possibilities are one or a few outliers, too many extreme values (heavy tails), or skewness.
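The two lack-of-fit tests described above can be sketched in base R. This is only an illustration under stated assumptions: the built-in dataset Puromycin (treated group, two replicates per concentration) stands in for ryegrass, the Michaelis-Menten mean function stands in for the log-logistic model, and all object names are hypothetical.

```r
# Nonlinear model versus the more general one-way ANOVA model.
# Assumption: Puromycin (treated group) is a stand-in for the ryegrass data.
treated <- subset(Puromycin, state == "treated")

mm.fit <- nls(rate ~ Vm * conc / (K + conc), data = treated,
              start = list(Vm = 200, K = 0.1))
anova.fit <- lm(rate ~ factor(conc), data = treated)  # one mean per concentration

# F-test: compare residual sums of squares of the two nested models
rss.nls <- deviance(mm.fit)
df.nls <- df.residual(mm.fit)
rss.lm <- deviance(anova.fit)
df.lm <- df.residual(anova.fit)
F.stat <- ((rss.nls - rss.lm) / (df.nls - df.lm)) / (rss.lm / df.lm)
p.F <- pf(F.stat, df.nls - df.lm, df.lm, lower.tail = FALSE)

# Likelihood ratio test: Q is twice the difference in log-likelihood
Q <- 2 * (as.numeric(logLik(anova.fit)) - as.numeric(logLik(mm.fit)))
df.Q <- attr(logLik(anova.fit), "df") - attr(logLik(mm.fit), "df")
p.Q <- 1 - pchisq(Q, df.Q)

print(c(F = F.stat, p.F = p.F, Q = Q, p.Q = p.Q))
```

Large p-values in both tests would indicate that the nonlinear model describes the data about as well as the ANOVA model does.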
Such deviations can be assessed using the standardised residuals (Pinheiro and Bates). They are not entirely independent, but usually the correlation is small. Note that, apart from the scale on the y axis, the plot looks very much like the one in Fig. However, it is often more informative to look at the standardised residuals in a QQ plot. The two distributions agree if the points in the QQ plot approximately follow a straight line intercepting the y axis at 0 and having slope 1.
The QQ plot can be made by qqnorm. We use abline to add the reference line with intercept 0 and slope 1. In some situations (especially for small datasets), it may be useful to have a statistical test as an aid for deciding whether or not the normality assumption is met. The next subsection introduces such a test. The Shapiro-Wilk test is available within the standard installation of R through the function shapiro.test.
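A minimal sketch of these normality checks, assuming the built-in Puromycin data (treated group) and a Michaelis-Menten fit as stand-ins for the book's example; the object names are hypothetical.

```r
# QQ plot of standardised residuals plus the Shapiro-Wilk test.
# Assumption: Puromycin (treated group) replaces the book's dataset.
treated <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)

# Standardised residuals: raw residuals divided by the residual standard error
std.res <- as.numeric(residuals(fit)) / summary(fit)$sigma

qqnorm(std.res, main = "QQ plot of standardised residuals")
abline(a = 0, b = 1)  # reference line with intercept 0 and slope 1

sw <- shapiro.test(std.res)
print(sw)
```

The test statistic lies between 0 and 1, and values close to 1 support the normality assumption.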
The assumption of normally distributed errors is acceptable. The value of the test statistic (the closer to 1, the better the agreement) supports what is seen in the QQ plot. Use the test as a supplement to the QQ plot. Correlated response values often occur in experiments where repeated measurements are taken on the same organism or subject. One way to detect correlation is to look at a lag plot, which is a plot of each raw residual versus the previous residual (also called the lag-one residual) in the order determined by the predictor scale.
Correlation would show up as a linear trend; examples are given by Bates and Watts.
We obtain the lagged residuals by applying residuals to the model fitted to the dataset vapCO and dropping the first element. To match the lengths of the vectors, we append an NA to the vector of lagged residuals. There appears to be a positive linear relationship in the plot.
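The construction of a lag plot can be sketched as follows. The dataset vapCO ships with the package nlrwr, so this sketch assumes the built-in Puromycin data (treated group) instead; the model and object names are hypothetical.

```r
# Lag plot: each residual against its neighbour in predictor order.
# Assumption: Puromycin (treated group) stands in for vapCO.
treated <- subset(Puromycin, state == "treated")
treated <- treated[order(treated$conc), ]  # order by the predictor scale
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)

res <- as.numeric(residuals(fit))
lagged <- c(res[-1], NA)  # drop the first residual, pad with NA to keep lengths equal

plot(res, lagged, xlab = "Residuals", ylab = "Lagged residuals")
abline(h = 0, v = 0, lty = 2)
```

A clear linear trend in this plot would suggest correlated errors.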
However, there is quite some scatter in the plot, and therefore we cannot be very certain about this trend.
Exercises 5. Fit the Clapeyron model to the dataset vapCO. Carry out model checking.
In this chapter, we consider several approaches for dealing with model violations related to the measurement errors. Among these, a transformation approach may often be able to remedy both non-normal error distributions and variance heterogeneity.
One way of taking into account variance heterogeneity is by explicitly modelling it; that is, formulating a regression model for the variance similar to what we have done for the mean in Equation 1. Now we will consider situations where the variance changes as the predictor values change. A small mean value implies a small variance, whereas a large mean value implies a large variance (it can also be the other way around).
Figure: Relative growth rate as a function of time in days.
Consider the plot of the relative growth rate. There is a clear tendency in how the variation changes with the mean. One way to model this dependence of the variance on the mean is the power-of-the-mean model.
If the response values are counts that are not too small, then they may be treated as if they are normally distributed with a variance structure mimicking that of the Poisson distribution. In R, the function gnls in the package nlme allows generalised least squares estimation of the power-of-the-mean model.
The details behind the estimation procedure implemented in gnls are found in Pinheiro and Bates. The summary output of the fitted model reports the variance function (power of variance covariate) with its formula and parameter estimates, a coefficient table with the columns Value, Std.Error, t-value, and p-value, and quantiles of the standardised residuals (Min, Q1, Med, and so on).
Below we provide an explanation for each component. The name of the data frame used is listed. The correlation between the parameter estimates is also shown; in our example, it is a single number, as we consider a mean function with only two parameters. The estimates of the parameters in the mean function will be mutually correlated to a lesser or greater extent. The self-starter functions that are available for nls also work for gnls.
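A fit of the power-of-the-mean model with gnls and varPower might look as follows. This is a sketch only: the data are simulated (the dataset RGRcurve is in the package nlrwr), and the two-parameter exponential mean function, the starting values, and all names are assumptions made for illustration.

```r
library(nlme)

# Simulated growth data whose standard deviation is proportional to the mean
# (assumption: this mimics the variance pattern described in the text).
set.seed(1)
day <- rep(0:8, each = 5)
mu <- 4 * exp(day / 4)
sim <- data.frame(day = day, y = mu * (1 + rnorm(length(day), sd = 0.1)))

# varPower() with its default form models the variance as a power of the fitted mean
fit <- gnls(y ~ a * exp(day / b), data = sim,
            start = list(a = 3, b = 5), weights = varPower())

summary(fit)

# Estimated power parameter of the variance function
power.hat <- coef(fit$modelStruct$varStruct, unconstrained = FALSE)
print(power.hat)
```

Because the simulated standard deviation is proportional to the mean, an estimated power in the vicinity of 1 would be expected, although a single simulated dataset need not reproduce it closely.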
Other examples using gnls in conjunction with varPower are found in the help page of gnls (?gnls). The most important variance models for use with gnls are listed in Table 6.
Transformation is another remedy, for example for right-skewed response values. This approach for nonlinear regression was introduced by Carroll and Ruppert and elaborated in Carroll and Ruppert (Chapter 4).
It has also been discussed by Streibig et al. For modelling the stock-recruitment relationship between sockeye salmon in the Skeena River, Carroll and Ruppert consider the Ricker model.
The data are provided in the data frame sockeye in the package nlrwr. The data frame sockeye contains the two variables recruits and spawners (numbers in thousands). In the plot of the data, we see some increase in recruitment as stock values get larger, but the variation is clearly increasing as the stock is increasing.
Figure: Plot of recruitment as a function of the stock using the dataset sockeye.
Initially we consider a standard nonlinear regression model, assuming variance homogeneity and normality of errors, that is a model of the general form given in Equation 1.
So the presence of variance heterogeneity is the only severe deviation from the model assumptions in Section 5. We can cope with this model violation by using a variance model as seen in the previous section or a transformation.
Following Carroll and Ruppert, we use a transformation. In linear regression, it is common to transform the response only (that would be the variable recruits in our example) but otherwise not change the model (Carroll and Ruppert). The same approach can be applied to nonlinear regression models, but it will distort the original relationship between the response and the predictor dictated by the chosen mean function.
We insist on preserving this relationship because there is almost always a good reason why a particular mean function is used. Therefore, it is necessary to apply the transformation to both sides of the model equation. This transform-both-sides model ensures that the original relationship between response and predictor is retained, as both the response (left-hand side) and the mean function (right-hand side) are subject to the same transformation. In this way, estimation of a transformation is simply a matter of estimating a single parameter. The resulting transformation may not always be successful in establishing normally distributed responses with constant variance.
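To make the idea of estimating a single transformation parameter concrete, the following sketch profiles the Box-Cox parameter over a grid for a transform-both-sides Ricker model using only base R. The data are simulated, and the helper names bc and profile.loglik, the grid, and the starting values are all assumptions of this illustration, not the book's implementation.

```r
# Transform-both-sides with a Box-Cox transformation, profiled over lambda.
# Assumption: simulated stock-recruitment data with multiplicative errors.
set.seed(42)
n <- 40
spawners <- runif(n, 100, 1000)
recruits <- 3.5 * spawners * exp(-0.0008 * spawners) * exp(rnorm(n, sd = 0.2))
sim <- data.frame(spawners = spawners, recruits = recruits)

# Box-Cox transformation (logarithm when lambda is 0)
bc <- function(y, lambda) if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda

# Profile log-likelihood for one lambda: same transformation on both sides
profile.loglik <- function(lambda) {
  fit <- tryCatch(
    nls(bc(recruits, lambda) ~ bc(b1 * spawners * exp(-b2 * spawners), lambda),
        data = sim, start = list(b1 = 3, b2 = 0.001),
        control = nls.control(maxiter = 200)),
    error = function(e) NULL)
  if (is.null(fit)) return(NA_real_)  # skip grid points where the fit fails
  -n / 2 * log(deviance(fit) / n) + (lambda - 1) * sum(log(sim$recruits))
}

lambdas <- seq(-0.5, 1.5, by = 0.25)
ll <- sapply(lambdas, profile.loglik)
lambda.hat <- lambdas[which.max(ll)]
print(data.frame(lambda = lambdas, loglik = ll))
print(lambda.hat)
```

The second term of the profile log-likelihood is the Jacobian of the transformation, which is what makes values for different lambda comparable.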
Finding the optimal transformation within the Box-Cox family for the Ricker model is conveniently done using boxcox. In order to compare the standard errors between the models with and without transformation, the summary output of both model fits can be inspected.
Suppose that variance heterogeneity is present but not modelled. In this case, the estimated standard errors from the model assuming normality and variance homogeneity will be inconsistent: they will not approach the true standard errors as the sample size increases (Carroll and Ruppert).
Rewinding the derivation leading to Equation 2. gives an expression for the variance-covariance matrix that involves quantities derived from the likelihood function, among them a matrix denoted B. This expression is valid as long as the mean structure is correct and independence can be assumed. The robust variance-covariance matrix is often called a sandwich estimator due to the particular product form on the right-hand side of Equation 6.
More detailed explanations can be found in White and in Carroll and Ruppert. In passing, we note that the vcov method can be useful for assessing how strongly correlated the parameter estimates are.
High correlation between some of the parameters may indicate that possibly a model with fewer parameters should be considered. In order to calculate the sandwich estimator, the function sandwich in the package sandwich (Zeileis) can be applied. The next step is to look at the summary output to see the actual changes in the estimated standard errors. To view only the part of the summary output containing parameter estimates and corresponding standard errors, t-tests, and p-values, we can extract the coefficient table from the summary output.
Alternatively, we could use the function coeftest in the package lmtest (Zeileis and Hothorn). By default it shows the parameter estimates with naive standard errors, using the elements in vcov of the fitted model.
To obtain the estimated standard errors based on the sandwich estimator we can use coeftest again, but this time also specifying the argument vcov with the value sandwich.
If the complete original dataset is available, then there are often better ways to take the variance structure into account (Carroll and Ruppert).
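Returning to the sandwich estimator, its product form can be computed by hand in base R. The sketch below is an assumption-laden illustration rather than the packaged implementation: it uses the built-in Puromycin data (treated group) in place of sockeye and the simplest version of the estimator, without any small-sample correction.

```r
# Hand-computed sandwich estimator for an nls fit.
# Assumption: Puromycin (treated group) replaces the sockeye data.
treated <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ Vm * conc / (K + conc), data = treated,
           start = list(Vm = 200, K = 0.1))

X <- fit$m$gradient()          # derivatives of the mean function w.r.t. the parameters
e <- as.numeric(residuals(fit))

bread <- solve(crossprod(X))   # outer factor, used on both sides
meat <- crossprod(X * e)       # inner factor: sum of e_i^2 * x_i x_i'
V.robust <- bread %*% meat %*% bread
dimnames(V.robust) <- dimnames(vcov(fit))

# Naive and robust standard errors side by side
se <- cbind(naive = sqrt(diag(vcov(fit))), robust = sqrt(diag(V.robust)))
print(se)
```

If the two columns differ markedly, the assumption of variance homogeneity behind the naive standard errors is questionable.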
It happens that the complete dataset is not available to the data analyst because response values have been averaged at some level. Typically, the average response was calculated for each unique predictor value. Ideally, the corresponding empirical standard deviations and the number of replicates for each unique predictor value should also be available because these values can be used to construct weights to be used in the estimation procedure.
Denote the weights w1, ..., wn. We consider a nonlinear regression model in which the variance of each observation depends on its weight. Through multiplication with a factor determined by wi, the model is converted into a model with variance homogeneity. We need not do any multiplication manually, as the weights can be supplied when fitting the model. If we assume that there is variance homogeneity of the original responses (the available responses being averages of these), then the numbers of replicates can be used as weights. If we cannot assume variance homogeneity, we also need to consider the standard deviations in addition to the number of replicates.
The resulting datasets are available as exp1 and exp2 in the package nlrwr. For exp1, the standard deviations in the third column are not varying that much, being of the same order of magnitude, indicating that the assumption of variance homogeneity may be acceptable.
A self-starter function is available for the biexponential model: SSbiexp (see Table B.).
Figure: Decline in nitrogen content over time for experiment 1.
Compared with the unweighted fit, there are only slight changes in the estimated standard errors. This is not surprising, as the weights supplied are of roughly the same magnitude. The dataset exp2 is similar to exp1 with respect to the variables included.
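The two weighting schemes can be sketched with the weights argument of nls. The data below are simulated and then averaged, a single exponential decay replaces the biexponential model to keep the fit simple, and the column names ybar, stdev, and norep are hypothetical. The choice of the number of replicates divided by the squared standard deviation as weights is an assumption of this sketch, motivated by the variance of an average being the variance of a single response divided by the number of replicates.

```r
# Weighted nls fits for averaged response values.
# Assumption: simulated raw data, averaged per unique predictor value.
set.seed(7)
day <- rep(seq(0, 14, by = 2), each = 4)
raw <- 100 * exp(-0.3 * day) * (1 + rnorm(length(day), sd = 0.05))

avg <- data.frame(
  day = sort(unique(day)),
  ybar = as.numeric(tapply(raw, day, mean)),    # average response
  stdev = as.numeric(tapply(raw, day, sd)),     # empirical standard deviation
  norep = as.numeric(tapply(raw, day, length))  # number of replicates
)

# Variance homogeneity of the original responses: weight by number of replicates
fit.n <- nls(ybar ~ a * exp(-k * day), data = avg,
             start = list(a = 90, k = 0.25), weights = norep)

# Variance heterogeneity: also use the standard deviations
fit.w <- nls(ybar ~ a * exp(-k * day), data = avg,
             start = list(a = 90, k = 0.25), weights = norep / stdev^2)

print(rbind(replicates = coef(fit.n), inverse.variance = coef(fit.w)))
```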
However, the standard deviations vary substantially (several orders of magnitude).
Exercises 6.
Consider the dataset RGRcurve. Fit the two-parameter exponential model, but in such a way that the variance is assumed to be an exponential function of the mean.
Consider the dataset sockeye. Fit the Ricker model, but this time with a variance model included.
Fit the exponential model (Equation 3.). Check the model assumptions. Are there any problems? For this kind of data, it is common to apply a transform-both-sides approach using the logarithm transformation (OECD, Organisation for Economic Cooperation and Development), so in this case there is no need to look for the best transformation. Fit the logarithm-transformed model, and check the model assumptions.
This linear approximation may or may not be used in the estimation algorithm. The quality of the linear approximation can be summarised by means of two features of the model referred to as intrinsic curvature and parameter effects curvature. More details can be found in Bates and Watts (Chapter 7). Intrinsic curvature is related to the planar assumption, and it depends on the dataset considered and the mean function but not on the parameterisation used in the mean function.
Large values of these two curvature measures indicate a poor linear approximation. In contrast, the linear approximation seems quite acceptable for the parameter Vm, as there is only very slight curvature for this parameter component. We will use the function nlsBoot in the package nlstools and the associated plot and summary methods. This means that we use a nonparametric bootstrap approach where the mean centered residuals are bootstrapped (Venables and Ripley, Chapter 8).
By default, nlsBoot generates a number of bootstrap datasets, and for each dataset the original nonlinear regression model is refitted. This seems not to be the case here, comparing the values above to the original parameter estimates. Another example using the bootstrap approach is found in Venables and Ripley (Section 8.). Moreover, the linear approximation will improve as the sample size increases, and therefore the bootstrap approach may be most useful for small datasets.
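The residual bootstrap described above can be written out by hand in base R. The sketch below is not the nlsBoot implementation: the number of bootstrap datasets (200), the use of percentile intervals, the Puromycin data (treated group), and all object names are assumptions made for illustration.

```r
# Nonparametric bootstrap of mean-centred residuals for an nls fit.
set.seed(123)
treated <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ Vm * conc / (K + conc), data = treated,
           start = list(Vm = 200, K = 0.1))

centred <- as.numeric(residuals(fit)) - mean(residuals(fit))
fitted.values <- as.numeric(fitted(fit))
B <- 200  # number of bootstrap datasets (assumed value)

boot.est <- t(replicate(B, {
  d <- treated
  # New responses: fitted values plus resampled centred residuals
  d$rate <- fitted.values + sample(centred, replace = TRUE)
  refit <- tryCatch(
    nls(rate ~ Vm * conc / (K + conc), data = d, start = as.list(coef(fit))),
    error = function(e) NULL)
  if (is.null(refit)) c(Vm = NA, K = NA) else coef(refit)
}))

# Percentile confidence intervals from the bootstrap estimates
boot.ci <- apply(boot.est, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
print(boot.ci)
```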
The result referred to by Bailer and Piegorsch is extremely powerful, as it makes it possible to establish approximate normality for various derived parameters that are functions of the original parameters in the model.
Applying this result is usually referred to as using the delta method. Of course, if the original parameter estimates are not approximately normally distributed, then the delta method will not work well. In such cases, it may be better to use a bootstrap approach, generating a bootstrap estimate of the derived parameter of interest for each bootstrap estimate of the original parameters and then proceeding along the lines of Section 7.
Next, we will show how to use the delta method in R. As a measure of the rate of change of the underlying enzymatic process, it can be useful to consider the slope of the Michaelis-Menten curve at the concentration K. The larger the slope, the faster the enzymatic reaction goes. The slope of the curve at the concentration K is a function of the two parameters K and Vm: differentiating Vm x/(K + x) with respect to the concentration x and inserting x = K gives Vm/(4K). To obtain a standard error for this derived parameter, we use the delta method.
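As a hand-computed sketch of the delta method for this slope, the gradient of Vm/(4K) with respect to the two parameters can be combined with the estimated variance-covariance matrix. The Puromycin data (treated group) and the object names are assumptions of this illustration.

```r
# Delta method by hand for the derived parameter Vm / (4 * K).
treated <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ Vm * conc / (K + conc), data = treated,
           start = list(Vm = 200, K = 0.1))
est <- coef(fit)

# Estimated slope of the Michaelis-Menten curve at the concentration K
slope <- est[["Vm"]] / (4 * est[["K"]])

# Partial derivatives of Vm / (4 * K) with respect to Vm and K
grad <- c(Vm = 1 / (4 * est[["K"]]),
          K = -est[["Vm"]] / (4 * est[["K"]]^2))

# Approximate standard error: sqrt(grad' V grad)
V <- vcov(fit)[names(grad), names(grad)]
se.slope <- sqrt(as.numeric(t(grad) %*% V %*% grad))

print(c(estimate = slope, std.error = se.slope))
```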
The function to use in R is delta. We need to supply two arguments to this function.
Once a model has been fitted, the next step would often be to try to simplify the model in order to obtain the most parsimonious description of the data. In the following two subsections, we will discuss two ways of testing the null hypothesis.
The data in secalonic stem from a large experiment exploring the toxicity of secalonic acid to roots of various grasses (Gong et al.).
The response values are means of an unknown number of replicates. The dose-response relationship is clearly sigmoidal, and therefore the four-parameter logistic model is a natural choice (Streibig et al.).
Figure: Plot of the root length as a function of the dose of secalonic acid based on the dataset secalonic.
Using the summary method, we can instantaneously get t-tests based on Equation 7. Typically only one or a few — if any — of these hypotheses are relevant to consider. The logistic model will predict root lengths close to a for doses almost 0, and the predicted root lengths will approach b as the dose becomes very large.
It would be natural to expect that the lower limit will be close to 0, as large doses may prohibit any growth, and therefore it is relevant to consider the hypothesis H0: b = 0. Judging from the value of the corresponding t-test statistic and its p-value, based on the t-distribution with 3 degrees of freedom, we cannot reject the hypothesis H0. The result is not surprising in light of the plot of the data. The resulting three-parameter model is also available in a self-starter version for nls, namely SSlogis (we will use it in the next subsection).
A related approach for testing the hypothesis H0 is to compare two models, Model A and Model B, by means of an F-test. Model B should be a submodel of Model A; that is, obtained from Model A by imposing some constraint on the parameters.
The choice of submodel will be determined by the null hypothesis that is of interest. The F-test statistic is related to the statistic introduced in Subsection 5. The test is sometimes referred to as the extra-sum-of-squares F-test (Motulsky and Christopoulos). Here Model B is the three-parameter logistic model. The reason for considering this submodel is that we want to test the hypothesis that the lower limit could be 0, just as in the previous subsection. As already mentioned, the relevant self-starter function is SSlogis.
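The comparison of Model A and Model B can be sketched with the anova method for nls fits. Since secalonic is in the package drc, this illustration assumes the built-in ChickWeight data (chick 1) instead, with the self-starters SSfpl (four-parameter logistic) and SSlogis (three-parameter logistic, one asymptote fixed at 0); in SSfpl the parameter named A is the asymptote that SSlogis fixes at 0.

```r
# Extra-sum-of-squares F-test for two nested logistic models.
# Assumption: ChickWeight (chick 1) stands in for the secalonic data.
Chick.1 <- ChickWeight[ChickWeight$Chick == 1, ]

modelA <- nls(weight ~ SSfpl(Time, A, B, xmid, scal), data = Chick.1)
modelB <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = Chick.1)

# F-test of the submodel (Model B) against the more general Model A
ftest <- anova(modelB, modelA)
print(ftest)

# The corresponding t-test for the asymptote A from the summary of Model A
ttest <- coef(summary(modelA))["A", ]
print(ttest)
```

The t-test and the F-test address the same hypothesis, so their p-values are usually of similar size, although they need not coincide for nonlinear models.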
The F-test yields the same conclusion as the t-test.
Often several candidate models are available. These models need not be submodels of each other, and therefore it may not be possible to use the F-test procedure introduced in the previous section to compare these models.
How then do we decide which model is the most appropriate? Ideally, the experimenter collecting or generating the data should decide which model to use based on subject matter. If no such suggestions are available, then it may be useful to use some kind of statistic to compare the available models.
The decision rule is simple: One model is better than another model if it has the smallest value of the statistic. Based on all pairwise comparisons using this rule, a ranking of the candidate models can be established.
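The decision rule can be illustrated with the AIC, which is available for nls fits through the function AIC. The sketch assumes the built-in Puromycin data (treated group) and two non-nested two-parameter mean functions chosen only for illustration; the object names are hypothetical.

```r
# Ranking two non-nested models by AIC (smaller is better).
treated <- subset(Puromycin, state == "treated")

mm <- nls(rate ~ Vm * conc / (K + conc), data = treated,
          start = list(Vm = 200, K = 0.1))
asym <- nls(rate ~ Asym * (1 - exp(-exp(lrc) * conc)), data = treated,
            start = list(Asym = 200, lrc = log(10)))

aic.table <- AIC(mm, asym)  # the df column counts the variance parameter as well
print(aic.table[order(aic.table$AIC), ])

best <- rownames(aic.table)[which.min(aic.table$AIC)]
print(best)
```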
We consider the dataset M. (Cadima). The two three-parameter models, the Deriso and Shepherd models, end up having the largest AIC values, being penalised for using one parameter more than the two other models. Therefore we would expect the models to provide very similar descriptions of the dataset M.
Exercises 7.
The null hypothesis H0 concerns the model from Section 4. Is there evidence in the data in favour of rejection of this hypothesis? Test the hypothesis using both a t-test and an F-test. What is the conclusion?
The data frame ScotsPine contains leaf area indices for various ages (in years) collected for a number of Scots pine trees (Pinus sylvestris L.). The leaf area index is the ratio between the single-sided leaf surface area and the ground surface area. Is there anything suspicious?
One reason for considering grouped data is that data pertaining to one experiment should be analysed in one model, as this allows comparisons among groups at various levels.
We start out by considering models assuming variance homogeneity, but now also across groups, which means that the same constant variance applies to all observations regardless of their group membership. Variance homogeneity is not always a reasonable assumption, as there may be more than one source of variation in play in grouped data structures.
We consider this situation in Section 8. The example used throughout this section will be the dataset Puromycin, which consists of measurements of enzymatic reaction rates for a range of substrate concentrations. The response, predictor, and grouping factor are named rate, conc, and state, respectively.
We use the function xyplot in the package lattice to show the data.
Figure: Conditional scatter plot of reaction rate versus substrate concentration for the two groups in the dataset Puromycin.
Following Bates and Watts, we consider the Michaelis-Menten model for these data. Below we denote the grouped data object resulting from using groupedData by Puromycin2. Hypotheses of interest can be formulated in terms of the parameters. Let us continue using the dataset Puromycin.
To be able to examine these questions, we need to use parameter models. Nonlinear regression models for grouped data give rise to model formulations for each parameter in the mean function. The two sets of parameter models listed above are the extremes where either all or none of the parameters in the mean function are shared across groups. Similarly, we can test the null hypothesis H0 that the parameter K is common to the two groups by fitting a model with a common K. The summary output of this model provides the common estimate of K.
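The parameter models above can be expressed in nls by indexing parameters with the grouping factor. The following sketch fits the model with group-specific Vm and K and the submodel with a common K to the built-in Puromycin data and compares them with an F-test; the starting values and object names are assumptions.

```r
# Group-specific parameters via indexing by the factor state.
separate <- nls(rate ~ Vm[state] * conc / (K[state] + conc), data = Puromycin,
                start = list(Vm = c(200, 160), K = c(0.05, 0.05)))

# Submodel: K shared across the two groups, Vm still group-specific
commonK <- nls(rate ~ Vm[state] * conc / (K + conc), data = Puromycin,
               start = list(Vm = c(200, 160), K = 0.05))

# F-test of the hypothesis that K is the same in both groups
ktest <- anova(commonK, separate)
print(ktest)

print(coef(summary(commonK)))  # includes the common estimate of K
```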
We illustrate this situation through an example. The dataset G. contains dose-response data for two formulations. The test plant used was cleavers (Galium aparine). The same ten nonzero doses were used for both formulations, with ten replicates per dose. In addition, 20 replicates of the untreated control (dose equal to 0) were taken. The upper limit is mainly determined by the common control measurements.
Figure: Conditional scatter plot of the dose-response curves in the dataset G.
The remaining parameters b, c, and e can easily vary from formulation to formulation.
The control measurements have their own value of the variable treatment (equal to 0), which was utilised in creating the plot. This is useful for indicating that the control measurements form a separate group, which should not be used repeatedly as control measurements for each of the remaining groups, as that approach would artificially augment the dataset. For fitting the model, however, it is convenient to assign the control measurements to one of the remaining groups.
We return to the dataset Puromycin to illustrate the concepts. In the following construction of the data frame, the vector of group levels will be replicated appropriately; that is, repeated for each element in the vector concValues.
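A data frame of this kind can be built with expand.grid and passed to predict. The sketch below assumes the grouped Michaelis-Menten fit with group-specific parameters; the concentrations in concValues and the other object names are hypothetical.

```r
# Predictions for every combination of concentration and group.
fit <- nls(rate ~ Vm[state] * conc / (K[state] + conc), data = Puromycin,
           start = list(Vm = c(200, 160), K = c(0.05, 0.05)))

concValues <- c(0.05, 0.1, 0.5, 1)      # assumed concentrations of interest
stateLevels <- levels(Puromycin$state)

# Each group level is paired with each element of concValues
newdat <- expand.grid(conc = concValues, state = stateLevels)
# Keep the same level order as in the data used for fitting
newdat$state <- factor(newdat$state, levels = stateLevels)

newdat$pred <- predict(fit, newdata = newdat)
print(newdat)
```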
It is intended to be an appetizer only.
Pinheiro and Bates provide a comprehensive introduction to all the facets of nonlinear mixed model analysis in R. The example comes from Nellemann et al. We will only consider the data obtained from the in vitro assessment of vinclozolin. Several assays were carried out; however, in one assay, only eight concentrations were used.
Thus we expect that there are two sources of variation in the data: variation within assays and variation between assays. The data are available in R in the package drc as the data frame vinclozolin. The variables in the data frame are the predictor variable conc, the response variable effect, and the grouping variable exper. The concentration-response pattern in the data is apparent from the plot.
Figure: Conditional scatter plot of the dose-response curves in the dataset vinclozolin.
Following Nellemann et al., we first fit a model to each assay separately.
There are several reasons for taking this route. In general, it may be a good idea to build a model for grouped data on the basis of models for the individual groups. We anticipate interassay variation, but the question is in which parameters this variation manifests itself.
It can happen that there is a lot of variation. The previous sections and chapters have considered nonlinear regression models assuming one single source of randomness: the measurement error. For the vinclozolin data there is also variation between assays.
We can think of this variation as being derived by minute unobservable changes in the environment of the assay that occur at random. This means that a nonlinear mixed model is a nonlinear regression model that has been extended by allowing some or all of the parameters to vary randomly across groups in order to account for the variation between groups.
In the case of the dataset vinclozolin, the systematic concentration-response trend is governed largely by the fungicide concentration the predictor variable. As we are considering grouped data, it is natural to formulate parameter models similar to what was done in Section 8.
Therefore such a model is often called a hierarchical model.
The level 0 corresponds to the average trend or population trend, which is what we are interested in here. Apart from the predict and summary methods already used, there is a plot method for displaying the residual plot.
The residuals can be extracted directly using the residuals method. We refer to Pinheiro and Bates (Appendix B) for a complete overview of the available functions and methods. In practice, there often occurs both a grouping variable, which we need to use to account for between-group variation, and a treatment variable (or even several such variables), which is of primary interest.
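The methods mentioned above can be tried on a small nonlinear mixed model. Because vinclozolin is in the package drc, this sketch assumes the built-in Loblolly data and the asymptotic regression self-starter SSasymp, with a random effect on the asymptote only; the starting values follow the example in the nlme help page, and the remaining names are hypothetical.

```r
library(nlme)

# Nonlinear mixed model: the asymptote varies randomly between seed sources.
# Assumption: Loblolly replaces the vinclozolin data.
fit <- nlme(height ~ SSasymp(age, Asym, R0, lrc), data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1, random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

summary(fit)
print(fixef(fit))  # fixed effects: the population-level parameters

# Level 0 predictions give the average (population) trend
newdat <- data.frame(age = c(5, 15, 25), Seed = Loblolly$Seed[1])
pop <- predict(fit, newdata = newdat, level = 0)
print(pop)

res <- residuals(fit)  # residuals extracted directly
plot(fit)              # residual plot
```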
Exercises 8.
Fit an appropriate grouped data model to the dataset S.
The asymptotic regression model can be used to describe growth. Nielsen et al.
Appendix A: Datasets and Models
Table A. Models considered in this book.
Datasets used in this book. Self-starter Functions Table B. Available self-starter functions for nls. Model No. Self-starter function param. Table C. Packages used in this book. Packages and Functions Table C. Main functions used in this book. From quantal counts to mechanisms and systems: The past, present and future of biometrics in environmental toxicology.
Biometrics 56, — Bates, D. Statistical Models in S, chapter 10 Nonlinear Models. Chapman and Hall, Boca Raton, Fl. Nonlinear Regression Analysis and Its Applications. John Wiley and Sons, New York. Beckon, W. A general approach to modeling biphasic relationships. Box, G. An analysis of transformations. B 26, — Brain, P. An equation to describe dose responses where there is stimulation of growth at low dose. Weed Res. Brazzale, A. An R package bundle for higher order likelihood inference.
R News 5, 20— Burnham, K. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, New York, second edition. Cabanne, F. Cadima, E. Fish Stock Assessment Manual. Carroll, R. Transformation and Weighting in Regression. Chapman and Hall, New York. Cedergreen, N. New Phytol.
Improved empirical models describing hormesis. Christensen, M. Pest Management Science 59, — Dalgaard, P. Introductory Statistics with R. Springer-Verlag, New York. Ducharme, G. Biometrics 60, — Environment Canada Environment Canada, Ottawa. Fox, J. Nonlinear regression and nonlinear least squares. Gong, X. Zongkai University of Agriculture and Technology, Guangzhou.
Hamilton, D. Combining non-linear regressions that have unequal error variances and some parameters in common. Huet, S. Statistical Tools for Nonlinear Regression: Springer-Verlag, New York, second edition. Inderjit, Streibig, J. Physiologia Plantarum , — Jensen, J. Fitness of herbicide-resistant weed biotypes described by competition models.
European Weed Research Society. Kafadar, K. An application of nonlinear regression in research and development: A case study from the electronics industry.
Technometrics 36, — Laberge, G. Stabilization and plant uptake of N from 15 N-labelled pea residue Soil Biol. References McCullagh, P. Generalized Linear Models. Chapman and Hall, Boca Raton, Fl, second edition.
Motulsky, H. A Practical Guide to Curve Fitting. Oxford University Press, Oxford. Murrell, P. R Graphics. Nelder, J. Generalized linear models for enzyme-kinetic data. Biometrics 47, — Nellemann, C. Nielsen, O. Nonlinear mixed-model regression to analyze herbicide dose-response relationships. Weed Technol. Ultrasonic Reference Study Block D. Technical report.
A Guidance to Application. A Guidance to Application: Annexes. Pedersen, B.
Chapters: Getting Started. Starting Values and Self-starters. More on nls. Model Diagnostics. Remedies for Model Violations. Uncertainty, Hypothesis Testing, and Model Selection. Grouped Data.
About this book: R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences.
Keywords: fitting, regression analysis, grouped data, linear regression, regression diagnostics, self-starter functions, transform-both-sides approach.