Least Absolute Deviation Regression (LAD) and Quantile Regression in Blossom Statistics

LAD regression differs from least squares (OLS) regression in that the sum of the absolute, not squared, deviations of the fit from the observed values is minimized to obtain estimates. LAD regression estimates the conditional median (0.5 regression quantile) of the dependent variable (y) given independent variables (X), and its generalization, regression quantiles, estimates the conditional quantile (τ, where 0 ≤ τ ≤ 1) of y given X. Since LAD does not use squared distances, it is an obvious companion to MRPP, which emphasizes Euclidean distances. Both LAD and MRPP satisfy the congruence principle (Mielke and Berry 2001). Asymptotic distributional theory for testing procedures for LAD regression is found in Dodge (1987), and a concise, readable implementation is provided by Birkes and Dodge (1993). Cade and Noon (2003) is a primer on quantile regression for ecologists.

The LAD command is used to compute a fit of one dependent response variable by one or more independent predictor variables. The parameters in a LAD regression are tested by using a test statistic that compares the proportionate reduction in sums of absolute deviations when passing from a reduced to full parameter model (i.e., a test statistic very similar to general F-tests in OLS regression). The drop in dispersion test statistic, Tobs, equals (sum of absolute deviations for reduced model - sum of absolute deviations for full model) / sum of absolute deviations for full model (Cade and Richards 1996, Cade 2003, 2005). Large values of Tobs are evidence against the null hypothesis that the parameter(s) equal(s) zero. If all slope parameters are tested simultaneously against a reduced parameter model that includes only the intercept, then the reference permutation distribution for the test statistic Tobs is obtained by randomly sampling the n! permutations of the dependent variable to the matrix of independent variables as described by Manly (1991) and calculating T for each permutation. However, if only a subset of parameters is being tested (partial model tests), then the reference permutation distribution for the test statistic Tobs is obtained by randomly sampling the n! permutations of residuals from the reduced model to the matrix of independent variables and calculating T for each permutation, following Freedman and Lane (1983). Probabilities under the null hypothesis are given by (number of T ≥ Tobs + 1) / number of permutations sampled. Extensive power simulations demonstrated that these procedures maintained nominal error rates under the null hypothesis well across a variety of error distributions and design configurations (correlated and uncorrelated independent variables) provided the error distributions are independent and identically distributed (Cade and Richards 1996). Similar conclusions were reached for the same form of the test statistic used with OLS regression (Kennedy and Cade 1996, Anderson and Legendre 1999). The LAD permutation test is extended to any selected regression quantile (LAD is just the 0.5 regression quantile) by replacing sums of absolute deviations in the test statistic computation with the appropriate sums of weighted absolute deviations used in regression quantile estimation (Cade and Richards 2006).
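The full-model test described above (all slopes tested against an intercept-only reduced model by permuting y) can be sketched in a few lines of Python. This is an illustrative sketch only, not Blossom's algorithm: the function names are hypothetical, the LAD fit is a brute-force search restricted to one predictor plus an intercept, and the observed arrangement is assumed to count as one of the sampled permutations.

```python
import itertools
import random
import statistics


def sad(residuals):
    """Sum of absolute deviations."""
    return sum(abs(r) for r in residuals)


def lad_line_sad(x, y):
    """Minimum sum of absolute deviations for y = b0 + b1*x.

    Brute force: some optimal LAD line passes through two data points,
    so it is enough to try every pair of points with distinct x values.
    """
    best = float("inf")
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        best = min(best, sad(yk - (b0 + b1 * xk) for xk, yk in zip(x, y)))
    return best


def median_sad(y):
    """Sum of absolute deviations for the intercept-only (median) model."""
    m = statistics.median(y)
    return sad(yi - m for yi in y)


def drop_in_dispersion(sad_reduced, sad_full):
    """T = (SAD reduced - SAD full) / SAD full."""
    return (sad_reduced - sad_full) / sad_full


def permutation_p_value(x, y, n_perm=200, seed=1):
    """Return (Tobs, P) for the test that the slope equals zero."""
    rng = random.Random(seed)
    sad_reduced = median_sad(y)  # unchanged when y is permuted
    t_obs = drop_in_dispersion(sad_reduced, lad_line_sad(x, y))
    count = 1  # assumption: the observed arrangement is one permutation
    y_perm = list(y)
    for _ in range(n_perm - 1):
        rng.shuffle(y_perm)
        t = drop_in_dispersion(sad_reduced, lad_line_sad(x, y_perm))
        if t >= t_obs:
            count += 1
    return t_obs, count / n_perm
```

For partial model tests the same loop would permute residuals from the reduced model rather than y itself; that variant is not shown here.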

Cade (2003, 2005) and Cade and Richards (2006) found that Type I error rates were improved when testing subsets of parameters in quantile regression models by deleting all but a single zero residual associated with the fit to p - q parameters under the null hypothesis, where p is the number of parameters in the full model and q is the number of parameters being tested. Dropping the excessive zero residuals eliminates a mass of zeros in the distribution associated with the estimation process rather than sampling variation. As this reduces the length of the residual vector so that it no longer conforms to the n × p matrix X of predictors, the corresponding number of rows of X are randomly deleted at each permutation. This deletion of zero residuals and random deletion of rows of X are done by default for this drop in dispersion permutation test. In addition, Cade (2003, 2005) and Cade and Richards (2006) found that anytime the null, reduced (p - q) parameter model was constrained through the origin (no intercept), Type I error rates were improved by randomly recentering the residual vector since the residuals from the null model will no longer have zero associated with the specified quantile (or mean zero for OLS). This is implemented as a double permutation procedure where the first step at each iteration is to randomly recenter the selected quantile of the residual vector by a quantity generated as a random binomial for the specified quantile (e.g., 0.90). A similar operation is done for OLS regression where the quantile = 0.50 is always used to generate random binomials. The second step at each iteration then (the doubling of permutations) permutes these randomly recentered residuals to the matrix X. Because it is not always obvious when a model is constrained through the origin (e.g., some weighted model tests will require this and some won't), we elected to make the double permutation scheme selected by an option of the hypothesis testing command (HYP/DP).

If error distributions are not identical (heteroscedastic) then they must be transformed or weighted to be made approximately identical (homogeneous) (Cade and Richards 1996, Cade 2003, 2005, Cade and Richards 2006). Cade and Noon (2003) and Cade et al. (2005) discuss two weighting schemes, one where all quantiles have the same weights in a location/scale form of heterogeneity, and one where the weights must be estimated separately for the selected quantiles in more general models of heterogeneity. When the weights are based on a function of the independent variables (X), many of the permutation hypothesis tests will implicitly constrain the null model through the origin and the double permutation procedure will be required to maintain correct Type I error rates (Cade 2005, Cade et al. 2005, Cade and Richards 2006).

As an alternative test for LAD and its generalization to regression quantiles, we provide a quantile rank score statistic that is less sensitive to heterogeneous error distributions (Koenker 1994, Cade et al. 1999, Koenker and Machado 1994). The permutation version of the quantile rank score test (Cade 2003, Cade et al. 2006) maintains Type I error rates better than the asymptotic Chi-square distributional approximation (Koenker 1994) at smaller n and more extreme quantiles. It is important to note that the rank score test is not immune to the effects of heterogeneity and maintaining correct Type I error rates with this test often requires weighted estimates and test statistics just as the drop in dispersion test does (Cade 2003, Cade et al. 2005, Cade et al. 2006).

We will demonstrate the procedures with an example from Cade (1997), where lodgepole pine canopy cover was modeled as a function of basal area and density of the trees. Use the data file FRASERF.DAT. Issue the following command for the simple regression of canopy cover (LCC) as a linear function of basal area (APICO):

        >LAD LCC = CONSTANT + APICO /TEST

The model to be computed is written out algebraically where the dependent variable is LCC (lodgepole pine canopy cover) and the single independent variable is APICO (basal area of lodgepole pine adjusted for slope of terrain). The term "CONSTANT" indicates that LAD will estimate an intercept. If "CONSTANT" is left out the fit is forced through the origin. The TEST option indicates that the model is to be compared to a reduced model that is a straight line parallel to the X axis going through the median y value (LCC). Thus, the reduced model has just one parameter, the constant. In this test Blossom uses a default sample size of 5,000 permutations (including observed value) to approximate the permutation distribution.

Here are the results of the above LAD command:

                  Least Absolute Deviation Regression (LAD)
                                   LAD /TEST

Data Used
   Data file: FRASERF.DAT

LAD Regression:
     LCC = CONSTANT + APICO
Results
   Number of observations: 31
       Dependent Variable: LCC

   Independent variables           Regression coefficients
   CONSTANT                        8.788741166892983
   APICO                           1.053549693532391

   Number of iterations: 3
   Sum of absolute values of the residuals: 252.8516271215468
               Solution: SUCCESSFUL

Regression Evaluation:
   LAD Model:
     LCC = CONSTANT + APICO

Test Summary
     Number of permutations: 5000
         Random Number Seed: 37551954
      P-value of Full Model: 0.000200000000000000

Because canopy cover must be zero when basal area is zero, Cade (1997) used LAD regression models without an intercept term. Here the following command estimates the model above without an intercept:

        >LAD LCC = APICO

The output is given below:

                  Least Absolute Deviation Regression (LAD)

Data Used
   Data file: FRASERF.DAT

LAD Regression:
     LCC = APICO
Results
   Number of observations: 31
       Dependent Variable: LCC

   Independent variables           Regression coefficients
   APICO                           1.314340400402193

   Number of iterations: 1
   Sum of absolute values of the residuals: 267.4469224575146
               Solution: SUCCESSFUL

A multiple independent variable LAD regression is specified by adding the appropriate independent variable names to the LAD command. Here we consider the model used by Cade (1997) with lodgepole pine density (PICOPHA) as an additional explanatory variable:

        >LAD LCC = APICO + PICOPHA /SAVE

The added variables are assumed to be in the data file in USE. The SAVE option causes Blossom to save a labeled data file that includes the variables in the model and two new columns that contain the predicted y values (PRED) and the residuals (RESID). The saved file by default has the name of the file in use but with a ".LAD" file extension. To specify the saved file's name, follow the SAVE option with a file name, e.g., SAVE = MODEL1.OUT. If the save file already exists you will be prompted with a choice to overwrite it or not. If a LAD command with the SAVE option appears in a SUBMIT file, any preexisting save file is automatically overwritten. Here are the results:

                  Least Absolute Deviation Regression (LAD)

Data Used
   Data file: FRASERF.DAT

LAD Regression:
     LCC = APICO + PICOPHA
Results
   Number of observations: 31
       Dependent Variable: LCC

   Independent variables           Regression coefficients
   APICO                           0.9347387650253696
   PICOPHA                         0.01157237625015522

   Number of iterations: 3
   Sum of absolute values of the residuals: 127.5112822503336
               Solution: SUCCESSFUL
Output was appended to file "FRASERF.OUT"
Model, predicted, and residual values saved in labeled file: FRASERF.LAD

The regression function, observed values and residuals are plotted in Fig. 9.

[Figure 9: fig09.gif]

A polynomial regression on a single independent variable, its square, its cube, and so on can be performed by including in the data file a column containing the square, cube, and so on of the independent variable as well as the original independent and dependent variable. We expect the user to have access to a commercial statistical package to perform these data transformations and graph results outside of Blossom. USE the file FRASERF.DAT and enter the following LAD command:

        >LAD SCC = APIEN + PIENPHA + APIEN2

to estimate the model used in Cade (1997), where canopy cover of Engelmann spruce is predicted as a function of basal area (APIEN), basal area squared (APIEN2), and stem density (PIENPHA). The results are below and the regression surface is plotted in Fig. 10:

                  Least Absolute Deviation Regression (LAD)

Data Used
   Data file: FRASERF.DAT

LAD Regression:
     SCC = APIEN + PIENPHA + APIEN2
Results
   Number of observations: 31
       Dependent Variable: SCC

   Independent variables           Regression coefficients
   APIEN                           1.582478742021183
   PIENPHA                         0.008422609834239920
   APIEN2                          -0.03019457671060476

   Number of iterations: 5
   Sum of absolute values of the residuals: 85.83944442924574
               Solution: SUCCESSFUL

[Figure 10: fig10.gif]

Here the quadratic curvature implied by use of basal area squared can be tested with the HYPOTHESIS command, which tests whether the addition of the squared term yielded an improvement in fit. This is equivalent to testing the full model specified above against a reduced model that doesn't include the term (APIEN2) for basal area squared. This is done by algebraically specifying the reduced parameter null model in the HYPOTHESIS command after the LAD command for the full parameter alternative model has been specified:

        >HYPOTHESIS SCC = APIEN + PIENPHA / DP NPERM = 10000

Here are the results for the HYPOTHESIS command where we optionally have selected the double permutation scheme because our null hypothesized model is constrained through the origin:

                  Least Absolute Deviation Regression (LAD)
    Hypothesis test, drop p - q - 1 zero residuals, with double permutation

Data Used
   Data file: FRASERF.DAT

HYPOTHESIS Regression:
     SCC = APIEN + PIENPHA
Results
   Number of observations: 31
       Dependent Variable: SCC

   Independent variables           Regression coefficients
   APIEN                           0.6134996537157700
   PIENPHA                         0.01352828777493369

   Number of iterations: 3
   Sum of absolute values of the residuals: 99.37978891627789
               Solution: SUCCESSFUL

Regression Evaluation:
   LAD Model:
     SCC = APIEN + PIENPHA + APIEN2
   Versus Hypothesis Model:
     SCC = APIEN + PIENPHA

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3211532
    Observed Test Statistic: 0.1577403555796886
   P-value of variables in full model but not in reduced model:
                             0.01400000000000000

The results indicate that the coefficient for the quadratic basal area term differs from zero with P = 0.014. Here both double permutation and dropping of all but 1 of the zero residuals under the null model were implemented because the null model includes 2 parameters but no intercept. If we had not used the double permutation option (/DP) and not deleted one of the zero residuals associated with the 2 parameters fit under the null model, then the P-value would be slightly smaller (0.0091) as in Cade (1997). The double permutation and dropping of zero residuals usually will increase the size of P-values slightly.

A goodness-of-fit measure for regression models is often a useful summary statistic. It is possible to compute a LAD coefficient of determination for the full model with reference to some reduced model (usually one that specifies just an intercept term) by estimating the full model and obtaining its sum of absolute deviations (call it SAF), then estimating the reduced parameter model and obtaining its sum of absolute deviations (call it SAR), and computing the coefficient of determination R1 = 1 - (SAF/SAR) (Cade and Richards 1996, Cade 1997). This can be extended to any selected regression quantile by replacing the sums of absolute deviations in the formula above with the sums of weighted absolute deviations minimized by regression quantiles (Koenker and Machado 1999). We've already obtained the sum for the full parameter model, SCC = APIEN + PIENPHA + APIEN2, as SAF = 85.839, so to obtain it for the reduced parameter model:

        >LAD SCC = CONSTANT
                  Least Absolute Deviation Regression (LAD)

Data Used
   Data file: FRASERF.DAT

LAD Regression:
     SCC = CONSTANT
Results
   Number of observations: 31
       Dependent Variable: SCC

   Independent variables           Regression coefficients
   CONSTANT                        10.00000000000000

   Number of iterations: 1
   Sum of absolute values of the residuals: 230.0000000000000
               Solution: SUCCESSFUL

yields a sum, SAR = 230.0 and, thus, the coefficient of determination R1 = 1 - (85.839 / 230.000) = 0.627. This is interpreted as follows: the model with variables APIEN, PIENPHA, and APIEN2 yields estimates of conditional medians of SCC with a 63% reduction in the sum of absolute deviations compared to the model that is just a simple estimate of the median of SCC.
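The arithmetic for the coefficient of determination can be checked with a short Python sketch. The function name `lad_r1` is hypothetical; the two sums are copied from the Blossom output shown above.

```python
def lad_r1(sad_full, sad_reduced):
    """LAD coefficient of determination, R1 = 1 - (SAF / SAR)."""
    return 1.0 - sad_full / sad_reduced


# Sums of absolute deviations reported by Blossom above.
saf = 85.83944442924574   # full model: SCC = APIEN + PIENPHA + APIEN2
sar = 230.0               # reduced model: SCC = CONSTANT

print(round(lad_r1(saf, sar), 3))  # prints 0.627
```

For another regression quantile the same function applies, provided both arguments are the weighted sums of absolute deviations for that quantile.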

It is possible to specify more or fewer permutations for calculating probabilities by giving the number of permutations as an option after either the TEST option of the LAD command or the HYPOTHESIS command. For example:

        >USE FRASERF.DAT
        >LAD LCC = APICO + PICOPHA / TEST NPERM = 10000 SEED = 123456

will test all slope parameters equal to zero using 10,000 permutations of y. Manly (1991) summarizes recommendations on number of permutations to use in Monte Carlo sampling procedures. More is better but comes at increased computational cost. Specifying the random number seed is done with the SEED = num option (set by default from the computer clock).

It is important to recognize that the LAD regression model (and generalization to regression quantiles discussed below) can be extended to any linear model design that might be estimated with OLS regression, including various variable transformations, and mixtures of continuous independent variables with indicator variables for categorical predictors. Extensive examples are in Mielke and Berry (2001). Indeed it is possible to use LAD regression for linear model analyses of multifactorial experimental designs, where the focus is on estimating changes in conditional medians rather than estimating changes in conditional means as typically done with OLS regression (Cade and Richards 1996, Mielke and Berry 2001).

As an example, consider the soap production example from Cade and Richards (1996), where soap scrap (y) is modeled as a linear function of production line speed (X1) and an indicator variable X2 = 1 for production line 1 and X2 = 0 for production line 2 (Fig. 11). We are interested in testing whether the rates of change in soap scrap (y) as a function of line speed (X1) differ by production line (X2), which requires that we estimate a model with an interaction term (X1X2).

[Figure 11: fig11.gif]

Open the data file NETER365.DAT and estimate the full parameter model with the interaction term specified:

        >LAD SOAP = CONSTANT + SPEED + LINE + LXS

where the LXS is a column variable created by multiplying SPEED times LINE across all observations (done as a data transformation outside of Blossom). Here are the results:

                  Least Absolute Deviation Regression (LAD)

Data Used
   Data file: NETER365.DAT

LAD Regression:
     SOAP = CONSTANT + SPEED + LINE + LXS
Results
   Number of observations: 27
       Dependent Variable: SOAP

   Independent variables           Regression coefficients
   CONSTANT                        -3.197442310920450E-14
   SPEED                           1.333333333333333
   LINE                            107.6153846153847
   LXS                             -0.2102564102564106

   Number of iterations: 5
   Sum of absolute values of the residuals: 389.4358974358974
               Solution: SUCCESSFUL
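Because LINE is an indicator (1 for production line 1, 0 for production line 2) and LXS = SPEED × LINE, the four coefficients above combine into one fitted line per production line. The following Python sketch does that arithmetic; the function name `per_line_fits` and the dictionary layout are assumptions made for illustration, and the coefficient values are copied from the output above.

```python
def per_line_fits(coef):
    """Combine indicator-model coefficients into a fit for each line.

    Line 2 (LINE = 0): intercept = CONSTANT,        slope = SPEED
    Line 1 (LINE = 1): intercept = CONSTANT + LINE, slope = SPEED + LXS
    """
    return {
        "line2": {"intercept": coef["CONSTANT"], "slope": coef["SPEED"]},
        "line1": {
            "intercept": coef["CONSTANT"] + coef["LINE"],
            "slope": coef["SPEED"] + coef["LXS"],
        },
    }


coef = {
    "CONSTANT": -3.197442310920450e-14,
    "SPEED": 1.333333333333333,
    "LINE": 107.6153846153847,
    "LXS": -0.2102564102564106,
}

for name, fit in per_line_fits(coef).items():
    print(name, round(fit["intercept"], 3), round(fit["slope"], 3))
```

The estimated median soap scrap for line 1 therefore rises by about 1.123 per unit of line speed, compared with about 1.333 for line 2.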

Interpretation of the parameter estimates is identical to the interpretation for linear models estimated by OLS regression: the CONSTANT term is the intercept and the SPEED term (X1) is the slope for the regression of soap scrap on line speed for line 2, the LINE term (X2) is the difference between intercepts for the regressions for line 1 and line 2, and the LXS interaction term (X1X2) is the difference between slopes for the regressions for lines 1 and 2. We want to test the null hypothesis that the interaction term is equal to zero, i.e., that the difference in slopes equals zero, by specifying the reduced parameter null model in the HYPOTHESIS command:

        >HYPOTHESIS SOAP = CONSTANT + SPEED + LINE/ NPERM = 10000

The results below indicated that there was moderate evidence (P = 0.046) that the estimated difference in slopes of -0.21 for the interaction term LXS was not equal to zero. Note that without dropping 2 of the 3 zero residuals in the null hypothesized model the P-value would be slightly smaller at P = 0.031.

                  Least Absolute Deviation Regression (LAD)
                Hypothesis test, drop p - q - 1 zero residuals

Data Used
   Data file: NETER365.DAT

HYPOTHESIS Regression:
     SOAP = CONSTANT + SPEED + LINE
Results
   Number of observations: 27
       Dependent Variable: SOAP

   Independent variables           Regression coefficients
   CONSTANT                        39.24999999999999
   SPEED                           1.183333333333333
   LINE                            60.41666666666669

   Number of iterations: 6
   Sum of absolute values of the residuals: 451.7500000000001
               Solution: NON-UNIQUE

Regression Evaluation:
   LAD Model:
     SOAP = CONSTANT + SPEED + LINE + LXS
   Versus Hypothesis Model:
     SOAP = CONSTANT + SPEED + LINE

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3231063
    Observed Test Statistic: 0.1600111930471427
   P-value of variables in full model but not in reduced model:
                             0.04550000000000000

Confidence intervals on parameters in a LAD regression model can be constructed by inverting the hypothesis testing process in an iterative fashion. This is accomplished by recognizing that testing for nonzero values of parameters in null hypotheses only requires a linear transformation of the dependent variable, y. For example, for H0: β1 = λ, where λ is some hypothesized value of the parameter, you transform y to, say, z by z = y - λX1. The transformed values of the dependent variable, z, are then substituted for y in the regression model, and estimation and hypothesis testing of the null H0: β1 = 0 proceed as before. Cade and Richards (1996) describe in more general matrix notation how you accomplish this linear transformation for multiple parameters. Note that the formula defaults to what is done automatically when we test null hypotheses that parameters, λ, equal zero. The complication that arises in implementing this procedure for a 100(1 - α)% confidence interval is that you must iterate through many possible values of λ to define the bounds on the set of values of λ with P ≥ α for H0: β1 = λ. This can require many transformations of y, estimation with LAD, and testing the null hypothesis with the HYPOTHESIS command.
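Since the transformation has to be done outside of Blossom, a minimal Python sketch of z = y - λX1 may be useful. The function name `shift_response` and the small data values are hypothetical and serve only to show the arithmetic.

```python
def shift_response(y, x1, lam):
    """Return z = y - lam * x1, used to test H0: beta1 = lam as H0: beta1 = 0."""
    if len(y) != len(x1):
        raise ValueError("y and x1 must have the same length")
    return [yi - lam * xi for yi, xi in zip(y, x1)]


# Hypothetical values, not taken from any Blossom data file.
y = [10.0, 20.0, 35.0]
x1 = [1.0, 2.0, 4.0]

z = shift_response(y, x1, lam=0.81)
print(z)
```

The resulting column z would then be added to the data file and used as the dependent variable in the LAD and HYPOTHESIS commands. With lam = 0 the function returns y unchanged, which corresponds to the default test that the parameter equals zero.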

As an example of constructing confidence intervals, return to the model of lodgepole pine canopy cover as a function of pine basal area and stem density (Cade 1997). Endpoints of the 95% confidence interval for the basal area parameter (b1 = 0.935) were given as 0.81 to 1.05 in Cade (1997). This means that the transformations LCC - 0.81(APICO), call it Z81, and LCC - 1.05(APICO), call it Z105, should have approximate P = 0.05 when Z81 and Z105 are substituted for LCC in the regression model that includes APICO (basal area) and PICOPHA (tree density) as predictors for the partial model hypothesis of APICO. Any transformation of LCC by values between 0.81 and 1.05 ought to yield P > 0.05 and any outside of this interval ought to yield P ≤ 0.05. Minor discrepancies can occur, of course, because of the resampling variation inherent in Monte Carlo procedures and because of discreteness in the permutation distribution. We try to make the resampling error as small as possible by using a large number of permutations (NPERM ≥ 10,000). The file FRASERF.DAT includes the transformations Z81 and Z105, as well as Z90 = LCC - 0.90(APICO) and Z50 = LCC - 0.50(APICO). Here we know that the interval presented in Cade (1997) is slightly narrower than expected when the more recently developed double permutation scheme (Cade 2005, Cade and Richards 2006) is used because the null model is constrained through the origin. To run the hypothesis test corresponding to the null model that the parameter for APICO equals 0.81, issue the commands:

        >USE FRASERF.DAT
        >LAD Z81 = APICO + PICOPHA
        >HYPOTHESIS Z81 = PICOPHA/ NPERM = 10000 DP

The output indicates P = 0.0562, well within the Monte Carlo resampling variation of 0.05 as it should be and only slightly larger than P = 0.0523 obtained without the double permutation scheme.

                   Least Absolute Deviation Regression (LAD)

 Data Used
    Data file: FRASERF.DAT

 LAD Regression:
      Z81 = APICO + PICOPHA
 Results
    Number of observations: 31
        Dependent Variable: Z81

    Independent variables           Regression coefficients
    APICO                           0.1247387650158959
    PICOPHA                         0.01157237625034566

    Number of iterations: 4
    Sum of absolute values of the residuals: 127.5112822479933
                Solution: SUCCESSFUL

 ======================================================================


                  Least Absolute Deviation Regression (LAD)
                   Hypothesis Test with Double Permutation

Data Used
   Data file: FRASERF.DAT

HYPOTHESIS Regression:
     Z81 = PICOPHA
Results
   Number of observations: 31
       Dependent Variable: Z81

   Independent variables           Regression coefficients
   PICOPHA                         0.01282187066071429

   Number of iterations: 1
   Sum of absolute values of the residuals: 139.9837151962771
               Solution: SUCCESSFUL

Regression Evaluation:
   LAD Model:
     Z81 = APICO + PICOPHA
   Versus Hypothesis Model:
     Z81 = PICOPHA

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3245102
    Observed Test Statistic: 0.09781434809844124
   P-value of variables in full model but not in reduced model:
                             0.05620000000000000

Similarly, we can run the hypothesis test corresponding to the null that the parameter for APICO equals 0.50 by issuing the commands:

        >USE FRASERF.DAT
        >LAD Z50 = APICO + PICOPHA
        >HYPOTHESIS Z50 = PICOPHA/ NPERM = 10000 DP

The output here indicates the null hypothesis that the parameter equals 0.50 has P = 0.0001, much smaller than 0.05 so that this hypothesized parameter value must be outside the 95% confidence interval.

                   Least Absolute Deviation Regression (LAD)

 Data Used
    Data file: FRASERF.DAT

 LAD Regression:
      Z50 = APICO + PICOPHA
 Results
    Number of observations: 31
        Dependent Variable: Z50

    Independent variables           Regression coefficients
    APICO                           0.4347387650139864
    PICOPHA                         0.01157237624999170

    Number of iterations: 4
    Sum of absolute values of the residuals: 127.5112822468947
                Solution: SUCCESSFUL

 ======================================================================


                  Least Absolute Deviation Regression (LAD)
                   Hypothesis Test with Double Permutation

Data Used
   Data file: FRASERF.DAT

HYPOTHESIS Regression:
     Z50 = PICOPHA
Results
   Number of observations: 31
       Dependent Variable: Z50

   Independent variables           Regression coefficients
   PICOPHA                         0.01561545623843416

   Number of iterations: 1
   Sum of absolute values of the residuals: 211.1764422885150
               Solution: SUCCESSFUL

Regression Evaluation:
   LAD Model:
     Z50 = APICO + PICOPHA
   Versus Hypothesis Model:
     Z50 = PICOPHA

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3254170
    Observed Test Statistic: 0.6561392730693671
   P-value of variables in full model but not in reduced model:
                             0.000100000000000000

Presently, hypothesized values of the parameter and their transformations must be made iteratively by successive approximation, i.e., guess at values, compute the P-values, and then, based on the size of the P-value, successively move towards larger or smaller values until you have values with P = α, which define the confidence interval endpoints. This can require 20 or more iterations depending on how close your initial choice of hypothesized parameter values is to the final values. It is possible to use asymptotic procedures described in Birkes and Dodge (1993) to help pick initial values for confidence interval endpoints that might be close to those obtained by the iterative permutation testing process.
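One way to organize the successive approximation is a bisection search between a hypothesized value known to give P > α and one known to give P ≤ α. The Python sketch below is an assumption about how a user might structure the search, not a Blossom feature: `p_value` stands for whatever procedure returns the permutation P-value for a hypothesized λ (in practice, transforming y and running LAD and HYPOTHESIS), and the toy function used here is a deterministic stand-in with no connection to any real data.

```python
def find_endpoint(p_value, inside, outside, alpha=0.05, tol=1e-3):
    """Bisect between `inside` (P > alpha) and `outside` (P <= alpha).

    Returns an approximate confidence interval endpoint.
    """
    while abs(outside - inside) > tol:
        mid = (inside + outside) / 2.0
        if p_value(mid) > alpha:
            inside = mid      # mid is still inside the interval
        else:
            outside = mid     # mid is outside the interval
    return (inside + outside) / 2.0


def toy_p_value(lam):
    """Stand-in for a permutation P-value; peaks at lam = 2.0 (illustrative only)."""
    return max(0.0, 1.0 - abs(lam - 2.0))


lower = find_endpoint(toy_p_value, inside=2.0, outside=0.0)
upper = find_endpoint(toy_p_value, inside=2.0, outside=4.0)
print(round(lower, 2), round(upper, 2))
```

Real permutation P-values carry Monte Carlo error, so near an endpoint the comparison with α can flip between runs; a tolerance much finer than that error is not meaningful.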

Regression Quantiles

The QUANT = num | ALL option of the LAD regression command fits any specified conditional quantile as a linear regression model. LAD regression is the 0.50 (50th percentile) regression quantile. Various regression quantiles, e.g., 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95 (i.e., 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles), can be estimated to examine linear trends in a dependent variable (y) as a function of one or more independent variables (X). Selecting QUANT = ALL will yield all possible quantile regression estimates. If there is little variation in the errors across the independent variables (homogeneous errors), the regression quantiles will have similar slopes but different intercepts. However, if the errors are heterogeneous across the independent variables, then slopes and intercepts can differ greatly (Cade and Richards 1996, Terrell et al. 1996, Cade et al. 1999, Koenker and Machado 1999). Regression quantiles, thus, provide a way of modeling rates of change associated with heterogeneous variation in linear models without having to specify a functional link between conditional measures of means and variances. Regression quantiles are especially useful when the consequences of over and under prediction differ in a linear model. Cade and Noon (2003) present a primer on quantile regression for ecologists and Koenker (2005) is a detailed monograph.

In studies of ecological limiting factors it is often expected that important measured processes operate as constraints on the response distribution (y) and, thus, we may focus on estimating regression quantiles associated with the upper percentiles (e.g., 90 - 99th) of the dependent variable, i.e., rates of change estimated are along the upper boundary of the distribution as it changes across the independent variables (Terrell et al. 1996, Cade et al. 1999, Haire et al. 2000, Cade and Guo 2000). Rates of change in the responses below the boundary constraint may be lower because of the impact of unmeasured processes (Cade et al. 1999). Many ecological processes can be considered constraints on responses, where rates of change estimated with regression quantiles for upper percentiles might yield new insights. Examples include animal responses to habitat, self-thinning in plants, algal productivity as a function of limiting nutrients, animal abundance and body size relations in macroecology, comparisons of local and regional species diversity, plant productivity as a function of species diversity, and competition field experiments. Estimating rates of change for endpoints of some interval of quantiles (e.g., 10th and 90th percentiles) also provides a flexible way to estimate prediction intervals for responses without resorting to untenable distributional assumptions.

Returning to the soap production example, after USEing the file NETER365.DAT, issue the following command:

        >LAD SOAP = CONSTANT + SPEED + LINE + LXS/QUANT = 0.50

The output indicates that the coefficients estimated are identical to those above without the QUANT = 0.5 option, because the 0.5 quantile is LAD regression. Notice also that both the sum of absolute deviations minimized in LAD regression and the sum of weighted absolute deviations minimized in regression quantiles are reported. The weights used when minimizing sums of absolute deviations in regression quantiles are τ for positive residuals and 1 - τ for zero and negative residuals, where 0 ≤ τ ≤ 1 is the selected quantile with QUANT = num. Thus, in this example the sum of weighted absolute deviations is exactly half the sum of absolute deviations.

                              Quantile Regression

Data Used
   Data file: NETER365.DAT

0.50 Quantile Regression:
     SOAP = CONSTANT + SPEED + LINE + LXS
Results
   Number of observations: 27
       Dependent Variable: SOAP

             For Quantile = 0.50
   Independent variables           Regression coefficients
   CONSTANT                        -3.197442310920450E-14
   SPEED                           1.333333333333333
   LINE                            107.6153846153847
   LXS                             -0.2102564102564106

   Number of iterations: 5
   Sum of absolute values of the residuals: 389.4358974358974
   Weighted sum of the absolute deviations: 194.7179487179487
               Solution: SUCCESSFUL
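
The relation between the two sums reported in this output can be checked with a short Python sketch. The function names and the residual values are hypothetical; only the two reported sums are taken from the output above.

```python
def sum_abs_dev(residuals):
    """Plain sum of absolute deviations (the LAD objective)."""
    return sum(abs(r) for r in residuals)


def weighted_sum_abs_dev(residuals, tau):
    """Regression quantile objective: weight tau for positive residuals
    and 1 - tau for zero and negative residuals."""
    return sum(tau * r if r > 0 else (1.0 - tau) * -r for r in residuals)


# Hypothetical residuals, used only to show how the weighting behaves.
residuals = [4.0, -1.5, 0.0, 2.5, -3.0]
for tau in (0.10, 0.50, 0.90):
    print(tau, weighted_sum_abs_dev(residuals, tau))

# At tau = 0.50 every residual gets weight 0.5, so the weighted sum is
# half the plain sum. The same holds for the sums reported above.
reported_sum = 389.4358974358974
reported_weighted_sum = 194.7179487179487
print(abs(0.5 * reported_sum - reported_weighted_sum) < 1e-9)
```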

It is possible to test a full versus a reduced parameter regression quantile model with the default TEST and HYPOTHESIS options as in the LAD regression command, where the test statistic is computed exactly as for LAD except that the simple sum of absolute deviations is replaced with the sum of weighted absolute deviations (Cade 2005, Cade and Richards 2006). Validity of hypothesis tests for regression quantiles using this test statistic requires the same assumption of independent, identical error distributions as for LAD regression. However, we expect most applications of regression quantiles to be made when it is unreasonable to assume homogeneous variation across the independent variables, i.e., the identical error distribution assumption is violated. Therefore, we have included the regression quantile rank score test (Koenker 1994, Koenker and Machado 1999), its asymptotic P-value approximation with a Chi-square distribution, and a permutation approximation that makes use of the permutation test for OLS regression. Type I errors of the regression quantile rank score test are less sensitive to heterogeneous error distributions because the statistic is based on the sign of the residuals from the reduced parameter null model and not their size. However, as Cade (2003) and Cade et al. (2006) make abundantly clear, valid Type I error rates often will require appropriate weighted estimates and test statistics. This quantile rank score test is implemented with the option / RANKSCORE given with the HYPOTHESIS command.

As an example, consider the acorn production data as related to oak (Quercus spp.) forest characteristics (Schroeder and Vangilder 1997) as analyzed with regression quantiles by Cade et al. (1999). We will estimate 0.10 and 0.90 (10th and 90th percentiles) regression quantiles of annual acorn biomass (kg/ha) as a function of a forest suitability index based on canopy cover and number of oak species (Schroeder and Vangilder 1997). USE the data file ACORN.DAT and issue the command for a 0.10 regression quantile:

        >LAD WTPERHA = CONSTANT + OAKCCSI/ QUANT = 0.10

The command then is issued to test the hypothesis that the slope for the 0.10 quantile equals zero with the rank score test:

        >HYPOTHESIS WTPERHA = CONSTANT/ RANKSCORE NPERM = 10000

The output indicates that the estimated slope for the 0.10 regression quantile (21.8) likely differs from zero (P = 0.012).

                               Quantile Regression

 Data Used
    Data file: ACORN.DAT

 0.10 Quantile Regression:
      WTPERHA = CONSTANT + OAKCCSI
 Results
    Number of observations: 43
        Dependent Variable: WTPERHA

              For Quantile = 0.10
    Independent variables           Regression coefficients
    CONSTANT                        2.440204114342245
    OAKCCSI                         21.77188478448907

    Number of iterations: 2
    Sum of absolute values of the residuals: 1526.329804846275
    Weighted sum of the absolute deviations: 173.7303486881576
                Solution: SUCCESSFUL

 ======================================================================


                              Quantile Regression
                         Hypothesis Test of Rankscore

Data Used
   Data file: ACORN.DAT

0.10 Quantile HYPOTHESIS Regression:
     WTPERHA = CONSTANT
Results
   Number of observations: 43
       Dependent Variable: WTPERHA

             For Quantile = 0.10
   Independent variables           Regression coefficients
   CONSTANT                        12.82474000000000

   Number of iterations: 1
   Sum of absolute values of the residuals: 1737.337406000000
   Weighted sum of the absolute deviations: 194.2280894000001
               Solution: SUCCESSFUL

Regression Evaluation:
   0.10 Quantile Regression Model:
     WTPERHA = CONSTANT + OAKCCSI
   Versus Hypothesis Model at Quantile 0.10:
     WTPERHA = CONSTANT

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3274240
  Observed Rank Score Test Statistic: 0.1842141756281341
          P-value of Rank Score Test: 0.009500000000000000
     Asymptotic Rank Score Statistic: 6.326031751452834
        (Distributed as Chi-square with degrees of
         freedom equal to difference in number of
         parameters between full and reduced models.)
       P-Value of Asymptotic RS Stat: 0.01189782454124833

Similarly, we can estimate the 0.90 regression quantile for the same functional relation by issuing the command:

        >LAD WTPERHA = CONSTANT + OAKCCSI/ QUANT = 0.90

followed by the command:

        >HYPOTHESIS WTPERHA = CONSTANT/ RANKSCORE NPERM = 10000

The output for the 0.90 regression quantile indicates that the rate of change of acorn biomass with the suitability index is about 5 times greater (102.3) at the 90th percentile of the distribution than at the 10th percentile (21.8) (Fig. 12). Clearly, there is heterogeneous variation in the acorn biomass changes across the acorn suitability index, with larger biomass occurring only at higher values of the suitability index. The estimated slope of the 0.90 regression quantile also likely differs from zero (P = 0.040). Here, because of the heterogeneity, improved Type I error rates could be obtained by using weighted estimates with the rank score tests.

                               Quantile Regression

 Data Used
    Data file: ACORN.DAT

 0.90 Quantile Regression:
      WTPERHA = CONSTANT + OAKCCSI
 Results
    Number of observations: 43
        Dependent Variable: WTPERHA

              For Quantile = 0.90
    Independent variables           Regression coefficients
    CONSTANT                        14.44857045511282
    OAKCCSI                         102.3380295448872

    Number of iterations: 3
    Sum of absolute values of the residuals: 1722.332774255853
    Weighted sum of the absolute deviations: 268.5396005222195
                Solution: SUCCESSFUL

 ======================================================================


                              Quantile Regression
                         Hypothesis Test of Rankscore

Data Used
   Data file: ACORN.DAT

0.90 Quantile HYPOTHESIS Regression:
     WTPERHA = CONSTANT
Results
   Number of observations: 43
       Dependent Variable: WTPERHA

             For Quantile = 0.90
   Independent variables           Regression coefficients
   CONSTANT                        89.92350999999999

   Number of iterations: 1
   Sum of absolute values of the residuals: 1951.434255999999
   Weighted sum of the absolute deviations: 324.0588976000000
               Solution: SUCCESSFUL

Regression Evaluation:
   0.90 Quantile Regression Model:
     WTPERHA = CONSTANT + OAKCCSI
   Versus Hypothesis Model at Quantile 0.90:
     WTPERHA = CONSTANT

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3288861
  Observed Rank Score Test Statistic: 0.1151008696583272
          P-value of Rank Score Test: 0.03260000000000000
     Asymptotic Rank Score Statistic: 4.197619091511312
        (Distributed as Chi-square with degrees of
         freedom equal to difference in number of
         parameters between full and reduced models.)
       P-Value of Asymptotic RS Stat: 0.04048077691474605
[Figure 12]
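
The two fitted quantiles can be combined into an interval for acorn biomass at a given value of the suitability index, as in the following Python sketch. The coefficients are copied from the 0.10 and 0.90 outputs above; the function names and the index values 0.2, 0.5, and 0.8 are hypothetical choices for illustration and are not taken from ACORN.DAT.

```python
# (CONSTANT, OAKCCSI) coefficients copied from the outputs above.
Q10 = (2.440204114342245, 21.77188478448907)   # QUANT = 0.10
Q90 = (14.44857045511282, 102.3380295448872)   # QUANT = 0.90


def predict(coefficients, oakccsi):
    """Predicted conditional quantile of WTPERHA at a suitability index."""
    intercept, slope = coefficients
    return intercept + slope * oakccsi


def interval_10_90(oakccsi):
    """Interval between the 0.10 and 0.90 regression quantiles, which
    covers the central 80% of the conditional distribution."""
    return predict(Q10, oakccsi), predict(Q90, oakccsi)


for index in (0.2, 0.5, 0.8):  # hypothetical suitability index values
    lower, upper = interval_10_90(index)
    print(f"OAKCCSI = {index:.1f}: {lower:.1f} to {upper:.1f} kg/ha")

# Ratio of the two slopes, roughly 4.7.
print(Q90[1] / Q10[1])
```

Because the 0.90 slope is much steeper than the 0.10 slope, the interval widens as the suitability index increases, which is how the heterogeneity appears in the fitted model.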

Estimates for other regression quantiles can be obtained by changing the value used in the option QUANT = num. Note that the P-values approximated by the permutation evaluation of the rank score tests are similar to those produced by the asymptotic Chi-square distributional approximation (uses a Chi-square distribution with degrees of freedom equal to difference in number of parameters in full versus reduced models). Although the permutation P-values are slightly smaller than those for the asymptotic Chi-square approximation, the differences may be attributable just to the resampling error associated with the Monte Carlo approximation. Simulation research in Cade (2003) and Cade et al. (2006) established that the permutation version of the rank score test maintains valid Type I error rates at more extreme quantiles (τ) with smaller n than does the Chi-square distributional approximation.
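
The asymptotic P-values in the outputs above can be reproduced from the reported asymptotic rank score statistics with the upper tail of a Chi-square distribution. The following Python sketch uses only the standard library; the function name `chi2_sf` is hypothetical, and the routine is an illustration for integer degrees of freedom rather than the code used by Blossom.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X >= x) for a Chi-square distribution
    with a positive integer number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q = math.exp(-half)             # exact for df = 2
        a = 1.0
    else:
        q = math.erfc(math.sqrt(half))  # exact for df = 1
        a = 0.5
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    # raises the degrees of freedom in steps of two.
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Acorn example: slope tested at the 0.10 and 0.90 quantiles (df = 1).
print(chi2_sf(6.326031751452834, 1))
print(chi2_sf(4.197619091511312, 1))
```

The degrees of freedom equal the difference in the number of parameters between the full and reduced models, so a test that drops two parameters at once would use `df = 2`.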

Confidence intervals based on the regression quantile rank score statistic can be formed by a process identical to that described above for LAD regression. However, if you want to use the asymptotic Chi-square approximation of P-values for computing confidence intervals (Koenker 1994, Cade et al. 1999, Koenker and Machado 1999), there are fast implementations in linear programming algorithms available for S-Plus, R, and SAS.

A multiple regression quantile example is provided by Cade et al. (1999), where glacier lily (Erythronium grandiflorum) seedlings are linearly related to the number of flowers and an index of rockiness in n = 256 contiguous 2 × 2 m quadrats (Fig. 13).

To estimate the 0.95 regression quantile model, issue the following commands:

        >USE LILY.DAT
        >LAD SEEDLINGS = CONSTANT + FLOWERS + ROCKINESS/ QUANT = 0.95

and obtain the following output:

                              Quantile Regression

Data Used
   Data file: LILY.DAT

0.95 Quantile Regression:
     SEEDLINGS = CONSTANT + FLOWERS + ROCKINESS
Results
   Number of observations: 256
       Dependent Variable: SEEDLINGS

             For Quantile = 0.95
   Independent variables           Regression coefficients
   CONSTANT                        20.29915560916767
   FLOWERS                         0.08504221954161631
   ROCKINESS                       -0.08986731001206266

   Number of iterations: 5
   Sum of absolute values of the residuals: 3800.259951749095
   Weighted sum of the absolute deviations: 272.7809710494573
               Solution: SUCCESSFUL

The estimates indicate an increase of 0.085 in seedling numbers with each unit increase in flower numbers at a given level of rockiness, and a decrease of 0.090 in seedling numbers with each unit increase in the rockiness index at a given number of flowers. We can test that these parameters jointly are equal to zero by comparing the full parameter model above with the reduced parameter model having just an intercept by the command:

        >HYPOTHESIS SEEDLINGS = CONSTANT/ RANKSCORE NPERM = 10000
[Figure 13]

The output indicates some evidence that at least one of the parameters differs from zero (P = 0.030 for asymptotic approximation and P = 0.027 for permutation approximation).

                              Quantile Regression
                         Hypothesis Test of Rankscore

Data Used
   Data file: LILY.DAT

0.95 Quantile HYPOTHESIS Regression:
     SEEDLINGS = CONSTANT
Results
   Number of observations: 256
       Dependent Variable: SEEDLINGS

             For Quantile = 0.95
   Independent variables           Regression coefficients
   CONSTANT                        16.00000000000000

   Number of iterations: 1
   Sum of absolute values of the residuals: 3377.000000000000
   Weighted sum of the absolute deviations: 301.1500000000009
               Solution: SUCCESSFUL

Regression Evaluation:
   0.95 Quantile Regression Model:
     SEEDLINGS = CONSTANT + FLOWERS + ROCKINESS
   Versus Hypothesis Model at Quantile 0.95:
     SEEDLINGS = CONSTANT

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3314138
  Observed Rank Score Test Statistic: 0.02856780299231781
          P-value of Rank Score Test: 0.02700000000000000
     Asymptotic Rank Score Statistic: 7.016678099403375
        (Distributed as Chi-square with degrees of
         freedom equal to difference in number of
         parameters between full and reduced models.)
       P-Value of Asymptotic RS Stat: 0.02994661298786494

We can test each of the parameters individually by issuing the series of commands:

        >HYPOTHESIS SEEDLINGS = CONSTANT + FLOWERS / RANKSCORE NPERM = 10000
        >HYPOTHESIS SEEDLINGS = CONSTANT + ROCKINESS/ RANKSCORE NPERM = 10000

The output indicates stronger evidence that the parameter for ROCKINESS does not equal zero (P = 0.041) than for the parameter for FLOWERS (P = 0.079).

                              Quantile Regression
                         Hypothesis Test of Rankscore

Data Used
   Data file: LILY.DAT

0.95 Quantile HYPOTHESIS Regression:
     SEEDLINGS = CONSTANT + FLOWERS
Results
   Number of observations: 256
       Dependent Variable: SEEDLINGS

             For Quantile = 0.95
   Independent variables           Regression coefficients
   CONSTANT                        18.58181818181819
   FLOWERS                         -0.07272727272727292

   Number of iterations: 4
   Sum of absolute values of the residuals: 3460.490909090909
   Weighted sum of the absolute deviations: 296.0627272727274
               Solution: SUCCESSFUL

Regression Evaluation:
   0.95 Quantile Regression Model:
     SEEDLINGS = CONSTANT + FLOWERS + ROCKINESS
   Versus Hypothesis Model at Quantile 0.95:
     SEEDLINGS = CONSTANT + FLOWERS

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3325381
  Observed Rank Score Test Statistic: 0.01714721622155137
          P-value of Rank Score Test: 0.03830000000000000
     Asymptotic Rank Score Statistic: 4.169862180225807
        (Distributed as Chi-square with degrees of
         freedom equal to difference in number of
         parameters between full and reduced models.)
       P-Value of Asymptotic RS Stat: 0.04114914674750358

 ======================================================================


                              Quantile Regression
                         Hypothesis Test of Rankscore

Data Used
   Data file: LILY.DAT

0.95 Quantile HYPOTHESIS Regression:
     SEEDLINGS = CONSTANT + ROCKINESS
Results
   Number of observations: 256
       Dependent Variable: SEEDLINGS

             For Quantile = 0.95
   Independent variables           Regression coefficients
   CONSTANT                        22.00000000000000
   ROCKINESS                       -0.06521739130434784

   Number of iterations: 4
   Sum of absolute values of the residuals: 3918.434782608696
   Weighted sum of the absolute deviations: 278.5652173913043
               Solution: SUCCESSFUL

Regression Evaluation:
   0.95 Quantile Regression Model:
     SEEDLINGS = CONSTANT + FLOWERS + ROCKINESS
   Versus Hypothesis Model at Quantile 0.95:
     SEEDLINGS = CONSTANT + ROCKINESS

Test Summary
     Number of permutations: 10000
         Random Number Seed: 3330654
  Observed Rank Score Test Statistic: 0.01253676544419210
          P-value of Rank Score Test: 0.07410000000000000
     Asymptotic Rank Score Statistic: 3.094468182372695
        (Distributed as Chi-square with degrees of
         freedom equal to difference in number of
         parameters between full and reduced models.)
       P-Value of Asymptotic RS Stat: 0.07855881754213401

Both P-values are consistent with the 90% confidence intervals given in Cade et al. (1999) that did not overlap zero for either variable. Note that the permutation P-values are slightly smaller than those from the Chi-square distribution approximation. The confidence intervals in Cade et al. (1999) were based on inverting the asymptotic Chi-square distribution approximation of the rank score statistic as part of the linear programming solution for regression quantiles available for S-Plus (see Ecological Archives E080-001 for these routines). Because of the heterogeneity evident in this model, confidence intervals and rank score testing would be better based on weighted estimates (Cade et al. 2006).

The use of all quantile regression estimates and weighting is illustrated with an example relating Lahontan cutthroat trout (Oncorhynchus clarki henshawi) numbers per meter of stream to stream width:depth ratio for n = 71 observations of streams across years in Nevada (Dunham et al. 2002, Cade 2005, Cade et al. 2006). The scatter plot in Figure 14 (A) indicates moderate heterogeneity and some nonlinearity in the relationship. Dunham et al. (2002) chose to use a nonlinear model y = exp(β0 + β1X1 + ε) estimated in the linear scale by taking natural logarithms of both sides of the equation. Cade (2005), Cade et al. (2006), and Cade and Richards (2006) also used weighted estimates, where the coefficients of the weight function w = (1.310 - 0.0017X1)^-1 were estimated from the average pairwise differences (by using the expected value obtained from the multiresponse sequence procedure) between all possible quantile regression estimates for β0 and for β1 obtained by using the QUANT = ALL option:

        >USE LAHONTAN.DAT
        >LAD LNLCTM = CONSTANT + WIDRAT/ QUANT = ALL SAVE=ALLTROUT1.TXT

                            All Quantile Regressions

Data Used
   Data file: LAHONTAN.DAT

All Quantile Regressions From Command:
     LNLCTM = CONSTANT + WIDRAT

       Dependent Variable: LNLCTM
    Independent Variables:
         CONSTANT
         WIDRAT

       Number of Observations: 71
   Number of Model Parameters: 2
          Number of Solutions: 77
          Solution Result Was: SUCCESSFUL

 Full solution results are written to file LAHONTAN.OUT
Output was appended to file "LAHONTAN.OUT"
All quantile solutions saved in file: ALLTROUT1.TXT
[Figure 14]

The file ALLTROUT1.TXT contains a row for each unique interval of quantiles, with column variables specifying the upper endpoint of the quantile interval (Quantile), the objective function minimized (ObjFuncSol is the weighted sum of absolute deviations), the predicted value for that quantile at the mean of the independent variables (PredY_Xbar), and the parameter estimates (here, b_CONSTANT and b_WIDRAT). Plots of the parameter estimates by quantile suggested that the linear location-scale (in log scale) form of heterogeneity was a reasonable approximation, so that a single weight function could reasonably be applied to all quantiles. The empirical distribution plots for each parameter estimate by quantile in Figure 14 (B and C) were made from the weighted estimates by connecting the point estimates with an appropriate step function. The weighted estimates were made by multiplying all variables (LNLCTM, a column of 1's for the intercept, and WIDRAT) by the weights (WT) to form the variables WTLNLCTM, WT, and WTWIDRAT. The model was estimated as:

        >USE LAHONTAN.DAT
        >LAD WTLNLCTM = WT + WTWIDRAT/QUANT = ALL SAVE=ALLTROUT2.TXT

                            All Quantile Regressions

Data Used
   Data file: LAHONTAN.DAT

All Quantile Regressions From Command:
     WTLNLCTM = WT + WTWIDRAT

       Dependent Variable: WTLNLCTM
    Independent Variables:
         WT
         WTWIDRAT

       Number of Observations: 71
   Number of Model Parameters: 2
          Number of Solutions: 79
          Solution Result Was: SUCCESSFUL

 Full solution results are written to file LAHONTAN.OUT
Output was appended to file "LAHONTAN.OUT"
All quantile solutions saved in file: ALLTROUT2.TXT
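
The construction of the weighted variables used in the command above can be sketched in Python as follows. The weight function coefficients 1.310 and 0.0017 are those quoted in the text, with X1 taken to be WIDRAT; the function names and the four data rows are hypothetical and are not values from LAHONTAN.DAT.

```python
def weight(widrat, a=1.310, b=0.0017):
    """Weight w = 1 / (a - b * WIDRAT), using the coefficients quoted
    in the text."""
    return 1.0 / (a - b * widrat)


def weighted_variables(lnlctm, widrat):
    """Multiply the response, the column of 1's, and the predictor by the
    weights to form WTLNLCTM, WT, and WTWIDRAT."""
    wt = [weight(x) for x in widrat]        # 1 * w replaces CONSTANT
    wtlnlctm = [w * y for w, y in zip(wt, lnlctm)]
    wtwidrat = [w * x for w, x in zip(wt, widrat)]
    return wtlnlctm, wt, wtwidrat


# Hypothetical rows for illustration only.
widrat = [5.0, 20.0, 60.0, 120.0]
lnlctm = [-0.5, -1.2, -2.0, -3.1]

wtlnlctm, wt, wtwidrat = weighted_variables(lnlctm, widrat)
for row in zip(wtlnlctm, wt, wtwidrat):
    print("%.4f  %.4f  %.4f" % row)
```

The three resulting columns would be added to the data file before running the weighted model, with WT taking the place of the CONSTANT term.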

Note that the variable WT ( = 1 × WT) replaces the usual CONSTANT term because the weighted model requires that weights are multiplied by all independent variables including the column of 1's for the constant. The confidence intervals formed around the parameter estimates by quantiles in Figure 14 (B and C) were made by using the drop in dispersion permutation test with double permutation (because null models for weighted estimates were constrained through the origin). Cade and Richards (2006) formed 90% confidence intervals at quantiles = 0.05, 0.10, 0.15 ... 0.90, 0.95 by successive iteration of hypothesized values as explained for LAD regression starting on page 82. These intervals were only slightly narrower than intervals formed by inverting the permutation version or Chi-square distributional approximation of the rank score test (Cade et al. 2006). Here, we provide an example of the hypothesis tests for the weighted 0.90 quantile regression estimates:

        >LAD WTLNLCTM = WT + WTWIDRAT/QUANT=0.90
        >HYP WTLNLCTM = WT/NPERM=100000 DP
        >HYP WTLNLCTM = WTWIDRAT/NPERM = 100000 DP

                               Quantile Regression

 Data Used
    Data file: LAHONTAN.DAT

 0.90 Quantile Regression:
      WTLNLCTM = WT + WTWIDRAT
 Results
    Number of observations: 71
        Dependent Variable: WTLNLCTM

              For Quantile = 0.90
    Independent variables           Regression coefficients
    WT                              0.05762007758715407
    WTWIDRAT                        -0.02154147781141880

    Number of iterations: 2
    Sum of absolute values of the residuals: 82.41461016146272
    Weighted sum of the absolute deviations: 8.796595654366030
                Solution: SUCCESSFUL

 ======================================================================


                              Quantile Regression
                   Hypothesis Test with Double Permutation

Data Used
   Data file: LAHONTAN.DAT

0.90 Quantile HYPOTHESIS Regression:
     WTLNLCTM = WT
Results
   Number of observations: 71
       Dependent Variable: WTLNLCTM

             For Quantile = 0.90
   Independent variables           Regression coefficients
   WT                              -0.6763883967527673

   Number of iterations: 1
   Sum of absolute values of the residuals: 80.29814001870930
   Weighted sum of the absolute deviations: 9.831009518510045
               Solution: SUCCESSFUL

Regression Evaluation:
   0.90 Quantile Regression Model:
     WTLNLCTM = WT + WTWIDRAT
   Versus Hypothesis Model at Quantile 0.90:
     WTLNLCTM = WT

Test Summary
     Number of permutations: 100000
         Random Number Seed: 46188336
    Observed Test Statistic: 0.1175925215603838
   P-value of variables in full model but not in reduced model:
                             0.002360000000000000

 ======================================================================


                              Quantile Regression
                   Hypothesis Test with Double Permutation

Data Used
   Data file: LAHONTAN.DAT

0.90 Quantile HYPOTHESIS Regression:
     WTLNLCTM = WTWIDRAT
Results
   Number of observations: 71
       Dependent Variable: WTLNLCTM

             For Quantile = 0.90
   Independent variables           Regression coefficients
   WTWIDRAT                        -0.02022132436065866

   Number of iterations: 1
   Sum of absolute values of the residuals: 81.25775342749679
   Weighted sum of the absolute deviations: 8.807900858607981
               Solution: SUCCESSFUL

Regression Evaluation:
   0.90 Quantile Regression Model:
     WTLNLCTM = WT + WTWIDRAT
   Versus Hypothesis Model at Quantile 0.90:
     WTLNLCTM = WTWIDRAT

Test Summary
     Number of permutations: 100000
         Random Number Seed: 46230889
    Observed Test Statistic: 0.001285179481489562
   P-value of variables in full model but not in reduced model:
                             0.7816900000000000

Note that neither of the null hypothesized models above includes a CONSTANT for a column of 1's because of the weighting scheme, so the double permutation option DP was used to provide better Type I error rates. The output indicates a strong, nonzero slope but an intercept that does not differ from zero (in the log scale) for the 0.90 regression quantile. Notice that these results are consistent with the 90% confidence intervals, which indicate nonzero slopes for quantiles ≥ 0.80 and nonzero intercepts for quantiles ≤ 0.70.
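
The observed test statistics in the two double permutation outputs above can be reproduced from the reported weighted sums of absolute deviations. The Python sketch below applies the drop in dispersion form (reduced - full) / full to those sums; the function and variable names are hypothetical.

```python
def drop_in_dispersion(reduced_sum, full_sum):
    """Proportionate reduction in the weighted sum of absolute deviations
    when passing from the reduced to the full parameter model."""
    return (reduced_sum - full_sum) / full_sum


# Weighted sums of absolute deviations copied from the 0.90 quantile outputs.
full = 8.796595654366030               # WTLNLCTM = WT + WTWIDRAT
reduced_without_slope = 9.831009518510045      # WTLNLCTM = WT
reduced_without_intercept = 8.807900858607981  # WTLNLCTM = WTWIDRAT

print(drop_in_dispersion(reduced_without_slope, full))      # slope test
print(drop_in_dispersion(reduced_without_intercept, full))  # intercept test
```

Dropping the slope increases the weighted sum by about 12%, whereas dropping the intercept increases it by only about 0.1%, which matches the small and large P-values reported for the two tests.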

There are many alternative approaches for estimating weights discussed in Cade et al. (2005), Cade and Richards (2006), and Koenker (2005).

The LAD Command Syntax

The LAD command can be used to fit a variety of least absolute deviation regressions. The HYPOTHESIS command allows the specification of a reduced parameter LAD regression model to compare with the full parameter regression model specified in the main LAD command. The regressions are run and the tests performed upon entering the LAD and HYPOTHESIS commands. If the QUANT = num option is specified, all subsequent testing is done on the specified conditional quantile.

LAD dep. var = [CONSTANT +] ind. var1 + ind. var2 + ...

[/TEST | NPERM = num | SEED = num | SAVE [= file name] | QUANT = num | ALL]

HYPOTHESIS dep. var = [CONSTANT +] ind. var1 + ind. var2 + ...

[/NPERM = num | DP | SEED = num | RANKSCORE | SAVETEST [= file name]]

Items to be supplied by the user are given in lower case in italics. Items in square brackets are optional. The vertical line (|) can be read as "or" and separates different options that can be specified. They can be specified in any order. The single variable named on the left of the equal sign is the dependent variable. The independent variables are listed and separated by plus signs to indicate the form of the regression model. If the model is to include a constant (intercept term) the term CONSTANT must be placed right after the equal sign.

LAD options follow the slash (/) character. The TEST option causes the default test of all slope parameters equal to zero. The NPERM option allows the user to specify more or fewer permutations than the default of 5,000 used in approximating probabilities. The SEED option allows the user to specify a random number seed; by default the program uses a value from the computer clock. The SAVE option specifies that predicted values, residuals, and model variables are to be saved to a file with the name of the file in use but with a "LAD" file extension. The SAVEd file can also be named by supplying a file name.

The QUANT = num | ALL option specifies a regression quantile, where the number specified must be greater than 0.0 and less than 1.0. Specifying QUANT = ALL yields all quantile regression estimates and, when combined with SAVE = file name, the parameter estimates by quantile are saved in a file with estimates (column variables) by quantiles (rows).

The HYPOTHESIS command is used to specify a reduced parameter null model against which to test the regression given by the current LAD (/QUANT=num) command. Note that it is not possible to test a HYPOTHESIS when all quantiles were selected with the LAD/QUANT = ALL option. The dependent variable should be the same as that on the most recent LAD command line and a reduced number of the same independent variables used in the LAD command must be given. The syntax of HYPOTHESIS is similar to LAD with NPERM and SEED options. The TEST option need not be given on the LAD command line if a HYPOTHESIS is specified. The RANKSCORE option bases hypothesis tests on a scoring function of the sign of the residuals for the reduced parameter model specified by HYPOTHESIS. Asymptotic Chi-square distributional and permutation approximations of P-values are both provided. The DP option provides double permutation for null models that are constrained through the origin, for either the drop in dispersion permutation test or the RANKSCORE test option. The SAVETEST = file name option allows the Monte Carlo resampled test statistics to be saved into a single column variable in the specified file, where the first value is always the observed test statistic value.
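
A column of test statistics saved with SAVETEST could be turned back into a permutation P-value along the following lines. This Python sketch assumes that the saved column, with the observed value first, makes up the whole reference set and that the P-value is the proportion of values at least as large as the observed one; neither the function name nor the example values come from Blossom.

```python
def permutation_p_value(saved_statistics):
    """Proportion of saved test statistics that are at least as large as
    the observed statistic, which is assumed to be the first value."""
    observed = saved_statistics[0]
    count = sum(1 for t in saved_statistics if t >= observed)
    return count / len(saved_statistics)


# Hypothetical column: the observed statistic followed by nine resampled values.
saved = [0.18, 0.02, 0.25, 0.01, 0.05, 0.11, 0.03, 0.19, 0.07, 0.04]
print(permutation_p_value(saved))  # three of the ten values are >= 0.18
```

Because the observed statistic is counted as one member of the reference set, the smallest P-value this calculation can return is 1 divided by the number of saved values.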

Terse output provided following an OUTPUT/TERSE command for LAD is the USEd file name, dependent variable name, quantile selected if the QUANT=num option is used, sum of absolute deviations (or quantile weighted sum of absolute deviations if the QUANT=num option is used), estimated coefficients for the intercept to p independent variables, and P-value if the TEST option is used. If a HYP command follows a LAD command, then the same summary information for the full parameter model is repeated as on the LAD terse output, followed by columns for the observed test statistic and P-value (if the RANKSCORE option is used, then the observed test statistic and P-values are provided for the permutation test followed by those for the asymptotic Chi-square approximation).

