Multiresponse Permutation Procedure (MRPP) in Blossom Statistics

MRPP is best introduced with an example. The following is a bivariate example adapted from Biondini et al. (1985). Similar examples are found in Zimmerman et al. (1985) and Biondini et al. (1988), and a univariate example is given in Slauson (1988).

In Figure 1 the values of two variables, x and y, are shown for seven observations in two groups, A and B.

fig01.gif

The objects in groups A and B seem to be clustered or concentrated in different parts of the x-y plane representing the two response (measured) variables x and y. One way to determine if the two groups are so clustered is to measure or calculate the distances between all pairs of members of each group and calculate an average distance for each group (A = 1.609, B = 1.344). If group members are clustered together, then the intragroup average distances will be small compared to cases where the group members are spread out and overlap more with other groups. For example, Figure 2 shows the same data except that the groups that observations A3 and B2 belong to are switched.

fig02.gif

In this case the intragroup average distances are greater than for the case first shown above (A = 2.419, B = 1.717).

The strategy of MRPP is to compare the observed intragroup average distances with the average distances that would have resulted from all the other possible combinations of the data under the null hypothesis. The test statistic, usually symbolized with a lower case delta, δ, is the average of the observed intragroup distances weighted by relative group size, 3/7 and 4/7 in this case. The observed delta (δobs) is compared to the possible deltas (δ) resulting from every permutation of the above 7 points into 2 groups of 3 and 4 members. If the hypothesis that the two groups are not different (the null hypothesis) is true, then each of the possible assignments (permutations) is equally likely. In this example there are 35 possible permutations, each occurring with probability 1/35 = 0.0286. Here are the Blossom commands to read in the data file, EXAMPLE1.DAT, and compute the MRPP results.

        >USE EXAMPLE1.DAT / GROUP X_COORD Y_COORD
        >MRPP X_COORD Y_COORD * GROUP / NOCOM EXACT

X_COORD and Y_COORD are the 2 response variables, GROUP is the grouping variable, and the exact version of MRPP is chosen since this is such a small sample. NOCOM signifies that no multivariate commensuration is desired. Blossom by default will commensurate multiple variables by the average Euclidean distance for each variable ignoring group structure. Think of this as similar to the usual parametric approach of standardizing variables to unit variance (average squared Euclidean distance).

Here are the results:

             Exact Multi-Response Permutation Procedure (EMRPP)

 Data Used
            Data File: EXAMPLE1.DAT
    Grouping Variable: GROUP
   Response Variables: X_COORD, Y_COORD

Specification of Analysis
   Number of observations: 7
         Number of groups: 2
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1

Group Summary
  Group Value                 Group Size
  1.000000000000000                    3
  2.000000000000000                    4

Variables are not commensurated

Results
                                    Observed delta = 1.457822456131483
   Probability (Exact) of a smaller or equal delta = 0.02857142857142857

The probability value (P-value) is 0.0286 which means that the observed delta was the smallest among the 35 possible deltas.
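The enumeration behind this exact P-value can be sketched in a few lines of Python. This is a minimal illustration rather than Blossom's implementation: the coordinates are hypothetical (they are not the contents of EXAMPLE1.DAT), the function names are invented for this sketch, and only the default settings (Euclidean distance, weighting by relative group size) are covered.

```python
from itertools import combinations
from math import dist

def avg_within_distance(points):
    """Average Euclidean distance over all pairs of points in one group."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def delta(groups):
    """Within-group average distances weighted by relative group size (C = 1)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * avg_within_distance(g) for g in groups)

def exact_mrpp(group_a, group_b):
    """Enumerate every split of the pooled points into groups of the observed sizes."""
    pooled = group_a + group_b
    observed = delta([group_a, group_b])
    indices = range(len(pooled))
    deltas = []
    for chosen in combinations(indices, len(group_a)):
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen]
        deltas.append(delta([a, b]))
    # Small tolerance so floating-point noise does not drop ties with the observed value.
    p_value = sum(d <= observed + 1e-12 for d in deltas) / len(deltas)
    return observed, p_value, len(deltas)

# Hypothetical coordinates (NOT EXAMPLE1.DAT): two well-separated clusters.
group_a = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.5)]
group_b = [(6.0, 6.0), (7.0, 6.5), (6.5, 7.5), (7.5, 7.0)]
observed, p_value, n_perm = exact_mrpp(group_a, group_b)
print(n_perm, round(p_value, 4))

# Delta implied by the rounded group averages quoted in the text (A = 1.609, B = 1.344).
delta_from_text = 3 / 7 * 1.609 + 4 / 7 * 1.344
print(round(delta_from_text, 3))
```

Because the hypothetical clusters do not overlap, the observed split yields the smallest of the 35 deltas, which reproduces the 1/35 situation described above. The last two lines only check that the rounded group averages from the text, weighted by 3/7 and 4/7, agree with the reported observed delta to three decimals.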

Use the EXACT option for MRPP with caution: it can take a long time when sample sizes exceed about 20, depending on the computer.

By default MRPP does not compute exact probabilities but uses an approximation of the exact distribution of the test statistic (δ) to estimate the P-value. The default approximation is based on the first three exact moments (mean, variance, and skewness) of the permutation distribution evaluated as a Pearson type III distribution (Berry and Mielke 1983, Iyer et al. 1983, Mielke and Berry 2001). The moments approximation avoids the simulation error associated with Monte Carlo resampling tests (Mielke and Berry 1982, Berry and Mielke 1985). However, the permutation distribution of the test statistic can instead be approximated with a Monte Carlo resampling procedure by specifying the option NPERM. By default NPERM uses 5,000 (4,999 + observed delta) random samples to approximate the permutation distribution, but the user may specify any desired number of resamples, e.g., NPERM = 10000. Most examples we've encountered yield similar P-values from the Monte Carlo resampling and Pearson type III distribution approximations, but it is possible for the Monte Carlo resampling approximation to yield better estimates for some problems, e.g., with a large number of discrete values clumped in some region of the data space or if interest is in upper tail probabilities (e.g., P > 0.90) associated with detecting regularity of spatial data distributions. Further investigation of these properties is an open area for research.

The next example shows how to emulate a 2-sample t-test with MRPP. Consider the data for two groups in Figure 3 (from Mielke 1986). The single response variable is represented on the horizontal axis and the number of observations on the vertical axis.

fig03.gif

Groups 1 (median = 15.10, mean = 15.09) and 2 (median = 15.40, mean = 15.42) appear to differ slightly (0.3) in central tendency. To test for equality of means with a permutation version of the t-test, USE the data file EXAMPLE3.DAT, specify a title if desired, and enter the following MRPP command.

        >MRPP RESPONSE * GROUP / V=2 C=2

The V = 2 option causes MRPP to compute squared Euclidean distances (V = 1 is the default value and specifies Euclidean distance). The C = # option specifies how the intragroup distances are to be averaged. If C = 2 is specified, then the analysis mimics the classical parametric t-test, where the group distances are weighted by the relative degrees of freedom. If C = 1 then the intragroup distances are weighted by relative group size, then averaged to arrive at delta. This is the default value. In this example, since the group sizes are equal, the choice of C does not matter. In general choose C = 2 and V = 2 to calculate a test that mimics the classical parametric t- and F-tests for univariate data and Hotelling's T-square or MANOVA for multivariate data. Here are the results of the above MRPP command:

                 Multi-Response Permutation Procedure (MRPP)

 Data Used
            Data File: EXAMPLE3.DAT
    Grouping Variable: GROUP
   Response Variables: RESPONSE

Specification of Analysis
   Number of observations: 30
         Number of groups: 2
        Distance exponent: 2.000000000000000
         Weighting factor: (n(I)-1)/sum(n(I)-1) = C(I) = 2

Group Summary
  Group Value                 Group Size  Group Distance
  1.000000000000000                   15  0.02133333333333326
  2.000000000000000                   15  0.02704761904761895

Results
   Delta Observed = 0.02419047619047611
   Delta Expected = 0.08082758620689738
   Delta Variance = 1.563412523154374E-05
   Delta Skewness = -2.564972664937680

               Standardized test statistic = -14.32399931589521
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 1.925807694750630E-06

The very small P-value (0.0000019) indicates that these two samples are unlikely to come from populations with the same mean, i.e., they are different. The two-sample t-test based on normal theory also gives a very low P-value for these data (P < 0.000001).
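The arithmetic linking the group distances, the C weighting, and the standardized test statistic in this output can be checked with a short Python sketch. The numbers are copied from the output above; the helper name `weights` is hypothetical, and the formula for the standardized statistic (observed minus expected delta, divided by the square root of the delta variance) is inferred from the reported values rather than quoted from Blossom.

```python
from math import sqrt

# Values copied from the EXAMPLE3.DAT output above.
group_sizes = [15, 15]
group_distances = [0.02133333333333326, 0.02704761904761895]
delta_expected = 0.08082758620689738
delta_variance = 1.563412523154374e-05

def weights(sizes, c):
    """C = 1 weights by n_i / sum(n_i); C = 2 weights by (n_i - 1) / sum(n_i - 1)."""
    if c == 1:
        raw = list(sizes)
    elif c == 2:
        raw = [n - 1 for n in sizes]
    else:
        raise ValueError("only C = 1 and C = 2 are sketched here")
    total = sum(raw)
    return [r / total for r in raw]

# Observed delta is the weighted average of the within-group distances.
delta_observed = sum(w * d for w, d in zip(weights(group_sizes, 2), group_distances))

# Standardized statistic (assumed form): (observed - expected) / sqrt(variance).
standardized = (delta_observed - delta_expected) / sqrt(delta_variance)
print(delta_observed, standardized)
```

With equal group sizes both weighting schemes give 0.5 to each group, which is why the choice of C does not matter in this example.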

Now consider the same data with a single change: one of the 30 data values has been altered (Fig. 4).

fig04.gif

To compare these samples USE the file EXAMPLE4.DAT and issue the following MRPP command.

        >MRPP RESPONSE * GROUP / V=2 C=2

Here are the results:

                  Multi-Response Permutation Procedure (MRPP)

 Data Used
           Data File: EXAMPLE4.DAT
   Grouping Variable: GROUP
  Response Variables: RESPONSE

 Specification of Analysis
    Number of observations: 30
          Number of groups: 2
         Distance exponent: 2.00000000000000
          Weighting factor: (n(I)-1)/sum(n(I)-1) = C(I) = 2

 Group Summary
   Group Value        Group Size  Group Distance
   1.00000000000000   15          0.213333333333333E-001
   2.00000000000000   15          1.33561904761905

 Results
    Delta Observed = 0.678476190476191
    Delta Expected = 0.664275862068965
    Delta Variance = 0.255788784003516E-003
    Delta Skewness = -0.989342490484899

                Standardized test statistic = 0.887886882113226
                   Probability (Pearson Type III) of a
                     smaller or equal delta = 0.814363486267441

Now the P-value is quite large (0.81), providing no evidence of a difference between the groups. The variances of the 2 groups differ considerably, as evidenced by the average within-group distance (when squared Euclidean distances are used, this value is twice the variance). The medians are still 15.10 and 15.40, respectively, but the means are now 15.09 and 15.23, respectively. The parametric two-sample t-test also results in a large P-value (0.54). The discrepancy in results for data in which only one value has changed is caused by the use of squared distance. In the squared Euclidean distance analysis space, the distance of the outlier from the bulk of the data is exaggerated because it is squared. Now compare the results of analyzing the data of Example 4 in a space corresponding to the geometric space of the data itself. Issue the following command after using the data in EXAMPLE4.DAT.

        >MRPP RESPONSE * GROUP / V=1 C=1

which, since these are the default values, is equivalent to

        >MRPP RESPONSE * GROUP

Here are the results (EXAMPLE4B.OUT).

                 Multi-Response Permutation Procedure (MRPP)

 Data Used
            Data File: EXAMPLE4.DAT
    Grouping Variable: GROUP
   Response Variables: RESPONSE

Specification of Analysis
   Number of observations: 30
         Number of groups: 2
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1

Group Summary
  Group Value                 Group Size  Group Distance
  1.000000000000000                   15  0.1161904761904762
  2.000000000000000                   15  0.5314285714285718

Results
   Delta Observed = 0.3238095238095240
   Delta Expected = 0.4183908045977012
   Delta Variance = 6.000247458828578E-05
   Delta Skewness = -2.368557930798105

               Standardized test statistic = -12.21013905555642
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 6.262105637131614E-06

Now the resulting P-value (0.0000063) is in line with the results obtained from the data without the single aberrant value. This is a demonstration of the sensitivity of variance (squared Euclidean distance) based statistics and estimates of means to even a single outlying value. Estimates of medians and statistics based on absolute deviations (Euclidean distance) are far less sensitive to outlying data observations (Mielke and Berry 2001).
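The effect of squaring can be illustrated with a small Python sketch. The values are hypothetical (they are not taken from EXAMPLE3.DAT or EXAMPLE4.DAT), and `avg_pairwise` is a name invented here; the sketch compares how much one outlier inflates the average within-group distance for V = 1 and for V = 2, and checks that the average squared distance is twice the sample variance.

```python
from itertools import combinations
from statistics import variance

def avg_pairwise(values, v):
    """Average of |x_i - x_j| ** v over all pairs (V = 1 Euclidean, V = 2 squared)."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) ** v for a, b in pairs) / len(pairs)

# Hypothetical values for one group.
clean = [15.0, 15.1, 15.2, 15.0, 15.1]
with_outlier = [15.0, 15.1, 15.2, 15.0, 20.0]  # last value replaced by an outlier

# How many times larger the average within-group distance becomes.
inflation_v1 = avg_pairwise(with_outlier, 1) / avg_pairwise(clean, 1)
inflation_v2 = avg_pairwise(with_outlier, 2) / avg_pairwise(clean, 2)
print(round(inflation_v1, 1), round(inflation_v2, 1))

# With V = 2 the average pairwise distance equals twice the sample variance.
print(avg_pairwise(clean, 2), 2 * variance(clean))
```

In this made-up case the outlier inflates the squared-distance average far more than the Euclidean average, which is the mechanism behind the very different P-values above.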

Here is another example of how it is possible to get varying statistical results by methods that differ in their underlying geometry. The distance and elevation change (in meters) for male and female blue grouse (Dendragapus obscurus) migrating from where they were marked on their breeding range to their winter range are given in the data file BGROUSE.DAT and are plotted in Figure 5 (data from Cade and Hoffman 1993). Generally the males seem to migrate farther and higher than the females and distance moved and elevation change are correlated (r = 0.71).

fig05.gif

To test gender differences in both distance and elevation, the multivariate parametric test is Hotelling's T², which gives P = 0.033 for F = 4.145 with df = 2, 18, indicating some evidence of a difference in the bivariate means (males = 13388.9, 493.0; females = 5966.7, 231.66, distance and elevation respectively). To perform a permutation version of Hotelling's T², you would issue the following commands:

        >USE BGROUSE.DAT
        >MRPP DIST ELEV * SEX/HOT V = 2 C = 2 EXACT

where the option HOT requests Hotelling's variance/covariance standardization of the multiple dependent variables, V = 2 requests squared Euclidean distances, C = 2 requests that groups be weighted by their relative degrees of freedom, and EXACT requests a complete enumeration of all possible permutations for computing P-values.

Here are the results:

                 Exact Multivariate Hotelling-type Permutation Test

 Data Used
            Data File: BGROUSE.DAT
    Grouping Variable: SEX
   Response Variables: DIST, ELEV

Specification of Analysis
   Number of observations: 21
         Number of groups: 2
        Distance exponent: 2.000000000000000
         Weighting factor: (n(I)-1)/sum(n(I)-1) = C(I) = 2

Group Summary
  Group Value                 Group Size
  3.000000000000000                    9
  4.000000000000000                   12

Hotelling's Commensuration Applied to Variable Values.
 Variance/Covariance Matrix written to output file, BGROUSE.OUT

Results
                                    Observed delta = 0.1773306082392447
   Probability (Exact) of a smaller or equal delta = 0.02962950362331167


Variance/Covariance Matrix

                         DIST                      ELEV
DIST...................  1412312380.952381         28404633.33333334
ELEV...................  28404633.33333334         1134020.666666667

Notice that there is little difference between the P-values for the permutation (0.030) and parametric normal theory (0.033) versions of Hotelling's T² for these data.

Now if we want to analyze these data in the more natural Euclidean distance space, we can issue the following command:

        >MRPP DIST ELEV * SEX/EXACT

which uses the default average Euclidean distance of each variable, ignoring the group structure, to standardize the variables so that they have an average pairwise Euclidean distance (Δi,j) = 1.0. Although distances and elevation changes are in the same units (meters), so that we might consider not commensurating the variables (NOCOM option), there is some correlation between distance moved and elevation change, so it is possible that commensuration will provide more powerful hypothesis tests (Mielke and Berry 1999, 2001). Here are the results:

             Exact Multi-Response Permutation Procedure (EMRPP)

 Data Used
            Data File: BGROUSE.DAT
    Grouping Variable: SEX
   Response Variables: DIST, ELEV

Specification of Analysis
   Number of observations: 21
         Number of groups: 2
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1

Group Summary
  Group Value                 Group Size
  3.000000000000000                    9
  4.000000000000000                   12

Variable Commensuration Summary
   Variable Name             Average Distance (Euclidean if V=1)
   DIST                      9264.761904761905
   ELEV                      279.2285714285715

Results
                                    Observed delta = 1.257456470657243
   Probability (Exact) of a smaller or equal delta = 0.003167420814479638
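The average Euclidean distance commensuration reported in the Variable Commensuration Summary can be sketched as follows. This is an assumption-laden illustration for V = 1 only: each variable is divided by its average pairwise distance computed over all observations, ignoring groups. The data rows are hypothetical (not the BGROUSE.DAT values) and the function names are invented for this sketch.

```python
from itertools import combinations

def average_distance(column):
    """Average |x_i - x_j| over all pairs of observations, ignoring group membership."""
    pairs = list(combinations(column, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def commensurate(rows):
    """Rescale each variable so its average pairwise distance becomes 1."""
    columns = list(zip(*rows))
    scales = [average_distance(col) for col in columns]
    scaled = [tuple(x / s for x, s in zip(row, scales)) for row in rows]
    return scaled, scales

# Hypothetical (distance, elevation change) pairs in meters.
rows = [(12000.0, 300.0), (2500.0, 140.0), (18000.0, 650.0), (900.0, 60.0)]
scaled, scales = commensurate(rows)
print(scales)

# After rescaling, each variable has an average pairwise distance of 1.
rescaled_averages = [average_distance(col) for col in zip(*scaled)]
print(rescaled_averages)
```

The scale factors play the role of the "Average Distance" column in the output above (9264.76 for DIST and 279.23 for ELEV in the blue grouse data).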

The same analysis without any commensuration (NOCOM option) produced P = 0.008, over twice the P-value of the above analysis with average Euclidean distance commensuration. Notice that the P-value for the MRPP statistic based on Euclidean distances (V = 1) and average Euclidean distance commensuration (P = 0.003) is an order of magnitude smaller than that for the permutation version of Hotelling's T² (P = 0.030), which is based on squared Euclidean distances (V = 2) and the variance/covariance commensuration. There are several contributing factors. First, the bivariate medians for males and females in Figure 5 indicate that the centroids of the groups were shifted in the same direction as the correlation between distance (DIST) and elevation change (ELEV). Simulations conducted by Mielke and Berry (1999) demonstrated that the average Euclidean distance commensuration of bivariate variables provided greater power than the variance/covariance standardization when the group structure was shifted parallel to the covariance structure of the 2 variables. Furthermore, since the MRPP comparisons with V = 1 focus on shifts in the bivariate medians, which were separated by 9,271.6 m, rather than shifts in the bivariate means, which were separated by only 7,426.8 m, there was a larger estimated effect size for the Euclidean distance analysis than for the squared Euclidean distance analysis. For these data, the analysis based on Euclidean distances and bivariate medians was more powerful with greater estimated effect sizes (shift in bivariate medians). When the groups are shifted orthogonal to the covariance structure of the dependent variables, then MRPP analyses with Hotelling's variance/covariance standardization (option HOT) and V = 1 can be more powerful. The bivariate medians for the blue grouse movements in Figure 5 were estimated by giving the following command:

        >MEDQ DIST ELEV*SEX/SAVE

where the SAVE option stores the distance between each observation and its group bivariate median (column labeled DIST2MVM) into a data file (BGROUSE.MQD) that can be used for additional analysis or graphing. The output is:

     2-Dimensional Median and Distance Quantiles

Data Used
          Data File: BGROUSE.DAT
  Grouping Variable: SEX
 # Report Variables: 2
   Report Variables: DIST, ELEV

Specification of Analysis
  Total Number of observations: 21
              Number of groups: 2
-----
Results for Group Value: 3.000000000000000
  Observations in Group: 9
 Iterations to Solution: 90
     Solution Tolerance: 1.600000000000000E-11

 Within Group Median Coordinates for Variables
               Variable Name  Multivariate Median Coordinate
                        DIST  11797.18217464808
                        ELEV  292.2063088726803

2-Dimensional Distance From Median Quantiles:
  Group Average Distance to Multivariate Median: 4260.340114054929
   Quantile                   Distance from Median
   0.00            [Minimum]  109.2426566745297
   0.05000000000000000        109.2426566745297
   0.01000000000000000E+01    109.2426566745297
   0.2500000000000000         1238.643608716985
   0.50             [Median]  2317.091482130417
   0.7500000000000000         5401.297011511038
   0.9000000000000000         17603.33390956653
   0.9500000000000000         17603.33390956653
   1.00            [Maximum]  17603.33390956653


-----
Results for Group Value: 4.000000000000000
  Observations in Group: 12
 Iterations to Solution: 500
     Solution Tolerance: 1.600000000000000E-11

 Within Group Median Coordinates for Variables
               Variable Name  Multivariate Median Coordinate
                        DIST  2526.840164096652
                        ELEV  139.3683620563214

2-Dimensional Distance From Median Quantiles:
  Group Average Distance to Multivariate Median: 5404.589608882267
   Quantile                   Distance from Median
   0.00            [Minimum]  1229.047764062475
   0.05000000000000000        1229.047764062475
   0.01000000000000000E+01    1732.455047809210
   0.2500000000000000         1883.845867196834
   0.50             [Median]  2429.253015341116
   0.7500000000000000         6284.576059213768
   0.9000000000000000         12575.09548984956
   0.9500000000000000         25480.71929234919
   1.00            [Maximum]  25480.71929234919


 Distances to multivariate median were written to labelled file "bgrouse.MQD"
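A bivariate median of this kind is the point that minimizes the summed Euclidean distances to the observations, and it has to be found iteratively. The sketch below uses the Weiszfeld iteration, a standard method that is swapped in here for illustration; the source does not say which algorithm MEDQ uses. The function name, the tolerance, and the handling of an estimate that lands exactly on a data point are all simplifying assumptions. The last lines recompute the separations between group medians and between group means from the coordinates reported in this document.

```python
from math import hypot

def spatial_median(points, tol=1e-10, max_iter=1000):
    """Weiszfeld iteration for the point minimizing the summed Euclidean distances."""
    # Start from the coordinate-wise mean.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = hypot(px - x, py - y)
            if d < 1e-15:
                # Simplification: skip a data point that coincides with the estimate.
                continue
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        shift = hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if shift < tol:
            break
    return x, y

# Hypothetical symmetric points: the median is the center of the square.
center = spatial_median([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
print(center)

# Separation of the group medians (coordinates from the MEDQ output above).
median_separation = hypot(11797.18217464808 - 2526.840164096652,
                          292.2063088726803 - 139.3683620563214)
# Separation of the group means (coordinates quoted with the Hotelling example).
mean_separation = hypot(13388.9 - 5966.7, 493.0 - 231.66)
print(round(median_separation, 1), round(mean_separation, 1))
```

The two separations agree with the 9,271.6 m and 7,426.8 m figures quoted in the discussion of effect sizes.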

The bivariate median coordinates are given for the 2 variables (DIST and ELEV), and summary quantiles are provided for the distances between observations and the bivariate median for each group. The average distances to the bivariate median differ for males (4,260.3) and females (5,404.6), suggesting that the MRPP analysis may be detecting dispersion differences as well as shifts in bivariate medians. It is possible to test for equality of multivariate dispersions using a permutation version of a modification of Van Valen's (1978) test; in the modification, the effect of the shift in group centroids is removed using the multivariate medians rather than the multivariate means. This is accomplished for the blue grouse movements by performing a permutation version of the 2-sample t-test on the distances from the bivariate medians (variable DIST2MVM) by sex in the file saved from the previous command:

        >USE BGROUSE.MQD
        >MRPP DIST2MVM * SEX/ V = 2 C = 2 EXACT

The output below suggests there is little statistical support for dispersion differences.

             Exact Multi-Response Permutation Procedure (EMRPP)

 Data Used
            Data File: BGROUSE.MQD
    Grouping Variable: SEX
   Response Variables: DIST2MVM

Specification of Analysis
   Number of observations: 21
         Number of groups: 2
        Distance exponent: 2.000000000000000
         Weighting factor: (n(I)-1)/sum(n(I)-1) = C(I) = 2

Group Summary
  Group Value                 Group Size
  3.000000000000000                    9
  4.000000000000000                   12

Results
                                    Observed delta = 82227845.96039172
   Probability (Exact) of a smaller or equal delta = 0.7081652094035995

Note that tests for equality of univariate dispersions based on the median modification of Levene's test (Good 2000) can also be performed by requesting the univariate medians be calculated for each group with MEDQ, saving the distances from the group medians into a data file, and then comparing those distances (DIST2MVM) with the permutation version of the t-test implemented in MRPP by using the V = 2, C = 2 options. Testing for equality of dispersions after removing the effect of the estimated medians is one of those special cases where tests based on squared deviations (V = 2) have better statistical performance than using Euclidean distances (V = 1).
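The transformation step of the univariate, median-based Levene-type test can be sketched in Python. This only illustrates the idea of replacing each observation by its distance from its own group median; it is not a description of how MEDQ stores its output, and the function name and data are hypothetical.

```python
from statistics import median

def deviations_from_group_medians(values, groups):
    """Return |x - median of x's group| for each observation, in input order."""
    by_group = {}
    for x, g in zip(values, groups):
        by_group.setdefault(g, []).append(x)
    medians = {g: median(xs) for g, xs in by_group.items()}
    return [abs(x - medians[g]) for x, g in zip(values, groups)]

# Hypothetical data: group 2 is far more dispersed than group 1.
values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
groups = [1, 1, 1, 2, 2, 2]
deviations = deviations_from_group_medians(values, groups)
print(deviations)
```

The resulting deviations would then be compared between groups with the V = 2, C = 2 analysis described above.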

Because the sample size is only 21 for the blue grouse data, all the examples used the optional EXACT enumeration of all permutations to compute probabilities. This is not practical to do with larger sample sizes and by default MRPP would use the Pearson Type III moments approximation. The following command yields the default approximation:

        >MRPP DIST ELEV * SEX

The output is:

                 Multi-Response Permutation Procedure (MRPP)

 Data Used
            Data File: BGROUSE.DAT
    Grouping Variable: SEX
   Response Variables: DIST, ELEV

Specification of Analysis
   Number of observations: 21
         Number of groups: 2
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1

Group Summary
  Group Value                 Group Size  Group Distance
  3.000000000000000                    9  1.072146525258270
  4.000000000000000                   12  1.396438929704272

Variable Commensuration Summary
   Variable Name             Average Distance (Euclidean if V=1)
   DIST                      9264.761904761905
   ELEV                      279.2285714285715

Results
   Delta Observed = 1.257456470655986
   Delta Expected = 1.512563363155316
   Delta Variance = 0.002706187555240923
   Delta Skewness = -2.097589827330834

               Standardized test statistic = -4.903918527376526
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 0.002983168009908485

Alternatively, we can approximate the probabilities by Monte Carlo resampling with the command:

        >MRPP DIST ELEV * SEX/NPERM = 10000

where the option NPERM specifies that 9,999 random samples + the 1 observed test statistic are to be used to approximate the probabilities. The output is (BGROUSE6.OUT):

                 Multi-Response Permutation Procedure (MRPP)
                           With Resampling

 Data Used
            Data File: BGROUSE.DAT
    Grouping Variable: SEX
   Response Variables: DIST, ELEV

Specification of Analysis
   Number of observations: 21
         Number of groups: 2
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1
       Random Number Seed: 3086554
        Number of Samples: 10000

Group Summary
  Group Value                 Group Size  Group Distance
  3.000000000000000                    9  1.072146525258270
  4.000000000000000                   12  1.396438929704272

Variable Commensuration Summary
   Variable Name             Average Distance (Euclidean if V=1)
   DIST                      9264.761904761905
   ELEV                      279.2285714285715

Results
   Delta Observed = 1.257456470655985

   Probability (Resample) of a smaller or equal delta = 0.004100000000000000

Notice that with these data the exact, Pearson Type III approximation, and Monte Carlo resampling approximation all yield very similar P-values even though sample sizes were only n = 9 and n = 12.
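The Monte Carlo approach can be sketched in Python as follows. This is a simplified univariate illustration with hypothetical data and invented function names, not Blossom's resampling code: group labels are shuffled at random, delta is recomputed each time, and the observed arrangement is counted as one of the NPERM samples, as described above for the NPERM option.

```python
import random
from itertools import combinations

def delta(values, labels):
    """Size-weighted mean of within-group average absolute differences (V = 1, C = 1)."""
    total = len(values)
    result = 0.0
    for g in set(labels):
        members = [x for x, lab in zip(values, labels) if lab == g]
        pairs = list(combinations(members, 2))
        avg = sum(abs(a - b) for a, b in pairs) / len(pairs)
        result += len(members) / total * avg
    return result

def resample_p_value(values, labels, nperm=10000, seed=12345):
    """Monte Carlo P-value; the observed delta counts as one of the nperm samples."""
    rng = random.Random(seed)
    observed = delta(values, labels)
    shuffled = list(labels)
    count = 1  # the observed arrangement itself
    for _ in range(nperm - 1):
        rng.shuffle(shuffled)
        # Tolerance keeps ties with the observed delta despite floating-point noise.
        if delta(values, shuffled) <= observed + 1e-12:
            count += 1
    return count / nperm

# Hypothetical univariate data with two clearly separated groups of 5.
values = [1.0, 1.2, 0.9, 1.1, 1.3, 3.0, 3.2, 2.9, 3.1, 3.3]
labels = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
p = resample_p_value(values, labels, nperm=2000)
print(p)
```

Because the observed delta is included in the count, the estimated P-value can never be smaller than 1/NPERM; the value obtained depends on the random number seed.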

If the data given to Blossom have been rank transformed (replacing the original values with their rank order), then MRPP can be used to emulate some well-known nonparametric rank tests. Using ranks combined with the selection of V = 2 and C = 2 produces these analyses. Analyze the data from EXAMPLE4.DAT, which have been rank transformed in the file EX4RANK.DAT, with a permutation version of the Mann-Whitney-Wilcoxon test as follows.

        >USE EX4RANK.DAT
        >MRPP RANK * GROUP /V=2 C=2

Here are the results:

                 Multi-Response Permutation Procedure (MRPP)

 Data Used
            Data File: EX4RANK.DAT
    Grouping Variable: GROUP
   Response Variables: RANK

Specification of Analysis
   Number of observations: 30
         Number of groups: 2
        Distance exponent: 2.000000000000000
         Weighting factor: (n(I)-1)/sum(n(I)-1) = C(I) = 2

Group Summary
  Group Value                 Group Size  Group Distance
  1.000000000000000                   15  43.20476190476190
  2.000000000000000                   15  103.1333333333333

Results
   Delta Observed = 73.16904761904762
   Delta Expected = 151.8965517241379
   Delta Variance = 55.34064531480448
   Delta Skewness = -2.572494167782410

               Standardized test statistic = -10.58289223414080
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 4.005684535475250E-05

If there are more than two groups the test is analogous to the Kruskal-Wallis one-way analysis of variance by ranks. Note that both these tests are for univariate data (one response variable), but MRPP is also able to analyze multivariate data (ranked or unranked), offering a generalization of these tests. Further, the approximation used by MRPP is more accurate than the normal approximation used by the classical rank tests, since it uses the skewness of the probability distribution in the Pearson Type III approximation. Of course, it is also possible to approximate the probabilities with the Monte Carlo resampling option. Since these tests use V = 2 and C = 2, they are not congruent with the data space. Use the default values of V and C to produce a congruent analysis. Thus, besides generalizing some standard nonparametric tests to multiple dependent variables, MRPP adds congruent Euclidean distance variants to the statistical repertoire.
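The rank transformation itself is done outside the MRPP command. A minimal Python sketch is given below; the function name is hypothetical, and assigning tied values the average of the ranks they span (midranks) is an assumption of this sketch, since the text does not state how ties were handled in EX4RANK.DAT.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

# Hypothetical values with one tie.
example_ranks = midranks([10.0, 20.0, 20.0, 30.0])
print(example_ranks)
```

The ranks would then be written to a data file and analyzed with V = 2 and C = 2 as shown above.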

The TRUNC = # (truncation) option, if given on the MRPP command line, causes the MRPP analysis to replace interobject distances (Δi,j) greater than the truncation value (call it B) with the truncation value (Δi,j is left unchanged if Δi,j < B; Δi,j = B if Δi,j ≥ B). For example,

        >MRPP VAR1 VAR2 * GROUP / TRUNC = 55

will replace distances greater than 55 with 55 in the permutation calculations. This is useful for detecting pattern and group clustering where one (or more) of the groups itself clusters in more than one region of the analysis space and another group is distributed uniformly or randomly in the same space. The truncation value specified (e.g., 55) should be the average diameter of the sub-clusters. Data plotting and experimentation with truncation values are advised. Truncation is useful, for example, when one kind of archeological artifact is found in two distinct areas of a site while another artifact type is scattered throughout the site. Clumping of plants in a homogeneous site or pattern of habitat types within a landscape is also detectable with a truncated MRPP analysis (Reich et al. 1991). For further information see Mielke (1991).
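The truncation rule amounts to capping each interobject distance at B before the averages are computed. A minimal Python sketch with hypothetical coordinates and a function name invented for illustration:

```python
from math import dist

def truncated_distance(p, q, b):
    """Euclidean distance between p and q, capped at the truncation value b."""
    return min(dist(p, q), b)

# Hypothetical points: one pair closer than B = 55, one pair farther apart.
near = truncated_distance((0.0, 0.0), (30.0, 40.0), 55.0)   # true distance 50
far = truncated_distance((0.0, 0.0), (60.0, 80.0), 55.0)    # true distance 100
print(near, far)
```

Distances below the truncation value are unchanged, so only pairs in different sub-clusters are affected.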

The EXCESS option allows for several comparisons not possible with other statistical procedures. MRPP takes data that, before analysis, are classified into groups. In the usual case the groups represent comparable levels of classification (e.g., male-female; treatments a, b, and c; or before and after observations). But in some cases one of the groups may not be comparable to the other groups of interest. This happens for example when one group is considered miscellaneous or otherwise contains unclassifiable objects. When such a group exists it may, in MRPP, be treated as an excess group. Since the concept of an excess group is not dealt with by most familiar statistical methods, a few examples will help clarify the idea.

In a study of the spatial distribution of artifacts in an archeological site, Berry et al. (1983) note that many times artifacts cannot readily be classified. A particular artifact may be anomalous, lack sufficient defining characteristics, or be broken or too worn to be classifiable. Such objects are definitely artifacts and may contain information, yet treating such a class on equal footing with other well-defined artifact classes seems inappropriate. Investigators usually have the choice of excluding such miscellaneous classes from analysis or including them and risking bias in results or interpretation. MRPP gives the additional choice of including the excess group, but without elevating its status to that of the other groups. The observations of the excess group are treated as background noise, against which the observations on the other groups are analyzed.

Another example of the use of an excess group concerns the presence of higher lead concentration in soils near the center of a city (Mielke et al. 1983). The locations (x and y spatial coordinates) of high concentration soil samples (≥ median) were compared with the locations of all samples, low and high concentration, to determine whether higher concentrations of lead are associated with the city center.

In the excess group MRPP with a group of size n and an excess group of size m, an intragroup average distance is computed for each possible combination of n observations out of the n + m available observations. These values comprise the distribution of the test statistic, delta, to which the observed intragroup distance is compared.
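This enumeration can be sketched in Python for a small case. The coordinates and function names are hypothetical, and the sketch covers only a single classified group plus an excess group with the default Euclidean distance; it is meant to illustrate the description above, not to reproduce Blossom's code.

```python
from itertools import combinations
from math import comb, dist

def avg_within_distance(points):
    """Average Euclidean distance over all pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def excess_group_mrpp(group, excess):
    """Compare the group's average distance with that of every n-subset of the pooled data."""
    pooled = group + excess
    observed = avg_within_distance(group)
    deltas = [avg_within_distance(subset)
              for subset in combinations(pooled, len(group))]
    # Tolerance keeps ties with the observed value despite floating-point noise.
    p_value = sum(d <= observed + 1e-12 for d in deltas) / len(deltas)
    return observed, p_value, len(deltas)

# Hypothetical data: a tight group of n = 3 and a scattered excess group of m = 3.
group = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
excess = [(5.0, 5.0), (9.0, 1.0), (2.0, 8.0)]
observed, p_value, n_subsets = excess_group_mrpp(group, excess)
print(n_subsets, p_value)

# Number of subsets for a group of 12 with an excess group of 4.
print(comb(16, 12))
```

The count printed last shows how quickly the number of subsets grows with n + m, which is why an approximation is normally used instead of full enumeration.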

The excess group can be implemented in comparisons of used versus available resources for a particular organism in a design where a random sample of resources is obtained and then presence (used) and absence (unused) are observed. The used habitats are alike in that they all share the features necessary for the organism's survival. But the unused habitats may not form such a unitary group: some may be suitable for the organism and just happen not to be used, and others may not be suitable at all, some for lack of one requirement and others for lack of another.

Here is an example comparing used versus available blue grouse habitat described by the basal area measurements of four kinds of trees present in stands on winter range (data from Cade and Hoffman 1990). Note that the n = 16 forest stands measured are an exhaustive and exclusive partitioning of the finite population of habitats studied (i.e. no random sampling assumptions apply).

        >USE HABITAT.DAT
        >TITLE Basal Area of Douglas Fir, Juniper, Aspen, and Other
        >MRPP DFIR JUNIP ASPEN OTHER * USE / EXCESS NOCOM

Here are the results:

              Basal Area of Douglas Fir, Juniper, Aspen, and Other
                 Multi-Response Permutation Procedure (MRPP)

 Data Used
            Data File: HABITAT.DAT
    Grouping Variable: USE
   Response Variables: DFIR, JUNIP, ASPEN, OTHER

Specification of Analysis
   Number of observations: 16
         Number of groups: 1
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1

Group Summary
  Group Value                 Group Size  Group Distance
  1.000000000000000                   12  9.168244554831348
  2.000000000000000*                   4*
    * Excess group

Variables are not commensurated

Results
   Delta Observed = 9.168244554831348
   Delta Expected = 10.07816474613700
   Delta Variance = 1.268445315696688
   Delta Skewness = -0.5313032171247737

               Standardized test statistic = -0.8079182672010964
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 0.1994339162492747

In this example the used habitats do not seem to differ (P = 0.200) in tree basal area from the available (i.e., used plus unused) habitats. NOCOM was selected for no variable commensuration because tree basal areas were all in the same units (square meters/ha) and occurred at the same scale (tens of square meters/ha). However, there is some covariation among the basal areas, so commensurating them with the average Euclidean distance may be desirable. Use of the average Euclidean distance here leads to even less difference with an exact P = 0.896.

The ARC = num option allows an analysis to be conducted on univariate circular data such as time or compass orientation. This analysis recognizes that there are no endpoints to the measurement scale. Distances between replicates used in the ARC analyses are the shorter of the 2 possible distances around the circular distribution, i.e., min(|xi - xj|, ARC - |xi - xj|). The value num specifies the number of units in the circular distribution so that input data can be standardized to values on a unit circle. The ARC = num option submits the standardized data to an MRPP program configured for circular distributions.
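The arc distance defined above is easy to check by hand. The following Python sketch is an illustration only, not Blossom code; the function name arc_distance is hypothetical, and the modulo step is an added assumption so that inputs outside the range 0 to ARC are still handled.

```python
def arc_distance(x_i, x_j, arc=360.0):
    """Shorter of the two distances between x_i and x_j around a circle of `arc` units."""
    d = abs(x_i - x_j) % arc  # assumption: wrap inputs that fall outside [0, arc)
    return min(d, arc - d)

print(arc_distance(350.0, 10.0))          # 20.0, not 340.0
print(arc_distance(90.0, 270.0))          # 180.0, the largest possible arc distance
print(arc_distance(23.0, 1.0, arc=24.0))  # 2.0 hours on a 24-hour clock
```

Two compass bearings of 350 and 10 degrees are therefore 20 degrees apart, whereas the ordinary absolute difference would treat them as 340 degrees apart.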

As an example, consider an analysis of the orientation of movements of striped newts (Notophthalmus perstriatus) immigrating to and emigrating from Breezeway Pond, Florida in 1985 - 1990 (Dodd and Cade 1998). Figure 6 presents the angular orientation of 585 females immigrating to and 564 emigrating from the pond that were captured in pitfall buckets inside and outside of a drift fence surrounding the pond.

fig06.gif

Select the data file and implement the arc-distance analysis with the following commands.

        >USE NPOF.DAT
        >MRPP ANGLE * EI/ ARC=360

The grouping variable EI has 1's for emigrating and 2's for immigrating females. Here are the results of this analysis:

                 Multi-Response Permutation Procedure (MRPP)

 Data Used
            Data File: NPOF.DAT
    Grouping Variable: EI
   Response Variables: ANGLE

Specification of Analysis
   Number of observations: 1149
         Number of groups: 2
        Distance exponent: 1.000000000000000
         Weighting factor: n(I)/sum(n(I)) = C(I) = 1
       ARC distances used: 360.0000000000000
                           Intervals in unit circle

Group Summary
  Group Value                 Group Size  Group Distance
  1.000000000000000                  585  89.17375014635289
  2.000000000000000                  564  89.53178892206139

Results
   Delta Observed = 89.34949763938997
   Delta Expected = 89.74326106931342
   Delta Variance = 0.004177493371728869
   Delta Skewness = -1.969391955507092

               Standardized test statistic = -6.092246885514582
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 0.000796576349446414

The ARC analysis indicated that immigration and emigration orientation of the striped newts differed (P = 0.0008). More females immigrated to the northeast and southwest, whereas more emigrated from the southeast and northwest. The arc-distance analyses with MRPP are likely to be better than the more conventional Watson's test and are especially useful when comparing circular distributions that have unequal angular variation or that are multimodal (Mielke and Berry 2001). The ARC option in Blossom is intended to be used with any univariate cyclical data (angular orientation, day of the year, hour of the day); more complicated transformations are possible for spherical data and combinations of scalar and circular data (see Mielke 1986, and Mielke and Berry 2001).

Multiresponse Randomized Block Procedure (MRBP)

Data from a complete randomized block design or data that can be construed in a treatment by block manner can be analyzed by specifying a blocking variable on the MRPP command line. The following data (Mielke and Iyer 1982) are from a mine reclamation study comparing oven-dried biomass (gm) of 3 species of shrubs in 6 treatments (1 = no fertilizer, 2 = low fertilizer, 3 = high fertilizer, 4 = mulch and no fertilizer, 5 = mulch and low fertilizer, and 6 = mulch and high fertilizer) by 3 blocks (different plots). A complete randomized block analysis is done with the following commands:

        >USE MRBP.DAT
        >MRPP SPP1 SPP2 SPP3 * TRTMT * BLOCK

Here are the results of the MRBP analysis with the default multivariable commensuration and block alignment. Note, the original analysis by Mielke and Iyer (1982) did not commensurate or align the data and you can duplicate their analysis by using the options /NOALIGN NOCOM.

Multi-Response Permutation Procedure for Blocked Data (MRBP)

Data Used
            Data file: MRBP.DAT
    Grouping Variable: TRTMT
    Blocking Variable: BLOCK
   Response Variables: SPP1, SPP2, SPP3

Specification of Analysis
   Number of observations: 18
         Number of groups: 6
         Number of blocks: 3
        Distance exponent: 1.000000000000000

Group Summary
  Group Value                 Group Size
  1.000000000000000                    3
  2.000000000000000                    3
  3.000000000000000                    3
  4.000000000000000                    3
  5.000000000000000                    3
  6.000000000000000                    3

Block Alignment Summary
  Block Value               Variable Name             Alignment Value
  1.000000000000000         SPP1                      6.500000000000000
                            SPP2                      3.165000000000000
                            SPP3                      2.170000000000000
  2.000000000000000         SPP1                      9.914999999999999
                            SPP2                      1.165000000000000
                            SPP3                      2.665000000000000
  3.000000000000000         SPP1                      6.250000000000000
                            SPP2                      1.915000000000000
                            SPP3                      2.415000000000000

Variable Commensuration Summary
   Variable Name             Average Euclidean Distance
   SPP1                      7.601503267973862
   SPP2                      3.106928104575162
   SPP3                      0.9005882352941186

Results
   Delta Observed = 1.785190971554857
   Delta Expected = 1.980491196233544
   Delta Variance = 0.02093173163716452
   Delta Skewness = -0.3897419356412425

            Agreement measure among blocks = 0.09861201355002502
               Standardized test statistic = -1.349895544421471
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 0.09499298021013625

The P-value is 0.095, indicating weak evidence to reject the null hypothesis of no treatment effect. The original analysis without commensurating and aligning variables gave P = 0.067. Because of the small number of blocks and treatments it is possible to conduct this analysis by complete enumeration of the permutation distribution by using the option EXACT. This yields P = 0.099. The Monte Carlo resampling approximation also is available for problems with large block and treatment structure.

By default, the data used in the MRBP test are aligned so that the medians of the blocks are all equal. The value used to align each block is chosen to make every block median equal to zero. If there is more than one response variable, then Blossom by default adjusts or commensurates the variables by their average Euclidean distance, as in MRPP. The block alignment values and variable commensuration values are reported.
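A minimal Python sketch of median alignment for one response variable follows. It assumes that the alignment value for a block is simply the median of that variable within the block, which is subtracted from every observation in the block; the data and the function name align_blocks are hypothetical, and the sketch is not Blossom's own code.

```python
from statistics import median

def align_blocks(values, blocks):
    """Subtract each block's median so that every aligned block has median zero."""
    block_medians = {
        b: median(v for v, vb in zip(values, blocks) if vb == b)
        for b in set(blocks)
    }
    aligned = [v - block_medians[b] for v, b in zip(values, blocks)]
    return aligned, block_medians

# Hypothetical univariate data: two blocks of three observations each.
values = [4.0, 6.0, 9.0, 10.0, 14.0, 11.0]
blocks = [1, 1, 1, 2, 2, 2]
aligned, alignment_values = align_blocks(values, blocks)
print(alignment_values)  # block 1 aligned by 6.0, block 2 by 11.0
print(aligned)
```

With several response variables the same step would be applied to each variable separately within each block, which matches the layout of the Block Alignment Summary in the output above.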

It is possible to turn off one or both of the alignment and variable commensuration options. The NOALIGN option given anywhere after the slash (/) of the MRPP command produces an analysis without data alignment. The NOCOM option given anywhere after the slash produces an analysis without multivariate commensuration. These options can be important for special applications of MRBP. Here is an example command line:

        >MRPP LENGTH * GROUP * BLOCK / NOALIGN

Of course, since only 1 variable, LENGTH, was specified, no variable commensuration is done. The NOALIGN option is especially useful when the blocked design is used not so much to detect treatment effects but to get a measure of the agreement among blocks. One use for this option is numerical model verification. Here blocks contain the predictions of one or more models and one block contains measured results. See Tucker et al. (1989) for details. Agreement measures (1 - observed delta/expected delta) based on Euclidean distances are generalizations of Cohen's kappa extended to multiple groups, multiple variables, and interval data (Berry and Mielke 1988). The agreement measure based on squared Euclidean distances (V = 2) applied to interval data is a linear transform of Pearson's correlation coefficient, so a probability value for a correlation coefficient based on a permutation argument can be obtained.
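The agreement measure can be recomputed directly from the reported deltas. The Python sketch below uses the values printed in the MRBP.DAT output above. The standardized statistic is computed with the usual form (observed minus expected delta, divided by the square root of the delta variance), which is an assumption not stated in this section but which reproduces the reported value; the function names are hypothetical.

```python
from math import sqrt

def agreement_measure(delta_observed, delta_expected):
    """Chance-corrected agreement: 1 - observed delta / expected delta."""
    return 1.0 - delta_observed / delta_expected

def standardized_statistic(delta_observed, delta_expected, delta_variance):
    """Assumed form: (observed - expected) / standard deviation of delta."""
    return (delta_observed - delta_expected) / sqrt(delta_variance)

# Values copied from the MRBP.DAT results shown earlier in this section.
delta_obs = 1.785190971554857
delta_exp = 1.980491196233544
delta_var = 0.02093173163716452

agreement = agreement_measure(delta_obs, delta_exp)
t_stat = standardized_statistic(delta_obs, delta_exp, delta_var)
print(round(agreement, 4))  # compare with the reported 0.0986
print(round(t_stat, 4))     # compare with the reported -1.3499
```

An agreement measure near 0 means the observed within-group distances are about what chance would produce, and a value near 1 means nearly perfect agreement among blocks.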

Here is an example analysis comparing measures of the proportion of basal area to the proportion of canopy cover of lodgepole pine (Pinus contorta) in 31 stands of subalpine forest in Colorado (Fig. 7) (Cade 1997).

fig07.gif

The 31 sample plots are specified by the grouping variable STAND and the proportion of either basal area or canopy cover is specified by the blocking variable METHOD. PCTLCC is the response variable for proportion lodgepole pine.

        >USE AGREE2.DAT
        >MRPP PCTLCC * STAND * METHOD/ NOALIGN

Here are the results of the analysis:

Multi-Response Permutation Procedure for Blocked Data (MRBP)

Data Used
            Data file: AGREE2.DAT
    Grouping Variable: STAND
    Blocking Variable: METHOD
   Response Variables: PCTLCC

Specification of Analysis
   Number of observations: 62
         Number of groups: 31
         Number of blocks: 2
        Distance exponent: 1.000000000000000

Group Summary
  Group Value                 Group Size
  1.000000000000000                    2
  2.000000000000000                    2
  3.000000000000000                    2
  4.000000000000000                    2
  5.000000000000000                    2
  6.000000000000000                    2
  7.000000000000000                    2
  8.000000000000000                    2
  9.000000000000000                    2
  10.00000000000000                    2
  11.00000000000000                    2
  12.00000000000000                    2
  13.00000000000000                    2
  14.00000000000000                    2
  15.00000000000000                    2
  16.00000000000000                    2
  17.00000000000000                    2
  18.00000000000000                    2
  19.00000000000000                    2
  20.00000000000000                    2
  21.00000000000000                    2
  22.00000000000000                    2
  23.00000000000000                    2
  24.00000000000000                    2
  25.00000000000000                    2
  26.00000000000000                    2
  27.00000000000000                    2
  28.00000000000000                    2
  29.00000000000000                    2
  30.00000000000000                    2
  31.00000000000000                    2

Data are not aligned within blocks

Results
   Delta Observed = 0.09431153345841937
   Delta Expected = 0.3061809387052664
   Delta Variance = 0.001219390953615218
   Delta Skewness = -0.08266123749709944

            Agreement measure among blocks = 0.6919745107019715
               Standardized test statistic = -6.067318074132690
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 8.653165493940574E-09

The agreement measure in this analysis (0.692) indicates that there is an average reduction in Euclidean distance between the proportions of basal area and canopy cover that is 69% greater than expected by chance and this differs from zero with P < 0.0001. The observed delta = 0.094 which indicates that the 2 proportionate measures of lodgepole pine differed on average by 0.094 across all 31 stands (Fig. 7). There was good but not perfect agreement between measures of the proportion of basal area and the proportion of canopy cover for characterizing the lodgepole pine contribution to the forest composition. Additional univariate agreement comparisons for subalpine fir (Abies lasiocarpa) and Engelmann spruce (Picea engelmannii) are given in Cade (1997). A multivariate measure of agreement that considers all 3 species simultaneously given in Cade (1997) is performed with the command:

        >MRPP PCTSCC PCTFCC PCTLCC * STAND * METHOD/ NOCOM NOALIGN

The results indicate that the average deviation between proportionate measures of basal area and canopy cover is 0.168 (observed delta) across the 31 stands for the 3 conifer species and the agreement measure indicates a 62% reduction in the observed deviation over that expected by chance.

Multi-Response Permutation Procedure for Blocked Data (MRBP)

Data Used
            Data file: AGREE2.DAT
    Grouping Variable: STAND
    Blocking Variable: METHOD
   Response Variables: PCTSCC, PCTFCC, PCTLCC

Specification of Analysis
   Number of observations: 62
         Number of groups: 31
         Number of blocks: 2
        Distance exponent: 1.000000000000000

Group Summary
  Group Value                 Group Size
  1.000000000000000                    2
  2.000000000000000                    2
  3.000000000000000                    2
  4.000000000000000                    2
  5.000000000000000                    2
  6.000000000000000                    2
  7.000000000000000                    2
  8.000000000000000                    2
  9.000000000000000                    2
  10.00000000000000                    2
  11.00000000000000                    2
  12.00000000000000                    2
  13.00000000000000                    2
  14.00000000000000                    2
  15.00000000000000                    2
  16.00000000000000                    2
  17.00000000000000                    2
  18.00000000000000                    2
  19.00000000000000                    2
  20.00000000000000                    2
  21.00000000000000                    2
  22.00000000000000                    2
  23.00000000000000                    2
  24.00000000000000                    2
  25.00000000000000                    2
  26.00000000000000                    2
  27.00000000000000                    2
  28.00000000000000                    2
  29.00000000000000                    2
  30.00000000000000                    2
  31.00000000000000                    2

Data are not aligned within blocks

Variables are not commensurated

Results
   Delta Observed = 0.1681380813821556
   Delta Expected = 0.4408737371890011
   Delta Variance = 0.001552872011627394
   Delta Skewness = -0.08761800087608523

            Agreement measure among blocks = 0.6186253178649301
               Standardized test statistic = -6.921083477581646
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 1.161388232375376E-10

For information on other ways to align data useful for analyzing incomplete block and Latin square designs with MRBP see Fawcett (1990), Mielke and Iyer (1982), and Hodges and Lehmann (1962).

If V = 2 is chosen, then the univariate version of this test is a permutation version of analysis of variance for complete randomized blocks. Note that when V = 2 is used in an MRBP analysis, the blocks are self-aligning to a common mean and no alignment is required; analyses made with MRBP, V = 2, and the option NOALIGN should give the same test statistics and P-values as when alignment is not turned off. Specification of the C (group averaging method) parameter has no effect, since group sizes have to be the same. Also, the EXCESS option is not supported for MRBP and is ignored. The EXACT option is available only for some small block (<10) and group combinations. The Monte Carlo resampling approximation of P-values is available with the option /NPERM = num.

If ranked data are used and V = 2 is specified, then the test (with one response variable) is functionally related to Friedman's nonparametric randomized block analysis.

Permutation Tests for Matched Pairs (PTMP)

Matched pair tests can be performed by the MRPP command. Essentially the matched pairs test is a special case of the randomized block version of MRPP with one or more response variables, two groups, and a blocking variable identifying pairs. Data of this sort can be analyzed by an MRPP command specified just like that for performing an MRBP. For example the sample data file PAIRED1.DAT contains one response (RESPONSE), for two groups (GROUP), and with the paired members of each group indicated by a blocking variable (PAIR). Use this file and perform a matched pairs test by issuing the following command:

        >MRPP RESPONSE * GROUP * PAIR

Here are the results (PAIRED1.OUT):

Multi-Response Permutation Procedure for Blocked Data (MRBP)

Data Used
            Data file: PAIRED1.DAT
    Grouping Variable: GROUP
    Blocking Variable: PAIR
   Response Variables: RESPONSE

Specification of Analysis
   Number of observations: 20
         Number of groups: 2
         Number of blocks: 10
        Distance exponent: 1.000000000000000

Group Summary
  Group Value                 Group Size
  1.000000000000000                   10
  2.000000000000000                   10

Block Alignment Summary
  Block Value               Variable Name             Alignment Value
  1.000000000000000         RESPONSE                  4.275000000000000
  2.000000000000000         RESPONSE                  3.340000000000000
  3.000000000000000         RESPONSE                  6.545000000000000
  4.000000000000000         RESPONSE                  3.070000000000000
  5.000000000000000         RESPONSE                  2.880000000000000
  6.000000000000000         RESPONSE                  8.190000000000000
  7.000000000000000         RESPONSE                  6.105000000000000
  8.000000000000000         RESPONSE                  5.065000000000000
  9.000000000000000         RESPONSE                  2.695000000000000
  10.00000000000000         RESPONSE                  0.7900000000000000

Results
   Delta Observed = 1.211444444444444
   Delta Expected = 1.887222222222223
   Delta Variance = 0.02099246913580247
   Delta Skewness = -1.984239085982351

            Agreement measure among blocks = 0.3580806594053583
               Standardized test statistic = -4.664146087857878
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 0.003420674016044891

With one response variable and V = 2 specified on the command line, this test mimics the t-test for matched pairs.

Sometimes it is convenient to structure paired data such that the values for each pair are given on a single line in the data file with a separately named variable for the response of the first and of the second members of each pair. Blossom allows for this different data structure. Use the example file PAIRED2.DAT and simply issue the following command:

        >MRPP FIRST SECOND /PAIRED

The PAIRED option signifies that the observations are paired (next to each other) in the data file. Thus the pairing is indicated by position not by a blocking variable. Also no grouping variable is specified because in PTMP there can only be two groups. The univariate observations for each group correspond to the columns named FIRST and SECOND. Note, this is a special data file format useful only for PTMP, which is a univariate, two group, paired comparison, where the number of blocks equals the number of pairs.

Here are the results of the above command:

       Multi-Response Permutation Procedure for Paired Data (PTMP)

Data Used
   Data file: PAIRED2.DAT
   Response Variables (Treatment Groups): FIRST, SECOND

Specification of Analysis
   Number of observed pairs (Blocks): 10
                   Distance exponent: 1.000000000000000

Results
  Delta Observed = 2.422888888891312
                 Number of non-zero differences = 10
  Probability (Exact) of a smaller or equal Delta = 0.003906250000000000
Output was appended to file "PAIRED2.OUT"

Because the number of pairs in this data set is less than 20 the P-value reported was obtained by exact enumeration of the permutation distribution (and thus differs slightly from the P-value given in the previous example). With more than 20 pairs an approximation with the Pearson Type III distribution is used by default or the Monte Carlo resampling option can be invoked with the option /NPERM = num. Notice that the different test statistic structures produce an observed delta in PTMP that is exactly twice the observed test statistic for the same problem in MRBP. Also, data in PTMP are aligned to a median of 0 by the structure of the test statistic.
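Exact enumeration for matched pairs can be sketched in a few lines of Python. The sketch assumes the PTMP statistic is the average of |d_i - d_j|^V over all pairs of signed within-pair differences d_i, and that the permutation distribution is formed by all 2^n sign changes of those differences. That form is an assumption taken from the general permutation literature rather than from this manual, although it is consistent with the PTMP delta being twice the MRBP delta. The data and function names are hypothetical, and no special handling of zero differences is attempted.

```python
from itertools import combinations, product

def ptmp_delta(signed_differences, v=1.0):
    """Average of |d_i - d_j|**v over all pairs of signed differences (assumed form)."""
    pairs = list(combinations(signed_differences, 2))
    return sum(abs(a - b) ** v for a, b in pairs) / len(pairs)

def ptmp_exact(first, second, v=1.0):
    """Exact P-value from all 2**n sign changes of the within-pair differences."""
    d = [a - b for a, b in zip(first, second)]
    observed = ptmp_delta(d, v)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(d)):
        delta = ptmp_delta([s * x for s, x in zip(signs, d)], v)
        # Small tolerance so values tied with the observed delta are counted.
        count += delta <= observed + 1e-12
        total += 1
    return observed, count / total, total

# Hypothetical paired data with four pairs.
first = [5.0, 6.0, 7.0, 8.0]
second = [4.0, 4.0, 4.0, 4.0]
delta_obs, p_value, n_permutations = ptmp_exact(first, second)
print(delta_obs, p_value, n_permutations)
```

With 10 pairs, as in PAIRED2.DAT, there are 2^10 = 1024 equally likely sign assignments, so the reported exact P-value of 0.00390625 corresponds to 4 of the 1024.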

It is possible to do a 1-sample comparison of data with a hypothesized parameter for central tendency (either median or mean) with PTMP by making one of the column variables equal to the hypothesized parameter and the other the observed data vector (Mielke and Berry 2001). If the hypothesized parameter is a median and PTMP is implemented with V = 1, then this test is for a null hypothesis that the sample comes from a population with median equal to the specified value. If the hypothesized parameter is a mean and PTMP is implemented with V = 2, then this test is for a null hypothesis that the sample comes from a population with mean equal to the specified value.

Multivariate extensions of the 1-sample comparison are made by using MRBP and specifying the vector of hypothesized parameters for the multivariate median (mean) as one group and the observed vector as the second group, for each of n blocks comprising the sample. As an example, consider the data on ring-necked pheasant (Phasianus colchicus) habitat selection from Aebischer et al. (1993: Appendix 1), where the proportions of home ranges in 5 habitat types (scrub, broadleaf woodlands, conifer woodlands, grasslands, and crops) for 13 radio-marked birds were compared to the available proportions of these habitat types. Because these data are compositions with a unit sum constraint, Aebischer et al. (1993) chose to analyze these data with log ratios in a MANOVA. We can perform a similar 1-sample analysis comparing the observed proportions of the habitat types for the 13 birds with the hypothesized available proportions in MRBP without resorting to log ratios (which are problematic when you have some zero proportions). The data file PREFER.DAT has 13 blocks (BIRD) for the grouping USE = 1 corresponding to the observations for the 13 birds, and the same 13 block values for the grouping USE = 0 corresponding to 13 replications of the hypothesized available proportions of the habitat types. Issue the commands:

        >USE PREFER.DAT
        >MRPP SCRUB BROAD CONIFER GRASS CROP*USE*BIRD

The following output indicated that the 13 pheasants were not using habitat types in proportion to their availability. Of course, it is possible to do a permutation version of the 1-sample MANOVA analysis on log ratios as done by Aebischer et al. (1993), but the Euclidean distance statistics of MRPP avoid concerns about singular matrices with dependent variables having the unit sum constraint and ad hoc procedures needed to deal with zero proportions when transforming to log ratios.

Multi-Response Permutation Procedure for Blocked Data (MRBP)

Data Used
            Data file: PREFER.DAT
    Grouping Variable: USE
    Blocking Variable: BIRD
   Response Variables: SCRUB, BROAD, CONIFER, GRASS, CROP

Specification of Analysis
   Number of observations: 26
         Number of groups: 2
         Number of blocks: 13
        Distance exponent: 1.000000000000000

Group Summary
  Group Value                 Group Size
  0.000000000000000                   13
  1.000000000000000                   13

Block Alignment Summary
  Block Value               Variable Name             Alignment Value
  1.000000000000000         SCRUB                     11.41000000000000
                            BROAD                     5.600000000000001
                            CONIFER                   0.3650000000000000
                            GRASS                     26.41500000000000
                            CROP                      56.19000000000000
  2.000000000000000         SCRUB                     11.90000000000000
                            BROAD                     11.96500000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     26.61500000000000
                            CROP                      49.14500000000000
  3.000000000000000         SCRUB                     5.770000000000000
                            BROAD                     7.480000000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     55.86499999999999
                            CROP                      30.50500000000000
  4.000000000000000         SCRUB                     6.000000000000000
                            BROAD                     16.54500000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     32.53500000000000
                            CROP                      44.54000000000000
  5.000000000000000         SCRUB                     3.815000000000000
                            BROAD                     19.76000000000000
                            CONIFER                   5.525000000000000
                            GRASS                     53.90500000000000
                            CROP                      16.99000000000000
  6.000000000000000         SCRUB                     4.325000000000000
                            BROAD                     19.87500000000000
                            CONIFER                   5.420000000000000
                            GRASS                     53.38500000000000
                            CROP                      16.99000000000000
  7.000000000000000         SCRUB                     3.780000000000000
                            BROAD                     20.23500000000000
                            CONIFER                   5.875000000000000
                            GRASS                     53.11000000000000
                            CROP                      16.99000000000000
  8.000000000000000         SCRUB                     5.940000000000000
                            BROAD                     23.97000000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     52.72000000000000
                            CROP                      16.99000000000000
  9.000000000000000         SCRUB                     6.430000000000001
                            BROAD                     31.19500000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     45.00000000000000
                            CROP                      16.99000000000000
  10.00000000000000         SCRUB                     7.470000000000001
                            BROAD                     9.025000000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     66.13499999999999
                            CROP                      16.99000000000000
  11.00000000000000         SCRUB                     8.789999999999999
                            BROAD                     20.89500000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     52.94000000000000
                            CROP                      16.99000000000000
  12.00000000000000         SCRUB                     6.460000000000000
                            BROAD                     10.09000000000000
                            CONIFER                   0.3650000000000000
                            GRASS                     66.08000000000000
                            CROP                      16.99000000000000
  13.00000000000000         SCRUB                     4.375000000000000
                            BROAD                     14.65500000000000
                            CONIFER                   2.420000000000000
                            GRASS                     61.55500000000000
                            CROP                      16.99000000000000

Variable Commensuration Summary
   Variable Name             Average Euclidean Distance
   SCRUB                     4.992523076923074
   BROAD                     11.68763076923076
   CONIFER                   2.455846153846161
   GRASS                     15.07083076923078
   CROP                      18.21464615384609

Results
   Delta Observed = 2.238306570824885
   Delta Expected = 2.659843078478799
   Delta Variance = 0.008275984809342826
   Delta Skewness = -1.165049905742928

            Agreement measure among blocks = 0.1584817206190213
               Standardized test statistic = -4.633672664197047
                  Probability (Pearson Type III) of a
                    smaller or equal delta = 0.001209222687833961

We can compute the multivariate median for the proportions of the habitat types used by the 13 pheasants to compare with the hypothesized proportions by issuing the command:

        >MEDQ SCRUB BROAD CONIFER GRASS CROP * USE

The output (PREFER2.OUT) indicated that the multivariate median vector for the proportions of habitats used is shifted towards a much higher proportion of broadleaf woodlands, moderately higher proportions of scrub and conifer woodlands, and a much lower proportion of crops, with little difference in the proportion of grasslands compared to available habitat types. Note that this summary does not recognize the blocked-by-animal nature of the design and could be made more appropriate by first taking differences between components of used and available habitat types by animal and then taking the multivariate medians of those differences.

     5-Dimensional Median and Distance Quantiles

Data Used
          Data File: PREFER.DAT
  Grouping Variable: USE
 # Report Variables: 5
   Report Variables: SCRUB, BROAD, CONIFER, GRASS, CROP

Specification of Analysis
  Total Number of observations: 26
              Number of groups: 2
-----
Results for Group Value: 0.000000000000000
  Observations in Group: 13
 Iterations to Solution: 1
     Solution Tolerance: 1.600000000000000E-13

 Within Group Median Coordinates for Variables
               Variable Name  Multivariate Median Coordinate
                       SCRUB  3.219999999999999
                       BROAD  9.229999999999999
                     CONIFER  0.7300000000000000
                       GRASS  52.83000000000001
                        CROP  33.98000000000000

5-Dimensional Distance From Median Quantiles:
  Group Average Distance to Multivariate Median: 7.377764055609925E-15
   Quantile                   Distance from Median
   0.00            [Minimum]  0.000000000000000
   0.05000000000000000        0.000000000000000
   0.01000000000000000E+01    7.377764055609925E-15
   0.2500000000000000         7.377764055609925E-15
   0.50             [Median]  7.377764055609925E-15
   0.7500000000000000         7.377764055609925E-15
   0.9000000000000000         7.377764055609925E-15
   0.9500000000000000         7.377764055609925E-15
   1.00            [Maximum]  7.377764055609925E-15


-----
Results for Group Value: 1.000000000000000
  Observations in Group: 13
 Iterations to Solution: 55
     Solution Tolerance: 1.600000000000000E-13

 Within Group Median Coordinates for Variables
               Variable Name  Multivariate Median Coordinate
                       SCRUB  7.616447082537260
                       BROAD  28.67679273133892
                     CONIFER  5.665707319386103
                       GRASS  54.04191899630136
                        CROP  3.987629069463447

5-Dimensional Distance From Median Quantiles:
  Group Average Distance to Multivariate Median: 33.74120131416418
   Quantile                   Distance from Median
   0.00            [Minimum]  6.621274377971053
   0.05000000000000000        6.621274377971053
   0.01000000000000000E+01    7.164311725959368
   0.2500000000000000         10.46614221964336
   0.50             [Median]  30.59714496831886
   0.7500000000000000         33.37217901082544
   0.9000000000000000         83.13692673544415
   0.9500000000000000         96.67827322752424
   1.00            [Maximum]  96.67827322752424
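The median coordinates, iteration counts, and distances reported above can be illustrated with a short Python sketch. It assumes the multivariate median is the point that minimizes the sum of Euclidean distances to the observations, and it uses a Weiszfeld-type iteration to find it. Blossom's actual algorithm and stopping rule are not described in this section, so the function names, starting point, and tolerance below are illustrative assumptions, and the data are made up.

```python
import math

def spatial_median(points, tol=1e-12, max_iter=10000):
    """Approximate the point minimizing the sum of Euclidean distances.

    Illustrative Weiszfeld-type iteration; not Blossom's implementation.
    Returns (median_coordinates, iterations_used).
    """
    n, dim = len(points), len(points[0])
    # Start from the coordinate-wise mean (an arbitrary starting choice).
    current = [sum(p[k] for p in points) / n for k in range(dim)]
    for iteration in range(1, max_iter + 1):
        # Weight each observation by the inverse of its distance to the
        # current estimate; clamp tiny distances to avoid dividing by zero.
        weights = [1.0 / max(math.dist(p, current), 1e-15) for p in points]
        total = sum(weights)
        updated = [sum(w * p[k] for w, p in zip(weights, points)) / total
                   for k in range(dim)]
        if math.dist(updated, current) <= tol:
            return updated, iteration
        current = updated
    return current, max_iter

def distances_to_median(points, median):
    """Euclidean distance from each observation to the median."""
    return [math.dist(p, median) for p in points]

# Hypothetical data: five identical observations, similar in spirit to a
# group whose members all share the same values.
identical = [(3.0, 9.0, 88.0)] * 5
median, iterations = spatial_median(identical)
print(median, iterations, distances_to_median(identical, median))
```

When every observation in a group has the same values, as appears to be the case for group 0 in the listing, the sketch stops after one iteration and all distances to the median are essentially zero.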

The MRPP Command Syntax

The MRPP command can take different forms depending on the nature of the analysis desired. If you are unfamiliar with MRPP, consult the references given at the end of this document before changing the default values. Here is the complete MRPP command syntax.

MRPP variable list * grouping variable [(num ...) | (num - num)] [*blocking variable]

[/ V = num | C = num | EXACT | EXCESS [= num] | PAIRED | NOCOM | HOT | NOALIGN | TRUNC = num | ARC = num | NPERM [= num] | SEED = num | SAVETEST [= file name]]

Items to be supplied by the user are given in lower case in italics. These are usually variable names or numbers (num). Items in square brackets are optional. Upper case words or letters are Blossom commands and must be entered exactly as given. The vertical line (|) can be read as "or" and separates the different options that can be specified; options can be given in any order. The optional numbers (num) in parentheses after the grouping variable name specify either a list or a range of values indicating the groups to be used (these must be numeric values of the grouping variable). If no list or range is specified, the groups correspond to the unique values of the grouping variable. To analyze blocked data (MRBP), specify a blocking variable.

The options specified after the slash (/) control technical details of the analysis. The value of V sets the exponent of the distance function. The default is V = 1; apart from V = 2, other values are seldom used. Valid values for C are 1, 2, 3, or 4 and determine how the intragroup average distances are weighted when they are combined. C = 1 is the default and weights by relative sample size, C = 2 weights by relative degrees of freedom, and options 3 and 4 are seldom used.

If the EXCESS option is specified, the excess group by default consists of the cases with the largest value of the grouping variable. This can be changed by adding the appropriate grouping variable value after the EXCESS option. To analyze paired data (PTMP), specify the PAIRED option. The NOALIGN option applies to blocked data analysis and turns off the automatic alignment. The NOCOM option turns off the default average Euclidean distance commensuration of multiple response variables. The HOT option specifies Hotelling's variance/covariance commensuration of multiple response variables. The TRUNC = num option is available for grouped but not blocked data; the truncation number (num) gives the maximum object-to-object distance to be used in the analysis. The ARC = num option specifies the units of data from a circular distribution so that they can be standardized to a unit circle; the standardized univariate data are then analyzed with an arc-distance MRPP.

The NPERM option requests a Monte Carlo resampling approximation of P-values rather than the Pearson Type III moments approximation. By default NPERM uses 5,000 random samples, but any number may be specified with NPERM = num. The option SEED = num lets you specify a random number seed instead of the default number generated from the computer clock. The option SAVETEST = file name saves the Monte Carlo resampled test statistics into a column in the specified file, where the first value is always the observed test statistic.
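As a rough illustration of how the V, C, NPERM, and SEED options interact, the following Python sketch computes a delta statistic and a Monte Carlo P-value. It is not Blossom's code. It assumes that the distance is the Euclidean distance raised to the power V, that C = 1 weights each group by n_i/N and C = 2 by (n_i - 1)/(N - g), that no commensuration is applied (as with NOCOM), and that the P-value is the proportion of stored statistics less than or equal to the observed one. All function names and data are hypothetical.

```python
import math
import random
from itertools import combinations

def average_distance(points, v=1.0):
    """Average pairwise Euclidean distance raised to the power v.

    Assumes the group has at least two members.
    """
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) ** v for a, b in pairs) / len(pairs)

def mrpp_delta(points, labels, v=1.0, c=1):
    """Weighted average of the within-group average distances."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    n_total, n_groups = len(points), len(groups)
    delta = 0.0
    for members in groups.values():
        n = len(members)
        if c == 1:        # relative sample size (assumed n_i / N)
            weight = n / n_total
        elif c == 2:      # relative degrees of freedom (assumed form)
            weight = (n - 1) / (n_total - n_groups)
        else:
            raise ValueError("only C = 1 and C = 2 are sketched here")
        delta += weight * average_distance(members, v)
    return delta

def mrpp_monte_carlo(points, labels, nperm=5000, seed=None, v=1.0, c=1):
    """Return (observed delta, P-value, stored statistics).

    The observed statistic is stored first, followed by nperm statistics
    from random reassignments of the group labels (an assumed layout).
    """
    rng = random.Random(seed)
    observed = mrpp_delta(points, labels, v, c)
    stats = [observed]
    shuffled = list(labels)
    for _ in range(nperm):
        rng.shuffle(shuffled)
        stats.append(mrpp_delta(points, shuffled, v, c))
    # A small tolerance absorbs rounding differences between equal deltas.
    p_value = sum(d <= observed + 1e-12 for d in stats) / len(stats)
    return observed, p_value, stats

# Hypothetical bivariate data in two well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
labs = [1, 1, 1, 2, 2, 2, 2]
observed, p_value, _ = mrpp_monte_carlo(pts, labs, nperm=999, seed=1)
print(observed, p_value)
```

Fixing the seed makes the resampled P-value reproducible from run to run, which is the practical reason to supply one.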

Here are some more examples of valid MRPP command lines. They show how to perform an analysis using a subset or range of the groups indicated by the grouping variable and how to specify a particular group as the excess group. If no value is specified, the excess group is always the group with the largest value of the grouping variable.

        >MRPP W X Y Z * GROUP (2 3 6) / EXCESS

The groups used are confined to a subset of the values of the grouping variable and group 6 will be the excess group.

        >MRPP VAR1 VAR2 * GP (3-8) / EXCESS = 1

The groups used are confined to a range of group values and the excess group is indicated by its value (1). The excess group value does not have to appear in the grouping variable list or range. For example, the following three command lines produce the same analysis.

        >MRPP VAR1 VAR2 * GP (3-7) / EXCESS=8
        >MRPP VAR1 VAR2 * GP (3-8) / EXCESS
        >MRPP VAR1 VAR2 * GP (3-8) / EXCESS=8
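The equivalence of these three command lines can be checked with a small Python sketch of the selection rule as described in this section. The helper `resolve_groups` is hypothetical and only models which group values are used and which one is treated as the excess group; it assumes that the default excess group is the largest selected value and that an explicit excess value is added to the selection if it is not already there.

```python
def resolve_groups(selected, excess=False, excess_value=None):
    """Return (group values used, excess group value or None).

    selected:      group values from the list or range after the
                   grouping variable, e.g. range(3, 9) for (3-8)
    excess:        True if the EXCESS option is present
    excess_value:  the value given as EXCESS = num, if any
    """
    groups = set(selected)
    if not excess:
        return sorted(groups), None
    if excess_value is None:
        # Default: the largest selected value of the grouping variable.
        excess_value = max(groups)
    # The excess value may or may not already be in the list or range.
    groups.add(excess_value)
    return sorted(groups), excess_value

# The three command lines above, expressed as calls to the sketch.
a = resolve_groups(range(3, 8), excess=True, excess_value=8)  # (3-7) / EXCESS=8
b = resolve_groups(range(3, 9), excess=True)                  # (3-8) / EXCESS
c = resolve_groups(range(3, 9), excess=True, excess_value=8)  # (3-8) / EXCESS=8
print(a == b == c)
```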

Terse output provided by the MRPP command following an OUTPUT /TERSE command includes the USEd file name, dependent variable names, grouping variable name, number of groups, blocking variable name (if present), observed test statistic, and P-value.

Multivariate Medians and Distance Quantiles (MEDQ) Command Syntax

The MEDQ command can estimate multivariate and univariate medians for grouped or ungrouped data and optionally save the distances between the observations and the medians in a file. MEDQ is intended to provide summary statistical estimates that are useful for describing group differences detected by MRPP comparisons.

MEDQ variable list [* grouping variable [(num ...) | (num - num)]]

[/ SAVE | QUANT = num, num, ..., num]

By default MEDQ estimates the multivariate (or univariate) medians for the variables in the list, ignoring any group structure. If the optional grouping variable is specified, MEDQ computes the same estimates separately for each group. Options for selecting subsets of a grouping variable work as in the MRPP command. If the SAVE option is specified, the distances between observations and estimated medians are saved to a file named with your filename and the extension .MQD. For each observation the file stores a column variable named DIST2MVM, the distance to the multivariate (or univariate) median, along with the values of the selected variables and of any grouping variable specified. These values can be useful for graphical exploration and for conducting tests of equivalent dispersions. The option QUANT = num, num, ..., num allows you to specify values other than the default quantiles (min = 0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, max = 1.0) for summarizing distances to multivariate medians. Note that when you specify only a single response variable, the QUANT option also allows you to request specific univariate quantiles in addition to the default univariate median.
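The way the quantiles summarize distances can be sketched in Python. The rule used here, taking the order statistic of rank ceil(n × q) without interpolation, is inferred from the 13-observation listing shown earlier (where the 0.05 quantile equals the minimum and the 0.95 quantile equals the maximum); it is an assumption, not a documented description of how MEDQ computes quantiles. The function name and the distances are made up for illustration.

```python
import math

# Default quantiles listed for the QUANT option.
DEFAULT_QUANTILES = (0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0)

def distance_quantiles(distances, quantiles=DEFAULT_QUANTILES):
    """Return {quantile: distance} using order statistics (assumed rule)."""
    ordered = sorted(distances)
    n = len(ordered)
    result = {}
    for q in quantiles:
        # Rank of the order statistic: smallest k with k/n >= q, at least 1.
        # The small offset guards against floating-point overshoot in n * q.
        rank = max(1, math.ceil(n * q - 1e-9))
        result[q] = ordered[rank - 1]
    return result

# Hypothetical distances to a median for 13 observations.
toy_distances = [float(k) for k in range(1, 14)]
for q, d in distance_quantiles(toy_distances).items():
    print(q, d)
```

Passing a different tuple as `quantiles` plays the role of QUANT = num, num, ..., num in this sketch.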

Terse output provided by the MEDQ command following an OUTPUT /TERSE command is identical to the default verbose output.

