(PDF) Logistic Modeling to Spatially Pre

Logistic Modeling to Spatially Predict the Probability of Soil Drainage Classes P. Campling,* A. Gobin, and J. Feyen ABSTRACT cially in regions that are sparsely surveyed (Gessler et Logistic models were developed to spatially predict the probability al., 1995). of drainage classes in a humid tropical area (58 900 ha) using sampled Predicting soil drainage conditions is important for terrain attributes from a digital elevation model, and vegetation indi- both crop and hydrological modeling. Soil moisture con- ces from a LANDSAT-5 Thematic Mapper image. Soil drainage ditions will influence the selection and performance of classes were assigned on the basis of the local water table regime crops and will determine the propensity for rainfall- depth, determined by soil morphological indicators, to 295 pseudo- runoff. randomly selected soil auger hole observations (calibration data set) Bell et al. (1994) used multivariate discriminant analy- and 72 soil pedon observations (validation data set). Six drainage sis to relate soil drainage class to eight landscape param- classes were identified: excessively (D1), well (D2), moderately well eters in Pennsylvania, USA. Soil drainage probability (D3), imperfectly (D4), poorly (D5), and very poorly (D6). A nested maps were derived, and agreed at a level of 67% to dichotomous modeling strategy of progressively separating the six drainage classes was adopted, and resulted in five multivariate logistic the published soil drainage map. Sader et al. (1995) models. The best performing model, predicting the probability of combined LANDSAT Thematic Mapper and GIS rule- nonhydric (D1D2) soils versus hydric (D3D4D5D6) soils had a con- based methods to delineate forest wetlands in Maine. cordance of 99%, and the worst performing model, predicting the They concluded that a combination of hybrid and GIS probability of imperfectly (D4) drained soils versus moderately well rule-based classification methods were the most appro- (D3) drained soils had a concordance of 65%. The most important priate. Cialella et al. (1997) on the other hand, combined spatial determinants were: elevation, slope, distance-to-the-river chan- normalized difference vegetation index (NDVI) data nel (DC), and vegetation indices. The logistic models were combined from advanced visible and infrared imaging spectrome- in a geographic information system (GIS) to derive soil drainage class ter and digital terrain attributes to predict soil drainage maps using the gridded data sets of the significant variables. The class at a 6 by 4 km research site in Maine. Classification results showed that digital elevation models and vegetation indices tree analysis, based on binary recursive partitioning of from LANDSAT-5 Thematic Mapper provide complementary infor- predictor variables, was used to arrive at increasingly mation for developing statistical models to spatially predict and map homogeneous regions of drainage class areas. Merot et soil drainage classes. al. (1995) simply compared the mapped compound topo- graphic index with a published soil drainage class map on the basis of a contingency table in Brittany, France. T here is growing need to develop models that are able to predict the spatial extent of soil characteris- tics at a scale appropriate for environmental manage- They found that the topogaphic index was able to predict the distribution of intensely waterlogged soils, provided that the relationship with bedrock type was established. ment. Soil-landscape modeling aims to develop explicit, At the soil-landscape level, Hurt and Brown (1995) iden- quantitative, and spatially realistic models of the soil tified hydric soil indicators that fit the definition of hydric landscape continuum. Statistical models are used to in- soils and enabled the estimation of seasonal high water vestigate the correlation between soil characteristics (re- tables for all soils in Florida. Thompson et al. (1997) sponse variable) and terrain attributes and vegetation developed the profile darkness index (PDI) to assist in indices (explanatory variable) at field sampling points, delineating the areal extent of hydromorphic soils. In a and then spatial prediction is achieved by incorporating later paper, Thompson et al. (1998) emphasized the im- statistical models with gridded sets of significant explan- portance of landscape perspective when examining the atory variables. The underlying assumption is that the relationships among soil hydrology, hydric soil conditions, development of soil toposequences, and hence soil attri- and soil hydromorphic properties, as pedogenic processes butes, is a response to the way water moves through and are not restricted to a vertical soil profile. over the landscape (Moore et al., 1993), and that vegeta- We hypothesize that there is a relationship between tion indices provide indicators of prevailing edaphic fac- drainage class and terrain and vegetation. The objective tors (Levine et al., 1994). The increasing availability of was to spatially predict soil drainage classes in tropical spatially continuous explanatory variables from digital soils by expressing the relationship in a statistical model. elevation models and satellite images encourages the Terrain is expressed by digital terrain attributes and development of this approach in favour of the interpola- vegetation by vegetation indices from a LANDSAT-5 tion of sampled data by kriging or triangulation, espe- Thematic Mapper Image. As drainage class is a categori- cal response variable, logistic modeling is used to de- velop the predictive model, which is validated against P. Campling, A. Gobin, and J. Feyen, Lab. for Soil and Water, Faculty drainage classes assigned to reference pedons. The vali- of Agricultural and Applied Biological Sciences, Katholieke Universi- teit Leuven, Vital Decosterstraat 102, Leuven, B-3000, Belgium. Re- ceived 29 July 2000. *Corresponding author (Paul.Campling@sadl. Abbreviations: AIC, Akaike’s information criterion; DC, distance- kuleuven.ac.be). to-river channel; ESRI, Environmental System Research Institute; GIS, geographical informations system; NDVI, normalized difference Published in Soil Sci. Soc. Am. J. 66:1390–1401 (2002). vegetation index; PDI, profile darkness index; SC, Schwarz Criterion. 1390 CAMPLING ET AL.: LOGISTIC MODELING TO PREDICT SOIL DRAINAGE CLASS PROBABILITY 1391 Fig. 1. Study area with the position of the random auger hole observations (calibration data set), the reference profile pits (validation data set), and the local geology (1 ⫽ Udi Nsukka Cuesta, 2 ⫽ Cross River Plains). dated statistical models are combined to derive drainage tor was used to select 295 random 200 ha pixels in the study class maps for the entire case study area and the accu- area (Fig. 1). Observations were taken at a point within the racy of maps are determined. grid that was both accessible and representative for the land- scape, based on a previous soil profile survey (Gobin et al., MATERIALS AND METHODS 2000a). The global positioning satelite (GPS) was used to record the position of the selected observation point. Soil Study Site augers cores were described using the guidelines provided by The study area (58 900 ha) traverses a section of the sand- FAO (1990). The pseudo-random sampling strategy means stone Udi Nsukka Cuesta and the shale Cross River Plains that the relative frequencies of soil drainage classes are unbi- (Fig. 1). The region is located in the transition zone between ased estimators of class probability. the Guinea-Congolian and Guinea Savanna ecoclimatological Drainage classes were assigned on the basis of the depth regions of South Eastern Nigeria. The mean annual precipita- and the duration of the local water table regime (Fig. 2), tion is 1577 mm (max. ⫽ 2146 mm, min. ⫽ 901 mm) with a which was assessed by examining the soil texture and the type, distinct dry season between November and March. Soils in degree, and depth of soil mottling and gleying in the sampled the area exhibit contrasting soil drainage characteristics de- soil. Gleyic properties occur when soils become completely pending on the soil texture and the landscape position (Table saturated with groundwater for a period that allows reducing 1; Gobin et al., 2000b). Hydric soils are defined as soils that conditions occur and show a gleyic color pattern (FAO, 1998). form under conditions of saturation, flooding, or ponding long The gleyic colour pattern reflects the preponderence of oxi- enough to develop anaerobic conditions in the upper part morphic and reductomorphic properties. Oximorphic proper- (Federal Register, 1994). In lowland shale areas (lower in- ties reflect alternating reducing and oxidizing conditions, and terfluve), high soil water tables occur during the wet season. are expressed by reddish brown (ferrihydrite) or bright yellow- The occurrence of plinthite and redoximorphic features indi- ish brown (goethite) mottles. Reductomorphic properties, on cate prolonged inundation (Table 1). In more upland shale the other hand, reflect permanently wet conditions and are areas (upper interfluve), the presence of ferric properties expressed by neutral (white to black: N1/to N8/) or bluish to (Food and Agriculture Organization [FAO], 1998) is because Table 1. Relation between landform and soil according to U.S. of lateral subsurface flow and fluctuating water tables, respec- Soil Taxonomy (Soil Survey Staff, 1998) and the World Refer- tively. Where ironstone occurs, usually at crests or at the top ence Base for Soil Resources (WRB) (FAO, 1998). Only the of slopes leading to incised streams, the high percentage of dominant soils are presented. lag gravel increases infiltration and drainage. In the sandstone area, water tables are very deep (200–300 m) and the soils are Landform Subgroup (Soil Taxonomy) Soil Subunit (WRB) nonhydric (Landon, 1991). At the steep slopes of the escarp- Plateau Typic or Rhodic Kandiustox Areni-Acric Ferralsol ment, the weak crumbly structure of the Quartzipsamments Escarpment Ustoxic Quartzipsamments Rubi-Ferralic Arenosol encourages excessive drainage. On the plateau, however, the Residual Hills Typic Ustorthents Episkeleti-Humic Regosol Upper interfluve Ustic Kandihumults Umbri-Humic Acrisol better-structured Kandiustox retain water for longer periods. Ustic Kandihumults Profondi-Vetic Acrisol Typic Kanhaplustults Ferri-Abruptic Acrisol Soil Sampling Lower interfluve Typic Plinthaquults, Gleyi-Orthiplinthic Acrisol Plinthustults, Plinthohumults A pseudo-random soil sampling strategy was conducted at Floodplain Aeric Endoaquents Orthiplinthi-Umbric Gleysol Riverbank Oxyaquic Quartzipsamments Epidystri-Arenic Fluvisol a density of one sample per 200 ha. A random grid-cell genera- 1392 SOIL SCI. SOC. AM. J., VOL. 66, JULY–AUGUST 2002 Fig. 2. Soil drainage classes according to water regime, with associated soil morphological attributes (adapted from Landon, 1991). greenish (2.5Y, 5Y, 5G, 5B) colors in ⬎95% of the soil matrix. digitized contour lines (at 50 interval), stream lines (scale— Field evidence indicated that origin of the gleyic properties 1:50 000), lake polygon coverages (scale—1:50 000), and spot in the study area was from a combination of both run-on and heights. Preprocessing was carried out on the river lines to groundwater, termed as amphigley by Brinkman and Blok- ensure that all the arcs had a down slope direction, that braid- huis (1986). ing was not represented in the valley bottomlands, and that Six classes were used to define soil drainage characteristics: a flow line passed through lake features. The DEM was pro- excessive (D1), well (D2), moderately well (D3), imperfectly cessed iteratively to ensure that sinks were progressively re- (D4), poorly (D5), and very poorly (D6). The class limits moved. specified in Fig. 2 were used to determine the appropriate In early work on quantifying land surface topography, Zev- drainage class for field observations. A model validation data enbergen and Thorne (1987) fit a nine-term quadratic polyno- set was an independent soil survey of 72 reference profile pits mial to a moving 3 by 3 square grid network (Fig. 3) to derive (Gobin et al., 2000b; Fig. 1), which were assigned drainage terrain attributes: classes using the same criteria as in Fig. 2. z ⫽ Ax2y2 ⫹ Bx2y ⫹ Cxy2 ⫹ Dx2 ⫹ Ey2 Digital Terrain Modeling ⫹ Fxy ⫹ Gx ⫹ Hy ⫹ I [1] The ANUDEM program (Hutchinson, 1989), was used to Slope (␤), plan curvature (␾), profile curvature (␻) curvature create a 25-m resolution digital elevation model (DEM) from (␹), and aspect (␺) grids were derived according to: Fig. 3. The 3 by 3 grid elevation submatrix (Zevenbergen and Thorne, 1987). CAMPLING ET AL.: LOGISTIC MODELING TO PREDICT SOIL DRAINAGE CLASS PROBABILITY 1393 Slope ␤ ⫽ arctan[G2 ⫹ H2)0.5] [2] ␮m) were processed as they usually correspond best with the spectral characteristics of vegetation (Mather, 1995). DG2 ⫹ EH2 ⫺ FGH The NDVI was derived from the reflected solar radiation Plan Curvature, ␾ ⫽ ⫺2 [3] in the infrared and red wavelength bands via the algorithm G2 ⫹ H 2 (Tucker, 1979): DH2 ⫹ EG2 ⫺ FGH IR ⫺ R Profile Curvature, ␻ ⫽ 2 [4] NDVI ⫽ [12] G2 ⫹ H 2 IR ⫹ R Tangential Curvature, ␹ ⫽ ␻ ⫺ ␾ ⫽ 2E ⫹ 2D [5] Where IR represents the near-infrared (Band 3: 0.76–0.90 ␮m) and R the red range of reflectance (Band 4: 0.63–0.69 ␮m). Alternative vegetation indices were constructed for other Aspect, ␺ ⫽ 180 ⫺ arctan 冢HG冣 ⫹ 90 冢|G| G 冣 [6] band combinations: MIR(B5) ⫺ NIR Aspect was converted to radians. Contributing area, Aj, using VIB5B4 ⫽ [13] the D8 algorithm (Jenson and Dominigue, 1988) indicates the MIR(B5) ⫹ NIR upslope area that flows into each cell: Where MIR(B5) represents the middle-infrared (Band 5: nj 1 1.55–1.75 ␮m) and NIR is the near-infrared (Band 4: 0.63– Aj ⫽ b 兺 ai [7] 0.69 ␮m). i⫽1 where ai is the grid cell area, nj is the number of grid cells MIR(B7) ⫺ MIR(B5) VIB7B5 ⫽ [14] draining into grid cell j, and b is the contour width approxi- MIR(B7) ⫹ MIR(B5) mated by the grid-cell resolution. Three secondary terrain attributes in environmental model- Where MIR(B7) represents the middle-infrared (Band 7: ing are the compound topographic index (Kirkby, 1975; 2.08–2.35 ␮m). For visualization purposes, all vegetation indi- O’Loughlin, 1981), the stream power index (Moore et al., ces were converted to 8 bit-scale with a two standard deviation 1991) and the slope-aspect index (␸): contrast stretch. ␭i ⫽ ln 冢tanA ␤ 冣 j i [8] Logistic Model Calibration Logistic modeling is a generalized linear modeling tech- ⍀i ⫽ ln(Aj tan ␤i) [9] nique that predicts the binary or ordinal response of categori- cal variables. A nested dichotomous modeling strategy of pro- gressively separating the six drainage classes was adopted (Fig. ␸i ⫽ tan ␤i ⫻ ␺ [10] 4) to predict the probabilities of soil drainage classes. Univari- The compound topographic index indicates the possible pro- ate and multivariate logistic models (Hosmer and Lemeshow, pensity of areas of the landscape to develop saturated condi- 1989) were constructed to define the independent variables tions based on the shape of the topographic surface. The that were important to predict the probability of drainage stream power index is a measure of the erosive power of water class. The binary logistic model takes the following form: across the landscape, and the slope-aspect index provides an approximate method of examining the spatial distribution of radiation across a catchment (Moore et al., 1988). Prior to creating the secondary terrain attributes, the slope gradient grid was checked for zero values, which would pro- duce spurious results. Zero values were replaced by the mean value of the eight surrounding cells. The distance-to-river-channel grid, DCi, is a proximity attri- bute that provides information on the relative position of cells to the valley bottoms. The river channel grid is the digitized stream lines (from the 1:50 000 topographic map) converted to 25-m grids. The euclidian distance from each cell i to location j on the river channel grid is: DCi ⫽ minj (dij) [11] Vegetation Indices from LANDSAT-5 Thematic Mapper A LANDSAT-5 Thematic Mapper image was obtained on a cloud-free day in January 1987 when haze was minimal over the study area. The image was destriped and geometrically referenced to 1:50 000 topographic maps (UTM) using resam- pling techniques to give a 25-m raster image. The advantage of using a dry season image of the area is that the vegetation present is a clear indicator of the seasonal variation in edaphic factors across the landscape. The visible red (Band 3: 0.63–0.69 ␮m), near-infrared (Band 4: 0.76–0.90 ␮m), and middle-infra- Fig. 4. Nested dichotomous modeling strategy for deriving drainage red channels (Band 5: 1.55–1.75 ␮m and Band 7: 2.08–2.35 classes (D1, D2, D3, D4, D5, and D6). 1394 SOIL SCI. SOC. AM. J., VOL. 66, JULY–AUGUST 2002 Table 2. Summary statistics for predicting the probability of non- hydric soils for models NONHYDRICMOD, EXCESSMOD, logit[Pr(Y ⫽ 1|x)] ⫽ ln 冤1 ⫺Pr(Y Pr(Y ⫽ 1|x)冥 ⫽ 1|x) POORMOD, VPOORMOD, and IMPERFMOD. ⫽ Bk⬘ xk ⫽ ␤0 ⫹ ␤1x1 .... ⫹ ␤kxk [15] Models Univariate Variable† AIC‡ SC§ ⫺2lnL P ⬎ ␹2 ␥ Where Pr is probability, Y is the response variable drainage NONHYDRICMOD class, xk is a p ⫻1 matrix of 1 followed by (p-1) predictor Intercept 410.0 413.6 407.9 · · variables, Bk is a p ⫻ 1 matrix of intercepts and slope coeffi- NDVI 407.8 415.2 403.8 0.0429 0.101 cients (␤). VIB5B4 393.3 401.2 389.8 0.0001 0.279 VIB7B5 410.6 418.0 406.6 0.2401 0.101 The first logistic model, NONHYDRICMOD, predicted ⍀ 407.6 415.0 403.6 0.0386 0.198 non-hydric (Y ⫽ 1) versus hydric (Y ⫽ 2) drainage classes, ␭ 411.8 419.2 407.8 0.7157 0.017 DC 88.6 95.0 84.6 0.0001 0.977 where nonhydric combines D1 and D2 and hydric combines Z 68.9 76.3 64.9 0.0001 0.981 D3, D4, D5, and D6 (Fig. 4). The second logistic model, EX- Tan␤ 400.6 407.9 396.6 0.0007 0.144 ␻ 410.2 417.6 406.2 0.1804 ⫺0.11 CESSMOD, used only the nonhydric observations, to distin- ␾ 411.4 418.7 407.4 0.4302 ⫺0.11 guish between the excessive (Y ⫽ 1) and well (Y ⫽ 2) drained ␹ 410.1 417.4 406.1 0.1665 ⫺0.18 classes. The third logistic model, POORMOD (Fig. 4), distin- A 411.1 418.5 407.1 0.3451 0.038 ␸ 404.8 412.1 400.8 0.0072 0.059 guished between poorly (D5) and very poorly (D6) drained ␺ 406.7 414.0 402.7 0.0213 0.148 soils combined together (Y ⫽ 1), and moderately well (D3) Multivariate 60.1 78.6 50.1 0.0001 0.993 EXCESSMOD and imperfectly (D4) drained soils combined together (Y ⫽ Univariate 2). The fourth logistic model, VPOORMOD, distinguished Intercept 188.6 191.5 186.6 · · NDVI 170.7 176.7 166.8 0.0001 0.466 between very poorly D6 (Y ⫽ 1), and poorly D5 (Y ⫽ 2) VIB5B4 145.4 151.3 141.5 0.0001 0.650 drained soils, and the fifth logistic model, IMPERFMOD, VIB7B5 157.8 163.7 153.8 0.0001 0.551 distinguished between imperfectly D4 (Y ⫽ 1) and moderately ⍀ 190.4 196.3 186.4 0.6792 0.178 ␭ 141.3 147.2 137.3 0.0001 0.678 D3 (Y ⫽ 2) drained soils. DC 168.1 174 164.1 0.0001 0.458 Prior to producing a multivariate model, collinearity was Z 170.9 176.7 166.9 0.0001 0.394 Tan␤ 126.7 132.6 122.7 0.0001 0.695 examined on the basis of parameter tolerance. Each predictor ␻ 185.5 191.3 181.5 0.0235 0.106 variable was regressed on all other variables in a weighted ␾ 188.3 194.2 184.3 0.1324 0.165 ␹ 184.5 190.4 180.5 0.0136 0.110 least square regression (i.e., adjusted by the weight matrix A 182.5 188.4 178.5 0.0045 0.432 used in the maximum likelihood algorithm). The tolerance ␸ 142.7 148.6 138.7 0.0001 0.636 ␺ 184.5 190.4 180.5 0.0137 0.273 was computed as (1 ⫺ R2 ) and variables with values below Multivariate 104.9 116.6 96.9 0.0001 0.798 0.4 were removed until a satisfactory combination was ob- POORMOD tained. The multivariate solution was fitted on the retained Univariate Intercept 159.4 162.4 157.4 · · variables using a stepwise approach where thresholds of the NDVI VIB5B4 161.2 158.0 167.3 164.1 157.2 154 0.6929 0.0663 0.050 0.180 residual ␹2 were 0.15 for entry and 0.10 for removal. The VIB7B5 154.8 160.8 150.8 0.0100 0.284 likelihood ratio statistic (⫺2 lnL), based on the maximum ⍀ 161.2 167.2 157.2 0.6274 ⫺0.05 likelihood estimates of Bk (where L ⫽ l [Bk]), has an asymptotic ␭ 155.1 161.1 151.1 0.0118 0.269 DC 138.0 144.0 133.9 0.0001 0.579 ␹2-distribution with p degrees of freedom (where p is the Z 121.0 127.0 117.0 0.0001 0.658 number of estimated parameters ␤) under the global null Tan␤ 160.2 166.3 156.2 0.2968 0.292 ␻ 160.3 166.4 156.3 0.2832 0.136 hypothesis that all parameters (␤) equal zero. The likelihood ␾ 161.3 167.4 157.3 0.7942 0.008 ratio statistic (⫺2 lnL) was used to examine the significance ␹ 160.5 166.6 156.5 0.3480 0.172 A 155.7 161.9 151.8 0.0179 0.063 of individual models and to compare competing models in ␸ 160.3 166.4 156.3 0.2904 0.122 conjunction with Akaike’s Information Criterion (AIC) and ␺ 155.2 164.3 154.3 0.0762 0.063 Multivariate 102.4 120.6 90.4 0.0001 0.818 Schwarz Criterion (SC). Both the AIC and SC are based VPOORMOD on the likelihood ratio statistic but also take the number of Univariate observations into account. The Wald test was obtained by Intercept 46.2 47.7 44.2 · · NDVI 39.1 42.0 35.1 0.0025 0.611 comparing the maximum likelihood of each individual slope VIB5B4 37.3 40.2 33.3 0.0009 0.639 parameter (␤) to an estimate of its standard error, and was VIB7B5 37.7 40.6 33.7 0.0011 0.647 ⍀ 39.6 42.5 35.6 0.0032 0.533 used to keep individual variables in the model applying a ␹2- ␭ 46.7 49.6 42.7 0.2178 0.012 criterion of P ⬍ 0.10. The exponent of the slope coefficients, DC 47.3 50.3 43.3 0.3451 0.568 Z 38.7 41.6 34.7 0.0020 0.647 the odds ratio (e␤ ), represents the percentage of change in Tan␤ 43.9 46.8 39.9 0.0365 0.302 the logit for a change of one unit of the independent variable ␻ 43.9 46.8 39.9 0.0369 0.488 ␾ 48.2 51.1 44.2 0.8699 0.296 (x ). The association between predicted probabilities and ob- ␹ 44.8 47.8 40.8 0.0646 0.409 served responses was examined using percentages of concor- A 43.3 46.3 39.4 0.0270 0.319 dant pairs, discordant pairs and the rank correlation index ␥ ␸ 42.1 45.0 38.1 0.0131 0.460 ␺ 47.2 50.2 43.2 0.3160 0.230 to summarize goodness-of-fit, according to: Multivariate 23.1 29.0 15.1 0.0001 0.929 IMPERFMOD C⫺D Univariate ␥⫽ [16] Intercept 171.1 174.0 169.1 · · C⫹D NDVI 172.1 177.7 168.1 0.3000 0.083 VIB5B4 169.7 175.3 165.4 0.0626 0.195 VIB7B5 171.5 177.1 167.5 0.2028 0.133 Where C is the number of concordant pairs and D is the ⍀ 170.5 176.1 166.5 0.1042 0.182 number of discordant pairs. A pair of observations is concor- ␭ 173.1 178.8 169.1 0.9286 ⫺0.04 dant if the predicted event probability is lower for an observa- DC 172.1 177.8 168.2 0.3201 0.129 Z 172.3 177.9 168.3 0.3485 0.095 tion with a lower-ordered value of the observed response. Tan␤ 169.1 174.7 165.1 0.0436 0.248 ␻ 171.5 177.1 167.5 0.2009 ⫺0.084 ␾ 172.4 178.1 168.5 0.4052 ⫺0.02 ␹ 171.5 177.1 167.5 0.2014 ⫺0.05 Logistic Model Validation A 172.8 178.4 168.8 0.5576 ⫺0.353 ␸ 171.1 176.7 167.1 0.1484 0.232 The models were used to compute predicted probabilities ␺ 172.4 178.0 168.4 0.3822 0.103 and to classify the 295 observations into one of the response Multivariate 164 175.3 156.0 0.0001 0.390 CAMPLING ET AL.: LOGISTIC MODELING TO PREDICT SOIL DRAINAGE CLASS PROBABILITY 1395 levels following the nested modeling procedure. A bias-adjusted between different univariate models indicated that ele- classification table was used to determine cut-off probabilities vation (Z) was the best variable to predict nonhydric (SAS, 1990). Threshold probabilities were determined for soils, followed closely by the DC (Table 2). Univariate each model on the basis of percentages correctly classified, false positive and false negative observations. For each model models using VIB5B4 and tan ␤ were significant but had the probability level at which a maximum percentage of cor- low measures of association (Table 2). These covariates rectly classified observations was obtained and used to classify were included in the multivariate model, after checking the validation dataset. Validation of the fitted models and for collinearity, which achieved a very high association respective cut-off probabilities was conducted by classifying value (␥ ⫽ 0.99). The odds were positively associated the validation observations according to the best performing (odds ratio ⬎1) with Z, DC, and tan␤, and were nega- multivariate models. Predictors incorporated into the best per- forming models are spatially referenced drainage-class deter- tively associated (odds ratio ⬍1) with VIB5B4 (Table minants. 3). The odds ratio of DC was 1.546, which meant that one unit (the unit being 1000 m, Table 3), increase in Drainage Class Probability Mapping distance from the river channel increased the odds by To produce drainage class probability maps Eq. [15] was 55% of the soil being nonhydric. The areas that were rewritten as: increasingly further away from the river channel were therefore more likely to have nonhydric soils. 1 EXCESSMOD (Fig. 4) predicted the probability of Pr(Y ⫽ 1|x) ⫽ [17] 1 ⫹ exp [(⫺1)(Bkxk)] excessively drained soil drainage class (D1). A compari- Using Arc/Info GRID (Environment Systems Research Insti- son between different univariate models (Table 2) indi- tute [ESRI], 2000), the logistic equation (Eq. [17]) was applied cated that slope (tan␤) was the best variable to predict with input from the analysis of maximum likelihood estimates excessively drained soils, followed closely by the com- for each multivariate binary logistic model to the gridded data pound topographic index (␭), the slope-aspect index sets of significant variables. The mapping procedure followed the nested modeling strategy (Fig. 4), so that, for example, (␸), and VIB5B4 (Table 2). As both ␭ and ␸ were the EXCESSMOD algorithm was only applied to the nonhy- collinear to tan␤, these covariates were not included dric soils area, determined by the NONHYDRICMOD algo- in the multivariate model. The covariates meeting the rithm. The probability thresholds resulting in the highest num- stepwise 0.15 entry, 0.10 exit criteria were tan␤, VIB5B4, ber of correctly predicted grid cells were used to define the and DC (Table 2). The odds were positively associated drainage class maps. To assess the overall accuracy of the (odds ratio ⬎1) with tan␤, and negatively associated drainage maps, Cohen’s kappa (␬) statistic (SAS, 1990) was calculated, which indicates the similarity between actual and (odds ratio ⬍1) with DC and VIB5B4 (Table 3). The predicted drainage classes. tan␤ odds ratio of 1.296 indicated that, as slopes in- creased there was a higher probability of excessively RESULTS drained soils occurring. The VIB5B4 and DC variables, on the other hand, showed that an increase of one unit Logistic Model Calibration of these variables would decrease the likelihood of ex- NONHYDRICMOD predicted the probability of cessively drained soils. All three variables, therefore, nonhydric soil drainage classes (Fig. 4). A comparison demonstrated the high association between the escarp- Table 3. Analysis of maximum likelihood estimates for multivariate binary logistic models predicting the probability of drainage classes (Odds ratio is e␤ for Z, ␸, ␭ and ␺, e0.01⫻␤ for Tan␤, VIB5B4, VIB7B5, and ␹, e1000⫻␤ for DC). Odds Odds Ratio Model Variable† ␤ SE (␤) Wald ␹2 P ⬎ ␹2 Ratio 95% CI NONHYDRICMOD Intercept ⫺15.207 3.2243 21.72 0.0001 · · Z 0.0289 0.00815 12.629 0.0004 1.029 1.013–1.046 Tan␤ 16.92 7.361 5.2865 0.0215 1.184 1.025–1.368 DC 0.00044 0.000187 5.4419 0.0197 1.546 1.072–2.31 VIB5B4 ⫺16.496 8.684 3.608 0.0575 0.848 0.715–1.005 EXCESSMOD Intercept ⫺2.0917 0.9817 4.5397 0.031 · · Tan␤ 25.91 6.93 13.945 0.0002 1.296 1.131–1.485 VIB5B4 ⫺14.45 3.932 13.516 0.0002 0.865 0.801–0.935 DC ⫺0.00016 0.000058 7.4264 0.0064 0.853 0.76–0.96 POORMOD Intercept 5.7018 1.6885 11.403 0.0007 · · Z ⫺0.0428 0.00973 19.3491 0.0001 0.958 0.940–0.977 VIB7B5 ⫺9.3547 4.0844 5.2456 0.0220 0.911 0.841–0.987 DC ⫺0.00095 0.000382 6.2152 0.0127 0.386 0.183–0.816 ␹ ⫺4.8828 2.253 4.697 0.0302 0.952 0.911–0.995 ␺ ⫺0.00606 0.00292 4.3113 0.0379 0.994 0.988–1.00 VPOORMOD Intercept 41.8 18.3354 5.2055 0.0225 · · Z ⫺0.0985 0.0431 5.228 0.0222 0.906 0.833–0.986 VIB7B5 29.96 12.7 5.58 0.0182 1.349 1.052–1.730 ␭ ⫺1.988 1.07 3.45 0.0632 0.137 0.017–1.116 IMPERFMOD Intercept 0.9591 0.7859 1.4892 0.2223 · · Tan␤ 16.2153 6.1586 6.9580 0.0083 1.176 1.043–1.327 ␸ ⫺0.0929 0.0392 5.6178 0.0178 0.911 0.844–0.984 VIB5B4 6.9974 4.0060 3.0511 0.0807 1.072 0.991–1.160 1396 SOIL SCI. SOC. AM. J., VOL. 66, JULY–AUGUST 2002 ment and excessively drained soils, and the plateau and the best variable to predict poorly and very poorly well drained soils. In addition, the residual hill slopes drained soils, followed closely by the DC. Aspect (␺), in both the sandstone and shale areas had a high proba- VIB7B5, and the compound topographic index (␭) also bility of being excessively drained. resulted in significant models but in all instances the POORMOD (Fig. 4) predicted the probability of association values (␥) were below 0.3 (Table 2). The poorly (D5) and very poorly drained (D6) soils com- covariates meeting the stepwise 0.15 entry, 0.10 exit bined together. A comparison between different univar- criteria were Z, VIB7B5, DC, ␹, and ␺. The odds were iate models (Table 2) indicated that elevation (Z) was all negatively associated (odds ratio ⬍1) (Table 3). The DC had the smallest odds, 0.386, which meant that a one unit increase in the distance from the river channel Table 4. Classification and validation of multivariate logistic models to predict drainage classes (ncal is the number of cali- decreased the odds by 61% of the soils being either bration observations, nval is the number of validation obser- poorly or very poorly drained. There was therefore a vations). high sensitivity in the degree of proximity to the river Prob Level Correct False ⫹ve False ⫺ve channel in predicting the wetter soils in the shale area. VPOORMOD (Fig. 4) predicted the probability of NONHYDRICMOD Classification ncal ⫽ 295 very poorly drained soils (D6). A comparison between different univariate models indicated that VIB7B5 was 0.45 96 1 2 0.5 96 1 2 the best variable to predict poorly drained soils, fol- 0.55 97 1 2 lowed closely by VIB5B4 and elevation (Z) (Table 2). 0.6 97 1 2 Validation nval ⫽ 72 After checking for collinearity, the covariates meeting the stepwise 0.15 entry, 0.10 exit criteria were Z, 0.45 97 3 0 0.5 97 3 0 VIB7B5, and ␭ (Table 2). The odds were positively 0.55 97 3 0 associated with VIB7B5 (odds ratio ⬎1) and negatively 0.6 97 3 0 EXCESSMOD associated with Z and ␭ (odds ratio ⬍1) (Table 3). The Classification ncal ⫽ 139 compound topographic index, ␭, had a very low odds 0.50 86 4 9 ratio, 0.137, which indicated that with a one unit increase 0.55 89 2 9 in ␭ there was a 86% decrease in the odds of very poorly 0.60 89 2 14 drained soils. High compound topographic index values 0.65 89 1 10 Validation nval ⫽ 31 are associated with river channels, and the soils on the 0.50 94 3 3 floodplain are better drained than the soils located on 0.55 94 3 3 the lower interfluve (Table 1). Thus relatively low ␭ 0.60 97 3 0 values were used in this model to associate the D6 soils 0.65 94 7 0 POORMOD with areas not in close proximity to the river channel. Classification ncal ⫽ 154 IMPERFMOD (Fig. 4) predicted the probability of 0.45 85 6 9 imperfectly drained soils (D4). A comparison between 0.5 83 6 12 different univariate models indicated that none of the 0.55 83 5 12 univariate models were particularly successful in pre- 0.6 82 5 14 Validation nval ⫽ 41 dicting the imperfectly drained soils, as no model 0.45 83 7 10 achieved a ␥ value ⬎0.26 (Table 2). The three covariates 0.5 83 7 10 included in the multivariate model were tan␤, ␸, and 0.55 85 5 10 0.6 83 5 12 VIB5B4. The odds were all positively associated (odds VPOORMOD ratio ⬎1) with tan␤ and VIB5B4 and negatively associ- Classification ncal ⫽ 32 ated (odds ratio ⬍1) with ␸ (Table 3), but the differences 0.55 84 9 6 with unity were ⬍20%, which indicated that none of 0.60 84 6 9 the variables in IMPERFMOD were sensitive. 0.65 84 6 9 0.70 88 3 9 Validation nval ⫽ 17 Logistic Model Validation 0.55 82 18 0 0.60 94 6 0 NONHYDRICMOD resulted in 97% correctly classi- 0.65 94 6 0 fied observations in the calibration data set (ncal ⫽ 0.70 88 6 6 295) at the 0.55 probability level (Table 4). At the 0.55 IMPERFMOD probability level the validation observations (nval ⫽ Classification ncal ⫽ 123 72), 97% were correctly classified. 0.45 62 20 19 0.50 65 13 22 EXCESSMOD resulted in 89% correctly classified 0.55 62 10 28 observations in the calibration set (ncal ⫽ 139) at the 0.60 57 7 37 0.65 probability level (Table 4). The validation observa- Validation nval ⫽ 24 tions (nval ⫽ 31), were 94% correctly classified, but a 0.45 71 17 13 0.50 71 13 17 higher performance (97%) could be achieved at the 0.60 0.55 63 13 25 probability level. 0.60 54 8 38 POORMOD correctly classified 85% of the calibra- CAMPLING ET AL.: LOGISTIC MODELING TO PREDICT SOIL DRAINAGE CLASS PROBABILITY 1397 Fig. 5. (a) Nonhydric and hydric soil drainage classes map and (b) three-soil drainage classes map. tion data set (ncal ⫽ 154) at the 0.45 probability level, 17) were 88% correctly classified, although a higher and the validation observations (nval ⫽ 41) were 83% performance could be achieved (94%) at the 0.60 and correctly classified. There were 6% false positive classi- 0.65 probability level. fications and 9% false negative classifications in the IMPERFMOD only achieved a 65% correct classifi- calibration set, and in the validation set there were 7% cation for the calibration observations (ncal ⫽ 123) at false negative classifications and 10% false positive clas- the 0.50 probability level, and a 71% correct classifica- sifications. tion for the validation observations (nval ⫽ 24). The VPOORMOD correctly classified 88% at a 0.70 prob- high false positive (13%) and false negative (22%) re- ability level, and the validation observations (nval ⫽ sults indicated that there was higher degree of confusion 1398 SOIL SCI. SOC. AM. J., VOL. 66, JULY–AUGUST 2002 Fig. 5c. Six-soil drainage classes map. between the drainage classes than other models where model performed with increasing complexity. There was the false positive and negative percentages remained a decrease in mapping accuracy from 91 to 70%, as mostly below 10% (Table 4). more drainage classes were introduced, because of the propagation of model errors. The approach, however, Drainage Class Probability Map allowed the modeler to assess at what degree of com- Drainage class probability maps were derived using plexity the model was reliable. the model parameters of the significant variables (Table The statistical analysis assessed the explanatory 3), and the probability levels determined by the unbi- power of 14 variables individually (univariate analysis) ased classification tables (Table 4). Maps were com- and collectively (multivariate analysis). Distance-to-the- bined to produce drainage class maps at different stages river channel was clearly an important explanatory vari- in the modeling procedure for the entire study area able (Fig. 6a), as it featured in three of the five best (Fig. 5). The nonhydric–hydric drainage map (Fig. 5a), multivariate models. It was particularly important in achieved a kappa (␬) of 0.918 (ASE ⫽ 0.021). The map- the NONHYDRICMOD and POORMOD models, as ping of the nonhydric and hydric soils was least clear it had odds ratios of 1.546 and 0.386, respectively. Al- at the boundary between the escarpment and the Cross though DC is a proximity attribute, it can be regarded River Plains. The model was able to map the excessively as a surrogate terrain variable as it integrates the terrain drained hill slopes on the Cross River Plains. The three and landform information inherent to the measure of drainage classes map (Fig. 5b) had a ␬ of 0.818 (ASE ⫽ river channel proximity. Elevation (Z) and Slope (tan␤) 0.024). The wettest soils (D5D6) were clearly mapped were the most important terrain variables (Fig. 6b,c) as being in closer proximity to the main river channels reflecting clearly the role that landscape plays in distin- than the D3D4 soils. The area of wettest soils widened guishing between drainage classes. The study site is out below the confluence of the River Ebonyi and River highly heterogeneous, having both sandstone and shale Amanyi. The six drainage class map had a ␬ of 0.705 lithologies, which have undergone a prolonged period (ASE ⫽ 0.025), which meant that over 70% of predicted of denudation since the Tertiary. This accounts for the drainage classes were correct. major contrasts in landform features, and hence the important role that primary terrain attributes such as elevation and slope have made in the analysis. DISCUSSION At least one vegetation index features in all of the The nested dichotomous modeling strategy provided multivariate models, which supports the observation a stepwise approach to mapping the drainage classes of that vegetation can assist in indicating the likelihood of the study area, and brought insights into how well the a drainage class. VIB7B5 (Fig. 6d) is more significant CAMPLING ET AL.: LOGISTIC MODELING TO PREDICT SOIL DRAINAGE CLASS PROBABILITY 1399 Fig. 6. (a) Distance-to-channel grid and (b) elevation grid. than VIB5B4 in predicting the wetter soils as it occurred entiate between landscapes where soil water converges in the POORMOD and VPOORMOD models. or diverges. The third derivative terrain models (␻, ␾, and ␹) did not perform well in predicting drainage classes. This CONCLUSIONS may be due to the resolution of the DEM (based on a 1:50 000 topographic map), which does not provide Logistic modeling provided an explicitly quantitative enough terrain slope information to adequately differ- approach to predict the spatial occurrence of soil drain- 1400 SOIL SCI. SOC. AM. J., VOL. 66, JULY–AUGUST 2002 Fig. 6. (c) Slope grid (tan␤) and (d) Vegetation Index grid (VIB7B5). age classes, by use of digital terrain attributes and vege- to produce drainage class maps, and enabled the step- tation indices from the LANDSAT-5 Thematic Mapper. wise nature of the modeling procedure to be monitored. A nested dichotomous modeling strategy was adopted The model distinguishing between the nonhydric CAMPLING ET AL.: LOGISTIC MODELING TO PREDICT SOIL DRAINAGE CLASS PROBABILITY 1401 (D1D2) and hydric (D3D4D5D6) soils performed best soil morphology: Methods and application in soil-landscape classifi- cation. Soil Sci. Soc. Am. J. 64:1423–1433. (␬ ⫽ 0.91), followed by the three-drainage class model Hosmer, D.W., and S. Lemeshow. 1989. Applied logistic regression. (D1D2, D3D4, D5D6) (␬ ⫽ 0.818), and then the six- Wiley series in probability and mathematical statistics. Wiley and drainage model (␬ ⫽ 0.704). The most important explan- Sons, New York. atory variables were DC, elevation (Z), slope (tan␤), Hurt, G.W., and R.B. Brown. 1995. Development and application of compound topographic index (␭) and two vegetation hydric soil indicators in Florida. Wetlands 15:74–81. Hutchinson, M.F. 1989. A new procedure for gridding elevation and indices (VIB7B5 and VIB5B4). In all cases, the multi- stream line data with automatic removal of spurious pits. J. Hy- variate models, which combined terrain and vegetation drol. 106:211–232. index information, performed better than the univariate Jenson, S.K., and J.O. Dominigue. 1988. Extracting topographic struc- models. It has been shown, therefore, that digital terrain ture from digital elevation data for geographic information system analysis. Photogramm. Eng. Remote Sens. 54:1593–1600. models and satellite information have a complementary Kirkby, M.J. 1975. Hydrograph modelling strategies. p. 69–90. In R. role as explanatory variables to predict drainage classes Peel et al. (ed.) Progress in physical and human geography. Heine- in humid tropical environments. mann, London. Landon, J.R. (ed.) 1991. Booker tropical soil manual. A handbook for soil survey and agricultural land evaluation in the tropics and ACKNOWLEDGMENTS the subtropics. Longman Scientific and Technical, UK and Booker Tate, UK. The cooperation of the farmers and villagers in the Nsukka Levine, E., R. Knox, and W. Lawrence. 1994. Relationships between Agricultural Zone who permitted us to carry out the land and soil properties and vegetation at the Northern Experimental Forest, soil surveying on their land is greatly appreciated. We are Howland, Maine. Remote Sens. Environ. 47:231–241. indebted to our interpreter Raymond Agbo. The study was Mather, P.M. 1995. Computer processing of remotely sensed images. part of the KU Leuven–UNN Inter University Project–Water An introduction. John Wiley and Sons, Chichester, UK. resources development for domestic use and small-scale irriga- Merot, Ph., B. Ezzahar, C. Walter, and P. Aurousseau. 1995. Mapping tion, funded by the Belgian Agency for Development Cooper- waterlogging of soils using digital terrain models. Hydro. Proc. 9: ation (BADC). 27–34. Moore, I.D., G.J. Burch, and D.H. Mackenzie. 1988. Topographic effects on the distribution of surface soil water and the location REFERENCES of ephemeral gullies. Trans. ASAE 31:1098–1107. Moore, I.D., R.B. Grayson, and A.R. Ladson. 1991. Digital terrain Bell, J.C., R.L. Cunningham, and M.W. Havens. 1994. Soil drainage modelling: a review of hydrological, geomorphological, and biologi- class probability mapping using a soil-landscape model. Soil Sci. cal applications. Hydro. Proc. 5:3–30. Soc. Am. J. 58:464–470. Moore, I.D., P.E. Gessler, G.A. Nielsen, and G.A. Peterson. 1993. Brinkman, R., and W.A. Blokhuis. 1985.Classification of the soils In Soil attribute prediction using terrain analysis. Soil Sci. Soc. Am. A.S.R. Juo and J.A. Lowe (ed.) The wetlands and rice in subsaharan J. 57:443–452. Africa. Proc. Int. Conf. On wetland utilization for rice production O’Loughlin, E.M. 1981. Saturation regions in catchments and their in subsaharan Africa, 4–8 Nov. 1985. IITA, Ibadan, Nigeria. relations to soil and topographic properties. J. Hydrol. 53:229–246. Cialella, A.T., R. Dubayah, W. Lawrence, and E. Levine. 1997. Pre- Sader, S.A., D. Ahl, and W.S. Liou. 1995. Accuracy of LANDSAT- dicting soil drainage class using remotely sensed and digital eleva- TM and GIS rule-based methods for forest wetland classification tion data. Photogramm. Eng. Remote Sens. 63:171–178. in Maine. Remote Sens. Environ. 53:133–144. ESRI. 1996. Arc/Info Version 7.1.1. Environ. Sys. Res. Inst., Red- SAS Institute. 1990. SAS/STAT user’s guide, version 6. Fourth edition, lands, CA. Vol. 1 and 2. SAS Institute, Cary, NC. Food and Agricultural Organization (FAO). 1990. Guidelines for soil Soil Survey Staff. 1998. Keys to soil taxonomy. 8th edition U.S. Gov. profile description. FAO, AGLS, Rome, Italy. Print. Office, Washington, DC. FAO. 1998. The world reference base for soil resources. World Soil Thompson, J.A., J.C. Bell, and C.A. Butler. 1997. Quantitative soil- Resources Reports 84. ISSS/AISS/IBG/ISRIC/FAO, Rome, Italy. landscape modeling for estimating the areal extent of hydromorphic Field Register. 1994. Changes in hydric soil of the United States. soils. Soil Sci. Soc. Am. J. 61:971–980. Federal Register, 13 July, Washington, DC. Thompson, J.A., J.C. Bell, and C.W. Zanner. 1998. Hydrology and Gessler, P.E., I.D. Moore, N.J. Mckenzie, and P.J. Ryan. 1995. Soil- hydric soil extent within a mollisol catena in southeastern Minne- landscape modelling and spatial prediction of soil attributes. Int. sota. Soil Sci. Soc. Am. J. 62:1126–1133. J. Geograph. Info. Syst. 9(4):421–432. Tucker, C.J. 1979. Red and photographic infrared linear combinations Gobin, A., P. Campling, J. Deckers, and J. Feyen. 2000a. Integrated for monitoring vegetation. Remote Sens. Environ. 10:127–150. toposequence analyses to combine local and scientific knowledge Zevenbergen, L.W., and C.R. Thorne. 1987. Quantitative analysis of systems. Geoderma 97:103–123. land surface topography. Earth Surf. Processes Landforms 12:47– Gobin, A., P. Campling, J. Deckers, and J. Feyen. 2000b. Quantifying 56.

(PDF) Logistic Modeling to Spatially Predict the Probability of Soil Drainage Classes