# Working with indices

Indices calculated from multispectral satellite imagery are a powerful way to analyze these data quantitatively. They take advantage of the different spectral properties of materials to differentiate between them. Many of these indices can be calculated with simple arithmetic operations, so now that our data are in xarray.Dataset objects, it’s fairly easy to calculate them. As an example, we’ll use two scenes from before and after the Brumadinho tailings dam disaster to try to image and quantify the total area flooded by the dam collapse.

Trigger warning

This tutorial uses data from the tragic Brumadinho tailings dam disaster, in which over 250 people lost their lives. We use this dataset to illustrate the usefulness of remote sensing data for monitoring such disasters, but we want to acknowledge its tragic human consequences. Some readers may find this topic disturbing and may not wish to read further.

First, we must import the required packages, download our two sample scenes, and load them with xlandsat.load_scene:

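import xlandsat as xls
import matplotlib.pyplot as plt

# Download the two sample scenes and load them
before = xls.load_scene(xls.datasets.fetch_brumadinho_before())
after = xls.load_scene(xls.datasets.fetch_brumadinho_after())
after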

<xarray.Dataset>
Dimensions:   (easting: 400, northing: 300)
Coordinates:
* easting   (easting) float64 5.844e+05 5.844e+05 ... 5.963e+05 5.964e+05
* northing  (northing) float64 -2.232e+06 -2.232e+06 ... -2.223e+06 -2.223e+06
Data variables:
blue      (northing, easting) float16 0.0686 0.07043 ... 0.05823 0.0564
green     (northing, easting) float16 0.1027 0.09839 ... 0.07593 0.07043
red       (northing, easting) float16 0.09778 0.09778 ... 0.06799 0.06177
nir       (northing, easting) float16 0.2988 0.2715 0.2881 ... 0.2637 0.251
swir1     (northing, easting) float16 0.2311 0.2274 0.2316 ... 0.1608 0.142
swir2     (northing, easting) float16 0.145 0.1442 0.144 ... 0.09961 0.08655
Attributes: (12/19)
Conventions:                CF-1.8
title:                      Landsat 8 scene from 2019-01-30 (path/row=218...
digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
origin:                     Image courtesy of the U.S. Geological Survey
landsat_product_id:         LC08_L2SP_218074_20190130_20200829_02_T1
processing_level:           L2SP
...                         ...
ellipsoid:                  WGS84
date_acquired:              2019-01-30
scene_center_time:          12:57:09.1851220Z
wrs_path:                   218
wrs_row:                    74
mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

Let’s make RGB composites to get a sense of what these two scenes contain:

rgb_before = xls.composite(before, rescale_to=(0.03, 0.2))
rgb_after = xls.composite(after, rescale_to=(0.03, 0.2))

fig, axes = plt.subplots(2, 1, figsize=(10, 12), layout="tight")
for ax, rgb in zip(axes, [rgb_before, rgb_after]):
    rgb.plot.imshow(ax=ax)
    ax.set_title(rgb.attrs["title"])
    ax.set_aspect("equal")
plt.show()

The dam is located at around 592000 east and -2225000 north. The after scene clearly shows all of the red mud that flooded the region to the southwest of the dam. Notice also the red tinge of the Paraopeba River in the after image as it was contaminated by the mud flow.

# NDVI

We can calculate the NDVI for these scenes to see if we can isolate the effect of the flood following the dam collapse. NDVI highlights vegetation, which we assume will have decreased in the after scene due to the flood. NDVI is defined as:

$NDVI = \dfrac{NIR - Red}{NIR + Red}$

which we can calculate with xarray as:

ndvi_before = (before.nir - before.red) / (before.nir + before.red)
ndvi_before

<xarray.DataArray (northing: 300, easting: 400)>
array([[0.4604, 0.4507, 0.5728, ..., 0.545 , 0.594 , 0.554 ],
[0.445 , 0.478 , 0.479 , ..., 0.628 , 0.5664, 0.4626],
[0.4734, 0.494 , 0.4739, ..., 0.6387, 0.596 , 0.588 ],
...,
[0.5044, 0.5186, 0.4983, ..., 0.527 , 0.4824, 0.474 ],
[0.4883, 0.47  , 0.4724, ..., 0.525 , 0.476 , 0.4553],
[0.5024, 0.476 , 0.4934, ..., 0.479 , 0.4536, 0.458 ]],
dtype=float16)
Coordinates:
* easting   (easting) float64 5.835e+05 5.835e+05 ... 5.954e+05 5.955e+05
* northing  (northing) float64 -2.232e+06 -2.232e+06 ... -2.223e+06 -2.223e+06

Now we can do the same for the after scene:

ndvi_after = (after.nir - after.red) / (after.nir + after.red)
ndvi_after

<xarray.DataArray (northing: 300, easting: 400)>
array([[0.5073, 0.4705, 0.4907, ..., 0.515 , 0.4946, 0.4565],
[0.4277, 0.478 , 0.4817, ..., 0.4949, 0.5166, 0.5474],
[0.37  , 0.4304, 0.441 , ..., 0.522 , 0.5786, 0.6226],
...,
[0.57  , 0.594 , 0.595 , ..., 0.328 , 0.462 , 0.54  ],
[0.5845, 0.5796, 0.583 , ..., 0.4995, 0.566 , 0.572 ],
[0.61  , 0.5664, 0.5938, ..., 0.609 , 0.5903, 0.605 ]],
dtype=float16)
Coordinates:
* easting   (easting) float64 5.844e+05 5.844e+05 ... 5.963e+05 5.964e+05
* northing  (northing) float64 -2.232e+06 -2.232e+06 ... -2.223e+06 -2.223e+06
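As a quick sanity check of the formula, the sketch below recomputes NDVI for a single pixel in plain Python, using the first nir and red values from the after scene printout near the top of this page. The helper name ndvi_from_bands is made up for this illustration and is not part of xlandsat.

```python
def ndvi_from_bands(nir, red):
    """NDVI from near-infrared and red reflectance (hypothetical helper)."""
    return (nir - red) / (nir + red)


# First-pixel reflectances read off the "after" scene printout (rounded values)
value = ndvi_from_bands(nir=0.2988, red=0.09778)
print(f"{value:.3f}")  # prints 0.507
```

The result should land close to the first entry of the ndvi_after array (0.5073). A small mismatch is expected because the printed reflectances are rounded float16 values.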

Next, we add some metadata so that xarray can label our plots automatically:

for ndvi in [ndvi_before, ndvi_after]:
    ndvi.attrs["long_name"] = "normalized difference vegetation index"
    ndvi.attrs["units"] = "dimensionless"
ndvi_before.attrs["title"] = "NDVI before"
ndvi_after.attrs["title"] = "NDVI after"


Now we can make pseudo-color plots of the NDVI from before and after the disaster:

fig, axes = plt.subplots(2, 1, figsize=(10, 12), layout="tight")
for ax, ndvi in zip(axes, [ndvi_before, ndvi_after]):
    # Limit the scale to [-1, +1] so the plots are easier to compare
    ndvi.plot(ax=ax, vmin=-1, vmax=1, cmap="RdBu_r")
    ax.set_title(ndvi.attrs["title"])
    ax.set_aspect("equal")
plt.show()

# Tracking differences

An advantage of having our data in xarray.DataArray format is that we can perform coordinate-aware calculations. This means that taking the difference between our two arrays will take into account the coordinates of each pixel and only perform the operation where the coordinates align.

We can calculate the change in NDVI from one scene to the other by taking the difference:

ndvi_change = ndvi_before - ndvi_after

ndvi_change.name = "ndvi_change"
ndvi_change.attrs["long_name"] = "NDVI change"
ndvi_change.attrs["title"] = (
    f"NDVI change between {before.attrs['date_acquired']} and "
    f"{after.attrs['date_acquired']}"
)
ndvi_change

<xarray.DataArray 'ndvi_change' (northing: 300, easting: 370)>
array([[ 0.05908 ,  0.06323 ,  0.0542  , ...,  0.004395, -0.009766,
0.07324 ],
[ 0.0498  ,  0.07764 ,  0.07495 , ...,  0.0957  ,  0.012695,
0.003906],
[-0.010254,  0.11743 ,  0.0747  , ...,  0.03125 ,  0.01807 ,
0.04004 ],
...,
[-0.000977,  0.01123 ,  0.000977, ...,  0.00928 ,  0.01367 ,
0.00708 ],
[ 0.00879 ,  0.02344 ,  0.01318 , ...,  0.006836,  0.00586 ,
0.001221],
[-0.01221 ,  0.02637 ,  0.006836, ...,  0.01343 ,  0.01221 ,
0.0105  ]], dtype=float16)
Coordinates:
* easting   (easting) float64 5.844e+05 5.844e+05 ... 5.954e+05 5.955e+05
* northing  (northing) float64 -2.232e+06 -2.232e+06 ... -2.223e+06 -2.223e+06
Attributes:
long_name:  NDVI change
title:      NDVI change between 2019-01-14 and 2019-01-30

Did you notice?

The keen-eyed among you may have noticed that the number of points along the "easting" dimension has decreased. This is because xarray only makes the calculations for pixels where the two scenes coincide. In this case, there was an East-West shift between scenes but our calculations take that into account.
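The alignment behaviour described above can be reproduced on a toy example. The sketch below builds two made-up one-dimensional arrays whose easting coordinates are offset by one 30 m pixel and subtracts them. All values and variable names are invented for illustration, and the example relies on xarray's default inner join for arithmetic between arrays.

```python
import numpy as np
import xarray as xr

# Two hypothetical 1D "scenes" on 30 m grids, shifted by one pixel
toy_before = xr.DataArray(
    np.array([0.6, 0.6, 0.7, 0.5]),
    coords={"easting": [0.0, 30.0, 60.0, 90.0]},
    dims="easting",
)
toy_after = xr.DataArray(
    np.array([0.6, 0.1, 0.5, 0.4]),
    coords={"easting": [30.0, 60.0, 90.0, 120.0]},
    dims="easting",
)

# Only the coordinates present in both arrays survive the subtraction
toy_change = toy_before - toy_after
print(toy_change.easting.values)
print(toy_change.sizes["easting"])
```

Each input has four points but the result has three, which is the same effect that shrank the easting dimension of ndvi_change.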

Now let’s plot the difference:

fig, ax = plt.subplots(1, 1, figsize=(10, 6))
ndvi_change.plot(ax=ax, vmin=-1, vmax=1, cmap="PuOr")
ax.set_aspect("equal")
ax.set_title(ndvi_change.attrs["title"])
plt.show()

There’s some noise in the cloudy areas of both scenes in the northeast, but otherwise this plot highlights the area affected by flooding from the dam collapse in purple at the center.

# Estimating area

One thing we can do with indices and their differences in time is calculate area estimates. If we know that the region of interest has index values within a given range, the area can be calculated by counting the number of pixels within that range (a pixel in Landsat 8/9 scenes is 30 x 30 = 900 m²).

First, let’s slice our NDVI difference to just the flooded area to avoid the effect of the clouds in the north. We’ll use the xarray.DataArray.sel method to slice using the UTM coordinates of the scene:

flood = ndvi_change.sel(
    easting=slice(587000, 594000),
    northing=slice(-2230000, -2225000),
)

fig, ax = plt.subplots(1, 1, figsize=(10, 6))
flood.plot(ax=ax, vmin=-1, vmax=1, cmap="PuOr")
ax.set_aspect("equal")
plt.show()

Now we can create a mask of the flood area by selecting pixels that have a high NDVI difference. Using a > comparison (or any other comparison operator in Python), we can create a boolean (True or False) xarray.DataArray as our mask:

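# Threshold value determined by trial-and-error (0.3 is an assumed value)
flood_mask = flood > 0.3
flood_mask.attrs["long_name"] = "flood mask"
flood_mask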


<xarray.DataArray 'ndvi_change' (northing: 167, easting: 234)>
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]])
Coordinates:
* easting   (easting) float64 5.87e+05 5.87e+05 ... 5.94e+05 5.94e+05
* northing  (northing) float64 -2.23e+06 -2.23e+06 ... -2.225e+06 -2.225e+06
Attributes:
long_name:  flood mask

Plotting boolean arrays will use 1 to represent True and 0 to represent False:

fig, ax = plt.subplots(1, 1, figsize=(10, 6))
flood_mask.plot(ax=ax)
ax.set_aspect("equal")
plt.show()

Notice that our mask isn’t perfect. There are little blobs classified as flood pixels that are clearly outside the flood region. For more sophisticated analysis, see the image segmentation methods in scikit-image.

Counting the number of True values is as easy as adding all of the boolean values (remember that True corresponds to 1 and False to 0), which we’ll do with xarray.DataArray.sum:

flood_pixels = flood_mask.sum().values
print(flood_pixels)

2095


Note

We use .values above because sum returns an xarray.DataArray with a single value instead of the actual number. This is usually not a problem but it looks ugly when printed, so we grab the number with .values.

Finally, the flood area is the number of pixels multiplied by the area of each pixel (30 x 30 m²):

flood_area = flood_pixels * 30**2

print(f"Flooded area is approximately {flood_area:.0f} m²")

Flooded area is approximately 1885500 m²


Values in m² are difficult to imagine, so a good way to communicate these numbers is to put them into real-life context. In this case, we can use football pitches as a unit that many people will understand, taking one pitch to be 105 x 68 = 7140 m²:

flood_area_pitches = flood_area / 7140

print(f"Flooded area is approximately {flood_area_pitches:.0f} football pitches")

Flooded area is approximately 264 football pitches


Warning

This is a very rough estimate! The final value will vary greatly if you change the threshold used to generate the mask (try it yourself). For a more thorough analysis of the disaster using remote-sensing data, see Silva Rotta et al. (2020).
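To get a feel for how sensitive the estimate is, the sketch below repeats the count-and-multiply calculation for a few thresholds. The array holds made-up NDVI change values rather than the Brumadinho data, and the helper flooded_area is a hypothetical name introduced only for this illustration.

```python
import numpy as np

PIXEL_AREA_M2 = 30 * 30  # footprint of one Landsat 8/9 pixel in m²


def flooded_area(change, threshold):
    """Area in m² of the pixels whose index change exceeds the threshold."""
    return int((change > threshold).sum()) * PIXEL_AREA_M2


# Synthetic NDVI change values, invented purely for illustration
change = np.array([
    [0.05, 0.10, 0.45, 0.50],
    [0.02, 0.35, 0.60, 0.25],
    [0.01, 0.28, 0.33, 0.08],
])

for threshold in (0.2, 0.3, 0.4):
    print(f"threshold={threshold}: {flooded_area(change, threshold)} m²")
```

Even on this tiny array, moving the threshold from 0.2 to 0.4 more than halves the estimated area. The same function should also accept the flood array from above, since it only uses a comparison and a sum.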

# Other indices

Calculating other indices will follow a very similar strategy to NDVI since most of them only involve arithmetic operations on different bands. As an example, let’s calculate and plot the Modified Soil Adjusted Vegetation Index (MSAVI) for our two scenes:

import numpy as np

# This time, use a loop and put them in a list to avoid repeated code
msavi_collection = []
for scene in [before, after]:
    msavi = (
        (
            2 * scene.nir + 1 - np.sqrt(
                (2 * scene.nir + 1) ** 2 - 8 * (scene.nir - scene.red)
            )
        ) / 2
    )
    msavi.name = "msavi"
    msavi.attrs["long_name"] = "modified soil adjusted vegetation index"
    msavi.attrs["units"] = "dimensionless"
    msavi.attrs["title"] = scene.attrs["title"]
    msavi_collection.append(msavi)

# Plotting is mostly the same
fig, axes = plt.subplots(2, 1, figsize=(10, 12), layout="tight")
for ax, msavi in zip(axes, msavi_collection):
    msavi.plot(ax=ax, vmin=-0.5, vmax=0.5, cmap="RdBu_r")
    ax.set_title(msavi.attrs["title"])
    ax.set_aspect("equal")
plt.show()

With this same logic, you could calculate NBR and dNBR, other variants of NDVI, NDSI, etc.
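As a rough illustration of that logic, many of these indices share the same normalized difference form and differ only in the bands used. The sketch below assumes the commonly cited band pairings (NIR and SWIR2 for NBR, green and SWIR1 for NDSI) and uses single-pixel reflectances read from the after scene printout. The helper normalized_difference is a hypothetical name, not an xlandsat function.

```python
def normalized_difference(band_a, band_b):
    """Generic (a - b) / (a + b) index (hypothetical helper)."""
    return (band_a - band_b) / (band_a + band_b)


# First-pixel reflectances from the "after" scene printout (rounded values)
green, nir, swir1, swir2 = 0.1027, 0.2988, 0.2311, 0.145

# Normalized Burn Ratio: NIR against SWIR2
nbr = normalized_difference(nir, swir2)
# Normalized Difference Snow Index: green against SWIR1
ndsi = normalized_difference(green, swir1)

print(f"NBR = {nbr:.3f}, NDSI = {ndsi:.3f}")
```

Because the helper only uses arithmetic operators, it should work unchanged on whole bands such as scene.nir and scene.swir2. dNBR would then be the NBR of the earlier scene minus the NBR of the later one.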