Loading and cropping scenes#

One of the main features of xlandsat is being able to read scenes downloaded from USGS EarthExplorer along with all of the associated metadata. EarthExplorer allows you to download scenes in two main formats:

  1. As a single .tar file containing .TIF files for the bands and a file ending in MTL.txt with the metadata.

  2. As individual .TIF and metadata files.

We support reading from both formats so you don’t have really have to do much after downloading the scene.

In this tutorial, we’ll be using some of our sample datasets instead of actual full scenes from EarthExplorer. This is mostly so we don’t have to use the large (~1Gb) files, which can take a bit of time to download and load. The scenes we use have been cropped to make them smaller. But everything we do here is exactly the same when you use it on full scenes from EarthExplorer.

Downloading scenes from EarthExplorer

New to EarthExplorer? Watch this tutorial on how to use the service and download scenes that you can use with xlandsat: https://www.youtube.com/watch?v=Wn_G4fvitV8

As always, we’ll start by importing xlandsat and other libraries we’ll use:

import xlandsat as xls
import matplotlib.pyplot as plt
import pathlib

Note

All of this works for Collection 2 Level 2 and Level 1 scenes.

Load a scene from a .tar archive#

If you downloaded a full scene from EarthExplorer in a .tar archive, xlandsat can load the data from the archive directly. You don’t have to unpack it yourself and xlandsat reads everything in it by default. This is usually the easiest way to work with these data but the downside is that the archive can be quite large, particularly if you don’t need all of the different bands (see below for an alternative).

To simulate this use case, let’s download the .tar archive for one of our sample scenes using xlandsat.datasets.fetch_liverpool:

path_to_archive = xls.datasets.fetch_liverpool()
print(path_to_archive)
/home/runner/.cache/xlandsat/v0.5.0/LC08_L2SP_204023_20200927_20201006_02_T1-cropped.tar.gz

This will download the .tar archive to your computer and return the path to the file.

Note

Our sample data come in .tar.gz archives, which have been compressed (hence the .gz to save space and bandwidth. But all our functions work the same with .tar or .tar.gz archives.

To load a scene directly from the archive, use xlandsat.load_scene with the path to the archive file, which can be a string or a pathlib.Path:

scene = xls.load_scene(path_to_archive)
scene
<xarray.Dataset> Size: 2MB
Dimensions:          (easting: 433, northing: 267)
Coordinates:
  * easting          (easting) float64 3kB 4.87e+05 4.87e+05 ... 5e+05 5e+05
  * northing         (northing) float64 2kB 5.922e+06 5.922e+06 ... 5.93e+06
Data variables:
    coastal_aerosol  (northing, easting) float16 231kB 0.06238 ... 0.0769
    blue             (northing, easting) float16 231kB 0.0708 ... 0.08301
    green            (northing, easting) float16 231kB 0.08618 ... 0.09753
    red              (northing, easting) float16 231kB 0.06824 0.06824 ... 0.116
    nir              (northing, easting) float16 231kB 0.04553 0.04553 ... 0.173
    swir1            (northing, easting) float16 231kB 0.04626 ... 0.2213
    swir2            (northing, easting) float16 231kB 0.047 0.04663 ... 0.1686
    thermal          (northing, easting) float16 231kB 287.0 287.0 ... 290.5
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

The scene contains all of the bands available in the archive and has metadata populated from the MTL.txt file. Notice also that the values for each band have been converted to surface reflectance and surface temperature automatically.

Load a scene from a folder#

If you don’t need all of the bands, you can save space by downloading only the .TIF files that you need from EarthExplorer. Once you do that, place the .TIF files and the associated MTL.txt file (don’t forget it) in the same folder. It’s important that a folder can only contain a single scene, so if you’re working with multiple scenes you’ll have to place them in different folders.

Let’s simulate this use case by telling fetch_liverpool to unpack the archive and give us the path to the folder instead of the archive:

path_to_folder = xls.datasets.fetch_liverpool(untar=True)
print(path_to_folder)
/home/runner/.cache/xlandsat/v0.5.0/LC08_L2SP_204023_20200927_20201006_02_T1-cropped.tar.gz.untar

Notice that there is now a .untar at the end of the name, indicating that this is the folder where the archive has been unpacked. We can use the pathlib module from the Python standard library to list all of the files that are in this folder:

path_to_folder = pathlib.Path(path_to_folder)
print(f"This is indeed a folder: {path_to_folder.is_dir()}")
print("Folder contents:")
for file in path_to_folder.iterdir():
    print(f"  {file.name}")
This is indeed a folder: True
Folder contents:
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B1.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B7.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B6.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B4.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_MTL.txt
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B5.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B2.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_SR_B3.TIF
  LC08_L2SP_204023_20200927_20201006_02_T1_ST_B10.TIF

As you can see, the band .TIF files are there as well as the MTL.txt file. Now that we have the path to a folder that has these files, we can pass it to xlandsat.load_scene and it will do its job:

scene = xls.load_scene(path_to_folder)
scene
<xarray.Dataset> Size: 2MB
Dimensions:          (easting: 433, northing: 267)
Coordinates:
  * easting          (easting) float64 3kB 4.87e+05 4.87e+05 ... 5e+05 5e+05
  * northing         (northing) float64 2kB 5.922e+06 5.922e+06 ... 5.93e+06
Data variables:
    coastal_aerosol  (northing, easting) float16 231kB 0.06238 ... 0.0769
    blue             (northing, easting) float16 231kB 0.0708 ... 0.08301
    green            (northing, easting) float16 231kB 0.08618 ... 0.09753
    red              (northing, easting) float16 231kB 0.06824 0.06824 ... 0.116
    nir              (northing, easting) float16 231kB 0.04553 0.04553 ... 0.173
    swir1            (northing, easting) float16 231kB 0.04626 ... 0.2213
    swir2            (northing, easting) float16 231kB 0.047 0.04663 ... 0.1686
    thermal          (northing, easting) float16 231kB 287.0 287.0 ... 290.5
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

Notice that this is the same result we had before.

Hint

Only the .TIF files present will be loaded by xlandsat.load_scene. So you don’t need some of them, don’t include them in the folder.

The scene, bands, and metadata#

The scene itself is a xarray.Dataset that contains:

  1. easting and northing dimensions which are the UTM coordinates of the pixels (in meters).

  2. Several data variables that each represent a band. The bands are referenced by name, not by number. Each band is a xarray.DataArray that has the same dimensions as the scene.

  3. Missing values in the scene (either from the padding or out-of-bounds pixels) are represented by numpy.nan.

  4. Metadata for the scene, each dimension, and each band.

Placing a xarray.Dataset or xarray.DataArray at the end of a Jupyter notebook cell will display a nice preview of the contents:

scene
<xarray.Dataset> Size: 2MB
Dimensions:          (easting: 433, northing: 267)
Coordinates:
  * easting          (easting) float64 3kB 4.87e+05 4.87e+05 ... 5e+05 5e+05
  * northing         (northing) float64 2kB 5.922e+06 5.922e+06 ... 5.93e+06
Data variables:
    coastal_aerosol  (northing, easting) float16 231kB 0.06238 ... 0.0769
    blue             (northing, easting) float16 231kB 0.0708 ... 0.08301
    green            (northing, easting) float16 231kB 0.08618 ... 0.09753
    red              (northing, easting) float16 231kB 0.06824 0.06824 ... 0.116
    nir              (northing, easting) float16 231kB 0.04553 0.04553 ... 0.173
    swir1            (northing, easting) float16 231kB 0.04626 ... 0.2213
    swir2            (northing, easting) float16 231kB 0.047 0.04663 ... 0.1686
    thermal          (northing, easting) float16 231kB 287.0 287.0 ... 290.5
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

In the preview above, click on the icons to the right to access the metadata for each dimension and band and a preview of their data values. The metadata for the scene itself can be accessed by clicking in “Attributes”. Go ahead and explore what’s available!

The metadata is available programmatically through the attrs attribute of the scene. It behaves like a dictionary:

print(scene.attrs["landsat_product_id"])
print(scene.attrs["date_acquired"])
LC08_L2SP_204023_20200927_20201006_02_T1
2020-09-27

The metadata for the bands and the dimensions can be accessed the same way:

print(scene.blue.attrs["filename"])
print(scene.easting.attrs["long_name"])
LC08_L2SP_204023_20200927_20201006_02_T1_SR_B2.TIF
UTM easting

Selecting which bands to load#

If you have more bands downloaded than you actually want to use, then we can save time and memory by only loading the desired bands. For example, if our only goal is to make an RGB composite, then we only really need the red, green, and blue bands. Instead of having to edit the .tar archive or move files out of our data folder, we can tell xlandsat.load_scene which bands we want by passing it a list of band names like so:

scene = xls.load_scene(path_to_archive, bands=["red", "green", "blue"])
scene
<xarray.Dataset> Size: 699kB
Dimensions:   (easting: 433, northing: 267)
Coordinates:
  * easting   (easting) float64 3kB 4.87e+05 4.87e+05 4.871e+05 ... 5e+05 5e+05
  * northing  (northing) float64 2kB 5.922e+06 5.922e+06 ... 5.93e+06 5.93e+06
Data variables:
    blue      (northing, easting) float16 231kB 0.0708 0.07117 ... 0.08301
    green     (northing, easting) float16 231kB 0.08618 0.08667 ... 0.09753
    red       (northing, easting) float16 231kB 0.06824 0.06824 ... 0.1193 0.116
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

This works the same if reading from an archive or from a folder that contains more band files than we want:

scene = xls.load_scene(path_to_folder, bands=["red", "green", "blue"])
scene
<xarray.Dataset> Size: 699kB
Dimensions:   (easting: 433, northing: 267)
Coordinates:
  * easting   (easting) float64 3kB 4.87e+05 4.87e+05 4.871e+05 ... 5e+05 5e+05
  * northing  (northing) float64 2kB 5.922e+06 5.922e+06 ... 5.93e+06 5.93e+06
Data variables:
    blue      (northing, easting) float16 231kB 0.0708 0.07117 ... 0.08301
    green     (northing, easting) float16 231kB 0.08618 0.08667 ... 0.09753
    red       (northing, easting) float16 231kB 0.06824 0.06824 ... 0.1193 0.116
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

Loading only a segment of the scene#

Since Landsat scenes are large, it’s not uncommon to need only a smaller section of a scene. Limiting the spatial extent loaded can also help reduce the memory requirement, particularly when loading a time series of scenes. We could crop an existing scene after loading by using xarray.Dataset.sel with the UTM bounding box of the desired region:

scene = xls.load_scene(path_to_archive)
cropped = scene.sel(
    easting=slice(4.88e5, 4.90e5),
    northing=slice(5.925e6, 5.927e6),
)
cropped
<xarray.Dataset> Size: 73kB
Dimensions:          (easting: 67, northing: 67)
Coordinates:
  * easting          (easting) float64 536B 4.88e+05 4.88e+05 ... 4.9e+05
  * northing         (northing) float64 536B 5.925e+06 5.925e+06 ... 5.927e+06
Data variables:
    coastal_aerosol  (northing, easting) float16 9kB 0.05811 0.06042 ... 0.05505
    blue             (northing, easting) float16 9kB 0.0675 0.06873 ... 0.06519
    green            (northing, easting) float16 9kB 0.08569 0.08655 ... 0.08752
    red              (northing, easting) float16 9kB 0.06885 0.06934 ... 0.07495
    nir              (northing, easting) float16 9kB 0.04407 0.0448 ... 0.04346
    swir1            (northing, easting) float16 9kB 0.04602 0.04651 ... 0.04553
    swir2            (northing, easting) float16 9kB 0.04626 0.04639 ... 0.04675
    thermal          (northing, easting) float16 9kB 286.8 286.8 ... 286.5 286.5
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

But this will still load the full scene, which takes up time and memory. A better way to do this is to crop the scene directly when loading it:

cropped = xls.load_scene(
    path_to_archive,
    region=(4.88e5, 4.90e5, 5.925e6, 5.927e6),
)
cropped
<xarray.Dataset> Size: 74kB
Dimensions:          (easting: 68, northing: 67)
Coordinates:
  * easting          (easting) float64 544B 4.88e+05 4.88e+05 ... 4.9e+05
  * northing         (northing) float64 536B 5.925e+06 5.925e+06 ... 5.927e+06
Data variables:
    coastal_aerosol  (northing, easting) float16 9kB 0.05786 0.05811 ... 0.05505
    blue             (northing, easting) float16 9kB 0.06775 0.0675 ... 0.06519
    green            (northing, easting) float16 9kB 0.08557 0.08569 ... 0.08752
    red              (northing, easting) float16 9kB 0.06885 0.06885 ... 0.07495
    nir              (northing, easting) float16 9kB 0.04431 0.04407 ... 0.04346
    swir1            (northing, easting) float16 9kB 0.04602 0.04602 ... 0.04553
    swir2            (northing, easting) float16 9kB 0.04651 0.04626 ... 0.04675
    thermal          (northing, easting) float16 9kB 286.8 286.8 ... 286.5 286.5
Attributes: (12/19)
    Conventions:                CF-1.8
    title:                      Landsat 8 scene from 2020-09-27 (path/row=204...
    digital_object_identifier:  https://doi.org/10.5066/P9OGBGM6
    origin:                     Image courtesy of the U.S. Geological Survey
    landsat_product_id:         LC08_L2SP_204023_20200927_20201006_02_T1
    processing_level:           L2SP
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

Notice that in both examples we were able to use the natural UTM coordinates of the scene instead of pixel numbers. This is particularly important when cropping scenes with the same WRS path/row at different times, since their boundaries won’t coincide exactly and cropping by pixels would result in misaligned images.

Load the panchromatic band#

The panchromatic band from Level 1 scenes will be ignored by xlandsat.load_scene if it’s present in an archive or folder. This is because of it’s higher spatial resolution, which means that it can’t share dimension coordinates with the other bands. For this reason, we have the separate function xarray.load_panchromatic for loading it. Just like with regular scenes, we can provide either a .tar archive or a folder that contains the band and the MTL.txt file:

path_to_pan = xls.datasets.fetch_liverpool_panchromatic()
pan = xls.load_panchromatic(path_to_pan)
pan
<xarray.DataArray 'pan' (northing: 534, easting: 867)> Size: 2MB
array([[0.04228   , 0.04231999, 0.04229999, ..., 0.0603    , 0.0497    ,
        0.04848   ],
       [0.04224   , 0.04207999, 0.04212   , ..., 0.05284   , 0.04792   ,
        0.03968   ],
       [0.04251999, 0.04236   , 0.04207999, ..., 0.05194   , 0.04589999,
        0.03664   ],
       ...,
       [0.04477999, 0.04421999, 0.04423999, ..., 0.05402   , 0.05444   ,
        0.05415999],
       [0.04465999, 0.04457999, 0.04387999, ..., 0.05435999, 0.05398   ,
        0.05471999],
       [0.04421999, 0.04408   , 0.04382   , ..., 0.05731999, 0.05505999,
        0.05444   ]], dtype=float32)
Coordinates:
  * easting   (easting) float64 7kB 4.87e+05 4.87e+05 4.87e+05 ... 5e+05 5e+05
  * northing  (northing) float64 4kB 5.922e+06 5.922e+06 ... 5.93e+06 5.93e+06
Attributes: (12/25)
    long_name:                  panchromatic
    units:                      reflectance
    number:                     8
    filename:                   LC08_L1TP_204023_20200927_20201006_02_T1_B8.TIF
    scaling_mult:               2e-05
    scaling_add:                -0.1
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

And we can also crop the panchromatic band upon loading to the same extent as our regular scene:

cropped_pan = xls.load_panchromatic(
    path_to_pan,
    region=(4.88e5, 4.90e5, 5.925e6, 5.927e6),
)
cropped_pan
<xarray.DataArray 'pan' (northing: 134, easting: 134)> Size: 72kB
array([[0.04355999, 0.04343999, 0.04321999, ..., 0.04406   , 0.0451    ,
        0.04511999],
       [0.0436    , 0.04265999, 0.04251999, ..., 0.04532   , 0.04532   ,
        0.04545999],
       [0.04343999, 0.04355999, 0.04331999, ..., 0.04586   , 0.04581999,
        0.046     ],
       ...,
       [0.04522   , 0.04423999, 0.04387999, ..., 0.04681999, 0.04669999,
        0.04724   ],
       [0.04387999, 0.04457999, 0.04457999, ..., 0.04739999, 0.04756   ,
        0.0472    ],
       [0.04454   , 0.04409999, 0.04445999, ..., 0.04702   , 0.04693999,
        0.04736   ]], dtype=float32)
Coordinates:
  * easting   (easting) float64 1kB 4.88e+05 4.88e+05 ... 4.9e+05 4.9e+05
  * northing  (northing) float64 1kB 5.925e+06 5.925e+06 ... 5.927e+06 5.927e+06
Attributes: (12/25)
    long_name:                  panchromatic
    units:                      reflectance
    number:                     8
    filename:                   LC08_L1TP_204023_20200927_20201006_02_T1_B8.TIF
    scaling_mult:               2e-05
    scaling_add:                -0.1
    ...                         ...
    ellipsoid:                  WGS84
    date_acquired:              2020-09-27
    scene_center_time:          11:10:50.3140030Z
    wrs_path:                   204
    wrs_row:                    23
    mtl_file:                   GROUP = LANDSAT_METADATA_FILE\n  GROUP = PROD...

This is particularly useful for pansharpening to make higher resolution RGB composites.