Crunch#

Crunching as defined in eodag is a way to filter the EO products contained in a SearchResult object. Several filters are available and further described in this document.

A SearchResult has a crunch() method that requires a filter instance as an argument. The filter is itself initialized with a dictionary that contains its required parameters. Depending on the filter used, additional keyword arguments may need to be passed to crunch(). The filters return a list of EOProducts.
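The calling pattern described above can be illustrated with a toy sketch that does not use eodag at all. ToyProduct, ToyCloudFilter, ToySearchResult and the max_cloud_cover key are hypothetical names invented for this illustration, and the proceed() method is only assumed to mirror how a filter is applied; this is not eodag's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProduct:
    """Hypothetical stand-in for an EOProduct: just a bag of properties."""
    properties: dict = field(default_factory=dict)


class ToyCloudFilter:
    """Hypothetical filter: configured with a dictionary, applied with proceed()."""

    def __init__(self, config):
        self.config = config

    def proceed(self, products, **kwargs):
        # Extra keyword arguments are accepted but unused by this simple filter
        max_cover = self.config["max_cloud_cover"]
        return [p for p in products if p.properties["cloudCover"] <= max_cover]


class ToySearchResult(list):
    """Hypothetical stand-in for a SearchResult: a list with a crunch() method."""

    def crunch(self, cruncher, **kwargs):
        # Delegate the filtering to the filter instance, forwarding extra kwargs
        return ToySearchResult(cruncher.proceed(list(self), **kwargs))


results = ToySearchResult(ToyProduct({"cloudCover": c}) for c in (0, 12, 55))
kept = results.crunch(ToyCloudFilter({"max_cloud_cover": 20}))
print(len(kept))  # 2
```

The configuration dictionary holds what defines the filter, while anything that varies per call (such as a geometry) travels through the keyword arguments of crunch().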

Setup#

Results obtained from a search of Sentinel 2 Level-1C products over France in March 2021 are loaded in a SearchResult.

[1]:
from eodag import EODataAccessGateway
dag = EODataAccessGateway()
search_results = dag.deserialize("data/crunch_search_results.geojson")
print(f"This SearchResult stores {len(search_results)} products.")
This SearchResult stores 50 products.

The original search geometry is used throughout the notebook, along with its representation as a shapely object, which is easier to map with folium.

[3]:
original_search_geometry = {"lonmin": 1, "latmin": 45, "lonmax": 5, "latmax": 47}
[4]:
import shapely.geometry

search_geometry = shapely.geometry.box(
    original_search_geometry["lonmin"],
    original_search_geometry["latmin"],
    original_search_geometry["lonmax"],
    original_search_geometry["latmax"],
)
[5]:
# To create interactive maps
import folium

def create_search_result_map(search_results, extent):
    """Small utility to create an interactive map with folium
    that displays an extent in red and EO Products in blue"""
    fmap = folium.Map([46, 3], zoom_start=6)
    folium.GeoJson(
        extent,
        style_function=lambda x: dict(color="red")
    ).add_to(fmap)
    folium.GeoJson(
        search_results
    ).add_to(fmap)
    return fmap

Filter by start and end date#

FilterDate filters out products that are older than a start date (optional) or more recent than an end date (optional).

This cruncher can also be called directly using SearchResult.filter_date().
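The logic of a date window filter can be sketched in plain Python, independently of eodag. The function names below are hypothetical, and whether the real filter treats its bounds as inclusive is an assumption here. Timestamps use the same ISO 8601 format as the product properties shown at the end of this notebook.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO 8601 string; a date-only or naive value is treated as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def filter_by_date(sensing_times, start=None, end=None):
    """Keep the times that fall inside the optional [start, end] window."""
    lower = parse_utc(start) if start else None
    upper = parse_utc(end) if end else None
    kept = []
    for value in sensing_times:
        moment = parse_utc(value)
        if lower is not None and moment < lower:
            continue  # older than the start date
        if upper is not None and moment > upper:
            continue  # more recent than the end date
        kept.append(value)
    return kept


times = [
    "2021-03-24T10:50:31.024Z",
    "2021-03-26T10:50:31.024Z",
    "2021-03-30T10:30:21.024Z",
]
kept = filter_by_date(times, start="2021-03-25", end="2021-03-29")
print(kept)  # ['2021-03-26T10:50:31.024Z']
```

In this sketch a date-only bound such as "2021-03-29" means midnight UTC on that day, so a product sensed later on the 29th would be dropped.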

[6]:
from eodag.crunch import FilterDate
[7]:
filtered_products = search_results.crunch(
    FilterDate(dict(start="2021-03-25", end="2021-03-29"))
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the date filter.")
21 products were filtered out by the date filter.

Filter by geometry#

FilterOverlap filters out products that:

  • overlap with a geometry by less than a given percentage of their own area

  • are not within a geometry

  • do not contain a geometry

  • do not intersect with a geometry

To execute a FilterOverlap, its instance must be created by passing a dictionary with either:

  • minimum_overlap set to a number between 0 and 100. within, contains and intersects cannot be used in that case.

  • One of within, contains and intersects (they are mutually exclusive) set to True. minimum_overlap cannot be used in that case.

Additionally, a geometry (shapely geometry, bounding box as a dictionary or a list) must be passed through the geometry parameter.

The examples below show how FilterOverlap filters out products. The original products will be displayed in blue and the filtered products in green.

This cruncher can also be called directly using SearchResult.filter_overlap().
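To make the four criteria concrete, here is a minimal sketch restricted to axis-aligned bounding boxes, written without shapely or eodag. All function names are hypothetical, the overlap is measured relative to the product footprint as described above (the library may compute it differently), and boxes that only touch along an edge are treated as not intersecting, which is a simplification.

```python
def box_area(box):
    """Area of an axis-aligned box given as (lonmin, latmin, lonmax, latmax)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a, b):
    """Intersection of two boxes, or None when they do not share any area."""
    lonmin, latmin = max(a[0], b[0]), max(a[1], b[1])
    lonmax, latmax = min(a[2], b[2]), min(a[3], b[3])
    if lonmin >= lonmax or latmin >= latmax:
        return None
    return (lonmin, latmin, lonmax, latmax)


def overlap_percentage(product, extent):
    """Share of the product footprint, in percent, that lies inside the extent."""
    shared = intersection(product, extent)
    return 0.0 if shared is None else 100.0 * box_area(shared) / box_area(product)


def is_within(inner, outer):
    """True when `inner` lies entirely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


extent = (1, 45, 5, 47)  # same bounding box as the search geometry
footprints = {
    "inside": (2, 45.5, 3, 46.5),
    "half": (4.5, 45, 5.5, 46),
    "corner": (0.5, 46.5, 1.5, 47.5),
    "outside": (6, 45, 7, 46),
}

percentages = {name: overlap_percentage(box, extent) for name, box in footprints.items()}
kept_10 = [name for name, pct in percentages.items() if pct >= 10]
kept_50 = [name for name, pct in percentages.items() if pct >= 50]
kept_within = [name for name, box in footprints.items() if is_within(box, extent)]
kept_contains = [name for name, box in footprints.items() if is_within(extent, box)]
kept_intersects = [name for name, box in footprints.items() if intersection(box, extent)]

print(percentages)  # {'inside': 100.0, 'half': 50.0, 'corner': 25.0, 'outside': 0.0}
```

With these toy footprints, raising the threshold from 10 to 50 drops "corner", only "inside" passes the within test, and no footprint is large enough to contain the extent.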

[8]:
from eodag.crunch import FilterOverlap

All the products are displayed on the next map. As can be observed, they all intersect with the search geometry.

[9]:
create_search_result_map(search_results, search_geometry)

The next two examples show how minimum_overlap affects the filter, with its value (a percentage) set to 10 and then to 50.

[10]:
filtered_products = search_results.crunch(
    FilterOverlap(dict(minimum_overlap=10)),
    geometry=search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
19 products were filtered out by the geometry filter.
[11]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
    filtered_products,
    style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
[12]:
filtered_products = search_results.crunch(
    FilterOverlap(dict(minimum_overlap=50)),
    geometry=search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
35 products were filtered out by the geometry filter.
[13]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
    filtered_products,
    style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap

More and more products are filtered out as minimum_overlap increases. The next parameter shown is within, which is actually equivalent to setting minimum_overlap to 100.

[14]:
filtered_products = search_results.crunch(
    FilterOverlap(dict(within=True)),
    geometry=search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
39 products were filtered out by the geometry filter.
[15]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
    filtered_products,
    style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap

All the products that are not within the red area are correctly filtered out by FilterOverlap. A new geometry is created in order to test the next parameter, intersects.

[16]:
from shapely.geometry import Polygon
shifted_geom = Polygon([[4, 44], [9, 44], [9, 48], [4, 48], [4, 44]])
[17]:
filtered_products = search_results.crunch(
    FilterOverlap(dict(intersects=True)),
    geometry=shifted_geom
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
31 products were filtered out by the geometry filter.
[18]:
fmap = create_search_result_map(search_results, shifted_geom)
# Create a layer that represents the filtered products in green
folium.GeoJson(
    filtered_products,
    style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap

The products that do not intersect with the red area are correctly filtered out. Finally, another new geometry is created to test the contains parameter.

[19]:
small_geom = Polygon([[3.2, 44.4], [3.7, 44.4], [3.7, 44.9], [3.2, 44.9], [3.2, 44.4]])
[20]:
filtered_products = search_results.crunch(
    FilterOverlap(dict(contains=True)),
    geometry=small_geom
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
49 products were filtered out by the geometry filter.
[21]:
fmap = create_search_result_map(search_results, small_geom)
# Create a layer that represents the filtered products in green
folium.GeoJson(
    filtered_products,
    style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap

The only product preserved is the one that contains the red area.

Filter by property#

FilterProperty evaluates a single property of all the products against a value (e.g. cloud cover less than 10). The dictionary it requires should contain:

  • A single property name from EOProduct.properties and its tested value, e.g. dict(cloudCover=10) or dict(storageStatus="ONLINE")

  • One (optional) operator among lt (<), le (<=), eq (==), ne (!=), ge (>=), gt (>). eq by default.

This cruncher can also be called directly using SearchResult.filter_property().
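The operator names listed above match functions of Python's standard operator module, which makes the comparison easy to sketch on plain dictionaries. The filter_by_property function below is a hypothetical illustration, not the library's code, and dropping products that lack the tested property is an assumption of this sketch.

```python
import operator

# Map the operator names to the matching functions of the standard library
OPERATORS = {name: getattr(operator, name) for name in ("lt", "le", "eq", "ne", "ge", "gt")}


def filter_by_property(products, config):
    """Keep products whose single tested property satisfies the comparison."""
    config = dict(config)  # do not mutate the caller's dictionary
    compare = OPERATORS[config.pop("operator", "eq")]  # eq by default
    (name, expected), = config.items()  # exactly one property is expected
    return [p for p in products if name in p and compare(p[name], expected)]


products = [
    {"cloudCover": 0.4, "storageStatus": "ONLINE"},
    {"cloudCover": 12.0, "storageStatus": "OFFLINE"},
    {"storageStatus": "ONLINE"},  # no cloudCover property at all
]

clear_sky = filter_by_property(products, dict(cloudCover=1, operator="lt"))
online = filter_by_property(products, dict(storageStatus="ONLINE"))
print(len(clear_sky), len(online))  # 1 2
```

Passing more than one property in the dictionary makes the unpacking line raise a ValueError, which reflects the single-property constraint.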

[22]:
from eodag.crunch import FilterProperty
[23]:
filtered_products = search_results.crunch(
    FilterProperty(dict(cloudCover=1, operator="lt"))
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the property filter.")
28 products were filtered out by the property filter.

List comprehensions over a collection of EO products are useful to quickly extract their properties, and here to check that the filter correctly filtered the products.

[24]:
all([p.properties["cloudCover"] < 1 for p in filtered_products])
[24]:
True

Filter for online products#

Sometimes you may want to avoid ordering OFFLINE products, and only download the ones marked ONLINE.

You can already filter for online products using FilterProperty like this:

[25]:
filtered_products = search_results.crunch(
    FilterProperty(dict(storageStatus="ONLINE", operator="eq"))
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the online filter.")
0 products were filtered out by the online filter.

While this code does the job, it is quite verbose. A better way is to use SearchResult.filter_online().

[26]:
filtered_products = search_results.filter_online()
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the online filter.")
0 products were filtered out by the online filter.

Filter the latest products intersecting a geometry#

FilterLatestIntersect does the following:

  1. it sorts the products by date, from the newest to the oldest

  2. it filters out products that do not intersect with a requested geometry (a dictionary bounding box)

  3. it stops early if the requested geometry is 100% covered by the products; otherwise, it returns the result of step 2.

This results in getting the most recent products that intersect (or completely cover) a given geometry.

This cruncher can also be called directly using SearchResult.filter_latest_intersect().
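The three steps above can be sketched on axis-aligned bounding boxes with the standard library only. Every name below is hypothetical, products are reduced to (date, box) pairs, and the coverage test is a simple grid check that is exact for boxes but is not how the library works with real footprints.

```python
def boxes_intersect(a, b):
    """True when two (lonmin, latmin, lonmax, latmax) boxes share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def clamp(value, low, high):
    return min(max(value, low), high)


def is_covered(target, boxes):
    """True when the union of `boxes` covers the `target` box entirely."""
    # Split the target into cells along every box edge, then test each cell centre
    xs = sorted({clamp(x, target[0], target[2]) for b in boxes for x in (b[0], b[2])}
                | {target[0], target[2]})
    ys = sorted({clamp(y, target[1], target[3]) for b in boxes for y in (b[1], b[3])}
                | {target[1], target[3]})
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            if not any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in boxes):
                return False
    return True


def latest_intersect(products, target):
    """Newest products intersecting `target`, stopping once it is fully covered."""
    kept = []
    # Step 1: newest first (ISO 8601 dates sort correctly as strings)
    for date, box in sorted(products, key=lambda p: p[0], reverse=True):
        if not boxes_intersect(box, target):
            continue  # Step 2: drop products that miss the geometry
        kept.append((date, box))
        if is_covered(target, [b for _, b in kept]):
            break  # Step 3: stop early once the geometry is fully covered
    return kept


target = (1, 45, 5, 47)
products = [
    ("2021-03-20", (0, 44, 6, 48)),  # old product covering the whole target
    ("2021-03-28", (0, 44, 3, 48)),  # western half
    ("2021-03-30", (3, 44, 6, 48)),  # eastern half
    ("2021-03-29", (7, 44, 9, 48)),  # outside the target
]
kept = latest_intersect(products, target)
print([date for date, _ in kept])  # ['2021-03-30', '2021-03-28']
```

The oldest product is never reached because the two most recent intersecting footprints already cover the target between them.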

[27]:
from eodag.crunch import FilterLatestIntersect
[28]:
filtered_products = search_results.crunch(
    FilterLatestIntersect({}),
    geometry=original_search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the latest intersect filter.")
15 products were filtered out by the latest intersect filter.
[29]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
    filtered_products,
    style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap

The map shows that the area is fully covered by products. The filtered products are indeed the most recent ones.

[30]:
[p.properties["startTimeFromAscendingNode"] for p in filtered_products][::10]
[30]:
['2021-03-30T10:30:21.024Z',
 '2021-03-28T10:36:29.024Z',
 '2021-03-28T10:36:29.024Z',
 '2021-03-26T10:50:31.024Z']