Crunch
Crunching, as defined in eodag, is a way to filter the EO products contained in a SearchResult object. Several filters are available; they are described further in this document.
A SearchResult has a crunch() method that requires a filter instance as an argument, itself initialized with a dictionary that contains the required parameters. Depending on the filter used, additional kwargs may need to be passed to crunch(). The filters return a list of EOProducts.
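To make the pattern concrete, the plain-Python sketch below mimics it without eodag. A filter object is configured once with a dictionary, then applied to a list of products, and any extra keyword arguments are forwarded to it. The `Product`, `SketchFilter`, `KeepIds` and `crunch` names are hypothetical stand-ins chosen for illustration; they are not eodag's classes or its actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Product:
    """Hypothetical stand-in for an EO product: an id plus its properties."""
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)


class SketchFilter:
    """Hypothetical base class: a filter configured once with a dict."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = dict(config)

    def proceed(self, products: List[Product], **kwargs: Any) -> List[Product]:
        raise NotImplementedError


class KeepIds(SketchFilter):
    """Toy filter that keeps only the products listed in config["ids"]."""

    def proceed(self, products: List[Product], **kwargs: Any) -> List[Product]:
        wanted = set(self.config.get("ids", []))
        return [p for p in products if p.id in wanted]


def crunch(products: List[Product], cruncher: SketchFilter, **kwargs: Any) -> List[Product]:
    """Delegate to the filter, forwarding extra kwargs such as a geometry."""
    return cruncher.proceed(products, **kwargs)


products = [Product("a"), Product("b"), Product("c")]
kept = crunch(products, KeepIds({"ids": ["a", "c"]}))
print([p.id for p in kept])  # ['a', 'c']
```

Keeping the configuration in the constructor and the per-call arguments in `crunch()` mirrors the split described above: the dictionary says *how* to filter, the kwargs say *what* to filter against.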
Setup
Results obtained from a search of Sentinel-2 Level-1C products over France in March 2021 are loaded into a SearchResult.
[1]:
from eodag import EODataAccessGateway
dag = EODataAccessGateway()
search_results = dag.deserialize("data/crunch_search_results.geojson")
print(f"This SearchResult stores {len(search_results)} products.")
This SearchResult stores 50 products.
The original search geometry is used throughout the notebook, along with its representation as a shapely object, which is easier to map with folium.
[3]:
original_search_geometry = {"lonmin": 1, "latmin": 45, "lonmax": 5, "latmax": 47}
[4]:
import shapely.geometry
search_geometry = shapely.geometry.box(
    original_search_geometry["lonmin"],
    original_search_geometry["latmin"],
    original_search_geometry["lonmax"],
    original_search_geometry["latmax"],
)
[5]:
# To create interactive maps
import folium
def create_search_result_map(search_results, extent):
    """Small utility to create an interactive map with folium
    that displays an extent in red and EO products in blue"""
    fmap = folium.Map([46, 3], zoom_start=6)
    folium.GeoJson(
        extent,
        style_function=lambda x: dict(color="red")
    ).add_to(fmap)
    folium.GeoJson(
        search_results
    ).add_to(fmap)
    return fmap
Filter by start and end date
FilterDate filters out products that are older than a start date (optional) or more recent than an end date (optional).
This cruncher can also be called directly using SearchResult.filter_date().
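The date rule can be sketched with the standard library alone. This is a simplified illustration, not eodag's implementation: it assumes each product is reduced to a single acquisition `datetime` and that both bounds are inclusive calendar dates, whereas eodag works on the product properties and may treat start and end times differently. `filter_by_date` is a hypothetical helper, and the acquisition times in the example are made up.

```python
from datetime import date, datetime
from typing import List, Optional


def filter_by_date(
    acquisitions: List[datetime],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[datetime]:
    """Keep acquisitions whose calendar date lies in [start, end].

    A bound left to None leaves that side of the interval open.
    """
    kept = []
    for acquired in acquisitions:
        day = acquired.date()
        if start is not None and day < start:
            continue  # older than the start date
        if end is not None and day > end:
            continue  # more recent than the end date
        kept.append(acquired)
    return kept


# Made-up acquisition times, for illustration only
acquisitions = [
    datetime(2021, 3, 24, 10, 50),
    datetime(2021, 3, 26, 10, 50),
    datetime(2021, 3, 30, 10, 30),
]
kept = filter_by_date(acquisitions, start=date(2021, 3, 25), end=date(2021, 3, 29))
print(kept)  # [datetime.datetime(2021, 3, 26, 10, 50)]
```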
[6]:
from eodag.crunch import FilterDate
[7]:
filtered_products = search_results.crunch(
FilterDate(dict(start="2021-03-25", end="2021-03-29"))
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the date filter.")
21 products were filtered out by the date filter.
Filter by geometry
FilterOverlap filters out products that:
have an overlap area with a geometry smaller than a given percentage of their own area
are not within a geometry
do not contain a geometry
do not intersect a geometry
To execute a FilterOverlap, its instance must be created by passing a dictionary with either:
minimum_overlap set to a number between 0 and 100, in which case within, contains and intersects cannot be used;
or one of within, contains and intersects (they are mutually exclusive) set to True, in which case minimum_overlap cannot be used.
Additionally, a geometry (a shapely geometry, or a bounding box as a dictionary or a list) must be passed to crunch() through the geometry parameter.
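The parameter rules above can be expressed as a small validation function. `check_overlap_config` is a hypothetical helper, not part of eodag, and it encodes only the constraints stated in this section. It is probably stricter than the library: for instance it rejects an empty dictionary, whereas eodag may fall back to a default.

```python
from typing import Any, Dict

PREDICATES = ("within", "contains", "intersects")


def check_overlap_config(config: Dict[str, Any]) -> str:
    """Return the mode selected by a FilterOverlap-style dict, or raise ValueError."""
    enabled = [name for name in PREDICATES if config.get(name)]
    if "minimum_overlap" in config:
        if enabled:
            raise ValueError("minimum_overlap cannot be combined with " + ", ".join(enabled))
        if not 0 <= config["minimum_overlap"] <= 100:
            raise ValueError("minimum_overlap must be between 0 and 100")
        return "minimum_overlap"
    if len(enabled) > 1:
        raise ValueError("within, contains and intersects are mutually exclusive")
    if not enabled:
        # Stricter than needed: the library may apply a default here instead
        raise ValueError("expected minimum_overlap or one of " + ", ".join(PREDICATES))
    return enabled[0]


print(check_overlap_config({"minimum_overlap": 10}))  # minimum_overlap
print(check_overlap_config({"within": True}))         # within
try:
    check_overlap_config({"within": True, "minimum_overlap": 50})
except ValueError as err:
    print(err)  # minimum_overlap cannot be combined with within
```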
The examples below show how FilterOverlap filters out products. The original products are displayed in blue and the filtered products in green.
This cruncher can also be called directly using SearchResult.filter_overlap().
[8]:
from eodag.crunch import FilterOverlap
All the products are displayed on the next map. As can be observed, they all intersect with the search geometry.
[9]:
create_search_result_map(search_results, search_geometry)
The next two examples show how minimum_overlap affects the filter, with its value (a percentage) set to 10 and then to 50.
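To see what the percentage measures, the sketch below computes the share of a product's own area that falls inside the search geometry, and keeps the products that reach `minimum_overlap`. It assumes axis-aligned bounding boxes in degrees treated as planar rectangles, whereas eodag works on shapely geometries. `Box`, `overlap_percentage` and `keep_minimum_overlap` are hypothetical names, and the footprints are made up.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in degrees (planar approximation)."""
    lonmin: float
    latmin: float
    lonmax: float
    latmax: float

    def area(self) -> float:
        return max(0.0, self.lonmax - self.lonmin) * max(0.0, self.latmax - self.latmin)


def overlap_percentage(product: Box, geometry: Box) -> float:
    """Share of the product's own area that lies inside the geometry, in %."""
    if product.area() == 0:
        return 0.0  # degenerate footprint: treat as no overlap
    inter = Box(
        max(product.lonmin, geometry.lonmin),
        max(product.latmin, geometry.latmin),
        min(product.lonmax, geometry.lonmax),
        min(product.latmax, geometry.latmax),
    )
    # Box.area() clamps negative sizes, so disjoint boxes give 0
    return 100.0 * inter.area() / product.area()


def keep_minimum_overlap(products: List[Box], geometry: Box, minimum_overlap: float) -> List[Box]:
    """Drop the products whose overlap is below minimum_overlap percent."""
    return [p for p in products if overlap_percentage(p, geometry) >= minimum_overlap]


search = Box(1, 45, 5, 47)  # same extent as original_search_geometry
footprints = [
    Box(2, 45.5, 3, 46.5),      # fully inside: 100%
    Box(0, 44, 2, 46),          # a quarter inside: 25%
    Box(4.5, 46.5, 6.5, 48.5),  # a corner inside: 6.25%
]
print(len(keep_minimum_overlap(footprints, search, 10)))  # 2
print(len(keep_minimum_overlap(footprints, search, 50)))  # 1
```

Raising the threshold can only remove products, never add them back, since the overlap of each footprint does not depend on the threshold.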
[10]:
filtered_products = search_results.crunch(
FilterOverlap(dict(minimum_overlap=10)),
geometry=search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
19 products were filtered out by the geometry filter.
[11]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
filtered_products,
style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
[12]:
filtered_products = search_results.crunch(
FilterOverlap(dict(minimum_overlap=50)),
geometry=search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
35 products were filtered out by the geometry filter.
[13]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
filtered_products,
style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
More products are filtered out as minimum_overlap increases. The next example uses within, which is equivalent to setting minimum_overlap to 100.
[14]:
filtered_products = search_results.crunch(
FilterOverlap(dict(within=True)),
geometry=search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
39 products were filtered out by the geometry filter.
[15]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
filtered_products,
style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
All the products that are not within the red area are correctly filtered out by FilterOverlap. A new geometry is created to test the next parameter, intersects.
[16]:
from shapely.geometry import Polygon
shifted_geom = Polygon([[4, 44], [9, 44], [9, 48], [4, 48], [4, 44]])
[17]:
filtered_products = search_results.crunch(
FilterOverlap(dict(intersects=True)),
geometry=shifted_geom
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
31 products were filtered out by the geometry filter.
[18]:
fmap = create_search_result_map(search_results, shifted_geom)
# Create a layer that represents the filtered products in green
folium.GeoJson(
filtered_products,
style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
The products that do not intersect with the red area are correctly filtered out. Finally, another new geometry is created to test the parameter contains.
[19]:
small_geom = Polygon([[3.2, 44.4], [3.7, 44.4], [3.7, 44.9], [3.2, 44.9], [3.2, 44.4]])
[20]:
filtered_products = search_results.crunch(
FilterOverlap(dict(contains=True)),
geometry=small_geom
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.")
49 products were filtered out by the geometry filter.
[21]:
fmap = create_search_result_map(search_results, small_geom)
# Create a layer that represents the filtered products in green
folium.GeoJson(
filtered_products,
style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
The only product preserved is the one that contains the red area.
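For axis-aligned boxes, the three predicates used above reduce to coordinate comparisons. The sketch below reuses the extents of search_geometry, shifted_geom and small_geom from this notebook, plus one made-up product footprint. The functions are named after the FilterOverlap parameters but are hypothetical helpers, not eodag or shapely code, and they only handle rectangles.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in degrees."""
    lonmin: float
    latmin: float
    lonmax: float
    latmax: float


def intersects(a: Box, b: Box) -> bool:
    """True if the boxes share at least one point (touching edges count)."""
    return (a.lonmin <= b.lonmax and b.lonmin <= a.lonmax
            and a.latmin <= b.latmax and b.latmin <= a.latmax)


def within(a: Box, b: Box) -> bool:
    """True if box a lies entirely inside box b (boundaries included)."""
    return (b.lonmin <= a.lonmin and a.lonmax <= b.lonmax
            and b.latmin <= a.latmin and a.latmax <= b.latmax)


def contains(a: Box, b: Box) -> bool:
    """True if box a contains box b, i.e. b is within a."""
    return within(b, a)


# Extents of the three geometries defined in this notebook
search = Box(1, 45, 5, 47)
shifted = Box(4, 44, 9, 48)
small = Box(3.2, 44.4, 3.7, 44.9)

# A made-up product footprint
footprint = Box(2.5, 44.0, 4.5, 46.0)

print(within(footprint, search))       # False: it extends south of latitude 45
print(intersects(footprint, shifted))  # True: they share the strip lon 4 to 4.5
print(contains(footprint, small))      # True: the small box lies inside it
```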
Filter by property
FilterProperty evaluates a single property of all the products against a value (e.g. cloud cover less than 10). The dictionary it requires should contain:
A single property name from EOProduct.properties and its tested value, e.g. dict(cloudCover=10) or dict(storageStatus="ONLINE")
One optional operator among lt (<), le (<=), eq (==), ne (!=), ge (>=) and gt (>); eq is used by default.
This cruncher can also be called directly using SearchResult.filter_property().
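The operator names match the comparison functions of Python's standard `operator` module (`operator.lt`, `operator.le`, and so on), so the rule can be sketched as a simple lookup. `filter_property` is a hypothetical helper, not eodag's code: each product is reduced to its properties dictionary, and products missing the tested property are dropped, which is an assumption of this sketch.

```python
import operator
from typing import Any, Dict, List

# Operator names accepted in the filter dictionary; each is also the name
# of the matching comparison function in Python's operator module
OPERATOR_NAMES = ("lt", "le", "eq", "ne", "ge", "gt")


def filter_property(products: List[Dict[str, Any]], config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep the property dicts for which `properties[name] <op> value` holds."""
    config = dict(config)  # copy so the caller's dict is not modified
    op_name = config.pop("operator", "eq")  # eq when no operator is given
    if op_name not in OPERATOR_NAMES:
        raise ValueError(f"unsupported operator: {op_name!r}")
    if len(config) != 1:
        raise ValueError("expected exactly one property name and value")
    [(name, value)] = config.items()
    compare = getattr(operator, op_name)
    # Products that lack the property are dropped (an assumption of this sketch)
    return [p for p in products if name in p and compare(p[name], value)]


products = [{"cloudCover": 0.5}, {"cloudCover": 12.0}, {"storageStatus": "ONLINE"}]
print(filter_property(products, {"cloudCover": 1, "operator": "lt"}))  # [{'cloudCover': 0.5}]
print(filter_property(products, {"storageStatus": "ONLINE"}))  # [{'storageStatus': 'ONLINE'}]
```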
[22]:
from eodag.crunch import FilterProperty
[23]:
filtered_products = search_results.crunch(
FilterProperty(dict(cloudCover=1, operator="lt"))
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the property filter.")
28 products were filtered out by the property filter.
List comprehensions over a collection of EO products are useful to quickly extract their properties; here, one is used to check that the filter kept only the expected products.
[24]:
all([p.properties["cloudCover"] < 1 for p in filtered_products])
[24]:
True
Filter for online products
Sometimes you may want to avoid ordering OFFLINE products and only download the ones marked ONLINE.
You can already filter for online products using FilterProperty like this:
[25]:
filtered_products = search_results.crunch(
FilterProperty(dict(storageStatus="ONLINE", operator="eq"))
)
print(f"{len(search_results) - len(filtered_products)} products are not online.")
0 products are not online.
While this code does the job, it is quite verbose. A simpler way is to use SearchResult.filter_online().
[26]:
filtered_products = search_results.filter_online()
print(f"{len(search_results) - len(filtered_products)} products are not online.")
0 products are not online.
Filter the latest products intersecting a geometry
FilterLatestIntersect does the following:
1. It sorts the products by date, from the newest to the oldest.
2. It filters out products that do not intersect with a requested geometry (a dictionary bounding box).
3. It stops early if the requested geometry is 100% covered by the products; if not, it returns the result of step 2.
This results in getting the most recent products that intersect (or completely cover) a given geometry.
This cruncher can also be called directly using SearchResult.filter_latest_intersect().
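The three steps can be sketched in plain Python. This illustration rests on simplifying assumptions and is not eodag's implementation: footprints and the requested geometry are axis-aligned boxes, coverage is tested exactly with a small grid-splitting routine instead of a geometric union, and `Product`, `Box`, `covers` and `latest_intersect` are hypothetical names applied to made-up data.

```python
from datetime import datetime
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in degrees."""
    lonmin: float
    latmin: float
    lonmax: float
    latmax: float


class Product(NamedTuple):
    """Hypothetical product: acquisition time plus footprint."""
    acquired: datetime
    footprint: Box


def intersects(a: Box, b: Box) -> bool:
    return (a.lonmin <= b.lonmax and b.lonmin <= a.lonmax
            and a.latmin <= b.latmax and b.latmin <= a.latmax)


def covers(footprints: List[Box], target: Box) -> bool:
    """Exact check that the union of the footprints covers the target box."""
    # Split the target into a grid along every footprint edge that crosses it
    xs = {target.lonmin, target.lonmax}
    ys = {target.latmin, target.latmax}
    for f in footprints:
        xs.update(x for x in (f.lonmin, f.lonmax) if target.lonmin < x < target.lonmax)
        ys.update(y for y in (f.latmin, f.latmax) if target.latmin < y < target.latmax)
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # No edge crosses a cell, so each footprint covers a cell's interior
    # entirely or not at all: testing the cell centre is enough
    for x0, x1 in zip(xs_sorted, xs_sorted[1:]):
        for y0, y1 in zip(ys_sorted, ys_sorted[1:]):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            if not any(f.lonmin <= cx <= f.lonmax and f.latmin <= cy <= f.latmax
                       for f in footprints):
                return False
    return True


def latest_intersect(products: List[Product], geometry: Box) -> List[Product]:
    # 1. Newest first
    newest_first = sorted(products, key=lambda p: p.acquired, reverse=True)
    kept: List[Product] = []
    for product in newest_first:
        # 2. Skip products that miss the requested geometry
        if not intersects(product.footprint, geometry):
            continue
        kept.append(product)
        # 3. Stop as soon as the kept footprints cover the whole geometry
        if covers([p.footprint for p in kept], geometry):
            break
    return kept


search = Box(1, 45, 5, 47)
products = [
    Product(datetime(2021, 3, 20), Box(0, 44, 6, 48)),    # oldest, covers everything
    Product(datetime(2021, 3, 30), Box(0, 44, 3, 48)),    # newest, western half
    Product(datetime(2021, 3, 29), Box(10, 50, 12, 52)),  # does not intersect
    Product(datetime(2021, 3, 28), Box(3, 44, 6, 48)),    # eastern half
]
result = latest_intersect(products, search)
print([p.acquired.day for p in result])  # [30, 28]
```

Sorting first is what makes the early stop meaningful here: the oldest product would cover the whole extent on its own, but it is never reached because the two newer footprints already cover it.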
[27]:
from eodag.crunch import FilterLatestIntersect
[28]:
filtered_products = search_results.crunch(
FilterLatestIntersect({}),
geometry=original_search_geometry
)
print(f"{len(search_results) - len(filtered_products)} products were filtered out by the latest intersect filter.")
15 products were filtered out by the latest intersect filter.
[29]:
fmap = create_search_result_map(search_results, search_geometry)
# Create a layer that represents the filtered products in green
folium.GeoJson(
filtered_products,
style_function=lambda x: dict(color="green")
).add_to(fmap)
fmap
The map shows that the area is fully covered by products. The filtered products are indeed the most recent ones.
[30]:
[p.properties["startTimeFromAscendingNode"] for p in filtered_products][::10]
[30]:
['2021-03-30T10:30:21.024Z',
'2021-03-28T10:36:29.024Z',
'2021-03-28T10:36:29.024Z',
'2021-03-26T10:50:31.024Z']
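The ordering of the sample above can also be checked in code rather than by eye. This is a minimal stdlib sketch, assuming every timestamp follows the exact format shown (millisecond precision and a trailing `Z`); `is_newest_first` is a hypothetical helper, and it only checks the values passed to it, here the four sampled ones.

```python
from datetime import datetime
from typing import List

# Format of the timestamps printed above (millisecond precision, "Z" suffix)
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def is_newest_first(timestamps: List[str]) -> bool:
    """True when the timestamps never increase from one item to the next."""
    parsed = [datetime.strptime(t, ISO_FORMAT) for t in timestamps]
    return all(current >= following for current, following in zip(parsed, parsed[1:]))


# The sampled values printed above
sample = [
    "2021-03-30T10:30:21.024Z",
    "2021-03-28T10:36:29.024Z",
    "2021-03-28T10:36:29.024Z",
    "2021-03-26T10:50:31.024Z",
]
print(is_newest_first(sample))  # True
```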