{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Crunch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Crunching as defined in `eodag` is a way to filter the EO products contained in a [SearchResult](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult) object. Several filters are available and further described in this document." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A [SearchResult](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult) has a [crunch()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.crunch) method that requires a filter instance as an argument, itself initialized with a dictionary that contains the required parameters. According to the filter used, some more kwargs may need to be passed to [crunch()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.crunch). The filters return a `list` of [EOProduct](../../api_reference/eoproduct.rst#eodag.api.product._product.EOProduct)s." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Results obtained from a search of *Sentinel 2 Level-1C* products over France in March 2021 are loaded in a [SearchResult](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from eodag import EODataAccessGateway\n", "dag = EODataAccessGateway()\n", "search_results = dag.deserialize(\"data/crunch_search_results.geojson\")\n", "print(f\"This SearchResult stores {len(search_results)} products.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# This code cell has a special metadata entry: \"nbsphinx\": \"hidden\"\n", "# That hides it when the documentation is built with nbsphinx/sphinx.\n", "\n", "# Uncomment these lines to regenerate the GeoJSON file used in this notebook.\n", "\n", "#search_results, _ = dag.search(\n", "# productType=\"S2_MSI_L1C\",\n", "# start=\"2021-03-01\",\n", "# end=\"2021-03-31\",\n", "# geom={\"lonmin\": 1, \"latmin\": 45, \"lonmax\": 5, \"latmax\": 47},\n", "# items_per_page=50\n", "#)\n", "#dag.serialize(search_results, \"data/crunch_search_results.geojson\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The original search geometry is used throughout the notebook as long as with its representation as a a `shapely` object which is easier to map with `folium`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "original_search_geometry = {\"lonmin\": 1, \"latmin\": 45, \"lonmax\": 5, \"latmax\": 47}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shapely\n", "\n", "search_geometry = shapely.geometry.box(\n", " original_search_geometry[\"lonmin\"],\n", " original_search_geometry[\"latmin\"],\n", " original_search_geometry[\"lonmax\"],\n", " original_search_geometry[\"latmax\"],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To create interactive maps\n", "import folium\n", "\n", "def create_search_result_map(search_results, extent):\n", " \"\"\"Small utility to create an interactive map with folium\n", " that displays an extent in red and EO Producs in blue\"\"\"\n", " fmap = folium.Map([46, 3], zoom_start=6)\n", " folium.GeoJson(\n", " extent,\n", " style_function=lambda x: dict(color=\"red\")\n", " ).add_to(fmap)\n", " folium.GeoJson(\n", " search_results\n", " ).add_to(fmap)\n", " return fmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter by start and end date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[FilterDate](../../plugins_reference/generated/eodag.plugins.crunch.filter_date.FilterDate.rst#eodag.plugins.crunch.filter_date.FilterDate) allows to filter out products that are older than a start date (optional) or more recent than an end date (optional).\n", "\n", "> This cruncher can also be called directly using [SearchResult.filter_date()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.filter_date)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from eodag.crunch import FilterDate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterDate(dict(start=\"2021-03-25\", end=\"2021-03-29\"))\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the date filter.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter by geometry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[FilterOverlap](../../plugins_reference/generated/eodag.plugins.crunch.filter_overlap.FilterOverlap.rst#eodag.plugins.crunch.filter_overlap.FilterOverlap) allows to filter out products that:\n", "\n", "* whose overlap area with a geometry is less than a percentage of their area\n", "* are not *within* a geometry\n", "* do not *contain* a geometry\n", "* do not *intersect* with a geometry\n", "\n", "To execute a [FilterOverlap](../../plugins_reference/generated/eodag.plugins.crunch.filter_overlap.FilterOverlap.rst#eodag.plugins.crunch.filter_overlap.FilterOverlap), its instance must be created by passing a dictionary with either:\n", "\n", "* `minimum_overlap` set to a number between 0 and 100. `within`, `contains` and `intersects` cannot be used in that case.\n", "* **One** of `within`, `contains` and `intersects` (they are mutually exclusive) set to True. `minimum_overlap` cannot be used in that case.\n", "\n", "Additionally, a geometry (shapely geometry, bounding box as a dictionary or a list) must be passed through the `geometry` parameter.\n", "\n", "The examples below show how [FilterOverlap](../../plugins_reference/generated/eodag.plugins.crunch.filter_overlap.FilterOverlap.rst#eodag.plugins.crunch.filter_overlap.FilterOverlap) filter out products. The original products will be displayed in blue and the filtered products in green.\n", "\n", "> This cruncher can also be called directly using [SearchResult.filter_overlap()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.filter_overlap)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from eodag.crunch import FilterOverlap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the products are displayed on the next map. As it can be observed, they all intersect with the search geometry." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_search_result_map(search_results, search_geometry)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next two examples show how `minimum_overlap` affects the filter, with its value (i.e. percentage) set to 10 and 50%." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterOverlap(dict(minimum_overlap=10)),\n", " geometry=search_geometry\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fmap = create_search_result_map(search_results, search_geometry)\n", "# Create a layer that represents the search area in green\n", "folium.GeoJson(\n", " filtered_products,\n", " style_function=lambda x: dict(color=\"green\")\n", ").add_to(fmap)\n", "fmap" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterOverlap(dict(minimum_overlap=50)),\n", " geometry=search_geometry\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fmap = create_search_result_map(search_results, search_geometry)\n", "# Create a layer that represents the search area in green\n", "folium.GeoJson(\n", " filtered_products,\n", " style_function=lambda x: dict(color=\"green\")\n", ").add_to(fmap)\n", "fmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More and more products are filtered out when `minimum_overlap` increases. The next parameter given as an example is `within`, it is actually equivalent to setting `minimum_overlap` to 100." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterOverlap(dict(within=True)),\n", " geometry=search_geometry\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fmap = create_search_result_map(search_results, search_geometry)\n", "# Create a layer that represents the filtered products in green\n", "folium.GeoJson(\n", " filtered_products,\n", " style_function=lambda x: dict(color=\"green\")\n", ").add_to(fmap)\n", "fmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the products not withing the read area are correctly filtered out by [FilterOverlap](../../plugins_reference/generated/eodag.plugins.crunch.filter_overlap.FilterOverlap.rst#eodag.plugins.crunch.filter_overlap.FilterOverlap). A new geometry is created in order to test the next parameter `intersects`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from shapely.geometry import Polygon\n", "shifted_geom = Polygon([[4, 44], [9, 44], [9, 48], [4, 48], [4, 48]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterOverlap(dict(intersects=True)),\n", " geometry=shifted_geom\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fmap = create_search_result_map(search_results, shifted_geom)\n", "# Create a layer that represents the filtered products in green\n", "folium.GeoJson(\n", " filtered_products,\n", " style_function=lambda x: dict(color=\"green\")\n", ").add_to(fmap)\n", "fmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The products that do not intersect with the red area are correctly filtered out. Finally another new geometry is created to test the parameter `contains`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "small_geom = Polygon([[3.2, 44.4], [3.7, 44.4], [3.7, 44.9], [3.2, 44.9], [3.2, 44.4]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterOverlap(dict(contains=True)),\n", " geometry=small_geom\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the geometry filter.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fmap = create_search_result_map(search_results, small_geom)\n", "# Create a layer that represents the filtered products in green\n", "folium.GeoJson(\n", " filtered_products,\n", " style_function=lambda x: dict(color=\"green\")\n", ").add_to(fmap)\n", "fmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only product preserved is the one that contains the red area." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter by property" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[FilterProperty](../../plugins_reference/generated/eodag.plugins.crunch.filter_property.FilterProperty.rst#eodag.plugins.crunch.filter_property.FilterProperty) evaluates a single property of all the products against a value (e.g. cloud cover less than 10). The dictionary it requires should contain:\n", "\n", "* A single property name from [EOProduct](../../api_reference/eoproduct.rst#eodag.api.product._product.EOProduct)`.properties` and its tested value, e.g. `dict(cloudCover=10)` or `dict(storageStatus=\"ONLINE\")`\n", "* One (optional) operator among `lt` (<), `le` (<=), `eq` (==), `ne` (!=), `ge` (>=), `gt` (>). `eq` by default.\n", "\n", "> This cruncher can also be called directly using [SearchResult.filter_property()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.filter_property)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from eodag.crunch import FilterProperty" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterProperty(dict(cloudCover=1, operator=\"lt\"))\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the property filter.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "List comprehensions over a collection of EO products are useful to quickly extract their properties, and here to check that the filter correctly filtered the products." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all([p.properties[\"cloudCover\"] < 1 for p in filtered_products])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter for online products" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you may want to avoid ordering OFFLINE products, and only download the one marked ONLINE.\n", "\n", "You can already filter for online products using [FilterProperty](../../plugins_reference/generated/eodag.plugins.crunch.filter_property.FilterProperty.rst#eodag.plugins.crunch.filter_property.FilterProperty) like this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterProperty(dict(storageStatus=\"ONLINE\", operator=\"eq\"))\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products are online.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While this code do the job, it is quite verbose. The better way is to use [SearchResult.filter_online()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.filter_online)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.filter_online()\n", "print(f\"{len(search_results) - len(filtered_products)} products are online.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter the latest products intersecting a geometry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[FilterLatestIntersect](../../plugins_reference/generated/eodag.plugins.crunch.filter_latest_intersect.FilterLatestIntersect.rst#eodag.plugins.crunch.filter_latest_intersect.FilterLatestIntersect) does the following:\n", "\n", "1. it sorts the products by date, from the newest to the oldest\n", "2. it filters out products that do not intersect with a requested geometry (a dictionary bounding box)\n", "3. it stops early if the requested geometry is 100% covered by the products, if not, it returns the result of 2.\n", "\n", "This results in getting the most recent products that intersect (or completely cover) a given geometry.\n", "\n", "> This cruncher can also be called directly using [SearchResult.filter_latest_intersect()](../../api_reference/searchresult.rst#eodag.api.search_result.SearchResult.filter_latest_intersect)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from eodag.crunch import FilterLatestIntersect" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filtered_products = search_results.crunch(\n", " FilterLatestIntersect({}),\n", " geometry=original_search_geometry\n", ")\n", "print(f\"{len(search_results) - len(filtered_products)} products were filtered out by the property filter.\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from shapely import geometry\n", "\n", "fmap = create_search_result_map(search_results, search_geometry)\n", "# Create a layer that represents the filtered products in green\n", "folium.GeoJson(\n", " filtered_products,\n", " style_function=lambda x: dict(color=\"green\")\n", ").add_to(fmap)\n", "fmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The map shows that the area is fully covered by products. The filtered products are indeed the most recent ones." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[p.properties[\"startTimeFromAscendingNode\"] for p in filtered_products][::10]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.10 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "nbsphinx": { "execute": "always" }, "orig_nbformat": 2, "vscode": { "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } } }, "nbformat": 4, "nbformat_minor": 2 }