This repository contains embeddings of [Sentinel-2 on AWS](https://registry.opendata.aws/sentinel-2/) data embedded with [Clay v1.5](https://huggingface.co/made-with-clay/Clay).


Source: This data was created with compute support from [AWS](https://aws.amazon.com/).

Contact: For feedback and questions, please file a ticket on the [model repo of Clay](https://github.com/Clay-foundation/model/issues) or email us at contact@madewithclay.org.


Data Source: [Sentinel-2 on AWS](https://registry.opendata.aws/sentinel-2/): 
* The area of interest (AoI) is updated based on user feedback. As of December 2024, it includes embeddings for Suriname, Brazil, Andhra Pradesh, and the USA. Temporal coverage always starts in 2024 and extends backwards in time, in some cases as far as 2018. We plan to provide comprehensive global coverage in 2025.

Model source: Embeddings generated from inference with Clay v1.5: 
* Embeddings have `1024` dimensions and correspond to the "class" embedding that is used alongside the patch embeddings at the end of the encoder. 
* Each Sentinel-2 tile is split into smaller tiles of 256x256 px, and the attention patch size inside Clay v1.5 is 8x8 px.
* The inference run started in December 2024 on AWS, using g4 and g6 EC2 instances, at roughly 20 embeddings/second, or about 100k embeddings/$ (highly variable).
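
The figures above can be combined into a quick back-of-the-envelope calculation. The sketch below only uses the numbers quoted in this list; the helper `estimate_cost` is a hypothetical name introduced for illustration, and the implied hourly cost assumes a single instance sustaining the quoted throughput, which the list itself flags as highly variable.

```python
# Values quoted in the list above; treat them as rough, not exact.
TILE_SIZE_PX = 256
PATCH_SIZE_PX = 8
EMBEDDINGS_PER_SECOND = 20
EMBEDDINGS_PER_DOLLAR = 100_000

# Number of 8x8 px attention patches in one 256x256 px tile.
patches_per_side = TILE_SIZE_PX // PATCH_SIZE_PX
patches_per_tile = patches_per_side ** 2

# Throughput per hour and the hourly cost it implies (assumption: one
# instance running at the quoted rate for the whole hour).
embeddings_per_hour = EMBEDDINGS_PER_SECOND * 3600
implied_dollars_per_hour = embeddings_per_hour / EMBEDDINGS_PER_DOLLAR


def estimate_cost(n_embeddings: int) -> float:
    """Hypothetical helper: rough cost in dollars for n_embeddings."""
    return n_embeddings / EMBEDDINGS_PER_DOLLAR


print(patches_per_side, patches_per_tile)
print(embeddings_per_hour, implied_dollars_per_hour)
print(estimate_cost(1_000_000))
```

Note that the patch count per tile and the embedding dimension are separate quantities: the first counts the patches the encoder attends over, the second is the width of the single stored vector.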

Embeddings License: Clay [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

Format:
* The folder structure mirrors the Sentinel-2 folder structure.
* File format is parquet, with two columns: `geometry` and `embeddings`
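
As one possible use of the `embeddings` column, the sketch below ranks rows by cosine similarity to a query row. It is a minimal illustration under stated assumptions: the vectors are synthetic stand-ins rather than data from this repository, the function names are hypothetical, and cosine similarity is a common choice for comparing embeddings rather than something this dataset prescribes.

```python
import math
import random

EMBEDDING_DIM = 1024  # dimension stated in this README


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings):
    """Return (row_index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Synthetic stand-in for the `embeddings` column (assumption: each row
# holds one list of 1024 floats).
rng = random.Random(0)
rows = [[rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in range(5)]

# Use row 2 as the query; it should rank itself first.
ranking = rank_by_similarity(rows[2], rows)
print(ranking[0])
```

For real files, the same functions could be applied to the lists read from the `embeddings` column, with a vectorized library taking over once the number of rows grows.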


Usage example:

```python
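import duckdb

# Replace <PATH> with the path of a file; it follows the Sentinel-2 folder structure.
path = "https://data.source.coop/clay/<PATH>.parquet"

# Remote reads over HTTPS use DuckDB's httpfs extension.
rel = duckdb.read_parquet(path)
df = rel.to_df()  # pandas DataFrame with `geometry` and `embeddings` columns
df.head()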
```

