{
  "type": "Collection",
  "stac_version": "1.0.0",
  "id": "gbif-derived",
  "title": "GBIF Occurrence Data & Derived Products",
  "description": "Processed GBIF occurrence data hexed at H3 resolutions 0–10, plus derived products (taxonomic aggregations, redlining spatial joins). **The current release is 2026-06** (GBIF 2026-06-01 snapshot, 3.5 B occurrences); releases are versioned by year-month.\n\n## Per-species fast access (sidecar)\nThe hex is partitioned by h0 (geography), not species, so `WHERE specieskey=…` alone scans all 122 files. Use the `gbif-species-h0-index-2026-06` sidecar to prune to the species' partitions in one query:\n\n```sql\nSELECT h8, COUNT(*) FROM read_parquet('s3://public-gbif/2026-06/hex/h0=*/data_0.parquet', hive_partitioning=true)\nWHERE h0 IN (SELECT h0 FROM read_parquet('s3://public-gbif/2026-06/species-h0-index.parquet') WHERE specieskey = 2969135)\n  AND specieskey = 2969135 GROUP BY h8;\n```\n\nLicense: GBIF mediated data carries per-record licenses (CC0 / CC-BY / CC-BY-NC mix); the aggregate is treated **NonCommercial**.",
  "license": "various",
  "stac_extensions": [
    "https://stac-extensions.github.io/table/v1.2.0/schema.json",
    "https://stac-extensions.github.io/file/v2.1.0/schema.json"
  ],
  "keywords": [
    "biodiversity",
    "GBIF",
    "occurrences",
    "H3",
    "redlining",
    "taxonomy",
    "environmental justice"
  ],
  "providers": [
    {
      "name": "Global Biodiversity Information Facility",
      "roles": [
        "producer"
      ],
      "url": "https://www.gbif.org/"
    },
    {
      "name": "Boettiger Lab",
      "roles": [
        "processor",
        "host"
      ],
      "url": "https://github.com/boettiger-lab"
    }
  ],
  "extent": {
    "spatial": {
      "bbox": [
        [
          -180.0,
          -90.0,
          180.0,
          90.0
        ]
      ]
    },
    "temporal": {
      "interval": [
        [
          "1900-01-01T00:00:00Z",
          null
        ]
      ]
    }
  },
  "links": [
    {
      "rel": "root",
      "href": "https://s3-west.nrp-nautilus.io/public-data/stac/catalog.json",
      "type": "application/json"
    },
    {
      "rel": "parent",
      "href": "https://s3-west.nrp-nautilus.io/public-data/stac/catalog.json",
      "type": "application/json"
    },
    {
      "rel": "self",
      "href": "https://data.source.coop/cboettig/gbif/stac-collection.json",
      "type": "application/json"
    },
    {
      "rel": "describedby",
      "href": "https://data.source.coop/cboettig/gbif/README.md",
      "type": "text/markdown",
      "title": "Dataset Description"
    },
    {
      "rel": "source",
      "href": "https://www.gbif.org/",
      "type": "text/html",
      "title": "GBIF Homepage"
    },
    {
      "rel": "license",
      "href": "https://www.gbif.org/terms",
      "type": "text/html"
    }
  ],
  "assets": {
    "gbif-hex-2026-06": {
      "href": "https://data.source.coop/cboettig/gbif/2026-06/hex/h0=*/data_0.parquet",
      "type": "application/vnd.apache.parquet",
      "title": "GBIF Occurrences H3 Hex (2026-06, current)",
      "roles": [
        "h3-parquet",
        "data"
      ],
      "description": "GBIF occurrence records (2026-06-01 snapshot) indexed to H3, one row per occurrence, Hive-partitioned by h0 (122 partitions, one file each). Columns h0–h10 give the H3 cell at each resolution (native h10 = one ~15 m² cell per point; h0–h9 are parents). Filter on h-columns to scope spatially; for a single species use the companion species-h0-index sidecar to prune partitions (see collection description). 3,499,090,951 occurrences. NonCommercial in aggregate — see license.",
      "table:columns": [
        {
          "name": "gbifid",
          "type": "string",
          "description": "GBIF Occurrence ID"
        },
        {
          "name": "datasetkey",
          "type": "string",
          "description": "GBIF Dataset key"
        },
        {
          "name": "occurrenceid",
          "type": "string",
          "description": "Occurrence ID"
        },
        {
          "name": "kingdom",
          "type": "string",
          "description": "Taxonomic kingdom"
        },
        {
          "name": "phylum",
          "type": "string",
          "description": "Taxonomic phylum"
        },
        {
          "name": "class",
          "type": "string",
          "description": "Linnaean taxonomic class (e.g. Mammalia, Aves, Actinopterygii) — open scientific nomenclature from the GBIF/IUCN taxonomic backbone, not a fixed enumeration."
        },
        {
          "name": "order",
          "type": "string",
          "description": "Taxonomic order"
        },
        {
          "name": "family",
          "type": "string",
          "description": "Taxonomic family"
        },
        {
          "name": "genus",
          "type": "string",
          "description": "Taxonomic genus"
        },
        {
          "name": "species",
          "type": "string",
          "description": "Taxonomic species"
        },
        {
          "name": "scientificname",
          "type": "string",
          "description": "Scientific name"
        },
        {
          "name": "countrycode",
          "type": "string",
          "description": "ISO 3166-1 alpha-2 country code of the occurrence locality."
        },
        {
          "name": "stateprovince",
          "type": "string",
          "description": "State or province"
        },
        {
          "name": "occurrencestatus",
          "type": "string",
          "description": "Occurrence status"
        },
        {
          "name": "h0",
          "type": "int64",
          "description": "H3 resolution 0 cell (UBIGINT, partition key)"
        },
        {
          "name": "h1",
          "type": "uint64",
          "description": "H3 resolution 1 cell (UBIGINT)"
        },
        {
          "name": "h2",
          "type": "uint64",
          "description": "H3 resolution 2 cell (UBIGINT)"
        },
        {
          "name": "h3",
          "type": "uint64",
          "description": "H3 resolution 3 cell (UBIGINT)"
        },
        {
          "name": "h4",
          "type": "uint64",
          "description": "H3 resolution 4 cell (UBIGINT)"
        },
        {
          "name": "h5",
          "type": "uint64",
          "description": "H3 resolution 5 cell (UBIGINT)"
        },
        {
          "name": "h6",
          "type": "uint64",
          "description": "H3 resolution 6 cell (UBIGINT)"
        },
        {
          "name": "h7",
          "type": "uint64",
          "description": "H3 resolution 7 cell (UBIGINT)"
        },
        {
          "name": "h8",
          "type": "uint64",
          "description": "H3 resolution 8 cell (UBIGINT)"
        },
        {
          "name": "h9",
          "type": "uint64",
          "description": "H3 resolution 9 cell (UBIGINT)"
        },
        {
          "name": "h10",
          "type": "uint64",
          "description": "H3 resolution 10 cell (UBIGINT)"
        }
      ],
      "h3:native_resolution": 10,
      "h3:parent_resolutions": [
        9,
        8,
        7,
        6,
        5,
        4,
        3,
        2,
        1,
        0
      ]
    },
    "gbif-species-h0-index-2026-06": {
      "href": "https://data.source.coop/cboettig/gbif/2026-06/species-h0-index.parquet",
      "type": "application/vnd.apache.parquet",
      "title": "Species → h0 partition index (sidecar, 2026-06)",
      "roles": [
        "index"
      ],
      "description": "Tiny lookup of the distinct (specieskey, species, h0) triples in the hex. The hex is partitioned by geography (h0), not species, so a bare WHERE specieskey=… opens all 122 files. Join this sidecar in a subquery to drive DuckDB dynamic partition pruning — a median species reads ~1/122 files (~2s vs ~minutes). 4,098,172 rows / 1,398,487 species.",
      "table:columns": [
        {
          "name": "specieskey",
          "type": "int64",
          "description": "GBIF speciesKey (canonical species id). Joins to hex.specieskey."
        },
        {
          "name": "species",
          "type": "string",
          "description": "Scientific (binomial) species name. NULL for genus-or-coarser records (excluded from the sidecar; specieskey IS NOT NULL)."
        },
        {
          "name": "h0",
          "type": "int64",
          "description": "H3 res-0 cell = hive partition key of the hex. One row per (species, h0)."
        }
      ]
    },
    "taxonomic-aggregations": {
      "href": "https://data.source.coop/cboettig/gbif/2026-06/taxonomy/h0=*/data_0.parquet",
      "type": "application/vnd.apache.parquet",
      "title": "Taxonomic Aggregations per h0 (2026-06)",
      "roles": [
        "data"
      ],
      "description": "Aggregated counts of unique taxon + location combinations within H3 resolution 0 hexagons. Groups all taxonomic name columns with COUNT(*) occurrences. 122 h0 partitions (full global coverage). Release 2025-06.",
      "table:columns": [
        {
          "name": "kingdom",
          "type": "string",
          "description": "Taxonomic kingdom"
        },
        {
          "name": "phylum",
          "type": "string",
          "description": "Taxonomic phylum"
        },
        {
          "name": "class",
          "type": "string",
          "description": "Linnaean taxonomic class (e.g. Mammalia, Aves, Actinopterygii) — open scientific nomenclature from the GBIF/IUCN taxonomic backbone, not a fixed enumeration."
        },
        {
          "name": "order",
          "type": "string",
          "description": "Taxonomic order"
        },
        {
          "name": "family",
          "type": "string",
          "description": "Taxonomic family"
        },
        {
          "name": "genus",
          "type": "string",
          "description": "Taxonomic genus"
        },
        {
          "name": "species",
          "type": "string",
          "description": "Taxonomic species (binomial)"
        },
        {
          "name": "infraspecificepithet",
          "type": "string",
          "description": "Infraspecific epithet (subspecies, variety, etc.)"
        },
        {
          "name": "taxonrank",
          "type": "string",
          "description": "Taxon rank (SPECIES, GENUS, FAMILY, etc.)"
        },
        {
          "name": "scientificname",
          "type": "string",
          "description": "Full scientific name as provided"
        },
        {
          "name": "verbatimscientificname",
          "type": "string",
          "description": "Verbatim scientific name from source record"
        },
        {
          "name": "verbatimscientificnameauthorship",
          "type": "string",
          "description": "Authorship of the verbatim scientific name"
        },
        {
          "name": "n",
          "type": "integer",
          "description": "Count of occurrences matching this taxon in this h0 cell"
        },
        {
          "name": "h0",
          "type": "int64",
          "description": "H3 resolution 0 cell (UBIGINT, partition key)"
        }
      ]
    },
    "redlined-cities-gbif": {
      "href": "https://data.source.coop/cboettig/gbif/redlined_cities_gbif.parquet",
      "type": "application/vnd.apache.parquet",
      "title": "GBIF Occurrences in Redlined Cities",
      "roles": [
        "data"
      ],
      "description": "GBIF data joined with HOLC grades (Mapping Inequality)",
      "table:columns": [
        {
          "name": "gbifid",
          "type": "string",
          "description": "GBIF Occurrence ID"
        },
        {
          "name": "scientificname",
          "type": "string",
          "description": "Scientific Name"
        },
        {
          "name": "grade",
          "type": "string",
          "description": "HOLC Grade (A-D)"
        },
        {
          "name": "city",
          "type": "string",
          "description": "City Name"
        }
      ]
    },
    "taxa-list": {
      "href": "https://data.source.coop/cboettig/gbif/taxa.parquet",
      "type": "application/vnd.apache.parquet",
      "title": "Taxa List",
      "roles": [
        "metadata"
      ],
      "table:columns": [
        {
          "name": "kingdom",
          "type": "string",
          "description": "Linnaean taxonomic kingdom (GBIF backbone nomenclature)."
        },
        {
          "name": "phylum",
          "type": "string",
          "description": "Linnaean taxonomic phylum (GBIF backbone nomenclature)."
        },
        {
          "name": "class",
          "type": "string",
          "description": "Linnaean taxonomic class (e.g. Mammalia, Aves, Actinopterygii) — open scientific nomenclature from the GBIF/IUCN taxonomic backbone, not a fixed enumeration."
        },
        {
          "name": "order",
          "type": "string",
          "description": "Linnaean taxonomic order (GBIF backbone nomenclature)."
        },
        {
          "name": "family",
          "type": "string",
          "description": "Linnaean taxonomic family (GBIF backbone nomenclature)."
        },
        {
          "name": "genus",
          "type": "string",
          "description": "Linnaean taxonomic genus (GBIF backbone nomenclature)."
        },
        {
          "name": "species",
          "type": "string",
          "description": "Species name (genus + specific epithet)."
        },
        {
          "name": "infraspecificepithet",
          "type": "string",
          "description": "Infraspecific epithet (subspecies/variety), where applicable."
        },
        {
          "name": "taxonrank",
          "type": "string",
          "description": "GBIF taxon rank of the matched name (e.g. SPECIES, GENUS, SUBSPECIES)."
        },
        {
          "name": "scientificname",
          "type": "string",
          "description": "GBIF-backbone canonical scientific name."
        },
        {
          "name": "verbatimscientificname",
          "type": "string",
          "description": "Verbatim scientific name as supplied by the source occurrence record."
        },
        {
          "name": "verbatimscientificnameauthorship",
          "type": "string",
          "description": "Verbatim authorship string of the scientific name as supplied by the source."
        },
        {
          "name": "n",
          "type": "double",
          "description": "Number of GBIF occurrence records aggregated for this taxon."
        }
      ]
    }
  },
  "summaries": {
    "formats": [
      "h3-parquet"
    ]
  }
}
