Add Species Habitat Dataset for Faceted Map Examples #683

Open
5 tasks
dsmedia opened this issue Feb 11, 2025 · 2 comments · May be fixed by #684

@dsmedia
Collaborator

dsmedia commented Feb 11, 2025

This proposal adds a new dataset and its generation script to vega-datasets to enable categorical faceted map examples. It addresses a gap identified in vega/altair#1711: we currently lack examples suitable for categorical faceting across geographic regions, unlike the existing temporal/ordinal faceting examples such as income by state groups.

Dataset Description

The dataset will contain county-level habitat distribution data derived from USGS Gap Analysis Project (GAP) species habitat maps:

  • Format: CSV file (consider JSON or alternative formats?)
  • Content: County-level habitat percentages for multiple species
  • Size: ~3000 counties × 4 species
  • Compatibility: Works with existing US county topology file us-10m.json in vega-datasets
  • Fields: county_id, species, pct (percentage of suitable habitat, stored as a fraction between 0 and 1 in the sample below)

Sample data structure:

county_id,species,pct
53000,robin,0.138
53073,robin,0.710
30105,robin,0.160
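
As a rough illustration of how this structure could be consumed, the sketch below parses the three sample rows with Python's standard csv module and groups them by species. Casting county_id to int is an assumption made here so the values can be compared with numeric county ids in a topology file; the final dataset may store ids differently. The function name load_habitat is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The three sample rows from the proposal, inlined so the sketch is self-contained.
SAMPLE_CSV = """county_id,species,pct
53000,robin,0.138
53073,robin,0.710
30105,robin,0.160
"""


def load_habitat(text):
    """Return {species: {county_id: pct}} from CSV text with the proposed fields."""
    by_species = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: ids are treated as integers and pct as a float.
        by_species[row["species"]][int(row["county_id"])] = float(row["pct"])
    return dict(by_species)


habitat = load_habitat(SAMPLE_CSV)
print(sorted(habitat))          # species present in the sample
print(habitat["robin"][53073])  # pct for one county
```

Grouping by species first mirrors how a faceted or concatenated map would consume the data: one panel per species, each keyed by county.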

To do:

  • Create generation script: uses exactextract for efficient processing (~20 minutes total). Thank you, @mattijn and @dangotbanned.
  • Generate sample dataset
  • Share visualizations with different species combinations to find the preferred mix
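
The generation script itself is not shown in this issue. As a conceptual sketch only, the function below illustrates one way a per-county value like pct could be computed: a coverage-weighted mean over a binary habitat raster, where each cell contributes in proportion to the fraction of the cell that falls inside the county. The function name, the input layout, and the binary-raster assumption are all hypothetical; none of them is taken from the actual script or from the exactextract API.

```python
def habitat_fraction(cells):
    """Coverage-weighted share of suitable habitat for one county.

    `cells` is an iterable of (is_habitat, coverage) pairs, where is_habitat
    is 1 or 0 and coverage is the fraction (0..1) of the raster cell lying
    inside the county polygon. Both the layout and the name are hypothetical.
    """
    total = 0.0
    suitable = 0.0
    for is_habitat, coverage in cells:
        total += coverage
        suitable += coverage * is_habitat
    # A county that overlaps no cells gets 0.0 rather than a division error.
    return suitable / total if total else 0.0


# Two fully covered cells (one suitable) and one half-covered suitable cell.
example = [(1, 1.0), (0, 1.0), (1, 0.5)]
print(round(habitat_fraction(example), 3))
```

Weighting by partial coverage matters most for small counties, where cells straddling the boundary make up a large share of the total.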

Follow-ups

  • Add example to the Altair (and Vega-Lite?) gallery

Benefits

  1. Fills a gap in example datasets for categorical faceted maps
  2. Uses public domain data with clear categorical groups
  3. Works with existing topology data

Tracking

Roadmap issue that may be addressed by "fix: facet fails on geoshape" (vega-lite#9292)

@dsmedia dsmedia self-assigned this Feb 11, 2025
@domoritz
Member

Great. If it's for Altair only, an option could also be to use parquet. Otherwise CSV or Arrow makes sense.

@dsmedia dsmedia linked a pull request Feb 11, 2025 that will close this issue
16 tasks
@dangotbanned
Member

> Great. If it's for Altair only, an option could also be to use parquet. Otherwise CSV or Arrow makes sense.

I've updated the description to get the tracking working for vega/vega-lite#3729.

@domoritz I know that you're quite busy with vega/vega#3990, so I'll just provide a summary of what I understand.
Maybe someone else with the right domain knowledge can help get that PR over the line.

@mattijn's example in vega/altair#1711 (comment) would only work for Altair.
It works around a known bug with facet and {"mark": "geoshape"} via a loop-and-concat combination.
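
To make the loop-and-concat idea concrete, here is a minimal sketch that builds a Vega-Lite specification as a plain Python dict, with one geoshape panel per species placed in a concat instead of a facet. This is not @mattijn's code. The file name species_habitat.csv, the placeholder species name second_species, and the function name are hypothetical, and the spec has not been checked against the bug discussed here.

```python
import json


def concat_species_maps(species, data_url, topo_url, columns=2):
    """Build a Vega-Lite spec dict with one geoshape panel per species."""
    panels = []
    for name in species:
        panels.append({
            "title": name,
            "transform": [
                # Keep only this species' rows, then attach county geometry.
                {"filter": {"field": "species", "equal": name}},
                {
                    "lookup": "county_id",
                    "from": {
                        "data": {
                            "url": topo_url,
                            "format": {"type": "topojson", "feature": "counties"},
                        },
                        "key": "id",
                    },
                    "as": "geo",
                },
            ],
            "projection": {"type": "albersUsa"},
            "mark": "geoshape",
            "encoding": {
                "shape": {"field": "geo", "type": "geojson"},
                "color": {"field": "pct", "type": "quantitative"},
            },
        })
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},
        "columns": columns,
        "concat": panels,
    }


# Hypothetical inputs: only "robin" appears in the sample data above.
spec = concat_species_maps(
    ["robin", "second_species"], "species_habitat.csv", "us-10m.json"
)
print(json.dumps(spec["concat"][0]["transform"][0]))
```

Each panel repeats the filter and lookup, which is the cost of avoiding facet; a working facet on geoshape would let the species field drive the layout directly.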

The potential fix changes only a single line in vega-lite/src/compile/projection/assemble.ts.

The blame shows this code was last modified in vega/vega-lite#4843 by @jheer (6 years ago).
However, it appears that line has been there since the file was introduced in vega/vega-lite#2734 by @willium (7 years ago).

@dsmedia dsmedia added the dataset-content Pull requests that add, modify, or update dataset contents label Feb 20, 2025