Map your urban dataset with the street network of your choice, urban-pipeline your analysis, and propagate it along the urban research network, all with ease of use!
Important
- We highly recommend exploring the `examples/` folder for Jupyter Notebook-based tutorials 🎉
- The library is under active development and is not yet stable. Expect bugs & frequent changes!
`OSMNxMapping` –– f(.) –– brings road networks from OpenStreetMap –– X –– and your urban datasets –– Y –– together through the function f(X, Y) = X ⋈ Y, allowing you to map these components in any direction: whether attaching the streets based on your dataset’s latitude and longitude coordinates, or computing insights from your datasets to attach to the street network.
`OSMNxMapping` is built with a Scikit-Learn-like philosophy: (I) from loading to visualisation, passing by mapping, we want to cover as many user needs as possible in a welcoming way, without requiring 20–50+ lines of one-off, non-reproducible, non-shareable, non-updatable code; and (II) the library’s flexibility allows for easy contributions to sub-modules without having to start from scratch “all the time”.
👀Read me! Click here ⬅️
To answer (I) –– one out of many other ways –– we propose a scikit-like pipeline that can, for instance, stack the following steps:
- Query a user-defined road network via the great OSMNx –– Network module;
- Load your geospatial data (`CSV`, `Parquet`, or `Shapefile`) using its Loader module;
- Wrangle the loaded data with optional imputation and filtering to handle missing coordinates or irrelevant regions –– Preprocessing module;
- Map data to street nodes and enrich the network (e.g., averaging building floors per street or counting taxi pickups per street segment) – no big deal, a factory makes it “easy” to do so –– Enricher module;
- Visualise the results statically or interactively –– Visual module;
- Optionally, save your analysis for later use or for sharing with other urban experts.
To answer (II), while using the right state-of-the-art open-source initiatives and tools, a highly type-safe, well-tested, and well-documented library is a must. We are already fully typed thanks to Beartype, and we aim to reach decent test coverage and documentation to make the library more robust and user-friendly.
Who knows, we'd like you to deal with what matters to you: if you are a machine learning enthusiast, you can apply machine learning to the enriched networks; if you are a researcher, you can easily map your data to street networks and get insights from them. And if you want to contribute to the library, you can easily do so by adding new modules or extending existing ones; we are happy in advance to welcome you! 🥐
We embrace a DRY (Don't Repeat Yourself) philosophy: focus on what matters and let us handle the mapping intricacies. Of course, we mentioned the pipeline, but each of the steps above also works independently of the others 🙃!
See further notebook-based examples in the examples/
directory. 📓
We highly recommend using `uv` for installation from source to avoid the hassle of Conda or other package managers. It is also the fastest package manager known to date on the OSS market and manages dependencies seamlessly without manual environment activation (biggest flex!). If you prefer not to use `uv`, that is no issue at all, but we will only cover it in the upcoming documentation, not below.
First, ensure uv
is installed on your machine by
following these instructions.
- Install `uv` as described above.
- Clone Auctus Search (`auctus_search`, required for alpha development) into the same parent directory as `OSMNxMapping`:

  ```bash
  git clone git@github.com:VIDA-NYU/auctus_search.git
  ```

  This step ensures `pyproject.toml` builds `auctus_search` from source during installation, though we plan for `auctus_search` to become a PyPI package (`uv add auctus_search` or `pip install auctus_search`) in future releases.
Note
Future versions will simplify this process: `auctus_search` will move to PyPI, removing the need for manual cloning, and Jupyter extensions will auto-install via `pyproject.toml` configuration.
- Clone the `OSMNxMapping` repository:

  ```bash
  git clone https://github.com/yourusername/OSMNxMapping.git
  cd OSMNxMapping
  ```
- Lock and sync dependencies with `uv`:

  ```bash
  uv lock
  uv sync
  ```
- (Recommended) Install Jupyter extensions for interactive visualisations requiring Jupyter widgets:

  ```bash
  uv run jupyter labextension install @jupyter-widgets/jupyterlab-manager
  ```
- Launch Jupyter Lab to explore `OSMNxMapping` (way faster than running Jupyter without `uv`):

  ```bash
  uv run --with jupyter jupyter lab
  ```
Voila 🥐 ! You’re all set to explore OSMNxMapping
in Jupyter Lab.
Below are two approaches to get you started with the OSMNxMapping
library in a Jupyter notebook. These examples are also available in the examples/
directory as 1-OSMNX_MAPPING_with_Auctus_basics.ipynb
(for the step-by-step approach) and 5-Advanced_Urban_Pipeline_Save_and_Load.ipynb
(for the pipeline approach).
🐥Fine-Grained Step-by-Step
This detailed approach walks you through each step of mapping urban data to a street network using PLUTO (Primary Land Use Tax Lot Output) buildings in New York City as an example. It’s perfect for understanding the full process.
import osmnx_mapping as oxm
pluto_buildings = oxm.OSMNxMapping() # Here, PLUTO buildings represent an urban analysis study of The Primary Land Use Tax Lot Output in New York City, USA. Note that nothing is loaded or queried yet—everything is to be done.
Note: You can always load your dataset manually—see the /examples
folder for details. Here, we use Auctus
to search for datasets related to "PLUTO".
collection = pluto_buildings.search_datasets(search_query="PLUTO", display_initial_results=True)
# Search for datasets related to "PLUTO". The `search_datasets` method queries the Auctus API and returns a
# `DatasetCollection`. Setting `display_initial_results=True` shows the initial results interactively in the notebook,
# allowing you to see available datasets right away.
# More parameters like page and size for pagination are available—check the Auctus Search / OSMNxMapping API for details.
dataset = pluto_buildings.load_dataset_from_auctus()
# After selecting a dataset in the previous step, this loads it into memory as a `pandas.DataFrame` (or
# `geopandas.GeoDataFrame` if spatial). By default, it displays an interactive table preview of the dataset.
Note: load_from_dataframe
doesn’t reload the data entirely—it transposes it into a format OSMNxMapping understands.
loaded_data = pluto_buildings.loader.load_from_dataframe(
input_dataframe=dataset,
latitude_column="latitude", # Assuming the dataset has a column named "latitude" for latitude values
longitude_column="longitude" # Assuming the dataset has a column named "longitude" for longitude values
)
pluto_buildings.table_vis.interactive_display(loaded_data)
graph, nodes, edges = pluto_buildings.network.network_from_place("Manhattan, New York City, USA", render=True) # render=True shows the plain network.
By default, this creates a new column in loaded_data
with the node ID to which each record (e.g., a building) is closest—key for enrichment.
loaded_data = pluto_buildings.network.map_nearest_street(
data=loaded_data,
longitude_column="longitude",
latitude_column="latitude"
)
First, we impute missing values in the latitude
and longitude
columns using SimpleGeoImputer
, which naively drops rows with missing values. For advanced methods, see the PreprocessingMixin
API.
loaded_data = (
pluto_buildings.preprocessing
.with_default_imputer(latitude_column_name="latitude", longitude_column_name="longitude")
.transform(input_data=loaded_data)
)
Second, we filter data to keep only points within the road network’s bounding box using BoundingBoxFilter
. See the PreprocessingMixin
API for other filters.
loaded_data = (
pluto_buildings.preprocessing
.with_default_filter(nodes=nodes)
.transform(input_data=loaded_data)
)
We enrich the network by calculating the average number of floors (numfloors
) per street segment using CreateEnricher
.
from osmnx_mapping.modules.enricher import CreateEnricher  # import needed for the factory below

pluto_buildings_enricher = (
CreateEnricher()
.with_data(group_by="nearest_node", values_from="numfloors")
.aggregate_with(method="mean", output_column="avg_numfloors")
)
# Preview the enricher configuration (optional)
print(pluto_buildings_enricher.preview())
# Apply the enricher
enriched_data, graph, nodes, edges = pluto_buildings.enricher.enrich_network(
input_data=loaded_data,
input_graph=graph,
input_nodes=nodes,
input_edges=edges
)
We visualise the enriched network with StaticVisualiser
(default) for a Matplotlib plot.
viz = pluto_buildings.visual.visualise(graph, edges, "avg_numfloors")
viz
Or use InteractiveVisualiser
for an interactive Folium map.
from osmnx_mapping import InteractiveVisualiser
viz = pluto_buildings.visual(visualiser=InteractiveVisualiser()).visualise(graph, edges, "avg_numfloors")
viz
💨 Urban Pipeline: ~10 Lines of Code!
For a faster, more concise, and reproducible approach, use the `UrbanPipeline` class to chain all steps into a single workflow. Here’s an example with local PLUTO data (`pluto.csv`), since Auctus is not available inside a pipeline (you may reckon why!).
import osmnx_mapping as oxm
from osmnx_mapping.modules.network import OSMNxNetwork
from osmnx_mapping.modules.loader import CSVLoader
from osmnx_mapping.modules.preprocessing import CreatePreprocessor
from osmnx_mapping.modules.enricher import CreateEnricher
from osmnx_mapping.modules.visualiser import InteractiveVisualiser
from osmnx_mapping.pipeline import UrbanPipeline
# Define the pipeline with all steps
pipeline = UrbanPipeline([
("network", OSMNxNetwork(place_name="Manhattan, NYC", network_type="drive")),
("load", CSVLoader(file_path="./pluto.csv")),
("impute", CreatePreprocessor().with_default_imputer().build()), # yes latitude and longitude based columns are passed during the compose_transform, like X, and Y during a Sklearn pipeline, if modified are passed throughout the steps.
("filter", CreatePreprocessor().with_default_filter().build()), # yes nodes are passed during the compose_transform, like X, and Y during a Sklearn pipeline, if modified are passed throughout the steps.
("enrich", CreateEnricher()
.with_data(group_by="nearest_node", values_from="numfloors")
.aggregate_with(method="mean", output_column="avg_numfloors")
.build()),
("viz", InteractiveVisualiser())
])
# Execute the pipeline and visualise the result
data, graph, nodes, edges = pipeline.compose_transform("latitude", "longitude")
viz = pipeline.visualise("avg_numfloors", colormap="Greens", tile_provider="CartoDB positron")
viz
# Save the pipeline for reuse
# pipeline.save("pluto_pipeline.joblib")
- Network: Queries Manhattan’s road network.
- Load: Loads `pluto.csv` locally.
- Impute/Filter: Cleans and bounds the data.
- Enrich: Averages floors per street segment.
- Visualise: Shows an interactive Folium map.
- Save: Stores the pipeline for reuse.
This ~10-line pipeline replaces the detailed steps above, offering efficiency and reproducibility. Load it later with UrbanPipeline.load("pluto_pipeline.joblib")
and visualise again!
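A minimal sketch of that reload (assuming the `pluto_pipeline.joblib` file saved above and the `avg_numfloors` column from this example):

```python
from osmnx_mapping.pipeline import UrbanPipeline

# Reload the previously saved pipeline and visualise the enriched column again.
loaded_pipeline = UrbanPipeline.load("pluto_pipeline.joblib")
viz = loaded_pipeline.visualise("avg_numfloors", colormap="Greens", tile_provider="CartoDB positron")
viz
```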
Note: Adjust the file path and column names (`latitude`, `longitude`, `numfloors`) to match your local dataset.
Voila! 🥐 Whether you prefer the fine-grained control of the step-by-step approach or the concise reproducible urban pipeline, you’ve successfully mapped urban data to a street network, enriched it, and visualised the results. 🎉
Note
More advanced usage is possible—explore the API and examples/
directory for details!
Note
For more about future works, explore the issues
tab above!
- From labs to more general communities, we want to advance `OSMNxMapping` by attaining large unit-test coverage, integrating routines via GitHub Actions, and producing thorough documentation for users all around.
- We are also looking at building a function f(X, set(Ys)) that could introduce a `MultiAggregatorEnricher` to handle multiple datasets –– yes, at the same time –– necessitating a rethink of visualisation approaches; brainstorming is underway.
- Finally, we’re pondering whether `X`, currently OSMNx street networks, could evolve to other urban networks, questioning whether alternatives exist or whether we might redefine networks beyond roads; these discussions are still in progress.
We would welcome pull requests adding new `loader`, `geo imputer`, and `geo filter` primitives, as well as extensions to the `enricher` and `visualiser` primitives. We are also looking forward to seeing more examples in the `examples/` directory, and we are happy to welcome you as a contributor to the library 🎄
Important
This project is fully Python-typed and uses the great @beartype! Runtime type-checking should reduce side effects and improve the library’s usability on the user’s end.
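As a minimal illustration of what runtime type-checking buys (a generic Beartype snippet, not OSMNxMapping’s own code), a decorated function rejects wrongly typed arguments at call time instead of failing later with a confusing side effect:

```python
from beartype import beartype

@beartype
def halve(value: int) -> float:
    """Return half of an integer."""
    return value / 2

halve(10)     # fine: returns 5.0
halve("ten")  # raises a beartype type-checking violation immediately, at the call site
```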
Users familiar with data pipelines will find the modular, scikit-learn-inspired design of the `OSMNxMapping` library clear-cut. For others, believe us, it is the way to go!
We offer a set of mixins that simplify difficult chores such as data loading, road network building, preprocessing, enrichment, and visualisation of enriched graph data. These mixins are your main interface; they neatly wrap the underlying modules for a seamless experience.
LoaderMixin – Load Your Urban Data
The LoaderMixin
handles loading geospatial data from files or DataFrames, converting it into a GeoDataFrame
for
further analysis.
[!NOTE]
Only `.csv`, `.parquet`, and shapefiles are supported for now. If you need additional formats, please let us know! Or, pssst, you can contribute to the library by adding a new loader primitive to the `loader` module.
- `load_from_file(file_path, latitude_column="", longitude_column="")`
  - Purpose: Loads data from a file (CSV, Parquet, or Shapefile) into a `GeoDataFrame`.
  - Parameters:
    - `file_path` (str): Path to the file.
    - `latitude_column` (str, optional): Name of the latitude column.
    - `longitude_column` (str, optional): Name of the longitude column.
  - Returns: A `geopandas.GeoDataFrame`.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()

    # The loader module handles CSV, Parquet, and shapefiles as a factory; that means
    # no need for you to worry about the file format.
    data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
    ```
- `load_from_dataframe(input_data, latitude_column, longitude_column)`
  - Purpose: Converts a DataFrame to a `GeoDataFrame` using the specified lat/lon columns.
  - Parameters:
    - `input_data` (pandas.DataFrame or geopandas.GeoDataFrame): The input data.
    - `latitude_column` (str): Latitude column name.
    - `longitude_column` (str): Longitude column name.
  - Returns: A `geopandas.GeoDataFrame`.
  - Example:

    ```python
    import osmnx_mapping as oxm
    import pandas as pd

    mapping = oxm.OSMNxMapping()

    df = pd.DataFrame({"lat": [40.7128], "lon": [-74.0060]})
    geo_data = mapping.loader.load_from_dataframe(df, "lat", "lon")
    ```

    Another example, if you are using a dataset selected and loaded from Auctus:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()

    # Assuming you have loaded a dataset from Auctus into `new_data`
    geo_data = mapping.loader.load_from_dataframe(new_data, "lat", "lon")
    ```
NetworkMixin – Build and Map Road Networks
The NetworkMixin
lets you query road networks from OpenStreetMap and map data points to the nearest street nodes.
- `network_from_place(place_name, network_type="drive", render=False)`
  - Purpose: Queries a road network for a specified place.
  - Parameters:
    - `place_name` (str): Location (e.g., "Manhattan, New York City, USA").
    - `network_type` (str, default="drive"): Type of network ("drive", "walk", "bike").
    - `render` (bool, default=False): If True, displays a plot of the network.
  - Returns: A tuple (`networkx.MultiDiGraph`, `geopandas.GeoDataFrame`, `geopandas.GeoDataFrame`) of the graph, nodes, and edges.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()
    graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
    ```
- `map_nearest_street(data, longitude_column, latitude_column, output_column="nearest_node", reset_output_column=False, **kwargs)`
  - Purpose: Maps data points to the nearest street nodes in the network.
  - Parameters:
    - `data` (geopandas.GeoDataFrame): Input data with lat/lon.
    - `longitude_column` (str): Longitude column name.
    - `latitude_column` (str): Latitude column name.
    - `output_column` (str, default="nearest_node"): Column to store node IDs.
    - `reset_output_column` (bool, default=False): Overwrite existing output column.
    - `**kwargs`: Additional parameters for OSMnx’s `nearest_nodes`.
  - Returns: A `geopandas.GeoDataFrame` with mapped nodes.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()

    # Assuming data is a GeoDataFrame from previous steps (e.g., LoaderMixin)
    mapped_data = mapping.network.map_nearest_street(data, "lon", "lat")
    ```
PreprocessingMixin – Clean and Filter Data
The PreprocessingMixin
offers tools to handle missing values and filter data geographically.
[!IMPORTANT]
You cannot stack a filter with an imputer (or vice versa) in a single `PreprocessingMixin` instance. Each instance can only perform one action, either imputing or filtering. If you want to stack operations (e.g., impute then filter, or filter then impute), simply use the pipeline and create two steps, as sketched below. It’s as easy as that! See the UrbanPipelineMixin section for more details on chaining steps.
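For example, stacking impute-then-filter as two pipeline steps could look like this fragment (reusing the factory calls from the quickstart pipeline; the remaining required steps are omitted for brevity):

```python
from osmnx_mapping.modules.preprocessing import CreatePreprocessor

# Two separate preprocessing steps inside one pipeline: impute first, then filter.
# The other required steps (network, loader, enricher, ...) go alongside these entries
# in the same steps list, exactly as in the quickstart pipeline above.
preprocessing_steps = [
    ("impute", CreatePreprocessor().with_default_imputer().build()),
    ("filter", CreatePreprocessor().with_default_filter().build()),
]
```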
[!TIP]
Available imputers:
- `SimpleGeoImputer`: "Naively" drops rows with missing latitude or longitude values.
- `AddressGeoImputer`: Fills missing lat/lon by geocoding an address column if available (requires `address_column_name`).

Available filter:
- `BoundingBoxFilter`: Keeps only data points within the bounding box of the road network’s nodes (requires `nodes`).
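As a minimal sketch, the address-based imputer can be configured through `with_imputer` (documented below), assuming your dataset has an address column (the column names here are placeholders):

```python
import osmnx_mapping as oxm

mapping = oxm.OSMNxMapping()

# AddressGeoImputer geocodes the address column to fill missing lat/lon values;
# the address column name is passed through with_imputer's extra parameters.
mapping.preprocessing.with_imputer(
    "AddressGeoImputer",
    latitude_column_name="lat",
    longitude_column_name="lon",
    address_column_name="address",  # placeholder column name in your data
)
```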
- `with_imputer(imputer_type, latitude_column_name=None, longitude_column_name=None, **extra_params)`
  - Purpose: Configures an imputer to handle missing lat/lon values.
  - Parameters:
    - `imputer_type` (str): Imputer type (e.g., "SimpleGeoImputer", "AddressGeoImputer").
    - `latitude_column_name` (str, optional): Latitude column name. If omitted and used within a pipeline, it will be set by the pipeline’s `compose` method.
    - `longitude_column_name` (str, optional): Longitude column name. If omitted and used within a pipeline, it will be set by the pipeline’s `compose` method.
    - `**extra_params`: Additional parameters (e.g., `address_column_name` for "AddressGeoImputer").
  - Returns: The mixin instance for chaining.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()
    mapping.preprocessing.with_imputer("SimpleGeoImputer", "lat", "lon")
    ```
- `with_default_imputer(latitude_column_name=None, longitude_column_name=None)`
  - Purpose: Uses a default imputer that drops rows with missing lat/lon.
  - Parameters: Same as above, without `imputer_type`.
  - Returns: The mixin instance.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()
    mapping.preprocessing.with_default_imputer("lat", "lon")
    ```
- `with_filter(filter_type, **extra_params)`
  - Purpose: Configures a filter (e.g., "BoundingBoxFilter").
  - Parameters:
    - `filter_type` (str): Filter type.
    - `**extra_params`: Filter-specific parameters (e.g., `nodes` for the bounding box).
  - Returns: The mixin instance.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()

    # Assuming nodes is from network_from_place
    graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
    mapping.preprocessing.with_filter("BoundingBoxFilter", nodes=nodes)
    ```
- `with_default_filter(nodes)`
  - Purpose: Uses a default filter to keep data within the road network’s bounding box.
  - Parameters:
    - `nodes` (geopandas.GeoDataFrame): Nodes from the road network defining the bounding box.
  - Returns: The mixin instance.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()

    # Assuming nodes is from network_from_place
    graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
    mapping.preprocessing.with_default_filter(nodes)
    ```
- `transform(input_data)`
  - Purpose: Applies the configured imputer or filter to the data.
  - Parameters:
    - `input_data` (geopandas.GeoDataFrame): Data to preprocess.
  - Returns: A preprocessed `geopandas.GeoDataFrame`.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()
    data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
    mapping.preprocessing.with_default_imputer("lat", "lon")
    cleaned_data = mapping.preprocessing.transform(data)
    ```
EnricherMixin – Enrich Your Network with Data
The EnricherMixin
is the core component of the library, empowering you to aggregate urban data (e.g., traffic counts,
building heights) and map it onto a road network's edges. It's designed for flexibility with advanced customization
through the CreateEnricher
factory, while also offering a simpler default setup for standard use cases.
[!NOTE]
How the Enricher Works:
The enricher processes data in two key steps:
- Aggregation: It groups your data by a specified column that connects with the graph (e.g., `nearest_node` following `map_nearest_street(.)`) and applies an aggregation method like `mean`, `sum`, or `count` to compute values for each group. For example, it could sum traffic volumes per node.
- Edge Mapping: These aggregated values are then assigned to the network's edges (streets) using a method like `average`, `sum`, `max`, or `min`, based on the values at the edge's connected nodes.

This process transforms raw data into meaningful insights mapped onto the road network, making it ideal for urban analysis tasks like traffic studies or accident mapping.
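Conceptually, the two steps boil down to the following plain-pandas sketch (an illustration of the idea only, not OSMNxMapping's internals; the column names are made up for the example):

```python
import pandas as pd

# Step 1 – Aggregation: group records by the node they were mapped to and aggregate a value.
records = pd.DataFrame({
    "nearest_node": [1, 1, 2, 3, 3, 3],
    "traffic":      [10, 20, 5, 7, 9, 2],
})
node_values = records.groupby("nearest_node")["traffic"].sum()  # node 1 -> 30, node 2 -> 5, node 3 -> 18

# Step 2 – Edge mapping: each edge (u, v) receives a value derived from its endpoint nodes,
# here the average of the two node values.
edges = pd.DataFrame({"u": [1, 2], "v": [2, 3]})
edges["total_traffic"] = [
    (node_values.get(u, 0) + node_values.get(v, 0)) / 2
    for u, v in zip(edges["u"], edges["v"])
]
print(edges)
```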
The CreateEnricher
factory (an alias for EnricherFactory
) is the primary and recommended way to configure enrichers.
It offers a flexible, step-by-step approach to define how data is aggregated and mapped to the network, giving you full
control over the enrichment process.
- Key Methods:
  - `with_data(group_by, values_from=None)`:
    - Purpose: Specifies the column to group data by (e.g., `"nearest_node"`) and, optionally, the column containing values to aggregate (e.g., `"traffic"`).
    - Example:

      ```python
      enricher_factory = CreateEnricher().with_data(group_by="nearest_node", values_from="traffic")
      ```
  - `aggregate_with(method, edge_method='average', output_column=None)`:
    - Purpose: Configures the aggregation method (e.g., `"sum"`, `"mean"`) and how aggregated values are mapped to edges.
    - Parameters:
      - `method` (str): Aggregation method (e.g., `"mean"`, `"sum"`, `"median"`, `"min"`, `"max"`).
      - `edge_method` (str, optional, default="average"): Method to compute edge values (e.g., `"average"`, `"sum"`, `"max"`, `"min"`).
      - `output_column` (str, optional): Name of the output column in the edges GeoDataFrame.
    - Example:

      ```python
      enricher_factory = enricher_factory.aggregate_with(method="sum", edge_method="average", output_column="total_traffic")
      ```
  - `count_by(edge_method='sum', output_column=None)`:
    - Purpose: Configures a counting aggregation (e.g., counting accidents per node), without needing a `values_from` column.
    - Parameters:
      - `edge_method` (str, optional, default="sum"): Method to map counts to edges.
      - `output_column` (str, optional): Name of the output column.
    - Example:

      ```python
      enricher_factory = CreateEnricher().with_data(group_by="nearest_node").count_by(edge_method="sum", output_column="accident_count")
      ```
  - `using_enricher(enricher_type)`:
    - Purpose: Selects a specific enricher type (currently, only `"SingleAggregatorEnricher"` is available).
    - Example:

      ```python
      enricher_factory = enricher_factory.using_enricher("SingleAggregatorEnricher")
      ```
  - `preview(format="ascii")`:
    - Purpose: Displays a summary of the current configuration, helping you verify settings before building the enricher.
    - Example:

      ```python
      print(enricher_factory.preview())
      ```
  - `build()`:
    - Purpose: Constructs and returns the configured `EnricherBase` instance.
    - Example:

      ```python
      enricher = enricher_factory.build()
      ```
- Example (Full Configuration):

  ```python
  from osmnx_mapping.modules.enricher import CreateEnricher

  enricher = (
      CreateEnricher()
      .with_data(group_by="nearest_node", values_from="traffic")
      .aggregate_with(method="sum", edge_method="average", output_column="total_traffic")
      .build()
  )
  ```
[!TIP]
- Use `CreateEnricher` when you need full control over the enrichment process, such as experimenting with different aggregation methods or counting occurrences without a value column.
- Call `preview()` before `build()` to verify your configuration and catch potential errors early.
If you do not need advanced customisation and prefer a quick setup with sensible defaults, the with_default
method in
EnricherMixin
provides a convenient shortcut. It internally uses CreateEnricher
with predefined settings, making it
ideal for standard use cases.
- `with_default(group_by_column, values_from_column, output_column="aggregated_value", method="mean", edge_method="average")`
  - Purpose: Quickly configures a default enricher using `CreateEnricher` with predefined settings.
  - Parameters:
    - `group_by_column` (str): Column to group by (e.g., `"nearest_node"`).
    - `values_from_column` (str): Column to aggregate (e.g., `"traffic"`).
    - `output_column` (str, optional): Name of the output column (default: `"aggregated_value"`).
    - `method` (str, optional): Aggregation method (default: `"mean"`).
    - `edge_method` (str, optional): Edge mapping method (default: `"average"`).
  - Returns: The `EnricherMixin` instance for method chaining.
  - Example:

    ```python
    import osmnx_mapping as oxm

    mapping = oxm.OSMNxMapping()
    mapping.enricher.with_default("nearest_node", "traffic", method="sum", edge_method="average")
    ```
[!TIP]
- Use `with_default` for standard use cases where you want a quick setup with minimal configuration.
- If you need more control, switch to `CreateEnricher` for advanced customisation.
Once configured (using either CreateEnricher
or with_default
), the enricher can be applied to the network using the
enrich_network
method.
`enrich_network(input_data, input_graph, input_nodes, input_edges, **kwargs)`
- Purpose: Applies the configured enricher to the road network, enriching edges with aggregated data.
- Parameters:
  - `input_data` (geopandas.GeoDataFrame): Dataset to enrich with.
  - `input_graph` (networkx.MultiDiGraph): Road network graph.
  - `input_nodes` (geopandas.GeoDataFrame): Network nodes.
  - `input_edges` (geopandas.GeoDataFrame): Network edges.
  - `**kwargs`: Additional options for custom enrichers.
- Returns: A tuple (`GeoDataFrame`, `MultiDiGraph`, `GeoDataFrame`, `GeoDataFrame`) of enriched data, graph, nodes, and edges.
- Example:

  ```python
  import osmnx_mapping as oxm

  mapping = oxm.OSMNxMapping()
  data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
  graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
  mapping.enricher.with_default("nearest_node", "traffic", method="sum", edge_method="average")
  enriched_data, graph, nodes, edges = mapping.enricher.enrich_network(data, graph, nodes, edges)
  ```
[!TIP]
- Counting Occurrences: Use `count_by` in `CreateEnricher` to count events (e.g., accidents) per group without needing a `values_from` column.
- Choosing Between Approaches: Start with `with_default` for simplicity, but switch to `CreateEnricher` if you need advanced customisation or encounter limitations.
VisualMixin – Visualise Your Results
The VisualMixin
provides tools to visualise your enriched network. By default, it uses StaticVisualiser
for static
Matplotlib plots, but you can pass any VisualiserBase
subclass (e.g., InteractiveVisualiser
for interactive Folium
maps) to the constructor for custom visualisations.
[!TIP]
Available visualisers:
- `StaticVisualiser`: Generates a static Matplotlib plot of the network (default).
- `InteractiveVisualiser`: Creates an interactive Folium map for exploration in a browser.
`visualise(graph, edges, result_columns, **kwargs)`
- Purpose: Creates a visualisation of the enriched network using the configured visualiser.
- Parameters:
  - `graph` (networkx.MultiDiGraph): The network graph.
  - `edges` (geopandas.GeoDataFrame): Enriched edges.
  - `result_columns` (str or list of str): Column(s) to visualise. For static visualisers (e.g., `StaticVisualiser`), provide a single string (e.g., `"aggregated_value"`). For interactive visualisers (e.g., `InteractiveVisualiser`), provide a list of strings (e.g., `["column1", "column2"]`) to enable multi-layer visualisation with a dropdown selection.
  - `**kwargs`: Visualisation parameters (e.g., `colormap="Blues"` for `StaticVisualiser`, or `tile_provider="CartoDB positron"` for `InteractiveVisualiser`).
- Returns: A Matplotlib figure (for `StaticVisualiser`) or Folium map (for `InteractiveVisualiser`), depending on the visualiser.
- Example (Static visualiser):

  ```python
  import osmnx_mapping as oxm

  mapping = oxm.OSMNxMapping()
  data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
  graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
  mapping.enricher.with_default("nearest_node", "traffic", method="sum")
  enriched_data, graph, nodes, edges = mapping.enricher.enrich_network(data, graph, nodes, edges)
  fig = mapping.visual.visualise(graph, edges, "aggregated_value", colormap="Blues")
  ```

- Example (Interactive visualiser):

  ```python
  import osmnx_mapping as oxm
  from osmnx_mapping.modules.visualiser.visualisers.interactive_visualiser import InteractiveVisualiser

  mapping = oxm.OSMNxMapping()
  data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
  graph, nodes, edges = mapping.network.network_from_place("Manhattan, New York City, USA")
  mapping.enricher.with_default("nearest_node", "traffic", method="sum")
  enriched_data, graph, nodes, edges = mapping.enricher.enrich_network(data, graph, nodes, edges)

  # Use InteractiveVisualiser for multi-layer visualisation –– note that here we assume "aggregated_value" and
  # "traffic_density" are columns in the enriched edges GeoDataFrame.
  fmap = mapping.visual(InteractiveVisualiser()).visualise(
      graph,
      edges,
      ["aggregated_value", "traffic_density"],
      colormap="Greens",
      tile_provider="CartoDB positron"
  )
  ```
TableVisMixin – Interactive Data Exploration
The TableVisMixin
offers interactive table visualisations for your data within Jupyter notebooks using the great
Skrub
library.
`interactive_display(dataframe, n_rows=10, order_by=None, title="Table Report", column_filters=None, verbose=1)`
- Purpose: Displays an interactive table for exploring your data.
- Parameters:
  - `dataframe` (pandas.DataFrame or geopandas.GeoDataFrame): The data to display.
  - `n_rows` (int, default=10): Number of rows to show.
  - `order_by` (str or list, optional): Column(s) to sort by.
  - `title` (str, optional): Title of the table.
  - `column_filters` (dict, optional): Filters for specific columns.
  - `verbose` (int, default=1): Verbosity level.
- Returns: Displays the table (no return value).
- Example:

  ```python
  import osmnx_mapping as oxm

  mapping = oxm.OSMNxMapping()
  data = mapping.loader.load_from_file("city_data.csv", latitude_column="lat", longitude_column="lon")
  mapping.table_vis.interactive_display(data, n_rows=5)
  ```
AuctusSearchMixin – Discover (Urban) Datasets
The AuctusSearchMixin
integrates with Auctus Search, allowing you to discover,
profile, and load (urban) datasets directly into your OSMNxMapping workflow.
For detailed usage and examples, please refer to the Auctus Search README. In the meantime, here are the key methods for using AuctusSearchMixin with OSMNxMapping:
- `explore_datasets_from_auctus(search_query, page=1, size=10, display_initial_results=False)`
  - Purpose: Searches Auctus for datasets matching the query and optionally displays initial results.
  - Parameters:
    - `search_query` (str or list): Search term(s).
    - `page` (int, default=1): Page number (pagination).
    - `size` (int, default=10): Number of results per page.
    - `display_initial_results` (bool, default=False): If True, displays initial search results. Note that if you add `.with_<action>` filtering from AuctusSearch, results display before filtering; use `.display()` afterward to see filtered datasets.
  - Returns: An `AuctusDatasetCollection` object. See more in the Auctus Search README.
- `profile_dataset_from_auctus()`
  - Purpose: Displays an interactive data profile summary of the selected dataset using the Data Profile Viz library.
  - Parameters: None
  - Returns: None (displays the profile interactively in the notebook)
  - Example:

    ```python
    from osmnx_mapping import OSMNxMapping

    osmnx_mapping = OSMNxMapping()
    osmnx_mapping.explore_datasets_from_auctus("Taxis")
    # Select a dataset from the interactive results
    osmnx_mapping.profile_dataset_from_auctus()  # Displays the profile using Data Profile Viz.
    ```
- `load_dataset_from_auctus(display_table=True)`
  - Purpose: Loads the selected dataset from Auctus after choosing one via "Select This Dataset" from the interactive search results. Afterward, you can use the OSMNxMapping Loader module’s `load_from_dataframe` method.
  - Parameters:
    - `display_table` (bool, default=True): If True, displays a preview table using `Skrub`.
  - Returns: A `pandas.DataFrame` or `geopandas.GeoDataFrame`.
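A minimal end-to-end sketch combining these methods with the Loader module (the `"lat"`/`"lon"` column names are assumptions about whichever dataset you select):

```python
import osmnx_mapping as oxm

mapping = oxm.OSMNxMapping()

# Search Auctus, pick a dataset from the interactive results, then load it.
mapping.explore_datasets_from_auctus("Taxis", display_initial_results=True)
dataset = mapping.load_dataset_from_auctus()

# Hand the DataFrame over to the Loader module for the rest of the workflow.
geo_data = mapping.loader.load_from_dataframe(dataset, "lat", "lon")
```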
UrbanPipelineMixin – Chain Your Workflow
The UrbanPipelineMixin
enables you to chain multiple steps into a single, reproducible pipeline, modeled after scikit-learn’s Pipeline
.
[!IMPORTANT]
Pipeline Restrictions (per configuration):
- Exactly 1 `NetworkBase` step (e.g., `OSMNxNetwork`).
- Exactly 1 `LoaderBase` step (e.g., `CSVLoader`).
- 1 or more `EnricherBase` steps (e.g., `SingleAggregatorEnricher`).
- 0 or 1 `VisualiserBase` step.
- 0 or more `GeoImputerBase` or `GeoFilterBase` steps.
Steps must adhere to these constraints, or the pipeline will raise a validation error upon creation or execution.
[!NOTE]
When using multiple `EnricherBase` steps, ensure each writes to a unique `output_column`. If multiple enrichers target the same `output_column`, the last one executed will silently overwrite the others.
`urban_pipeline(steps)`
- Purpose: Constructs a pipeline from a list of (name, step) tuples, where each step is an instance of a supported base class.
- Parameters:
  - `steps` (list of tuples): Steps to include, e.g., `[("loader", CSVLoader(...)), ("network", OSMNxNetwork(...)), ("enricher", CreateEnricher().with_data(...).build())]`.
- Returns: An `UrbanPipeline` object.
- Example:

  ```python
  import osmnx_mapping as oxm
  from osmnx_mapping.modules.loader.loaders.csv_loader import CSVLoader
  from osmnx_mapping.modules.network.networks.osmnx_network import OSMNxNetwork
  from osmnx_mapping.modules.enricher import CreateEnricher

  mapping = oxm.OSMNxMapping()
  pipeline = mapping.urban_pipeline([
      ("loader", CSVLoader(file_path="city_data.csv")),
      ("network", OSMNxNetwork(place_name="Manhattan, New York City, USA")),
      ("enricher1", CreateEnricher()
          .with_data(group_by="nearest_node", values_from="traffic")
          .aggregate_with(method="sum", output_column="total_traffic")
          .build()),
      ("enricher2", CreateEnricher()
          .with_data(group_by="nearest_node", values_from="incidents")
          .count_by(output_column="incident_count")
          .build())
  ])
  ```
`compose(latitude_column_name, longitude_column_name)`
- Purpose: Configures the pipeline by setting the latitude and longitude column names, which are propagated to all relevant steps (e.g., imputers, filters) requiring geographic data.
- Parameters:
  - `latitude_column_name` (str): Name of the latitude column in the input data.
  - `longitude_column_name` (str): Name of the longitude column in the input data.
- Example:

  ```python
  pipeline.compose("lat", "lon")
  ```
`transform()`
- Purpose: Executes the pipeline after `compose()` has been called, processing the data and returning the results.
- Parameters: None (requires a prior `compose()` call).
- Returns: A tuple (`GeoDataFrame`, `MultiDiGraph`, `GeoDataFrame`, `GeoDataFrame`) containing the processed data, network graph, nodes, and edges, respectively.
- Example:

  ```python
  data, graph, nodes, edges = pipeline.transform()
  ```
`compose_transform(latitude_column_name, longitude_column_name)`
- Purpose: Combines configuration and execution into a single step, configuring the pipeline and immediately processing the data.
- Parameters:
  - `latitude_column_name` (str): Name of the latitude column.
  - `longitude_column_name` (str): Name of the longitude column.
- Returns: A tuple (`GeoDataFrame`, `MultiDiGraph`, `GeoDataFrame`, `GeoDataFrame`) of processed data, graph, nodes, and edges.
- Example:

  ```python
  data, graph, nodes, edges = pipeline.compose_transform("lat", "lon")
  ```
`visualise(result_columns, **kwargs)`
- Purpose: Visualises the pipeline’s output using the configured `VisualiserBase` step (if present).
- Parameters:
  - `result_columns` (str or list of str): Column(s) to visualise. Use a single string (e.g., `"total_traffic"`) for static visualisers (e.g., `StaticVisualiser`). Use a list of strings (e.g., `["total_traffic", "incident_count"]`) for interactive visualisers (e.g., `InteractiveVisualiser`) supporting multi-layer visualisation.
  - `**kwargs`: Additional visualisation options (e.g., `colormap="Blues"`, `tile_provider="CartoDB positron"`).
- Returns: A plot (e.g., Matplotlib figure) for static visualisers or an interactive map for interactive visualisers.
- Note: Passing a list to `result_columns` with a static visualiser will raise an error. Ensure the type matches the visualiser used.
- Example:

  ```python
  # For a static visualiser
  fig = pipeline.visualise("total_traffic", colormap="Blues")

  # For an interactive visualiser
  fmap = pipeline.visualise(["total_traffic", "incident_count"], colormap="Greens", tile_provider="CartoDB positron")
  ```
`save(filepath)` / `UrbanPipeline.load(filepath)`
- Purpose: Saves the pipeline to a file, or loads a previously saved pipeline for reuse.
- Parameters:
  - `filepath` (str): Path to the file (e.g., `"my_pipeline.joblib"`).
- Example:

  ```python
  pipeline.save("my_pipeline.joblib")
  loaded_pipeline = UrbanPipeline.load("my_pipeline.joblib")
  ```
- `named_steps`: Access pipeline steps by name, e.g., `pipeline.named_steps["loader"]`.
- `get_step_names()`: Returns a list of all step names in the pipeline.
- `get_step(name)`: Retrieves a specific step by its name.
- `get_params(deep=True)`: Intended to return all pipeline parameters (not yet implemented).
- `set_params(**kwargs)`: Intended to update pipeline parameters (not yet implemented).
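For example, inspecting the pipeline built in the `urban_pipeline` example above might look like this short sketch:

```python
# Step names follow the earlier example: "loader", "network", "enricher1", "enricher2".
print(pipeline.get_step_names())

loader_step = pipeline.named_steps["loader"]  # access a step by name
same_step = pipeline.get_step("loader")       # equivalent lookup by name
```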
[!NOTE]
The `get_params` and `set_params` methods are planned features and are not functional in the current release.
Important
Full documentation is forthcoming; hence, expect some breaking changes in the API. Bear with us, a doc is cooking up!
Check out the examples/
directory in the OSMNxMapping repo for more
detailed Jupyter notebook examples.
OSMNxMapping
is released under the MIT Licence.