Scaling geospatial vector data Workshop
Joris Van den Bossche
07:47
Small problem: I don't hear anything ..
Julia Signell
08:02
yes we can hear you
Julia Signell
08:33
Please put questions in the chat and we'll do them at the end of the short talks
Julia Signell
17:54
This is the dask-geopandas repo: https://github.com/geopandas/dask-geopandas
Anita Graser (MovingPandas)
20:37
Space-filling curves could be one way to create an order by which to do spatial partitioning
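A minimal sketch of that idea, assuming a Z-order (Morton) curve and plain (x, y) tuples. The function names and the `bounds` argument are hypothetical and are not taken from dask-geopandas or SpatialPandas; other curves such as Hilbert would serve the same purpose.

```python
def interleave_bits(v: int, bits: int = 16) -> int:
    """Spread the low `bits` bits of v so a zero bit sits between each."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (2 * i)
    return out


def morton_code(x: float, y: float, bounds, bits: int = 16) -> int:
    """Z-order (Morton) code of a point inside bounds = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bounds
    cells = (1 << bits) - 1
    # Quantise each coordinate onto an integer grid (assumed, illustrative resolution).
    ix = int((x - xmin) / (xmax - xmin) * cells)
    iy = int((y - ymin) / (ymax - ymin) * cells)
    # x bits go to even positions, y bits to odd positions.
    return interleave_bits(ix, bits) | (interleave_bits(iy, bits) << 1)


def partition_by_curve(points, bounds, npartitions: int):
    """Sort points along the curve and cut the order into contiguous chunks."""
    ordered = sorted(points, key=lambda p: morton_code(p[0], p[1], bounds))
    if not ordered:
        return []
    size = -(-len(ordered) // npartitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Points that are close along the curve tend to be close in space, so each contiguous chunk covers a reasonably compact region.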
Julia Signell
21:24
Yeah! I think we'll hear more about that approach in the next talk about SpatialPandas :)
Anita Graser (MovingPandas)
21:34
Excellent
Gael
31:09
sorry accident
Julia Signell
31:19
No worries!
Jorge López
35:36
do libraries that separately support Dask DataFrames and GeoDataFrames, such as HoloViews/GeoViews, also work with DaskGeoDataFrames?
Jorge López
36:57
in particular, when one calls operations.datashade() there, I'd expect that Datashader in turn also supports that? but not sure at all
Julia Signell
38:39
I'm not actually sure. I think it'll work but it will bring the whole dataset into memory rather than doing something more clever.
Jorge López
40:32
thanks!
Martin Fleischmann
46:18
Can you explain how spatial lag is computed?
James Bednar
54:33
Jorge, reading your question here I _think_ you are asking about dask_geopandas specifically, while my verbal answer was about holoviews, dask, and datashader in general, not specifically about dask_geopandas. *In general*, things work as I described: If both HoloViews and Datashader support a given data structure, then the data will stay in that data structure and be computed using multiple workers (if Dask) or the GPU (if CUDA). But I don’t actually know if we currently have any support for dask_geopandas at all. If we do, I’m pretty sure it would not extend down to Datashader, due to Datashader not supporting underlying GeoPandas or PyGEOS objects (yet).
James Bednar
56:08
Anita, if you start with GeoPandas and try to datashade it, right now if that works HoloViews would presumably have to extract a Pandas dataframe of points or lines out of it and then pass that to Datashader. So it’s not surprising that such a case would be slow, though it is perhaps surprising that it works at all! :-)
Joris Van den Bossche
56:49
With the latest version of pygeos, it should be relatively easy (and cheap) to convert the geopandas geometries to the ragged array representation that datashader needs. I think the main discussion we need to have is *where* this conversion is done.
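To illustrate what a ragged array representation means here, a minimal sketch assuming the simplest layout: one flat coordinate buffer plus an offsets list. The exact layout Datashader expects, and how PyGEOS would produce it, is not shown in this chat, so the function names and layout below are hypothetical.

```python
def to_ragged(lines):
    """Pack a list of lines, each a list of (x, y) tuples, into flat buffers."""
    coords = []      # x0, y0, x1, y1, ... for all lines back to back
    offsets = [0]    # coords[offsets[i]:offsets[i + 1]] holds line i
    for line in lines:
        for x, y in line:
            coords.extend((float(x), float(y)))
        offsets.append(len(coords))
    return coords, offsets


def line_at(coords, offsets, i):
    """Recover line i as a list of (x, y) tuples."""
    flat = coords[offsets[i]:offsets[i + 1]]
    return list(zip(flat[0::2], flat[1::2]))
```

The point of such a layout is that geometries of different lengths share one contiguous numeric buffer, which is the kind of data a rasteriser can loop over without touching Python objects.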
James Bednar
56:58
Yay!
Joris Van den Bossche
57:05
I.e. can datashader take responsibility for it? Or does datashader want a method in geopandas to call to get the converted arrays?
Joris Van den Bossche
57:21
But I will open an issue in datashader to discuss it
Jorge López
57:23
mmm I see, from my (naive) point of view I understand DaskGeoDataFrame is just a DaskDataFrame whose partitions are GeoDataFrames instead of ordinary Pandas DataFrames
James Bednar
57:28
Let’s discuss later.
Martin Fleischmann
57:35
I’d vote for having it in GeoPandas, it may be useful elsewhere as well.
Joris Van den Bossche
57:40
(and if it supports geopandas, then dask-geopandas will be trivial as well)
James Bednar
57:59
Jorge, yes, but Datashader doesn’t know what to do with GeoDataFrames, let alone DaskGeoDataFrames.
James Bednar
58:41
Instead, Datashader currently understands SpatialPandas dataframes, not GeoPandas dataframes.
Jorge López
58:49
so, given that GeoDataFrames are Pandas DataFrames with an additional 'geometry' column, it shouldn't be too difficult (without knowing the internals, hehe)
James Bednar
01:02:24
Historically, it *was* too difficult, because GeoPandas was based on opaque Python-based Shapely objects, while Datashader wants to muck directly with arrays of numbers. Now that GeoPandas uses PyGEOS with efficient arrays, it can be a different story.
Julia Signell
01:02:50
To be clear geoviews will already work, but datashader might need more work.
James Bednar
01:03:36
Right; GeoViews doesn’t do any mucking about with the underlying densely packed arrays, but Datashader needs to do that to be efficient.
Jorge López
01:05:54
I see, well, I hope integration becomes easier now with PyGEOS, it's quite difficult to familiarise oneself with so many geo-based libraries :)
Joris Van den Bossche
01:06:50
I made a placeholder issue in datashader to support geopandas: https://github.com/holoviz/datashader/issues/1006
Joris Van den Bossche
01:07:11
I will update it with actual context and a proposal after the workshop, so if you want to follow that, you can already subscribe to the issue
James Bednar
01:07:18
An ideal outcome of this workshop would be for there to be fewer libraries because spatial indexing is done once (rather than all over the place), objects are stored in efficient arrays in nearly all cases, etc. That should help people not have to keep track of so many separate initiatives!
Anita Graser (MovingPandas)
01:07:42
+1
Anita Graser (MovingPandas)
01:12:48
might be worth looking into the geomesa code re efficient indexing of non-point geometries. it's mentioned here on slide 23 https://www.slideshare.net/VisionGEOMATIQUE2014/geomesa-scalable-geospatial-analytics but unfortunately the presentation doesn't explain any details
Dahn
01:18:06
I'm actually working on a project that has moving data and uses dask
For us, the representation we chose is
- index: user id (for quick grouping)
- data sorted by (user, timestamp), so that algorithms can work sequentially and just keep track of trajectory borders
- spatial coordinates as just x, y columns
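A minimal sketch of a sequential pass over such a layout, assuming rows are plain (user, timestamp, x, y) tuples already sorted by (user, timestamp). The function name and the path-length computation are hypothetical illustrations, not the closed-source code discussed here.

```python
import math


def path_length_per_user(rows):
    """Sum segment lengths per user in one pass over sorted rows."""
    lengths = {}
    prev_user = object()  # sentinel that never equals a real user id
    prev_xy = None
    for user, _timestamp, x, y in rows:
        if user != prev_user:
            # Trajectory border: start a new trajectory, add no segment.
            lengths[user] = 0.0
        else:
            lengths[user] += math.hypot(x - prev_xy[0], y - prev_xy[1])
        prev_user, prev_xy = user, (x, y)
    return lengths
```

Because the pass only remembers the previous row, it could run independently on each partition as long as no user's rows are split across partitions.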
Anita Graser (MovingPandas)
01:20:15
@Dahn That sounds very similar to the Pandas+datashader approach I followed in https://github.com/anitagraser/EDA-protocol-movement-data/blob/main/protocol.ipynb
Anita Graser (MovingPandas)
01:20:37
It would certainly profit from adding dask to speed up the processing steps
Dahn
01:20:53
Yes, for example, we implemented the stop locations (staypoints) algorithm that way in a distributed manner
Anita Graser (MovingPandas)
01:21:09
Is your code online?
Dahn
01:21:11
And thanks for that notebook, definitely wanted to go through it after the presentation :)
Dahn
01:21:55
Sadly no, it's closed source. Happy to discuss details though.
Benoit Bovy
01:24:28
Spatial indexing libraries that could be used across different packages would be very useful for Xarray too (we are currently updating Xarray to be able to plug in custom indexes) and I was wondering if something similar was planned in GeoPandas too?
Martin Fleischmann
01:25:49
Yes, we have a vague idea to provide an API for various spatial indexing libraries. We internally support both rtree and pygeos index at the moment but we want to allow others in some way.
Anita Graser (MovingPandas)
01:27:38
Re standardization of movement data: there's also the OGC moving features standard working group. It's very much work in progress and they have defined three different data models that each have advantages and disadvantages
Joris Van den Bossche
01:30:34
@Benoit, do you have a reference to the discussions / issues in xarray related to this?
Benoit Bovy
01:31:37
There’s some discussion in this repo: https://github.com/xarray-contrib/xoak
Joris Van den Bossche
01:32:03
Thanks!
Joris Van den Bossche
01:42:46
A dask issue about different kinds of indices: https://github.com/dask/dask/issues/6246
Benoit Bovy
01:45:10
I think it’s worth mentioning https://github.com/google/s2geometry, which also uses space filling curves and already provides many features for handling and indexing multiple geometries (on the sphere)
James Bednar
01:51:56
There’s also https://www.cgal.org , for 3D; not sure if they address spatial indexing.
Anita Graser (MovingPandas)
01:52:55
ad indexing polygons: https://www.geomesa.org/documentation/stable/tutorials/geohash-substrings.html
Dask Track 3
01:58:05
Hello everyone! Just a time check, we will be closing this room in ~5 mins.
Dani Arribas-Bel
01:58:38
agreed, thank you very much for putting this together!
iacopo
01:58:57
thanks!!
Dahn
01:58:58
Thanks for this, it was an amazingly interesting session!