You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "kylebarron (via GitHub)" <gi...@apache.org> on 2023/03/25 02:29:58 UTC

[GitHub] [arrow-adbc] kylebarron opened a new issue, #546: Support for Postgres extension types?

kylebarron opened a new issue, #546:
URL: https://github.com/apache/arrow-adbc/issues/546

   This question might be too premature at this time for ADBC, but I'm wondering if anyone has given any thought to how a user might extend ADBC's default types.
   
   In particular, I'm curious if it would be possible to add support for PostGIS's geospatial types (via the [geoarrow](https://github.com/geoarrow/geoarrow) encoding). I'm guessing these sort of extension types wouldn't make sense to declare as part of arrow-adbc itself; would there maybe be a way to inject other type handling into the `TypeMapping`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] kylebarron commented on issue #546: Support for Postgres extension types?

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1487109690

   > Do you happen to know how a POSTGIS geometry arrives over the wire via `COPY`? 
   
   I don't. That indeed seems like the first thing to establish. I thought that it serialized to WKB by default but maybe that happens on the client and not on the wire itself.
   
   > One approach would be to fork the Postgres and SQLite drivers in the geoarrow org and re-package them with geometry support.
   
   Would that work well with ADBC? Forking postgres and sqlite drivers seems like a big undertaking. I was imagining it should be possible to have some small function that just "plugs in" and converts to geoarrow without a large fork


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on issue #546: Support for Postgres extension types?

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1488731602

   Right now it should fail. 
   
   My thought is
   - Common extensions should have first-class support if there is also an equivalent Arrow extension type (possibly toggleable in case your application doesn't understand the extension type)
   - Otherwise, we should try to return the base type and let the application handle it
   - We may need knobs to inform the driver what Postgres extension types map to which Arrow types 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] kylebarron commented on issue #546: Support for Postgres extension types?

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1707351502

   Reading through this again... this is really cool and I want to try this first-hand soon. Maybe we should move discussion to the geoarrow repo, but I wonder here how we should handle the `SRID=900914`. Should the onus be on the user to also query the spatial ref sys table so that it knows what the `SRID` defines?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1533544042

   After #636 this should work out-of-the-box. It looks like geometry columns are sent across the wire as EWKB:
   
   ``` r
   # docker run -it --rm \
   #   -e POSTGRES_PASSWORD=password \
   #   -e POSTGRES_USER=postgres \
   #   -p 5432:5432 \
   #   postgis/postgis:latest
   # ogr2ogr \
   #   -f PostgreSQL PG:"host=127.0.0.1 user=postgres password=password dbname=postgres" \
   #   /vsicurl/https://github.com/geoarrow/geoarrow-data/releases/download/latest-dev/ns-water-basin_point.gpkg \
   #   -nln basin_point
   
   # [libpq] Column #7 ("geom") has unknown type code 18011
   
   library(adbcdrivermanager)
   
   uri <- "postgresql://localhost:5432/postgres?user=postgres&password=password"
   db <- adbc_database_init(adbcpostgresql::adbcpostgresql(), uri = uri)
   con <- adbc_connection_init(db)
   
   stmt <- adbc_statement_init(con)
   stream <- nanoarrow::nanoarrow_allocate_array_stream()
   
   adbc_statement_set_sql_query(stmt, "SELECT * FROM basin_point")
   adbc_statement_execute_query(stmt, stream)
   #> [1] -1
   (result <- tibble::as_tibble(stream))
   #> # A tibble: 46 × 7
   #>      fid objectid feat_code basin_name river                hid             geom
   #>    <int>    <dbl> <chr>     <chr>      <chr>                <chr>         <blob>
   #>  1     1        1 WABA30    01EB000    BARRINGTON/CLYDE     919201D6… <raw 25 B>
   #>  2     2        2 WABA30    01EC000    ROSEWAY/SABLE/JORDAN 5293753C… <raw 25 B>
   #>  3     3        3 WABA30    01EA000    TUSKET RIVER         A7592A93… <raw 25 B>
   #>  4     4        4 WABA30    01DA000    METEGHAN             47EF929A… <raw 25 B>
   #>  5     5        5 WABA30    01ED000    MERSEY               425CA3DB… <raw 25 B>
   #>  6     6        6 WABA30    01EE000    HERRING COVE/MEDWAY  C7BF151C… <raw 25 B>
   #>  7     7        7 WABA30    01EG000    GOLD                 7C6E0383… <raw 25 B>
   #>  8     8        8 WABA30    01EF000    LAHAVE               B3D68107… <raw 25 B>
   #>  9     9        9 WABA30    01EJ000    SACKVILLE            4FA406BD… <raw 25 B>
   #> 10    10       10 WABA30    01EH000    EAST/INDIAN RIVER    72FEA676… <raw 25 B>
   #> # ℹ 36 more rows
   wk::as_wkb(result$geom)
   #> <wk_wkb[46]>
   #>  [1] <SRID=900914;POINT (277022.6 4820886)>
   #>  [2] <SRID=900914;POINT (315701.1 4855050)>
   #>  [3] <SRID=900914;POINT (255728.5 4851021)>
   #>  [4] <SRID=900914;POINT (245206.7 4895608)>
   #>  [5] <SRID=900914;POINT (337143 4860311)>  
   #>  [6] <SRID=900914;POINT (370526.6 4896768)>
   #>  [7] <SRID=900914;POINT (396155.5 4928849)>
   #>  [8] <SRID=900914;POINT (380065.5 4918188)>
   #>  [9] <SRID=900914;POINT (449560.5 4927417)>
   #> [10] <SRID=900914;POINT (417856.9 4932181)>
   #> [11] <SRID=900914;POINT (266163.3 4950370)>
   #> [12] <SRID=900914;POINT (512409.7 4958059)>
   #> [13] <SRID=900914;POINT (411009.1 4970316)>
   #> [14] <SRID=900914;POINT (486375.9 4946504)>
   #> [15] <SRID=900914;POINT (578532 4975507)>  
   #> [16] <SRID=900914;POINT (528643.6 4989779)>
   #> [17] <SRID=900914;POINT (430443.1 5003879)>
   #> [18] <SRID=900914;POINT (384152.3 4987549)>
   #> [19] <SRID=900914;POINT (598568 5004793)>  
   #> [20] <SRID=900914;POINT (468882.8 4991420)>
   #> [21] <SRID=900914;POINT (664220.6 4997062)>
   #> [22] <SRID=900914;POINT (316224.4 4988672)>
   #> [23] <SRID=900914;POINT (566212 5021547)>  
   #> [24] <SRID=900914;POINT (432768.6 5030333)>
   #> [25] <SRID=900914;POINT (478632.6 5029750)>
   #> [26] <SRID=900914;POINT (625039.5 5038466)>
   #> [27] <SRID=900914;POINT (357606 5039295)>  
   #> [28] <SRID=900914;POINT (526828 5044274)>  
   #> [29] <SRID=900914;POINT (667060.1 5053682)>
   #> [30] <SRID=900914;POINT (499160.7 5068941)>
   #> [31] <SRID=900914;POINT (449106.9 5068988)>
   #> [32] <SRID=900914;POINT (557031.5 5066144)>
   #> [33] <SRID=900914;POINT (576642 5050958)>  
   #> [34] <SRID=900914;POINT (406967.3 5085306)>
   #> [35] <SRID=900914;POINT (606687.9 5063043)>
   #> [36] <SRID=900914;POINT (418753.5 5086491)>
   #> [37] <SRID=900914;POINT (653615.8 5083572)>
   #> [38] <SRID=900914;POINT (395483.2 5079346)>
   #> [39] <SRID=900914;POINT (668426.1 5118705)>
   #> [40] <SRID=900914;POINT (614040.2 5097386)>
   #> [41] <SRID=900914;POINT (720114.2 5094879)>
   #> [42] <SRID=900914;POINT (697678 5137499)>  
   #> [43] <SRID=900914;POINT (644382.4 5130242)>
   #> [44] <SRID=900914;POINT (663219.1 5197865)>
   #> [45] <SRID=900914;POINT (701578 5185235)>  
   #> [46] <SRID=900914;POINT (739542 4873410)>
   ```
   
   <sup>Created on 2023-05-03 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1486966789

   Do you happen to know how a POSTGIS geometry arrives over the wire via `COPY`? (I suppose one could open a debugger and check...I think it might be the bytes of the gserialized version of the geometry). I obviously have a vested interest in making this a thing (also for SQLite + GeoPackage, which has a similar issue where the way the geometry arrives by default is not *quite* how we'd want it represented in query output.
   
   One approach would be to fork the Postgres and SQLite drivers in the geoarrow org and re-package them with geometry support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1707530057

   It's a good point...going from value-level SRID to column-level SRID is hard and would require either:
   
   - collecting a certain number of rows or batches when a user calls `AdbcStatementExecuteQuery()` and using the first values to query the SRID table and add it to the type before populating the result `ArrowArrayStream`. This is what the SQLite driver does.
   - in the common case where the query is returning a field in a table, the `PQresult` would know what table and what column index (via `PQftable()` and `PQftablecol()`). In theory one could construct a query in `AdbcStatementExecuteQuery()` before populating the `ArrowArrayStream` result that fills in the CRS. SQLite similarly has options for this ( https://www.sqlite.org/c3ref/column_database_name.html ).
   
   The first option could be done in a wrapper driver...the second one would have to be passed on via field metadata to be used in a wrapper driver (and wouldn't work for queries with PostGIS or the various spatial SQLite function calls). Collecting some rows from the start might be helpful when attempting to convert a geometry column to geoarrow, too (it could try to guess a single geometry type, for example).
   
   Probably the first "geoadbc" driver should be a thin wrapper around OGR's ArrowArrayStream export? That may help establish some conventions (e.g., option names).
   
   Worth chatting about in the geoarrow org somewhere too, but also worth considering which pieces of metadata we would need from the upstream driver to make geospatial wrappers since those changes would have to happen here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1707532620

   It would also be cool to handle geospatial natively where possible (e.g., where "support" just means "add an extension type" and maybe querying some metadata). For Snowflake that would require parsing GeoJSON (but the Snowflake driver is written in Go which might make that easier to do). I imagine it would be useful for us to prototype the wrapper first (at the very least in Python or R).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1487152356

   I agree that would be ideal although I suspect (but don't know) that extensibility isn't likely to be a target for a bit and so I wonder if the first target should be a workaround of some kind.
   
   Another workaround would be to write a driver-that-wraps-a-driver...basically make a driver wrapper that passes along all the method calls but wraps the output `ArrowArrayStream` with an implementation that parses the raw output, releases the source array, and replaces it with the parsed version. Not ideal because there's one more copy than is theoretically necessary, but would at least get everybody's workflow sorted (e.g., use the ADBC postgis driver and get geoarrow output!).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1487232788

   An easier initial target is probably GeoPackage + SQLite...those values almost certainly come through as binary and I've already written the bit that strips the WKB header ( https://github.com/paleolimbot/gpkg/blob/master/src/gpkg.cpp#L196-L232 ). Then the same logic could be applied to the postgres driver. I imagine that both drivers will need some updates to pass on more information about column types as schema metadata.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] kylebarron commented on issue #546: Support for Postgres extension types?

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1487213131

   > Another workaround would be to write a driver-that-wraps-a-driver...basically make a driver wrapper that passes along all the method calls but wraps the output `ArrowArrayStream` with an implementation that parses the raw output, releases the source array, and replaces it with the parsed version
   
   That sounds ideal to me to get started. I guess the first thing to check is what the current ADBC driver behavior is with postgis types. Would/could it return a `BinaryArray` with whatever blobs are returned from the postgis wire format, which a wrapper could convert into geoarrow? Based on the [type support](https://arrow.apache.org/adbc/0.1.0/driver/cpp/postgresql.html#type-support) doc I was guessing it would fail on non-default types, but I haven't tested this yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] paleolimbot commented on issue #546: Support for Postgres extension types?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #546:
URL: https://github.com/apache/arrow-adbc/issues/546#issuecomment-1533564999

   FWIW, I did learn in trying this out that PostGIS has a custom send/recv definition. Geoarrow could define a "geoarrow.ewkb" extension type to acomodate this...in that case, the driver would just have to attach an extension name which is probably doable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org