Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/09 09:55:13 UTC
[GitHub] [arrow-adbc] judahrand opened a new issue, #323: Spark Connect Driver
judahrand opened a new issue, #323:
URL: https://github.com/apache/arrow-adbc/issues/323
It looks like Spark Connect has an alpha implementation: https://github.com/apache/spark/tree/master/connector/connect
Could this be a candidate for an ADBC driver?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1377638923
Ah, ok. I'm out of date there then. Thanks for clarifying.
[GitHub] [arrow-adbc] zeroshade commented on issue #323: Spark Connect Driver
Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537167211
Maybe we can convince them to add Substrait support? Lol
[GitHub] [arrow-adbc] judahrand commented on issue #323: Spark Connect Driver
Posted by "judahrand (via GitHub)" <gi...@apache.org>.
judahrand commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537100085
Spark Connect has now been released, so I suspect it is now possible to implement this driver.
https://spark.apache.org/docs/latest/spark-connect-overview.html
[GitHub] [arrow-adbc] judahrand commented on issue #323: Spark Connect Driver
Posted by GitBox <gi...@apache.org>.
judahrand commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1377628015
> Possibly (ideally all of C/C++, Go, and Java) though it seems they still consider it experimental for now. IIRC, the sticking point would be that Spark Connect expects you to provide the Spark query plan, not a SQL query.
I'm not certain here but it looks to me like Spark Connect will support SQL: https://github.com/apache/spark/blob/c0769759f4fd3cbce859cde790dcd1df568cfd0b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto#L98-L102
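A rough illustration of the idea, not a confirmed API: the sketch below wraps a SQL string in a plan-shaped dictionary, the way a driver might wrap SQL text in a SQL relation instead of building a full logical plan. The key names (`root`, `sql`, `query`) are assumptions based on a reading of the linked proto, and `sql_to_plan` is a hypothetical helper that exists in neither Spark nor ADBC.

```python
import json


def sql_to_plan(query: str) -> dict:
    """Wrap a SQL string in a dict shaped like a Spark Connect plan.

    Hypothetical helper: the nesting (plan root -> relation -> SQL) and the
    key names are assumptions, not a verified copy of the proto schema.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    # The SQL text is carried as-is; no parsing or planning happens here.
    return {"root": {"sql": {"query": query}}}


plan = sql_to_plan("SELECT 1 AS x")
print(json.dumps(plan))
```

A real driver would build the equivalent protobuf message rather than a dictionary, but the point stands: the client would not need to produce a logical plan itself.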
[GitHub] [arrow-adbc] judahrand commented on issue #323: Spark Connect Driver
Posted by "judahrand (via GitHub)" <gi...@apache.org>.
judahrand commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537201916
> Though possibly we can (ab)use the SQL relation as pointed out above: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.sql.html?highlight=sql#pyspark.sql.SparkSession.sql
This is what I was thinking
[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver
Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537166773
Hmm, it seems like it does require Spark query plans?
> The Spark Connect client translates DataFrame operations into unresolved logical query plans which are encoded using protocol buffers. These are sent to the server using the gRPC framework.
Though possibly we can (ab)use the SQL relation as pointed out above: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.sql.html?highlight=sql#pyspark.sql.SparkSession.sql
[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver
Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1538415328
I think Substrait/Spark logical plans were discussed somewhere... it would be good to take that up again.
But (ab)using their SQL support seems reasonable as a first step.
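To make the "SQL first, plans later" split concrete, here is a minimal sketch of a statement object that accepts SQL text and explicitly rejects serialized plans. Everything in it is hypothetical: the class `SqlOnlyStatement`, its method names (loosely modeled on ADBC's statement concept), and the `fake_transport` stand-in that replaces a real Spark Connect call.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[object, ...]
# Hypothetical transport: takes SQL text and returns result rows.
Transport = Callable[[str], Iterable[Row]]


class SqlOnlyStatement:
    """Hypothetical statement that accepts SQL text but not query plans."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._query: Optional[str] = None

    def set_sql_query(self, query: str) -> None:
        self._query = query

    def set_plan(self, plan: bytes) -> None:
        # Serialized plans (e.g. Substrait) are deliberately left for later.
        raise NotImplementedError("only SQL queries are supported in this sketch")

    def execute_query(self) -> List[Row]:
        if self._query is None:
            raise RuntimeError("no query set")
        return list(self._transport(self._query))


def fake_transport(query: str) -> Iterable[Row]:
    # Stand-in for a real server round trip; it just echoes the query back.
    return [(query,)]


stmt = SqlOnlyStatement(fake_transport)
stmt.set_sql_query("SELECT 1")
rows = stmt.execute_query()
```

Keeping the transport injectable means the plan-based path could be added later without changing how callers submit SQL.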
[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1375663481
Possibly (ideally all of C/C++, Go, and Java) though it seems they still consider it experimental for now. IIRC, the sticking point would be that Spark Connect expects you to provide the Spark query plan, not a SQL query.