Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/09 09:55:13 UTC

[GitHub] [arrow-adbc] judahrand opened a new issue, #323: Spark Connect Driver

judahrand opened a new issue, #323:
URL: https://github.com/apache/arrow-adbc/issues/323

   It looks like Spark Connect has an alpha implementation: https://github.com/apache/spark/tree/master/connector/connect
   
   Could this be a candidate for an ADBC driver?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1377638923

   Ah, ok. I'm out of date there then. Thanks for clarifying.




[GitHub] [arrow-adbc] zeroshade commented on issue #323: Spark Connect Driver

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537167211

   Maybe we can convince them to add Substrait support? Lol




[GitHub] [arrow-adbc] judahrand commented on issue #323: Spark Connect Driver

Posted by "judahrand (via GitHub)" <gi...@apache.org>.
judahrand commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537100085

   Spark Connect has now been released, so I suspect it is now possible to implement this driver.
   
   https://spark.apache.org/docs/latest/spark-connect-overview.html




[GitHub] [arrow-adbc] judahrand commented on issue #323: Spark Connect Driver

Posted by GitBox <gi...@apache.org>.
judahrand commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1377628015

   > Possibly (ideally all of C/C++, Go, and Java) though it seems they still consider it experimental for now. IIRC, the sticking point would be that Spark Connect expects you to provide the Spark query plan, not a SQL query.
   
   I'm not certain here but it looks to me like Spark Connect will support SQL: https://github.com/apache/spark/blob/c0769759f4fd3cbce859cde790dcd1df568cfd0b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto#L98-L102




[GitHub] [arrow-adbc] judahrand commented on issue #323: Spark Connect Driver

Posted by "judahrand (via GitHub)" <gi...@apache.org>.
judahrand commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537201916

   
   > Though possibly we can (ab)use the SQL relation as pointed out above: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.sql.html?highlight=sql#pyspark.sql.SparkSession.sql
   
   This is what I was thinking.




[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1537166773

   Hmm, it seems like it does require Spark query plans?
   
   > The Spark Connect client translates DataFrame operations into unresolved logical query plans which are encoded using protocol buffers. These are sent to the server using the gRPC framework.
   
   Though possibly we can (ab)use the SQL relation as pointed out above: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.sql.html?highlight=sql#pyspark.sql.SparkSession.sql
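
As a rough illustration of the SQL-relation idea, here is a minimal sketch in Python. It assumes the plan sent to the server can consist of a single relation that carries the SQL text unchanged. `SqlRelation`, `Plan`, and `plan_from_sql` are hypothetical names made up for this example; they are not the generated Spark Connect protobuf classes and not part of any ADBC API.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the Spark Connect protobuf messages.
# These class and field names are illustrative, not the generated API.
@dataclass(frozen=True)
class SqlRelation:
    query: str  # raw SQL text, passed through to the server unchanged


@dataclass(frozen=True)
class Plan:
    root: SqlRelation  # a one-node "plan" whose only relation is the SQL text


def plan_from_sql(query: str) -> Plan:
    """Wrap an ADBC-style SQL string in a single-relation plan."""
    if not query.strip():
        raise ValueError("SQL query must not be empty")
    return Plan(root=SqlRelation(query=query))


plan = plan_from_sql("SELECT 1")
print(plan.root.query)
```

Under this assumption, a driver would never build a logical plan itself: it would only wrap the SQL string it receives and leave parsing and planning to the server.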




[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1538415328

   I think Substrait/Spark logical plans were discussed somewhere... it would be good to take that up again.
   
   But (ab)using their SQL support seems reasonable as a first step.




[GitHub] [arrow-adbc] lidavidm commented on issue #323: Spark Connect Driver

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #323:
URL: https://github.com/apache/arrow-adbc/issues/323#issuecomment-1375663481

   Possibly (ideally all of C/C++, Go, and Java) though it seems they still consider it experimental for now. IIRC, the sticking point would be that Spark Connect expects you to provide the Spark query plan, not a SQL query.

