Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/06/27 01:21:55 UTC

[GitHub] [pulsar] yjshen edited a comment on issue #4608: Spark streaming receiver for python

yjshen edited a comment on issue #4608: Spark streaming receiver for python
URL: https://github.com/apache/pulsar/issues/4608#issuecomment-506101197
 
 
   Hi @avatart93, we've already developed a Spark-Pulsar connector that lets you read data from and write data to Pulsar using Spark's new data source API. It can be used from Java and Python in much the same way as described in the [Structured Streaming + Kafka Integration Guide](https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html). It also integrates Pulsar Schema, which should make life easier. We'll contribute it back soon.
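
To give a rough idea of what the Python side could look like, here is a minimal sketch modelled on the Kafka integration pattern. The connector is not published yet, so the format name `"pulsar"` and the option keys `service.url`, `admin.url` and `topic` are assumptions, as are the helper names `pulsar_source_options` and `read_pulsar_stream`; the final API may differ.

```python
def pulsar_source_options(service_url, admin_url, topic):
    """Collect the source options in one dict.

    The option keys are assumptions for illustration; check the
    connector's documentation once it is released.
    """
    return {
        "service.url": service_url,  # assumed key: Pulsar broker URL
        "admin.url": admin_url,      # assumed key: Pulsar admin REST URL
        "topic": topic,              # assumed key: topic to subscribe to
    }


def read_pulsar_stream(spark, options):
    """Build a streaming DataFrame from a SparkSession and an options dict.

    `spark` is expected to be a pyspark.sql.SparkSession with the
    connector jar on its classpath. "pulsar" is an assumed format name.
    """
    reader = spark.readStream.format("pulsar")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

With a running Pulsar instance and the connector available, this would be called as `read_pulsar_stream(spark, pulsar_source_options("pulsar://localhost:6650", "http://localhost:8080", "my-topic"))`, where the URLs and topic name are placeholders.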
   
   On your side, you can use [Spark-ML](https://spark.apache.org/docs/latest/ml-pipeline.html) with the `DataFrame` API for machine-learning tasks. You can also refer to [Project Hydrogen](https://www.slideshare.net/databricks/project-hydrogen-unifying-stateoftheart-ai-and-big-data-in-apache-spark-with-tim-hunter) for the latest Spark-AI integration efforts, as well as existing AI projects on Spark:
   
   > https://www.slideshare.net/databricks/project-hydrogen-unifying-stateoftheart-ai-and-big-data-in-apache-spark-with-tim-hunter
   
   Similar issue: https://github.com/apache/pulsar/issues/4585

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services