Posted to issues@spark.apache.org by "Andrew Osheroff (JIRA)" <ji...@apache.org> on 2015/06/21 04:16:00 UTC

[jira] [Commented] (SPARK-5037) support dynamic loading of input DStreams in pyspark streaming

    [ https://issues.apache.org/jira/browse/SPARK-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594919#comment-14594919 ] 

Andrew Osheroff commented on SPARK-5037:
----------------------------------------

[~jswisher] I don't have much to add about the continued relevance of this feature (it seems very useful!), but I do have a quick question about what motivated you to take the factory route. Do you remember why creating DStreams directly resulted in "unwanted stuff getting dragged into closures"? I'm currently creating a JDStream directly in Python that references a custom receiver, and since both your approach and the one in KafkaUtils use a Scala-side helper class for DStream creation, I'm wondering whether I've overlooked something.
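
For concreteness, the pattern I have in mind looks roughly like the sketch below. The receiver class com.example.MyCustomReceiver is only a stand-in for my custom receiver, and wrapping with UTF8Deserializer assumes it emits strings, mirroring what socketTextStream does in pyspark/streaming/context.py:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.dstream import DStream
    from pyspark.serializers import UTF8Deserializer

    sc = SparkContext(appName="direct-jdstream")
    ssc = StreamingContext(sc, 1)

    # Instantiate the custom JVM-side receiver through the Py4J gateway.
    # com.example.MyCustomReceiver is a placeholder class name.
    jreceiver = sc._jvm.com.example.MyCustomReceiver("localhost", 9999)

    # Ask the JVM StreamingContext for a receiver-backed input stream, then
    # wrap the resulting JavaReceiverInputDStream for use from Python.
    jdstream = ssc._jssc.receiverStream(jreceiver)
    lines = DStream(jdstream, ssc, UTF8Deserializer())

    lines.pprint()
    ssc.start()
    ssc.awaitTermination()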

> support dynamic loading of input DStreams in pyspark streaming
> --------------------------------------------------------------
>
>                 Key: SPARK-5037
>                 URL: https://issues.apache.org/jira/browse/SPARK-5037
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Streaming
>    Affects Versions: 1.2.0
>            Reporter: Jascha Swisher
>
> The Scala and Java streaming APIs support "external" InputDStreams (e.g. the ZeroMQReceiver example) through a number of mechanisms, for instance by overriding ActorReceiver or just subclassing Receiver directly. The pyspark streaming API does not currently allow similar flexibility, being limited at the moment to file-backed text and binary streams or socket text streams.
> It would be great to open up the pyspark streaming API to other stream sources, bringing it closer to parity with the JVM APIs.
> One way of doing this could be to support dynamically loading InputDStream implementations through reflection at the JVM level, analogous to what the Hadoop methods in pyspark's context.py currently do for Hadoop InputFormats.
> I'll submit a PR momentarily with my shot at this. Comments and alternative approaches more than welcome.
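
For reference, the reflection-based loading that the quoted description points to already exists for batch jobs: the Hadoop methods in pyspark's context.py accept fully-qualified JVM class names as strings and load them reflectively. The first call below is that existing API; the streaming call that follows is purely hypothetical and only sketches what an analogous entry point might look like:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="reflection-loading-analogy")
    ssc = StreamingContext(sc, 1)

    # Existing pattern: the InputFormat, key, and value classes are named as
    # strings and instantiated via reflection on the JVM side.
    rdd = sc.newAPIHadoopFile(
        "hdfs:///tmp/input",
        inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        keyClass="org.apache.hadoop.io.LongWritable",
        valueClass="org.apache.hadoop.io.Text")

    # Hypothetical analogue for streaming; the method name, parameters, and
    # DStream class below are illustrative only, not a merged API.
    # stream = ssc.inputDStream(
    #     inputDStreamClass="com.example.MyInputDStream",
    #     conf={"host": "localhost", "port": "9999"})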



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org