Posted to issues@camel.apache.org by "Almog Tavor (Jira)" <ji...@apache.org> on 2021/09/13 06:26:00 UTC

[jira] [Commented] (CAMEL-16879) camel-spark: structured streaming and DStream libraries support addition

    [ https://issues.apache.org/jira/browse/CAMEL-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413927#comment-17413927 ] 

Almog Tavor commented on CAMEL-16879:
-------------------------------------

The camel-spark component should provide a connector for each source that Spark Streaming supports. According to the Spark Streaming documentation:

Spark Streaming provides two categories of built-in streaming sources.
 * _Basic sources_: Sources directly available in the StreamingContext API. Example: file systems, socket connections, and Akka actors.
 * _Advanced sources_: Sources like Kafka, Flume, Kinesis, Twitter, etc. are available through extra utility classes. These require linking against extra dependencies as discussed in the [linking|https://spark.apache.org/docs/1.2.2/streaming-programming-guide.html#linking] section.

I think the component should offer an option for choosing the source, with an implementation for each of the following:
||Source||Artifact||
|Kafka|spark-streaming-kafka_2.10|
|Flume|spark-streaming-flume_2.10|
|Kinesis|spark-streaming-kinesis-asl_2.10 [Amazon Software License]|
|Twitter|spark-streaming-twitter_2.10|
|ZeroMQ|spark-streaming-zeromq_2.10|
|MQTT|spark-streaming-mqtt_2.10|

There should also be an implementation for the basic sources, which should look much the same to the camel-spark user.
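
To make the idea of a source option more concrete, here is a minimal sketch of how the choice could be modelled. It is only an illustration: the {{StreamingSource}} enum and its {{fromOption}} method are hypothetical names that do not exist in camel-spark today, and the artifact strings are copied from the table above.

```java
import java.util.Locale;

// Hypothetical sketch: this type is NOT part of camel-spark.
// It only maps a user-supplied option value to the Spark Streaming
// artifact listed in the table above.
enum StreamingSource {
    KAFKA("spark-streaming-kafka_2.10"),
    FLUME("spark-streaming-flume_2.10"),
    KINESIS("spark-streaming-kinesis-asl_2.10"),
    TWITTER("spark-streaming-twitter_2.10"),
    ZEROMQ("spark-streaming-zeromq_2.10"),
    MQTT("spark-streaming-mqtt_2.10");

    private final String artifact;

    StreamingSource(String artifact) {
        this.artifact = artifact;
    }

    // Artifact that would have to be on the classpath for this source.
    String artifact() {
        return artifact;
    }

    // Resolve a case-insensitive option value such as "kafka" or "MQTT".
    static StreamingSource fromOption(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("A streaming source must be specified");
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported streaming source: " + value, e);
        }
    }
}
```

With something like this, an endpoint could resolve the option early and fail with a clear message when the user asks for a source that has no implementation. How the option would be exposed on the endpoint is left open here.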

> camel-spark: structured streaming and DStream libraries support addition
> ------------------------------------------------------------------------
>
>                 Key: CAMEL-16879
>                 URL: https://issues.apache.org/jira/browse/CAMEL-16879
>             Project: Camel
>          Issue Type: New Feature
>          Components: camel-spark
>            Reporter: Almog Tavor
>            Priority: Major
>
> It would be great if support for the Spark Streaming libraries were added. The Spark Structured Streaming library is very popular, and the Spark Streaming libraries in general are in common use (see the [Stack Overflow question count|https://stackoverflow.com/questions/tagged/spark-streaming]). In my opinion, support should be added first for the newer Spark Structured Streaming library, whose API is similar to the Spark SQL DataFrames API, and then for the DStreams API (similar to the RDD API), because Structured Streaming is the main way the library is used.
> I set the priority to Major because many Apache Spark users who rely on Spark Structured Streaming, which is currently unsupported, are likely to give up on Apache Camel.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)