Posted to issues@spark.apache.org by "bluejoe (JIRA)" <ji...@apache.org> on 2018/01/02 06:00:00 UTC
[jira] [Created] (SPARK-22936) providing HttpStreamSource and HttpStreamSink
bluejoe created SPARK-22936:
-------------------------------
Summary: providing HttpStreamSource and HttpStreamSink
Key: SPARK-22936
URL: https://issues.apache.org/jira/browse/SPARK-22936
Project: Spark
Issue Type: New Feature
Components: Structured Streaming
Affects Versions: 2.1.0
Reporter: bluejoe
Hi, in my project I built spark-http-stream, which is now available at https://github.com/bluejoe2008/spark-http-stream. I am wondering whether it would be useful to others and whether it could be integrated into Spark.
spark-http-stream transfers Spark structured streams over the HTTP protocol. Unlike TCP streams, Kafka streams, and HDFS file streams, HTTP streams often flow across distributed big data centers on the Web. This feature is very helpful for building global data processing pipelines across different data centers (scientific research institutes, for example) that own separate data sets.
The following code shows how to load messages from an HttpStreamSource:
{{val lines = spark.readStream.format(classOf[HttpStreamSourceProvider].getName)
.option("httpServletUrl", "http://localhost:8080/xxxx")
.option("topic", "topic-1")
.option("includesTimestamp", "true")
.load()}}
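To check that messages are actually arriving, the loaded stream can be attached to Spark's built-in console sink. The following is only a minimal sketch, not part of spark-http-stream itself: the object name, the helper names, and the fully qualified provider class name are assumptions made for illustration (verify the package against the repository), and it presumes the spark-http-stream jar is on the classpath and an HTTP stream server is reachable at the given URL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical demo object; all names here are illustrative only.
object HttpStreamConsoleDemo {
  // Assumption: the provider's fully qualified class name. Check the
  // spark-http-stream repository for the actual package.
  val providerClass: String =
    "org.apache.spark.sql.execution.streaming.http.HttpStreamSourceProvider"

  // The same options as in the snippet above, collected in one place.
  val sourceOptions: Map[String, String] = Map(
    "httpServletUrl" -> "http://localhost:8080/xxxx",
    "topic" -> "topic-1",
    "includesTimestamp" -> "true")

  // Build the streaming DataFrame from the HTTP source.
  def readHttpStream(spark: SparkSession): DataFrame =
    spark.readStream.format(providerClass).options(sourceOptions).load()

  // Print every micro-batch to stdout using Spark's built-in console sink.
  def printToConsole(lines: DataFrame): StreamingQuery =
    lines.writeStream.format("console").outputMode("append").start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("http-stream-demo")
      .master("local[2]")
      .getOrCreate()
    // Blocks until the query is stopped or fails.
    printToConsole(readHttpStream(spark)).awaitTermination()
  }
}
```

Passing the provider as a string keeps the sketch compilable against Spark alone; the library is only needed at run time.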