
Posted to issues@spark.apache.org by "Viktor Vojnovski (JIRA)" <ji...@apache.org> on 2016/12/06 17:24:59 UTC

[jira] [Updated] (SPARK-18743) StreamingContext.textFileStream(directory) has no events shown in Web UI

     [ https://issues.apache.org/jira/browse/SPARK-18743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viktor Vojnovski updated SPARK-18743:
-------------------------------------
    Attachment: screenshot-1.png

> StreamingContext.textFileStream(directory) has no events shown in Web UI
> ------------------------------------------------------------------------
>
>                 Key: SPARK-18743
>                 URL: https://issues.apache.org/jira/browse/SPARK-18743
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.6.0
>         Environment: Cloudera
>            Reporter: Viktor Vojnovski
>            Priority: Minor
>         Attachments: screenshot-1.png
>
>
> Input read through StreamingContext.textFileStream is not reflected in the Web UI, i.e. the Input rate stays at 0 events/sec (see attached screenshot).
> Below are a link to the same issue as reported on the Spark user list, followed by a reproduction scenario.
> http://mail-archives.apache.org/mod_mbox/spark-user/201604.mbox/%3CCAEko17iCNeeOzEbwqH9vGAkgXEpH3Rw=bWMkDfOOzCx30Zj2TA@mail.gmail.com%3E
> [vvojnovski@machine:~] % cat a.py
> from __future__ import print_function
> from pyspark import SparkContext, SparkConf
> from pyspark.streaming import StreamingContext
> SparkContext.setSystemProperty('spark.executor.instances', '3')
> conf = (SparkConf()
>         .setMaster("yarn-client")
>         .setAppName("My app")
>         .set("spark.executor.memory", "1g"))
> sc = SparkContext(conf=conf)
> ssc = StreamingContext(sc, 5)
> lines = ssc.textFileStream("testin")
> counts = lines.flatMap(lambda line: line.split(" "))\
>               .map(lambda x: (x, 1))\
>               .reduceByKey(lambda a, b: a+b)
> counts.pprint()
> ssc.start()
> ssc.awaitTermination()
> [vvojnovski@machine:~] % cat testin.input 
> 1 2
> 3 4
> 5 6
> 7 8
> 9 10
> 11 12
> [vvojnovski@machine:~] % hdfs dfs -mkdir testin
> [vvojnovski@machine:~] % spark-submit a.py &
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.1
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.2
> [vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.3
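
For anyone re-running the scenario above, it can help to know what counts.pprint() ought to print, so that a batch that was really processed can be told apart from one that was not, independently of what the Web UI reports. The following is a minimal sketch in plain Python (no Spark required) that mirrors the flatMap/map/reduceByKey steps of a.py on the sample testin.input. The names SAMPLE_INPUT and expected_counts and the copies parameter are illustrative assumptions and are not part of the original report.

```python
from collections import Counter

# Contents of testin.input as shown in the report.
SAMPLE_INPUT = """1 2
3 4
5 6
7 8
9 10
11 12
"""


def expected_counts(text, copies=1):
    """Return the word counts the pipeline in a.py would compute.

    `copies` (hypothetical parameter) is the number of identical files
    that happen to be picked up in the same 5-second batch.
    """
    counts = Counter()
    for line in text.splitlines():
        # Same tokenisation as a.py: split each line on a single space.
        for word in line.split(" "):
            counts[word] += copies
    return dict(counts)


if __name__ == "__main__":
    # Print (word, count) pairs in numeric order for readability;
    # pprint() on a DStream does not guarantee any particular order.
    pairs = expected_counts(SAMPLE_INPUT).items()
    for word, n in sorted(pairs, key=lambda kv: int(kv[0])):
        print((word, n))
```

With one file per batch every token should appear with a count of 1. If several of the uploaded copies fall into the same batch interval, the counts scale accordingly, e.g. expected_counts(SAMPLE_INPUT, copies=3).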


