Posted to issues@spark.apache.org by "Viktor Vojnovski (JIRA)" <ji...@apache.org> on 2016/12/06 17:23:58 UTC

[jira] [Created] (SPARK-18743) StreamingContext.textFileStream(directory) has no events shown in Web UI

Viktor Vojnovski created SPARK-18743:
----------------------------------------

             Summary: StreamingContext.textFileStream(directory) has no events shown in Web UI
                 Key: SPARK-18743
                 URL: https://issues.apache.org/jira/browse/SPARK-18743
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 1.6.0
         Environment: Cloudera
            Reporter: Viktor Vojnovski
            Priority: Minor


Input read through StreamingContext.textFileStream is not reflected in the Web UI, i.e. the Input rate stays at 0 events/sec (see attached screenshot).

Below are a link to the same issue as reported on the Spark user list, followed by a reproduction scenario.

http://mail-archives.apache.org/mod_mbox/spark-user/201604.mbox/%3CCAEko17iCNeeOzEbwqH9vGAkgXEpH3Rw=bWMkDfOOzCx30Zj2TA@mail.gmail.com%3E

[vvojnovski@machine:~] % cat a.py
from __future__ import print_function

from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext

SparkContext.setSystemProperty('spark.executor.instances', '3')

conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("My app")
        .set("spark.executor.memory", "1g"))

sc = SparkContext(conf=conf)

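# Streaming context with a 5-second batch interval.
ssc = StreamingContext(sc, 5)

# Watch the "testin" directory for newly created files.
lines = ssc.textFileStream("testin")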

counts = lines.flatMap(lambda line: line.split(" "))\
              .map(lambda x: (x, 1))\
              .reduceByKey(lambda a, b: a+b)

counts.pprint()

ssc.start()
ssc.awaitTermination()
[vvojnovski@machine:~] % cat testin.input 
1 2
3 4
5 6
7 8
9 10
11 12
[vvojnovski@machine:~] % hdfs dfs -mkdir testin
[vvojnovski@machine:~] % spark-submit a.py &
[vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.1
[vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.2
[vvojnovski@machine:~] % hdfs dfs -put testin.input testin/testin.input.3
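
To compare what the Web UI shows with what the job actually receives, the per-batch record count can be logged from the driver. This is a minimal sketch and not part of the original reproduction: the helper names format_batch_count and log_batch_count are hypothetical, and it assumes the callback is registered with lines.foreachRDD in a.py before ssc.start().

```python
from __future__ import print_function


def format_batch_count(batch_time, rdd):
    """Return a one-line summary of how many records one batch holds.

    `rdd` only needs a count() method, so any RDD-like object works.
    """
    return "batch %s: %d records" % (batch_time, rdd.count())


def log_batch_count(batch_time, rdd):
    """Two-argument callback in the (time, rdd) form accepted by foreachRDD."""
    print(format_batch_count(batch_time, rdd))


# Hypothetical usage in a.py, placed before ssc.start():
#     lines.foreachRDD(log_batch_count)
```

If the driver log reports non-zero counts for the batches that follow each hdfs dfs -put while the Input rate in the Web UI stays at 0 events/sec, that would suggest the data is being processed and only the UI reporting is missing.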





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
