Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2014/11/25 16:13:12 UTC

[jira] [Commented] (SPARK-2072) Streaming not processing a file with particular number of entries

    [ https://issues.apache.org/jira/browse/SPARK-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224660#comment-14224660 ] 

Tathagata Das commented on SPARK-2072:
--------------------------------------

Does this issue still exist? If not, I am going to close this JIRA.

> Streaming not processing a file with particular number of entries
> -----------------------------------------------------------------
>
>                 Key: SPARK-2072
>                 URL: https://issues.apache.org/jira/browse/SPARK-2072
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.0.0
>            Reporter: Pravesh Jain
>            Priority: Minor
>         Attachments: Output_400k.txt, Output_800k.txt, StreamingJavaLR.java
>
>
> I am using Spark 1.0.0 over a 3-node cluster with 1 master and 2 slaves. I am trying to run the LR algorithm over Spark Streaming. 
> I take input from files: JavaSparkContext for training the model and JavaStreamingContext for testing it. 
> I have used the data given in $SPARK_HOME/mllib/data/lr-data/random.data for training and testing. To obtain larger data sets, I have copied this data. The code works fine for every possible set of data in local mode. Over the cluster, however, it is not able to process the file containing 0.4 million entries. For every other data set it works fine, but for 0.4 million entries it doesn't print the output to the file it is supposed to. 
> The worker logs don't show anything different. 
> I am attaching the code I am using as well as the output for 0.4 million and 0.8 million entries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org