Posted to issues@spark.apache.org by "vinoth (JIRA)" <ji...@apache.org> on 2015/12/17 03:03:46 UTC

[jira] [Updated] (SPARK-12389) In Cluster RDD Action results are not consistent

     [ https://issues.apache.org/jira/browse/SPARK-12389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoth updated SPARK-12389:
---------------------------
    Attachment: local_spark.txt
                cluster_wide.txt

> In Cluster RDD Action results are not consistent
> ------------------------------------------------
>
>                 Key: SPARK-12389
>                 URL: https://issues.apache.org/jira/browse/SPARK-12389
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2
>         Environment: Centos 6.5 Machine 
> One Master and 3 Worker Nodes in VM's
> Master : 192.168.56.102
> Worker : 192.168.56.103,192.168.56.104,192.168.56.105
>            Reporter: vinoth
>         Attachments: cluster_wide.txt, local_spark.txt
>
>
> Just to understand how RDDs recreate lost partitions without replication, and to test how cluster-wide execution works in Spark.
> I have an external file on Linux. I load the file, parallelize it across the cluster, apply a transformation, and perform an action on it, both in local mode and cluster-wide.
> The below are the file content 
> =======================
> hai hello
> hai hello
> vinoth test
> test vinoth
> test hai  
> =======================
> The transformation and action I tried in the shell are:
> data = sc.textFile("/tmp/test.txt")
> datamap = data.flatMap(lambda x : x.split(' '))
> datamap.count()
> That's it; I keep running datamap.count() each time. The results it produces are not consistent.
> If you split the file by spaces and count the words, the total should be 10. The result is consistent if I run the pyspark shell without the master option.
> If I run it with the master option, the results are not consistent: sometimes it produces 10 and sometimes 9.
> In between runs in the shell I manually brought down one worker node, 192.168.56.104. Even more surprisingly, the result then showed "11".
> I attached the result i got it from cluster wide as well as in local mode.
> My apologies for taking your time to read this issue, if this is the normal behavior in Spark.
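As a point of reference, the count the shell session above should produce can be reproduced in plain Python, with no Spark involved (a sketch using the file contents from the report). It also shows one way `split(' ')` can yield a count other than 10: unlike `str.split()` with no argument, `str.split(' ')` keeps empty strings around repeated or trailing spaces.

```python
# Plain-Python sketch of the same flatMap + count logic (no Spark needed).
# The lines below are the file contents quoted in the report.
lines = [
    "hai hello",
    "hai hello",
    "vinoth test",
    "test vinoth",
    "test hai",
]

# flatMap(lambda x: x.split(' ')) flattens every line into tokens.
tokens = [tok for line in lines for tok in line.split(' ')]
print(len(tokens))  # 10, the expected count

# Caveat: str.split(' ') keeps empty strings for repeated or trailing
# spaces, while str.split() discards them. A line stored with trailing
# spaces therefore inflates the token count under split(' ').
padded = "test hai  "
print(len(padded.split(' ')))  # 4 (two real words plus two empty strings)
print(len(padded.split()))     # 2
```

This only pins down what a single deterministic run should compute; it does not explain why the distributed count varies between runs.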



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org