Posted to user@spark.apache.org by buring <qy...@gmail.com> on 2014/12/16 11:57:24 UTC

RDD "toArray", "first" behavior

Hi,
	Recently I have run into a problem with RDD behavior, specifically the
"RDD.first" and "RDD.toArray" methods when the RDD has only one element: I
can't get the correct element out of the RDD. I give more detail after the code.
	My code was as follows:

    // get an RDD with just one row: RDD[(Long, Array[Byte])]
    val alsresult = sc.sequenceFile(args(0) + "/als", classOf[LongWritable], classOf[BytesWritable]).map { case (uid, sessions) =>
      sessions.setCapacity(sessions.getLength)
      (uid.get(), sessions.getBytes)
    }.filter { line =>
      line._1 == userindex.value // specified from arguments
    }

    // this log output really surprised me
    logger.info("alsInformation:%d".format(alsresult.count()))

    alsresult.toArray().foreach(e => logger.info("alstoarray:%d\t%s".format(e._1, e._2.mkString(" "))))

    alsresult.take(1).foreach(e => logger.info("take1result:%d\t%s".format(e._1, e._2.mkString(" "))))

    logger.info("firstInformation:%d\t%s".format(alsresult.first()._1, alsresult.first()._2.mkString(" ")))

    alsresult.collect().foreach(e => logger.info("alscollectresult:%d\t%s".format(e._1, e._2.mkString(" "))))

    alsresult.take(3).foreach(e => logger.info("alstake3result:%d\t%s".format(e._1, e._2.mkString(" ")))) // 3 is bigger than rdd.count()

    I get an RDD that has just one element, but different methods return
different elements. My printed output is as follows:

                      userindex.value = 28116855            userindex.value = 123456
alsInformation        1                                     1
alstoarray            28116855  16 32 0 22 13 49 19...      123456  16 32 0 22 13 49 19...
take1result           28116855  16 52 31 42 29 36 14...     123456  39 39 21 34 25 49 51 ...
firstInformation      28116855  16 52 31 42 29 36 14...     123456  39 39 21 34 25 49 51 ...
alscollectresult      28116855  16 32 0 22 13 49 19...      123456  16 32 0 22 13 49 19...
alstake3result        28116855  16 32 0 22 13 49 19...      123456  16 32 0 22 13 49 19...
 I filter the RDD so that RDD.count() equals 1. I expected different
"userindex.value" arguments to give different alsresult contents, but
"RDD.toArray", "RDD.collect" and "RDD.take(3)" return the same bytes for both
arguments. And under the same argument, "toArray" and "take(3)" return
different bytes from "take(1)" and "first". This really surprised me.
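One explanation consistent with these symptoms, offered as an assumption rather than a confirmed diagnosis: the Scaladoc for SparkContext.sequenceFile warns that Hadoop's RecordReader reuses the same Writable object for each record, so keeping a reference to its contents (here, the array returned by BytesWritable.getBytes) can leave every kept element pointing at a buffer that later reads overwrite. take(1) and first would stop reading right after the first match, while toArray, collect and take(3) keep reading to the end of the partition looking for more rows. The self-contained Scala sketch below simulates that mechanism without Spark or Hadoop; BufferReuseDemo, reader, the three helper functions and the sample records are all hypothetical stand-ins, not Spark or Hadoop APIs.

```scala
import java.util.Arrays

// Hypothetical simulation of Hadoop-style object reuse; no Spark or Hadoop needed.
object BufferReuseDemo {
  // Hypothetical (id, bytes) records standing in for one partition of the SequenceFile.
  val sample: Seq[(Long, Array[Byte])] = Seq(
    1L -> Array[Byte](1, 2, 3),
    2L -> Array[Byte](4, 5, 6),
    3L -> Array[Byte](7, 8, 9)
  )

  // Stand-in for a record reader: each record is copied into ONE shared buffer,
  // and that same buffer is handed out for every record (like a reused Writable).
  def reader(records: Seq[(Long, Array[Byte])]): Iterator[(Long, Array[Byte])] = {
    val shared = new Array[Byte](records.map(_._2.length).max)
    records.iterator.map { case (id, bytes) =>
      Arrays.fill(shared, 0.toByte)
      System.arraycopy(bytes, 0, shared, 0, bytes.length)
      (id, shared) // the Long id is a copied value; the array is not
    }
  }

  // Like collect/toArray/take(3) with a single match: every record is read,
  // so the shared buffer ends up holding the LAST record's bytes.
  def collectLike(records: Seq[(Long, Array[Byte])], target: Long): Array[(Long, Array[Byte])] =
    reader(records).filter(_._1 == target).toArray

  // Like take(1)/first: reading stops right after the first match,
  // so the shared buffer still holds the matching record's bytes.
  def takeOneLike(records: Seq[(Long, Array[Byte])], target: Long): Array[(Long, Array[Byte])] =
    reader(records).filter(_._1 == target).take(1).toArray

  // Copying in a map before filtering gives each element its own array.
  def collectWithCopy(records: Seq[(Long, Array[Byte])], target: Long): Array[(Long, Array[Byte])] =
    reader(records).map { case (id, bytes) => (id, bytes.clone()) }.filter(_._1 == target).toArray

  def main(args: Array[String]): Unit = {
    def show(label: String, rows: Array[(Long, Array[Byte])]): Unit =
      rows.foreach { case (id, bytes) => println(s"$label\t$id\t${bytes.mkString(" ")}") }
    show("collectLike", collectLike(sample, 2L))
    show("takeOneLike", takeOneLike(sample, 2L))
    show("collectWithCopy", collectWithCopy(sample, 2L))
  }
}
```

With these sample records, collectLike reports id 2 with bytes 7 8 9 (the last record read), while takeOneLike and collectWithCopy report 4 5 6, the same pattern as the table above: the id is right but the bytes differ by method. Copying each record inside the map is the remedy the Spark documentation suggests; in the original job that might mean replacing sessions.getBytes with java.util.Arrays.copyOf(sessions.getBytes, sessions.getLength), which is an untested suggestion here.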

Can anyone explain this or point me to a reference?

Thanks 




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-toarray-first-behavior-tp20710.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
