Posted to user@spark.apache.org by rapelly kartheek <ka...@gmail.com> on 2014/09/12 08:37:32 UTC

replicate() method in BlockManager.scala choosing only one node for replication.

Hi,

I wanted to trace how nodes are chosen for RDD replication. All the blocks
appear to be replicated to the same node, whereas I expected each block to
be replicated to a different node. I have a humble three-node Spark
cluster :).

Below is a trace of the replicate() method, produced with print statements:



Before calling the getPeers
null

After calling: WrappedArray(BlockManagerId(1, s1, 47511, 0))

Inside the forloop

host: s1  port: 47511  execID: 1  netty: 0

Try to replicate BlockId rdd_1_4 once; The size of the data is 38722395
Bytes. To node: BlockManagerId(1, s1, 47511, 0)

Before calling the getPeers:
WrappedArray(BlockManagerId(1, s1, 47511, 0))

Inside the forloop

host: s1  port: 47511 execID: 1  netty: 0

Try to replicate BlockId rdd_1_0 once; The size of the data is 139496007
Bytes. To node: BlockManagerId(1, s1, 47511, 0)

Before calling the getPeers
WrappedArray(BlockManagerId(1, s1, 47511, 0))

Inside the forloop

host: s1  port: 47511 execID: 1  netty: 0

Try to replicate BlockId rdd_1_1 once; The size of the data is 139495994
Bytes. To node: BlockManagerId(1, s1, 47511, 0)

Before calling the getPeers
WrappedArray(BlockManagerId(1, s1, 47511, 0))

Inside the forloop

host: s1  port: 47511 execID: 1 netty: 0

Try to replicate BlockId rdd_1_2 once; The size of the data is 139496003
Bytes. To node: BlockManagerId(1, s1, 47511, 0).
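
The trace above can be tallied with a short script. This is a minimal Python sketch, not part of Spark: the four "Try to replicate" lines are copied from the trace with the wrapped lines rejoined, and the regular expression and the summarize helper are hypothetical names that only match the custom print statement shown here.

```python
import re
from collections import Counter

# Log lines copied from the trace above, with wrapped lines rejoined.
TRACE = """\
Try to replicate BlockId rdd_1_4 once; The size of the data is 38722395 Bytes. To node: BlockManagerId(1, s1, 47511, 0)
Try to replicate BlockId rdd_1_0 once; The size of the data is 139496007 Bytes. To node: BlockManagerId(1, s1, 47511, 0)
Try to replicate BlockId rdd_1_1 once; The size of the data is 139495994 Bytes. To node: BlockManagerId(1, s1, 47511, 0)
Try to replicate BlockId rdd_1_2 once; The size of the data is 139496003 Bytes. To node: BlockManagerId(1, s1, 47511, 0).
"""

# Hypothetical pattern: it matches only the custom print statement above.
LINE = re.compile(
    r"Try to replicate BlockId (?P<block>\S+) once; "
    r"The size of the data is (?P<size>\d+) Bytes\. "
    r"To node: BlockManagerId\((?P<exec_id>[^,]+), (?P<host>[^,]+), (?P<port>\d+), \d+\)"
)


def summarize(trace):
    """Return (number of blocks per target host:port, total bytes sent)."""
    targets = Counter()
    total_bytes = 0
    for match in LINE.finditer(trace):
        targets[f"{match['host']}:{match['port']}"] += 1
        total_bytes += int(match["size"])
    return targets, total_bytes


targets, total_bytes = summarize(TRACE)
print(dict(targets), total_bytes)
```

For this log the tally has a single target, s1:47511, for all four blocks. It says nothing about what other block managers in the cluster logged.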

Can someone please tell me why this is happening?
Why is the entire RDD replicated to a single node?

Thank you
-Karthik.

Re: replicate() method in BlockManager.scala choosing only one node for replication.

Posted by "Kartheek.R" <ka...@gmail.com>.

When I look at the storage details of the RDD in the web UI, I find that each
block is replicated twice, and not on a single node. All the nodes in the
cluster are hosting some block or other.

Why is there this difference? The trace of the replicate() method shows only
one node, but the web UI shows multiple nodes.
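
One reading that would reconcile the two observations, offered only as an assumption and not confirmed anywhere in this thread, is that the print statements were captured from a single block manager's log, and that each block manager keeps reusing its own peer list. The Python sketch below is a toy model of that idea, not Spark's code: the ring-style pick_peers rule, the host names s2 and s3, the block ids rdd_1_3 and rdd_1_5, and the placement of partitions are all hypothetical.

```python
from collections import defaultdict


def pick_peers(all_nodes, me, count):
    """ASSUMED rule: take the next `count` nodes after `me` in a fixed ring."""
    start = all_nodes.index(me)
    return [all_nodes[(start + 1 + i) % len(all_nodes)] for i in range(count)]


def simulate(all_nodes, blocks_by_node, replication=2):
    """Return (replication targets seen by each node, block -> hosting nodes)."""
    targets_seen = {}
    hosts = defaultdict(set)
    for node, blocks in blocks_by_node.items():
        # Each node picks its peers once and reuses them for every block.
        peers = pick_peers(all_nodes, node, replication - 1)
        targets_seen[node] = set(peers)
        for block in blocks:
            hosts[block].add(node)      # the local copy
            hosts[block].update(peers)  # the replicated copies
    return targets_seen, dict(hosts)


# Hypothetical cluster: only s1 appears in the trace.
nodes = ["s1", "s2", "s3"]
# Hypothetical placement: which node computed which partition.
placement = {
    "s1": ["rdd_1_3"],
    "s2": ["rdd_1_5"],
    "s3": ["rdd_1_4", "rdd_1_0", "rdd_1_1", "rdd_1_2"],
}

targets_seen, hosts = simulate(nodes, placement)
for block, where in sorted(hosts.items()):
    print(block, sorted(where))
```

In this model the node that computed the four traced blocks only ever prints s1 as its target, yet every block ends up on two nodes and all three nodes host something, which is the kind of picture the web UI shows.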

Can someone correct me if my understanding is wrong?

-Karthik



