Posted to issues@spark.apache.org by "Ilya Rakitsin (JIRA)" <ji...@apache.org> on 2015/06/21 16:29:00 UTC

[jira] [Comment Edited] (SPARK-8503) SizeEstimator returns negative value for recursive data structures

    [ https://issues.apache.org/jira/browse/SPARK-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595060#comment-14595060 ] 

Ilya Rakitsin edited comment on SPARK-8503 at 6/21/15 2:28 PM:
---------------------------------------------------------------

The structure is a simple cyclic graph, just as you would imagine it:
{code}
public abstract class Edge implements Serializable {
    private static final long serialVersionUID = MavenVersion.VERSION.getUID();
    private int id;
    protected Vertex fromv;
    protected Vertex tov;
    // ...
}

public abstract class Vertex implements Serializable, Cloneable {
    private String name;
    private transient Edge[] incoming = new Edge[0];
    private transient Edge[] outgoing = new Edge[0];
    // ...
}
{code}
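A minimal, self-contained sketch of the serialization side of this setup, using hypothetical simplified classes SimpleVertex and SimpleEdge (concrete, and without the MavenVersion dependency) in place of Vertex and Edge. It builds a two-vertex cycle whose back-references run only through a transient array, round-trips one edge with plain Java serialization, and shows that the transient array is not written at all: it comes back as null, because field initializers of a Serializable class are not run on deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientRoundTrip {
    // Hypothetical, simplified stand-in for Vertex.
    static class SimpleVertex implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        // transient, like Vertex.incoming/outgoing: skipped by Java serialization
        transient SimpleEdge[] outgoing = new SimpleEdge[0];
        SimpleVertex(String name) { this.name = name; }
    }

    // Hypothetical, simplified stand-in for Edge.
    static class SimpleEdge implements Serializable {
        private static final long serialVersionUID = 1L;
        SimpleVertex fromv;
        SimpleVertex tov;
        SimpleEdge(SimpleVertex fromv, SimpleVertex tov) { this.fromv = fromv; this.tov = tov; }
    }

    /** Serializes and deserializes obj with plain Java serialization. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds the cycle a -> b -> a (through the transient arrays) and round-trips edge a -> b. */
    static SimpleEdge demoCopy() {
        SimpleVertex a = new SimpleVertex("a");
        SimpleVertex b = new SimpleVertex("b");
        SimpleEdge ab = new SimpleEdge(a, b);
        SimpleEdge ba = new SimpleEdge(b, a);
        a.outgoing = new SimpleEdge[] {ab};
        b.outgoing = new SimpleEdge[] {ba};
        return roundTrip(ab);
    }

    public static void main(String[] args) {
        SimpleEdge copy = demoCopy();
        System.out.println(copy.fromv.name + " -> " + copy.tov.name); // prints "a -> b"
        // The transient array was never written, and its initializer did not run on read.
        System.out.println(copy.fromv.outgoing == null); // prints "true"
    }
}
```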

As you can see, the edge arrays in Vertex are transient, so they are handled correctly (that is, not serialized at all) by both Kryo and regular Java serialization. When broadcasting, however, the size appears to be computed in an endless loop until it becomes negative, because of the cycles in the graph and because the transient edges are not handled.
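Spark's SizeEstimator is written in Scala and its internals are not reproduced here. The sketch below is a hypothetical Java illustration of the general technique, a reflective object-graph walk, counting objects instead of bytes. It shows two points relevant to this report: an identity-based visited set is what lets such a walk terminate on a cyclic graph, and a reflective walk follows transient references unless it explicitly skips them, whereas Java serialization skips them. The names Node, countReachable and cycle are illustrative only.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class ReachableWalk {
    // Hypothetical node type: neighbors is transient, like Vertex.incoming/outgoing.
    static class Node {
        String name;
        transient Node[] neighbors = new Node[0];
        Node(String name) { this.name = name; }
    }

    /**
     * Counts the distinct objects reachable from root via reflection. Fields declared
     * by JDK classes (java.*) are not followed, so objects such as String are leaves.
     */
    static int countReachable(Object root, boolean skipTransient) {
        // Identity (not equals) semantics: each object instance is expanded at most once.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Object obj = stack.pop();
            if (!visited.add(obj)) {
                continue; // already seen: without this check a cycle would loop forever
            }
            Class<?> cls = obj.getClass();
            if (cls.isArray()) {
                if (!cls.getComponentType().isPrimitive()) {
                    for (int i = 0; i < Array.getLength(obj); i++) {
                        Object elem = Array.get(obj, i);
                        if (elem != null) stack.push(elem);
                    }
                }
                continue;
            }
            for (Class<?> c = cls; c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    int mods = f.getModifiers();
                    if (Modifier.isStatic(mods) || f.getType().isPrimitive()) continue;
                    if (skipTransient && Modifier.isTransient(mods)) continue; // as serialization does
                    try {
                        f.setAccessible(true);
                        Object value = f.get(obj);
                        if (value != null) stack.push(value);
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
        }
        return visited.size();
    }

    /** Builds the cycle a -> b -> c -> a through the transient neighbors arrays. */
    static Node cycle() {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.neighbors = new Node[] {b};
        b.neighbors = new Node[] {c};
        c.neighbors = new Node[] {a};
        return a;
    }

    public static void main(String[] args) {
        Node a = cycle();
        System.out.println(countReachable(a, false)); // 9: three nodes, three names, three arrays
        System.out.println(countReachable(a, true));  // 2: node a and its name only
    }
}
```

Dropping the visited check would make the first call loop forever on this three-node cycle; with it, the walk is bounded by the number of distinct objects.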

Does this help?

A related problem is that SizeTracker#takeSample() does not handle a negative value returned by the estimator either. Should this be filed as a separate issue, or could you investigate it as part of this one? Hope this helps.
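One possible way to handle this is to validate each estimate at the moment the sample is taken, so a bad value fails fast with a descriptive message instead of surfacing later as the "sizeInBytes was negative" requirement failure in BlockInfo.markReady. The sketch below uses hypothetical names (CheckedSizeSampler, takeSample(numUpdates, estimatedBytes), latestBytes) that are not Spark's SizeTracker API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sampler that rejects negative size estimates; not Spark's SizeTracker. */
public class CheckedSizeSampler {
    private static final class Sample {
        final long numUpdates;
        final long bytes;
        Sample(long numUpdates, long bytes) { this.numUpdates = numUpdates; this.bytes = bytes; }
    }

    private final List<Sample> samples = new ArrayList<>();

    /** Records a sample, throwing immediately if the estimate is negative. */
    public void takeSample(long numUpdates, long estimatedBytes) {
        if (estimatedBytes < 0) {
            throw new IllegalStateException("size estimate was negative (" + estimatedBytes
                + " bytes) after " + numUpdates + " updates; the estimator may have overflowed");
        }
        samples.add(new Sample(numUpdates, estimatedBytes));
    }

    public int sampleCount() { return samples.size(); }

    /** Returns the most recently recorded estimate. */
    public long latestBytes() {
        if (samples.isEmpty()) throw new IllegalStateException("no samples recorded");
        return samples.get(samples.size() - 1).bytes;
    }

    public static void main(String[] args) {
        CheckedSizeSampler sampler = new CheckedSizeSampler();
        sampler.takeSample(1, 1024);
        try {
            sampler.takeSample(2, -9223372036854691384L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(sampler.sampleCount()); // prints "1"
    }
}
```

Failing fast is a deliberate choice here: clamping a negative estimate to zero would keep the job running but hide the underlying estimation bug.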





> SizeEstimator returns negative value for recursive data structures
> ------------------------------------------------------------------
>
>                 Key: SPARK-8503
>                 URL: https://issues.apache.org/jira/browse/SPARK-8503
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.1
>            Reporter: Ilya Rakitsin
>
> When estimating the size of recursive data structures such as graphs, whose transient fields reference one another, SizeEstimator may return a negative value if the structure is big enough.
> This in turn affects the logic of other components, e.g. SizeTracker#takeSample(), and may lead to incorrect behavior and exceptions such as:
> java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036854691384
> 	at scala.Predef$.require(Predef.scala:233)
> 	at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
> 	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:810)
> 	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:637)
> 	at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:991)
> 	at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
> 	at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
> 	at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> 	at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
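The negative value in the exception message is consistent with, though this report does not prove, a signed 64-bit running total that wrapped past Long.MAX_VALUE: -9223372036854691384 is exactly Long.MIN_VALUE + 84424, which is what Long.MAX_VALUE + 84425 wraps to. Where such a sum would occur inside SizeEstimator is not established here. The hypothetical helpers below (uncheckedSum, checkedSum; not part of Spark) contrast plain long addition with Math.addExact, which throws ArithmeticException instead of wrapping.

```java
public class OverflowCheck {
    static final long REPORTED = -9223372036854691384L; // from the exception message above

    /** Sums sizes with plain long addition, which wraps silently on overflow. */
    static long uncheckedSum(long... sizes) {
        long total = 0;
        for (long s : sizes) total += s;
        return total;
    }

    /** Sums sizes with Math.addExact, which throws ArithmeticException on overflow. */
    static long checkedSum(long... sizes) {
        long total = 0;
        for (long s : sizes) total = Math.addExact(total, s);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(REPORTED - Long.MIN_VALUE); // prints "84424"
        System.out.println(uncheckedSum(Long.MAX_VALUE, 84425L) == REPORTED); // prints "true"
        try {
            checkedSum(Long.MAX_VALUE, 84425L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected instead of a negative size");
        }
    }
}
```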



