You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Larry Xiao (JIRA)" <ji...@apache.org> on 2014/08/12 05:27:11 UTC

[jira] [Updated] (SPARK-2981) PartitionStrategy: VertexID hash overflow

     [ https://issues.apache.org/jira/browse/SPARK-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Larry Xiao updated SPARK-2981:
------------------------------

    Description: 
In PartitionStrategy.scala a PartitionID is calculated by multiplying VertexId with a mixingPrime (1125899906842597L) then cast to Int, and mod numParts.

The Long is overflowed, and when cast to Int:

{quote}
scala> (1125899906842597L*1).toInt
res1: Int = -27

scala> (1125899906842597L*2).toInt
res2: Int = -54

scala> (1125899906842597L*3).toInt
res3: Int = -81
{quote}
As the cast produce number that are multiplies of 3, the partition is not useable when partitioning to multiples of 3.

for example when you partition to 6 or 9 parts:
{quote}
14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: psrc Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0))
14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: pdst Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0)) 

14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: psrc Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0))
14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: pdst Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0)) 

so the vertices are partitioned to 0,3 for 6; and 0 for 9
{quote}

I think solution is to cast after mod.
{quote}
scala> (1125899906842597L*3)
res4: Long = 3377699720527791

scala> (1125899906842597L*3) % 9
res5: Long = 3

scala> ((1125899906842597L*3) % 9).toInt
res5: Int = 3
{quote}

  was:
In PartitionStrategy.scala a PartitionID is calculated by multiplying VertexId with a mixingPrime (1125899906842597L) then cast to Int, and mod numParts.

The Long is overflowed, and when cast to Int:

{quote}
scala> (1125899906842597L*1).toInt
res1: Int = -27

scala> (1125899906842597L*2).toInt
res2: Int = -54

scala> (1125899906842597L*3).toInt
res3: Int = -81
{quote}
As the cast produce number that are multiplies of 3, the partition is not useable when partitioning to multiples of 3.

for example when you partition to 6 or 9 parts:
{quote}
14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: psrc Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0))
14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: pdst Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0)) 

14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: psrc Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0))
14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: pdst Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0)) 
{quote}
I think solution is to cast after mod.
{quote}
scala> (1125899906842597L*3)
res4: Long = 3377699720527791

scala> (1125899906842597L*3) % 9
res5: Long = 3

scala> ((1125899906842597L*3) % 9).toInt
res5: Int = 3
{quote}


> PartitionStrategy: VertexID hash overflow
> -----------------------------------------
>
>                 Key: SPARK-2981
>                 URL: https://issues.apache.org/jira/browse/SPARK-2981
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.2
>            Reporter: Larry Xiao
>              Labels: newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In PartitionStrategy.scala a PartitionID is calculated by multiplying VertexId with a mixingPrime (1125899906842597L) then cast to Int, and mod numParts.
> The Long is overflowed, and when cast to Int:
> {quote}
> scala> (1125899906842597L*1).toInt
> res1: Int = -27
> scala> (1125899906842597L*2).toInt
> res2: Int = -54
> scala> (1125899906842597L*3).toInt
> res3: Int = -81
> {quote}
> As the cast produce number that are multiplies of 3, the partition is not useable when partitioning to multiples of 3.
> for example when you partition to 6 or 9 parts:
> {quote}
> 14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: psrc Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0))
> 14/08/12 09:26:21 INFO GraphXPartition: GRAPHX: pdst Array((0,4347084), (1,0), (2,0), (3,3832578), (4,0), (5,0)) 
> 14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: psrc Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0))
> 14/08/12 09:21:46 INFO GraphXPartition: GRAPHX: pdst Array((0,8179662), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0)) 
> so the vertices are partitioned to 0,3 for 6; and 0 for 9
> {quote}
> I think solution is to cast after mod.
> {quote}
> scala> (1125899906842597L*3)
> res4: Long = 3377699720527791
> scala> (1125899906842597L*3) % 9
> res5: Long = 3
> scala> ((1125899906842597L*3) % 9).toInt
> res5: Int = 3
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org