Posted to issues@spark.apache.org by "Jakub Liska (JIRA)" <ji...@apache.org> on 2016/12/18 22:12:59 UTC

[jira] [Updated] (SPARK-18919) PrimitiveKeyOpenHashMap is boxing values

     [ https://issues.apache.org/jira/browse/SPARK-18919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakub Liska updated SPARK-18919:
--------------------------------
    Description: 
Hey, I was benchmarking PrimitiveKeyOpenHashMap for speed and memory footprint, and I noticed that the footprint is higher than it should be. If I add 1M [Long,Long] entries to it, it has:
~ 34 MB in total
~ 17 MB in the OpenHashSet holding the keys
~ 17 MB in the Array holding the values

The Array size is strange, though: its initial size is 1 048 576 (_keySet.capacity), so it should take ~ 8 MB rather than ~ 17 MB, because a Long value is 8 bytes. Therefore I think the values are getting boxed in this collection.
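
The arithmetic behind that expectation can be sketched as follows. This is only a sanity check, not a measurement: it ignores the array header, and the doubled capacity is a hypothetical added for comparison (the numbers above do not show whether the backing array was ever resized). The object and method names are made up for illustration.

```scala
object ExpectedFootprint {
  // Bytes taken by the element data of an unboxed Array[Long]; the array header is ignored.
  def primitiveLongArrayBytes(capacity: Long): Long = capacity * 8L

  // Convert bytes to MiB for readability.
  def toMiB(bytes: Long): Double = bytes.toDouble / (1024 * 1024)

  def main(args: Array[String]): Unit = {
    val reported = 1048576L      // the _keySet.capacity quoted above
    val doubled  = reported * 2  // hypothetical: capacity after one resize
    println(f"capacity $reported%d -> ${toMiB(primitiveLongArrayBytes(reported))}%.1f MiB")
    println(f"capacity $doubled%d -> ${toMiB(primitiveLongArrayBytes(doubled))}%.1f MiB")
  }
}
```

Comparing the measured ~ 17 MB against both figures is one way to tell extra capacity apart from boxing overhead.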

The consequence of this problem is that if you put more than 100M Long entries into this map, the GC gets choked to death, even with an unlimited heap size.

The strange thing is that I get the same results when using @miniboxed instead of @specialized.

This is the ScalaMeter code I used:
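{code}
import org.scalameter.api._
import org.apache.spark.util.SizeEstimator
// Assumption: PrimitiveKeyOpenHashMap is private[spark], so this benchmark
// is compiled inside an org.apache.spark package.
import org.apache.spark.util.collection.PrimitiveKeyOpenHashMap

class PrimitiveKeyOpenHashMapBench extends Bench.ForkedTime {

  // Measure retained memory rather than running time.
  override def measurer = new Executor.Measurer.MemoryFootprint

  // Single input: one million entries.
  val sizes = Gen.single("size")(1*1000*1000)

  performance of "MemoryFootprint" in {
    performance of "PrimitiveKeyOpenHashMap" in {
      using(sizes) config (
        exec.benchRuns -> 1,
        exec.maxWarmupRuns -> 0,
        exec.independentSamples -> 1,
        exec.requireGC -> true,
        exec.jvmflags -> List("-server", "-Xms1024m", "-Xmx6548m", "-XX:+UseG1GC")
      ) in { size =>
        val map = new PrimitiveKeyOpenHashMap[Long, Long](size)
        // Populate the map with keys 0 until size.
        var index = 0L
        while (index < size) {
          map(index) = 0L
          index += 1
        }
        // Spark's own estimate of the map's retained size, in bytes.
        println("Size " + SizeEstimator.estimate(map))
        // Walk back down and verify that every key is present.
        while (index != 0) {
          index -= 1
          assert(map.contains(index))
        }
        // Return the map so the memory measurer accounts for it.
        map
      }
    }
  }
}
{code}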
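
Independently of the footprint numbers, boxing can be checked directly by looking at the runtime class of the backing array. The sketch below is a minimal illustration under stated assumptions: `Values` is a hypothetical stand-in for a specialized, ClassTag-backed container and is not Spark's actual class, and `isUnboxed` is a helper invented here.

```scala
import scala.reflect.ClassTag

object BoxingCheck {
  // Hypothetical stand-in for a map's value storage; not Spark's actual class.
  class Values[@specialized(Long, Int, Double) V: ClassTag](capacity: Int) {
    // With a ClassTag in scope, the array is created with the tag's runtime element type.
    val data: Array[V] = new Array[V](capacity)
  }

  // True when the JVM array stores primitives (e.g. long[]) rather than object references.
  def isUnboxed(arr: Array[_]): Boolean =
    arr.getClass.getComponentType.isPrimitive

  def main(args: Array[String]): Unit = {
    val viaClassTag = new Values[Long](16).data
    val boxed: Array[Any] = Array[Any](1L, 2L)
    println(s"ClassTag-backed array unboxed: ${isUnboxed(viaClassTag)}")
    println(s"Array[Any] unboxed: ${isUnboxed(boxed)}")
  }
}
```

Running the same kind of check against the map's real value array (for example from a debugger or a test placed in the same package) would show whether it is a `long[]` or an array of references.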


> PrimitiveKeyOpenHashMap is boxing values
> ----------------------------------------
>
>                 Key: SPARK-18919
>                 URL: https://issues.apache.org/jira/browse/SPARK-18919
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: ubuntu 16.04, scala 2.11.7, 2.12.1, java 1.8.0_111
>            Reporter: Jakub Liska
>            Priority: Critical


