You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cxzl25 <gi...@git.apache.org> on 2018/05/12 11:18:28 UTC

[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

GitHub user cxzl25 opened a pull request:

    https://github.com/apache/spark/pull/21311

    [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new size may be wrong

    ## What changes were proposed in this pull request?
    
    LongToUnsafeRowMap
    Calculate the new size simply by multiplying by 2
    At this time, the size of the application may not be enough to store data
    Some data is lost and the data read out is dirty
    
    ## How was this patch tested?
    HashedRelationSuite
    test("LongToUnsafeRowMap with big values")
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cxzl25/spark fix_LongToUnsafeRowMap_page_size

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21311
    
----
commit d9d8e62c2de7d9d04534396ab3bbf984ab16c7f5
Author: sychen <sy...@...>
Date:   2018-05-12T11:14:17Z

    LongToUnsafeRowMap Calculate the new correct size

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90967/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    @gatorsmile @hvanhovell Could you trigger tests?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90575/testReport)** for PR 21311 at commit [`d9d8e62`](https://github.com/apache/spark/commit/d9d8e62c2de7d9d04534396ab3bbf984ab16c7f5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90966/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91041/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90967/testReport)** for PR 21311 at commit [`f3916e7`](https://github.com/apache/spark/commit/f3916e765c8efe640c0b3c9d33ea336393df5d01).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    @cloud-fan 
    LongToUnsafeRowMap#append(key: Long, row: UnsafeRow)
    when row.getSizeInBytes > newPageSize( oldPage.length * 8L * 2),still use newPageSize value.
    When the new page size is insufficient to hold the entire row of data, Platform.copyMemory is still called.No error.
    At this time, the last remaining data was discarded.
    When reading data, read this buffer according to offset and length,the last data is unpredictable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189905873
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -626,6 +618,32 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
       }
     
    +  private def grow(neededSize: Int): Unit = {
    +    // There is 8 bytes for the pointer to next value
    +    val totalNeededSize = cursor + 8 + neededSize
    --- End diff --
    
    @cloud-fan Thank you for your suggestion and code.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187907473
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    --- End diff --
    
    ok . Doubling the size when growing.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91066/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    LGTM, good catch!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90966/testReport)** for PR 21311 at commit [`6fe1dd0`](https://github.com/apache/spark/commit/6fe1dd07c4dc4ad8c42f374def86ecacf5ad08c4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    @cxzl25, to clarify:
    
    > Some data is lost and the data read out is dirty
    
    To clarify, is this a potential cause of a wrong-answer correctness bug? If so, we should be sure to backport the resulting fix to maintenance branches. /cc @cloud-fan @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187861750
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
    --- End diff --
    
    This is not related to this pr though, `sys.error` instead of `UnsupportedOperationException`?
    https://github.com/apache/spark/blob/b6c50d7820aafab172835633fb0b35899e93146b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java#L45


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187907559
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    -      val newPage = new Array[Long](used * 2)
    +      val multiples = math.max(math.ceil(needSize.toDouble / (used * 8L)).toInt, 2)
    +      ensureAcquireMemory(used * 8L * multiples)
    --- End diff --
    
    ok.Spliting append func into two parts: grow/append.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    > Calculate the new size simply by multiplying by 2
    At this time, the size of the application may not be enough to store data
    Some data is lost and the data read out is dirty
    
    Can you explain more about it? IIUC if we don't have enough memory for `size * 2`, we would just fail with OOM, instead of setting a wrong size.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189878180
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -626,6 +618,32 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
       }
     
    +  private def grow(neededSize: Int): Unit = {
    +    // There is 8 bytes for the pointer to next value
    +    val totalNeededSize = cursor + 8 + neededSize
    --- End diff --
    
    nit:
    ```
    private def grow(inputRowSize: Int): Unit = {
      val neededSize = cursor + 8 + inputRowSize
      val currentSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
      ...
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91009/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189908375
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -626,6 +618,29 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
       }
     
    +  private def grow(inputRowSize: Int): Unit = {
    +    val neededNumWords = (cursor - Platform.LONG_ARRAY_OFFSET + 8 + inputRowSize + 7) / 8
    +    if (neededNumWords > page.length) {
    +      if (neededNumWords > (1 << 30)) {
    +        throw new UnsupportedOperationException(
    +          "Can not build a HashedRelation that is larger than 8G")
    +      }
    +      val newNumWords = math.max(neededNumWords, math.min(page.length * 2, 1 << 30))
    +      if (newNumWords > ARRAY_MAX) {
    --- End diff --
    
    we won't need this check now, `newNumWords` is guaranteed to be less than (1 << 30), which is much smaller than `ARRAY_MAX`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189912606
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Seq(BoundReference(0, StringType, false)))
    +    val keys = Seq(0L)
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +    val bigStr = UTF8String.fromString("x" * 1024 * 1024 * 2)
    +    keys.foreach { k =>
    +      map.append(k, unsafeProj(InternalRow(bigStr)))
    +    }
    +    map.optimize()
    +    val row = unsafeProj(InternalRow(bigStr)).copy()
    --- End diff --
    
    `val resultRow = new UnsafeRow(1)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90575/testReport)** for PR 21311 at commit [`d9d8e62`](https://github.com/apache/spark/commit/d9d8e62c2de7d9d04534396ab3bbf984ab16c7f5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91052/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189910936
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Seq(BoundReference(0, StringType, false)))
    +    val keys = Seq(0L)
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +    val bigStr = UTF8String.fromString("x" * 1024 * 1024 * 2)
    +    keys.foreach { k =>
    --- End diff --
    
    we just have one key, why use loop?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90970/testReport)** for PR 21311 at commit [`d7da8ae`](https://github.com/apache/spark/commit/d7da8aebf3f99cc38bba4f8bf9f148e87ed2fa23).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21311


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90575/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91066/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90966/testReport)** for PR 21311 at commit [`6fe1dd0`](https://github.com/apache/spark/commit/6fe1dd07c4dc4ad8c42f374def86ecacf5ad08c4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90981/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90574/testReport)** for PR 21311 at commit [`22a2767`](https://github.com/apache/spark/commit/22a2767b98185edf32be3c36bb255f5837ad7466).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    cc @cloud-fan


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91009/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91052/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90970/testReport)** for PR 21311 at commit [`d7da8ae`](https://github.com/apache/spark/commit/d7da8aebf3f99cc38bba4f8bf9f148e87ed2fa23).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    @JoshRosen  @cloud-fan @gatorsmile 
    When introducing [SPARK-10399](https://issues.apache.org/jira/browse/SPARK-10399),UnsafeRow#getUTF8String check the size at this time.
    [UnsafeRow#getUTF8String](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L420)
    [OnHeapMemoryBlock](https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/memory/OnHeapMemoryBlock.java#L34)
    >The sum of size 2097152 and offset 32 should not be larger than the size of the given memory space 2097168
    ![image](https://user-images.githubusercontent.com/3898450/40353741-cbd02616-5de4-11e8-9406-4fcb65a69294.png)
    
    But when this patch is not introduced, no error, get wrong value.
    
    ![spark-2 2 0](https://user-images.githubusercontent.com/3898450/40353815-febe5494-5de4-11e8-9c43-e75a95a3e74f.png)
    
    
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91009/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90981/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90574/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187861317
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    -      val newPage = new Array[Long](used * 2)
    +      val multiples = math.max(math.ceil(needSize.toDouble / (used * 8L)).toInt, 2)
    +      ensureAcquireMemory(used * 8L * multiples)
    --- End diff --
    
    How about shaping up this logic along with the other similar ones (spliting this func into two parts: `grow`/`append`)? e.g., `UTF8StringBuilder`  https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UTF8StringBuilder.java#L43


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    thanks, merging to master/2.3/2.2/2.1/2.0! There is no conflict so I backported all the way to 2.0. I'll watch the jenkins build in the few days.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91018/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r190146533
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17 (1M bytes),
    +    // so here we need a value larger than 1 << 18 (2M bytes),to trigger the bug
    +    val bigStr = UTF8String.fromString("x" * (1 << 22))
    --- End diff --
    
    LongToUnsafeRowMap#getRow
    resultRow=UnsafeRow#pointTo(page(1<<18), baseOffset(16), sizeInBytes(1<<21+16))
    
    UTF8String#getBytes
    copyMemory(base(page), offset, bytes, BYTE_ARRAY_OFFSET, numBytes(1<<21+16));
    
    In the case of similar size sometimes, can still read the original value.
    
    When introducing SPARK-10399,UnsafeRow#getUTF8String check the size at this time.
    If we pick 1 << 18 + 1, 100% reproduce this bug.
    
    But when this patch is not introduced, differences that are too small sometimes do not trigger.
    So I chose a larger value.
    
    My understanding may be problematic. Please advise. Thank you.
    
    ```java
            sun.misc.Unsafe unsafe;
            try {
                Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
                unsafeField.setAccessible(true);
                unsafe = (sun.misc.Unsafe) unsafeField.get(null);
            } catch (Throwable cause) {
                unsafe = null;
            }
    
            String value = "xxxxx";
            byte[] src = value.getBytes();
    
            byte[] dst = new byte[3];
            byte[] newDst = new byte[5];
    
            unsafe.copyMemory(src, 16, dst, 16, src.length);
            unsafe.copyMemory(dst, 16, newDst, 16, src.length);
    
            System.out.println("dst:" + new String(dst));
            System.out.println("newDst:" + new String(newDst));
    ```
    output:
    >dst:xxx
    >newDst:xxxxx
    
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91018/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r190106818
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17 (1M bytes),
    +    // so here we need a value larger than 1 << 18 (2M bytes),to trigger the bug
    +    val bigStr = UTF8String.fromString("x" * (1 << 22))
    --- End diff --
    
    do you mean this bug can't be reproduced consistently? e.g. if we pick `1 << 18 + 1`, we may not expose this bug, so we have to use `1 << 22` to  100% reproduce this bug?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90967/testReport)** for PR 21311 at commit [`f3916e7`](https://github.com/apache/spark/commit/f3916e765c8efe640c0b3c9d33ea336393df5d01).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187883931
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    -      val newPage = new Array[Long](used * 2)
    +      val multiples = math.max(math.ceil(needSize.toDouble / (used * 8L)).toInt, 2)
    +      ensureAcquireMemory(used * 8L * multiples)
    --- End diff --
    
    +1 on grow/append


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r190256996
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17 (1M bytes),
    +    // so here we need a value larger than 1 << 18 (2M bytes),to trigger the bug
    +    val bigStr = UTF8String.fromString("x" * (1 << 22))
    --- End diff --
    
    then 1 << 19 should be good enough as it doubles the size?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Thanks for your review. @maropu @kiszk @cloud-fan 
    
    I submitted a modification including the following:
    1. spliting append func into two parts:grow/appendG
    2. doubling the size when growing
    3. sys.error instead of UnsupportedOperationException



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91066/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189974407
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17 (1M bytes),
    +    // so here we need a value larger than 1 << 18 (2M bytes),to trigger the bug
    +    val bigStr = UTF8String.fromString("x" * (1 << 22))
    --- End diff --
    
    to double check, do we have to use `1 << 22` to trigger this bug?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189907763
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -626,6 +618,29 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
       }
     
    +  private def grow(inputRowSize: Int): Unit = {
    +    val neededNumWords = (cursor - Platform.LONG_ARRAY_OFFSET + 8 + inputRowSize + 7) / 8
    --- End diff --
    
    don't forget the comment for the 8 bytes pointer


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189911070
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.plans.physical.BroadcastMode
     import org.apache.spark.sql.types.LongType
     import org.apache.spark.unsafe.Platform
    +import org.apache.spark.unsafe.array.ByteArrayMethods
    --- End diff --
    
    not needed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189911946
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Seq(BoundReference(0, StringType, false)))
    --- End diff --
    
    nit: `UnsafeProjection.create(Array(StringType))`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189910750
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Seq(BoundReference(0, StringType, false)))
    +    val keys = Seq(0L)
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +    val bigStr = UTF8String.fromString("x" * 1024 * 1024 * 2)
    --- End diff --
    
    let's add a comment to say, the page array is initialized with length `1 << 17`, so here we need a value larger than `1 << 18`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90970/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91052/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187884949
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    --- End diff --
    
    Doubling the size when growing is very typical, seems what you want to address is when the memory is enough for the requsted size but not enough for doubling the size. I'd suggest we should double the size most of the time, as long as there is enough memory.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187857950
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    -      val newPage = new Array[Long](used * 2)
    +      val multiples = math.max(math.ceil(needSize.toDouble / (used * 8L)).toInt, 2)
    +      ensureAcquireMemory(used * 8L * multiples)
    --- End diff --
    
    Do we move the size check into before the allocation? IIUC, we have to check `used * multiplies` <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189997697
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17 (1M bytes),
    +    // so here we need a value larger than 1 << 18 (2M bytes),to trigger the bug
    +    val bigStr = UTF8String.fromString("x" * (1 << 22))
    --- End diff --
    
    Not necessary. 
    Just chose a larger value to make it easier to lose data.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r190345942
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17 (1M bytes),
    +    // so here we need a value larger than 1 << 18 (2M bytes),to trigger the bug
    +    val bigStr = UTF8String.fromString("x" * (1 << 22))
    --- End diff --
    
    Yes. I think so.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91041/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187857852
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
           }
    -      ensureAcquireMemory(used * 8L * 2)
    -      val newPage = new Array[Long](used * 2)
    +      val multiples = math.max(math.ceil(needSize.toDouble / (used * 8L)).toInt, 2)
    +      ensureAcquireMemory(used * 8L * multiples)
    +      val newPage = new Array[Long](used * multiples)
    --- End diff --
    
    Do we move the size check into before the allocation? IIUC, we have to check `used * multiplies` <= `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91018/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r187907410
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -568,13 +568,16 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
         }
     
         // There is 8 bytes for the pointer to next value
    -    if (cursor + 8 + row.getSizeInBytes > page.length * 8L + Platform.LONG_ARRAY_OFFSET) {
    +    val needSize = cursor + 8 + row.getSizeInBytes
    +    val nowSize = page.length * 8L + Platform.LONG_ARRAY_OFFSET
    +    if (needSize > nowSize) {
           val used = page.length
           if (used >= (1 << 30)) {
             sys.error("Can not build a HashedRelation that is larger than 8G")
    --- End diff --
    
    ok. sys.error instead of UnsupportedOperationException


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90574/testReport)** for PR 21311 at commit [`22a2767`](https://github.com/apache/spark/commit/22a2767b98185edf32be3c36bb255f5837ad7466).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #90981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90981/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21311#discussion_r189937018
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/HashedRelationSuite.scala ---
    @@ -254,6 +254,30 @@ class HashedRelationSuite extends SparkFunSuite with SharedSQLContext {
         map.free()
       }
     
    +  test("LongToUnsafeRowMap with big values") {
    +    val taskMemoryManager = new TaskMemoryManager(
    +      new StaticMemoryManager(
    +        new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    +        Long.MaxValue,
    +        Long.MaxValue,
    +        1),
    +      0)
    +    val unsafeProj = UnsafeProjection.create(Array[DataType](StringType))
    +    val map = new LongToUnsafeRowMap(taskMemoryManager, 1)
    +
    +    val key = 0L
    +    // the page array is initialized with length 1 << 17,
    +    // so here we need a value larger than 1 << 18
    +    val bigStr = UTF8String.fromString("x" * 1024 * 1024 * 2)
    --- End diff --
    
    nit: can we just do `"x" * (1 << 19)` here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    **[Test build #91041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91041/testReport)** for PR 21311 at commit [`b8b6324`](https://github.com/apache/spark/commit/b8b632450d824d7abf092c10f8e94f6938be1104).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21311
  
    @cloud-fan Thank you very much for your help.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org