You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by scwf <gi...@git.apache.org> on 2014/07/25 09:26:49 UTC

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

GitHub user scwf opened a pull request:

    https://github.com/apache/spark/pull/1587

    [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark fixColumnBuilder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1587
    
----
commit 4f1629e39b9d25fb2d7fc17c8e2dfa496dd7bc3a
Author: wangfei <wa...@huawei.com>
Date:   2014-07-25T07:26:22Z

    [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1587#discussion_r15390285
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnBuilder.scala ---
    @@ -118,7 +118,7 @@ private[sql] class BinaryColumnBuilder extends ComplexColumnBuilder(BINARY)
     private[sql] class GenericColumnBuilder extends ComplexColumnBuilder(GENERIC)
     
     private[sql] object ColumnBuilder {
    -  val DEFAULT_INITIAL_BUFFER_SIZE = 10 * 1024 * 104
    +  val DEFAULT_INITIAL_BUFFER_SIZE = 10 * 1024 * 1024
    --- End diff --
    
    While this is definitely a bug fix, it is also pretty dangerous. This means a 100 column table would use 1G of buffer per task, and with 32 cores, that's 32G of ram just for the buffer.  @marmbrus @liancheng


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1587#issuecomment-50116636
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/1587#issuecomment-50127420
  
    ok.thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by scwf <gi...@git.apache.org>.

Github user scwf closed the pull request at:

    https://github.com/apache/spark/pull/1587


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by scwf <gi...@git.apache.org>.

Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1587#discussion_r15391086
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnBuilder.scala ---
    @@ -118,7 +118,7 @@ private[sql] class BinaryColumnBuilder extends ComplexColumnBuilder(BINARY)
     private[sql] class GenericColumnBuilder extends ComplexColumnBuilder(GENERIC)
     
     private[sql] object ColumnBuilder {
    -  val DEFAULT_INITIAL_BUFFER_SIZE = 10 * 1024 * 104
    +  val DEFAULT_INITIAL_BUFFER_SIZE = 10 * 1024 * 1024
    --- End diff --
    
    i am voluntary to fix this issue, can you assign this issue for me？@rxin @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1587#discussion_r15390674
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnBuilder.scala ---
    @@ -118,7 +118,7 @@ private[sql] class BinaryColumnBuilder extends ComplexColumnBuilder(BINARY)
     private[sql] class GenericColumnBuilder extends ComplexColumnBuilder(GENERIC)
     
     private[sql] object ColumnBuilder {
    -  val DEFAULT_INITIAL_BUFFER_SIZE = 10 * 1024 * 104
    +  val DEFAULT_INITIAL_BUFFER_SIZE = 10 * 1024 * 1024
    --- End diff --
    
    Yeah, lets wait to fix this until we fix the other issues....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [sql] fix DEFAULT_INITIAL_BUFFER_SIZE

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1587#issuecomment-50126452
  
    @marmbrus has already filed [SPARK-2650](https://issues.apache.org/jira/browse/SPARK-2650) to track this issue and assigned to me. The `104` here is definitely a stupid typo (my bad, sorry...). But the more important missing piece here is that we haven't implemented the logic to calculate a proper initial buffer size like what Shark does. Shark initializes the buffer size by estimating the size of the target column based on DFS block size and `ColumnType.defaultSize` (see [here](https://github.com/amplab/shark/blob/f8c16f2b6c5990279376e9fd7178301f0341c6b2/src/main/scala/shark/memstore2/ColumnarSerDe.scala#L57-L65) and [here](https://github.com/amplab/shark/blob/f8c16f2b6c5990279376e9fd7178301f0341c6b2/src/main/scala/shark/memstore2/ColumnarSerDe.scala#L57-L65)).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---