You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2016/02/22 20:49:39 UTC

[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/11307

    [SPARK-13437][SQL] Introduce InternalColumn and MutableColumn

    ## What changes were proposed in this pull request?
    This PR introduce ``InternalColumn`` and ``MutableColumn`` to abstract a columnar storage at code generation as an analogy with {{InternalRow}}. This would allow us to use different implementations of column storage.
    For now, {{ColumnarVector}} extends {{InternalRow}} and {{MutableColumn}}.
    
    ## How was the this patch tested?
    Succeeded to compile


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-13437

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11307
    
----
commit a4f0eb74ed29802c0c17a9edab2efe49b7705134
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2016-02-22T19:44:54Z

    Introduce InternalColumn and MutableColumn

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187575800
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51735/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187575556
  
    **[Test build #51735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51735/consoleFull)** for PR 11307 at commit [`f05938c`](https://github.com/apache/spark/commit/f05938cf8113377012091f5254b0b8f6bab94581).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187555111
  
    There are three values of doing this.
    
    a. **Make APIs clean and readable**: A mandatory set ofAPIs are well represented by introducing such an abstract class. The values of a. is already shown by introducing ``InternalRow``
    
    b. **Abstract implementation of columnar storage**: To use accessors defined in an abstract class, generated code and runtime code for operations can be portable among different types of columnar storage. For example, ``ColumnVector``, ``ColumnAccessor``, and possible Apache Arrow.
    In the current  generated code of whole stage code gen, accesses to elements is well abstracted. The generated code does not take care of an actual class of ``InternalRow``.
    The values of b. is already shown by introducing ``InternalRow``
    
    ````
        protected void processNext() throws java.io.IOException {
          while (input.hasNext()) {
            InternalRow inputadapter_row = (InternalRow) input.next();
            boolean inputadapter_isNull = inputadapter_row.isNullAt(0)
            ...
    ````
    
    c. **Do not degrade runtime performance**: If we make getter methods final, JIT compiler will inline these method and eliminate method calls. Thus, this abstraction does not degrade performance, as done in `InternalRow`. The values of c. is already shown by introducing ``InternalRow``


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-189087910
  
    Dear @nongli and @davies , I would appreciate it your review and comments about this abstraction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk closed the pull request at:

    https://github.com/apache/spark/pull/11307


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187402102
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51663/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-188368967
  
    cc: @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187402096
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187545502
  
    **[Test build #51735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51735/consoleFull)** for PR 11307 at commit [`f05938c`](https://github.com/apache/spark/commit/f05938cf8113377012091f5254b0b8f6bab94581).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187458333
  
    What's the value of doing this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187575797
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187401427
  
    **[Test build #51663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51663/consoleFull)** for PR 11307 at commit [`a4f0eb7`](https://github.com/apache/spark/commit/a4f0eb74ed29802c0c17a9edab2efe49b7705134).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-211058424
  
    Thru my recent activities, I realized it is unnecessary. Let me close this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-187402082
  
    **[Test build #51663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51663/consoleFull)** for PR 11307 at commit [`a4f0eb7`](https://github.com/apache/spark/commit/a4f0eb74ed29802c0c17a9edab2efe49b7705134).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class InternalColumn extends SpecializedGetters with Serializable `
      * `abstract class MutableColumn extends InternalColumn `
      * `public abstract class ColumnVector extends MutableColumn `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13437][SQL] Introduce InternalColumn an...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11307#issuecomment-189515896
  
    For now, I did not see the value to do this. I think we can defer it to the day we actually need it.
    
    Maybe @nongli could say more on it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org