Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/01/05 16:01:49 UTC

[GitHub] spark pull request #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory an...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16479

    [SPARK-19085][SQL] cleanup OutputWriterFactory and OutputWriter

    ## What changes were proposed in this pull request?
    
    `OutputWriterFactory`/`OutputWriter` are internal interfaces and we can remove some unnecessary APIs:
    1. `OutputWriterFactory.newWriter(path: String)`: no one calls it and no one implements it.
    2. `OutputWriter.write(row: Row): Unit`: during execution we only call `writeInternal`, which is odd given that `OutputWriter` is already an internal interface. We should rename `writeInternal` to `write`, and remove `def write(row: Row): Unit` and its related converter code. All implementations should just implement `def write(row: InternalRow): Unit`.
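
The shape of the interface after this cleanup can be sketched roughly as follows. This is a self-contained illustration and not the code in the patch: `InternalRow` is a stand-in case class rather than Spark's own class, and `CollectingWriter` is a hypothetical implementation invented for the example.

```scala
// Stand-in for Spark's InternalRow; the real class is not reproduced here.
final case class InternalRow(values: Seq[Any])

// Sketch of the interface after the cleanup: one write method that takes
// InternalRow, with no Row-based overload and no converter.
abstract class OutputWriter {
  def write(row: InternalRow): Unit
  def close(): Unit
}

// Hypothetical implementation that buffers rows in memory.
class CollectingWriter extends OutputWriter {
  private val buffer = scala.collection.mutable.ArrayBuffer.empty[InternalRow]
  private var closed = false

  override def write(row: InternalRow): Unit = {
    require(!closed, "writer is already closed")
    buffer += row
  }

  override def close(): Unit = {
    closed = true
  }

  def rows: Seq[InternalRow] = buffer.toSeq
  def isClosed: Boolean = closed
}
```

An implementation under this shape has exactly one method to override for writing, so there is no question of which of `write` and `writeInternal` is called at execution time.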
    
    ## How was this patch tested?
    
    existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark hive-writer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16479
    
----
commit 1de8e4946ed1f0c1ae5738b872acb6b995a8295f
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-05T15:56:28Z

    cleanup OutputWriterFactory and OutputWriter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    cc @liancheng @gatorsmile @yhuai 



[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    Everything in package `org.apache.spark.sql.execution` should be internal to Spark SQL. Technically you can still implement `OutputWriter` outside of Spark, but there is no guarantee about the stability.
    
    Ideally we should not change any interface if unnecessary, but this change is reasonable. For an internal interface, it is more efficient to use `InternalRow` directly, instead of converting `InternalRow` to `Row` and then operating on `Row`. I'm sorry that this breaks spark-avro, but we can make spark-avro more efficient by switching to the new interface. Or we can just copy the previous conversion code to spark-avro, so that we can still convert `InternalRow` to `Row` and operate on `Row` in spark-avro.
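
The second option, keeping `Row`-based logic in an external project behind the new interface, might look roughly like the sketch below. Everything here is an assumption made for illustration: `InternalRow` and `Row` are stand-in case classes, `RowBasedOutputWriter` and `CollectingRowWriter` are hypothetical names, and the conversion is passed in as a plain function because the actual converter code is not shown in this thread.

```scala
import java.nio.charset.StandardCharsets

// Stand-ins for Spark's InternalRow and Row; the real classes are not used here.
final case class InternalRow(values: Seq[Any])
final case class Row(values: Seq[Any])

// Sketch of the interface as proposed in this pull request.
abstract class OutputWriter {
  def write(row: InternalRow): Unit
  def close(): Unit
}

// Hypothetical adapter: converts each InternalRow to a Row, then hands it
// to Row-based logic supplied by the subclass.
abstract class RowBasedOutputWriter(toRow: InternalRow => Row) extends OutputWriter {
  protected def writeRow(row: Row): Unit

  final override def write(row: InternalRow): Unit = writeRow(toRow(row))
}

// Hypothetical writer that keeps operating on Row.
class CollectingRowWriter(toRow: InternalRow => Row) extends RowBasedOutputWriter(toRow) {
  private val buffer = scala.collection.mutable.ArrayBuffer.empty[Row]

  override protected def writeRow(row: Row): Unit = {
    buffer += row
  }

  override def close(): Unit = ()

  def rows: Seq[Row] = buffer.toSeq
}

object Converters {
  // Illustrative conversion only: decodes byte arrays to strings and
  // passes every other value through unchanged.
  val bytesToString: InternalRow => Row = ir =>
    Row(ir.values.map {
      case bytes: Array[Byte] => new String(bytes, StandardCharsets.UTF_8)
      case other => other
    })
}
```

The trade-off is the one described above: the adapter pays a per-row conversion cost that a writer operating directly on `InternalRow` would avoid.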



[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    **[Test build #70947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70947/testReport)** for PR 16479 at commit [`79bb30c`](https://github.com/apache/spark/commit/79bb30cf222c43c98d4d52ab207d65fdca1f83b5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by lokkju <gi...@git.apache.org>.

Github user lokkju commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    So it essentially compiles each implementation against a different Spark version, and *both* sets of bytecode are included in the final jar? Then reflection is used to instantiate it.
    
    That works, without too much pain.  Might go that route, thanks.
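
The approach described in this comment can be sketched as follows. The suggestion being replied to is not included in this archive, so this is only a guess at the general pattern: the trait, class names, and version cutoff are all hypothetical, and in a real build each implementation would be compiled in its own module against a different Spark version rather than defined side by side.

```scala
// Common trait, compiled once and shared by both implementations.
trait WriterShim {
  def describe: String
}

// Hypothetical implementation for the older write(Row) interface.
class RowWriterShim extends WriterShim {
  def describe: String = "writes Row"
}

// Hypothetical implementation for the newer write(InternalRow) interface.
class InternalRowWriterShim extends WriterShim {
  def describe: String = "writes InternalRow"
}

object ShimLoader {
  // In a real build these would be string literals, so the loader never
  // links against a class compiled for the other Spark version.
  private val oldImpl = classOf[RowWriterShim].getName
  private val newImpl = classOf[InternalRowWriterShim].getName

  // The 2.2 cutoff is an assumption made for illustration.
  def implFor(sparkVersion: String): String = {
    val parts = sparkVersion.split("\\.").take(2).map(_.toInt)
    val (major, minor) = (parts(0), parts(1))
    if (major > 2 || (major == 2 && minor >= 2)) newImpl else oldImpl
  }

  // Instantiate the chosen implementation by name via reflection.
  def load(sparkVersion: String): WriterShim =
    Class.forName(implFor(sparkVersion))
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[WriterShim]
}
```

A caller would pass in whatever version string the running Spark reports, for example `ShimLoader.load("2.1.0")`, and use only the `WriterShim` trait from that point on.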



[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    What is the benefit of making these changes?



[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16479
  
    **[Test build #70932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70932/testReport)** for PR 16479 at commit [`1de8e49`](https://github.com/apache/spark/commit/1de8e4946ed1f0c1ae5738b872acb6b995a8295f).

