Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2016/05/18 22:32:28 UTC

[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/13181

    Revert "[SPARK-10216][SQL] Avoid creating empty files during overwrit…

    This reverts commit 8d05a7a from #8411, which seems to have caused regressions when working with empty DataFrames.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark revert12855

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13181
    
----
commit 2222b38f82593f0d93fcaa85ac641092469639ec
Author: Michael Armbrust <mi...@databricks.com>
Date:   2016-05-18T22:30:16Z

    Revert "[SPARK-10216][SQL] Avoid creating empty files during overwriting with group by query"
    
    This reverts commit 8d05a7a98bdbd3ce7c81d273e05a375877ebe68f.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220195031
  
    **[Test build #58821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58821/consoleFull)** for PR 13181 at commit [`2222b38`](https://github.com/apache/spark/commit/2222b38f82593f0d93fcaa85ac641092469639ec).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220180441
  
    **[Test build #58818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58818/consoleFull)** for PR 13181 at commit [`2222b38`](https://github.com/apache/spark/commit/2222b38f82593f0d93fcaa85ac641092469639ec).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220527118
  
    @jurriaan Maybe I am doing this wrong. I will tell you after testing that.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220704424
  
    I'm going to go ahead and merge this, but please ping me on follow-up issues that try to add this back.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220207798
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13181


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220195260
  
    test this please


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220200736
  
    @marmbrus Sure I will


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220183568
  
    **[Test build #58821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58821/consoleFull)** for PR 13181 at commit [`2222b38`](https://github.com/apache/spark/commit/2222b38f82593f0d93fcaa85ac641092469639ec).


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220195134
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220522607
  
    @marmbrus I tested and could reproduce the read exceptions from https://issues.apache.org/jira/browse/SPARK-15393, but it seems this might not be the cause.
    
    I tested the code below on https://github.com/apache/spark/commit/c0c3ec35476c756e569a1f34c4b258eb0490585c (right before this PR) and on the master branch.
    
    ```scala
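      // Assumes a Spark SQL test suite that provides `spark`, `withSQLConf`
      // and `withTempPath` (for example via SQLTestUtils).
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.internal.SQLConf
      import org.apache.spark.sql.types._

      test("SPARK-15393: create empty file") {
        withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
          withTempPath { path =>
            val schema = StructType(
              StructField("k", StringType, true) ::
              StructField("v", IntegerType, false) :: Nil)
            // Write an empty DataFrame as Parquet.
            val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
            emptyDf.write
              .format("parquet")
              .save(path.getCanonicalPath)
    
            // Read it back without a schema, so Spark has to infer it from the written files.
            val copyEmptyDf = spark.read
              .format("parquet")
              .load(path.getCanonicalPath)
    
            copyEmptyDf.show()
          }
        }
      }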
    ```
    
    and it seems both produce the exception below:
    
    ```scala
    Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc180000gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
    org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc180000gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
    	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
    	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
    	at scala.Option.getOrElse(Option.scala:121)
    	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:323)
    ```
    
    I will try to figure out why, but please feel free to revert this if you think my PR is problematic. I will fix both issues together later anyway.
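
The exception message says the schema "must be specified manually", so one way to read such a directory is to pass the schema to the reader instead of relying on inference. The following is a minimal sketch of that idea and is not code from this thread or from the PR. It assumes a local SparkSession, and the object name `EmptyParquetRead` and the method `roundTripEmpty` are hypothetical names chosen for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical object name; not part of Spark or of the PR under discussion.
object EmptyParquetRead {
  // Same schema as in the test above.
  val schema: StructType = StructType(
    StructField("k", StringType, nullable = true) ::
    StructField("v", IntegerType, nullable = false) :: Nil)

  // Writes an empty DataFrame as Parquet and reads it back with an explicit
  // schema, returning the number of rows read.
  def roundTripEmpty(spark: SparkSession): Long = {
    // Use a child of a fresh temp directory so the target does not exist yet.
    val path = Files.createTempDirectory("empty-parquet").resolve("out").toString
    val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
    emptyDf.write.format("parquet").save(path)

    // Supplying the schema means Spark does not need to infer it from files.
    spark.read.schema(schema).format("parquet").load(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-parquet-read")
      .getOrCreate()
    try println(roundTripEmpty(spark)) finally spark.stop()
  }
}
```

This only sidesteps inference on the read side. It does not explain why both commits fail to infer the schema in the test above.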


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220179010
  
    **[Test build #58818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58818/consoleFull)** for PR 13181 at commit [`2222b38`](https://github.com/apache/spark/commit/2222b38f82593f0d93fcaa85ac641092469639ec).


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220222603
  
    Hi @marmbrus , it seems okay!


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220198280
  
    hmmm, this might be failing tests?  @HyukjinKwon can you investigate if it fails again?


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220183153
  
    test this please


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220196004
  
    **[Test build #58829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58829/consoleFull)** for PR 13181 at commit [`2222b38`](https://github.com/apache/spark/commit/2222b38f82593f0d93fcaa85ac641092469639ec).


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by jurriaan <gi...@git.apache.org>.
Github user jurriaan commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220526725
  
    Interesting, I'm currently working with a custom build where I've reverted the PR manually to work around the issue. Will add a test case to the JIRA.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220180463
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58818/
    Test FAILed.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220207650
  
    **[Test build #58829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58829/consoleFull)** for PR 13181 at commit [`2222b38`](https://github.com/apache/spark/commit/2222b38f82593f0d93fcaa85ac641092469639ec).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220207800
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58829/
    Test PASSed.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220195138
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58821/
    Test FAILed.


[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13181#issuecomment-220180461
  
    Merged build finished. Test FAILed.

