Posted to reviews@spark.apache.org by xuanyuanking <gi...@git.apache.org> on 2017/07/17 12:11:06 UTC

[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/18654

    [SPARK-21435][SQL] Empty files should be skipped while write to file

    ## What changes were proposed in this pull request?
    
    Add an `EmptyDirectoryWriteTask` to handle tasks with empty input while writing files.
    
    ## How was this patch tested?
    
    Add a new test in `FileFormatWriterSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-21435

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18654
    
----
commit ff92ba3ae3abaae0d8ce8bb8c08465f43c14906f
Author: xuanyuanking <xy...@gmail.com>
Date:   2017-07-17T07:38:56Z

    empty files should be skipped while write to file

commit e08fb1939d5267a38f3318af7506b6ed8628ebbf
Author: xuanyuanking <xy...@gmail.com>
Date:   2017-07-17T10:20:34Z

    Handle the empty result of parquet

commit 6153001bc42deee197030ad91fbb4f72bd1aa5d3
Author: xuanyuanking <xy...@gmail.com>
Date:   2017-07-17T12:09:08Z

    Modify UT and add more notes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79695/
    Test FAILed.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127722941
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    --- End diff --
    
    `withTempPath` can be used instead, I believe.
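
For readers unfamiliar with these test helpers, here is a rough, self-contained sketch of the difference the reviewer is pointing at. The real helpers live in `org.apache.spark.sql.test.SQLTestUtils` and do more (config handling, error-safe cleanup); the names below mirror them, but the bodies are only an approximation:

```scala
import java.io.File
import java.nio.file.Files

def deleteRecursively(file: File): Unit = {
  Option(file.listFiles()).foreach(_.foreach(deleteRecursively))
  file.delete()
}

// Approximation of withTempDir: the directory already exists when the test body
// runs, which is why the PR's test calls dir.delete() before writing to it.
def withTempDir(f: File => Unit): Unit = {
  val dir = Files.createTempDirectory("spark-test").toFile
  try f(dir) finally deleteRecursively(dir)
}

// Approximation of withTempPath: the test only receives a path that does not
// exist yet, which is what DataFrameWriter expects for its output directory.
def withTempPath(f: File => Unit): Unit = {
  val path = Files.createTempDirectory("spark-test").toFile
  path.delete()
  try f(path) finally deleteRecursively(path)
}
```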




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    I think ORC can be dealt with separately (the problem is within the ORC source, based on my past investigation).




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79700/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18654




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127722734
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
         committer.setupTask(taskAttemptContext)
     
         val writeTask =
    -      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
    +      if (sparkPartitionId != 0 && !iterator.hasNext) {
    --- End diff --
    
    I guess this might be okay in the sense that, to my knowledge, partition IDs are guaranteed to start from 0. Could you take a look and see if it makes sense to you, cc @cloud-fan, if you don't mind? I am not confident enough to proceed with the review and leave a sign-off.
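
To make the decision being discussed concrete, here is a minimal, self-contained sketch of the branch the PR adds. Apart from the `sparkPartitionId != 0 && !iterator.hasNext` condition and the `EmptyDirectoryWriteTask` name, which come from the diff, the task names and the function signature below are illustrative placeholders, not Spark's actual internals:

```scala
// Stand-ins for the write-task kinds; only EmptyDirectoryWriteTask and the
// skip condition come from the PR's diff, the other names are placeholders.
sealed trait WriteTask
case object EmptyDirectoryWriteTask extends WriteTask    // produces no output file
case object SingleDirectoryWriteTask extends WriteTask   // non-partitioned, non-bucketed write
case object DynamicPartitionWriteTask extends WriteTask  // partitioned and/or bucketed write

def chooseWriteTask(
    sparkPartitionId: Int,
    hasRows: Boolean,             // i.e. iterator.hasNext
    hasPartitionColumns: Boolean,
    hasBucketing: Boolean): WriteTask = {
  if (sparkPartitionId != 0 && !hasRows) {
    // Empty, non-first partitions skip file creation entirely; partition 0 still
    // writes a (possibly empty) file so the output keeps its schema metadata.
    EmptyDirectoryWriteTask
  } else if (!hasPartitionColumns && !hasBucketing) {
    SingleDirectoryWriteTask
  } else {
    DynamicPartitionWriteTask
  }
}
```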




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79700/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127724518
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    --- End diff --
    
    Could I ask what this test targets? I think I am lost around here ...




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127867254
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
         committer.setupTask(taskAttemptContext)
     
         val writeTask =
    -      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
    +      if (sparkPartitionId != 0 && !iterator.hasNext) {
    --- End diff --
    
    This is a little hacky, but I think it is the simplest fix.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127867341
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    --- End diff --
    
    +1




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79667/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    retest this please




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127868378
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    +
    +      withTempDir { dst_dir =>
    +        dst_dir.delete()
    +        df.where("id = 50").write.parquet(dst_dir.toString)
    --- End diff --
    
    I was thinking of it just to make sure of the number of files written out in the previous step.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)** for PR 18654 at commit [`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127856498
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    --- End diff --
    
    Both are OK I think; I just copied this from `HadoopFsRelationSuite`.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127723973
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    --- End diff --
    
    Can we just do this more simply? For example,
    
    ```
    .listFiles().filter { f =>
      f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")
    }
    ```




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79694/
    Test PASSed.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127888746
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,43 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempPath { dir =>
    --- End diff --
    
    Much clearer :) No need to actually create the source files.
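
For reference, the simplified test roughly ends up with the shape below. This is an approximation of the final version built from the snippets quoted in this thread; the exact merged test may differ slightly:

```scala
package org.apache.spark.sql.execution.datasources

import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.test.SharedSQLContext

class FileFormatWriterSuite extends QueryTest with SharedSQLContext {

  test("empty file should be skipped while write to file") {
    withTempPath { path =>
      // Ten partitions, but only one of them holds id = 50 after the filter.
      spark.range(100).repartition(10).where("id = 50").write.parquet(path.toString)
      val partFiles = path.listFiles()
        .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
      // One (possibly empty) file from the first partition plus the single data file.
      assert(partFiles.length === 2)
    }
  }
}
```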




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127860327
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    --- End diff --
    
    Yea. If both are okay, let's go for the shorter one.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Yep, an empty result dir needs this metadata; otherwise reading it will throw this exception:
    ```
    org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
      at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188)
      at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:187)
      at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
      at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:571)
      at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:555)
      ... 48 elided
    ```
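
As a concrete illustration of why the first partition still writes a (possibly empty) file: a Parquet file carries its schema in the footer, so an empty result can still be read back. A minimal, self-contained sketch; the output path and app name are illustrative, not from the PR:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("empty-parquet-result")
  .getOrCreate()

// A non-existent output path under a fresh temp directory (illustrative).
val out = java.nio.file.Files.createTempDirectory("empty-result").resolve("parquet").toString

// A filter that matches nothing, written out as Parquet.
spark.range(100).where("id < 0").write.parquet(out)

// Reading back succeeds because at least one Parquet file (carrying the schema)
// was written by the first partition; with no files at all, schema inference
// fails with the AnalysisException shown above.
val readBack = spark.read.parquet(out)
assert(readBack.count() == 0)
```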




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    > leaving the first partition for meta writing
    
    What is the metadata we need to write?




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    ping @cloud-fan @HyukjinKwon 




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    retest this please...




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79667/
    Test FAILed.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79687/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r128134922
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
         committer.setupTask(taskAttemptContext)
     
         val writeTask =
    -      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
    +      if (sparkPartitionId != 0 && !iterator.hasNext) {
    --- End diff --
    
    cc @yhuai too, who reviewed my similar PR before.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127860788
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    --- End diff --
    
    But I guess this one (the latter assert) does not test this change? If it passes regardless of this PR's change, I would rather remove it.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127867549
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    +
    +      withTempDir { dst_dir =>
    +        dst_dir.delete()
    +        df.where("id = 50").write.parquet(dst_dir.toString)
    +        val allFiles = dst_dir.listFiles(new FilenameFilter {
    +          override def accept(dir: File, name: String): Boolean = {
    +            !name.startsWith(".") && !name.startsWith("_")
    +          }
    +        })
    +        // First partition file and the data file
    --- End diff --
    
    Ideally we only need the first partition file if all other partitions are empty, but this is hard to do right now.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79667/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79687/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127865091
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    --- End diff --
    
    OK, I'll remove this assert and leave a note.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127872988
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    +
    +      withTempDir { dst_dir =>
    +        dst_dir.delete()
    +        df.where("id = 50").write.parquet(dst_dir.toString)
    +        val allFiles = dst_dir.listFiles(new FilenameFilter {
    +          override def accept(dir: File, name: String): Boolean = {
    +            !name.startsWith(".") && !name.startsWith("_")
    +          }
    +        })
    +        // First partition file and the data file
    --- End diff --
    
    Can't agree more. At first I tried to implement it that way, but `FileFormatWriter.write` can only see each task's own iterator, so the emptiness check has to happen per task.
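
    Not part of the patch, but a tiny standalone sketch of that constraint (the object name and the local session setup below are made up purely for illustration): each task can only look at its own partition's iterator, and that is the only information the per-task write path has when deciding whether an output file is needed.

    ```scala
    import org.apache.spark.sql.SparkSession

    object EmptyPartitionCheck {
      def main(args: Array[String]): Unit = {
        // Local session purely for illustration.
        val spark = SparkSession.builder().master("local[4]").appName("empty-partition-check").getOrCreate()

        // After a selective filter, most of the 10 partitions carry no rows.
        val df = spark.range(10000).repartition(10).where("id = 50")

        // This is all a write task can observe: the iterator of its own partition.
        val emptiness = df.rdd
          .mapPartitionsWithIndex { (partitionId, iter) => Iterator((partitionId, iter.isEmpty)) }
          .collect()

        emptiness.foreach { case (id, isEmpty) => println(s"partition $id empty = $isEmpty") }
        spark.stop()
      }
    }
    ```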




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79687/
    Test PASSed.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127867380
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    --- End diff --
    
    +1 for the shorter one




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79695/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    retest this please




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    @HyukjinKwon Thanks for your comment. As you mentioned in #18650 and #17395, empty Parquet results can be handled by keeping the first partition's file; what about the ORC format? Should the ORC error for an empty result also be addressed in this patch?
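
    For reference, the analogous ORC round-trip being asked about would be the following (a sketch only; whether ORC fails the same way is exactly the open question here, and the path is just an example):

    ```scala
    // Write an empty result with the ORC format, then try to read it back.
    spark.range(100).filter("id > 100").write.orc("/tmp/abc_orc")
    spark.read.orc("/tmp/abc_orc").show()
    ```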




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    LGTM, merging to master!




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)** for PR 18654 at commit [`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127867290
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
    @@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
         committer.setupTask(taskAttemptContext)
     
         val writeTask =
    -      if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
    +      if (sparkPartitionId != 0 && !iterator.hasNext) {
    --- End diff --
    
    cc @hvanhovell 




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79704/
    Test PASSed.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79700/
    Test FAILed.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127724839
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    +
    +      withTempDir { dst_dir =>
    +        dst_dir.delete()
    +        df.where("id = 50").write.parquet(dst_dir.toString)
    --- End diff --
    
    I would explicitly repartition here.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79704/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    retest this please




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127876852
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,43 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempPath { dir =>
    --- End diff --
    
    Could we maybe just do as below?
    
    ```scala
    withTempPath { path =>
      spark.range(100).repartition(10).where("id = 50").write.parquet(path.getAbsolutePath)
      val partFiles = path.listFiles()
        .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
      assert(partFiles.length === 2)
    }
    ```




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79704/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127869754
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    +
    +      withTempDir { dst_dir =>
    +        dst_dir.delete()
    +        df.where("id = 50").write.parquet(dst_dir.toString)
    --- End diff --
    
    I mean, for example, if the `df` happens to end up with a single partition for any reason, I guess this test would become invalid.
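
    One minimal way to guard against that, in the spirit of the earlier suggestion to repartition explicitly (just a sketch against the quoted test, with an arbitrary partition count):

    ```scala
    // Pin the number of partitions explicitly, so the filtered write always starts
    // from multiple partitions, most of which will be empty.
    df.repartition(10).where("id = 50").write.parquet(dst_dir.toString)
    ```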




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    We still need to write out a file that carries the schema, and the footer in the case of Parquet. There is more context here - https://github.com/apache/spark/pull/17395#discussion_r107611325.

    For example, if we don't write out any file for an empty result, reading it back breaks:
    
    ```scala
    spark.range(100).filter("id > 100").write.parquet("/tmp/abc")
    spark.read.parquet("/tmp/abc").show()
    ```
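
    Assuming the patch behaves as described, with only non-first empty partitions skipped, the same round-trip should keep working because partition 0 still writes a schema-carrying file. This is a sketch of the expected behavior, not a transcript of an actual run:

    ```scala
    // An empty result still leaves one schema-only part file behind,
    // so reading the directory back succeeds and yields zero rows.
    spark.range(100).filter("id > 100").write.parquet("/tmp/abc")
    spark.read.parquet("/tmp/abc").count()   // expected to be 0
    ```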




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18654
  
    **[Test build #79695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79695/testReport)** for PR 18654 at commit [`d118d68`](https://github.com/apache/spark/commit/d118d685374242599a12d6536675ba7aeae4bfb7).




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127867486
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    +
    +      withTempDir { dst_dir =>
    +        dst_dir.delete()
    +        df.where("id = 50").write.parquet(dst_dir.toString)
    --- End diff --
    
    Why do we need the repartition?




[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18654#discussion_r127856720
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources
    +
    +import java.io.{File, FilenameFilter}
    +
    +import org.apache.spark.sql.QueryTest
    +import org.apache.spark.sql.test.SharedSQLContext
    +
    +class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
    +
    +  test("empty file should be skipped while write to file") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      spark.range(10000).repartition(10).write.parquet(dir.toString)
    +      val df = spark.read.parquet(dir.toString)
    +      val allFiles = dir.listFiles(new FilenameFilter {
    +        override def accept(dir: File, name: String): Boolean = {
    +          !name.startsWith(".") && !name.startsWith("_")
    +        }
    +      })
    +      assert(allFiles.length == 10)
    --- End diff --
    
    It just makes sure the source dir has many files while the output dir only has 2 files.
    If this is confusing, should I just leave a note and delete the assert?

