You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by windpiger <gi...@git.apache.org> on 2017/01/11 07:01:20 UTC

[GitHub] spark pull request #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsR...

GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/16545

    [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions to InsertIntoHadoopFsRelationCommand.deleteMatchingPrefix

    ## What changes were proposed in this pull request?
    
    InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions delete all files that match a static prefix, such as a partition file path(/table/foo=1), or a no partition file path(/xxx/a.json).
    
    while the method name deleteMatchingPartitions indicates that only the partition file will be deleted. This name make a confused.
    
    It is better to rename the method name.
    
    ## How was this patch tested?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark modifyAMethodName

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16545
    
----
commit 0622c04c6129ef699ca0d8f6907d8bbc6d025387
Author: windpiger <so...@outlook.com>
Date:   2017-01-11T06:59:22Z

    [SPARK-19166][SQL]change method name from InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions to InsertIntoHadoopFsRelationCommand.deleteMatchingPrefix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    sorry, it is not the point. the example make some confuse.
    replaced it with another example.
    
    val df = spark.read.json("/path/a")
    val df1 = spark.read.json("/path/b")
    
    df.createOrReplaceTempView("x")
    df1.createOrReplaceTempView("y")
    
    spark.sql("insert overwrite table x select * from y")
    
    this sql executed will hit the function `deleteMatchingPartitions` here
    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L104
    
    the point is `/path/a` is `not a partition path`(e.g. /path/a=1),we still delete it through `deleteMatchingPartitions`. 
    
    From the method name `MatchingPartitions` ,we may think it is only apply to a `partition file`(e.g. /path/a=1). If it also apply to `no partition file`, we will confused.
    
    if we change the name from `deleteMatchingPartitions` to `deleteMatchingPrefix` , is't more clear?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    thanks!
    
    val df = spark.read.json("/path/jsonfile")
    val df1 = spark.read.json("/path/jsonfile_1")
    
    df.createOrReplaceTempView("t")
    df1.createOrReplaceTempView("t1")
    
    spark.sql("insert overwrite table t select * from t1")
    
    this sql executed will hit the function `deleteMatchingPartitions` here
    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L104
    
    while `/path/jsonfile` is `not` a partition path file,  `deleteMatchingPartitions` is not properly here?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71190/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    1. yes,  it take issue with the internal method/doc name, just a minor improvement .
        after all, partitioned and non-partitioned case are not different.
    2.a file matching the same prefix being deleted, this doesn't happen, it is ok.
      it seems like `deleteMatchingPrefix ` also make some confused.
     
    maybe `deleteMatchingPaths` more properly.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    So you're saying it works fine, you just take issue with the internal method/doc name? OK, but in the non-partitioned case, deleting everything is also a matter of deleting `/a*`. Implicitly there is one partition in the sense used here. I just don't see a big deal here.
    
    But the example in the comment you added seems to refer to a different case, about a file matching the same prefix being deleted?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    thanks! even though it is a no-partition file, it will also be deleted, so I think this change will more clear


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsR...

Posted by windpiger <gi...@git.apache.org>.
Github user windpiger closed the pull request at:

    https://github.com/apache/spark/pull/16545


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71896/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    **[Test build #71190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71190/testReport)** for PR 16545 at commit [`0622c04`](https://github.com/apache/spark/commit/0622c04c6129ef699ca0d8f6907d8bbc6d025387).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    **[Test build #71896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71896/testReport)** for PR 16545 at commit [`15f6759`](https://github.com/apache/spark/commit/15f6759f4d9c111a4876928714da052652056c4c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    **[Test build #71413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71413/testReport)** for PR 16545 at commit [`6d1defb`](https://github.com/apache/spark/commit/6d1defb57407a12c6bf6020ed18cb2249328e435).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    **[Test build #71413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71413/testReport)** for PR 16545 at commit [`6d1defb`](https://github.com/apache/spark/commit/6d1defb57407a12c6bf6020ed18cb2249328e435).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Why would a file exist there? if it doesn't exist in any normal operation then I don't see a good motive for changing this, as it works as designed already.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    I don't think this kind of thing is worth changing; the docs look correct. When would you have other files as the peer of a partition directory?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    **[Test build #71896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71896/testReport)** for PR 16545 at commit [`15f6759`](https://github.com/apache/spark/commit/15f6759f4d9c111a4876928714da052652056c4c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Are you saying the problem arises when the path to one dataset/table is a prefix of another?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16545: [SPARK-19166][SQL]rename from InsertIntoHadoopFsRelation...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16545
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71413/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org