Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/09 02:51:35 UTC

[GitHub] [arrow] westonpace opened a new pull request, #12848: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

westonpace opened a new pull request, #12848:
URL: https://github.com/apache/arrow/pull/12848

   Also changes the dataset writer to use this new method (specifying `missing_dir_ok=true`).  This should address behavior seen in ARROW-12358
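
To make the intended semantics concrete, here is a minimal sketch on the local filesystem using only the Python standard library. It is not the Arrow implementation; the function below is a hypothetical stand-in that only borrows the `missing_dir_ok` name from this PR, and the `dataset` directory name is made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def delete_dir_contents(path, missing_dir_ok=False):
    """Remove everything inside `path` but keep `path` itself.

    Hypothetical helper, not part of Arrow. With missing_dir_ok=True a
    missing directory is treated as already empty instead of an error.
    """
    root = Path(path)
    if not root.is_dir():
        if missing_dir_ok:
            return  # nothing to clean up, treat as success
        raise FileNotFoundError(f"No such directory: {root}")
    for child in root.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "dataset"  # does not exist yet
    # A writer that wants a clean output directory can call this
    # unconditionally, whether or not the directory has been created.
    delete_dir_contents(target, missing_dir_ok=True)
    target.mkdir()
    (target / "part-0.parquet").write_bytes(b"")
    delete_dir_contents(target)
    remaining = list(target.iterdir())
```

The point of the flag is that a caller such as a dataset writer no longer needs a separate existence check before clearing its output directory.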
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou closed pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing
URL: https://github.com/apache/arrow/pull/12848




[GitHub] [arrow] ursabot commented on pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1110308694

   ['Python', 'R'] benchmarks have a high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/0cfcde80f9b44355960905b9d31ab22a...d9c36b20a327420ea37d649df4131d84/)
   




[GitHub] [arrow] pitrou commented on a diff in pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #12848:
URL: https://github.com/apache/arrow/pull/12848#discussion_r853221035


##########
cpp/src/arrow/filesystem/gcsfs.cc:
##########
@@ -477,10 +478,22 @@ class GcsFileSystem::Impl {
     std::vector<Future<>> submitted;
     // This iterates over all the objects, and schedules parallel deletes.
     auto prefix = p.object.empty() ? gcs::Prefix() : gcs::Prefix(canonical);
+    bool at_least_one_obj = false;
     for (const auto& o : client_.ListObjects(p.bucket, prefix)) {
+      at_least_one_obj = true;
       submitted.push_back(DeferNotOk(io_context.executor()->Submit(async_delete, o)));
     }
 
+    if (!missing_dir_ok && !at_least_one_obj) {
+      // If any files were found the directory implicitly exists.  If none were
+      // found then we still have to consider a marker-only directory so we test
+      // for that.
+      ARROW_ASSIGN_OR_RAISE(auto file_info, GetFileInfo(p));
+      if (file_info.type() == FileType::NotFound) {
+        return Status::IOError("No directory or any files matching path: ", p.full_path);
+      }
+    }

Review Comment:
   Hmm, I think you can reuse the `dir` variable introduced above?
   ```suggestion
       if (!missing_dir_ok && !at_least_one_obj && !dir) {
         // No files were found and no directory marker exists
         return Status::IOError("No such directory: ", p.full_path);
       }
   ```
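
The check discussed above can be illustrated with a toy model of an object store, where a "directory" exists either implicitly (some object key starts with its prefix) or explicitly (a marker object is present). This is only a sketch under assumptions: the store is a plain `dict`, the marker is modelled as a key ending in `/`, and the function is hypothetical rather than the `GcsFileSystem` code.

```python
def delete_dir_contents(store, dir_path, missing_dir_ok=False):
    """Delete all objects under `dir_path` in a dict-based toy object store.

    Assumption: a directory marker is a key equal to the prefix itself
    (for example "data/"). The marker is kept, only the contents go.
    Returns the number of deleted objects.
    """
    prefix = dir_path.rstrip("/") + "/"
    marker_exists = prefix in store
    children = [key for key in store if key.startswith(prefix) and key != prefix]
    # No objects and no marker means the directory does not exist at all.
    if not children and not marker_exists and not missing_dir_ok:
        raise FileNotFoundError(f"No such directory: {dir_path}")
    for key in children:
        del store[key]
    return len(children)


store = {"data/": b"", "data/a": b"1", "data/sub/b": b"2", "other/c": b"3"}
deleted = delete_dir_contents(store, "data")
```

A marker-only directory (no children, marker present) succeeds with nothing to delete, which is the case the extra existence test has to cover.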





[GitHub] [arrow] github-actions[bot] commented on pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1093617604

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.




[GitHub] [arrow] github-actions[bot] commented on pull request #12848: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1093615889

   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes), could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW
   
   Opening JIRAs ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   




[GitHub] [arrow] pitrou commented on a diff in pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #12848:
URL: https://github.com/apache/arrow/pull/12848#discussion_r852230370


##########
python/pyarrow/fs.py:
##########
@@ -346,18 +346,24 @@ def create_dir(self, path, recursive):
     def delete_dir(self, path):
         self.fs.rm(path, recursive=True)
 
-    def _delete_dir_contents(self, path):
-        for subpath in self.fs.listdir(path, detail=False):
+    def _delete_dir_contents(self, path, missing_dir_ok):
+        try:
+            subpaths = self.fs.listdir(path, detail=False)
+        except FileNotFoundError:
+            if missing_dir_ok:
+                return
+            raise
+        for subpath in subpaths:
             if self.fs.isdir(subpath):
                 self.fs.rm(subpath, recursive=True)
             elif self.fs.isfile(subpath):
                 self.fs.rm(subpath)
 
-    def delete_dir_contents(self, path):
+    def delete_dir_contents(self, path, missing_dir_ok):

Review Comment:
   I don't think either approach you suggest is better than the explicit compatibility breakage. This is indeed an annoyance for third-party implementations, but there are probably not many of them, and their authors should be competent enough to write the necessary compatibility handling code if required (basically by inspecting the PyArrow version).
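
As an illustration of what such compatibility code might look like on the handler side, here is a sketch that avoids a version check altogether by giving the new parameter a default. This is an alternative to the version inspection mentioned above, not what the comment prescribes. The class is a hypothetical stand-in for a third-party `FileSystemHandler` subclass and deliberately does not import PyArrow.

```python
class InMemoryHandler:
    """Hypothetical stand-in for a third-party filesystem handler."""

    def __init__(self):
        self.dirs = {}  # directory path -> set of entry names

    def delete_dir_contents(self, path, missing_dir_ok=False):
        # The default keeps the method callable by an older caller that
        # passes only `path`; a newer caller can supply the flag either
        # positionally or by keyword.
        if path not in self.dirs:
            if missing_dir_ok:
                return
            raise FileNotFoundError(path)
        self.dirs[path].clear()


handler = InMemoryHandler()
handler.dirs["out"] = {"part-0", "part-1"}
handler.delete_dir_contents("out")            # old call shape: path only
handler.delete_dir_contents("missing", True)  # new call shape: flag supplied
```

The trade-off is that the handler, not the caller, now decides the default, so the two can drift apart if the caller's default ever changes.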





[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #12848:
URL: https://github.com/apache/arrow/pull/12848#discussion_r851327053


##########
python/pyarrow/fs.py:
##########
@@ -346,18 +346,24 @@ def create_dir(self, path, recursive):
     def delete_dir(self, path):
         self.fs.rm(path, recursive=True)
 
-    def _delete_dir_contents(self, path):
-        for subpath in self.fs.listdir(path, detail=False):
+    def _delete_dir_contents(self, path, missing_dir_ok):
+        try:
+            subpaths = self.fs.listdir(path, detail=False)
+        except FileNotFoundError:
+            if missing_dir_ok:
+                return
+            raise
+        for subpath in subpaths:
             if self.fs.isdir(subpath):
                 self.fs.rm(subpath, recursive=True)
             elif self.fs.isfile(subpath):
                 self.fs.rm(subpath)
 
-    def delete_dir_contents(self, path):
+    def delete_dir_contents(self, path, missing_dir_ok):

Review Comment:
   I suppose this will break existing implementations, because our C++ code will now pass this keyword, while existing handlers won't yet accept that keyword in this method like above. 
   
   Now, that will _always_ happen if we add new keywords, and we want to be able to add new keywords (so not saying we shouldn't do this change). But therefore wondering:
   
   - Should we recommend adding `**kwargs` (that get ignored) to all methods on the Handler class to avoid such breakage? (Of course, that would silently ignore a newly added keyword, which also results in unexpected behaviour for the user. So that is maybe not better than initially raising an error until the handler gets updated for the latest Arrow version.)
   - In theory we could only pass the keyword in the C++ code (in `PyFileSystem::DeleteDirContents`) to the handler method _if_ it has a non-default value, and otherwise not pass it (and set a default on the line above). Of course that complicates the C++ code with an if/else block (and introduces a risk of introducing inconsistent defaults at some point if one would change).
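
The second option can be sketched in Python as follows. This is only a model of the idea: the real dispatch lives in C++ (`PyFileSystem::DeleteDirContents`), and `call_delete_dir_contents`, `DEFAULT_MISSING_DIR_OK` and `LegacyHandler` are hypothetical names introduced for illustration.

```python
DEFAULT_MISSING_DIR_OK = False  # must stay in sync with the handler's default


def call_delete_dir_contents(handler, path, missing_dir_ok=DEFAULT_MISSING_DIR_OK):
    """Forward the keyword only when it differs from the default."""
    if missing_dir_ok == DEFAULT_MISSING_DIR_OK:
        # Old-style call: works with handlers that predate the keyword.
        return handler.delete_dir_contents(path)
    return handler.delete_dir_contents(path, missing_dir_ok=missing_dir_ok)


class LegacyHandler:
    """Handler written before the keyword existed."""

    def __init__(self):
        self.calls = []

    def delete_dir_contents(self, path):
        self.calls.append(path)


legacy = LegacyHandler()
call_delete_dir_contents(legacy, "out")  # succeeds, keyword is not forwarded
```

A legacy handler keeps working for default calls and fails with a `TypeError` only when a caller actually asks for the new behaviour. The duplicated default is the consistency risk noted in the comment.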
   





[GitHub] [arrow] westonpace commented on pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1101961422

   CI was failing because I did not have a GCS solution.  I've added that and think this is probably ready.




[GitHub] [arrow] ursabot commented on pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1105297429

   Benchmark runs are scheduled for baseline = 7414cba1ddbf90b5aab34671ea5ec34ee0c7f648 and contender = a5e45cecb24229433b825dac64e0ffd10d400e8c. a5e45cecb24229433b825dac64e0ffd10d400e8c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/2edfc6b1b0e54393a819aa3fd10c5ce9...573b7131b41645a995b4287830ede341/)
   [Finished :arrow_down:0.63% :arrow_up:0.08%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a7f667a65a9045a59841e2f4488d0232...bb6326696e8f4ec0bc1f5b2f1779bf4a/)
   [Failed :arrow_down:0.75% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/0cfcde80f9b44355960905b9d31ab22a...d9c36b20a327420ea37d649df4131d84/)
   [Finished :arrow_down:0.3% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f1dc63fa0c9843e59af14005fbd3d9fe...e1210cf39e3b40b1906b8d5c823133cf/)
   Buildkite builds:
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/550| `a5e45cec` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/538| `a5e45cec` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/536| `a5e45cec` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/548| `a5e45cec` ursa-thinkcentre-m75q>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/549| `7414cba1` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/536| `7414cba1` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/535| `7414cba1` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/547| `7414cba1` ursa-thinkcentre-m75q>
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] github-actions[bot] commented on pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1093617587

   https://issues.apache.org/jira/browse/ARROW-16159




[GitHub] [arrow] westonpace commented on pull request #12848: ARROW-16159: [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing

Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #12848:
URL: https://github.com/apache/arrow/pull/12848#issuecomment-1101803107

   > Perhaps you can add to the generic filesystems tests in arrow/filesystem/test_util.cc?
   
   @pitrou 
   
   Good idea, thanks.  I removed the test I had added to `localfs_test.cc` since this was a brand new test and a subset of the generic test.  I left the additions to `s3fs_test.cc` and `hdfs_test.cc` as they already had independent tests for DeleteDirContents.

