You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/22 18:06:57 UTC

[GitHub] [arrow] pitrou opened a new pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

pitrou opened a new pull request #10574:
URL: https://github.com/apache/arrow/pull/10574


   * Ensure the HadoopFileSystem meets most requirements from the FileSystem API.
   * Implement HadoopFileSystem::CopyFile.
   * Enable generic filesystem tests for HadoopFileSystem.
   * Add generic filesystem test for special characters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#issuecomment-866220006


   @github-actions crossbow submit test-conda-python-3.7-hdfs-3.2.1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#discussion_r656927513



##########
File path: cpp/src/arrow/filesystem/test_util.cc
##########
@@ -453,9 +453,11 @@ void GenericFileSystemTest::TestMoveDir(FileSystem* fs) {
   AssertAllFiles(fs, {"EF/ghi", "KL/CD/def", "KL/abc"});
 
   // Destination is a non-empty directory
-  ASSERT_RAISES(IOError, fs->Move("KL", "EF"));
-  AssertAllDirs(fs, {"EF", "KL", "KL/CD"});
-  AssertAllFiles(fs, {"EF/ghi", "KL/CD/def", "KL/abc"});
+  if (!allow_move_dir_over_non_empty_dir()) {
+    ASSERT_RAISES(IOError, fs->Move("KL", "EF"));
+    AssertAllDirs(fs, {"EF", "KL", "KL/CD"});
+    AssertAllFiles(fs, {"EF/ghi", "KL/CD/def", "KL/abc"});
+  }

Review comment:
       Yes, will do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10574:
URL: https://github.com/apache/arrow/pull/10574


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#issuecomment-866219885


   @github-actions crossbow submit test-conda-python-3.7-hdfs-2.9.2 test-conda-python-3.7-hdfs-3.21


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#issuecomment-866229901


   Crossbow builds submitted at https://github.com/ursacomputing/crossbow/branches/all?query=build-364


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#discussion_r656488996



##########
File path: cpp/src/arrow/filesystem/hdfs.cc
##########
@@ -134,21 +140,39 @@ class HadoopFileSystem::Impl {
     }
     std::vector<FileInfo> results;
 
+    // Fetch working directory.
+    // If select.base_dir is relative, we need to trim it from the start
+    // of paths returned by ListDirectory.
+    // If select.base_dir is absolute, we need to trim the "URI authority"
+    // portion of the working directory.
     std::string wd;
-    if (select.base_dir.empty() || select.base_dir.front() != '/') {
-      // Fetch working directory, because we need to trim it from the start
-      // of paths returned by ListDirectory as select.base_dir is relative.
-      RETURN_NOT_OK(client_->GetWorkingDirectory(&wd));
-      Uri wd_uri;
-      RETURN_NOT_OK(wd_uri.Parse(wd));
-      wd = wd_uri.path();
+    RETURN_NOT_OK(client_->GetWorkingDirectory(&wd));
+
+    if (!select.base_dir.empty() && select.base_dir.front() == '/') {
+      // base_dir is absolute, only keep the URI authority portion.
+      // As mentioned in StatSelector() above, the URI may contain unescaped
+      // special chars and therefore may not be a valid URI, so we parse by hand.
+      auto pos = wd.find("://");  // start of host:port portion
+      if (pos == std::string::npos) {
+        return Status::Invalid("xxx");

Review comment:
       Well, of course, they're just terse :-D
   I'll fix them tomorrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#issuecomment-866217633


   @github-actions crossbow submit test-*hdfs*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#discussion_r656469358



##########
File path: cpp/src/arrow/filesystem/test_util.cc
##########
@@ -453,9 +453,11 @@ void GenericFileSystemTest::TestMoveDir(FileSystem* fs) {
   AssertAllFiles(fs, {"EF/ghi", "KL/CD/def", "KL/abc"});
 
   // Destination is a non-empty directory
-  ASSERT_RAISES(IOError, fs->Move("KL", "EF"));
-  AssertAllDirs(fs, {"EF", "KL", "KL/CD"});
-  AssertAllFiles(fs, {"EF/ghi", "KL/CD/def", "KL/abc"});
+  if (!allow_move_dir_over_non_empty_dir()) {
+    ASSERT_RAISES(IOError, fs->Move("KL", "EF"));
+    AssertAllDirs(fs, {"EF", "KL", "KL/CD"});
+    AssertAllFiles(fs, {"EF/ghi", "KL/CD/def", "KL/abc"});
+  }

Review comment:
       Is it worth testing the other side of this? (Presumably it'd succeed and we'd have `EF/KL/{abc, CD/def}`?)

##########
File path: cpp/src/arrow/filesystem/hdfs.cc
##########
@@ -134,21 +140,39 @@ class HadoopFileSystem::Impl {
     }
     std::vector<FileInfo> results;
 
+    // Fetch working directory.
+    // If select.base_dir is relative, we need to trim it from the start
+    // of paths returned by ListDirectory.
+    // If select.base_dir is absolute, we need to trim the "URI authority"
+    // portion of the working directory.
     std::string wd;
-    if (select.base_dir.empty() || select.base_dir.front() != '/') {
-      // Fetch working directory, because we need to trim it from the start
-      // of paths returned by ListDirectory as select.base_dir is relative.
-      RETURN_NOT_OK(client_->GetWorkingDirectory(&wd));
-      Uri wd_uri;
-      RETURN_NOT_OK(wd_uri.Parse(wd));
-      wd = wd_uri.path();
+    RETURN_NOT_OK(client_->GetWorkingDirectory(&wd));
+
+    if (!select.base_dir.empty() && select.base_dir.front() == '/') {
+      // base_dir is absolute, only keep the URI authority portion.
+      // As mentioned in StatSelector() above, the URI may contain unescaped
+      // special chars and therefore may not be a valid URI, so we parse by hand.
+      auto pos = wd.find("://");  // start of host:port portion
+      if (pos == std::string::npos) {
+        return Status::Invalid("xxx");

Review comment:
       Were these supposed to be real error messages? :slightly_smiling_face: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10574: ARROW-12790: [C++] Improve HadoopFileSystem conformance

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10574:
URL: https://github.com/apache/arrow/pull/10574#issuecomment-866215494


   https://issues.apache.org/jira/browse/ARROW-12790


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org