Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/13 14:49:39 UTC

[GitHub] [arrow] jorisvandenbossche opened a new pull request #9192: [NO MERGE] ARROW-10264: trigger failing hdfs test

jorisvandenbossche opened a new pull request #9192:
URL: https://github.com/apache/arrow/pull/9192


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9192: [NO MERGE] ARROW-10264: trigger failing hdfs test

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-759605028


   Revision: 8d3d6d4fe81bf53e9129d1aa2c4ea9cbc5523df7
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-880](https://github.com/ursa-labs/crossbow/branches/all?query=actions-880)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.7-hdfs-3.2|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-880-github-test-conda-python-3.7-hdfs-3.2)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-880-github-test-conda-python-3.7-hdfs-3.2)|





[GitHub] [arrow] pitrou commented on a change in pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#discussion_r557468664



##########
File path: python/pyarrow/parquet.py
##########
@@ -1493,15 +1493,16 @@ def __init__(self, path_or_paths, filesystem=None, filters=None,
                 single_file = path_or_paths[0]
         else:
             if _is_path_like(path_or_paths):
-                path = str(path_or_paths)
+                path_or_paths = str(path_or_paths)
                 if filesystem is None:
                     # path might be a URI describing the FileSystem as well
                     try:
-                        filesystem, path = FileSystem.from_uri(path)
+                        filesystem, path_or_paths = FileSystem.from_uri(
+                            path_or_paths)

Review comment:
       Yup.







[GitHub] [arrow] pitrou commented on pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-760328860


   The crossbow build crashes on GHA but it succeeds on my work computer. In any case, the crash seems unrelated, so I'm going to merge.





[GitHub] [arrow] github-actions[bot] commented on pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-760284747


   https://issues.apache.org/jira/browse/ARROW-10264





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#discussion_r557464597



##########
File path: cpp/src/arrow/filesystem/hdfs.cc
##########
@@ -69,6 +69,14 @@ class HadoopFileSystem::Impl {
   HdfsOptions options() const { return options_; }
 
   Result<FileInfo> GetFileInfo(const std::string& path) {
+    // It has unfortunately been a frequent logic error to pass URIs down
+    // to GetFileInfo (e.g. ARROW-10264).  Unlike other filesystems, HDFS
+    // silently accepts URIs but returns different results than if given the
+    // equivalent in-filesystem paths.  Instead of raising cryptic errors
+    // later, notify the underlying problem immediately.
+    if (path.substr(0, 5) == "hdfs:") {

Review comment:
       Or "viewfs"?
   (I am not familiar with it; I only know that some places in the Python/Cython code check for it as well.)

##########
File path: python/pyarrow/parquet.py
##########
@@ -1493,15 +1493,16 @@ def __init__(self, path_or_paths, filesystem=None, filters=None,
                 single_file = path_or_paths[0]
         else:
             if _is_path_like(path_or_paths):
-                path = str(path_or_paths)
+                path_or_paths = str(path_or_paths)
                 if filesystem is None:
                     # path might be a URI describing the FileSystem as well
                     try:
-                        filesystem, path = FileSystem.from_uri(path)
+                        filesystem, path_or_paths = FileSystem.from_uri(
+                            path_or_paths)

Review comment:
       Ah, good catch. So below we were still passing the original `path_or_paths` URI to the dataset constructor (instead of the non-URI path returned by `from_uri`), while also passing the filesystem inferred from the URI.
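
To illustrate the mix-up described in the comment above, here is a minimal Python sketch. It does not use pyarrow: `split_uri` is a hypothetical stand-in for `FileSystem.from_uri`, built on `urllib.parse`, and the host name and path are made up. It only shows how assigning the result to a separate `path` variable leaves the original URI in `path_or_paths`.

```python
from urllib.parse import urlsplit


def split_uri(path_or_uri):
    """Hypothetical stand-in for FileSystem.from_uri (an assumption, not pyarrow).

    Returns (scheme, in-filesystem path) for hdfs/viewfs URIs,
    and (None, unchanged input) for anything else.
    """
    parts = urlsplit(path_or_uri)
    if parts.scheme in ("hdfs", "viewfs"):
        return parts.scheme, parts.path
    return None, path_or_uri


def resolve_before_fix(path_or_paths):
    # Pattern removed by the diff: the stripped path lands in `path`,
    # but `path_or_paths` (still the URI) is what gets used afterwards.
    path = str(path_or_paths)
    filesystem, path = split_uri(path)
    return filesystem, path_or_paths


def resolve_after_fix(path_or_paths):
    # Pattern added by the diff: rebind `path_or_paths` itself, so later
    # code sees the in-filesystem path that matches the filesystem.
    path_or_paths = str(path_or_paths)
    filesystem, path_or_paths = split_uri(path_or_paths)
    return filesystem, path_or_paths


uri = "hdfs://namenode:8020/data/table"  # made-up example URI
print(resolve_before_fix(uri))
print(resolve_after_fix(uri))
```

In the first variant the filesystem is inferred from the URI while the path handed on is still the full URI, which is the combination the comment describes.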







[GitHub] [arrow] pitrou commented on pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-760260288


   Crossbow build: https://github.com/ursacomputing/crossbow/branches/all?query=build-16





[GitHub] [arrow] pitrou commented on pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-760273431


   GHA builds:
   * https://github.com/pitrou/arrow/actions/runs/485674460
   * https://github.com/pitrou/arrow/actions/runs/485674458
   * https://github.com/pitrou/arrow/actions/runs/485674461
   * https://github.com/pitrou/arrow/actions/runs/485674464
   





[GitHub] [arrow] pitrou commented on a change in pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#discussion_r557468518



##########
File path: cpp/src/arrow/filesystem/hdfs.cc
##########
@@ -69,6 +69,14 @@ class HadoopFileSystem::Impl {
   HdfsOptions options() const { return options_; }
 
   Result<FileInfo> GetFileInfo(const std::string& path) {
+    // It has unfortunately been a frequent logic error to pass URIs down
+    // to GetFileInfo (e.g. ARROW-10264).  Unlike other filesystems, HDFS
+    // silently accepts URIs but returns different results than if given the
+    // equivalent in-filesystem paths.  Instead of raising cryptic errors
+    // later, notify the underlying problem immediately.
+    if (path.substr(0, 5) == "hdfs:") {

Review comment:
       I know, though I don't want to goldplate this either.
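
For readers following the discussion of the prefix check in the hunk above, a rough Python sketch of the same idea might look as follows. It is not the C++ code from the pull request: the function name `check_not_uri`, the `schemes` parameter, and the use of `ValueError` are assumptions made for illustration. The default mirrors the `"hdfs:"` check in the diff; passing `"viewfs:"` as well corresponds to the reviewer's suggestion, which the author chose not to adopt here.

```python
def check_not_uri(path, schemes=("hdfs:",)):
    """Reject URI-looking paths early (hypothetical helper, not Arrow's API).

    Raises ValueError if `path` starts with one of `schemes`,
    otherwise returns `path` unchanged.
    """
    # str.startswith accepts a tuple of prefixes.
    if path.startswith(tuple(schemes)):
        raise ValueError(
            f"expected an in-filesystem path, got a URI: {path!r}"
        )
    return path


# An in-filesystem path passes through untouched.
print(check_not_uri("/data/table"))

# A URI is rejected immediately instead of failing later.
try:
    check_not_uri("hdfs://namenode:8020/data/table")
except ValueError as exc:
    print("rejected:", exc)
```

Failing at the point where the URI is passed in keeps the error close to the actual mistake, which is the motivation stated in the code comment of the hunk.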







[GitHub] [arrow] pitrou commented on pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-760272959


   Travis-CI build: https://travis-ci.com/github/pitrou/arrow/builds/213174149





[GitHub] [arrow] pitrou commented on pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-760250755


   @jorisvandenbossche Can you review this quickly?





[GitHub] [arrow] pitrou closed pull request #9192: ARROW-10264: [Python] Fix failing hdfs test

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #9192:
URL: https://github.com/apache/arrow/pull/9192


   





[GitHub] [arrow] jorisvandenbossche commented on pull request #9192: [NO MERGE] ARROW-10264: trigger failing hdfs test

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #9192:
URL: https://github.com/apache/arrow/pull/9192#issuecomment-759499754


   @github-actions crossbow submit test-conda-python-3.7-hdfs-3.2

