You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 15:57:01 UTC

[GitHub] [arrow] pitrou opened a new pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

pitrou opened a new pull request #11997:
URL: https://github.com/apache/arrow/pull/11997


   An occasional misunderstanding is to pass a URI to filesystem methods, where a regular path is expected.
   
   Make these situations easier to diagnose by raising a specific error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#discussion_r772501730



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -797,6 +838,11 @@ TEST_F(GcsIntegrationTest, OpenInputStreamInfoInvalid) {
   ASSERT_RAISES(IOError, fs->OpenInputStream(info));
 }
 
+TEST_F(GcsIntegrationTest, OpenInputStreamUri) {
+  auto fs = internal::MakeGcsFileSystemForTest(TestGcsOptions());
+  ASSERT_RAISES(Invalid, fs->OpenInputStream("gcs:" + PreexistingObjectPath()));

Review comment:
       I see, thank you.
   By the way, it seems other packages have adopted the `gcs://` convention: https://gcsfs.readthedocs.io/en/latest/#integration




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998228834


   Benchmark runs are scheduled for baseline = cfcce5a243981deb21b47c358059ea85488fee86 and contender = 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b. 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1b0f6b9cc263472bac82be4e6c3b9737...a24f786e0c1d4df8845de453fb6cd977/)
   [Finished :arrow_down:2.24% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8ce4b7420dd14c97b1388ef2772c13ec...23fdebb3b6a34a63b944aa82d9267ae1/)
   [Finished :arrow_down:1.24% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/7e78eaaf2d764d0681934b7dcc160696...e03fd24b10ee48898226c3140af31a26/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#discussion_r772587608



##########
File path: cpp/src/arrow/filesystem/s3fs.cc
##########
@@ -402,6 +402,10 @@ struct S3Path {
   std::vector<std::string> key_parts;
 
   static Result<S3Path> FromString(const std::string& s) {
+    if (internal::IsLikelyUri(s)) {
+      return Status::Invalid(
+          "Expected a S3 object path of the form 'bucket/key...', got a URI: '", s, "'");

Review comment:
       ```suggestion
             "Expected an S3 object path of the form 'bucket/key...', got a URI: '", s, "'");
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998052247


   https://issues.apache.org/jira/browse/ARROW-10998


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#discussion_r772508120



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -797,6 +838,11 @@ TEST_F(GcsIntegrationTest, OpenInputStreamInfoInvalid) {
   ASSERT_RAISES(IOError, fs->OpenInputStream(info));
 }
 
+TEST_F(GcsIntegrationTest, OpenInputStreamUri) {
+  auto fs = internal::MakeGcsFileSystemForTest(TestGcsOptions());
+  ASSERT_RAISES(Invalid, fs->OpenInputStream("gcs:" + PreexistingObjectPath()));

Review comment:
       Sigh... My *guess* (and let me stress that this would be just a guess) is that `gs://` is more familiar to GCS users.  I can live with `gcs://` too
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#discussion_r772499030



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -797,6 +838,11 @@ TEST_F(GcsIntegrationTest, OpenInputStreamInfoInvalid) {
   ASSERT_RAISES(IOError, fs->OpenInputStream(info));
 }
 
+TEST_F(GcsIntegrationTest, OpenInputStreamUri) {
+  auto fs = internal::MakeGcsFileSystemForTest(TestGcsOptions());
+  ASSERT_RAISES(Invalid, fs->OpenInputStream("gcs:" + PreexistingObjectPath()));

Review comment:
       I know this is just for testing purposes, but the conventional prefix for GCS is `gs://`, at least that is what `gsutil` uses:
   
   https://cloud.google.com/storage/docs/gsutil#syntax
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998228834


   Benchmark runs are scheduled for baseline = cfcce5a243981deb21b47c358059ea85488fee86 and contender = 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b. 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1b0f6b9cc263472bac82be4e6c3b9737...a24f786e0c1d4df8845de453fb6cd977/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8ce4b7420dd14c97b1388ef2772c13ec...23fdebb3b6a34a63b944aa82d9267ae1/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/7e78eaaf2d764d0681934b7dcc160696...e03fd24b10ee48898226c3140af31a26/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998052349


   @coryan Would you like to take a look at the GCS changes here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998228834


   Benchmark runs are scheduled for baseline = cfcce5a243981deb21b47c358059ea85488fee86 and contender = 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b. 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1b0f6b9cc263472bac82be4e6c3b9737...a24f786e0c1d4df8845de453fb6cd977/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8ce4b7420dd14c97b1388ef2772c13ec...23fdebb3b6a34a63b944aa82d9267ae1/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/7e78eaaf2d764d0681934b7dcc160696...e03fd24b10ee48898226c3140af31a26/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-1000349034


   Cool, this is a nice usability improvement!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #11997:
URL: https://github.com/apache/arrow/pull/11997


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998228834


   Benchmark runs are scheduled for baseline = cfcce5a243981deb21b47c358059ea85488fee86 and contender = 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b. 238b36318ec7228853feb1b4f6b8f9ef7fa8cc4b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1b0f6b9cc263472bac82be4e6c3b9737...a24f786e0c1d4df8845de453fb6cd977/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8ce4b7420dd14c97b1388ef2772c13ec...23fdebb3b6a34a63b944aa82d9267ae1/)
   [Finished :arrow_down:1.24% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/7e78eaaf2d764d0681934b7dcc160696...e03fd24b10ee48898226c3140af31a26/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#discussion_r772514349



##########
File path: cpp/src/arrow/filesystem/gcsfs_test.cc
##########
@@ -797,6 +838,11 @@ TEST_F(GcsIntegrationTest, OpenInputStreamInfoInvalid) {
   ASSERT_RAISES(IOError, fs->OpenInputStream(info));
 }
 
+TEST_F(GcsIntegrationTest, OpenInputStreamUri) {
+  auto fs = internal::MakeGcsFileSystemForTest(TestGcsOptions());
+  ASSERT_RAISES(Invalid, fs->OpenInputStream("gcs:" + PreexistingObjectPath()));

Review comment:
       Ah, it looks like they accept `gs://` too: https://github.com/fsspec/gcsfs/blob/main/gcsfs/core.py#L1171




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #11997: ARROW-10998: [C++] Detect URIs where a filesystem path is expected

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #11997:
URL: https://github.com/apache/arrow/pull/11997#issuecomment-998187765


   >  Maybe drop a note that colons are not supported in filenames here (https://github.com/apache/arrow/blob/master/docs/source/cpp/io.rst#filesystems) where we mention . and .. are not supported?
   
   Hmm, they should be more or less supported except in the first segment of a path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org