You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/17 15:52:35 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #13390: ARROW-16424: [C++] Update uri_path parsing in FromProto

westonpace commented on code in PR #13390:
URL: https://github.com/apache/arrow/pull/13390#discussion_r900279339


##########
cpp/src/arrow/engine/substrait/relation_internal.cc:
##########
@@ -106,25 +106,37 @@ Result<compute::Declaration> FromProto(const substrait::Rel& rel,
           path = item.uri_path_glob();
         }
 
+        ::arrow::internal::Uri uri;
+        RETURN_NOT_OK(uri.Parse(path));
         if (item.format() ==
             substrait::ReadRel::LocalFiles::FileOrFiles::FILE_FORMAT_PARQUET) {
           format = std::make_shared<dataset::ParquetFileFormat>();
-        } else if (util::string_view{path}.ends_with(".arrow")) {
+        } else if (uri.extension() == ".arrow") {

Review Comment:
   This "extension comparison" mechanism is short-lived.  The latest version of Substrait supports specifying IPC as a file format.  However, I'm interested in the thought of supporting `.arrows` as that would imply we are reading files from a filesystem where the files were written with the streaming mode.  Do you think that is a use case we might encounter?
   
   I purposely did not include support for the streaming mode when adding the Substrait support for IPC to LocalFiles because I didn't think it would make sense for a dataset of files stored on disk to have IPC files written with the streaming mode (if you're going to be writing to disk, you might as well use the file mode).  Do you think this is a situation we might encounter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org