Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:32:49 UTC

[GitHub] [arrow] JoshADHD opened a new issue #7857: Underscores at beginning of directory names create problems for open_dataset function

JoshADHD opened a new issue #7857:
URL: https://github.com/apache/arrow/issues/7857


   When opening a dataset of Parquet files with `open_dataset`, I've found that an underscore at the beginning of a directory name (which I use often for OCD purposes) causes the function to find no files.
   
   Examples: 
   `dataset <- open_dataset(
       sources = "/data/_split_data",
       partitioning = hive_partition(year = int32(), month = string())
   )
   
   dataset <- open_dataset(
       sources = "/_data/split_data",
       partitioning = hive_partition(year = int32(), month = string())
   )`
   
   Both of these return `FileSystemDataset with 0 Parquet files` when printed at the console. If I remove the underscores from the beginning of the directory names, the function finds the files as expected.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on issue #7857: Underscores at beginning of directory names create problems for open_dataset function

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #7857:
URL: https://github.com/apache/arrow/issues/7857#issuecomment-668448152


   I was remembering https://issues.apache.org/jira/browse/ARROW-8427, which seems somewhat similar (it is about skipping underscore-prefixed names only in child directories, not in the base path). I suppose that fix covered the case of a list of file paths, but not the case of a directory. Opened https://issues.apache.org/jira/browse/ARROW-9644 for this.
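
The distinction between the base path and child directories can be illustrated with a small standalone sketch. It is not Arrow's actual implementation: the function names, the sample file paths, and the `IGNORE_PREFIXES` tuple are hypothetical and chosen only for illustration. It contrasts checking every segment of the full path against checking only the segments below the base directory.

```python
from pathlib import PurePosixPath

# Assumed ignore prefixes for this sketch (hypothetical constant, not Arrow's API).
IGNORE_PREFIXES = ("_", ".")


def is_ignored_whole_path(path: str) -> bool:
    """Ignore a file if ANY segment of its full path starts with an ignored prefix."""
    return any(part.startswith(IGNORE_PREFIXES) for part in PurePosixPath(path).parts)


def is_ignored_relative(path: str, base_dir: str) -> bool:
    """Ignore a file only if a segment BELOW base_dir starts with an ignored prefix."""
    relative = PurePosixPath(path).relative_to(base_dir)
    return any(part.startswith(IGNORE_PREFIXES) for part in relative.parts)


# Hypothetical file layout under the base path from the report.
base = "/data/_split_data"
files = [
    "/data/_split_data/year=2020/month=Jan/part-0.parquet",
    "/data/_split_data/year=2020/_temporary/part-1.parquet",
]

# Checking the whole path drops every file, because the base path itself
# contains "_split_data".
kept_whole = [f for f in files if not is_ignored_whole_path(f)]

# Checking only the part below the base path keeps the regular data file
# and still skips the one inside "_temporary".
kept_relative = [f for f in files if not is_ignored_relative(f, base)]

print(kept_whole)
print(kept_relative)
```

Under these assumptions, the first strategy reproduces the "0 Parquet files" symptom from the report, while the second keeps ordinary data files and still skips underscore-prefixed child directories.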





[GitHub] [arrow] jorisvandenbossche commented on issue #7857: Underscores at beginning of directory names create problems for open_dataset function

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #7857:
URL: https://github.com/apache/arrow/issues/7857#issuecomment-668441257


   Actually, this is a slightly different issue, since it is not about underscores in partition keys, but in the base path. I thought this was something we already fixed.





[GitHub] [arrow] nealrichardson commented on issue #7857: Underscores at beginning of directory names create problems for open_dataset function

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on issue #7857:
URL: https://github.com/apache/arrow/issues/7857#issuecomment-665256697


   Thanks. Please open a JIRA and we can discuss there.





[GitHub] [arrow] nealrichardson closed issue #7857: Underscores at beginning of directory names create problems for open_dataset function

Posted by GitBox <gi...@apache.org>.
nealrichardson closed issue #7857:
URL: https://github.com/apache/arrow/issues/7857


   





[GitHub] [arrow] jorisvandenbossche commented on issue #7857: Underscores at beginning of directory names create problems for open_dataset function

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #7857:
URL: https://github.com/apache/arrow/issues/7857#issuecomment-668426497


   There is already an issue about this -> https://issues.apache.org/jira/browse/ARROW-9573

