Posted to issues@arrow.apache.org by "natbprice (via GitHub)" <gi...@apache.org> on 2023/09/20 19:47:06 UTC

[GitHub] [arrow] natbprice opened a new issue, #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

natbprice opened a new issue, #37816:
URL: https://github.com/apache/arrow/issues/37816

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I have a multi-file Parquet dataset in cloud storage that is mounted as a local directory, so arrow sees it as an ordinary local directory.
   
   In R, when I call `open_dataset("nyc-taxi", unify_schemas=FALSE)`, it lists all the files but then also opens each one, which causes every file to be downloaded locally. The desired behavior is to list all the files and open only 1 file to determine the schema.
   
   In Python, when I call `ds.dataset("nyc-taxi", partitioning="hive")`, it downloads (i.e., opens) only 1 file to learn the schema, as expected.
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] natbprice commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "natbprice (via GitHub)" <gi...@apache.org>.
natbprice commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1729672708

   I have not tested opening the bucket with a URI because Azure Blob Storage is not supported. I am using [blobfuse2](https://github.com/Azure/azure-storage-fuse). A listing operation (e.g., `ls`) does not trigger file downloads. From arrow's perspective, I don't think there is any requirement for specialized listing code. In Python arrow, I have verified that I can create the dataset while downloading only 1 file, and then run queries that trigger downloads of the correct partitions.
   
   I think a good next step would be to see if we can replicate the issue using a local directory without cloud storage, but I am not sure how to log when R arrow is opening a file, or how to differentiate that from simply listing files. Maybe `ls -l --time=atime` or `strace`? Maybe there is a way to create a mock filesystem in order to verify which operations R arrow is performing? Any advice is appreciated.
   
   The blobfuse2 log for the mounted directory indicates that R arrow is opening all the files:
   
   ```
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_CRIT [mount.go (405)]: Starting Blobfuse2 Mount : 2.1.1-preview.1 on [Ubuntu 22.04.2 LTS]
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_CRIT [mount.go (406)]: Logging level set to : LOG_INFO
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [libfuse.go (244)]: Libfuse::Validate : UID 0, GID 0
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [libfuse.go (305)]: Libfuse::Configure : read-only true, allow-other true, allow-root false, default-perm 511, entry-timeout 600, attr-time 600, negative-timeout 600, ignore-open-flags true, nonempty false, direct_io false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [file_cache.go (282)]: FileCache::Configure : Using default eviction policy
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [file_cache.go (304)]: FileCache::Configure : create-empty false, cache-timeout 86400, tmp-path /workingdir/cache, max-size-mb 4096, high-mark 80, low-mark 60, refresh-sec 0, max-eviction 5000
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [attr_cache.go (156)]: AttrCache::Configure : cache-timeout 86400, symlink false, cache-on-list false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [config.go (388)]: ParseAndValidateConfig : using the following proxy address from the config file: 
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [config.go (392)]: ParseAndValidateConfig : sdk logging from the config file: false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [config.go (485)]: ParseAndValidateConfig : Account: accountname, Container: nyc-taxi, AccountType: ADLS, Auth: KEY, Prefix: , Endpoint: https://accountname.dfs.core.windows.net/, ListBlock: 0, MD5 : false false, Virtual Directory: true, Max Results For List 2, Disable Compression: false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [config.go (489)]: ParseAndValidateConfig : Retry Config: Retry count 5, Max Timeout 900, BackOff Time 4, Max Delay 60
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [config.go (492)]: ParseAndValidateConfig : Telemetry : 
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [mount.go (415)]: mount: Mounting blobfuse2 on /workingdir/nyc-taxi
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_CRIT [mount.go (405)]: Starting Blobfuse2 Mount : 2.1.1-preview.1 on [Ubuntu 22.04.2 LTS]
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_CRIT [mount.go (406)]: Logging level set to : LOG_INFO
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse.go (244)]: Libfuse::Validate : UID 0, GID 0
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse.go (305)]: Libfuse::Configure : read-only true, allow-other true, allow-root false, default-perm 511, entry-timeout 600, attr-time 600, negative-timeout 600, ignore-open-flags true, nonempty false, direct_io false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (282)]: FileCache::Configure : Using default eviction policy
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (304)]: FileCache::Configure : create-empty false, cache-timeout 86400, tmp-path /workingdir/cache, max-size-mb 4096, high-mark 80, low-mark 60, refresh-sec 0, max-eviction 5000
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [attr_cache.go (156)]: AttrCache::Configure : cache-timeout 86400, symlink false, cache-on-list false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [config.go (388)]: ParseAndValidateConfig : using the following proxy address from the config file: 
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [config.go (392)]: ParseAndValidateConfig : sdk logging from the config file: false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [config.go (485)]: ParseAndValidateConfig : Account: accountname, Container: nyc-taxi, AccountType: ADLS, Auth: KEY, Prefix: , Endpoint: https://accountname.dfs.core.windows.net/, ListBlock: 0, MD5 : false false, Virtual Directory: true, Max Results For List 2, Disable Compression: false
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [config.go (489)]: ParseAndValidateConfig : Retry Config: Retry count 5, Max Timeout 900, BackOff Time 4, Max Delay 60
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [config.go (492)]: ParseAndValidateConfig : Telemetry : 
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [mount.go (415)]: mount: Mounting blobfuse2 on /workingdir/nyc-taxi
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [lru_policy.go (139)]: lruPolicy::StartPolicy : Policy set with 86400 timeout
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (178)]: Libfuse::initFuse : Mounting with fuse3 library
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (258)]: Libfuse::NotifyMountToParent : Notifying parent for successful mount
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (265)]: Libfuse::libfuse_init : Kernel Caps : 60817371
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (272)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_PARALLEL_DIROPS
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (278)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_AUTO_INVAL_DATA
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (285)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_READDIRPLUS
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (291)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_ASYNC_READ
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (297)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_SPLICE_WRITE
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7088] : LOG_INFO [libfuse_handler.go (309)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_WRITEBACK_CACHE
   Thu Sep 21 09:45:47 EDT 2023 : blobfuse2[7076] : LOG_INFO [mount.go (467)]: mount: Child [7088] mounted successfully at /workingdir/nyc-taxi
   Thu Sep 21 09:45:58 EDT 2023 : blobfuse2[7088] : LOG_INFO [azstorage.go (290)]: AzStorage::StreamDir : Unblocked List API
   Thu Sep 21 09:46:02 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:02 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:02 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:02 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:02 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:03 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2010/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:04 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2011/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:05 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2012/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:06 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2013/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:07 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2014/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:08 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2015/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:09 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2016/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:10 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2017/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2018/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:11 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2019/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:12 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2020/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=10/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=11/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=12/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=2/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=3/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=4/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=5/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=6/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=7/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=8/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2021/month=9/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2022/month=1/part-0.parquet, fd=13
   Thu Sep 21 09:46:13 EDT 2023 : blobfuse2[7088] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2022/month=2/part-0.parquet, fd=13
   ```
   
   In contrast, Python arrow is only opening 1 file:
   
   ```
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_CRIT [mount.go (405)]: Starting Blobfuse2 Mount : 2.1.1-preview.1 on [Ubuntu 22.04.2 LTS]
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_CRIT [mount.go (406)]: Logging level set to : LOG_INFO
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [libfuse.go (244)]: Libfuse::Validate : UID 0, GID 0
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [libfuse.go (305)]: Libfuse::Configure : read-only true, allow-other true, allow-root false, default-perm 511, entry-timeout 600, attr-time 600, negative-timeout 600, ignore-open-flags true, nonempty false, direct_io false
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [file_cache.go (282)]: FileCache::Configure : Using default eviction policy
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [file_cache.go (304)]: FileCache::Configure : create-empty false, cache-timeout 86400, tmp-path /workingdir/blobfuse/cache, max-size-mb 4096, high-mark 80, low-mark 60, refresh-sec 0, max-eviction 5000
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [attr_cache.go (156)]: AttrCache::Configure : cache-timeout 86400, symlink false, cache-on-list false
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [config.go (388)]: ParseAndValidateConfig : using the following proxy address from the config file: 
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [config.go (392)]: ParseAndValidateConfig : sdk logging from the config file: false
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [config.go (485)]: ParseAndValidateConfig : Account: accountname, Container: nyc-taxi, AccountType: ADLS, Auth: KEY, Prefix: , Endpoint: https://accountname.dfs.core.windows.net/, ListBlock: 0, MD5 : false false, Virtual Directory: true, Max Results For List 2, Disable Compression: false
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [config.go (489)]: ParseAndValidateConfig : Retry Config: Retry count 5, Max Timeout 900, BackOff Time 4, Max Delay 60
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [config.go (492)]: ParseAndValidateConfig : Telemetry : 
   Thu Sep 21 09:53:37 EDT 2023 : blobfuse2[7998] : LOG_INFO [mount.go (415)]: mount: Mounting blobfuse2 on /workingdir/blobfuse/nyc-taxi
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_CRIT [mount.go (405)]: Starting Blobfuse2 Mount : 2.1.1-preview.1 on [Ubuntu 22.04.2 LTS]
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_CRIT [mount.go (406)]: Logging level set to : LOG_INFO
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse.go (244)]: Libfuse::Validate : UID 0, GID 0
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse.go (305)]: Libfuse::Configure : read-only true, allow-other true, allow-root false, default-perm 511, entry-timeout 600, attr-time 600, negative-timeout 600, ignore-open-flags true, nonempty false, direct_io false
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [file_cache.go (282)]: FileCache::Configure : Using default eviction policy
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [file_cache.go (304)]: FileCache::Configure : create-empty false, cache-timeout 86400, tmp-path /workingdir/blobfuse/cache, max-size-mb 4096, high-mark 80, low-mark 60, refresh-sec 0, max-eviction 5000
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [attr_cache.go (156)]: AttrCache::Configure : cache-timeout 86400, symlink false, cache-on-list false
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [config.go (388)]: ParseAndValidateConfig : using the following proxy address from the config file: 
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [config.go (392)]: ParseAndValidateConfig : sdk logging from the config file: false
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [config.go (485)]: ParseAndValidateConfig : Account: accountname, Container: nyc-taxi, AccountType: ADLS, Auth: KEY, Prefix: , Endpoint: https://accountname.dfs.core.windows.net/, ListBlock: 0, MD5 : false false, Virtual Directory: true, Max Results For List 2, Disable Compression: false
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [config.go (489)]: ParseAndValidateConfig : Retry Config: Retry count 5, Max Timeout 900, BackOff Time 4, Max Delay 60
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [config.go (492)]: ParseAndValidateConfig : Telemetry : 
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [mount.go (415)]: mount: Mounting blobfuse2 on /workingdir/blobfuse/nyc-taxi
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [lru_policy.go (139)]: lruPolicy::StartPolicy : Policy set with 86400 timeout
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (178)]: Libfuse::initFuse : Mounting with fuse3 library
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (258)]: Libfuse::NotifyMountToParent : Notifying parent for successful mount
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (265)]: Libfuse::libfuse_init : Kernel Caps : 60817371
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (272)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_PARALLEL_DIROPS
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (278)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_AUTO_INVAL_DATA
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (285)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_READDIRPLUS
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (291)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_ASYNC_READ
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (297)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_SPLICE_WRITE
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[8009] : LOG_INFO [libfuse_handler.go (309)]: Libfuse::libfuse_init : Enable Capability : FUSE_CAP_WRITEBACK_CACHE
   Thu Sep 21 09:53:38 EDT 2023 : blobfuse2[7998] : LOG_INFO [mount.go (467)]: mount: Child [8009] mounted successfully at /workingdir/blobfuse/nyc-taxi
   Thu Sep 21 09:54:25 EDT 2023 : blobfuse2[8009] : LOG_INFO [azstorage.go (290)]: AzStorage::StreamDir : Unblocked List API
   Thu Sep 21 09:54:29 EDT 2023 : blobfuse2[8009] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=1/part-0.parquet, fd=13
   ```
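One way to compare the two logs without reading them by eye is to count the distinct `FileCache::OpenFile` entries in each. The following is a minimal Python sketch, assuming the blobfuse2 log line format shown above; the name `opened_files` and the regular expression are illustrative and are not part of blobfuse2 or arrow.

```python
import re
from collections import Counter

# Matches the FileCache::OpenFile lines as they appear in the logs above.
OPEN_RE = re.compile(r"FileCache::OpenFile : file=(?P<path>[^,]+), fd=\d+")


def opened_files(log_text: str) -> Counter:
    """Count how often each file path appears in an OpenFile log line."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OPEN_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Two lines copied from the Python run above: one listing call, one file open.
sample = """\
Thu Sep 21 09:54:25 EDT 2023 : blobfuse2[8009] : LOG_INFO [azstorage.go (290)]: AzStorage::StreamDir : Unblocked List API
Thu Sep 21 09:54:29 EDT 2023 : blobfuse2[8009] : LOG_INFO [file_cache.go (952)]: FileCache::OpenFile : file=year=2009/month=1/part-0.parquet, fd=13
"""

counts = opened_files(sample)
print(len(counts), "distinct file(s) opened")
```

Running the same function over the full R log and the full Python log would give a direct count of how many files each run opened.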
   




[GitHub] [arrow] natbprice commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "natbprice (via GitHub)" <gi...@apache.org>.
natbprice commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1730166201

   I wasn't able to diagnose what is going on with VS Code. In VS Code with the R extension there is an option to open an R terminal, which also seems to be where code chunks run when working in an R Markdown document. For some unknown reason, code run there triggers file downloads for me, which would suggest it is reading the data files. You can hover over the terminal name to get the PID. I ran a trace on that PID and didn't see any extra read operations.
   
   If I open a regular terminal in VS Code and then start R, it works correctly. It also works correctly from RStudio.




[GitHub] [arrow] jonkeane commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "jonkeane (via GitHub)" <gi...@apache.org>.
jonkeane commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1729637921

   Could you explain more about what kind of cloud storage and ephemeral local/syncing system you're using? Using a dataset backed by something like Google Drive, OneDrive, Dropbox, iCloud Drive, etc. is not something we recommend, since performance can vary widely depending on whether a file is truly local or needs to be fetched first. It would still be good to know which of those (or some other one) you're using in case we run into it elsewhere.
   
   When opening a dataset, there is a process that recursively scans the directories and files to find the partitions and get a list of the Parquet files that make up the dataset. (There's a pretty good explanation of this for a different dataset in https://github.com/apache/arrow/issues/34145#issuecomment-1432181304 ). So it's possible that the listing itself triggers the cloud -> local syncing process, hence downloading everything (even without trying to unify schemas). There _might_ be a way around this in the listing code inside of arrow, but there are a lot of complexities with these kinds of filesystems.
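To make the discovery step concrete, here is a rough Python sketch of a recursive scan that collects Parquet file paths and hive-style `key=value` partition values using directory listings only. It is not arrow's implementation; the function name `discover` and the temporary directory layout are assumptions made for illustration.

```python
import os
import tempfile


def discover(root: str):
    """List Parquet files under root and parse hive-style key=value directories.

    Only directory listings are used; no data file is opened or read.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        # Keep only path segments that look like key=value partitions.
        partition = dict(p.split("=", 1) for p in parts if "=" in p)
        for name in filenames:
            if name.endswith(".parquet"):
                found.append((os.path.join(dirpath, name), partition))
    return sorted(found, key=lambda item: item[0])


# Build a tiny stand-in dataset with empty placeholder files (demo setup only).
with tempfile.TemporaryDirectory() as root:
    for year, month in [(2009, 1), (2009, 2)]:
        directory = os.path.join(root, f"year={year}", f"month={month}")
        os.makedirs(directory)
        open(os.path.join(directory, "part-0.parquet"), "wb").close()
    files = discover(root)

print(len(files), files[0][1])
```

On a plain local filesystem a scan like this never opens the data files, so whether it triggers downloads depends on how the mounted or synced filesystem reacts to listing calls.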
   
   




[GitHub] [arrow] assignUser commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1730436963

   The VS Code R tools have a session watcher that collects data about the environment and uses it for code lenses etc. (e.g. showing a variable's value on hover). It might be poking our R6 objects and triggering the downloads?
   
   If you open an R terminal via the extension, the session watcher attaches automatically, but that isn't the case for manually opened ones.




[GitHub] [arrow] assignUser commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1728599693

   Yeah, with `unify_schemas=FALSE` it should only touch one file to get the schema (at least that is my read of the documentation...). I'm not sure whether the downloads are triggered by some other access, given that it appears to be a local fs. Have you tested what happens when opening the dataset via a bucket URI?
   
   related issue: #33312 




[GitHub] [arrow] jonkeane commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "jonkeane (via GitHub)" <gi...@apache.org>.
jonkeane commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1730013003

   > It seems like there is something in VS Code (or RMarkdown in VS Code) that was causing the issue. Maybe the GUI is somehow triggering the read operation by how it tracks R data objects?
   
   Oh hmmm, interesting. Would you mind adding more details about what you were doing here (like running R in the vscode terminal pane? rendering the rmarkdown for preview?) even if we can't totally diagnose what's up — it might help other folks running into this.
   
   I would also be interested to know whether you see the same behavior in another IDE like RStudio or the R GUI bundled with R. I wouldn't expect either of those to trigger more file reading than necessary, but I wouldn't have expected it from VS Code either!




[GitHub] [arrow] jonkeane commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "jonkeane (via GitHub)" <gi...@apache.org>.
jonkeane commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1729950592

   Thanks for that output — and sorry I skipped over the bit about pyarrow going quicker in your first message.
   
   I presume those are the fuse outputs when you run 
   
   `open_dataset("nyc-taxi", unify_schemas=FALSE)` and then `ds.dataset("nyc-taxi", partitioning="hive")` respectively, yeah? And are you running them in that order? If they are ordered, do you see the same behavior if you run the R version a second time?
   
   > Maybe ls -l --time=atime or strace? Maybe there is a way to create a mock filesystem in order to verify what operations R arrow is performing? Any advice is appreciated.
   
   IIRC, both Python and R are using the exact same C++-based filesystem machinery under the hood. There might be small misalignments in the options being passed (which we should investigate), but ultimately both are using [the same C++ filesystem interface](https://arrow.apache.org/docs/cpp/io.html).




[GitHub] [arrow] jonkeane commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "jonkeane (via GitHub)" <gi...@apache.org>.
jonkeane commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1730191756

   > If I open a regular terminal in VS Code and then start R it works correctly. It also works correctly from RStudio
   
   Whoa, that is very odd indeed. Thanks for the info + debugging. 
   
   I'm going to close this for now — but feel free to reopen or create a new issue if we want to pull on any of these other threads.




[GitHub] [arrow] jonkeane closed issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "jonkeane (via GitHub)" <gi...@apache.org>.
jonkeane closed issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files
URL: https://github.com/apache/arrow/issues/37816




[GitHub] [arrow] natbprice commented on issue #37816: [R] open_dataset(..., unify_schemas=FALSE) opens all files

Posted by "natbprice (via GitHub)" <gi...@apache.org>.
natbprice commented on issue #37816:
URL: https://github.com/apache/arrow/issues/37816#issuecomment-1729969679

   Sorry, after further testing this is actually working correctly! 
   
   It seems like there is something in VS Code (or RMarkdown in VS Code) that was causing the issue. Maybe the GUI is somehow triggering the read operation by how it tracks R data objects?
   
   However, it works from the command line and it even seems to work over WSL in Windows with RStudio.
   
   This trace also verifies that R arrow is only reading 1 file whether locally or with the blobfuse mount:
   
   `strace -o r-log.txt -e trace=openat,open,stat,getdents,read,readdir,mmap,close R -e "arrow::open_dataset('nyc-taxi-local')"`
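The `r-log.txt` file that this command writes can be tallied with a short script instead of being read line by line. This is a hedged Python sketch, assuming the `openat(...)` line format that strace produces for this command; the name `summarize` is illustrative, and the last line of the sample is made up to show what a data-file open would look like (it is not taken from the trace).

```python
import re

# Captures the path and the flag list of an openat() call in strace output.
OPENAT_RE = re.compile(r'^openat\(AT_FDCWD, "(?P<path>[^"]+)", (?P<flags>[A-Z_|]+)')


def summarize(trace_text: str):
    """Split openat() calls into directory opens and Parquet file opens."""
    dirs, parquet_files = [], []
    for line in trace_text.splitlines():
        match = OPENAT_RE.match(line.strip())
        if not match:
            continue
        path = match.group("path")
        flags = match.group("flags").split("|")
        if "O_DIRECTORY" in flags:
            dirs.append(path)  # directory listing, no data read
        elif path.endswith(".parquet"):
            parquet_files.append(path)  # an actual data file was opened
    return dirs, parquet_files


# First two openat lines follow the trace format; the third is hypothetical.
sample = """\
openat(AT_FDCWD, "/workingdir/nyc-taxi-local", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
close(4)                                = 0
openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
close(4)                                = 0
openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=6/part-0.parquet", O_RDONLY|O_CLOEXEC) = 4
"""

dirs, parquet_files = summarize(sample)
print(len(dirs), "directories listed;", len(parquet_files), "Parquet file(s) opened")
```

The raw trace output follows.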
   
   ```
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2015/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2014/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2020/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2022", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2022/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2022/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2016/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2019/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2011/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2018/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2013/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2021/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2010/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2012/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=6", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=12", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=4", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=3", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=7", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=5", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=1", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=2", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=11", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=10", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=9", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
   close(4)                                = 0
   openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=1/part-0.parquet", O_RDONLY) = 4
   mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f322a9fe000
   close(4)                                = 0
   mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f322a7fe000
   mmap(NULL, 4190208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f322a5ff000
   mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f3229bff000
   openat(AT_FDCWD, "/workingdir/R/x86_64-pc-linux-gnu-library/4.1/R6/R/R6.rdb", O_RDONLY) = 4
   read(4, "\350\246Y\307\2000P\2551gu\205]\37\223\21M\316{\230\234\262\214^\263ut\344\303\316\237x"..., 3238) = 3238
   read(4, "\0\0\0\306x\234\213\340b```f`adb`f\0052\31XCC\334t-\30\30\230\204"..., 28672) = 28672
   read(4, "\350\246Y\307\2000P\2551gu\205]\37\223\21M\316{\230\234\262\214^\263ut\344\303\316\237x"..., 4096) = 3238
   close(4)                                = 0
   read(3, "", 4096)                       = 0
   mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f322c3e7000
   close(3)                                = 0
   +++ exited with 0 +++
   ```
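
In the portion of the trace shown above, the `openat` calls on partition directories are followed by a single open of a `.parquet` file. For a longer trace, a small script can tally this instead of reading the log by eye. The following is only a sketch: the function name `summarize_opens` and the regular expression are illustrative assumptions, not part of arrow or strace, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed pattern for the strace lines shown above:
#   openat(AT_FDCWD, "<path>", <FLAGS>) = <fd>
OPENAT_RE = re.compile(r'openat\(AT_FDCWD, "([^"]+)", ([A-Z_|]+)')


def summarize_opens(strace_lines):
    """Count directory, parquet, and other openat() calls in strace output.

    Hypothetical helper for illustration; returns (Counter, list of parquet paths).
    """
    counts = Counter()
    parquet_files = []
    for line in strace_lines:
        match = OPENAT_RE.search(line)
        if match is None:
            continue  # not an openat() line (e.g. close, mmap, read)
        path, flags = match.groups()
        if "O_DIRECTORY" in flags.split("|"):
            counts["directories"] += 1  # directory listing, no file contents read
        elif path.endswith(".parquet"):
            counts["parquet"] += 1  # a data file was actually opened
            parquet_files.append(path)
        else:
            counts["other"] += 1
    return counts, parquet_files


# Sample lines taken from the trace above.
sample = [
    'openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2017/month=8", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4',
    'close(4)                                = 0',
    'openat(AT_FDCWD, "/workingdir/nyc-taxi-local/year=2009/month=1/part-0.parquet", O_RDONLY) = 4',
    'openat(AT_FDCWD, "/workingdir/R/x86_64-pc-linux-gnu-library/4.1/R6/R/R6.rdb", O_RDONLY) = 4',
]
counts, parquet_files = summarize_opens(sample)
print(counts["directories"], counts["parquet"], counts["other"])
print(parquet_files)
```

To use it on a full log, pass the lines of a saved strace output file to `summarize_opens`. A `parquet` count equal to the number of files in the dataset would indicate that every file was opened, while a count of 1 would match the single-file schema inspection described in this issue.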
   

