Posted to github@arrow.apache.org by "coryan (via GitHub)" <gi...@apache.org> on 2023/02/06 15:12:45 UTC

[GitHub] [arrow] coryan opened a new pull request, #34052: GH-34051: [C++] GcsFileSystem lazily starts sequential reads

coryan opened a new pull request, #34052:
URL: https://github.com/apache/arrow/pull/34052

   `OpenInputFile()` returns an `io::RandomAccessFile`, which supports both sequential reads and random-access reads. The previous implementation eagerly started a sequential read, but many applications never use that part of the API. Because GCS has fairly high latency, the eager start can slow down applications that only read data through `ReadAt()`, including applications that read Parquet files via PyArrow.
   
   Fixes #34051 
   
   ### What changes are included in this PR?
   
   Change the GcsFileSystem class to lazily start the download used to implement the `io::InputFile` APIs.
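
To illustrate the idea, here is a minimal sketch of the lazy-start pattern. It is not the code in this PR: `FakeStream` and `LazyFile` are hypothetical stand-ins, an in-memory string plays the role of the GCS object, and `streams_opened()` exists only so the behavior can be observed. The point is that `ReadAt()` never opens the sequential stream, while `Read()` opens it on first use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a sequential download stream.
// This is not an Arrow or google-cloud-cpp type.
class FakeStream {
 public:
  explicit FakeStream(const std::string& data) : data_(data) {}

  // Return up to `n` bytes and advance the stream position.
  std::string Read(std::size_t n) {
    std::string out = data_.substr(pos_, n);
    pos_ += out.size();
    return out;
  }

 private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Hypothetical file object; `contents_` simulates the remote object.
class LazyFile {
 public:
  explicit LazyFile(std::string contents) : contents_(std::move(contents)) {}

  // Random-access read: modeled as an independent ranged request,
  // so it never touches the sequential stream.
  std::string ReadAt(std::size_t offset, std::size_t n) const {
    assert(offset <= contents_.size());
    return contents_.substr(offset, n);
  }

  // Sequential read: the stream is created on the first call only.
  std::string Read(std::size_t n) {
    if (!stream_) {
      stream_.reset(new FakeStream(contents_));
      ++streams_opened_;
    }
    return stream_->Read(n);
  }

  // Number of sequential streams started so far (for illustration only).
  int streams_opened() const { return streams_opened_; }

 private:
  std::string contents_;
  std::unique_ptr<FakeStream> stream_;  // empty until Read() is first called
  int streams_opened_ = 0;
};
```

With this shape, a caller that only issues `ReadAt()` calls never pays for starting the sequential download, and a caller that uses `Read()` pays for it once, on the first call.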
   
   ### Are these changes tested?
   
   I think so: the existing tests cover the affected functions.
   
   ### Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34052: GH-34051: [C++] GcsFileSystem lazily starts sequential reads

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34052:
URL: https://github.com/apache/arrow/pull/34052#issuecomment-1419242260

   :warning: GitHub issue #34051 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] kou merged pull request #34052: GH-34051: [C++] GcsFileSystem lazily starts sequential reads

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #34052:
URL: https://github.com/apache/arrow/pull/34052




[GitHub] [arrow] coryan commented on pull request #34052: GH-34051: [C++] GcsFileSystem lazily starts sequential reads

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on PR #34052:
URL: https://github.com/apache/arrow/pull/34052#issuecomment-1419684354

   The failure in [Python / AMD64 Conda Python 3.9 Sphinx & Numpydoc](https://github.com/apache/arrow/actions/runs/4106241652/jobs/7084210511) seems unrelated, or at least I cannot figure out how it relates to the changes in this PR. If the failure was indeed caused by this PR, I would appreciate a hint in the right direction.
   




[GitHub] [arrow] github-actions[bot] commented on pull request #34052: GH-34051: [C++] GcsFileSystem lazily starts sequential reads

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34052:
URL: https://github.com/apache/arrow/pull/34052#issuecomment-1419242164

   * Closes: #34051




[GitHub] [arrow] ursabot commented on pull request #34052: GH-34051: [C++] GcsFileSystem lazily starts sequential reads

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34052:
URL: https://github.com/apache/arrow/pull/34052#issuecomment-1420923212

   Benchmark runs are scheduled for baseline = 7423f0332cb11eb780f421c07bac71f87bf44a03 and contender = 771c37aab8757287b3fa9cfe1bfb87992126ee08. 771c37aab8757287b3fa9cfe1bfb87992126ee08 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1fa959e4d4b14267a05fc085417464bd...35b6a58d65a04a3e993f902c6c869c0b/)
   [Failed :arrow_down:0.67% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/500a58e0187d433cb34558a9702fec5d...2bc85982da154871b5dacee5aca273b4/)
   [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8fb37e4c79ad4829a67e9b0cdc521e39...5d5ea7de210e4a6fab2a53c54ed17fa5/)
   [Finished :arrow_down:1.65% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/b82a0da8dee741228f8fbf19232054dd...d347b556fd884492b2f3fee383a51589/)
   Buildkite builds:
   [Finished] [`771c37aa` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2338)
   [Failed] [`771c37aa` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2366)
   [Finished] [`771c37aa` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2336)
   [Finished] [`771c37aa` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2357)
   [Finished] [`7423f033` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2337)
   [Failed] [`7423f033` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2365)
   [Finished] [`7423f033` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2335)
   [Finished] [`7423f033` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2356)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

