Posted to notifications@libcloud.apache.org by GitBox <gi...@apache.org> on 2019/10/04 11:50:24 UTC

[GitHub] [libcloud] pquentin opened a new pull request #1353: Reuse TCP connections when uploading files

URL: https://github.com/apache/libcloud/pull/1353
 
 
   ## Reuse TCP connections when uploading files
   
   ### Description
   
   It's easy to break connection reuse when using the requests API: just use `stream=True` and never read the response. The connection used to make the request is then never returned to urllib3's connection pool, so it cannot be reused, and it will be dropped when the pool is full.
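
The behaviour described above can be reproduced locally. The following is a minimal sketch, not code from this pull request: it assumes `requests` is installed and uses a throwaway `http.server` instance on 127.0.0.1. The names `CountingHandler` and `count_connections` are hypothetical helpers introduced only for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # keep-alive, so reuse is possible
    connections = 0

    def setup(self):
        # One handler instance is created per accepted TCP connection.
        super().setup()
        CountingHandler.connections += 1

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"ok"  # non-empty, so there is something left to read
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


def count_connections(stream, read_body, uploads=5):
    """Return how many TCP connections the server saw for `uploads` PUTs."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/object" % server.server_address[1]
    responses = []
    try:
        with requests.Session() as session:
            session.trust_env = False  # ignore proxy settings from the environment
            for _ in range(uploads):
                response = session.put(url, data=b"payload", stream=stream)
                if read_body:
                    response.content  # consuming the body releases the connection
                responses.append(response)  # keep unread responses alive
            for response in responses:
                response.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("stream=False:", count_connections(stream=False, read_body=False))
    print("stream=True, unread:", count_connections(stream=True, read_body=False))
    print("stream=True, read:", count_connections(stream=True, read_body=True))
```

With a single `Session`, the unread `stream=True` case should open one connection per upload, while the other two cases should reuse a single connection.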
   
   It turns out uploading objects using the S3 API goes through `prepared_request`, which incorrectly sets `stream` to the value of `raw`, which is `True` in our case. And since we don't read the response data, the connections are never reused, and each upload requires its own connection.
   
   This is particularly wasteful when uploading many small objects, which can easily happen with JSON or Parquet files generated by Apache Spark: setting up a connection then takes significant time compared to uploading a few bytes.
   
   Setting `stream=stream` in the `prepared_request` method matches the code in the `request` method and fixes the bug.
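
The shape of the change can be sketched as follows. This is an illustration under stated assumptions, not the actual libcloud source: `UploadConnection` is a hypothetical stand-in class, and only the `prepared_request` name and the `raw` and `stream` parameters come from the description above.

```python
import requests


class UploadConnection:
    """Hypothetical stand-in for a connection wrapper; not libcloud's class."""

    def __init__(self, host, session=None):
        self.host = host
        self.session = requests.Session() if session is None else session
        self.response = None

    def prepared_request(self, method, url, body=None, headers=None,
                         raw=False, stream=False):
        # `raw` is kept for signature parity and is unused in this sketch.
        request = requests.Request(method, self.host + url, data=body,
                                   headers=headers)
        prepared = self.session.prepare_request(request)
        # The buggy variant passed `stream=raw` here, so a raw upload always
        # streamed the response. Passing `stream=stream` lets requests read
        # the response body and return the connection to the pool.
        self.response = self.session.send(prepared, stream=stream)
        return self.response
```

With this shape, a caller that passes `raw=True` and leaves `stream` at its default gets a fully read response, so the underlying connection becomes available for the next upload.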
   
   ### Status
   
   - work in progress
   
   ### Checklist (tick everything that applies)
   
   - [x] [Code linting](http://libcloud.readthedocs.org/en/latest/development.html#code-style-guide) (required, can be done after the PR checks)
   - [x] Documentation
   - [x] [Tests](http://libcloud.readthedocs.org/en/latest/testing.html)
   - [x] [ICLA](http://libcloud.readthedocs.org/en/latest/development.html#contributing-bigger-changes) (required for bigger changes)
   
   cc @Kami @tonybaloney 
