Posted to notifications@libcloud.apache.org by GitBox <gi...@apache.org> on 2020/01/02 16:09:23 UTC

[GitHub] [libcloud] rvolykh opened a new issue #1399: Upload large file to Azure Blobs

URL: https://github.com/apache/libcloud/issues/1399
 
 
   ## Summary
   
   Could not upload a large file (100 GB) to Azure Blob Storage.
   
   ## Detailed Information
   
   libcloud version: 2.7.0
   The file `100.gb` was generated with:
   ```
   # Create a sparse ~100 GiB file by seeking to the final byte
   # and writing a single null byte.
   with open('100.gb', 'wb') as f:
       f.seek(100 * 1024 ** 3 - 1)  # 107374182400 - 1
       f.write(b'\0')
   ```
   Code snippet:
   ```
   from io import FileIO

   from libcloud.storage.providers import get_driver
   from libcloud.storage.types import Provider

   cls = get_driver(Provider.AZURE_BLOBS)
   driver = cls(key='STORAGE_ACCOUNT_NAME', secret='ACCESS_KEY')
   container = driver.get_container(container_name='CONTAINER_NAME')

   # This method blocks until all the parts have been uploaded.
   extra = {'content_type': 'application/octet-stream'}

   with FileIO('100.gb', 'rb') as iterator:
       obj = driver.upload_object_via_stream(iterator=iterator,
                                             container=container,
                                             object_name='libcloud/100.gb',
                                             extra=extra)
   ```
   Error:
   ```
   Traceback (most recent call last):
     File "/root/rvolykh/venv/lib/python3.5/site-packages/urllib3/connectionpool.py", line 672, in urlopen
       chunked=chunked,
     File "/root/rvolykh/venv/lib/python3.5/site-packages/urllib3/connectionpool.py", line 387, in _make_request
       conn.request(method, url, **httplib_request_kw)
     File "/usr/lib/python3.5/http/client.py", line 1122, in request
       self._send_request(method, url, body, headers)
     File "/usr/lib/python3.5/http/client.py", line 1167, in _send_request
       self.endheaders(body)
     File "/usr/lib/python3.5/http/client.py", line 1118, in endheaders
       self._send_output(message_body)
     File "/usr/lib/python3.5/http/client.py", line 946, in _send_output
       self.send(message_body)
     File "/usr/lib/python3.5/http/client.py", line 915, in send
       self.sock.sendall(datablock)
     File "/usr/lib/python3.5/ssl.py", line 891, in sendall
       v = self.send(data[count:])
     File "/usr/lib/python3.5/ssl.py", line 861, in send
       return self._sslobj.write(data)
     File "/usr/lib/python3.5/ssl.py", line 586, in write
       return self._sslobj.write(data)
   ConnectionResetError: [Errno 104] Connection reset by peer
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [libcloud] c-w closed issue #1399: Upload large file to Azure Blobs

URL: https://github.com/apache/libcloud/issues/1399
 
 
   


[GitHub] [libcloud] c-w commented on issue #1399: Upload large file to Azure Blobs

URL: https://github.com/apache/libcloud/issues/1399#issuecomment-570704353
 
 
   **TL;DR**
   The maximum file size currently supported by the Azure Storage driver is 256 MB. Uploading larger files will require a code change in libcloud.
   
   **Details**
   The Azure Storage driver's implementation of [upload_object_via_stream](https://github.com/apache/libcloud/blob/6dca82e649456b42d23f439854d3dc807c806abf/libcloud/storage/drivers/azure_blobs.py#L822-L841) delegates to [_put_object](https://github.com/apache/libcloud/blob/6dca82e649456b42d23f439854d3dc807c806abf/libcloud/storage/drivers/azure_blobs.py#L945-L951), which calls through to the generic [_upload_object](https://github.com/apache/libcloud/blob/6dca82e649456b42d23f439854d3dc807c806abf/libcloud/storage/base.py#L584-L592), which performs a single PUT request against the storage backend. Given that [we're using Azure Storage API version 2016-05-31](https://github.com/apache/libcloud/blob/6dca82e649456b42d23f439854d3dc807c806abf/libcloud/storage/drivers/azure_blobs.py#L180), the [Put Blob documentation](https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob#remarks) says that the maximum file size that can be uploaded in one Put Blob request is 256 MB.
   
   To support uploading files larger than 256 MB, the Azure Storage driver would therefore have to implement chunked blob upload via [Put Block](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block) and [Put Block List](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-list): each chunk is uploaded as a separate block, and a final request commits the list of blocks into a single blob. The Azure Storage driver used to implement this chunked upload flow (e.g. see [24f34c9](https://github.com/apache/libcloud/blob/24f34c99c9440523a53e940a346bced551281953/libcloud/storage/drivers/azure_blobs.py#L732-L788)), but since [6e0040d](https://github.com/apache/libcloud/commit/6e0040d8904cacb5dbe88309e9051be08cdc59f9) the driver no longer supports it.
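In outline, the Put Block / Put Block List flow looks roughly like this. This is a minimal sketch, not libcloud code: the 4 MiB block size, the block ID scheme, and the helper names are illustrative assumptions, and the HTTP requests themselves are omitted.

```python
import base64
from io import BytesIO

# Illustrative block size; real Put Block requests allow much larger blocks.
BLOCK_SIZE = 4 * 1024 * 1024

def make_block_id(index):
    # Block IDs must be base64-encoded, and all IDs within a blob must
    # have the same length, hence the zero-padded counter.
    return base64.b64encode(b"block-%010d" % index).decode("ascii")

def iter_blocks(stream, block_size=BLOCK_SIZE):
    # Yield (block_id, chunk) pairs; each chunk would be sent in its
    # own Put Block request.
    index = 0
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield make_block_id(index), chunk
        index += 1

def build_block_list_xml(block_ids):
    # Request body for the final Put Block List call that commits the
    # uploaded blocks into a single blob.
    latest = "".join("<Latest>%s</Latest>" % bid for bid in block_ids)
    return ("<?xml version='1.0' encoding='utf-8'?>"
            "<BlockList>%s</BlockList>" % latest)
```

Because each block is bounded in size, a 100 GB upload becomes many small requests instead of one oversized Put Blob, which is what the fix in libcloud would need to do.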
   
   I encountered this limitation in several other projects (e.g. https://github.com/ascoderu/opwen-cloudserver/issues/219) so I will try to find some time and work on a fix.
   
   **Work-around**
   If you need to upload large files to Azure Storage via libcloud right now, before the fix mentioned above is implemented, I would suggest trying the following: the [libcloud S3 driver currently implements chunked upload](https://github.com/apache/libcloud/blob/6dca82e649456b42d23f439854d3dc807c806abf/libcloud/storage/drivers/s3.py#L688-L694), so you could deploy [MinIO](https://github.com/minio/minio) as a [gateway for Azure Storage](https://docs.min.io/docs/minio-gateway-for-azure.html) and use the libcloud S3 driver to talk to the MinIO frontend, which in turn communicates efficiently with the Azure Storage backend. For MinIO [947bc8c](https://github.com/minio/minio/commit/947bc8c7d3b8ad98cdbb6ce0f8dea155df16aadf) and later, this approach should work for all types of cloud-based Azure Storage accounts (e.g. Storage, StorageV2, BlobStorage) as well as Azurite and Azure IoT Edge Storage. Once chunked blob upload is fixed in libcloud, you should be able to remove the MinIO indirection and switch to libcloud's Azure Storage driver with no additional code changes required.
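As a rough sketch of that work-around (the host, port, and credentials below are placeholders, and the exact driver arguments are an assumption; check the MinIO gateway docs linked above for the current invocation):

```shell
# Run MinIO as a gateway in front of an Azure Storage account.
# The access/secret keys double as the storage account name and access key.
export MINIO_ACCESS_KEY=STORAGE_ACCOUNT_NAME
export MINIO_SECRET_KEY=ACCESS_KEY
minio gateway azure

# libcloud would then talk to the gateway via the S3 driver, e.g.:
#   cls = get_driver(Provider.S3)
#   driver = cls(key='STORAGE_ACCOUNT_NAME', secret='ACCESS_KEY',
#                host='localhost', port=9000, secure=False)
```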


[GitHub] [libcloud] rvolykh commented on issue #1399: Upload large file to Azure Blobs

URL: https://github.com/apache/libcloud/issues/1399#issuecomment-571056206
 
 
   Hello @c-w,
   
   Thanks a lot!
   
   I had already tried the MinIO approach and am now looking for a more native solution. This is a long-term need, so I can wait for the fix. I can also help with testing #1400.
   


[GitHub] [libcloud] rvolykh commented on issue #1399: Upload large file to Azure Blobs

URL: https://github.com/apache/libcloud/issues/1399#issuecomment-571959734
 
 
   Hello @c-w, 
   Tested 100 GB file upload/download to Azure Blob Storage with the changes from #1400. Works fine!
