Posted to dev@couchdb.apache.org by Damien Katz <da...@apache.org> on 2009/01/16 01:07:35 UTC

streaming attachments writes

I checked in streaming attachment writes for attachment uploads (i.e.
PUT /db/docid/attachment.txt ...). Files are no longer buffered in
memory by CouchDB, making it possible to upload huge attachments.

Unfortunately, we don't yet stream attachments during replication, and
you can't update an attachment and the document JSON in a single
request yet, so this is of limited use for now. That will require
sending documents as HTTP multipart, and support for that is on the
to-do list.
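[Editor's note: a streaming PUT like the one described can be sketched in
Python; this is a minimal illustration, not CouchDB's own code. The host,
database, and document names are made-up examples, and error handling is
omitted.]

```python
# Sketch of a streaming attachment PUT. Passing the open file object as
# the request body lets http.client send it in small reads rather than
# loading it into memory, which is what makes huge uploads practical.
import http.client
import os

def put_attachment(host, port, db, docid, name, path,
                   content_type="application/octet-stream"):
    """PUT the file at `path` as an attachment, streaming it from disk."""
    length = os.path.getsize(path)
    conn = http.client.HTTPConnection(host, port)
    with open(path, "rb") as f:
        conn.request(
            "PUT",
            "/%s/%s/%s" % (db, docid, name),
            body=f,  # streamed in blocks, never fully buffered
            headers={"Content-Type": content_type,
                     "Content-Length": str(length)},
        )
    resp = conn.getresponse()
    data = resp.read()
    conn.close()
    return resp.status, data
```

Because the Content-Length header is set explicitly, the client sends a
plain (non-chunked) body, which is what the new code path expects.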

-Damien

Re: streaming attachments writes

Posted by Benoit Chesneau <bc...@gmail.com>.
On Sun, Jan 18, 2009 at 9:14 PM, Patrick Antivackis
<patrick.antivackis@gmail.com> wrote:

> And what do you use to upload files bigger than 4 GB?
>
>
A custom script in Python; here is the link to it (posted in a previous
mail, I think):

http://friendpaste.com/28kxlg0FrjbOSTyDLFLbZb


It just sends the file while reading it. A better implementation would
use sendfile on Linux to do it.
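[Editor's note: the sendfile idea can be sketched like this in Python.
It is a hedged illustration, not the script linked above: os.sendfile is
Linux-specific here, and the helper name is invented.]

```python
# os.sendfile copies bytes from a file descriptor to a socket inside the
# kernel, so the file body never passes through userland buffers at all.
import os
import socket

def send_file_over_socket(sock, path):
    """Send the whole file at `path` over `sock` using os.sendfile."""
    size = os.path.getsize(path)
    sent_total = 0
    with open(path, "rb") as f:
        while sent_total < size:
            sent = os.sendfile(sock.fileno(), f.fileno(),
                               sent_total, size - sent_total)
            if sent == 0:  # peer closed the connection
                break
            sent_total += sent
    return sent_total
```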


- benoît

Re: streaming attachments writes

Posted by Patrick Antivackis <pa...@gmail.com>.
And what do you use to upload files bigger than 4 GB?


2009/1/17 Benoit Chesneau <bc...@gmail.com>

> On Fri, Jan 16, 2009 at 10:54 PM, Patrick Antivackis
> <pa...@gmail.com> wrote:
> > Damien,
> > From what I saw in previous tests, if you don't use the chunked
> > method, then you get an error caught by MochiWeb (so no error in
> > CouchDB). It occurs in mochiweb_request at line 138, in the
> > gen_tcp:recv call:
> >
> > recv(Length, Timeout) ->
> >    case gen_tcp:recv(Socket, Length, Timeout) of
> >        {ok, Data} ->
> >            put(?SAVE_RECV, true),
> >            Data;
> >        _ ->
> >            exit(normal)
> >    end.
> >
> >
> > The answer caught by the catch-all case clause is in fact {error, enomem}.
> >
> > There is no error in CouchDB, as the exit is "normal"!
> >
> > From what I read (see the answers at
> > http://www.google.fr/search?q=16M+gen_tcp%3Arecv), if Length is too big
> > (some say 16 MB; for me it's 64 MB), gen_tcp:recv returns an enomem error.
> >
> >
> >
>
> Well, I can upload files bigger than 4 GB here without the chunked
> method. I only used it because of curl. Though chunked encoding would
> allow better control of what is uploaded with such big uploads, I guess.
>
>
> - benoît
>

Re: streaming attachments writes

Posted by Benoit Chesneau <bc...@gmail.com>.
On Fri, Jan 16, 2009 at 10:54 PM, Patrick Antivackis
<pa...@gmail.com> wrote:
> Damien,
> From what I saw in previous tests, if you don't use the chunked method,
> then you get an error caught by MochiWeb (so no error in CouchDB).
> It occurs in mochiweb_request at line 138, in the gen_tcp:recv call:
>
> recv(Length, Timeout) ->
>    case gen_tcp:recv(Socket, Length, Timeout) of
>        {ok, Data} ->
>            put(?SAVE_RECV, true),
>            Data;
>        _ ->
>            exit(normal)
>    end.
>
>
> The answer caught by the catch-all case clause is in fact {error, enomem}.
>
> There is no error in CouchDB, as the exit is "normal"!
>
> From what I read (see the answers at
> http://www.google.fr/search?q=16M+gen_tcp%3Arecv), if Length is too big
> (some say 16 MB; for me it's 64 MB), gen_tcp:recv returns an enomem error.
>
>
>

Well, I can upload files bigger than 4 GB here without the chunked
method. I only used it because of curl. Though chunked encoding would
allow better control of what is uploaded with such big uploads, I guess.


- benoît

Re: streaming attachments writes

Posted by Patrick Antivackis <pa...@gmail.com>.
Damien,
From what I saw in previous tests, if you don't use the chunked method,
you get an error caught by MochiWeb (so no error in CouchDB).
It occurs in mochiweb_request at line 138, in the gen_tcp:recv call:

recv(Length, Timeout) ->
    case gen_tcp:recv(Socket, Length, Timeout) of
        {ok, Data} ->
            put(?SAVE_RECV, true),
            Data;
        _ ->
            exit(normal)
    end.


The answer caught by the catch-all case clause is in fact {error, enomem}.

There is no error in CouchDB, as the exit is "normal"!

From what I read (see the answers at
http://www.google.fr/search?q=16M+gen_tcp%3Arecv), if Length is too big
(some say 16 MB; for me it's 64 MB), gen_tcp:recv returns an enomem error.
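[Editor's note: the workaround this failure suggests can be illustrated
in Python; this is not MochiWeb's actual fix. Rather than asking the
transport for the whole Content-Length in one recv, which can fail with
enomem for very large values, read the body in bounded chunks.]

```python
# Plan and perform bounded-size reads of a body of known total length.
# The 4 MB chunk size is an illustrative choice, safely below the sizes
# (16 MB / 64 MB) reported to fail in a single gen_tcp:recv.
RECV_CHUNK = 4 * 1024 * 1024

def chunk_sizes(total, chunk=RECV_CHUNK):
    """Return the sequence of read sizes covering `total` bytes."""
    sizes = []
    remaining = total
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes

def read_body(stream, total, chunk=RECV_CHUNK):
    """Read exactly `total` bytes from a file-like `stream` in chunks.

    (A real socket would need an inner loop, since recv may return
    fewer bytes than requested.)
    """
    parts = []
    for size in chunk_sizes(total, chunk):
        data = stream.read(size)
        if not data:
            raise EOFError("connection closed before full body arrived")
        parts.append(data)
    return b"".join(parts)
```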



2009/1/16 Damien Katz <da...@apache.org>

>
> On Jan 16, 2009, at 8:37 AM, Benoit Chesneau wrote:
>
>  On Fri, Jan 16, 2009 at 1:55 PM, Damien Katz <da...@apache.org> wrote:
>>
>>> Chunked isn't allowed right now. Why are you sending a file chunked?
>>>
>>> -Damien
>>>
>>
>> Chunked seems to be the only method to send big files with curl on
>> the command line, since it forces it to split its reads. I don't
>> really need chunked encoding for now.
>>
>
> Chunked is only necessary when you don't know the length ahead of time;
> streaming a file of known length up to a web server shouldn't require any
> buffering by the HTTP client. Just don't send it chunked, and it should
> work and use very little memory.
>
> -Damien
>
>
>
>>
>> This script allowed me to send a 4 GB file to Couch without using too
>> much memory:
>> http://friendpaste.com/28kxlg0FrjbOSTyDLFLbZb
>>
>> - benoît
>>
>
>

Re: streaming attachments writes

Posted by Damien Katz <da...@apache.org>.
On Jan 16, 2009, at 8:37 AM, Benoit Chesneau wrote:

> On Fri, Jan 16, 2009 at 1:55 PM, Damien Katz <da...@apache.org>  
> wrote:
>> Chunked isn't allowed right now. Why are you sending a file chunked?
>>
>> -Damien
>
> Chunked seems to be the only method to send big files with curl on the
> command line, since it forces it to split its reads. I don't really need
> chunked encoding for now.

Chunked is only necessary when you don't know the length ahead of
time; streaming a file of known length up to a web server shouldn't
require any buffering by the HTTP client. Just don't send it chunked,
and it should work and use very little memory.
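[Editor's note: for anyone curious what "sending it chunked" puts on the
wire, the HTTP/1.1 chunked framing can be sketched as follows. This is
an illustrative helper, not something curl or CouchDB exposes.]

```python
def chunk_encode(chunks):
    """Encode an iterable of byte strings as an HTTP/1.1 chunked body.

    Each chunk is framed as: hex size, CRLF, data, CRLF. A zero-size
    chunk terminates the body. This framing is why a server must parse
    the stream incrementally instead of doing one fixed-length recv.
    """
    out = []
    for c in chunks:
        if c:  # zero-length chunks would prematurely end the body
            out.append(b"%x\r\n%s\r\n" % (len(c), c))
    out.append(b"0\r\n\r\n")
    return b"".join(out)
```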

-Damien

>
>
> This script allowed me to send a 4 GB file to Couch without using too
> much memory:
> http://friendpaste.com/28kxlg0FrjbOSTyDLFLbZb
>
> - benoît


Re: streaming attachments writes

Posted by Benoit Chesneau <bc...@gmail.com>.
On Fri, Jan 16, 2009 at 1:55 PM, Damien Katz <da...@apache.org> wrote:
> Chunked isn't allowed right now. Why are you sending a file chunked?
>
> -Damien

Chunked seems to be the only method to send big files with curl on the
command line, since it forces it to split its reads. I don't really need
chunked encoding for now.

This script allowed me to send a 4 GB file to Couch without using too
much memory:
http://friendpaste.com/28kxlg0FrjbOSTyDLFLbZb

- benoît

Re: streaming attachments writes

Posted by Damien Katz <da...@apache.org>.
Chunked isn't allowed right now. Why are you sending a file chunked?

-Damien

On Jan 16, 2009, at 5:27 AM, Benoit Chesneau wrote:

> On Fri, Jan 16, 2009 at 11:15 AM, Benoit Chesneau  
> <bc...@gmail.com> wrote:
>> On Fri, Jan 16, 2009 at 1:07 AM, Damien Katz <da...@apache.org>  
>> wrote:
>>> I checked in streaming attachment writes for attachment uploads
>>> (i.e. PUT /db/docid/attachment.txt ...). Files are no longer
>>> buffered in memory by CouchDB, making it possible to upload huge
>>> attachments.
>
>
>
>> Thanks for this update :) I've just tested this morning, on a machine
>> with 2 GB of RAM; it didn't work with curl in chunked transfer
>> encoding. It doesn't work in chunked mode with one script I have
>> either. However, it works with this script based on py-restclient
>> (attached) in normal mode.
>>
>> Thanks for this progress :)
>>
>> - benoit
>>
>
> Hmm, I forgot some details indeed (spotted by Jan on IRC). So here is
> the error from curl:
>
> benoitc@pollen:~$ curl -T test.mp3 --header "Accept: application/json"
> --header "Content-Type: application/octet-stream" --header
> "Transfer-Encoding: chunked" --header "Content-Length: 671088640"
> --header "Expect:" --header "Connection: keep-alive" --header
> "Keep-Alive: 300" -v
> http://127.0.0.1:5984/test/blah/test640?rev=2737377012
> * About to connect() to 127.0.0.1 port 5984 (#0)
> *   Trying 127.0.0.1... connected
> * Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
>> PUT /test/blah/test640?rev=2737377012 HTTP/1.1
>> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2  
>> OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>> Host: 127.0.0.1:5984
>> Accept: application/json
>> Content-Type: application/octet-stream
>> Transfer-Encoding: chunked
>> Content-Length: 671088640
>> Connection: keep-alive
>> Keep-Alive: 300
>>
> * Empty reply from server
> * Connection #0 to host 127.0.0.1 left intact
> curl: (52) Empty reply from server
> * Closing connection #0
>
> No error from CouchDB.
>
>
>
> Error in the Python client:
> benoitc@pollen:~$ python test.py
>
> restclient.rest.RequestError: (28, 'Operation timed out after 20000
> milliseconds with 0 bytes received')
>
>
> Seems like CouchDB doesn't send any answer to the client here.
>
> - benoît


Re: streaming attachments writes

Posted by Benoit Chesneau <bc...@gmail.com>.
On Fri, Jan 16, 2009 at 11:15 AM, Benoit Chesneau <bc...@gmail.com> wrote:
> On Fri, Jan 16, 2009 at 1:07 AM, Damien Katz <da...@apache.org> wrote:
>> I checked in streaming attachment writes for attachment uploads (i.e. PUT
>> /db/docid/attachment.txt ...). Files are no longer buffered in memory by
>> CouchDB, making it possible to upload huge attachments.



> Thanks for this update :) I've just tested this morning, on a machine
> with 2 GB of RAM; it didn't work with curl in chunked transfer
> encoding. It doesn't work in chunked mode with one script I have
> either. However, it works with this script based on py-restclient
> (attached) in normal mode.
>
> Thanks for this progress :)
>
> - benoit
>

Hmm, I forgot some details indeed (spotted by Jan on IRC). So here is
the error from curl:

benoitc@pollen:~$ curl -T test.mp3 --header "Accept: application/json"
--header "Content-Type: application/octet-stream" --header
"Transfer-Encoding: chunked" --header "Content-Length: 671088640"
--header "Expect:" --header "Connection: keep-alive" --header
"Keep-Alive: 300" -v
http://127.0.0.1:5984/test/blah/test640?rev=2737377012
* About to connect() to 127.0.0.1 port 5984 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> PUT /test/blah/test640?rev=2737377012 HTTP/1.1
> User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> Host: 127.0.0.1:5984
> Accept: application/json
> Content-Type: application/octet-stream
> Transfer-Encoding: chunked
> Content-Length: 671088640
> Connection: keep-alive
> Keep-Alive: 300
>
* Empty reply from server
* Connection #0 to host 127.0.0.1 left intact
curl: (52) Empty reply from server
* Closing connection #0

No error from CouchDB.



Error in the Python client:
benoitc@pollen:~$ python test.py

restclient.rest.RequestError: (28, 'Operation timed out after 20000
milliseconds with 0 bytes received')


Seems like CouchDB doesn't send any answer to the client here.

- benoît

Re: streaming attachments writes

Posted by Benoit Chesneau <bc...@gmail.com>.
On Fri, Jan 16, 2009 at 1:07 AM, Damien Katz <da...@apache.org> wrote:
> I checked in streaming attachment writes for attachment uploads (i.e. PUT
> /db/docid/attachment.txt ...). Files are no longer buffered in memory by
> CouchDB, making it possible to upload huge attachments.
>
> Unfortunately, we don't yet stream attachments during replication, and you
> can't update an attachment and the document JSON in a single request yet, so
> this is of limited use for now. That will require sending documents as HTTP
> multipart, and support for that is on the to-do list.
>
> -Damien
>

Thanks for this update :) I've just tested this morning, on a machine
with 2 GB of RAM; it didn't work with curl in chunked transfer
encoding. It doesn't work in chunked mode with one script I have
either. However, it works with this script based on py-restclient
(attached) in normal mode.

Thanks for this progress :)

- benoit