Posted to user@couchdb.apache.org by Mohammad Prabowo <ri...@gmail.com> on 2012/06/11 13:29:25 UTC

How to do Bulk-insert from Huge JSON File (460 MB)

Hi. I need to bulk-insert documents into my CouchDB database.
I'm trying to follow the manual here:
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API

Here is my code:

~$ DB="http://localhost:5984/employees"
~$ curl -H "Content-Type:application/json" -d @employees_selfContained.json
-vX POST $DB/_bulk_docs

The file employees_selfContained.json is huge: 465 MB. I've
validated it using JSONLint and found nothing wrong.
Here's curl's verbose output:

 curl -H "Content-Type:application/json" -d @employees_selfContained.json
-vX POST $DB/_bulk_docs
* About to connect() to 127.0.0.1 port 5984 (#0)
* Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> POST /employees/_bulk_docs HTTP/1.1
> User-Agent: curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k
zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:5984
> Accept: */*
> Content-Type:application/json
> Content-Length: 439203931
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* Empty reply from server
* Connection #0 to host 127.0.0.1 left intact
curl: (52) Empty reply from server
* Closing connection #0

How can I bulk-insert from that huge single file? I'd prefer not to split
the file into smaller pieces if possible.

Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by CGS <cg...@gmail.com>.
See the -F option (http://curl.haxx.se/docs/manpage.html). I haven't tried
it with CouchDB yet, but it usually does the trick.

CGS
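
For reference, curl's multipart form syntax looks like the sketch below. The database URL and filename are the ones from the original post, and the "file" field name is an arbitrary choice; note that _bulk_docs normally expects a raw JSON body, so CouchDB may well reject a multipart/form-data POST like this:

```shell
#!/bin/sh
# Sketch of a multipart form upload with curl's -F option.
# _bulk_docs expects a raw JSON body, so this is shown only to
# illustrate the -F syntax, not as a confirmed working upload.
DB="http://localhost:5984/employees"
URL="$DB/_bulk_docs"
echo "multipart POST to $URL"
# Only attempt the upload if the (465 MB) file is actually present.
if [ -f employees_selfContained.json ]; then
  curl -vX POST "$URL" \
       -F "file=@employees_selfContained.json;type=application/json"
fi
```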





Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by Mohammad Prabowo <ri...@gmail.com>.
Would you please tell me the curl syntax to upload the file as multipart?
I've been scratching my head and can't find it in curl's manual.


Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by CGS <cg...@gmail.com>.
You said your file is 465 MB, but cURL is sending only

> Content-Length: 439203931

which triggered a transfer handshake that never completed:

> Expect: 100-continue

Try either a multipart upload or splitting your JSON into smaller files
(two files should be enough).

CGS
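
A sketch of the second suggestion: once the payload has been split, each piece gets its own _bulk_docs POST. The filenames part1.json and part2.json are hypothetical, and each piece must itself be a complete {"docs": [...]} JSON object, not an arbitrary byte-level split of the original file:

```shell
#!/bin/sh
# POST each piece of the split payload separately. part1.json and
# part2.json are hypothetical names; each must be a valid
# {"docs": [...]} object on its own.
DB="http://localhost:5984/employees"
for f in part1.json part2.json; do
  echo "uploading $f to $DB/_bulk_docs"
  # Skip silently if the piece is missing (e.g. in a dry run).
  if [ -f "$f" ]; then
    curl -X POST "$DB/_bulk_docs" \
         -H "Content-Type: application/json" -T "$f"
  fi
done
```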





Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by Mohammad Prabowo <ri...@gmail.com>.
Couch.log shows nothing, as if the request never happened. I've tried
a smaller JSON file (4 KB) and it ran successfully.
I'm using CouchDB 1.2.


Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by Dave Cottlehuber <da...@muse.net.nz>.

Mohammad,

What do you see in the couch.log?

I'd be interested to hear if this same upload works against 1.1.1 vs 1.2.0.

Thanks
Dave

Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by Mohammad Prabowo <ri...@gmail.com>.
Still no result. I guess I have to split it into smaller documents.


Re: How to do Bulk-insert from Huge JSON File (460 MB)

Posted by Robert Newson <rn...@apache.org>.
-d will load the whole file into memory and also interpret it as ASCII, which might make it invalid.

Use -T <filename> instead.

B.
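
A sketch of the streaming upload Robert describes, using the URL and filename from the original post. Unlike -d, -T streams the file from disk without buffering it all in memory or rewriting the body; -X POST overrides the PUT method that -T would otherwise use:

```shell
#!/bin/sh
# Stream the JSON file from disk with -T instead of buffering it
# with -d. URL and filename are the ones from the original post.
DB="http://localhost:5984/employees"
URL="$DB/_bulk_docs"
echo "streaming upload to $URL"
# Only attempt the upload if the file is actually present.
if [ -f employees_selfContained.json ]; then
  curl -vX POST "$URL" \
       -H "Content-Type: application/json" \
       -T employees_selfContained.json
fi
```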
