Posted to user@couchdb.apache.org by Chris Anderson <jc...@apache.org> on 2009/07/01 21:09:01 UTC

Re: chunked encoding problem ? - error messages from curl as well as lucene

On Tue, Jun 30, 2009 at 8:18 PM, Nitin Borwankar<ni...@borwankar.com> wrote:
> Hi Damien,
>
> Thanks for that tip.
>
> Turns out I had non-UTF-8 data
>
> adolfo.steiger-gar%E7%E3o:
>
> - not sure how it managed to get into the db.
>
> This is probably confusing the chunk termination.
>
> How did Couch let this data in?

Currently CouchDB doesn't validate json string contents on input, only
on output.
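
That's also why the bad bytes surface as a transfer error rather than
anything more helpful: the response is streamed with HTTP/1.1 chunked
transfer coding, where every chunk is framed as a hex byte count, CRLF,
payload, CRLF. If the advertised count and the bytes actually written
ever disagree - presumably what the invalid data provokes in the output
encoder - the client finds no CRLF where it expects one, which is
exactly what curl's "problem 2 in the chunky parser" and httpclient's
"CRLF expected at end of chunk" are complaining about. A minimal Python
sketch of the framing, purely for illustration:

  def chunk(payload: bytes) -> bytes:
      # HTTP/1.1 chunked framing: hex byte count, CRLF, payload, CRLF
      size = format(len(payload), "x").encode("ascii")
      return size + b"\r\n" + payload + b"\r\n"

  # A well-formed chunked body ends with a zero-length chunk:
  body = chunk(b'{"id":"adnanmoh","key":"adnanmoh"}') + b"0\r\n\r\n"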

Adding an option to block invalid unicode input would be a small
patch, but it might slow things down, since we'd have to spend more
time in the encoder while writing. Worth measuring, I suppose.
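
To make the failure concrete, here's a minimal Python sketch (not our
Erlang code path, just an illustration) showing that the id Nitin found
decodes to bytes that a strict input-side check would reject:

  from urllib.parse import unquote_to_bytes

  # %E7 and %E3 are Latin-1 bytes (c-cedilla, a-tilde); on their own
  # they do not form a valid UTF-8 sequence.
  raw = unquote_to_bytes("adolfo.steiger-gar%E7%E3o")

  try:
      raw.decode("utf-8")  # strict UTF-8 validation
  except UnicodeDecodeError:
      print("reject: not valid UTF-8")  # block the write here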

Is this something users are running into a lot? I've heard of this
once before; if lots of people are seeing it, it's definitely worth
fixing.

> I uploaded via Python httplib - not
> couchdb-python.  Is this a bug - the one that is fixed in 0.9.1?
>
> Nitin
>
> 37% of all statistics are made up on the spot
> -------------------------------------------------------------------------------------
> Nitin Borwankar
> nborwankar@gmail.com
>
>
> On Tue, Jun 30, 2009 at 8:58 AM, Damien Katz <da...@apache.org> wrote:
>
>> This might be the json encoding issue that Adam fixed.
>>
>> The 0.9.x branch, which is soon to be 0.9.1, fixes that issue. Try building
>> and installing from the branch and see if that fixes the problem:
>> svn co http://svn.apache.org/repos/asf/couchdb/branches/0.9.x/
>>
>> -Damien
>>
>>
>>
>> On Jun 30, 2009, at 12:15 AM, Nitin Borwankar wrote:
>>
>>> Oh and when I use Futon and try to browse the docs around where curl
>>> gives an error, when I hit the page containing the records around the
>>> error Futon just spins and doesn't render the page.
>>>
>>> Data corruption?
>>>
>>> Nitin
>>>
>>> 37% of all statistics are made up on the spot
>>>
>>> -------------------------------------------------------------------------------------
>>> Nitin Borwankar
>>> nborwankar@gmail.com
>>>
>>>
>>> On Mon, Jun 29, 2009 at 9:11 PM, Nitin Borwankar <nitin@borwankar.com> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I uploaded about 11K+ docs, 230MB or so of data in total, to a 0.9
>>>> instance on Ubuntu.
>>>> Db name is 'plist'
>>>>
>>>> curl http://localhost:5984/plist gives
>>>>
>>>>
>>>>
>>>> {"db_name":"plist","doc_count":11036,"doc_del_count":0,"update_seq":11036,"purge_seq":0,
>>>>
>>>>
>>>> "compact_running":false,"disk_size":243325178,"instance_start_time":"1246228896723181"}
>>>>
>>>> suggesting a non-corrupt db
>>>>
>>>> curl http://localhost:5984/plist/_all_docs gives
>>>>
>>>> {"id":"adnanmoh","key":"adnanmoh","value":{"rev":"1-663736558"}},
>>>>
>>>>
>>>> {"id":"adnen.chockri","key":"adnen.chockri","value":{"rev":"1-1209124545"}},
>>>> curl: (56) Received problem 2 in the chunky parser        <<--------- note curl error
>>>> {"id":"ado.adamu","key":"ado.adamu","value":{"rev":"1-4226951654"}}
>>>>
>>>> suggesting a chunked data transfer error
>>>>
>>>>
>>>> couchdb-lucene error message in couchdb.stderr reads
>>>>
>>>> [...]
>>>>
>>>> [couchdb-lucene] INFO Indexing plist from scratch.
>>>> [couchdb-lucene] ERROR Error updating index.
>>>> java.io.IOException: CRLF expected at end of chunk: 83/101
>>>>   at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
>>>>   at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
>>>>   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
>>>>   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
>>>>   at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
>>>>   at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
>>>>   at java.io.FilterInputStream.close(FilterInputStream.java:159)
>>>>   at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
>>>>   at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>>>>   at com.github.rnewson.couchdb.lucene.Database.execute(Database.java:141)
>>>>   at com.github.rnewson.couchdb.lucene.Database.get(Database.java:107)
>>>>   at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:82)
>>>>   at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:229)
>>>>   at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:178)
>>>>   at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:90)
>>>>   at java.lang.Thread.run(Thread.java:595)
>>>>
>>>>
>>>> suggesting a chunking problem again.
>>>>
>>>> Who is creating this problem - my data? CouchDB chunking?
>>>>
>>>> Help?
>>>>
>>>>
>>>>
>>>> 37% of all statistics are made up on the spot
>>>>
>>>>
>>>> -------------------------------------------------------------------------------------
>>>> Nitin Borwankar
>>>> nborwankar@gmail.com
>>>>
>>>>
>>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Chris Anderson <jc...@apache.org>.
On Wed, Jul 1, 2009 at 9:55 PM, Per Ejeklint<ej...@mac.com> wrote:
> I totally agree with Nitin. Check membership on entrance, not exit. :-)
>
> +1 for blocking of invalid unicode at input.
>
> /Per
>

To patch this you could just have the ?JSON_DECODE macro attempt a
round-trip encode: if the decoded term doesn't fail ?JSON_ENCODE, we
can let the save proceed. Patches welcome, and it shouldn't be too
hard.
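
In Python terms the round-trip idea is roughly this (a sketch only; the
real patch would be against the Erlang macros, and decode_strict is a
made-up name):

  import json

  def decode_strict(body: bytes):
      """Parse an incoming JSON body, refusing anything the encoder
      couldn't later reproduce (e.g. invalid UTF-8 in strings)."""
      doc = json.loads(body.decode("utf-8"))  # bad bytes die here in Python
      json.dumps(doc)  # the "attempt an encode" step from the Erlang idea
      return doc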

Cheers,
Chris




-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Adam Kocoloski <ko...@apache.org>.
On Jul 2, 2009, at 9:09 AM, Curt Arnold wrote:

>
> On Jul 1, 2009, at 2:55 PM, Per Ejeklint wrote:
>
>> I totally agree with Nitin. Check membership on entrance, not  
>> exit. :-)
>>
>> +1 for blocking of invalid unicode at input.
>>
>> /Per
>>
>>
>> On 1 Jul 2009, at 21:49, Nitin Borwankar wrote:
>>
>>> It seems backward ( I am probably missing something huge) to not  
>>> validate it on input. If you catch all the bad stuff going in  
>>> you're less likely (except when you're doing internal  
>>> transformations) to have bad stuff there in the first place and  
>>> you can save yourself validations on the way out.
>>>
>>> Nitin
>>
>
> The title probably should be changed, but
> https://issues.apache.org/jira/browse/COUCHDB-345 appears to be the
> same issue to me.

Agreed.

Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Curt Arnold <ca...@apache.org>.
On Jul 1, 2009, at 2:55 PM, Per Ejeklint wrote:

> I totally agree with Nitin. Check membership on entrance, not  
> exit. :-)
>
> +1 for blocking of invalid unicode at input.
>
> /Per
>
>
> On 1 Jul 2009, at 21:49, Nitin Borwankar wrote:
>
>> It seems backward ( I am probably missing something huge) to not  
>> validate it on input. If you catch all the bad stuff going in  
>> you're less likely (except when you're doing internal  
>> transformations) to have bad stuff there in the first place and you  
>> can save yourself validations on the way out.
>>
>> Nitin
>

The title probably should be changed, but
https://issues.apache.org/jira/browse/COUCHDB-345 appears to be the
same issue to me.

Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Per Ejeklint <ej...@mac.com>.
I totally agree with Nitin. Check membership on entrance, not exit. :-)

+1 for blocking of invalid unicode at input.

/Per


On 1 Jul 2009, at 21:49, Nitin Borwankar wrote:

> It seems backward ( I am probably missing something huge) to not  
> validate it on input. If you catch all the bad stuff going in you're  
> less likely (except when you're doing internal transformations) to  
> have bad stuff there in the first place and you can save yourself  
> validations on the way out.
>
> Nitin


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Nitin Borwankar <ni...@borwankar.com>.
Chris Anderson wrote:
> On Tue, Jun 30, 2009 at 8:18 PM, Nitin Borwankar<ni...@borwankar.com> wrote:
>> [...]
>> How did Couch let this data in?
>
> Currently CouchDB doesn't validate json string contents on input, only
> on output.
>
> Adding an option to block invalid unicode input would be a small
> patch, but perhaps slow things down as we'd have to spend more time in
> the encoder while writing. Worth measuring I suppose.
>
>   

To think about it from the user's point of view: data probably gets
read more often than written. So if you validate it when putting it
in, you save a whole bunch of unnecessary validations on reading.

It seems backward (I am probably missing something huge) to not
validate it on input. If you catch all the bad stuff going in, you're
less likely (except when you're doing internal transformations) to
have bad stuff there in the first place, and you can save yourself
validations on the way out.

Nitin

> [...]


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Damien Katz <da...@apache.org>.
How were you doing the uploads? Via individual PUTs, or by bulk
request(s)?

Because if via PUTs, it might be URL parsing that needs validation.
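
Roughly this, in Python terms (a sketch of the idea only;
doc_id_from_path is a made-up name, not the actual URL handler):

  from urllib.parse import unquote_to_bytes

  def doc_id_from_path(segment: str) -> str:
      """Percent-decode a PUT path segment and insist it's valid UTF-8."""
      raw = unquote_to_bytes(segment)
      # a UnicodeDecodeError here should become a 400 Bad Request
      return raw.decode("utf-8")

  # doc_id_from_path("adolfo.steiger-gar%E7%E3o") raises UnicodeDecodeError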

-Damien

On Jul 1, 2009, at 3:38 PM, Nitin Borwankar wrote:

> Chris Anderson wrote:
>> [...]
>> Currently CouchDB doesn't validate json string contents on input,  
>> only
>> on output.
>>
>
> That seems problematic & inconsistent - if you let it in you should  
> at least let it be read.
> In my case I uploaded a ton of stuff, saw no errors and then a huge  
> barf when doing .../_all_docs - with unhelpful error messages about  
> chunked encoding.
>
> Can I request more useful error messages when you detect an encoding  
> problem, if you decide you need to keep the current read/write behavior.
>
> Nitin
>
>> [...]


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Nitin Borwankar <ni...@borwankar.com>.
Chris Anderson wrote:
> [...]
> Currently CouchDB doesn't validate json string contents on input, only
> on output.
>   

That seems problematic & inconsistent - if you let it in you should at 
least let it be read.
In my case I uploaded a ton of stuff, saw no errors and then a huge barf 
when doing .../_all_docs - with unhelpful error messages about chunked 
encoding.

Can I request more useful error messages when you detect an encoding
problem, if you decide you need to keep the current read/write
behavior?

Nitin

> [...]