Posted to user@couchdb.apache.org by Adam Kocoloski <ko...@apache.org> on 2009/06/30 20:47:50 UTC

Re: chunked encoding problem ? - error messages from curl as well as lucene

Hi Nitin, the specific bug I fixed only affected Unicode characters
outside the Basic Multilingual Plane.  CouchDB would happily accept
those characters in raw UTF-8 format, and would serve them back to the
user escaped as UTF-16 surrogate pairs.  However, CouchDB would not
allow users to upload documents where the characters were already
escaped.  That's been fixed in 0.9.1.
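
For anyone following along, the two forms look like this (a quick
Python sketch; U+1D11E is just an arbitrary character outside the BMP,
not one from Nitin's data):

    import json

    s = u"\U0001D11E"              # MUSICAL SYMBOL G CLEF, outside the BMP
    print repr(s.encode("utf-8"))  # raw UTF-8: '\xf0\x9d\x84\x9e' (accepted)
    print json.dumps(s)            # escaped surrogate pair: "\ud834\udd1e"
                                   # (served back; rejected on upload pre-0.9.1)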

It looks like you've got a different problem.  It might be the case
that we are too permissive in what we accept as raw UTF-8 in the
upload.  I don't know.
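
One supporting detail for that reading: the "83/101" in the
couchdb-lucene trace below appears to be httpclient reporting the two
bytes it actually read where the CR/LF (13/10) chunk terminator should
have been; those are printable document bytes, not framing:

    # 83 and 101 are the byte values found at the chunk boundary:
    print chr(83), chr(101)    # S e - body text where the chunk should end

Best,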

Adam

On Jun 30, 2009, at 2:18 PM, Nitin Borwankar wrote:

> Hi Damien,
>
> Thanks for that tip.
>
> Turns out I had non-UTF-8 data
>
> adolfo.steiger-gar%E7%E3o:
>
> - not sure how it managed to get into the db.
>
> This is probably confusing the chunk termination.
>
> How did Couch let this data in?  I uploaded via Python httplib - not
> couchdb-python.  Is this a bug - the one that is fixed in 0.9.1?
>
> Nitin
>
> 37% of all statistics are made up on the spot
> -------------------------------------------------------------------------------------
> Nitin Borwankar
> nborwankar@gmail.com
>
>
> On Tue, Jun 30, 2009 at 8:58 AM, Damien Katz <da...@apache.org>  
> wrote:
>
>> This might be the json encoding issue that Adam fixed.
>>
>> The 0.9.x branch, which is soon to be 0.9.1, fixes that issue. Try
>> building and installing from the branch and see if that fixes the
>> problem:
>> svn co http://svn.apache.org/repos/asf/couchdb/branches/0.9.x/
>>
>> -Damien
>>
>>
>>
>> On Jun 30, 2009, at 12:15 AM, Nitin Borwankar wrote:
>>
>>> Oh and when I use Futon and try to browse the docs around where curl
>>> gives an error, when I hit the page containing the records around the
>>> error, Futon just spins and doesn't render the page.
>>>
>>> Data corruption?
>>>
>>> Nitin
>>>
>>> 37% of all statistics are made up on the spot
>>>
>>> -------------------------------------------------------------------------------------
>>> Nitin Borwankar
>>> nborwankar@gmail.com
>>>
>>>
>>> On Mon, Jun 29, 2009 at 9:11 PM, Nitin Borwankar <nitin@borwankar.com>
>>> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I uploaded about 11K+ docs, 230MB or so of data in total, to a 0.9
>>>> instance on Ubuntu.
>>>> Db name is 'plist'.
>>>>
>>>> curl http://localhost:5984/plist gives
>>>>
>>>>
>>>>
>>>> {"db_name":"plist","doc_count":11036,"doc_del_count": 
>>>> 0,"update_seq":11036,"purge_seq":0,
>>>>
>>>>
>>>> "compact_running":false,"disk_size": 
>>>> 243325178,"instance_start_time":"1246228896723181"}
>>>>
>>>> suggesting a non-corrupt db
>>>>
>>>> curl http://localhost:5984/plist/_all_docs gives
>>>>
>>>> {"id":"adnanmoh","key":"adnanmoh","value":{"rev":"1-663736558"}},
>>>>
>>>>
>>>> {"id":"adnen.chockri","key":"adnen.chockri","value": 
>>>> {"rev":"1-1209124545"}},
>>>> curl: (56) Received problem 2 in the chunky
>>>> parser                                          <<--------- note  
>>>> curl
>>>> error
>>>> {"id":"ado.adamu","key":"ado.adamu","value":{"rev":"1-4226951654"}}
>>>>
>>>> suggesting a chunked data transfer error
>>>>
>>>>
>>>> couchdb-lucene error message in couchdb.stderr reads
>>>>
>>>> [...]
>>>>
>>>> [couchdb-lucene] INFO Indexing plist from scratch.
>>>> [couchdb-lucene] ERROR Error updating index.
>>>> java.io.IOException: CRLF expected at end of chunk: 83/101
>>>>  at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
>>>>  at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
>>>>  at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
>>>>  at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
>>>>  at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
>>>>  at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
>>>>  at java.io.FilterInputStream.close(FilterInputStream.java:159)
>>>>  at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
>>>>  at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>>>>  at com.github.rnewson.couchdb.lucene.Database.execute(Database.java:141)
>>>>  at com.github.rnewson.couchdb.lucene.Database.get(Database.java:107)
>>>>  at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:82)
>>>>  at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:229)
>>>>  at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:178)
>>>>  at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:90)
>>>>  at java.lang.Thread.run(Thread.java:595)
>>>>
>>>>
>>>> suggesting a chunking problem again.
>>>>
>>>> Who is creating this problem - my data?  CouchDB's chunking?
>>>>
>>>> Help?
>>>>
>>>>
>>>>
>>>> 37% of all statistics are made up on the spot
>>>>
>>>>
>>>> -------------------------------------------------------------------------------------
>>>> Nitin Borwankar
>>>> nborwankar@gmail.com
>>>>
>>>>
>>


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Nitin Borwankar <ni...@borwankar.com>.

Hi Damien,

There's 11,000+ docs in there and I am not sure there aren't other such
cases.
So I am going to clean the data and re-create the database, rather than
pick through them one at a time.

[[ For the list members: the iconv utility on Unix is very helpful for
this; a Python equivalent is sketched below ]]
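
Roughly, in Python (assuming the stray bytes really are Latin-1; %E7
and %E3 decode there as c-cedilla and a-tilde):

    raw = 'adolfo.steiger-gar\xe7\xe3o'
    try:
        raw.decode('utf-8')              # fails: not valid UTF-8
    except UnicodeDecodeError:
        raw = raw.decode('latin-1').encode('utf-8')
    print repr(raw)                      # 'adolfo.steiger-gar\xc3\xa7\xc3\xa3o'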

I will create a bug report.  I have some deadlines around midday, so
maybe late in the day or tomorrow.

Nitin

Damien Katz wrote:
> Nitin, I would try to purge the bad document, using the _purge api 
> (deleting the document can still cause problems as we'll keep around a 
> deletion stub with the bad id), then things should be fixed. But 
> you'll have to know the rev id of the document to use it, which might 
> be hard to get via http.
>
> Purge:
>
> POST /db/_purge
> {"thedocid": "therevid"}
>
> Unless somehow the file got corrupted, this is definitely a CouchDB 
> bug; we shouldn't accept a string we can't later return to the caller. 
> Can you create a bug report? Adding a failing test case would be 
> best, but attaching the bad string will also do.
>
> -Damien


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Damien Katz <da...@apache.org>.
There is the purge test case:
https://svn.apache.org/repos/asf/couchdb/trunk/share/www/script/test/purge.js

Purge removes a document completely from the database, whereas delete
puts the document into a "deleted" state.  The reason is replication:
to replicate a document deletion you need a record of it.  But when you
purge a document, the document metadata is removed too, so it's not
possible to replicate a document purge.
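
Concretely (a sketch with the Python 2 httplib Nitin used for the
uploads; the doc and rev ids are placeholders):

    import httplib
    conn = httplib.HTTPConnection("localhost", 5984)

    # DELETE leaves a "deleted" stub under the doc id; the stub still
    # flows through _all_docs_by_seq (which couchdb-lucene reads), so a
    # bad doc id would keep causing trouble:
    conn.request("DELETE", "/plist/thedocid?rev=therevid")
    print conn.getresponse().read()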

-Damien

On Jul 1, 2009, at 12:25 PM, Zachary Zolton wrote:

> LOL! Yet another URL handler I have never heard of!?
>
> (Not listed in the httpd_global_handlers section of the config,  
> either...)
>
> So, what's the semantic difference between _purge and DELETE of a  
> document?


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Zachary Zolton <za...@gmail.com>.
LOL! Yet another URL handler I have never heard of!?

(Not listed in the httpd_global_handlers section of the config, either...)

So, what's the semantic difference between _purge and DELETE of a document?


Re: chunked encoding problem ? - error messages from curl as well as lucene

Posted by Damien Katz <da...@apache.org>.
Nitin, I would try to purge the bad document, using the _purge api  
(deleting the document can still cause problems as we'll keep around a  
deletion stub with the bad id), then things should be fixed. But  
you'll have to know the rev id of the document to use it, which might  
be hard to get via http.

Purge:

POST /db/_purge
{"thedocid": "therevid"}

Unless somehow the file got corrupted, this is definitely a CouchDB
bug; we shouldn't accept a string we can't later return to the caller.
Can you create a bug report? Adding a failing test case would be
best, but attaching the bad string will also do.

-Damien

