Posted to user@couchdb.apache.org by Darrell Huang <ji...@memego.com> on 2009/05/25 06:07:55 UTC

I need to know how to parse attachment data

Hi, everyone, I'm having some trouble reading attachment data from 
documents with the HTTP API. When I download an attachment with the API, 
some extra data appears before and after the attachment data, and that data 
differs from one download to the next.
For example, when I downloaded a JPEG attachment several weeks ago, the 
character sequence "dbc\r\n" preceded the actual file data, but when I did 
the same today, it was "a24\r\n".
I don't know what this extra data means. Can someone tell me how to parse 
the packet content and extract the actual attachment data from it? 


Re: I need to know how to parse attachment data

Posted by Adam Kocoloski <ko...@apache.org>.
On May 26, 2009, at 3:59 AM, Brian Candler wrote:

> On Mon, May 25, 2009 at 12:07:55PM +0800, Darrell Huang wrote:
>> Hi, everyone, I got some trouble in reading the attachment data from
>> documents with the HTTP API. When an attachment is downloaded with the
>> API, some extra data is found before and after the attachment data, and
>> the data can be different at different times.
>> For example, when I downloaded an attachment with the jpg format several
>> weeks ago, there is a character sequence "dbc\r\n" preceding the actual
>> file data, but when I did that today, it was "a24\r\n".
>
> That would be HTTP chunking, wouldn't it? If you send a request advertising
> HTTP/1.1 capability, then you are required to accept
> Transfer-Encoding: Chunked
> in the response.
>
> Regards,
>
> Brian.

Yep, and those character sequences are hex representations of the  
length of the chunk.

Adam
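To make Adam's point concrete: each chunk begins with its size in hexadecimal followed by CRLF, so the two prefixes Darrell saw decode to ordinary byte counts. A minimal Python sketch; `parse_chunk_size` is a hypothetical helper, and it also discards any `;name=value` chunk extension, which RFC 2616 allows after the size:

```python
def parse_chunk_size(line: bytes) -> int:
    """Return the byte count announced by a chunk-size line such as b"dbc\\r\\n"."""
    # Drop the trailing CRLF, then any ";name=value" chunk extensions.
    size_field = line.rstrip(b"\r\n").split(b";", 1)[0].strip()
    # The size is hexadecimal; int() accepts bytes together with an explicit base.
    return int(size_field, 16)


print(parse_chunk_size(b"dbc\r\n"))  # 3516
print(parse_chunk_size(b"a24\r\n"))  # 2596
```

So the older download began with a 3516-byte chunk and the newer one with a 2596-byte chunk. The server chooses the chunk sizes, which is presumably why the prefix changed between downloads of the same file.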

Re: I need to know how to parse attachment data

Posted by Brian Candler <B....@pobox.com>.
On Mon, May 25, 2009 at 12:07:55PM +0800, Darrell Huang wrote:
> Hi, everyone, I got some trouble in reading the attachment data from  
> documents with the HTTP API. When an attachment is downloaded with the 
> API, some extra data is found before and after the attachment data, and 
> the data can be different at different times.
> For example, when I downloaded an attachment with the jpg format several  
> weeks ago, there is a character sequence "dbc\r\n" preceding the actual 
> file data, but when I did that today, it was "a24\r\n".

That would be HTTP chunking, wouldn't it? If you send a request advertising
HTTP/1.1 capability, then you are required to accept
Transfer-Encoding: Chunked
in the response.

Regards,

Brian.
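Brian's point in code: a client that advertises HTTP/1.1 has to inspect the response headers to learn how the body is delimited before it reads it. A minimal Python sketch, assuming the caller has already read everything up to the blank line that ends the headers; `body_framing` is a hypothetical helper, not part of any library:

```python
def body_framing(header_block: bytes) -> str:
    """Decide how a response body is delimited, given the raw header block.

    Returns "chunked", "content-length", or "until-close".
    """
    lines = header_block.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # skip the status line, e.g. "HTTP/1.1 200 OK"
        if not line:
            continue  # blank line(s) at the end of the header block
        name, _, value = line.partition(":")
        # Header names are case-insensitive ("Chunked" vs "chunked" in values too).
        headers[name.strip().lower()] = value.strip()
    # Per RFC 2616 sec. 4.4, chunked Transfer-Encoding overrides Content-Length.
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return "chunked"
    if "content-length" in headers:
        return "content-length"
    # Otherwise the body ends when the server closes the connection.
    return "until-close"
```

In the "chunked" case the body must be decoded chunk by chunk; reading raw bytes straight to a file is only safe in the other two cases.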

Re: I need to know how to parse attachment data

Posted by Paul Davis <pa...@gmail.com>.
An easy way to try to replicate the problem is to use curl to mimic
the sequence of calls you're trying to perform with your library. As
you're the first person to find this issue, and attachments are in
extensive use, my guess is that the problem is most likely in your
socket code. Granted, it's not impossible that you've stumbled onto a bug.

HTH,
Paul Davis

On Mon, May 25, 2009 at 2:52 AM, Darrell Huang <ji...@memego.com> wrote:
> I directly used socket to send the requests.
> The messages I sent were like
>
> GET /test_db/test_doc/c.jpg HTTP/1.1
>
> Do CouchDB users usually send requests with some message wrappers so they
> never meet such a problem?
>
>>
>> can you post a Curl script or otherwise so we can attempt to repeat
>> your experience? It could be in your HTTP implementation (or it could
>> be us). The first step to find out is a repeatable test.
>>
>> Thanks,
>> Chris
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>
>

Re: I need to know how to parse attachment data

Posted by Antony Blakey <an...@gmail.com>.
On 25/05/2009, at 4:22 PM, Darrell Huang wrote:

> Do CouchDB users usually send requests with some message wrappers so  
> they never meet such a problem?

It would be normal to use an HTTP client library. You're in for a  
world of hurt using raw sockets.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Success is not the key to happiness. Happiness is the key to success.
  -- Albert Schweitzer
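Antony's suggestion, sketched with Python's standard-library `http.client` (Python is only an illustrative choice here). The client sends an HTTP/1.1 request with a `Host` header and strips chunked framing itself, so `read()` returns only the attachment bytes. `fetch_attachment` is a hypothetical helper; the host, the port 5984 (CouchDB's default), and the path in the usage comment are assumptions based on Darrell's request.

```python
import http.client


def fetch_attachment(host: str, port: int, path: str) -> bytes:
    """Download an attachment; http.client removes any chunked framing."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("GET", path)  # HTTP/1.1 request, Host header added automatically
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"GET {path} failed: {resp.status} {resp.reason}")
        # read() decodes Transfer-Encoding: chunked, so no "dbc\r\n"-style
        # size lines or CRLFs end up in the returned bytes.
        return resp.read()
    finally:
        conn.close()


# Hypothetical usage against a local CouchDB holding the thread's example document:
# data = fetch_attachment("localhost", 5984, "/test_db/test_doc/c.jpg")
```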


Re: I need to know how to parse attachment data

Posted by Darrell Huang <ji...@memego.com>.
I used a socket directly to send the requests.
The messages I sent looked like this:

GET /test_db/test_doc/c.jpg HTTP/1.1

Do CouchDB users usually send requests through some kind of message wrapper, 
so they never run into this problem?

>
> can you post a Curl script or otherwise so we can attempt to repeat
> your experience? It could be in your HTTP implementation (or it could
> be us). The first step to find out is a repeatable test.
>
> Thanks,
> Chris
>
> -- 
> Chris Anderson
> http://jchrisa.net
> http://couch.io
> 


Re: I need to know how to parse attachment data

Posted by Chris Anderson <jc...@apache.org>.
2009/5/24 Darrell Huang <ji...@memego.com>:
> Hi, everyone, I got some trouble in reading the attachment data from
> documents with the HTTP API. When an attachment is downloaded with the API,
> some extra data is found before and after the attachment data, and the data
> can be different at different times.
> For example, when I downloaded an attachment with the jpg format several
> weeks ago, there is a character sequence "dbc\r\n" preceding the actual file
> data, but when I did that today, it was "a24\r\n".
> I don't know what this extra data means. Can someone tell me how to parse
> the packet content and extract the actual attachment data from it?
>

Can you post a curl script or something similar so we can attempt to repeat
your experience? It could be in your HTTP implementation (or it could
be us). The first step to finding out is a repeatable test.

Thanks,
Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: I need to know how to parse attachment data

Posted by Sho Fukamachi <sh...@gmail.com>.
On 25/05/2009, at 8:10 PM, Darrell Huang wrote:

> Oh, yeah, I've confirmed it. The headers and trailers are only added  
> when I use HTTP/1.1. It's just a difference between protocol versions.
> So my problem is solved. Thanks for helping me, Jan and everyone!

Interesting. I was noticing the exact same thing recently when I was  
using a socket to read the output of the new continuous changes API. I  
set it to HTTP/1.1 too and was getting all sorts of weird stuff,  
mysterious "37" strings appearing everywhere. So that's what caused it.

Thanks for the tip!

Sho
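The "37" Sho saw is a chunk size too: hex 0x37 is 55, presumably the length of the change line that followed. On a long-lived response such as the continuous changes feed, a raw-socket reader has to strip the framing incrementally, because a size line or a chunk can be split across `recv()` calls. A minimal Python sketch of such a decoder; the class `ChunkedStreamDecoder` and its `feed` method are hypothetical names, and trailers after the final chunk are not parsed:

```python
class ChunkedStreamDecoder:
    """Incrementally strip HTTP/1.1 chunked framing from a byte stream.

    Call feed() with raw bytes as they arrive; it returns the payload bytes
    that can be decoded so far. Trailers after the final 0-size chunk are ignored.
    """

    def __init__(self) -> None:
        self._buf = b""
        self._state = "size"  # "size" -> "data" -> "crlf" -> "size" ... -> "done"
        self._remaining = 0   # bytes still expected in the current chunk

    @property
    def done(self) -> bool:
        return self._state == "done"

    def feed(self, data: bytes) -> bytes:
        self._buf += data
        out = bytearray()
        while self._state != "done":
            if self._state == "size":
                end = self._buf.find(b"\r\n")
                if end < 0:
                    break  # size line not complete yet; wait for more bytes
                size = int(self._buf[:end].split(b";", 1)[0].strip(), 16)
                self._buf = self._buf[end + 2:]
                if size == 0:
                    self._state = "done"  # last-chunk reached
                else:
                    self._remaining = size
                    self._state = "data"
            elif self._state == "data":
                if not self._buf:
                    break
                take = self._buf[: self._remaining]  # as much of the chunk as arrived
                out += take
                self._buf = self._buf[len(take):]
                self._remaining -= len(take)
                if self._remaining == 0:
                    self._state = "crlf"
            else:  # "crlf": every chunk's data is followed by CRLF
                if len(self._buf) < 2:
                    break
                if self._buf[:2] != b"\r\n":
                    raise ValueError("chunk data not followed by CRLF")
                self._buf = self._buf[2:]
                self._state = "size"
        return bytes(out)


# Usage: feed pieces in whatever sizes recv() happens to return.
dec = ChunkedStreamDecoder()
payload = b"".join(dec.feed(p) for p in (b"5\r\nhel", b"lo\r\n0\r", b"\n\r\n"))
# payload == b"hello" and dec.done is True
```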


Re: I need to know how to parse attachment data

Posted by Darrell Huang <ji...@memego.com>.
Oh, yeah, I've confirmed it. The headers and trailers are only added when I 
use HTTP/1.1. It's just a difference between protocol versions.
So my problem is solved. Thanks for helping me, Jan and everyone!

>
>
> Maybe http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1?
>
> Cheers
> Jan
> --
>
>
> 
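Darrell's fix amounts to speaking HTTP/1.0, which has no chunked transfer coding, so the server delimits the body with Content-Length or by closing the connection. A minimal raw-socket sketch in Python under that assumption; `fetch_http10` and `split_response` are hypothetical helpers, and the sketch ignores redirects, error statuses, and servers that keep the connection open:

```python
import socket


def fetch_http10(host: str, port: int, path: str) -> bytes:
    """Send an HTTP/1.0 GET over a raw socket and return the raw response."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}:{port}\r\n"
        "\r\n"  # blank line ends the request headers
    ).encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw: bytes):
    """Split a raw HTTP/1.0 response into (status line, headers, body)."""
    # The first CRLFCRLF ends the headers; later ones belong to the binary body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line terminating the headers")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body
```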


Re: I need to know how to parse attachment data

Posted by Jan Lehnardt <ja...@googlemail.com>.
On 25 May 2009, at 06:07, Darrell Huang wrote:

> Hi, everyone, I got some trouble in reading the attachment data from  
> documents with the HTTP API. When an attachment is downloaded with  
> the API, some extra data is found before and after the attachment  
> data, and the data can be different at different times.
> For example, when I downloaded an attachment with the jpg format  
> several weeks ago, there is a character sequence "dbc\r\n" preceding  
> the actual file data, but when I did that today, it was "a24\r\n".
> I don't know what this extra data means. Can someone tell me how to  
> parse the packet content and extract the actual attachment data from  
> it?


Maybe http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1?

Cheers
Jan
--
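The section Jan links to defines the chunked body: each chunk is `chunk-size [ chunk-extension ] CRLF chunk-data CRLF`, a zero-size last-chunk ends the data, and optional trailer headers plus a final CRLF follow. A minimal Python sketch of a decoder for a body that has already been read in full; the name `decode_chunked` is hypothetical, and it assumes well-formed input, raising `ValueError` otherwise:

```python
from __future__ import annotations


def decode_chunked(body: bytes) -> tuple[bytes, dict[str, str]]:
    """Decode a complete chunked body (RFC 2616, sec. 3.6.1).

    Returns the reassembled entity body and any trailer headers.
    """
    data = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)  # end of the chunk-size line
        # Hex size, ignoring any ";name=value" chunk extensions.
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:  # last-chunk
            break
        data += body[pos:pos + size]
        pos += size
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF (truncated body?)")
        pos += 2
    trailers = {}
    while True:  # zero or more trailer lines, then an empty line
        eol = body.index(b"\r\n", pos)
        line = body[pos:eol]
        pos = eol + 2
        if not line:
            break
        name, _, value = line.decode("iso-8859-1").partition(":")
        trailers[name.strip().lower()] = value.strip()
    return bytes(data), trailers
```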