Posted to httpclient-users@hc.apache.org by Gael Laurans <gl...@gmail.com> on 2007/02/17 15:52:24 UTC

getResponseBodyAsStream vs. getResponseBodyAsString

Hi,

I am facing a very odd problem which I don't understand at all. I am trying
to use HttpClient to do some things on a MediaWiki site (that's the software
running Wikipedia but I am also using it for another site). I wrote some
code to log in and retrieve a page using getResponseBodyAsStream() and it's
working fine when I am using it on Wikipedia.

However, using the very same code on my own site (or on a test server on my
machine) produces strange results. The stream seems to be empty,
mystream.available() returns 0, but getResponseBodyAsString() does return
the content of the page. In both cases, status code is OK (200) and I cannot
see any difference in the wire log (the page gets transmitted even when I
cannot access it from the stream). Obviously that could be some kind of
issue with the server config but I have the same problem with two very
different servers (one Linux, Apache 1.something, old MediaWiki and one MS
Windows, all software freshly installed last week) and it's all working just
fine when used manually with a web browser (namely Firefox).

I could not find anything about that problem in the mailing list or the
documentation. Has anybody already encountered that kind of behaviour? What
could be going on?

Thanks a lot for any suggestion,
Gaël

Re: getResponseBodyAsStream vs. getResponseBodyAsString

Posted by Roland Weber <ht...@dubioso.net>.
Gael Laurans wrote:
> Everything is working fine now.

All is well that ends well :-)

I should be reading Shakespeare instead of spending
my time in front of a computer screen... ;-)

cheers,
  Roland


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpclient-user-help@jakarta.apache.org


Re: getResponseBodyAsStream vs. getResponseBodyAsString

Posted by Gael Laurans <gl...@gmail.com>.
On 2/17/07, Roland Weber <ht...@dubioso.net> wrote:
>
> For heaven's sake, don't you read JavaDocs?
> http://java.sun.com/j2se/1.4.2/docs/api/java/io/Reader.html#ready()
> ready() tells you whether there is data available _without_blocking_!
> It does NOT tell you whether you're at the end of the stream.
>
> Specifications and JavaDocs are the reference for API behavior,
> not trial-and-error results with one or two sites and a debugger.


Yeah, I know that and obviously I am using them all the time but I got
confused by the fact it was working on some servers and completely
overlooked that. Sorry for that.

> Well, then that page happens to be very small (<4k) and the
> Wikipedia server does not flush the output stream after the
> HTTP header and before the HTTP body. If the body happens to
> be transferred in the same IP packet as the end of the header,
> it is buffered locally on the client and will be detected by
> available() and ready().
> If the body is transferred in a separate IP packet, or does
> not fit into a single packet, available() will NOT return
> how much data there is in the page, but only how much of
> that happens to be buffered on the client. Which might be 0.
>
> If you're reading from either an InputStream or a Reader,
> you read _blocking_ until the read(...) method you're using
> returns a negative value. THEN you're at the end of the stream.


Thanks a lot for your explanations and your patience. Everything is working
fine now.

Regards,
Gaël

Re: getResponseBodyAsStream vs. getResponseBodyAsString

Posted by Roland Weber <ht...@dubioso.net>.
Hello Gael,

> Are you completely sure of that ?

I've sent you the link to the JavaDocs. Are they in any way unclear?
It's official Sun JavaDoc, not something I've made up on my home page.

> I just tried this available() thing while
> debugging after figuring out that no text seemed to be read. What I am
> actually doing is feeding the input stream to an
> InputFileReader/BufferedReader and parsing it in a while(reader.ready())
> loop. Is there something wrong with that then ?

For heaven's sake, don't you read JavaDocs?
http://java.sun.com/j2se/1.4.2/docs/api/java/io/Reader.html#ready()
ready() tells you whether there is data available _without_blocking_!
It does NOT tell you whether you're at the end of the stream.

Specifications and JavaDocs are the reference for API behavior,
not trial-and-error results with one or two sites and a debugger.

> Note that available() does
> return the size of the page when I am using Wikipedia's server.

Well, then that page happens to be very small (<4k) and the
Wikipedia server does not flush the output stream after the
HTTP header and before the HTTP body. If the body happens to
be transferred in the same IP packet as the end of the header,
it is buffered locally on the client and will be detected by
available() and ready().
If the body is transferred in a separate IP packet, or does
not fit into a single packet, available() will NOT return
how much data there is in the page, but only how much of
that happens to be buffered on the client. Which might be 0.

If you're reading from either an InputStream or a Reader,
you read _blocking_ until the read(...) method you're using
returns a negative value. THEN you're at the end of the stream.

cheers,
  Roland
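
A minimal sketch of the read-until-end-of-stream loop described above. It is
not code from this thread: the class and method names (ReadToEndDemo,
readAllLines) are made up for illustration, the charset is assumed to be
UTF-8, and a ByteArrayInputStream stands in for the HTTP response stream.
It assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadToEndDemo {

    /** Reads every line, blocking as needed, until readLine() reports end of stream. */
    static List<String> readAllLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // Assumption: the content is UTF-8; real code would use the response charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null only at end of stream; ready() is never consulted.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by getResponseBodyAsStream().
        byte[] page = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readAllLines(new ByteArrayInputStream(page))) {
            System.out.println(line);
        }
    }
}
```

The same pattern applies to the character and byte level: keep calling
read(...) until it returns -1 instead of looping on ready() or available().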





Re: getResponseBodyAsStream vs. getResponseBodyAsString

Posted by Gael Laurans <gl...@gmail.com>.
This should obviously read InputStreamReader; here is the code snippet:

        BufferedReader rdrEditForm =
            new BufferedReader(new InputStreamReader(
                inEditForm,
                wiki.getCharset()));

On 2/17/07, Gael Laurans <gl...@gmail.com> wrote:
>
> Hello,
>
> Are you completely sure of that ? I just tried this available() thing
> while debugging after figuring out that no text seemed to be read. What I am
> actually doing is feeding the input stream to an
> InputFileReader/BufferedReader and parsing it in a while( reader.ready())
> loop. Is there something wrong with that then ? Note that available() does
> return the size of the page when I am using Wikipedia's server.
>
> Thank you for your help,
> Gaël

Re: getResponseBodyAsStream vs. getResponseBodyAsString

Posted by Gael Laurans <gl...@gmail.com>.
Hello,

Are you completely sure of that ? I just tried this available() thing while
debugging after figuring out that no text seemed to be read. What I am
actually doing is feeding the input stream to an
InputFileReader/BufferedReader and parsing it in a while(reader.ready())
loop. Is there something wrong with that then ? Note that available() does
return the size of the page when I am using Wikipedia's server.

Thank you for your help,
Gaël


Re: getResponseBodyAsStream vs. getResponseBodyAsString

Posted by Roland Weber <ht...@dubioso.net>.
Hello Gael,

> The stream seems to be empty,
> mystream.available() returns 0, but getResponseBodyAsString() does return
> the content of the page.

InputStream.available does NOT tell you how many bytes are in the stream.
It tells you how many you can read _without_blocking_. If you call the
read() method while available returns 0, it will simply block until some
bytes become available or EOF is detected.

http://java.sun.com/j2se/1.4.2/docs/api/java/io/InputStream.html#available()

cheers,
  Roland
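
A minimal sketch of the point about available(), not code from this thread.
The names AvailableDemo, NoHintStream and readFully are made up for
illustration, and it assumes Java 7 or later. The base java.io.InputStream
implementation of available() returns 0, so a stream can report 0 and still
deliver its whole content when it is read until read(...) returns -1.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class AvailableDemo {

    /** Hypothetical stream that serves bytes but never overrides available(). */
    static final class NoHintStream extends InputStream {
        private final byte[] data;
        private int pos = 0;

        NoHintStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            // Return the next byte as 0..255, or -1 at end of stream.
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }
    }

    /** Reads until read(...) returns -1, ignoring available() entirely. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new NoHintStream(
                "<html>page</html>".getBytes(StandardCharsets.UTF_8));
        // The inherited available() gives no hint about the pending data.
        System.out.println("available() = " + in.available());
        System.out.println(new String(readFully(in), StandardCharsets.UTF_8));
    }
}
```

With a network stream the value of available() depends on what happens to
be buffered locally, so it should only be treated as a hint, never as the
length of the response.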

