Posted to httpclient-users@hc.apache.org by Mugoma Joseph Okomba <mu...@yengas.com> on 2012/05/11 19:41:59 UTC
HC 4: Excluding images and other types of content
Hello,
I am using HC 4 to download a web page. Since I am only interested in the
text of the web page, I would like to exclude images and other content such
as JavaScript, CSS, etc.
Is there a way to do this in HttpClient?
Thanks.
Mugoma Joseph.
---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org
Re: HC 4: Excluding images and other types of content
Posted by William Speirs <ws...@apache.org>.
By default, if you point HC4 at a web page, it will download only the
HTML. You'd have to parse that HTML and extract all the links to get
the images, JavaScript, etc.
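If you do end up following links yourself and want to skip images, scripts and stylesheets, one option (a sketch under stated assumptions, not a built-in HttpClient feature) is to look at each response's Content-Type before reading its body. In HttpClient 4.x that value is available through `response.getEntity().getContentType()`, which returns a `Header` (possibly null) whose `getValue()` is the raw string. The helper `isPageText` below is hypothetical, uses only the JDK, and assumes that HTML, XHTML and plain text are the only types worth keeping:

```java
import java.util.Locale;

public class TextContentFilter {

    /**
     * Hypothetical helper: returns true if a Content-Type header value looks
     * like textual page content (HTML, XHTML, plain text), and false for
     * images, JavaScript, CSS, anything else, or a missing header.
     */
    static boolean isPageText(String contentType) {
        if (contentType == null) {
            return false; // no header at all: treat as unknown and skip
        }
        // Drop parameters such as "; charset=UTF-8" and normalise the case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html")
                || mime.equals("application/xhtml+xml")
                || mime.equals("text/plain");
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html; charset=UTF-8",
            "image/png",
            "application/javascript",
            "text/css",
            null
        };
        for (String ct : samples) {
            System.out.println(ct + " -> " + isPageText(ct));
        }
    }
}
```

Servers sometimes send a missing or wrong Content-Type, so treat this as a heuristic. When you skip a response, you would still want to release its connection, for example with `EntityUtils.consume(entity)`.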
Give it a try...
Bill-