Posted to dev@cocoon.apache.org by Tony Collen <tc...@collen.org> on 2002/03/08 06:18:43 UTC

Cocoon + UserAgent Strings

Hi all.. I posted this message to cocoon-users but it might be more 
likely to be answered here...

I'm playing around with Cocoon, and I'm using <map:generate type="html" 
src="http://foo.bar" /> to generate some XHTML from a website.  I'm 
getting back a 403 Forbidden error from the server, and I've deduced that 
Cocoon is being denied access to the URL based on the User-Agent string 
that it sends.  I did a little snooping and I came up with the following 
info out of my wwwlogs:

"GET / HTTP/1.1" 200 18919 "-" "Java1.3.1_01"

Is there any way, short of digging through code, to change the 
User-Agent string that Cocoon sends?  If not, is there someone who knows 
the Cocoon source well enough to make the User-Agent string something 
that could be configured through, say, cocoon.xconf?

Tony Collen
-=============
Web Programmer
NHGIS: National Historical Geographic Information System
http://www.nhgis.org
-=============


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: Cocoon + UserAgent Strings

Posted by Tony Collen <tc...@hist.umn.edu>.
Matt Sergeant wrote:

>Please be aware that google has been stamping down hard on people
>re-selling their service lately. Case in point - they forced the Perl
>module WWW::Search::Google to be taken off CPAN. All I'm saying is please
>be aware of their terms of service agreement (somewhere off the about page
>IIRC).
>

Yeah, I was poking around their site and I found their terms of service 
page.  I was mostly going to do it as a proof-of-concept.. But Google 
rules so I'm not about to go piss them off... thanks for all the help 
everyone =]

Tony




Re: Cocoon + UserAgent Strings

Posted by Matt Sergeant <ma...@sergeant.org>.
On Fri, 8 Mar 2002, Tony Collen wrote:

> Donald Ball wrote:
>
> >3. get the site maintainer to get their act together and stop forbidding
> >access to urls based on user agent.
> >
> Hmm.. The site in question happens to be Google... I doubt they'd be
> willing to change some of their code.  So #1 or #2 it is.

Please be aware that google has been stamping down hard on people
re-selling their service lately. Case in point - they forced the Perl
module WWW::Search::Google to be taken off CPAN. All I'm saying is please
be aware of their terms of service agreement (somewhere off the about page
IIRC).

-- 
<!-- Matt -->
<:->Get a smart net</:->




Re: Cocoon + UserAgent Strings

Posted by Tony Collen <tc...@hist.umn.edu>.
Donald Ball wrote:

>3. get the site maintainer to get their act together and stop forbidding
>access to urls based on user agent.
>
Hmm.. The site in question happens to be Google... I doubt they'd be 
willing to change some of their code.  So #1 or #2 it is.

Thanks!

Tony





Re: Cocoon + UserAgent Strings

Posted by Donald Ball <ba...@webslingerZ.com>.
On Thu, 7 Mar 2002, Tony Collen wrote:

> I'm playing around with Cocoon, and I'm using <map:generate type="html"
> src="http://foo.bar" /> to generate some XHTML from a website.  I'm
> getting back a 403 Forbidden error from the server, and I've deduced that
> Cocoon is being denied access to the URL based on the User-Agent string
> that it sends.  I did a little snooping and I came up with the following
> info out of my wwwlogs:
>
> "GET / HTTP/1.1" 200 18919 "-" "Java1.3.1_01"
>
> Is there any way, short of digging through code, to change the
> User-Agent string that Cocoon sends?  If not, is there someone who knows
> the Cocoon source well enough to make the User-Agent string something
> that could be configured through, say, cocoon.xconf?

that is the user-agent string sent by the java.net.URL (or URLConnection)
object when the content request is issued. afaik, it is not
configurable. potential workarounds include:

1. configure the jvm to use a proxy server and change the user-agent
string there.

2. swap the java.net.URL stuff with another java http client, say the one
from the jakarta commons? that might be something you'd best do at the
parser level, though, since i think it's generally responsible for
actually getting the content for a url.

3. get the site maintainer to get their act together and stop forbidding
access to urls based on user agent.

- donald
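
For readers weighing workaround 2: the sketch below shows the general idea of overriding the header at the point where the connection is opened. It is a minimal standalone illustration, assuming you control the code that calls java.net.URL, which a stock sitemap does not give you. The class name, method name, and the "CocoonFetcher/0.1" value are hypothetical and are not taken from Cocoon's source.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class UserAgentSketch {
    // Hypothetical User-Agent value, chosen only for illustration.
    static final String USER_AGENT = "CocoonFetcher/0.1";

    // Opens a connection object for the URL and overrides the User-Agent
    // request header. openConnection() does not contact the server yet;
    // the header must be set before connect() or getInputStream() is called.
    static URLConnection openWithUserAgent(String url, String userAgent) throws IOException {
        URLConnection conn = new URL(url).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // "http://foo.bar/" is the placeholder URL from the original question.
        URLConnection conn = openWithUserAgent("http://foo.bar/", USER_AGENT);
        // Print the header that would be sent; no network traffic happens here.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

Current JDK networking documentation also describes an http.agent system property that prefixes the default User-Agent string. Whether the 1.3.1 JVM shown in the log honours it is not established anywhere in this thread, so treat that as something to test rather than rely on.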




Re: Cocoon + UserAgent Strings

Posted by Reinhard Potz <re...@gmx.net>.
Tony,

Have a look at
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101412181609134&w=2

I extended the HTMLGenerator to use the Jakarta HttpClient library.
With this library you can set the User-Agent in the HTTP header.

I haven't had time yet to complete it (see
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101412719423893&w=2) or to
send the diff to Bugzilla (planned for the next few weeks).

Reinhard
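
The extended generator itself is only available through the links above, so the following is not that code. As a rough illustration of the same idea (an HTTP client library that lets the caller choose the User-Agent), here is a sketch that uses the JDK's own java.net.http client (Java 11 and later) as a stand-in for the Jakarta library. The class name, method name, and header value are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class HttpClientUserAgentSketch {
    // Builds a GET request whose User-Agent header is chosen by the caller.
    // Nothing is sent here; an HttpClient would be needed to execute it.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // "http://foo.bar/" is the placeholder URL from the original question;
        // "CocoonFetcher/0.1" is an invented value for illustration.
        HttpRequest request = buildRequest("http://foo.bar/", "CocoonFetcher/0.1");
        // Show the header carried by the request object.
        System.out.println(request.headers().firstValue("User-Agent").orElse("(none)"));
    }
}
```

A generator built this way could read the header value from its sitemap parameters or configuration, which is the kind of setting the original question asked about. How the real patch wires that up is not shown in this thread.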





 > -----Original Message-----
 > From: Tony Collen [mailto:tc@collen.org]
 > Sent: Friday, 08 March 2002 06:19
 > To: cocoon-dev@xml.apache.org
 > Subject: Cocoon + UserAgent Strings
 >
 >
 > Hi all.. I posted this message to cocoon-users but it might be more
 > likely to be answered here...
 >
 > I'm playing around with Cocoon, and I'm using <map:generate type="html"
 > src="http://foo.bar" /> to generate some XHTML from a website.  I'm
 > getting back a 403 Forbidden error from the server, and I've deduced that
 > Cocoon is being denied access to the URL based on the User-Agent string
 > that it sends.  I did a little snooping and I came up with the following
 > info out of my wwwlogs:
 >
 > "GET / HTTP/1.1" 200 18919 "-" "Java1.3.1_01"
 >
 > Is there any way, short of digging through code, to change the
 > User-Agent string that Cocoon sends?  If not, is there someone who knows
 > the Cocoon source well enough to make the User-Agent string something
 > that could be configured through, say, cocoon.xconf?
 >
 > Tony Collen
 > -=============
 > Web Programmer
 > NHGIS: National Historical Geographic Information System
 > http://www.nhgis.org
 > -=============
 >
 >
 >

