Posted to dev@hc.apache.org by "Ortwin Glück (JIRA)" <ji...@apache.org> on 2007/06/04 17:14:36 UTC

[jira] Created: (HTTPCLIENT-655) User-Agent string violates RFC

User-Agent string violates RFC
------------------------------

                 Key: HTTPCLIENT-655
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-655
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient
    Affects Versions: 3.1 RC1
            Reporter: Ortwin Glück
            Priority: Minor


Our User-Agent says "Jakarta Commons-HttpClient/3.1-rc1". But space is a reserved character to separate individual *products* and comments according to RFC 2616, section 14.43. Jakarta is not a product. At the same time we may want to drop the Jakarta name altogether.

We should change this to something more standard like: 

"Apache-HttpClient/3.1-rc1 ("+ System.getProperty("os.name") +";"+ System.getProperty("os.arch") +") "+
"Java/"+ System.getProperty("java.vm.version") +" ("+ System.getProperty("java.vm.vendor") +")"

which renders:

"Apache-HttpClient/3.1-rc1 (Windows XP 5.1;x86) Java/1.5.0_08 (Sun Microsystems Inc.)"

Sun's internal Http client uses something like "Java/1.5.0_08".
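
As a minimal sketch of the proposal above, the expression can be wrapped in a small standalone class. The class name UserAgentSketch and the build(String) helper are hypothetical and not part of HttpClient; the version is passed in because this sketch does not assume how the library exposes it. The output depends on the JVM and operating system, so it need not match the sample rendering exactly.

```java
public class UserAgentSketch {

    // Hypothetical helper: assembles the proposed header value.
    // Product/version pairs are separated by spaces; the extra details
    // go into comments, i.e. the text between parentheses.
    static String build(String productVersion) {
        return "Apache-HttpClient/" + productVersion
            + " (" + System.getProperty("os.name") + ";"
            + System.getProperty("os.arch") + ") "
            + "Java/" + System.getProperty("java.vm.version")
            + " (" + System.getProperty("java.vm.vendor") + ")";
    }

    public static void main(String[] args) {
        // Prints the header value for the current JVM and OS.
        System.out.println(build("3.1-rc1"));
    }
}
```

System.getProperty() can throw a SecurityException when a SecurityManager denies access, so a real implementation would probably need a fallback.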

I am completely ignoring the fact that real-world user agents use almost arbitrary strings.
Some fine examples of misbehaviour from my private logs:

"Jakmpqes dihurxf wfyiupsc" -- apparently somebody has to hide something...
"Missigua Locator 1.9"
"Poodle predictor 1.0"
"shelob v1.0"
"ISC Systems iRc Search 2.1"
"ping.blogug.ch aggregator 1.0"
"http://www.uni-koblenz.de/~flocke/robot-info.txt"  -- ...sigh

I am very tempted to write a User-Agent string validator that prevents misuse of this field in HttpClient.
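
Such a validator could be sketched roughly as follows, based on the grammar in RFC 2616 (User-Agent = 1*( product | comment ), product = token ["/" product-version]). The class UserAgentValidator and the requireVersion flag are hypothetical, not an existing HttpClient API. Note that a purely syntactic check accepts most of the strings listed above, because each space-separated word parses as a product without a version; the requireVersion flag is an assumed stricter policy that rejects those.

```java
public class UserAgentValidator {

    // Separators from RFC 2616, section 2.2 (including SP and HT).
    private static final String SEPARATORS = "()<>@,;:\\\"/[]?={} \t";

    static boolean isTokenChar(char c) {
        return c > 32 && c < 127 && SEPARATORS.indexOf(c) < 0;
    }

    static boolean isLws(char c) {
        return c == ' ' || c == '\t';
    }

    // Returns true if value is a whitespace-separated list of products and comments.
    // With requireVersion set, every product must carry a "/version" part.
    static boolean isValid(String value, boolean requireVersion) {
        int i = 0;
        int n = value.length();
        int elements = 0;
        while (true) {
            while (i < n && isLws(value.charAt(i))) i++;
            if (i >= n) break;
            if (value.charAt(i) == '(') {
                i = skipComment(value, i);
                if (i < 0) return false;
            } else {
                int start = i;
                while (i < n && isTokenChar(value.charAt(i))) i++;
                if (i == start) return false;          // no product name
                if (i < n && value.charAt(i) == '/') {
                    int versionStart = ++i;
                    while (i < n && isTokenChar(value.charAt(i))) i++;
                    if (i == versionStart) return false; // empty version
                } else if (requireVersion) {
                    return false;
                }
                // A product may only be followed by whitespace, a comment, or the end.
                if (i < n && !isLws(value.charAt(i)) && value.charAt(i) != '(') return false;
            }
            elements++;
        }
        return elements > 0;
    }

    // Skips a possibly nested comment starting at s.charAt(i) == '('.
    // Returns the index after the closing ')' or -1 if the comment is malformed.
    static int skipComment(String s, int i) {
        int depth = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++;                                   // quoted-pair: skip the escaped char
                if (i >= s.length()) return -1;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (--depth == 0) return i + 1;
            } else if ((c < 32 && c != '\t') || c == 127) {
                return -1;                             // control characters are not allowed
            }
        }
        return -1;                                     // unbalanced parentheses
    }

    public static void main(String[] args) {
        System.out.println(isValid("Jakarta Commons-HttpClient/3.1-rc1", true));
        System.out.println(isValid("http://www.uni-koblenz.de/~flocke/robot-info.txt", false));
    }
}
```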

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org


[jira] Resolved: (HTTPCLIENT-655) User-Agent string violates RFC

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HTTPCLIENT-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved HTTPCLIENT-655.
------------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 4.0 Alpha 2)
                   4.0 Alpha 1

Fixed in SVN trunk. The User-Agent string now conforms to the standard defined in RFC 2616 and looks like "Apache-HttpClient/" + VersionInfo.getReleaseVersion() + " (java 1.4)"

Oleg




[jira] Updated: (HTTPCLIENT-655) User-Agent string violates RFC

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HTTPCLIENT-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski updated HTTPCLIENT-655:
-----------------------------------------

    Fix Version/s: 4.0 Alpha 2




[jira] Commented: (HTTPCLIENT-655) User-Agent string violates RFC

Posted by "Roland Weber (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501982 ] 

Roland Weber commented on HTTPCLIENT-655:
-----------------------------------------

Hi Odi,

a) I don't think we should make significant changes to the User-Agent header in the 3.1 code base, like dropping Jakarta from it. People may have set up filter rules that rely on the name. That is also the reason why I'm not sure about changing anything but the version indicator at all. Since it's an RFC violation, we might change the space character to a dash. Btw, section 3.8 of RFC 2616 also mentions:
[quote]
  successive versions of the same product SHOULD only differ in the product-version portion of the product value
[/quote]
What is the lesser evil here?

b) Dropping Jakarta for the 4.0 code base is fine. What I don't like are calls to System.getProperty() to collect a user agent string, at least not in the default User-Agent interceptor. We can have a selection of them of course. Like one that says Apache-HttpCore/J-4.0-a5 in core and another one that says Apache-HttpClient/J-4.0-a1 in client. And another one that collects values from system properties.
(I'd like to see the version number updated by the build process, but I have neither the time nor the inclination to learn Maven2...)

c) Your suggestion also generates space characters, in "(Windows XP 5.1;x86)" and "(Sun Microsystems Inc.)" ;-)

d) A request interceptor that checks a header for compliance is a _really_ good idea. I am in favor of enabling such verification interceptors by default. People will never learn to comply with specifications unless exceptions are thrown into their faces. Misbehaviour must be punished, immediately and without mercy (Dubious API Dictator Roland ;-)

cheers,
  Roland





[jira] Commented: (HTTPCLIENT-655) User-Agent string violates RFC

Posted by "Ortwin Glück (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502025 ] 

Ortwin Glück commented on HTTPCLIENT-655:
-----------------------------------------

Hi Roland,

I agree that changing the User-Agent string may break existing filtering rules, so it is better not to change it now.

System properties: Well, yes, they are a bit nasty (SecurityManager comes to mind). But that's the official way to obtain that sort of information. Look at other User-Agent strings and you will see that most of them carry this information.

Spaces are legal within a comment, that is, the text between parentheses.
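
A small sketch can illustrate the difference, assuming the token and comment rules from RFC 2616, section 2.2. The class name CommentSpaces is hypothetical, and the comment pattern is deliberately simplified: it ignores nested comments and quoted-pairs.

```java
import java.util.regex.Pattern;

public class CommentSpaces {

    // token: one or more printable ASCII characters that are not separators.
    static final Pattern TOKEN =
        Pattern.compile("[!#$%&'*+\\-.0-9A-Z^_`a-z|~]+");

    // Simplified comment: parentheses around text that contains no parentheses.
    static final Pattern FLAT_COMMENT = Pattern.compile("\\([^()]*\\)");

    public static void main(String[] args) {
        // A space is a separator, so this text is not a single token...
        System.out.println(TOKEN.matcher("Windows XP 5.1;x86").matches());
        // ...but the same text is acceptable inside a comment.
        System.out.println(FLAT_COMMENT.matcher("(Windows XP 5.1;x86)").matches());
    }
}
```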

Okay, I'll happily contribute a User-Agent validator :-)

Odi

