Posted to dev@hc.apache.org by "Sami Ben Romdhane (JIRA)" <ji...@apache.org> on 2009/06/30 04:30:47 UTC

[jira] Created: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

some sites return 404 status code even though they are accessible by all browsers
---------------------------------------------------------------------------------

                 Key: HTTPCLIENT-857
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-857
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient
    Affects Versions: 3.1 Final
         Environment: Windows Vista, JDK 1.6, no proxy, no firewall, HttpClient 3.1 Final; did not try with 4.0
            Reporter: Sami Ben Romdhane


Very simple use case:

    HttpClientParams params = new HttpClientParams();
    params.setSoTimeout(10000);

    HttpClient client = new HttpClient(params);
    GetMethod get = new GetMethod();
    get.setPath(url);
    int code = client.executeMethod(get);

Here are just a few of the sites that would return a code of 404:
http://www.blogattitudes.com
http://www.stocktwits.net/
http://twatweet.com/


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org


[jira] Commented: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Ortwin Glück (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725551#action_12725551 ] 

Ortwin Glück commented on HTTPCLIENT-857:
-----------------------------------------

Try setting the user-agent string to that of a popular browser.
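
As a minimal sketch of this suggestion, the following sets a browser-like User-Agent header on a request before it is sent. It uses the JDK's java.net.HttpURLConnection rather than HttpClient 3.1 so that it runs without extra dependencies; in HttpClient 3.1 the analogous call would be get.setRequestHeader("User-Agent", ...). The class name, the helper name, and the user-agent value are illustrative assumptions and do not come from the issue.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class BrowserUserAgent {
    // Illustrative browser-like value (an assumption); substitute what your own browser sends.
    static final String BROWSER_UA =
        "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5";

    // Prepares a connection that will send BROWSER_UA; nothing goes over the network yet.
    static HttpURLConnection openWithBrowserAgent(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_UA);
        conn.setReadTimeout(10000); // same 10 s socket timeout as in the report
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openWithBrowserAgent("http://www.blogattitudes.com");
        // Show the header that would be sent; conn.getResponseCode() would
        // perform the actual request and return the status code.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

Calling conn.getResponseCode() on the returned connection sends the request, so the status can be compared with and without the header.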


[jira] Commented: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Sami Ben Romdhane (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725647#action_12725647 ] 

Sami Ben Romdhane commented on HTTPCLIENT-857:
----------------------------------------------

Oleg,
Sorry if I got carried away and vented a little out of frustration. I apologize. Our reason for using HttpClient rather than URLConnection was to avoid worrying about all these details, so we were surprised when we ran into these problems (we also had a lot of redirection problems where we were getting invalidRedirectionException). So it seems we have to deal with reality and spend more unplanned time getting our code to work correctly. Again, I apologize.
Thanks
Sami

Sebb, I just used the code snippet from the bug description to test, and it does not work; and yes, it worked with a lot of different sites.

Thanks, guys, for your time. I will spend more time on this today.


[jira] Commented: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725638#action_12725638 ] 

Oleg Kalnichevski commented on HTTPCLIENT-857:
----------------------------------------------

Go ahead and switch to another HTTP client implementation for all I care. Though, this will not free you from having to deal with badly written web sites optimized for a specific set of browsers that behave differently based on the value of the User-Agent header.

Open-source is not for off-loading your job responsibilities to other people.

Oleg


[jira] Commented: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Sebb (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725639#action_12725639 ] 

Sebb commented on HTTPCLIENT-857:
---------------------------------

Also, I've just tried the URLs in JMeter with the HttpClient 3.1 sampler, and the URLs work fine.

Here is a sample request:

GET http://www.blogattitudes.com

[no cookies]

Request Headers:
Connection: close
User-Agent: Jakarta Commons-HttpClient/3.1
Host: www.blogattitudes.com

Even with the default user-agent the site works fine for me.

Are you sure you are requesting the correct URL?
Does your code work with any sites?
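
One way to double-check which host and path a URL string actually resolves to, before handing it to the client, is to parse it with java.net.URI. This is only a sketch: the class and method names are hypothetical, and it says nothing about how HttpClient 3.1 itself interprets the string.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlParts {
    // Returns "host path" for an absolute URL, treating an empty path as "/".
    static String hostAndPath(String url) throws URISyntaxException {
        URI uri = new URI(url);
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        return uri.getHost() + " " + path;
    }

    public static void main(String[] args) throws URISyntaxException {
        // The three URLs listed in the issue description.
        String[] urls = {
            "http://www.blogattitudes.com",
            "http://www.stocktwits.net/",
            "http://twatweet.com/"
        };
        for (String url : urls) {
            System.out.println(hostAndPath(url));
        }
    }
}
```

For the first URL, which has no trailing slash, the parsed path is empty and is reported as "/".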



[jira] Resolved: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved HTTPCLIENT-857.
------------------------------------------

    Resolution: Invalid

This issue has nothing to do with HttpClient. Report it to the administrators of those sites.

Oleg


[jira] Commented: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Sami Ben Romdhane (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725627#action_12725627 ] 

Sami Ben Romdhane commented on HTTPCLIENT-857:
----------------------------------------------

Oleg, we are using HttpClient, and statistically 5 to 10% of the sites we're accessing return 404. Since the browser is able to access them, and HttpClient gets a 404 even with the user-agent string of a popular browser, it becomes an HttpClient issue, and should be an HttpClient issue rather than mine to resolve by contacting hundreds of web sites and begging them to work with HttpClient. This is the wrong mindset for an open source project. So for me the resolution is: Stop using HttpClient!!! Thanks for your time.


[jira] Commented: (HTTPCLIENT-857) some sites return 404 status code even though they are accessible by all browsers

Posted by "Ortwin Glück (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725634#action_12725634 ] 

Ortwin Glück commented on HTTPCLIENT-857:
-----------------------------------------

Sami, it's not an issue with the client. It issues correct HTTP requests. If the website chooses to respond with 404, it's their decision. I suspect the website checks for some additional headers that are not present by default (and not required by any spec). If you want this fixed, I suggest you invest some time and investigate what kind of request artifacts these sites expect. The "fix" would then be to include these headers in your requests.
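
The investigation suggested above could be sketched as follows: start from the full set of headers a browser sends, drop one header at a time, and note which removals turn a successful response into a 404. Everything here is an assumption for illustration: the class name, the header values, and in particular fetchStatus, a hypothetical callback that would send the request with the given headers and return the status code. The main method uses a fake site instead of a real one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class HeaderProbe {
    // Returns the headers whose individual removal changes a non-404 response into a 404.
    static List<String> requiredHeaders(Map<String, String> browserHeaders,
                                        ToIntFunction<Map<String, String>> fetchStatus) {
        List<String> required = new ArrayList<>();
        if (fetchStatus.applyAsInt(browserHeaders) == 404) {
            return required; // even the full browser-like set fails; nothing to narrow down
        }
        for (String name : browserHeaders.keySet()) {
            Map<String, String> trial = new LinkedHashMap<>(browserHeaders);
            trial.remove(name);
            if (fetchStatus.applyAsInt(trial) == 404) {
                required.add(name);
            }
        }
        return required;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0 (illustrative value)");
        headers.put("Accept", "text/html,*/*;q=0.8");
        headers.put("Accept-Language", "en-us,en;q=0.5");

        // Stand-in for a real site: answers 404 unless an Accept header is present.
        ToIntFunction<Map<String, String>> fakeSite =
            h -> h.containsKey("Accept") ? 200 : 404;

        System.out.println(requiredHeaders(headers, fakeSite)); // prints [Accept]
    }
}
```

Removing one header at a time will not detect a site that accepts either of two headers, so an empty result does not prove that headers are irrelevant.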
