Posted to dev@hc.apache.org by "Ryan Stewart (JIRA)" <ji...@apache.org> on 2010/03/29 19:53:27 UTC

[jira] Created: (HTTPCLIENT-928) Can't get list of redirect locations

Can't get list of redirect locations
------------------------------------

                 Key: HTTPCLIENT-928
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-928
             Project: HttpComponents HttpClient
          Issue Type: Improvement
          Components: HttpClient
    Affects Versions: 4.0.1
            Reporter: Ryan Stewart


HttpClient does a great job of following redirects, but afterward there doesn't seem to be any way to see the URLs that it followed in the redirect chain. They are stored internally by the DefaultRedirectHandler in the HttpContext in an attribute named "http.protocol.redirect-locations", but the RedirectLocations object that contains them stores them in a Set, so there's no way of knowing in what order the URLs were visited.

Here's an example of why I need it:
1) Use HttpClient to retrieve http://foo.com
2) http://foo.com returns a 301 redirect to http://foo.com/bar, so HttpClient follows the redirect and returns the page to me
3) http://foo.com/bar refers to a relative resource like "baz.html".

That relative resource should resolve to "http://foo.com/bar/baz.html". I only know that, though, if I can look at the redirect URL that HttpClient got in step 2. Currently, I don't seem to be able to do that.
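
The dependence on the base URI can be checked with java.net.URI alone. This is a minimal sketch, not HttpClient code: the class name RelativeResolution and its resolve helper are made up for illustration. It also suggests that the exact form of the redirect target matters, since under standard reference resolution a base without a trailing slash drops its last path segment.

```java
import java.net.URI;

// Hypothetical helper, not part of HttpClient: resolves a relative reference
// against the URI of the page that was finally served.
class RelativeResolution {

    static URI resolve(String pageUri, String relativeRef) {
        return URI.create(pageUri).resolve(relativeRef);
    }

    public static void main(String[] args) {
        // Base is the originally requested site root.
        System.out.println(resolve("http://foo.com/", "baz.html"));
        // Base is the redirect target without a trailing slash.
        System.out.println(resolve("http://foo.com/bar", "baz.html"));
        // Base is the redirect target with a trailing slash.
        System.out.println(resolve("http://foo.com/bar/", "baz.html"));
    }
}
```

Only the last call yields http://foo.com/bar/baz.html; the first two yield http://foo.com/baz.html. Either way, the final redirect location has to be known before the reference can be resolved correctly.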

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851032#action_12851032 ] 

Oleg Kalnichevski commented on HTTPCLIENT-928:
----------------------------------------------

Ryan,

Have you considered using a custom RedirectHandler? You could just extend the DefaultRedirectHandler class and add whatever extra processing your particular application requires.

Oleg


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Ryan Stewart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852186#action_12852186 ] 

Ryan Stewart commented on HTTPCLIENT-928:
-----------------------------------------

Yes, I did consider that. The DefaultRedirectHandler does everything almost exactly right, and it's not an easily extensible class. At a glance, the method it occurs in--getLocationURI--looks to have a complexity of about 20. It would be painful and unnecessary to duplicate and maintain all that code just for this one change. 


[jira] Resolved: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved HTTPCLIENT-928.
------------------------------------------

    Resolution: Invalid


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Ryan Stewart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852438#action_12852438 ] 

Ryan Stewart commented on HTTPCLIENT-928:
-----------------------------------------

Sorry, for some reason that approach to extending the DefaultRedirectHandler didn't occur to me. That should work, but I actually took the second approach you suggested. I extended RedirectLocations and overrode all of its methods to work off of a List of URIs instead of a Set. Then I injected my custom instance into the HttpContext before making the request. When the DefaultRedirectHandler handles a redirect, it looks for an existing RedirectLocations object first, so this worked. It isn't quite an ideal solution, though, since disabling HttpClient's circular redirect detection causes the RedirectLocations object to not be used at all. Since I want to allow circular redirects but also track redirect URIs, I just overrode the RedirectLocations.contains() method to always return false. It works like this, but it's unintuitive.
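
The behaviour described above can be modelled without HttpClient on the classpath. This is only a rough sketch under stated assumptions: OrderedRedirectLog and getAll are hypothetical names, and the class does not extend the real RedirectLocations; it just mirrors the contains/add/remove operations mentioned in this thread, backed by a List.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a list-backed redirect tracker. It does not extend
// the real RedirectLocations class and only models the workaround described.
class OrderedRedirectLog {

    private final List<URI> visited = new ArrayList<>();

    // Always false, so a caller that checks contains() before add() never
    // treats a repeated URI as a circular redirect.
    boolean contains(URI uri) {
        return false;
    }

    // Records every hop, including repeats, in the order it was followed.
    void add(URI uri) {
        visited.add(uri);
    }

    // Removes every occurrence and reports whether anything was removed.
    boolean remove(URI uri) {
        return visited.removeAll(Collections.singleton(uri));
    }

    // Returns a copy so callers cannot modify the internal list.
    List<URI> getAll() {
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        OrderedRedirectLog log = new OrderedRedirectLog();
        log.add(URI.create("http://foo.com/a"));
        log.add(URI.create("http://foo.com/b"));
        log.add(URI.create("http://foo.com/a"));
        System.out.println(log.getAll());
    }
}
```

The always-false contains() is what makes the approach unintuitive: the object no longer answers the question its method name asks.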


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852288#action_12852288 ] 

Oleg Kalnichevski commented on HTTPCLIENT-928:
----------------------------------------------

There is absolutely no need to duplicate the functionality of DefaultRedirectHandler. What is wrong with just extending the class?

---
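import java.net.URI;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.ProtocolException;
import org.apache.http.client.RedirectHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultRedirectHandler;
import org.apache.http.protocol.HttpContext;

// The wrapper class name is arbitrary; it only makes the snippet compilable.
public class RedirectLoggingExample {

    public static void main(String[] args) throws Exception {
        DefaultHttpClient httpclient = new DefaultHttpClient();

        // Reuse the default redirect logic and report each resolved location.
        RedirectHandler redirectHandler = new DefaultRedirectHandler() {

            @Override
            public URI getLocationURI(
                    final HttpResponse response, final HttpContext context) throws ProtocolException {
                URI uri = super.getLocationURI(response, context);
                System.out.println("----------------------------------------");
                System.out.println("Redirected to: " + uri);
                System.out.println("----------------------------------------");
                return uri;
            }

        };
        httpclient.setRedirectHandler(redirectHandler);

        HttpHost target = new HttpHost("www.google.com", 80, "http");
        HttpGet httpget = new HttpGet("/");

        HttpResponse response = httpclient.execute(target, httpget);

        System.out.println("----------------------------------------");
        System.out.println(response.getStatusLine());
        System.out.println("----------------------------------------");

        // Consume the body so the underlying connection can be released.
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            entity.consumeContent();
        }
    }
}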
---

Oleg


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852908#action_12852908 ] 

Oleg Kalnichevski commented on HTTPCLIENT-928:
----------------------------------------------

This method is not used anywhere in the HttpClient code. It has been introduced primarily for the sake of API completeness, just in case the class is re-used in another context. I think it is more logical to expect that URI instances get removed from the collection if the remove method is called. Most probably 'log' is just a misleading name.

Oleg


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852775#action_12852775 ] 

Oleg Kalnichevski commented on HTTPCLIENT-928:
----------------------------------------------

Ryan,

I changed RedirectLocations to maintain an additional list of all redirects. Please take a look:

http://svn.apache.org/viewvc?rev=930221&view=rev

Oleg


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Ryan Stewart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852803#action_12852803 ] 

Ryan Stewart commented on HTTPCLIENT-928:
-----------------------------------------

Great! That will be a big help. I don't really know how the remove() method is used within HttpClient (is it used at all?), but is it a good idea to remove entries from the log?


[jira] Commented: (HTTPCLIENT-928) Can't get list of redirect locations

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HTTPCLIENT-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852291#action_12852291 ] 

Oleg Kalnichevski commented on HTTPCLIENT-928:
----------------------------------------------

What can be done pretty easily is changing the RedirectLocations class to use LinkedHashSet instead of HashSet, which would preserve the natural sequence of redirects. This, however, would not help with cyclic redirects, because a set stores each URI only once. Another possibility would be to maintain a list of redirect locations in the RedirectLocations class in addition to the set.

Would that solve the problem for you?

Oleg
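
The difference between the two options can be illustrated with plain java.util collections. This is a sketch for comparison only: RedirectOrderDemo, its methods, and the /a and /b URIs are invented for the example and are not taken from the HttpClient sources.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; it only uses java.util collections, not HttpClient.
class RedirectOrderDemo {

    // A made-up cyclic chain: /a redirects to /b, which redirects back to /a.
    static List<URI> cyclicChain() {
        return Arrays.asList(
                URI.create("http://foo.com/a"),
                URI.create("http://foo.com/b"),
                URI.create("http://foo.com/a"));
    }

    // Option 1: insertion-ordered set. Order is kept, repeats are dropped.
    static Set<URI> asLinkedHashSet(List<URI> chain) {
        return new LinkedHashSet<>(chain);
    }

    // Option 2: a list kept next to the set. Every hop is kept in order.
    static List<URI> asList(List<URI> chain) {
        return new ArrayList<>(chain);
    }

    public static void main(String[] args) {
        List<URI> chain = cyclicChain();
        System.out.println("LinkedHashSet: " + asLinkedHashSet(chain));
        System.out.println("List:          " + asList(chain));
    }
}
```

For this chain the set holds two entries and the list holds three, so only the list shows that /a was visited a second time.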
