Posted to httpclient-users@hc.apache.org by Brent Putman <pu...@georgetown.edu> on 2016/03/11 05:10:55 UTC

Parsing of Link header elements containing query parameters

Hi,
I'm working with a REST API that returns a Link entity header to
indicate "rel" links (previous, next, etc.) for paginating over more
results than a single call returns. Its docs specifically reference
this very outdated (and non-standard) spec [1], but that spec seems
quite similar to the more current RFC 5988 [2].

The individual URI values in the Link header value contain query
parameters.  Here is the HC library wire trace of the entire header:

2016-03-10 22:36:31.354 [DEBUG] : org.apache.http.wire: http-outgoing-0
<< "Link:
<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
rel="current",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=2&per_page=10>;
rel="next",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
rel="first",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=559&per_page=10>;
rel="last"[\r][\n]"


When I attempt to extract this header from the HttpResponse and display
the individual element values using code similar to:

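import org.apache.http.Header;
import org.apache.http.HeaderElement;

// getFirstHeader returns null if the response has no Link header
Header linkHeader = httpResponse.getFirstHeader("Link");
if (linkHeader != null) {
    for (HeaderElement element : linkHeader.getElements()) {
        System.out.println("Saw HeaderElement: " + element.toString());
        System.out.println("HeaderElement name: " + element.getName());
        System.out.println("HeaderElement value: " + element.getValue());
    }
}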


I see output like this:

Saw HeaderElement:
<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
rel=current
HeaderElement name:
<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page
HeaderElement value: 1&per_page=10>


So the parser splits on the first '=' character to determine the
element name vs. value, which looks odd. And there doesn't seem to be a
way in the API to get the whole value of the HeaderElement minus its
parameters.
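The output above is consistent with a tokenizer that splits elements
on ',', parameters on ';', and each name/value pair on the first '='
only. Below is a simplified model of that last step. It is an
illustration, not HttpClient's actual parser code: the class and
method names are hypothetical, and quote handling is omitted. It also
shows why option A below works in this model: because only the first
'=' is consumed, name + "=" + value rebuilds the original token
(assuming no whitespace around that '=', which the trim would drop).

```java
class FirstEqualsSplit {

    // Simplified model (hypothetical, not HttpClient code): split a
    // name/value token at the FIRST '=' only. Returns {name, value};
    // value is null when the token contains no '='.
    static String[] splitNameValue(String token) {
        int eq = token.indexOf('=');
        if (eq < 0) {
            return new String[] {token.trim(), null};
        }
        return new String[] {
            token.substring(0, eq).trim(),
            token.substring(eq + 1).trim()
        };
    }

    // Option A from the question: glue the name and value back together.
    static String stitch(String name, String value) {
        return value == null ? name : name + "=" + value;
    }

    public static void main(String[] args) {
        String token = "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>";
        String[] nv = splitNameValue(token);
        System.out.println("name:     " + nv[0]);
        System.out.println("value:    " + nv[1]);
        System.out.println("stitched: " + stitch(nv[0], nv[1]));
    }
}
```

Running it reproduces the name/value pair shown above, and the stitched
string equals the original bracketed URI.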

Is this:
1) A bug in HttpClient's HeaderElement parsing?
2) A mistake on the part of the server sending these particular URL
values (i.e. perhaps they should be encoded in some way)?
3) Neither: perhaps, given the specific header syntax and semantics,
the name/value API is not appropriate for it, and I need to handle
these values manually, for example by:
     A) Stitching the URI back together manually as the name + "=" + value
     B) Splitting the HeaderElement#toString() on the semi-colon
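For option 3, a parser that knows the RFC 5988 grammar (a link-value
is "<" URI-Reference ">" followed by ";"-separated link-params) avoids
the problem, because an '=' or ',' inside the angle brackets is never
treated as a separator. Below is a minimal standard-library sketch,
assuming a well-formed header value. The class and method names are
hypothetical, and it ignores some RFC 5988 details such as backslash
escapes inside quoted strings and extended (title*) parameters. It
returns a map from each rel value to its URI.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LinkHeaderParser {

    // Split s on 'sep', ignoring separators inside <...> or "...".
    // Assumption: no backslash-escaped quotes in quoted strings.
    static List<String> splitOutside(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inAngle = false;
        boolean inQuote = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' && !inAngle) {
                inQuote = !inQuote;
            } else if (c == '<' && !inQuote) {
                inAngle = true;
            } else if (c == '>' && !inQuote) {
                inAngle = false;
            } else if (c == sep && !inAngle && !inQuote) {
                parts.add(cur.toString().trim());
                cur.setLength(0);
                continue; // drop the separator itself
            }
            cur.append(c);
        }
        parts.add(cur.toString().trim());
        return parts;
    }

    // Returns rel -> URI for each link-value in a Link header value.
    static Map<String, String> parseRelLinks(String headerValue) {
        Map<String, String> links = new LinkedHashMap<>();
        for (String linkValue : splitOutside(headerValue, ',')) {
            List<String> pieces = splitOutside(linkValue, ';');
            String target = pieces.get(0);
            if (!target.startsWith("<") || !target.endsWith(">")) {
                continue; // not a well-formed link-value; skip it
            }
            String uri = target.substring(1, target.length() - 1);
            for (String param : pieces.subList(1, pieces.size())) {
                int eq = param.indexOf('=');
                if (eq < 0) {
                    continue;
                }
                String name = param.substring(0, eq).trim();
                String value = param.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // unquote
                }
                if (name.equalsIgnoreCase("rel")) {
                    // rel may hold several space-separated relation types
                    for (String rel : value.split("\\s+")) {
                        if (!rel.isEmpty()) {
                            links.put(rel, uri);
                        }
                    }
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String header = "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>; rel=\"current\","
                + "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=2&per_page=10>; rel=\"next\","
                + "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>; rel=\"first\","
                + "<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=559&per_page=10>; rel=\"last\"";
        parseRelLinks(header).forEach((rel, uri) -> System.out.println(rel + " -> " + uri));
    }
}
```

With HttpClient, the input would be linkHeader.getValue() rather than
getElements(), so the generic element tokenizer never sees the URIs.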

#3 makes me nervous at the moment since I don't fully understand the
issues at hand.

I'm trying to read through relevant HTTP specs to better understand the
nuance of the header value syntax.  But I know there are people on the
list who are knowledgeable about the specs and may have a quick answer,
so I wanted to pose the question in the meantime.

Thanks,
Brent

[1] http://www.w3.org/Protocols/9707-link-header.html
[2] https://tools.ietf.org/html/rfc5988


Re: Parsing of Link header elements containing query parameters

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2016-03-10 at 23:10 -0500, Brent Putman wrote:
> Hi,
> I'm working with a REST API which returns a Link entity header to
> indicate "rel" links (previous, next, etc) for pagination over more
> results than are returned in a single call.  In their docs they
> specifically reference this very outdated (and non-standard) spec [1],
> but it seems to be quite similar to the more current RFC 5988 [2].
> 
> The individual URI values in the Link header value contain query
> parameters.  Here is the HC library wire trace of the entire header:
> 
> 2016-03-10 22:36:31.354 [DEBUG] : org.apache.http.wire: http-outgoing-0
> << "Link:
> <https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
> rel="current",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=2&per_page=10>;
> rel="next",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=1&per_page=10>;
> rel="first",<https://georgetown.test.instructure.com/api/v1/accounts/self/users?page=559&per_page=10>;
> rel="last"[\r][\n]"
> 

The wisdom of passing around multiple URIs in HTTP headers seems
questionable to me. 

Besides, the URI values are not enclosed in quotation marks or properly
escaped, so it is no wonder the standard header element tokenizer fails
to parse them.

Oleg
