Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2019/12/18 19:37:40 UTC

[HTTP] Possible revision in remote client-side code.

I'm looking at the remote access APIs and HTTP usage.

Unless there are good reasons why not, I think using the JDK Java 11
HTTP code, java.net.http, is good: fewer dependencies, lots of info on
the web. It has both sync and async support, and also HTTP/2. It is
complicated, and there is value in having packaged ways for common use
cases that have the RDF handling baked in (base URI anyone?!)
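
For reference, a minimal sketch of the sync and async styles in
java.net.http. The class name JdkHttpSketch, the "/data" path and the
throwaway local server are assumptions made for illustration only; they
are not part of Jena or of the play-area code.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class JdkHttpSketch {
    /** Build a GET request that asks for Turtle (the media type is an illustrative choice). */
    public static HttpRequest buildGet(URI uri) {
        return HttpRequest.newBuilder(uri).header("Accept", "text/turtle").GET().build();
    }

    /** Blocking GET: waits for the response and returns the body as a string. */
    public static String getSync(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(buildGet(uri), BodyHandlers.ofString());
        return response.body();
    }

    /** Non-blocking GET: the same request, delivered through a CompletableFuture. */
    public static CompletableFuture<String> getAsync(HttpClient client, URI uri) {
        return client.sendAsync(buildGet(uri), BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }

    /** Start a throwaway server on an ephemeral port so the demo needs no external network. */
    public static HttpServer startLocalServer(String body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/data", exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/turtle");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startLocalServer("<http://example/s> <http://example/p> \"o\" .\n");
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/data");
            HttpClient client = HttpClient.newHttpClient();
            System.out.println("sync:  " + getSync(client, uri).trim());
            System.out.println("async: " + getAsync(client, uri).join().trim());
        } finally {
            server.stop(0);
        }
    }
}
```

Both calls share one HttpClient and one request builder; only the way
the response is delivered differs.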

Some thoughts:

RDFConnection:
* The name is a bit long!
* New RDFConn (other name?): same operations as RDFConnection but at
the Graph/Node level.
* RDFConnection is an adapter to Resource/Model.

SPARQL Query:
* Convert HttpQuery to use java.net.http.
* Keep the "default global setup" style, and share this with other 
network-related code.
* Builder pattern for the per object settings.
* This may use HttpOp or directly use the java.net.http code.
   Worth doing it the best way for the long term.

RDF-centric:
* Library of functions and RDF-centric BodyHandlers/BodyPublishers:
   deal with compression on the input stream, convert a response to RDF.
* Could be useful for sync and async.
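
A minimal sketch of the "compression on input stream" part, assuming
the response was obtained with BodyHandlers.ofInputStream(). The JDK
client does not decompress bodies itself, so a helper has to look at
Content-Encoding. The class and method names (BodyStreams, decode,
bodyOf) are hypothetical, and only gzip and deflate are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class BodyStreams {
    /** Wrap a raw body stream according to a Content-Encoding value (null means none). */
    public static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding == null)
            return raw;
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "":
            case "identity":
                return raw;
            case "gzip":
                return new GZIPInputStream(raw);
            case "deflate":
                // HTTP "deflate" is the zlib format, which InflaterInputStream reads by default.
                return new InflaterInputStream(raw);
            default:
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }

    /** Convenience for a response whose body is an InputStream. */
    public static InputStream bodyOf(HttpResponse<InputStream> response) throws IOException {
        String encoding = response.headers().firstValue("Content-Encoding").orElse(null);
        return decode(response.body(), encoding);
    }

    public static void main(String[] args) throws IOException {
        // Round trip: gzip some Turtle-like text, then read it back through decode().
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write("<s> <p> <o> .".getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = decode(new ByteArrayInputStream(buffer.toByteArray()), "gzip");
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

The returned stream can then be handed to an existing parser, for both
sync and async use.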

HttpRDF
* RDF operations, e.g.
     Graph x = HttpRDF.getGraph(url)
     Graph x = HttpRDF.getGraph(httpClient, url)
* GSP naming is in RDFConnectionRemote.

My play area is:
https://github.com/afs/jena-http/blob/master/src/main/java/org/seaborne/http/HttpRDF.java

HttpOp
* This can be smaller and focused on common use cases; less coverage,
easier to use (and still support use for tests).
* Common cases are sync usage of HTTP. If you are writing a spider with
async requests, you'll want control of the HttpClient.
* For each operation have httpGet(args) and httpGet(HttpClient, args)
versions.
* Retain the idea of one default "system wide" HttpClient so common use
cases "just work". Share with QueryHTTP. Put this in one place, "HttpEnv".
* No "HttpResponseHandler" variants, no "httpPostForm".
       That's about 50% of the execs.
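
A minimal sketch of the paired-overload idea with one shared default
client. Only the name "HttpEnv" comes from the list above; the class
HttpOpSketch, the method names, the timeout and the redirect policy are
assumptions for illustration and not an existing Jena API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.time.Duration;
import java.util.Objects;

public class HttpOpSketch {
    /** Hypothetical stand-in for "HttpEnv": the one place holding the system-wide client. */
    public static final class HttpEnv {
        private static volatile HttpClient defaultClient = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)   // illustrative defaults
                .connectTimeout(Duration.ofSeconds(10))
                .build();

        private HttpEnv() {}

        public static HttpClient getDefaultClient() {
            return defaultClient;
        }

        public static void setDefaultClient(HttpClient client) {
            defaultClient = Objects.requireNonNull(client);
        }
    }

    /** Common case: use the shared default client. */
    public static String httpGetString(String url) {
        return httpGetString(HttpEnv.getDefaultClient(), url);
    }

    /** Controlled case: the caller supplies the HttpClient. */
    public static String httpGetString(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
            if (response.statusCode() / 100 != 2)
                throw new UncheckedIOException(new IOException("HTTP " + response.statusCode() + " for " + url));
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            throw new RuntimeException(e);
        }
    }
}
```

The one-argument form only delegates, so each operation has a single
implementation, and code that needs its own settings passes its own
client.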

     Andy

Re: [HTTP] Possible revision in remote client-side code.

Posted by Andy Seaborne <an...@apache.org>.

On 18/12/2019 19:37, Andy Seaborne wrote:
> I'm looking at the remote access APIs and HTTP usage.
> 
> Unless there are good reasons why not, I think using the JDK Java11 HTTP 
> code, java.net.http is good - less dependencies, lots of info on the 
> web. It has both sync and async support, and also HTTP/2. It is 
> complicated and there is value in having packaged ways for common use 
> cases that have the RDF handling baked in (base URI anyone?!)

So far ...

https://github.com/afs/jena-http/

has mostly complete basic HTTP operations, query, update, and GSP.

java.net.http works well, and it uses java.util.concurrent.Flow to
deliver data. There is a zero-copy InputStream to access the data from
an HTTP body. I haven't looked at higher-level Subscribers (readers)
that produce Java objects directly; the code only uses the InputStream
to pass to the existing parsers. I'm not convinced that there are any
gains, and certainly there are costs, to parsing from fragmented
ByteBuffers (tokens split across boundaries are quite nasty to handle
and would need new tokenizers); the InputStream does that work.

What is missing is suitable authentication. Basic auth is supported in
response to a failed HTTP operation - which is the user-centric case of
a dialog popping up.  A slight downside is that if the auth is wrong, it
tries 3 times before giving up (i.e. to allow 3 user attempts). The
number "3" is set by a JVM-wide system property.

Challenge-response authentication can be fiddly to handle for requests
where the data is not replayable.  The first request fails ... and can't
be resent unless the data is replayable.

What does work is directly setting the Basic auth HTTP header, and it is
probably worth adding some custom helper support (cf. the SERVICE
keyword). There is no challenge round of HTTP requests.  That's OK if
the connection is HTTPS.
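
A minimal sketch of such a helper, setting the header up front so there
is no challenge round. The names BasicAuthSketch, basicAuthHeader and
withBasicAuth are hypothetical, as are the example URL and credentials.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {
    /** Value for the Authorization header: "Basic " + base64(user ":" password). */
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Add the header to a request builder before the first (and only) send. */
    public static HttpRequest.Builder withBasicAuth(HttpRequest.Builder builder, String user, String password) {
        return builder.header("Authorization", basicAuthHeader(user, password));
    }

    public static void main(String[] args) {
        // Placeholder endpoint and credentials; use HTTPS so the header is not sent in the clear.
        HttpRequest request = withBasicAuth(
                HttpRequest.newBuilder(URI.create("https://example.org/dataset/sparql")),
                "user", "pass").GET().build();
        System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
    }
}
```

Because the credentials go out with the first request, a
non-replayable body is only ever sent once.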

     Andy
