Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2021/06/28 17:00:48 UTC

Evolving HttpClient usage

Jena currently uses Apache HttpClient v4 for HTTP.
This supports HTTP 1.1.

Apache HttpClient v5 supports HTTP/2 and there is a migration path from 
v4 to new style v5 but the path is not seamless. It is at least package 
renaming followed by API changes.

https://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/index.html
   and
https://hc.apache.org/httpcomponents-client-5.1.x/migration-guide/migration-to-classic.html


For most Jena users, there are no application changes needed because 
SPARQL operations are packed up into the Jena APIs. But if an 
application is doing detailed HTTP setup - most importantly, that 
includes authentication - there is going to be a migration impact.

Java 11 now has an API, java.net.http, an all-new way to work with HTTP, 
including HTTP/2. (There are other HTTP clients too - I haven't used any 
of them.)


Should we update to java.net.http or Apache HttpClient v5 or other?


Given the JDK has a decent HTTP client, my preference is to switch to 
use java.net.http unless there is a positive reason to use a specific 
external one.

The JDK-provided one means no extra dependencies, is always present, and 
any fixes/improvements come by updating the JVM used.
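
For readers who have not used java.net.http, here is a minimal sketch of what the JDK client looks like. It only uses JDK classes; the endpoint URL, the class name and the timeout values are assumptions for illustration and are not taken from Jena.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Illustrative only: plain JDK usage, not Jena code.
final class JdkHttpSketch {
    /** One client can be shared; connection reuse is handled inside the JDK. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)   // preference; HTTP/1.1 is used if HTTP/2 is not available
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Build (but do not send) a SPARQL query request using GET. */
    static HttpRequest sparqlGet(String endpoint, String queryString) {
        String url = endpoint + "?query=" + URLEncoder.encode(queryString, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json")
                .timeout(Duration.ofSeconds(30))      // per-request timeout
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; nothing is sent over the network here.
        HttpRequest request = sparqlGet("http://localhost:3030/ds/query", "ASK {}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending would be `newClient().send(request, HttpResponse.BodyHandlers.ofString())`; it is left out so the sketch runs without a server.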

----

And also if HTTP support in Jena is being upgraded ... the code could do 
with some work. Some of it is really old and is showing its age.

Areas:
    RDFConnection,
    SPARQL/HTTP QueryExecution and UpdateProcessor,
    Graph Store Protocol
    SERVICE.

== Improvements

+ Builder style for constructing the more complicated objects
   (e.g. anything HTTP!)
+ Both Model and Graph / Statement and Triple level APIs
   (Model-level being adapters of Graph level engines)

      ResultSet(resources) - RowSet (Nodes)
      RDFConnection - RDFLink
      QueryExecution - QueryExec
      (not an issue with UpdateProcessor)

+ Deprecation of QueryExecution.setTimeout and setInitialBinding
   (use a builder)
+ Switch to rewrite for initial bindings
   This will work for remote usage, which is currently unsupported.
+ Explicit GSP engine - include support for quads operations.

. SERVICE rewrite to use the new classes.

- HttpOp : Direct use of java.net.http covers the complex cases so
   this class can be smaller and focused on the common cases.
   (I doubt it's used much directly)

+ Utilities: HttpRDF, AsyncHttpRDF, HttpOp
   AsyncHttpRDF should at least cover async GET so apps can
   gather data from several places in parallel.
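
To make the "async GET from several places in parallel" item concrete, here is a rough sketch built directly on java.net.http. It is not the AsyncHttpRDF class; the class and method names are hypothetical, and bodies are returned as strings rather than parsed into graphs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: hypothetical names, plain JDK, no RDF parsing.
final class AsyncGetSketch {
    /** A fetcher that issues an asynchronous GET and yields the response body. */
    static Function<URI, CompletableFuture<String>> httpFetcher(HttpClient client) {
        return uri -> {
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Accept", "text/turtle")
                    .GET()
                    .build();
            return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                         .thenApply(HttpResponse::body);
        };
    }

    /** Start every fetch first, then wait for all of them; results keep the input order. */
    static List<String> gatherAll(List<URI> sources, Function<URI, CompletableFuture<String>> fetcher) {
        List<CompletableFuture<String>> pending =
                sources.stream().map(fetcher).collect(Collectors.toList());
        // join() rethrows a failed fetch wrapped in CompletionException.
        return pending.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: AsyncGetSketch <url> [<url> ...]");
            return;
        }
        List<URI> sources = Arrays.stream(args).map(URI::create).collect(Collectors.toList());
        List<String> bodies = gatherAll(sources, httpFetcher(HttpClient.newHttpClient()));
        for (int i = 0; i < sources.size(); i++) {
            System.out.println(sources.get(i) + " -> " + bodies.get(i).length() + " chars");
        }
    }
}
```

Passing the fetcher in as a function is a choice made for this sketch so that the gathering logic can be exercised without a network.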

== Migration

If we leave the old code for SPARQL execution (QueryEngineHTTP and 
HttpQuery) in place, still on Apache HttpClient v4, and apply copious 
deprecations, then, mostly, the change is less sudden. We then remove 
the old code in a couple of releases' time.

Deprecate all QueryExecutionFactory.sparqlService and 
createServiceRequest, and refer to the (new) QueryExecutionHTTPBuilder.

Deprecate QueryExecution.setTimeout and setInitialBinding - they 
should not be where they are.
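
As a rough illustration of the "rewrite for initial bindings" idea listed under Improvements: the query is rewritten before execution so that chosen variables are replaced by values, which is why it can also work for remote endpoints. The sketch below is a toy that works on the query text with a regular expression. It is not how Jena does it, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: text-level replacement, not Jena's implementation.
final class SubstitutionSketch {
    // Matches variables written as ?name (the $name form is ignored in this toy).
    private static final Pattern VAR = Pattern.compile("\\?([A-Za-z_][A-Za-z0-9_]*)");

    /** Replace each ?var that has an entry in 'values' by that value; leave other variables alone. */
    static String substitute(String query, Map<String, String> values) {
        Matcher m = VAR.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "SELECT ?p ?o WHERE { ?s ?p ?o }";
        // Hypothetical value for ?s, written as a SPARQL IRI term.
        System.out.println(substitute(query, Map.of("s", "<http://example/x>")));
    }
}
```

A text-level rewrite like this breaks as soon as the variable also appears in the SELECT clause or inside a string literal, so a real implementation would presumably operate on the parsed query rather than on the text.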

Update documentation

== Improvements

Code:

   https://github.com/afs/jena-http

which at the moment needs a custom Jena build because of misc cleanup 
and things found while writing jena-http and not PR'ed to Jena.

Using a different HttpClient should not be too difficult as the code 
encapsulates HttpClient usage internally. But a switchable HttpClient 
isn't so easy, and it is also not invisible to users because 
authentication setup is implementation-specific. We can't abstract 
authentication without significant costs in support and maintenance to 
the project.
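
To show why authentication setup is tied to the HTTP implementation, here is a minimal sketch of how credentials are attached with the JDK client: a java.net.Authenticator is installed on the HttpClient itself. This is plain JDK usage under assumed names, not any Jena abstraction. The second method shows the alternative of building a basic-auth header by hand.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.http.HttpClient;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: plain JDK, hypothetical class and method names.
final class AuthClientSketch {
    /** A client that answers authentication requests with the given login. */
    static HttpClient clientWithLogin(String user, String password) {
        Authenticator authenticator = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password.toCharArray());
            }
        };
        return HttpClient.newBuilder().authenticator(authenticator).build();
    }

    /** Value for an "Authorization" header when basic auth is sent without waiting for a challenge. */
    static String basicHeader(String user, String password) {
        String userInfo = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical credentials; no request is made.
        HttpClient client = clientWithLogin("user", "pass");
        System.out.println("authenticator installed: " + client.authenticator().isPresent());
    }
}
```

With Apache HttpClient the equivalent setup goes through a different set of classes, which is the migration impact for applications that configure authentication themselves.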

     Andy

Re: SPARQL operations [was: Evolving HttpClient usage]

Posted by Andy Seaborne <an...@apache.org>.
Merged!



Re: SPARQL operations [was: Evolving HttpClient usage]

Posted by Andy Seaborne <an...@apache.org>.
Bruno - thank you for the reviews

     Andy


Re: SPARQL operations [was: Evolving HttpClient usage]

Posted by Andy Seaborne <an...@apache.org>.
To add:

I've checked by compiling RDF Delta against a build of this branch.

Part of the changes is switching from Apache HttpClient v4 to use 
java.net.http.

Apache HttpClient is still a dependency at the moment.

The biggest change is to HttpOp, as used in test classes. The class has 
intentionally been renamed as HttpOp1 (think: an extreme @deprecated) 
and a new, smaller, more focused HttpOp exists in a different package.

     Andy


Re: SPARQL operations [was: Evolving HttpClient usage]

Posted by Andy Seaborne <an...@apache.org>.
It is time to consider putting this in.

 > Suggestion: so as not to make this a hard-to-reverse change,
 > + Create a new branch in git with this PR
 > + Change the overnight development build to build from this branch.
 >
 > so the main branch is left in the current state.

That is all very well, but it makes it tough for external contributions 
(e.g. hopefully JENA-2169) that go anywhere near the changed files. 
I've been resolving conflicts but it is fiddly and error-prone.

Is it Jena5?
It isn't intended to be a change of that scale and I hope the majority of 
users are not affected; maybe some deprecations appear. Fuseki users 
aren't impacted.

Downstream systems digging into the internals of Jena will be more 
impacted where very old code has been updated but, as we know, they are 
quite unlikely to test before a release.

     Andy

On 29/07/2021 14:36, Andy Seaborne wrote:
> This is about ready.
> 
> It's big.
> 
> == tl;dr
> 
> + Put on new branch
> + Switch the development Jenkins build to the new branch so people can 
> test it prior to release (and smooth out any unexpected bumps)
> + Applications using HTTP authentication need to change.
> 
> ==
> 
> It will affect applications that use HTTP auth because, up until now, 
> they have had to configure an AHC HttpClient ("AHC" = "Apache 
> HttpComponents") externally to Jena.
> 
> There are builders for all local and HTTP forms of RDFConnection, 
> QueryExecution and UpdateExecution. In fact, using builders is preferred 
> to the old factories. The factories remain where they still make sense, 
> and they use the builder methods.
> 
> == Initial bindings => substitution
> 
> "Substitution" is rewriting a query/update, replacing variables with 
> values. It is a before-execution rewrite.
> 
> QueryExecution still allows initial bindings and timeouts to be set, 
> although that is better done by the companion builder.
> 
> Query and update rewrite by substitution is uniformly provided. While 
> initialBinding is still there for local operation (never supported for 
> remote operations), substitution is now available for local and remote.
> 
> Substitution does not always give the same answers, though it does in 
> most cases, so this is a long term migration.
> 
> == java.net.http.HttpClient
> 
> java.net.http.HttpClient is different to AHC HttpClient.
> 
> java.net.http.HttpClient:
> * It is more like a combination of AHC HttpClient and HttpContext.
> * There are no pool or cache options - connection reuse is inside the 
> JDK code; there is no support for caching (application responsibility), 
> which is fine for Jena's use.
> * It supports one form of basic auth usage (a pattern useful with 
> microservices).
> * It has HTTP/2 support.
> 
> AHC HttpClient has not been removed as a dependency. Some old code 
> remains (HttpQuery, QueryEngineHTTP) to lessen the immediate changes. 
> jena-jdbc-remote is not changed and still uses AHC HttpClient.
> 
> == Authentication
> 
> The PR code adds authentication options:
> 
> 1/ The java.net.http.HttpClient with auth model.
> 2/ Challenge-based basic authentication.
> 3/ Challenge-based digest authentication.
> 4/ user@ form of userinfo in URLs (specifically for SERVICE)
> 5/ user:password@ form of userinfo in URLs (specifically for SERVICE)
> 
> All these are best done over https but that depends on the server end.
> (5) is not a good idea but we have to live with it.
> 
> There is now a thin abstraction to manage username/password and the 
> applications no longer have to deal with HttpClient directly.
> 
> == HttpOp, HttpRDF
> 
> HttpOp - a library of packed up usage of HTTP request for ways that Jena 
> uses HTTP - is moved to HttpOpAHC.
> 
> There is a new HttpOp, in a new package, together with a companion 
> HttpRDF, for java.net.http.HttpClient and the new ways that Jena uses HTTP.
> 
> There is a separate GSP client so all the code for GSP is in one place.
> 
> HttpOp is the thing that has impacted RDF Delta, and then mostly in 
> tests where test code does direct HTTP actions to validate the server 
> behaviour.
> 
> == From here
> 
> It would be a good idea to have this exposed before a Jena release. We 
> can at least give people the chance to see what impact, if any, it has 
> on them. Local usage is not supposed to be impacted.
> 
> But unusual/unexpected usage patterns may not work with zero change. 
> Some of this code is very old.
> 
> Suggestion: so as not to make this a hard-to-reverse change,
> + Create a new branch in git with this PR
> + Change the overnight development build to build from this branch.
> 
> so the main branch is left in the current state.
> 
> 
> == Other
> 
> This is also a chance to improve naming of API functions/methods/classes 
> through deprecation migration. Suggestions welcome.
> 
> 
> On 09/07/2021 17:37, Andy Seaborne wrote:
>> Epic JENA-2125 to track this with tickets for each part.
>>
>>  >       ResultSet(resources) - RowSet (Nodes)
>>  >       RDFConnection - RDFLink
>>  >       QueryExecution - QueryExec
>>
>>      Andy

SPARQL operations [was: Evolving HttpClient usage]

Posted by Andy Seaborne <an...@apache.org>.
This is about ready.

It's big.

== tl;dr

+ Put on new branch
+ Switch the development Jenkins build to the new branch so people can test 
it prior to release (and smooth out any unexpected bumps)
+ Applications using HTTP authentication need to change.

==

It will affect applications that use HTTP auth because, up until now, 
they have had to configure an AHC HttpClient ("AHC" = "Apache 
HttpComponents") externally to Jena.

There are builders for all local and HTTP forms of RDFConnection, 
QueryExecution and UpdateExecution. In fact, using builders is preferred 
to the old factories. The factories remain where they still make sense, 
and they now use the builder methods.

== Initial bindings => substitution

"Substitution" is rewriting a query/update, replacing variables with 
values. It is a before-execution rewrite.

QueryExecution still allows initial bindings and timeouts to be set 
although that is better done by the companion builder.

Query and update rewrite by substitution is uniformly provided. While 
initialBinding is still there for local operation (never supported for 
remote operations), substitution is now available for local and remote.

Substitution does not always give the same answers as initial bindings, 
though it does in most cases, so this is a long term migration.
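
To make "before-execution rewrite" concrete, here is a toy sketch. It is 
not the Jena code: the class and method names are made up for this mail, 
and it works on the query string with a regex, whereas a real 
implementation would work on the parsed query. It ignores string 
literals, comments and the SELECT projection, so treat it only as an 
illustration of the idea that the variable is gone before anything is 
executed or sent.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only - not part of Jena.
public class SubstituteSketch {
    // Matches ?name style variables. Toy pattern: it does not know about
    // string literals, IRIs containing '?', or comments.
    private static final Pattern VAR = Pattern.compile("\\?([A-Za-z_][A-Za-z0-9_]*)");

    // Replace each variable that has an entry in 'values' with the given
    // term, already written in SPARQL syntax (e.g. "<http://example/s>").
    public static String substitute(String query, Map<String, String> values) {
        Matcher m = VAR.matcher(query);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Variables without a value are left as they are.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = "SELECT * WHERE { ?s ?p ?o }";
        System.out.println(substitute(q, Map.of("s", "<http://example/s>")));
    }
}
```

Because the result is just another query, the same rewritten query can 
be run locally or sent to a remote endpoint.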

== java.net.http.HttpClient

java.net.http.HttpClient is different to AHC HttpClient.

java.net.http.HttpClient:
* It is more like a combination of AHC HttpClient and HttpContext.
* There are no pool or cache options - connection reuse is inside the 
JDK code; there is no support for caching (application responsibility), 
which is fine for Jena's use.
* It supports one form of basic auth usage (a pattern useful with 
microservices).
* It has HTTP/2 support.
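
For anyone who has not used it, a minimal sketch of setting up the JDK 
client follows. It only uses java.net.http and java.net classes; the 
class name, user name, password and timeout are placeholders I have 
picked for the example, not anything from the PR. The JDK Authenticator 
is the client's own way of being given credentials.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical illustration only - plain JDK, no Jena classes.
public class JdkClientSketch {
    // Build a client that prefers HTTP/2 and carries credentials.
    public static HttpClient build(String user, char[] password) {
        Authenticator auth = new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password);
            }
        };
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)      // falls back to HTTP/1.1 if needed
                .connectTimeout(Duration.ofSeconds(10))  // placeholder value
                .followRedirects(HttpClient.Redirect.NORMAL)
                .authenticator(auth)
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = build("user", "password".toCharArray());
        System.out.println(client.version());
    }
}
```

Note there is nothing here about connection pools or caches; the builder 
has no such settings.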

AHC HttpClient has not been removed as a dependency. Some old code 
remains (HttpQuery, QueryEngineHTTP) to lessen the immediate changes. 
jena-jdbc-remote is not changed and still uses AHC HttpClient.

== Authentication

The PR code adds authentication options:

1/ The java.net.http.HttpClient with auth model.
2/ Challenge-based basic authentication.
3/ Challenge-based digest authentication.
4/ user@ form of userinfo in URLs (specifically for SERVICE)
5/ user:password@ form of userinfo in URLs (specifically for SERVICE)

All these are best done over https but that depends on the server end.
(5) is not a good idea but we have to live with it.
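
As an illustration of what (5) involves, the sketch below pulls 
user:password out of a URL with java.net.URI, produces the value of a 
basic auth "Authorization" header, and gives the URL with the userinfo 
removed. The class and method names and the example URL are invented for 
this mail and are not the PR's API. For the user@ form (4) there is no 
password in the URL, so this sketch returns null and the password would 
have to come from somewhere else.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only - not the code in the PR.
public class UserInfoSketch {
    // Value for an "Authorization" header, or null if the URL does not
    // carry both user and password.
    public static String basicAuthHeader(String url) {
        String userInfo = URI.create(url).getUserInfo();
        if (userInfo == null || !userInfo.contains(":"))
            return null;
        byte[] bytes = userInfo.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    // The same URL without the userinfo part, suitable for the request line.
    public static String stripUserInfo(String url) {
        URI u = URI.create(url);
        try {
            return new URI(u.getScheme(), null, u.getHost(), u.getPort(),
                           u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        String url = "https://user:password@example.org/sparql";
        System.out.println(basicAuthHeader(url));
        System.out.println(stripUserInfo(url));
    }
}
```

Base64 is an encoding, not encryption, which is why this is only 
reasonable over https.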

There is now a thin abstraction to manage username/password and the 
applications no longer have to deal with HttpClient directly.

== HttpOp, HttpRDF

HttpOp - a library that packages up HTTP requests for the ways that Jena 
uses HTTP - is moved to HttpOpAHC.

There is a new HttpOp, in a new package, together with a companion 
HttpRDF, for java.net.http.HttpClient and the new ways that Jena uses HTTP.
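
To give a feel for the kind of thing such a helper packages up, here is 
a sketch of building a SPARQL query request with java.net.http. It is 
not the HttpOp API: the class name, method name and endpoint URL are 
made up, and it only builds the request without sending it. It follows 
the SPARQL protocol's "query via GET" form with a URL-encoded query 
parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only - not the HttpOp API.
public class SparqlRequestSketch {
    // Build (but do not send) a GET request for a SPARQL query.
    public static HttpRequest queryGet(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        URI uri = URI.create(endpoint + "?query=" + encoded);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint.
        HttpRequest req = queryGet("http://localhost:3030/ds/query", "ASK {}");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending it would then be a call to HttpClient.send or sendAsync with a 
suitable body handler.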

There is a separate GSP client so all the code for GSP is in one place.

The HttpOp change is the thing that has impacted RDF Delta, and then 
mostly in tests where test code does direct HTTP actions to validate the 
server behaviour.

== From here

It would be a good idea to have this exposed before a Jena release. We 
can at least give people the chance to see what impact, if any, it has 
on them. Local usage is not supposed to be impacted.

But unusual/unexpected usage patterns may not work with zero change. 
Some of this code is very old.

Suggestion: so as not to make this a hard-to-reverse change,
+ Create a new branch in git with this PR
+ Change the overnight development build to build from this branch.

so the main branch is left in the current state.


== Other

This is also a chance to improve naming of API functions/methods/classes 
through deprecation migration. Suggestions welcome.


On 09/07/2021 17:37, Andy Seaborne wrote:
> Epic JENA-2125 to track this with tickets for each part.
> 
>  >       ResultSet(resources) - RowSet (Nodes)
>  >       RDFConnection - RDFLink
>  >       QueryExecution - QueryExec
> 
>      Andy
> 

Re: Evolving HttpClient usage

Posted by Andy Seaborne <an...@apache.org>.
Epic JENA-2125 to track this with tickets for each part.

 >       ResultSet(resources) - RowSet (Nodes)
 >       RDFConnection - RDFLink
 >       QueryExecution - QueryExec

     Andy
