You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@jena.apache.org by GitBox <gi...@apache.org> on 2022/03/08 18:31:46 UTC

[GitHub] [jena] Aklakan opened a new pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Aklakan opened a new pull request #1218:
URL: https://github.com/apache/jena/pull/1218


   This PR provides a streaming implementation for json result rows according to [JENA-2302](https://issues.apache.org/jira/projects/JENA/issues/JENA-2302).
   In this initial commit the `RowSetReaderJSON:process` method has been modified to delegate to the new streaming approach.
   
   All existing tests in ARQ are green but there may yet be some uncaught corner cases.
   In particular, a test case where 'head' comes after the 'results' key needs yet to be provided.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings would be displayed when streaming xml.
   
   Needs more thought how to handle this properly - maybe QueryExecHttp should have a `warnOnRemainingData` flag where this class itself checks for (a configurable) amount of remaining data before emitting the warning and calling close on the underlying HttpClient-backed InputStream which may then either consume the remaining data or cut the connection.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066051336


   When I opened [JENA-2309](https://issues.apache.org/jira/projects/JENA/issues/JENA-2309) I realized that actually I should slightly refactor the method `RowSetJSONStreaming.moveToNext` to build upon an ClosableIterator<RsJsonElt> which would expose the head, binding and boolean elements - akin to AsyncParser.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`.
   * or drop the ability to handle severity of violations and treat violations with a fixed severity. In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   
   Or what do you have a better suggestion?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504


   Note that in the PR, the RowSetJSONStreaming implementation is independent of gson. Well, there are gson-specific static util functions (which might as well be made all private for the time being), but those methods are only used to implement `IteratorRsJSON` 'encoder' (maybe there is a better name for it) - i.e. the mapper of elements from the json reader to domain events.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1064106157


   First comments:
   
   #### Headers
   
   Two files need the license header added. The build runs Apache RAT (Release Audit Tool) to check license headers are present and fails the build if they are not.
   
   jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetJSONStreamingBenchmark.java  
   jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetReaderJSONStreaming.java
   
   after which the build passes. 
   
   The build is `mvn clean install` -- there is a github action if you want to run it that way.
   
   #### Warnings in the codebase
   
   The code has warnings - could you deal with them please:
   ```
   RowSetJSONStreaming.java
   Javadoc: BufferedRowSet cannot be resolved to a type
   
   RowSetJSONStreaming.java
   The import com.google.gson.stream.MalformedJsonException is never used
   
   RowSetJSONStreamingBenchmark.java
   The import java.io.IOException is never used
   
   RowSetJSONStreamingBenchmark.java
   The import java.net.MalformedURLException is never used
   
   RowSetReaderJSON.java
   The import org.apache.jena.sparql.exec.RowSetBuffered is never used
   ```
   #### History
   
   When it is time to merge this, then the commits will need to be squashed down to one commits, or a few commits if a split makes sense, with "JENA-2302: " at the start of commit message. Just one "JENA-2302: Streaming JSON results" is fine. The details can be here and in JIRA.  
   
   The aim is to make the commits meaningful to someone looking at the history is 1-5 years time. Detailed development commits don't stand the test of time.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067449956


   I added a few contrived test cases to test the validation and demonstrate the configuration.
   
   The streaming json implementation causes tests of `org.apache.jena.jdbc.results.AbstractResultSetTests`run from `jena-jdbc-driver-remote` to raise warnings. This is because the test suite expects certain tests to fail, however `QueryExecHTTP` checks connections for consumptions; which obviously no longer occurs due to the streaming nature.
   HttpClient's default behavior of consuming remaining data on close is a real pain: If an spo query fails midway then even with the streaming json implementation the http client might needlessly retrieve all remaining data. I once made a [PR to Apache VFS2](https://issues.apache.org/jira/browse/VFS-805#) (which was accepted and by now I can confirm that it still works) for exactly issue for apache http client; not sure if the solution is applicable to java's native client but maybe it helps.
   
   ```java
   if (retainedConnection.read() != -1)
       Log.warn(this, "HTTP response not fully consumed, if HTTP Client is reusing connections (its default behaviour) then it will consume the remaining response data which may take a long time and cause this application to become unresponsive");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1075316223


   Error handling discussion:
   
   Error handling is trying to communicate the general consequences of some condition. Often, the application code is a long way away from the reader so details of the format are not understood by the application. In Fuseki, they are logged. Error handling and especially "warning", overlaps with logging. Even this "client" side code would do the same for federated queries.
   
   In the parsers, errors generate exceptions so they are not recoverable; there are possibly occasionally exceptions to this, sometimes recovery is possible but the output should now be considered "wrong".
   
   Unfortunately, this isn't a free choice. A warning is something the code notes but will continue to process, an error is serious enough that the results are suspect at best. Turing error into warning changes the contract.
   
   In RowSetReaderStream, a situation like "Encountered bindings as well as boolean result" is an error. The incoming results JSON is illegal. The fault is at the server end. Any fixup is a decision as to which header to believe and is throwing the other choice away.
   
   All the calls to `ErrorHandlers.relay` are serious errors, except, arguably, finding extra elements in the JSON which the reader doesn't understand. (e.g. JSON "comments"!!)
   
   I tried to refactor to centralize error handling so the the processing code call "error(details)"  but it became too big a change.
   
   I think allowing for fine-grained handling would benefit from an explicit error record of an enum "error kind", it's classification "error"/"warning" and the specific message. ValidationSettings enumerates the possible errors as statics.
   
   The enum can be used to process the error which as looking up into a table of what is currently called Severity. The Severity class is how the caller wants to treat the issue arising. We need something to reflect the code raising the issue.
   
   Related: logging system have "markers" to allow related log messages. Jena does not use markers - but the concept seems useful for a more detail error handling extension.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504


   Note that in the PR, the RowSetJSONStreaming implementation is independent of gson. Well, there are gson-specific static util functions (which might as well be made all private for the time being), but those methods are only used to implement `IteratorRsJSON` 'encoder' (maybe there is a better name for it) - i.e. the mapper of elements from the json reader to domain events.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066127948


   > For a cleaner integration I have added a little extension to the riot system
   
   `ErrorHandler` is an interface. An error handler implementation can send warnings/errors to an event system by being an implementation. Do we need a separate concept "relays" and `ErrorHandlerEvent` for this one result set reader?
   
   With `IGNORE` etc,. it feel like this is adding something more like logging. Error handlers are called only exception circumstances where things are wrong in some way.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067449956


   I added a few contrived test cases to test the validation and demonstrate the configuration including the IGNORE setting.
   
   I discovered a quirk related to this PR: The streaming json implementation causes tests of `org.apache.jena.jdbc.results.AbstractResultSetTests`run from `jena-jdbc-driver-remote` to emit warnings during `mvn install`. This is because the test suite expects certain tests to fail, however `QueryExecHTTP` checks connections for consumptions; which obviously no longer occurs due to the streaming nature.
   HttpClient's default behavior of consuming remaining data on close is a real pain: If an spo query fails midway then even with the streaming json implementation the http client might needlessly retrieve all remaining data. I once made a [PR to Apache VFS2](https://issues.apache.org/jira/browse/VFS-805#) for exactly this issue with apache's http client (which was accepted and by now I can confirm that it still works); not sure if the solution is applicable to java's native client but maybe it helps.
   
   QueryExecHTTP raises the warning in this snippet:
   ```java
   if (retainedConnection.read() != -1)
       Log.warn(this, "HTTP response not fully consumed, if HTTP Client is reusing connections (its default behaviour) then it will consume the remaining response data which may take a long time and cause this application to become unresponsive");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`. It feels wrong placing error handlers into the context rather than severity constants.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   * Drop the ValidationSettings class which removes the ability to handle severity of violations individually.  In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * Drop ValidationSettings and also drop the error handler and raise hard coded exceptions. So essentially removing the ability to configure validation.
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system.
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class
   * the class `ErrorHandlers` with a `relay` method for how to relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONSteaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.
   
   Is there a place in the jena code base for for benchmarking components? Right now my test benchmark runner is part of the PR though its in the wrong package and needs a revision as to not use an external resource.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system.
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class
   * the class `ErrorHandlers` with a `relay` method how te relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONSteaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system which should pretty much complete the PR content wise.
   Also, `RowSetReaderJSONStreaming` now defines the `rsJsonSearchHeadEagerly ` symbol which gives control over whether to actively search for a `head` key on row set construction or only lazily when requesting it via `getResultVars`. Default is lazy.
   
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class (Perhaps rename to `ErrorEvent`?)
   * the class `ErrorHandlers` with a `relay` method for how to relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONStreaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.
   
   Is there a place in the jena code base for for benchmarking components? Right now my test benchmark runner is part of the PR though its in the wrong package and needs a revision as to not use an external resource.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1070984517


   https://www.overops.com/blog/the-ultimate-json-library-json-simple-vs-gson-vs-jackson-vs-json/
   
   Not that raw speed is likely to be the limiting factor streaming at scale (writing is usually slower than reading) but in the 2021 update at the link above, GSON is faster for small and large files.
   
   YMMV.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings would be displayed when streaming xml.
   
   Needs more thought how to handle this properly - maybe QueryExecHttp should have a `warnOnRemainingData` flag where this class itself checks for (a configurable) amount of remaining data before emitting the warning and calling close on the underlying HttpClient-backed InputStream which may then either consume the remaining data or cut the connection. Closing a query execution early due to a processing error is not so uncommon.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1073283375


   The implementation certainly streams. I did a test of reading 50 million results with Fuseki and client in one JVM.
   
   The CPU was 130% so true parallelism, and the log messages from Fuseki came out at the expected points. 
   The heap size was limited to 2Gbytes but the results are much larger (>8G) and there were no heap issues.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a change in pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on a change in pull request #1218:
URL: https://github.com/apache/jena/pull/1218#discussion_r823753353



##########
File path: pom.xml
##########
@@ -381,6 +381,13 @@
         <version>${ver.dexxcollection}</version>
       </dependency>
       
+      <!-- For RowSetJSONStreaming -->

Review comment:
       No need for this. By including it, it is reasonable to use it anywhere now.

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/JSONResultsKW.java
##########
@@ -24,37 +24,37 @@
 
 public class JSONResultsKW
 {
-    public static String kHead          = "head" ;
-    public static String kVars          = "vars" ;
-    public static String kLink          = "link" ;
-    public static String kResults       = "results" ;
-    public static String kBindings      = "bindings" ;
-    public static String kType          = "type" ;
-    public static String kUri           = "uri"  ;
-
-    public static String kValue         = "value" ;
-    public static String kLiteral       = "literal" ;
-    public static String kUnbound       = "undef" ;
+    public static final String kHead          = "head" ;

Review comment:
       👍🏻 

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/RowSetReaderRegistry.java
##########
@@ -57,7 +57,8 @@ public static void init() {
         initialized = true;
 
         register(RS_XML,        RowSetReaderXML.factory);
-        register(RS_JSON,       RowSetReaderJSON.factory);

Review comment:
       Leave in-place and commented out. In time, after a release or two, we can drop that but it servers as a quick-fix marker for now.

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/RowSetReaderRegistry.java
##########
@@ -57,7 +57,8 @@ public static void init() {
         initialized = true;
 
         register(RS_XML,        RowSetReaderXML.factory);
-        register(RS_JSON,       RowSetReaderJSON.factory);
+        // register(RS_JSON,       RowSetReaderJSON.factory);
+        register(RS_JSON,       RowSetReaderJSONStreaming.factory); // Experimental! JENA-2302

Review comment:
       Remove "experimental" comment.

##########
File path: pom.xml
##########
@@ -381,6 +381,13 @@
         <version>${ver.dexxcollection}</version>
       </dependency>
       
+      <!-- For RowSetJSONStreaming -->
+      <dependency>
+        <groupId>com.google.code.gson</groupId>
+        <artifactId>gson</artifactId>
+        <version>2.9.0</version>

Review comment:
       Please could you follow the pattern of putting the version number in as a `<property>`.

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetReaderJSONStreaming.java
##########
@@ -0,0 +1,50 @@
+package org.apache.jena.riot.rowset.rw;
+
+import java.io.InputStream;
+import java.util.Objects;
+
+import org.apache.jena.riot.resultset.ResultSetLang;
+import org.apache.jena.riot.rowset.RowSetReader;
+import org.apache.jena.riot.rowset.RowSetReaderFactory;
+import org.apache.jena.sparql.exec.QueryExecResult;
+import org.apache.jena.sparql.exec.RowSetBuffered;
+import org.apache.jena.sparql.resultset.ResultSetException;
+import org.apache.jena.sparql.util.Context;
+
+public class RowSetReaderJSONStreaming
+    implements RowSetReader
+{
+    public static final RowSetReaderFactory factory = lang -> {
+        if (!Objects.equals(lang, ResultSetLang.RS_JSON ) )
+            throw new ResultSetException("RowSet for JSON asked for a "+lang);
+        return new RowSetReaderJSONStreaming();
+    };
+
+    @Override
+    public QueryExecResult readAny(InputStream in, Context context) {
+        return process(in, context);
+    }
+

Review comment:
       The codebase has one blank line between methods.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066127948


   > For a cleaner integration I have added a little extension to the riot system
   
   `ErrorHandler` is an interface. The contact is synchronous though an error handler implementation can send warnings/errors to an event system by being an implementation and hence asynchronous. Do we need a separate concept "relays" and `ErrorHandlerEvent` for this one result set reader?
   
   With `IGNORE` etc,. it feel like this is adding something more like logging. Error handlers are called only exception circumstances where things are wrong in some way.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`. It feels wrong placing error handlers into the context rather than severity constants.
   * Drop the ValidationSettings class which removes the ability to handle severity of violations individually.  In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system which should pretty much complete the PR content wise.
   Also, `RowSetReaderJSONStreaming` now defines the `rsJsonSearchHeadEagerly ` symbol which gives control over whether to actively search for a `head` key on row set construction or only lazily when requesting it via `getResultVars`. Default is lazy.
   
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class
   * the class `ErrorHandlers` with a `relay` method for how to relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONSteaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.
   
   Is there a place in the jena code base for for benchmarking components? Right now my test benchmark runner is part of the PR though its in the wrong package and needs a revision as to not use an external resource.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`. It feels wrong placing error handlers into the context rather than severity constants.
   * or drop the ability to handle severity of violations and treat violations with a fixed severity. In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1070984517






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings should be displayed when streaming xml. Needs more thought how to deal with it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings should be displayed when streaming xml. Needs more thought how to deal with it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504


   Note that in the PR, the `RowSetJSONStreaming` implementation (which does accumulation and validation) is independent of gson. Well, there are gson-specific static util functions (which might as well be made all private for the time being), but those methods are only used to implement `IteratorRsJSON`'s 'encoder' (maybe there is a better name for it; such as `DeserializationStrategy`) - i.e. the mapper of elements from the json reader to domain events.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066719174


   That can be done with an `ErrorHandler`. The `ErrorHandler` is the policy of what to do with notification from the application code. The resultset reader calls warning if it can recover and error if it can't or the results are not trusted, and fatal for catastrophic failure. 
   
   Downgrading or ignoring "error" is not matter of filtering. "Error" is more significant.
   
   Presumably, Virtuoso was fixed long ago - it hasn't been mentioned on stackoverflow or users@. Additional "maybe" features become technical debt. Unlike your big data processing platform, we don't know who uses what and in what way.
   
   If you wish to propose an enhancement to error handlers then great - it should apply to all places error handler are used and have an agreed design. (It seems to me it is introducing error numbers and logging markers, for example.)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067449956


   I added a few contrived test cases to test the validation and demonstrate the configuration.
   
   The streaming json implementation causes tests of `org.apache.jena.jdbc.results.AbstractResultSetTests`run from `jena-jdbc-driver-remote` to raise warnings. This is because the test suite expects certain tests to fail, however `QueryExecHTTP` checks connections for consumptions; which obviously no longer occurs due to the streaming nature.
   HttpClient's default behavior of consuming remaining data on close is a real pain: If an spo query fails midway then even with the streaming json implementation the http client might needlessly retrieve all remaining data. I once made a [PR to Apache VFS2](https://issues.apache.org/jira/browse/VFS-805#) for exactly issue for apache http client; not sure if the solution is applicable to java's native client but maybe it helps.
   
   ```java
   if (retainedConnection.read() != -1)
       Log.warn(this, "HTTP response not fully consumed, if HTTP Client is reusing connections (its default behaviour) then it will consume the remaining response data which may take a long time and cause this application to become unresponsive");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066051336


   Now that I opened [JENA-2309](https://issues.apache.org/jira/projects/JENA/issues/JENA-2309) I realized that actually I should slightly refactor the method `RowSetJSONStreaming.moveToNext` to build upon a `ClosableIterator<RsJsonElt>` which would expose the head, binding and boolean elements - akin to AsyncParser.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`. It feels wrong placing error handlers into the context rather than severity constants.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   * Drop the ValidationSettings class which removes the ability to handle severity of violations individually.  In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * Drop ValidationSettings and also drop the error handler and raise hard coded exceptions. So essentially removing the ability to configure validation.
   * Make the ValidationSettings and the error handler stuff private in RowSetJSONStreaming for now until there is some future consensus.
   * Or is the primary concern the IGNORE constant which should be removed because a violation must at least be logged but never be 'ignored'?
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504


   Note that in the PR, the `RowSetJSONStreaming` implementation (which does accumulation and validation) is independent of gson. Well, there are gson-specific static util functions (which might as well be made all private for the time being), but those methods are only used to implement `IteratorRsJSON`'s 'encoder' (maybe there is a better name for it) - i.e. the mapper of elements from the json reader to domain events.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`.
   * or drop the ability to handle severity of violations and treat violations with a fixed severity. In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504


   Note that in the PR, the RowSetJSONStreaming implementation is independent of gson. Well, there are gson-specific static util functions (which might as well be made all private for the time being), but those methods are only used to implement `IteratorRsJSON`'s 'encoder' (maybe there is a better name for it) - i.e. the mapper of elements from the json reader to domain events.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071105172


   GSON is small and has an Apache License 2.0. In terms of adding a new dependency, it is as easy as it gets.
   
   titanium-json-ld uses jakarta.json - that's Eclipse Public License which is [category B](https://www.apache.org/legal/resolved.html#category-b). Odd that it isn't the Eclipse Distribution License which is BSD-ish. jena has already added the necessary text into the LICENSE in the distributions (zips). The shad plugin automatically includes all the licenses used by a combined jars.
   
   Jackson is AL2. It's quite a bit bigger.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`. It feels wrong placing error handlers into the context rather than severity constants.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   * Drop the ValidationSettings class which removes the ability to handle severity of violations individually.  In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * Drop ValidationSettings and also drop the error handler and raise hard coded exceptions. So essentially removing the ability to configure validation. (Or at least make it all private in RowSetJSONStreaming for now until there is some future agreement).
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1075308694


   > What is your suggestion?
   
   Here is a diff for suggestion of what to do about the more sophisticated error handling. 
   
   [Diff-1218-afs-1.txt](https://github.com/apache/jena/files/8325574/Diff-1218-afs-1.txt)
   
   There is a new package `org.apache.jena.riot.rowset.rw.rs_json` with the additional machinery being introduced for JSON result set reading. This leaves `RowSetJSONStreaming` in the `rowset.rw` package.
   
   It puts the additional error handler classes there, moving them out of the riot/system package. All the PR code is there without changing the system-wide API, which otherwise would, to me, be promising the facilities are available on other readers which is not the case. In Jena, with long version tail and the number of users, it is easier to add things than remove them.
   
   The diff moves some inner classes where they are shared across classes into their own top-level files. There are some scope changes to package/private.
   
   We can continue discussing changing the system error handling while still getting the reader capabilities into the codebase.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066051336


   Now that I opened [JENA-2309](https://issues.apache.org/jira/projects/JENA/issues/JENA-2309) I realized that actually I should slightly refactor the method `RowSetJSONStreaming.moveToNext` to build upon an ClosableIterator<RsJsonElt> which would expose the head, binding and boolean elements - akin to AsyncParser.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system which should pretty much complete the PR content wise.
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class
   * the class `ErrorHandlers` with a `relay` method for how to relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONSteaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.
   
   Is there a place in the jena code base for for benchmarking components? Right now my test benchmark runner is part of the PR though its in the wrong package and needs a revision as to not use an external resource.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066051336


   Now that I opened [JENA-2309](https://issues.apache.org/jira/projects/JENA/issues/JENA-2309) I realized that it would be possible to slightly refactor the method `RowSetJSONStreaming.moveToNext` to build upon a `ClosableIterator<RsJsonElt>` which would expose the head, binding and boolean elements - akin to AsyncParser's low-lever API.
   
   However since the only element worth repeating is the binding, making that change would introduce extra overhead on that - so its making a nicer architecture that can account for very niche cases at the cost of extra computational overhead (wrapping each binding in an extra JsonRsElt event). Maybe its not worth it.
   
   I started looking into this after all - so please don't (re)review the latest changes yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings would be displayed when streaming xml.
   
   Needs more thought how to handle this properly - maybe QueryExecHttp should have a `warnOnRemainingData` option whose value is the number of bytes this class should read before issuing a warning and calling close on theHttpClient-backed InputStream; which may then either consume the remaining data or cut the connection. Closing a query execution early due to a processing error is not so uncommon.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings would be displayed when streaming xml.
   
   Needs more thought how to handle this properly - maybe QueryExecHttp should have a `warnOnRemainingData` option whose value is the number of bytes this class should read before issuing a warning and calling close on the HttpClient-backed InputStream; which may then either consume the remaining data or cut the connection. Closing a query execution early due to a processing error is not so uncommon.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071105172


   GSON is small and has an Apache License 2.0. In terms of adding a new dependency, it is as easy as it gets.
   
   titanium-json-ld uses jakarta.json - that's Eclipse Public License which is [category B](https://www.apache.org/legal/resolved.html#category-b). Odd that it isn't the Eclipse Distribution License which is BSD-ish. jena has already added the necessary text into the LICENSE in the distributions (zips). The shade plugin automatically includes all the licenses used by a combined jars.
   
   Jackson is AL2. It's quite a bit bigger.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066143980


   The aforementioned update is done. From my side the architecture is now complete.
   
   > With `IGNORE` etc,. it feel like this is adding something more like logging. Error handlers are called only exception circumstances where things are wrong in some way.
   
   The rationale is that this gives the possibility to decide on a per use-case basis on how severe certain violations should be considered. I think it was some legacy version of virtuoso that returned empty json for empty select results. The default severity in RowSetJSONStreaming is ERROR so such "ErrorEvents" will be relayed to an ErrorHandler's error method that will then typically raise an exception. With this update you can set the severity to WARNING which will log a warning or IGNORE which won't report anything.
   
   As another example, what to do if there are multiple 'results' elements in the json? Is it an error? Issue a warning? Ignore because 'things on the Web happened' and for some special use case it is fine that the parser just continues to emit bindings?
   
   You have to actively change properties in the context though to get this behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066143980


   The aforementioned update is done. From my side the architecture is now complete.
   
   > With `IGNORE` etc,. it feel like this is adding something more like logging. Error handlers are called only exception circumstances where things are wrong in some way.
   
   The rationale is that this gives the possibility to decide on a per use-case basis on how severe certain violations should be considered. I think it was some legacy version of virtuoso that returned empty json for empty select results. The default severity in RowSetJSONStreaming is ERROR so such "ErrorEvents" will be relayed to an ErrorHandler's error method that will then raise an exception. With this update you can set the severity to WARNING which will log a warning or IGNORE which won't report anything.
   
   As another example, what to do if there are multiple 'results' elements in the json? Is it an error? Issue a warning? Ignore because 'things on the Web happen' and the parser will just continue to emit bindings?
   
   You have to actively change properties in the context though to get this behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067449956


   I added a few contrived test cases to test the validation and demonstrate the configuration including the IGNORE setting.
   
   I discovered a glitch related to this PR: The streaming json implementation causes tests of `org.apache.jena.jdbc.results.AbstractResultSetTests`run from `jena-jdbc-driver-remote` to emit warnings during `mvn install`. This is because the test suite expects certain tests to fail, however `QueryExecHTTP` checks connections for consumptions; which obviously no longer occurs due to the streaming nature.
   HttpClient's default behavior of consuming remaining data on close is a real pain: If an spo query fails midway then even with the streaming json implementation the http client might needlessly retrieve all remaining data. I once made a [PR to Apache VFS2](https://issues.apache.org/jira/browse/VFS-805#) for exactly this issue with apache's http client (which was accepted and by now I can confirm that it still works); not sure if the solution is applicable to java's native client but maybe it helps.
   
   QueryExecHTTP raises the warning in this snippet:
   ```java
   if (retainedConnection.read() != -1)
       Log.warn(this, "HTTP response not fully consumed, if HTTP Client is reusing connections (its default behaviour) then it will consume the remaining response data which may take a long time and cause this application to become unresponsive");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system which should pretty much complete the PR content wise.
   Also, `RowSetReaderJSONStreaming` now defines the `rsJsonSearchHeadEagerly ` symbol which gives control over whether to actively search for a `head` key on row set construction or only lazily when requesting it via `getResultVars`. Default is lazy.
   
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler. Note, that this name is ambiguous with `Severity` from shacl validation.
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class (Perhaps rename to `ErrorEvent`?)
   * the class `ErrorHandlers` with a `relay` method for how to relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONStreaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.
   
   Is there a place in the jena code base for for benchmarking components? Right now my test benchmark runner is part of the PR though its in the wrong package and needs a revision as to not use an external resource.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system.
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class
   * the class `ErrorHandlers` with a `relay` method for how to relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   (Should I have overlooked existing features for this when I searched the code base for this then it should hopefully be easy to update)
   
   The `RowSetReaderJSONSteaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066051336


   Now that I opened [JENA-2309](https://issues.apache.org/jira/projects/JENA/issues/JENA-2309) I realized that actually I should slightly refactor the method `RowSetJSONStreaming.moveToNext` to build upon a `ClosableIterator<RsJsonElt>` which would expose the head, binding and boolean elements - akin to AsyncParser's low-lever API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066143980


   The aforementioned update is done. From my side the architecture is now complete.
   
   > With `IGNORE` etc,. it feel like this is adding something more like logging. Error handlers are called only exception circumstances where things are wrong in some way.
   
   The rationale is that this gives the possibility to decide on a per use-case basis on how severe certain violations should be considered. I think it was some legacy version of virtuoso that returned empty json for empty select results. The default severity in RowSetJSONStreaming is ERROR so such "ErrorEvents" will be relayed to an ErrorHandler's error method that will then typically raise an exception. With this update you can set the severity to WARNING which will log a warning or IGNORE which won't report anything.
   
   As another example, what to do if there are multiple 'results' keys in the json? Is it an error? Issue a warning? Ignore because 'things on the Web happened' and for some special use case it is fine that the optimistic parser just continues to emit bindings?
   
   You have to actively change properties in the context though to get this behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was called) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`. It feels wrong placing error handlers into the context rather than severity constants.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   * Drop the ValidationSettings class which removes the ability to handle severity of violations individually.  In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * Drop ValidationSettings and also drop the error handler and raise hard coded exceptions. So essentially removing the ability to configure validation. (or at least make it all private in RowSetJSONStreaming).
   
   What is your suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067876323


   The HTTP response must be finished. It is a feature of HTTP.  HTTP/1.1 is not a streaming protocol. HTTP/2 is much better but it must be used in the right way - overlaying HTTP/1.1 does not expose the two-flow of data in HTTP/2.
   
   Sometimes, in HTTP/1.1, the client can close the TCP connection and open a new one. It messes up connection polling. TCP connections are expensive. There are client-wide and server-wide problems that arise. We know, it's happened. Jena's own test suite used to have these issues.
   
   Closing the connection does not cause the client to clear-up at the point of close, nor does the server see a close to clear-up a that moment..  On unloaded machines, it can take 2 minutes; under load the Linux kernel delays reclaim close-waiting connections (it's a GC like task). 
   
   The effect is either end can run out of file descriptors. 
   
   In HTTP/2, the situation is both better and worse. Better because there is access to the chunking and flow of chunks but that is only if HTTP/2 specific code is written. Worse because there is only a single TCP connection per host pair which has multiple outstanding requests in-flight and can't be closed. It can lead to head-of-line blocking.
   
   See javadoc for `BodySubscribers#ofInputStream()`
   
   Jena needs reliability before performance. And it is the corner cases, not the main/normal usage.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067449956


   I added a few contrived test cases to test the validation and demonstrate the configuration.
   
   The streaming json implementation causes tests of `org.apache.jena.jdbc.results.AbstractResultSetTests`run from `jena-jdbc-driver-remote` to raise warnings. This is because the test suite expects certain tests to fail, however `QueryExecHTTP` checks connections for consumptions; which obviously no longer occurs due to the streaming nature.
   HttpClient's default behavior of consuming remaining data on close is a real pain: If an spo query fails midway then even with the streaming json implementation the http client might needlessly retrieve all remaining data. I once made a [PR to Apache VFS2](https://issues.apache.org/jira/browse/VFS-805#) for exactly this issue with apache's http client (which was accepted and by now I can confirm that it still works); not sure if the solution is applicable to java's native client but maybe it helps.
   
   ```java
   if (retainedConnection.read() != -1)
       Log.warn(this, "HTTP response not fully consumed, if HTTP Client is reusing connections (its default behaviour) then it will consume the remaining response data which may take a long time and cause this application to become unresponsive");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066051336


   Now that I opened [JENA-2309](https://issues.apache.org/jira/projects/JENA/issues/JENA-2309) I realized that it would be possible to slightly refactor the method `RowSetJSONStreaming.moveToNext` to build upon a `ClosableIterator<RsJsonElt>` which would expose the head, binding and boolean elements - akin to AsyncParser's low-lever API.
   
   However since the only element worth repeating is the binding, making that change would introduce extra overhead on that - so its making a nicer architecture that can account for very niche cases at the cost of extra computational overhead (wrapping each binding in an extra JsonRsElt event). Maybe its not worth it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071105172


   GSON is small and has an Apache License 2.0. In terms of adding a new dependency, it is as easy as it gets.
   
   titanium-json-ld uses jakarta.json - that's Eclipse Public License which is [category B](https://www.apache.org/legal/resolved.html#category-b). Odd that it isn't the Eclipse Distribution License which is BSD-ish. jena has already added the necessary text into the LICENSE in the distributions (zips). The shade plugin automatically includes all the licenses used by a combined jars.
   
   Jackson is AL2. It's quite a bit bigger.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1073283375


   The implementation certainly streams. I did a test of reading 50 million results with Fuseki and client in one JVM.
   
   The CPU was 130% so true parallelism, and the log messages from Fuseki came out at the expected points. 
   The heap size was limited to 2Gbytes but the results are much larger (>8G) and there was heap issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1065913419


   There is now a big update on the validation system of the streaming json system.
   For a cleaner integration I have added a little extension to the riot system. Next to the `ErrorHandler` class there is now
   * the class `Severity` with constants that match the methods of ErrorHandler
   * the class `ErrorHandlerEvent` which captures the parameters of the ErrorHandler methods in one class
   * the class `ErrorHandlers` with a `relay` method how te relay a `ErrorHandlerEvent` to an `ErrorHandler`
   
   The `RowSetReaderJSONSteaming` validation system has now several settings with severity levels that relay violation events on that infrastructure.
   
   As for further testing of the default settings which should capture the existing behavior, I still to plan to write a little script that randomly modifies a template json result set and compares the parsing with the non streaming approach.
   
   Testing all possible combinations of validation settings would be an overkill though.
   
   The comments above should have all been adressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066097319


   Suggestion: prepare something that can go into the code base soon and then, as another PR, improve it.
   
   Otherwise, a complicated thing will not have had time to stabilize for 4.5.0.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1062198513


   I just added the first simple head-after-results test case (and fixed the issues that revealed).
   I will add a more sophisticated example tomorrow, but overall
   the code should now be ready to be tried out by any one feeling bold :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066143980


   The aforementioned update is done. From my side the architecture is now complete.
   
   > With `IGNORE` etc,. it feel like this is adding something more like logging. Error handlers are called only exception circumstances where things are wrong in some way.
   
   The rationale is that this gives the possibility to decide on a per use-case basis on how severe certain violations should be considered. I think it was some legacy version of virtuoso that returned empty json for empty select results. The default severity in RowSetJSONStreaming is ERROR so such "ErrorEvents" will be relayed to an ErrorHandler's error method that will then typically raise an exception. With this update you can set the severity to WARNING which will log a warning or IGNORE which won't report anything.
   
   As another example, what to do if there are multiple 'results' elements in the json? Is it an error? Issue a warning? Ignore because 'things on the Web happen' and the parser will just continue to emit bindings?
   
   You have to actively change properties in the context though to get this behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1066838670


   So what is your suggestion w.r.t the ValidationSettings class?
   
   At present you can configure the severity of violations individually, very much like shacl violations.
   
   ```java
       public static class ValidationSettings implements Serializable {
           /**
            * What to do if the JSON is effectively 'empty', i.e. if neither
            * the head nor the results key were present.
            * Unexpected elements are captured by onUnexpectedJsonElement.
            * e.g. returned older version of virtuoso open source
            * Mitigation is to assume an empty set of bindings.
            */
           protected Severity emptyJsonSeverity = Severity.ERROR;
   
           /** What to do if no head was encountered. We may have already
            * optimistically streamed all the bindings in anticipation of an
            * eventual head. */
           protected Severity missingHeadSeverity = Severity.ERROR;
           ...
   }
   ```
   
   The way it is used is like this: I check for the violation and based on its configured severity I forward some message to one of the ErrorHandler's methods. For example, the missing head violation is only revealed once the whole stream is consumed. However, because processing already took place (unless getResultVars was caled) so you may wish to only warn about that issue and move on. The default severity is ERROR so a message would get relayed to the single configured error handler's error() method which in turn would raise an exception.
   
   ```java
       public static void validateCompleted(RowSetJSONStreaming rs, ErrorHandler errorHandler, ValidationSettings settings) {        // Missing head
           if (rs.getKHeadCount() == 0) {
               ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
                   new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
           }
       }
   ```
   
   
    Without the relay mechanism one would either have to 
   * specify one error handler per violation, such as `ARQ.context().copy().put(rsJsonSeverityMissingHead, someErrorHandler)`.
   * or drop the ability to handle severity of violations and treat violations with a fixed severity. In this case there would only be one error handler with an all-or-nothing policy - accept or reject all violations because they cannot be discriminated.
   * The relay method could also be seen as something RowSetJSONStreaming specific and be moved to that class as a private member.
   
   Or what do you have a better suggestion?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067449956


   I added a few contrived test cases to test the validation and demonstrate the configuration.
   
   The streaming json implementation causes tests of `org.apache.jena.jdbc.results.AbstractResultSetTests`run from `jena-jdbc-driver-remote` to raise warnings. This is because the test suite expects certain tests to fail, however `QueryExecHTTP` checks connections for consumptions; which obviously no longer occurs due to the streaming nature.
   HttpClient's default behavior of consuming remaining data on close is a real pain: If an spo query fails midway then even with the streaming json implementation the http client might needlessly retrieve all remaining data. I once made a [PR to Apache VFS2](https://issues.apache.org/jira/browse/VFS-805#) for exactly this issue with apache's http client (which was accepted and by now I can confirm that it still works); not sure if the solution is applicable to java's native client but maybe it helps.
   
   QueryExecHTTP raises the warning in this snippet:
   ```java
   if (retainedConnection.read() != -1)
       Log.warn(this, "HTTP response not fully consumed, if HTTP Client is reusing connections (its default behaviour) then it will consume the remaining response data which may take a long time and cause this application to become unresponsive");
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067937613


   > I started looking into this after all - so please don't (re)review the latest changes yet.
   
   > What is your suggestion?
   
   If this is ready now, I will review and make a proposal.
   
   I'm sorry if discussions seems very cautious but this is a key feature of the Jena. A release is a significant commitment; a big difference to an SaaS platform. Adding APIs is a significant commitment. It much harder to remove or evolve features than add them. 
   
   We have a lot of users, and that includes many who are data experts, not systems developers.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings would be displayed when streaming xml.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1067979907


   Hi Andy, the current state is ready for review. Without your feedback it doesn't make sense for me to add more tests for features which might be at risk anyway depending on your verdict.
   
   > I'm sorry if discussions seems very cautious but this is a key feature of the Jena.
   
   No worries, I tried to break the architecture down into simple and reasonable components so that all the current concerns and maybe future ones can hopefully be quickly addressed.
   
   As for the mentioned issue with `AbstractResultSetTests`: The default behavior of HttpClient is to consume the remaining input when .close() is called in its input stream. Under this perspective the warning is redundant because in this case it is expected behavior. I don't think that any special logic would belong to `RowSetJSONStreaming`. I would suspect that the same warnings would be displayed when streaming xml.
   
   Needs more thought how to handle this properly - maybe QueryExecHttp should have a `warnOnRemainingData` flag where this class itself checks for (a configurable) amount of data before emitting the warning and calling close on the underlying HttpClient-backed InputStream which may then either consume the remaining data or cut the connection.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] Aklakan edited a comment on pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
Aklakan edited a comment on pull request #1218:
URL: https://github.com/apache/jena/pull/1218#issuecomment-1071056504


   Note that in the PR, the `RowSetJSONStreaming` implementation (which does accumulation and validation) is independent of gson. Well, there are gson-specific static util functions (which might as well be made all private for the time being), but those methods are only used to implement `IteratorRsJSON`'s 'encoder' (there is definitely a better name for it; such as `DeserializationStrategy`) - i.e. the mapper of elements from the json reader to domain events.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] afs commented on a change in pull request #1218: Jena-2302: Streaming implementation of RowSetReaderJSON

Posted by GitBox <gi...@apache.org>.
afs commented on a change in pull request #1218:
URL: https://github.com/apache/jena/pull/1218#discussion_r830657561



##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/IteratorRsJSON.java
##########
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.riot.rowset.rw;
+
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBindings;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBoolean;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kHead;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kResults;
+
+import java.io.IOException;
+
+import org.apache.jena.atlas.iterator.IteratorSlotted;
+import org.apache.jena.sparql.resultset.ResultSetException;
+
+import com.google.gson.Gson;
+import com.google.gson.stream.JsonReader;
+
+/**
+ * An iterator that reads sparql result set domain elements from a
+ * underlying gson JsonReader.
+ *
+ * This iterator is 'dumb': it neither validates nor keeps statistics.
+ *
+ * For this purpose the iterator delegates to {@link RsJsonEltEncoder} which
+ * is responsible for creating instances of type E from the state of the
+ * json stream.
+ *
+ * @param <E> The type of elements to yield by this iterator
+ */
+public class IteratorRsJSON<E>
+    extends IteratorSlotted<E>
+{
+    /**
+     * Bridge between gson json elements and domain element objects
+     * (which are free to (not) abstract from gson)
+     *
+     * The most basic implementation could turn each event into a gson JsonElement
+     * by parameterizing E with JsonElement and having each method
+     * return gson.fromJson(reader).
+     *
+     * @param <E> The uniform element type to create from the raw events
+     */
+    public interface RsJsonEltEncoder<E> {
+        E newHeadElt   (Gson gson, JsonReader reader) throws IOException;
+        E newBooleanElt(Gson gson, JsonReader reader) throws IOException;
+        E newResultsElt(Gson gson, JsonReader reader) throws IOException;
+        E newBindingElt(Gson gson, JsonReader reader) throws IOException;
+
+        // May just invoke reader.skipValue() and return null
+        E newUnknownElt(Gson gson, JsonReader reader)  throws IOException;
+    }
+
+    /** Parsing state; i.e. where we are in the json document */
+    public enum ParserState {

Review comment:
       Does this need to be public?

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/RowSetJSONStreaming.java
##########
@@ -0,0 +1,760 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.riot.rowset.rw;
+
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBnode;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBoolean;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kDatatype;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kHead;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kLiteral;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kObject;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kObjectAlt;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kPredicate;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kPredicateAlt;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kProperty;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kResults;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kStatement;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kSubject;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kSubjectAlt;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kTriple;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kType;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kTypedLiteral;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kUri;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kValue;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kVars;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kXmlLang;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Serializable;
+import java.lang.reflect.Type;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map.Entry;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.function.Supplier;
+
+import org.apache.jena.atlas.data.DataBag;
+import org.apache.jena.atlas.iterator.IteratorCloseable;
+import org.apache.jena.atlas.iterator.IteratorSlotted;
+import org.apache.jena.graph.Node;
+import org.apache.jena.graph.NodeFactory;
+import org.apache.jena.riot.lang.LabelToNode;
+import org.apache.jena.riot.rowset.rw.IteratorRsJSON.RsJsonEltEncoder;
+import org.apache.jena.riot.system.ErrorEvent;
+import org.apache.jena.riot.system.ErrorHandler;
+import org.apache.jena.riot.system.ErrorHandlers;
+import org.apache.jena.riot.system.Severity;
+import org.apache.jena.sparql.core.Var;
+import org.apache.jena.sparql.engine.binding.Binding;
+import org.apache.jena.sparql.engine.binding.BindingBuilder;
+import org.apache.jena.sparql.engine.binding.BindingFactory;
+import org.apache.jena.sparql.exec.RowSet;
+import org.apache.jena.sparql.exec.RowSetBuffered;
+import org.apache.jena.sparql.resultset.ResultSetException;
+import org.apache.jena.sparql.util.NodeFactoryExtra;
+import org.apache.jena.vocabulary.RDF;
+
+import com.google.gson.Gson;
+import com.google.gson.JsonElement;
+import com.google.gson.JsonObject;
+import com.google.gson.reflect.TypeToken;
+import com.google.gson.stream.JsonReader;
+
+/**
+ * Streaming RowSet implementation for application/sparql-results+json
+ * The {@link #getResultVars()} will return null as long as the header has not
+ * been consumed from the underlying stream.
+ *
+ * Use {@link RowSetBuffered} to modify the behavior such that {@link #getResultVars()}
+ * immediately consumes the underlying stream until the header is read,
+ * thereby buffering any encountered bindings for replay.
+ */
+public class RowSetJSONStreaming<E>
+    extends IteratorSlotted<Binding>
+    implements RowSet
+{
+    /* Construction -------------------------------------------------------- */
+
+    public static RowSetBuffered<RowSetJSONStreaming<?>> createBuffered(
+            InputStream in, LabelToNode labelMap,
+            Supplier<DataBag<Binding>> bufferFactory, ValidationSettings validationSettings,
+            ErrorHandler errorHandler) {
+        return new RowSetBuffered<>(createUnbuffered(in, labelMap, validationSettings, errorHandler), bufferFactory);
+    }
+
+    public static RowSetJSONStreaming<?> createUnbuffered(InputStream in, LabelToNode labelMap, ValidationSettings validationSettings,
+            ErrorHandler errorHandler) {
+
+        Gson gson = new Gson();
+        JsonReader reader = gson.newJsonReader(new InputStreamReader(in, StandardCharsets.UTF_8));
+
+        // Set up handling of unexpected json elements
+        UnexpectedJsonEltHandler unexpectedJsonHandler = (_gson, _reader) -> {
+            ErrorHandlers.relay(errorHandler, validationSettings.unexpectedJsonElementSeverity, () ->
+                    new ErrorEvent("Encountered unexpected json element at path " + reader.getPath()));
+            reader.skipValue();
+            return null;
+        };
+
+        // Experiments with an decoder/encoder that returned Bindings directly without
+        // wrapping them as RsJsonEltDft did not show a significant performance difference
+        // that would justify supporting alternative encoder/decoder pairs.
+        RsJsonEltEncoder<RsJsonEltDft> eltEncoder = new RsJsonEltEncoderDft(labelMap, null, unexpectedJsonHandler);
+        RsJsonEltDecoder<RsJsonEltDft> eltDecoder = RsJsonEltDecoderDft.INSTANCE;
+        IteratorRsJSON<RsJsonEltDft> eltIt = new IteratorRsJSON<>(gson, reader, eltEncoder);
+
+        // Wrap this up as a validating RowSet
+        RowSetJSONStreaming<?> result = new RowSetJSONStreaming<>(eltIt, eltDecoder, 0l, validationSettings, errorHandler);
+        return result;
+    }
+
+    /* Domain adapter / bridge for IteratorRsJson  ------------------------- */
+
+    public enum RsJsonEltType {
+        UNKNOWN,
+        HEAD,
+        BOOLEAN,
+        RESULTS, // The 'results' event exists as a marker for validation
+        BINDING
+    }
+
+    public static class RsJsonEltDft {
+        protected RsJsonEltType type;
+        protected List<Var> head;
+        protected Boolean askResult;
+        protected Binding binding;
+
+        // Nullable json element instance of the underlying json API
+        protected Object unknownJsonElement;
+
+        public RsJsonEltDft(RsJsonEltType type) {
+            this.type = type;
+        }
+
+        public RsJsonEltDft(List<Var> head) {
+            this.type = RsJsonEltType.HEAD;
+            this.head = head;
+        }
+
+        public RsJsonEltDft(Boolean askResult) {
+            this.type = RsJsonEltType.BOOLEAN;
+            this.askResult = askResult;
+        }
+
+        public RsJsonEltDft(Binding binding) {
+            this.type = RsJsonEltType.BINDING;
+            this.binding = binding;
+        }
+
+        public RsJsonEltDft(Object jsonElement) {
+            this.type = RsJsonEltType.UNKNOWN;
+            this.unknownJsonElement = jsonElement;
+        }
+
+        public RsJsonEltType getType() {
+            return type;
+        }
+
+        public List<Var> getHead() {
+            return head;
+        }
+
+        public Boolean getAskResult() {
+            return askResult;
+        }
+
+        public Binding getBinding() {
+            return binding;
+        }
+
+        public Object getUnknownJsonElement() {
+            return unknownJsonElement;
+        }
+
+        @Override
+        public String toString() {
+            return "RsJsonEltDft [type=" + type + ", head=" + head + ", askResult=" + askResult + ", binding=" + binding
+                    + ", unknownJsonElement=" + unknownJsonElement + "]";
+        }
+    }
+
+    /* Core implementation ------------------------------------------------- */
+
+    protected IteratorCloseable<E> eltIterator;
+    protected RsJsonEltDecoder<? super E> eltDecoder;
+    protected long rowNumber;
+
+    protected ValidationSettings validationSettings;
+    protected ErrorHandler errorHandler;
+
+    protected TentativeValue<List<Var>> resultVars = null;
+    protected TentativeValue<Boolean> askResult = null;
+
+    protected int kHeadCount = 0;
+    protected int kResultsCount = 0;
+    protected int kBooleanCount = 0;
+    protected int unknownJsonCount = 0;
+
+    public RowSetJSONStreaming(
+            IteratorCloseable<E> rsJsonIterator,
+            RsJsonEltDecoder<? super E> eltDecoder,
+            long rowNumber,
+            ValidationSettings validationSettings, ErrorHandler errorHandler) {
+        super();
+
+        this.eltIterator = rsJsonIterator;
+        this.eltDecoder = eltDecoder;
+        this.rowNumber = rowNumber;
+
+        this.validationSettings = validationSettings;
+        this.errorHandler = errorHandler;
+
+        this.resultVars = new TentativeValue<>();
+        this.askResult = new TentativeValue<>();
+    }
+
+    @Override
+    protected Binding moveToNext() {
+        try {
+            Binding result = computeNextActual();
+            return result;
+        } catch (Throwable e) {
+            // Re-wrap any exception
+            throw new ResultSetException(e.getMessage(), e);
+        }
+    }
+
+    protected Binding computeNextActual() throws IOException {
+        boolean updateAccepted;
+        Binding result = null;
+
+        E elt = null;
+        RsJsonEltType type = null;
+
+        outer: while (eltIterator.hasNext()) {
+            elt = eltIterator.next();
+            type = eltDecoder.getType(elt);
+
+            switch (type) {
+            case HEAD:
+                ++kHeadCount;
+                List<Var> rsv = eltDecoder.getAsHead(elt);
+                updateAccepted = resultVars.updateValue(rsv);
+                if (!updateAccepted) {
+                    ErrorHandlers.relay(errorHandler, validationSettings.getInvalidatedHeadSeverity(),
+                            new ErrorEvent(String.format(
+                                ". Prior value for headVars was %s but got superseded with %s", resultVars.getValue(), rsv)));
+                }
+                validate(this, errorHandler, validationSettings);
+                continue;
+
+            case BOOLEAN:
+                ++kBooleanCount;
+                Boolean b = eltDecoder.getAsBoolean(elt);
+                updateAccepted = askResult.updateValue(b);
+                if (!updateAccepted) {
+                    ErrorHandlers.relay(errorHandler, validationSettings.getInvalidatedHeadSeverity(),
+                            new ErrorEvent(String.format(
+                                ". Prior value for boolean result was %s but got supersesed %s", askResult.getValue(), b)));
+                }
+                validate(this, errorHandler, validationSettings);
+                continue;
+
+            case RESULTS:
+                ++kResultsCount;
+                validate(this, errorHandler, validationSettings);
+                continue;
+
+            case BINDING:
+                ++rowNumber;
+                result = eltDecoder.getAsBinding(elt);
+                break outer;
+
+            case UNKNOWN:
+                ++unknownJsonCount;
+                continue;
+            }
+        }
+
+        // Once we return any tentative value becomes final
+        resultVars.makeFinal();
+        askResult.makeFinal();
+
+        if (result == null && !eltIterator.hasNext()) {
+            validateCompleted(this, errorHandler, validationSettings);
+        }
+
+        return result;
+    }
+
+    @Override
+    protected boolean hasMore() {
+        return true;
+    }
+
+    @Override
+    protected void closeIterator() {
+        eltIterator.close();
+    }
+
+
+    /* Statistics ---------------------------------------------------------- */
+
+    @Override
+    public long getRowNumber() {
+        return rowNumber;
+    }
+
+    boolean hasResultVars() {
+        return resultVars.hasValue();
+    }
+
+    @Override
+    public List<Var> getResultVars() {
+        return resultVars.getValue();
+    }
+
+    public boolean hasAskResult() {
+        return askResult.hasValue();
+    }
+
+    public Boolean getAskResult() {
+        return askResult.getValue();
+    }
+
+    public int getKHeadCount() {
+        return kHeadCount;
+    }
+
+    public int getKBooleanCount() {
+        return kBooleanCount;
+    }
+
+    public int getKResultsCount() {
+        return kResultsCount;
+    }
+
+    public int getUnknownJsonCount() {
+        return unknownJsonCount;
+    }
+
+    /* Parsing (gson-based) ------------------------------------------------ */
+
+    /** Parse the vars element from head - may return null */
+    public static List<Var> parseHeadVars(Gson gson, JsonReader reader) throws IOException {
+        List<Var> result = null;
+        Type stringListType = new TypeToken<List<String>>() {}.getType();
+        JsonObject headJson = gson.fromJson(reader, JsonObject.class);
+        JsonElement varsJson = headJson.get(kVars);
+        if (varsJson != null) {
+            List<String> varNames = gson.fromJson(varsJson, stringListType);
+            result = Var.varList(varNames);
+        }
+        return result;
+    }
+
+    public static Binding parseBinding(
+            Gson gson, JsonReader reader, LabelToNode labelMap,
+            Function<JsonObject, Node> onUnknownRdfTermType) throws IOException {
+        JsonObject obj = gson.fromJson(reader, JsonObject.class);
+
+        BindingBuilder bb = BindingFactory.builder();
+
+        for (Entry<String, JsonElement> e : obj.entrySet()) {
+            Var v = Var.alloc(e.getKey());
+            JsonElement nodeElt = e.getValue();
+
+            Node node = parseOneTerm(nodeElt, labelMap, onUnknownRdfTermType);
+            bb.add(v, node);
+        }
+
+        return bb.build();
+    }
+
+    public static Node parseOneTerm(JsonElement jsonElt, LabelToNode labelMap, Function<JsonObject, Node> onUnknownRdfTermType) {
+
+        if (jsonElt == null) {
+            throw new ResultSetException("Expected a json object for an RDF term but got null");
+        }
+
+        JsonObject term = jsonElt.getAsJsonObject();
+
+        Node result;
+        String type = expectNonNull(term, kType).getAsString();
+        JsonElement valueJson = expectNonNull(term, kValue);
+        String valueStr;
+        switch (type) {
+        case kUri:
+            valueStr = valueJson.getAsString();
+            result = NodeFactory.createURI(valueStr);
+            break;
+        case kTypedLiteral: /* Legacy */
+        case kLiteral:
+            valueStr = valueJson.getAsString();
+            JsonElement langJson = term.get(kXmlLang);
+            JsonElement dtJson = term.get(kDatatype);
+
+            String lang = langJson == null ? null : langJson.getAsString();
+            String dtStr = dtJson == null ? null : dtJson.getAsString();
+
+            if ( lang != null ) {
+                // Strictly, xml:lang=... and datatype=rdf:langString is wrong
+                // (the datatype should be absent)
+                // The RDF specs recommend omitting the datatype. They did
+                // however come after the SPARQL 1.1 docs
+                // it's more of a "SHOULD" than a "MUST".
+                // datatype=xsd:string is also unnecessary.
+                if ( dtStr != null && !dtStr.equals(RDF.dtLangString.getURI()) ) {
+                    // Must agree.
+                    throw new ResultSetException("Both language and datatype defined, datatype is not rdf:langString:\n" + term);
+                }
+            }
+
+            result = NodeFactoryExtra.createLiteralNode(valueStr, lang, dtStr);
+            break;
+        case kBnode:
+            valueStr = valueJson.getAsString();
+            result = labelMap.get(null, valueStr);
+            break;
+        case kStatement:
+        case kTriple:
+            JsonObject tripleJson = valueJson.getAsJsonObject();
+
+            JsonElement js = expectOneKey(tripleJson, kSubject, kSubjectAlt);
+            JsonElement jp = expectOneKey(tripleJson, kPredicate, kProperty, kPredicateAlt);
+            JsonElement jo = expectOneKey(tripleJson, kObject, kObjectAlt);
+
+            Node s = parseOneTerm(js, labelMap, onUnknownRdfTermType);
+            Node p = parseOneTerm(jp, labelMap, onUnknownRdfTermType);
+            Node o = parseOneTerm(jo, labelMap, onUnknownRdfTermType);
+
+            result = NodeFactory.createTripleNode(s, p, o);
+            break;
+        default:
+            if (onUnknownRdfTermType != null) {
+                result = onUnknownRdfTermType.apply(term);
+                if (result == null) {
+                    throw new ResultSetException("Custom handler returned null for unknown rdf term type '" + type + "'");
+                }
+            } else {
+                throw new ResultSetException("Object key not recognized as valid for an RDF term: " + term);
+            }
+            break;
+        }
+
+        return result;
+    }
+
+    public static JsonElement expectNonNull(JsonObject json, String key) {
+        JsonElement v = json.get(key);
+        if ( v == null )
+            throw new ResultSetException("Unexpected null value for key: " + key);
+
+        return v;
+    }
+
+    public static JsonElement expectOneKey(JsonObject json, String ...keys) {
+        JsonElement result = null;
+
+        for (String key : keys) {
+            JsonElement tmp = json.get(key);
+
+            if (tmp != null) {
+                if (result != null) {
+                    throw new ResultSetException("More than one key out of " + Arrays.asList(keys));
+                }
+
+                result = tmp;
+            }
+        }
+
+        if (result == null) {
+            throw new ResultSetException("One or more of the required keys " + Arrays.asList(keys) + " was not found");
+        }
+
+        return result;
+    }
+
+    /* Validation ---------------------------------------------------------- */
+
+    /** Runtime validation of the current state of a streaming json row set */
+    public static void validate(RowSetJSONStreaming<?> rs, ErrorHandler errorHandler, ValidationSettings settings) {
+        if (rs.hasAskResult() && rs.getKResultsCount() > 0) {
+            ErrorHandlers.relay(errorHandler, settings.getMixedResultsSeverity(), () ->
+                new ErrorEvent("Encountered bindings as well as boolean result"));
+        }
+
+        if (rs.getKResultsCount() > 1) {
+            ErrorHandlers.relay(errorHandler, settings.getInvalidatedResultsSeverity(), () ->
+                new ErrorEvent("Multiple 'results' keys encountered"));
+        }
+    }
+
+    /** Check a completed streaming json row set for inconsistencies.
+     *  Specifically checks for missing result value and missing head */
+    public static void validateCompleted(RowSetJSONStreaming<?> rs, ErrorHandler errorHandler, ValidationSettings settings) {
+        // Missing result (neither 'results' nor 'boolean' seen)
+        if (rs.getKResultsCount() == 0 && rs.getKBooleanCount() == 0) {
+            ErrorHandlers.relay(errorHandler, settings.getEmptyJsonSeverity(),
+                new ErrorEvent(String.format("Either '%s' or '%s' is mandatory; neither seen", kResults, kBoolean)));
+        }
+
+        // Missing head
+        if (rs.getKHeadCount() == 0) {
+            ErrorHandlers.relay(errorHandler, settings.getMissingHeadSeverity(),
+                new ErrorEvent(String.format("Mandory key '%s' not seen", kHead)));
+        }
+    }
+
+    /** Validation settings class */
+    public static class ValidationSettings implements Serializable {
+        private static final long serialVersionUID = 1L;
+
+        /**
+         * What to do if the JSON is effectively 'empty', i.e. if neither
+         * the head nor the results key were present.
+         * Unexpected elements are captured by onUnexpectedJsonElement.
+         * e.g. returned older version of virtuoso open source
+         * Mitigation is to assume an empty set of bindings.
+         */
+        protected Severity emptyJsonSeverity = Severity.ERROR;
+
+        /** What to do if no head was encountered. We may have already
+         * optimistically streamed all the bindings in anticipation of an
+         * eventual head. */
+        protected Severity missingHeadSeverity = Severity.ERROR;
+
+        /** What to do if there is a repeated 'results' key
+         * At this stage we have already optimisticaly streamed results which
+         * under JSON semantics would have been superseded by this newly
+         * encountered key. */
+        protected Severity invalidatedResultsSeverity = Severity.ERROR;
+
+        /** What to do if there is a repeated 'head' <b>whole value does not match the
+         * prior value</b>. Repeated heads with the same value are valid.
+         * Any possibly prior reported head would have been superseded by this newly
+         * encountered key.
+         * Should parsing continue then only the first encountered value will remain active.
+         */
+        protected Severity invalidatedHeadSeverity = Severity.FATAL;
+
+        /**
+         * What to do if the JSON contains both a boolean result and bindings
+         * Mitigation is to assume bindings and ignore the boolean result
+         */
+        protected Severity mixedResultsSeverity = Severity.FATAL;
+
+        /** What to do if we encounter an unexpected JSON key */
+        protected Severity unexpectedJsonElementSeverity = Severity.IGNORE;
+
+        public Severity getEmptyJsonSeverity() {
+            return emptyJsonSeverity;
+        }
+
+        public void setEmptyJsonSeverity(Severity severity) {
+            this.emptyJsonSeverity = severity;
+        }
+
+        public Severity getInvalidatedHeadSeverity() {
+            return invalidatedHeadSeverity;
+        }
+
+        public void setInvalidatedHeadSeverity(Severity severity) {
+            this.invalidatedHeadSeverity = severity;
+        }
+
+        public Severity getInvalidatedResultsSeverity() {
+            return invalidatedResultsSeverity;
+        }
+
+        public void setInvalidatedResultsSeverity(Severity severity) {
+            this.invalidatedResultsSeverity = severity;
+        }
+
+        public Severity getMissingHeadSeverity() {
+            return missingHeadSeverity;
+        }
+
+        public void setMissingHeadSeverity(Severity severity) {
+            this.missingHeadSeverity = severity;
+        }
+
+        public Severity getMixedResultsSeverity() {
+            return mixedResultsSeverity;
+        }
+
+        public void setMixedResultsSeverity(Severity severity) {
+            this.mixedResultsSeverity = severity;
+        }
+
+        public Severity getUnexpectedJsonElementSeverity() {
+            return unexpectedJsonElementSeverity;
+        }
+
+        public void setUnexpectedJsonElementSeverity(Severity severity) {
+            this.unexpectedJsonElementSeverity = severity;
+        }
+    }
+
+    /* Internal / Utility -------------------------------------------------- */
+
+    /** Interface for optionally emitting the json elements
+     * Typically this only calls jsonReader.skipValue() as to not spend
+     * efforts on parsing */
+    @FunctionalInterface
+    public interface UnexpectedJsonEltHandler {
+        JsonElement apply(Gson gson, JsonReader reader) throws IOException;
+    }
+
+    /** A decoder can extract an instance of E's {@link RsJsonEltType}
+     *  an provides methods to extract the corresponding value.
+     *  It decouples {@link RowSetJSONStreaming} from gson.
+     */
+    public static interface RsJsonEltDecoder<E> {
+        RsJsonEltType getType(E elt);
+        Binding getAsBinding(E elt);
+        List<Var> getAsHead(E elt);
+        Boolean getAsBoolean(E elt);
+    }
+
+    /** Decoder for extraction of rs-json domain objects from {@link RsJsonEltDft} */
+    public static class RsJsonEltDecoderDft
+        implements RsJsonEltDecoder<RsJsonEltDft> {
+
+        public static final RsJsonEltDecoderDft INSTANCE = new RsJsonEltDecoderDft();
+
+        @Override public RsJsonEltType getType(RsJsonEltDft elt) { return elt.getType(); }
+        @Override public Binding getAsBinding(RsJsonEltDft elt)  { return elt.getBinding(); }
+        @Override public List<Var> getAsHead(RsJsonEltDft elt)   { return  elt.getHead(); }
+        @Override public Boolean getAsBoolean(RsJsonEltDft elt)  { return elt.getAskResult(); }
+    }
+
+    /**
+     * Json encoder that wraps each message uniformly as a {@link RsJsonEltDft}.
+     */
+    public static class RsJsonEltEncoderDft
+        implements RsJsonEltEncoder<RsJsonEltDft> {
+
+        protected LabelToNode labelMap;
+        protected Function<JsonObject, Node> unknownRdfTermTypeHandler;
+        protected UnexpectedJsonEltHandler unexpectedJsonHandler;
+
+        public RsJsonEltEncoderDft(LabelToNode labelMap,
+                Function<JsonObject, Node> unknownRdfTermTypeHandler,
+                UnexpectedJsonEltHandler unexpectedJsonHandler) {
+            super();
+            this.labelMap = labelMap;
+            this.unknownRdfTermTypeHandler = unknownRdfTermTypeHandler;
+            this.unexpectedJsonHandler = unexpectedJsonHandler;
+        }
+
+        @Override
+        public RsJsonEltDft newHeadElt(Gson gson, JsonReader reader) throws IOException {
+            List<Var> vars = parseHeadVars(gson, reader);
+            return new RsJsonEltDft(vars);
+        }
+
+        @Override
+        public RsJsonEltDft newBooleanElt(Gson gson, JsonReader reader) throws IOException {
+            Boolean b = reader.nextBoolean();
+            return new RsJsonEltDft(b);
+        }
+
+        @Override
+        public RsJsonEltDft newBindingElt(Gson gson, JsonReader reader) throws IOException {
+            Binding binding = parseBinding(gson, reader, labelMap, unknownRdfTermTypeHandler);
+            return new RsJsonEltDft(binding);
+        }
+
+        @Override
+        public RsJsonEltDft newResultsElt(Gson gson, JsonReader reader) throws IOException {
+            return new RsJsonEltDft(RsJsonEltType.RESULTS);
+        }
+
+        @Override
+        public RsJsonEltDft newUnknownElt(Gson gson, JsonReader reader) throws IOException {
+            JsonElement jsonElement = unexpectedJsonHandler == null
+                    ? null
+                    : unexpectedJsonHandler.apply(gson, reader);
+            return new RsJsonEltDft(jsonElement);
+        }
+    }
+
+    /**
+     * Internal helper class used for valiaditon.

Review comment:
       ```suggestion
        * Internal helper class used for validation.
   ```

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/IteratorRsJSON.java
##########
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.riot.rowset.rw;
+
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBindings;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBoolean;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kHead;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kResults;
+
+import java.io.IOException;
+
+import org.apache.jena.atlas.iterator.IteratorSlotted;
+import org.apache.jena.sparql.resultset.ResultSetException;
+
+import com.google.gson.Gson;
+import com.google.gson.stream.JsonReader;
+
+/**
+ * An iterator that reads sparql result set domain elements from a
+ * underlying gson JsonReader.
+ *
+ * This iterator is 'dumb': it neither validates nor keeps statistics.
+ *
+ * For this purpose the iterator delegates to {@link RsJsonEltEncoder} which
+ * is responsible for creating instances of type E from the state of the
+ * json stream.
+ *
+ * @param <E> The type of elements to yield by this iterator
+ */
+public class IteratorRsJSON<E>
+    extends IteratorSlotted<E>
+{
+    /**
+     * Bridge between gson json elements and domain element objects
+     * (which are free to (not) abstract from gson)
+     *
+     * The most basic implementation could turn each event into a gson JsonElement
+     * by parameterizing E with JsonElement and having each method
+     * return gson.fromJson(reader).
+     *
+     * @param <E> The uniform element type to create from the raw events
+     */
+    public interface RsJsonEltEncoder<E> {

Review comment:
       Maybe pull up and put in own file.

##########
File path: jena-arq/src/main/java/org/apache/jena/riot/rowset/rw/IteratorRsJSON.java
##########
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.jena.riot.rowset.rw;
+
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBindings;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kBoolean;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kHead;
+import static org.apache.jena.riot.rowset.rw.JSONResultsKW.kResults;
+
+import java.io.IOException;
+
+import org.apache.jena.atlas.iterator.IteratorSlotted;
+import org.apache.jena.sparql.resultset.ResultSetException;
+
+import com.google.gson.Gson;
+import com.google.gson.stream.JsonReader;
+
+/**
+ * An iterator that reads sparql result set domain elements from a
+ * underlying gson JsonReader.
+ *
+ * This iterator is 'dumb': it neither validates nor keeps statistics.
+ *
+ * For this purpose the iterator delegates to {@link RsJsonEltEncoder} which
+ * is responsible for creating instances of type E from the state of the
+ * json stream.
+ *
+ * @param <E> The type of elements to yield by this iterator
+ */
+public class IteratorRsJSON<E>
+    extends IteratorSlotted<E>
+{
+    /**
+     * Bridge between gson json elements and domain element objects
+     * (which are free to (not) abstract from gson)
+     *
+     * The most basic implementation could turn each event into a gson JsonElement
+     * by parameterizing E with JsonElement and having each method
+     * return gson.fromJson(reader).
+     *
+     * @param <E> The uniform element type to create from the raw events
+     */
+    public interface RsJsonEltEncoder<E> {
+        E newHeadElt   (Gson gson, JsonReader reader) throws IOException;
+        E newBooleanElt(Gson gson, JsonReader reader) throws IOException;
+        E newResultsElt(Gson gson, JsonReader reader) throws IOException;
+        E newBindingElt(Gson gson, JsonReader reader) throws IOException;
+
+        // May just invoke reader.skipValue() and return null
+        E newUnknownElt(Gson gson, JsonReader reader)  throws IOException;
+    }
+
+    /** Parsing state; i.e. where we are in the json document */
+    public enum ParserState {
+        INIT,
+        ROOT,
+        RESULTS,
+        BINDINGS,
+        DONE
+    }
+
+    protected ParserState parserState;
+    protected RsJsonEltEncoder<E> eltEncoder;
+
+    protected Gson gson;
+    protected JsonReader reader;
+
+    public IteratorRsJSON(Gson gson, JsonReader jsonReader, RsJsonEltEncoder<E> eltEncoder) {
+        this(gson, jsonReader, eltEncoder, ParserState.INIT);
+    }
+
+    /**
+     * Constructor that allows setting the initial parser state such as
+     * when starting to parse in a hadoop input split.
+     */
+    public IteratorRsJSON(Gson gson, JsonReader jsonReader, RsJsonEltEncoder<E> eltEncoder, ParserState parserState) {
+        this.gson = gson;
+        this.reader = jsonReader;
+        this.eltEncoder = eltEncoder;
+        this.parserState = parserState;
+    }
+
+    @Override
+    protected E moveToNext() {
+        E result;
+        try {
+            result = computeNextActual();
+        } catch (Throwable e) {
+            // Re-wrap any exception
+            throw new ResultSetException(e.getMessage(), e);
+        }
+        return result;
+    }
+
+    protected E computeNextActual() throws IOException {
+        E result;
+        outer: while (true) {
+            switch (parserState) {
+            case INIT:
+                reader.beginObject();
+                parserState = ParserState.ROOT;
+                continue outer;
+
+            case ROOT:
+                while (reader.hasNext()) {
+                    String topLevelName = reader.nextName();
+                    switch (topLevelName) {
+                    case kHead:
+                        result = eltEncoder.newHeadElt(gson, reader);
+                        break outer;

Review comment:
       Any reason not to `return result`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org