Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/11/24 10:25:45 UTC

[GitHub] [jena] rvesse commented on issue #1633: optional streaming construct?

rvesse commented on issue #1633:
URL: https://github.com/apache/jena/issues/1633#issuecomment-1326251895

   So, having done this for a previous employer's CLI tools for their graph database (which used Jena for the user-facing pieces), I can say that this is non-trivial to achieve.
   
   That's not to say that it isn't possible, merely to highlight that there are a few things to be aware of if someone wanted to attempt this:
   
   1. You likely want to make this an opt-in behaviour and **NOT** change the existing default behaviour
       - A streaming construct won't suppress duplicate triples, so you could get much larger output than expected
       - If the consumer of the output doesn't cope with duplicate triples properly this can break larger data pipelines
   2. If a user opts into this behaviour you need to validate that their selected output format is compatible with streaming.  
       - Jena has streaming writers for some languages but not all (this includes some that could in theory have a streaming writer, but it would be horrendously verbose, e.g. RDF/XML)
           - See `WriterStreamRDFPlain` (for NTriples/Turtle), `WriterStreamRDFBlocks` (for Turtle with limited syntactic sugar), `StreamRDF2Thrift` and `StreamRDF2Protobuf`
       - Also worth noting: streaming writers inherently produce less compact output, i.e. they can't use all the syntactic sugar of their languages (e.g. Turtle predicate-object lists, collection shorthands etc.), because deciding whether that sugar is usable requires multiple passes over the full data
       - I don't remember if there is a registry for streaming writers (I remember having to hardcode an `if` structure for this at the time, but that was ~8 years ago now); there might be one now (@afs does that exist now?) or it may need introducing
       - You'll need to propagate the query's namespace prefixes to the streaming writer somehow, since you'll be operating on an `Iterator<Triple>` that won't have any prefixes available, unlike the `Model` you get from a normal construct evaluation
   3. Then, depending on whether or not you can use a streaming writer, invoke the relevant `execConstruct()` or `execConstructTriples()` method and handle the result accordingly
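
   A minimal sketch of the duplicate-triple caveat from point 1, assuming Jena 4.x on the classpath. The class name, the `http://example.org/` namespace and the sample data are hypothetical. Two query solutions instantiate the same template triple, so `execConstructTriples()` yields it twice, while the `Model` built by `execConstruct()` (a set of triples) holds it only once:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.jena.graph.Triple;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class ConstructDuplicatesDemo {

    // Hypothetical namespace used only for this illustration.
    static final String NS = "http://example.org/";

    // Hypothetical query: the template only mentions ?s ("a" is SPARQL shorthand
    // for rdf:type), so every solution for the same ?s yields the same triple.
    static final String QUERY =
        "PREFIX : <" + NS + ">\n"
        + "CONSTRUCT { ?s a :Thing } WHERE { ?s :p ?o }";

    // Tiny dataset where :a has two :p values, so the WHERE clause matches twice.
    static Model sampleData() {
        Model m = ModelFactory.createDefaultModel();
        Resource a = m.createResource(NS + "a");
        Property p = m.createProperty(NS + "p");
        a.addProperty(p, "x");
        a.addProperty(p, "y");
        return m;
    }

    // Streaming path: collect the raw triples produced by execConstructTriples().
    static List<Triple> streamedTriples(Model data) {
        List<Triple> out = new ArrayList<>();
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(QUERY), data)) {
            Iterator<Triple> it = qe.execConstructTriples();
            it.forEachRemaining(out::add);
        }
        return out;
    }

    // Non-streaming path: execConstruct() builds a Model, i.e. a set of triples.
    static long modelSize(Model data) {
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(QUERY), data)) {
            return qe.execConstruct().size();
        }
    }

    public static void main(String[] args) {
        Model data = sampleData();
        System.out.println("streamed triples: " + streamedTriples(data).size());
        System.out.println("model size: " + modelSize(data));
    }
}
```

   Running `main` should report two streamed triples against a model size of one, which is exactly the kind of difference a downstream consumer may not expect.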
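
   The opt-in, format check, prefix propagation and `execConstruct()` vs `execConstructTriples()` dispatch from points 1 to 3 could look roughly like the sketch below, again assuming Jena 4.x. `ConstructOutput` and `writeConstruct` are hypothetical names. It assumes `StreamRDFWriter.registered(RDFFormat)` is an acceptable answer to the registry question, copies the query's prefixes onto the `StreamRDF` by hand, and otherwise falls back to the existing `Model`-based path:

```java
import java.io.OutputStream;
import java.util.Iterator;

import org.apache.jena.graph.Triple;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFWriter;

public final class ConstructOutput {

    private ConstructOutput() {}

    /**
     * Hypothetical helper: writes CONSTRUCT results to {@code out}, streaming only
     * when the caller opted in AND a streaming writer is registered for the format.
     * Returns true if the streaming path was taken.
     */
    public static boolean writeConstruct(Query query, Dataset dataset, RDFFormat format,
                                         boolean streamingOptIn, OutputStream out) {
        // Point 1: streaming is opt-in; point 2: the format must support streaming.
        boolean canStream = streamingOptIn && StreamRDFWriter.registered(format);
        try (QueryExecution qe = QueryExecutionFactory.create(query, dataset)) {
            if (canStream) {
                StreamRDF writer = StreamRDFWriter.getWriterStream(out, format);
                writer.start();
                // There is no Model on this path, so copy the query's prefixes by hand.
                query.getPrefixMapping().getNsPrefixMap().forEach(writer::prefix);
                // Triples are written as they are produced; duplicates are NOT removed.
                Iterator<Triple> triples = qe.execConstructTriples();
                triples.forEachRemaining(writer::triple);
                writer.finish();
            } else {
                // Default path: results are collected (and de-duplicated) in a Model,
                // then written with the full, non-streaming writer for the format.
                RDFDataMgr.write(out, qe.execConstruct(), format);
            }
        }
        return canStream;
    }

    public static void main(String[] args) {
        Dataset ds = DatasetFactory.create();
        Query q = QueryFactory.create(
            "PREFIX ex: <http://example.org/>\n"
            + "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }");
        // TURTLE_BLOCKS corresponds to the WriterStreamRDFBlocks streaming writer.
        boolean streamed = writeConstruct(q, ds, RDFFormat.TURTLE_BLOCKS, true, System.out);
        System.out.println("# streamed: " + streamed);
    }
}
```

   Checking the format before executing the query means an unsupported choice such as pretty Turtle or RDF/XML silently keeps the current behaviour; a CLI could instead reject the combination with an error, which is arguably friendlier when the user explicitly asked for streaming.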


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@jena.apache.org