Posted to dev@jena.apache.org by "Henry Story (Created) (JIRA)" <ji...@apache.org> on 2012/01/29 17:41:10 UTC

[jira] [Created] (JENA-203) support for Non Blocking Parsers

support for Non Blocking Parsers
--------------------------------

                 Key: JENA-203
                 URL: https://issues.apache.org/jira/browse/JENA-203
             Project: Jena
          Issue Type: Improvement
            Reporter: Henry Story


In a Linked Data environment servers have to fetch data off the web. The speed at which such data 
is served can be very slow. So one wants to avoid using up one thread for each connection (1 thread = 
0.5 to 1MB approximately). This is why Java NIO was developed and why servers such as Netty
are so popular, why HTTP client libraries such as https://github.com/sonatype/async-http-client are more
and more numerous, and why frameworks such as http://akka.io/ which support relatively lightweight
actors (500 bytes per actor) are growing more visible.

Unless I am mistaken the only way to parse some content is using methods that use an 
InputStream such as this:

    val m = ModelFactory.createDefaultModel()
    m.getReader(lang.jenaLang).read(m, in, base.toString)

That read call blocks. Would it be possible to have an API which allows
one to parse a document in chunks as they arrive from the input?
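
One possible shape for such an API, as a minimal sketch and not something Jena provides: a push-style parser that is handed byte chunks as they arrive and emits every completed line (N-Triples has one statement per line) to a callback, keeping an unfinished tail until the next chunk. The names ChunkedLineParser, feed and end are hypothetical, and the parsing of the terms inside each line is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical push-style parser: the caller feeds chunks, nothing waits on input.
final class ChunkedLineParser {
    // Bytes of the current, still incomplete line. Buffering bytes (not chars)
    // keeps a multi-byte UTF-8 sequence intact when it is split across chunks.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final Consumer<String> onStatement;

    ChunkedLineParser(Consumer<String> onStatement) {
        this.onStatement = onStatement;
    }

    // Consume one chunk: emit each line it completes, keep the unfinished tail.
    void feed(byte[] chunk) {
        for (byte b : chunk) {
            if (b == '\n') {
                emitPending();
            } else {
                pending.write(b);
            }
        }
    }

    // Signal end of input: a last line without a trailing newline is still emitted.
    void end() {
        emitPending();
    }

    private void emitPending() {
        String line = new String(pending.toByteArray(), StandardCharsets.UTF_8).trim();
        pending.reset();
        // Skip blank lines and comment lines.
        if (!line.isEmpty() && !line.startsWith("#")) {
            onStatement.accept(line);
        }
    }
}
```

A caller would invoke feed from whatever callback delivers bytes (for example an async HTTP body-part handler) and end when the response completes; the callback passed to the constructor could then add each statement to a graph.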




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Andy Seaborne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219928#comment-13219928 ] 

Andy Seaborne commented on JENA-203:
------------------------------------

Interesting stuff - I need to find a decent block of time to do more than just look.  

To go back to the title of this JIRA ...

What can be done to "support non-blocking parsers" in addition to the current parsers?  It seems to me that the scatter-gather paradigm of non-blocking parsers is a separate subsystem on top of Jena - is there anything the core could provide to help?

What I'd like to see is that Jena does not need to include every feature possible, but can support independent and vibrant open source projects (the developers have already talked a bit about some simple modularity while delivering combined collections in useful forms for common cases, like a single jar with everything in it or a single jar + dependencies to make using the command line tools much easier).

(BTW the n-triples parser link is 404)

                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Andy Seaborne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220989#comment-13220989 ] 

Andy Seaborne commented on JENA-203:
------------------------------------

I can imagine it has no impact on the existing Jena API.  I think model.read() is the wrong way round and it should be ReadEngine.read(model).  The FileManager has that design; I'm imagining "ReadEngine.many(model, set of places, waitForAll?)" would be the way to read from 500 places at once.
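
To make the "ReadEngine.many" idea concrete, here is a rough sketch of how such a call could be shaped. ReadEngine is only the imagined name from the paragraph above, not an existing Jena class. In this sketch each "place" is a Supplier standing in for fetch-and-parse, plain strings stand in for triples, and a thread-safe collection stands in for the model; all of these are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical scatter-gather entry point; nothing here is Jena API.
final class ReadEngine {
    // Start one asynchronous read per source and add its results to 'target'
    // (which must be safe for concurrent additions).
    // waitForAll = true:  the returned future completes when every source is done.
    // waitForAll = false: it completes as soon as any one source is done.
    static CompletableFuture<Void> many(Collection<String> target,
                                        List<Supplier<List<String>>> sources,
                                        boolean waitForAll) {
        List<CompletableFuture<Void>> reads = new ArrayList<>();
        for (Supplier<List<String>> source : sources) {
            reads.add(CompletableFuture.supplyAsync(source).thenAccept(target::addAll));
        }
        CompletableFuture<?>[] all = reads.toArray(new CompletableFuture<?>[0]);
        return waitForAll
                ? CompletableFuture.allOf(all)
                : CompletableFuture.anyOf(all).<Void>thenApply(ignored -> null);
    }
}
```

The suppliers here still run on pool threads, so this only illustrates the calling convention; a truly non-blocking version would complete the per-source futures from an asynchronous HTTP client's callbacks instead.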

The new WebReader I have ready to go also does that: there is only one RDF reader for all syntaxes and model.read routes to it.  The syntax is merely a hint, just one of several pieces of information used to determine the specific parser.

If your parser does encounter a large, fast reply stream - how fast does it parse?

                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205858#comment-13205858 ] 

Henry Story commented on JENA-203:
----------------------------------

There is now a non blocking NTriples parser available here:

    https://github.com/betehess/pimp-my-rdf/blob/master/n-triples-parser/src/main/scala/Parser.scala
                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197939#comment-13197939 ] 

Henry Story commented on JENA-203:
----------------------------------

With a bit of help from Damian I did get the RDF/XML parser to be asynchronous using the com.fasterxml.aalto asynchronous parser [1].

I had to adapt Damian's jena.rdf.arp.StAX2SAX; I called the result AsyncJenaParser [2]. This is then used by the URLFetcher class [3]. This class
extends the async_http_client by ning [4] to fetch RDF.

Currently it can only fetch RDF/XML and, with a bit more work, any XML format.

What is missing are Turtle and JSON parsers.

The URLFetcher could be a bit more general and just pass on the data it receives to some actors. That would remove the parser processing from the IO thread, and allow the fetcher to be more general. 
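
As a sketch of that hand-off (using a plain single-thread executor in place of actors, which is a substitution made here for brevity): the IO callback only copies and enqueues each chunk, and a worker thread runs the parser. ChunkHandOff, onBodyPart and awaitDrained are hypothetical names, and the parser is any Consumer of byte chunks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical hand-off: the IO thread only enqueues chunks, a worker parses them.
final class ChunkHandOff {
    // A single worker thread keeps the chunks in arrival order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<byte[]> parser;

    ChunkHandOff(Consumer<byte[]> parser) {
        this.parser = parser;
    }

    // Called on the IO thread for each body part; returns without parsing.
    void onBodyPart(byte[] chunk) {
        byte[] copy = chunk.clone(); // the IO layer may reuse its buffer
        worker.execute(() -> parser.accept(copy));
    }

    // Called from a non-IO thread once the response is complete: stops accepting
    // chunks and waits until the queued ones are parsed. Returns false on
    // timeout or interruption.
    boolean awaitDrained(long timeout, TimeUnit unit) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this split the fetcher no longer needs to know which syntax it is fetching; it only needs a Consumer to pass the bytes to.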

There is perhaps something here that can be integrated by Jena. The AsyncJenaParser perhaps?

Henry

[1] http://www.cowtowncoder.com/blog/archives/2011/03/entry_451.html
[2] https://dvcs.w3.org/hg/read-write-web/file/aa9074df0635/src/main/java/patch/AsyncJenaParser.java
[3] https://dvcs.w3.org/hg/read-write-web/file/d9c1f87eee55/src/main/scala/cache/WebFetcher.scala
[4] all classes can be found in the build file https://dvcs.w3.org/hg/read-write-web/file/aa9074df0635/project/build.scala



                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Claude Warren (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254627#comment-13254627 ] 

Claude Warren commented on JENA-203:
------------------------------------

I was pondering this problem recently and was wondering about creating a new polling iterator class that returns TRUE, FALSE or NULL for hasNext(), with NULL meaning "no data yet".

The idea is that each endpoint would be a thread fronted by a polling iterator that would plug into a polling iterator worker/pool/what-have-you. The worker/pool/what-have-you would poll the endpoints until it got a TRUE or FALSE.   On TRUE it would return true for hasNext(), and next() would return the result from the same endpoint.  On FALSE it would remove the endpoint from the pool; after the last endpoint is removed it returns FALSE.  On NULL it would move on to the next endpoint in the pool, cycling back to the start when it reached the end.

This should allow results from slower endpoints to be intermixed with results from faster endpoints and should increase the speed (decrease the time) to get all results.
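
A minimal sketch of such a pool follows. PollingIterator and PollingPool are hypothetical names, and a Java Boolean (TRUE, FALSE or null) plays the tri-state. One deliberate difference from the description above: instead of spinning until some endpoint answers, this pool makes a single pass and returns null itself when nothing is ready, which leaves the retry policy to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical tri-state iterator: hasNext() is TRUE, FALSE, or null ("no data yet").
interface PollingIterator<T> {
    Boolean hasNext();
    T next();
}

// Round-robin pool over several endpoints; it is itself a PollingIterator.
final class PollingPool<T> implements PollingIterator<T> {
    private final List<PollingIterator<T>> endpoints;
    private int cursor = 0;                  // where the next pass starts
    private PollingIterator<T> ready = null; // endpoint that last answered TRUE

    PollingPool(List<PollingIterator<T>> endpoints) {
        this.endpoints = new ArrayList<>(endpoints);
    }

    // One pass over the pool: TRUE if some endpoint has data, FALSE once every
    // endpoint is exhausted, null if the remaining endpoints have nothing yet.
    @Override
    public Boolean hasNext() {
        if (ready != null) return Boolean.TRUE;
        for (int i = endpoints.size(); i > 0; i--) {
            if (cursor >= endpoints.size()) cursor = 0; // cycle back to the start
            PollingIterator<T> endpoint = endpoints.get(cursor);
            Boolean state = endpoint.hasNext();
            if (state == null) {
                cursor++;                 // no data yet: move on
            } else if (state) {
                ready = endpoint;
                cursor++;                 // next pass starts after this endpoint
                return Boolean.TRUE;
            } else {
                endpoints.remove(cursor); // exhausted: drop it from the pool
            }
        }
        return endpoints.isEmpty() ? Boolean.FALSE : null;
    }

    @Override
    public T next() {
        if (!Boolean.TRUE.equals(hasNext())) throw new NoSuchElementException();
        T value = ready.next();
        ready = null;
        return value;
    }
}
```

Because the pass resumes after the endpoint that last produced a result, a fast endpoint cannot starve a slow one that has data waiting.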
                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219944#comment-13219944 ] 

Henry Story commented on JENA-203:
----------------------------------

I am not sure what the best way is to change the Jena API for non blocking parsers, nor if anything needs to be done (yet). Essentially the way these parsers work is that one should be
able to parse chunks of data, get some partial results (a small set of triples) and feed that to a Jena graph or store. Feeding it to a Jena Graph, or popping statements into a store one at a time, is not a problem. So the XML parser I did above shows that it can be done with the Jena RDF/XML parsers, and the Turtle parser shows how one can do it with other frameworks that use Jena: after all the Turtle parser tests can add triples to Jena or Sesame graphs.

But I think awareness of this problem should help guide the direction of your thinking when developing new parsers, or when deciding what is needed to work with linked data in an efficient way.

Out of doing this a few times an API will probably emerge.

Currently I have a simple blocking interface API for the non blocking parser:
   https://github.com/betehess/pimp-my-rdf/blob/248c8a13567e589308d1b7999570a14d6b530b20/n3/src/main/scala/TurtleReader.scala

We all know this API. I need to find out how people in the actors community do this, and see what kind of pattern they agree is good. If I find that
I'll post that here. Perhaps that will lead to some ideas of what such a pattern looks like.

(The NTriples file moved. Here is the current snapshot link, which should be a permalink but won't necessarily be the most up to date one:
   https://github.com/betehess/pimp-my-rdf/blob/248c8a13567e589308d1b7999570a14d6b530b20/n3/src/main/scala/NTriples.scala )

I'll keep you posted on further developments. I should try using these parsers in a real scenario soon, so I'll soon know how well this holds up.

                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Andy Seaborne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195796#comment-13195796 ] 

Andy Seaborne commented on JENA-203:
------------------------------------

Good timing - there have been initial discussions about readers recently on jena-users@.  Please contribute and we can frame a more general JIRA.

The more usual idiom is "m.read(in, base)" but the general mechanism you describe can be used with actor frameworks. m.getReader creates a reader that the app can pass (in a closure-like setup) to an actor.

The RIOT parsers output to a Sink<Triple> which allows different architectures.  RIOT encapsulates parsing as an algorithm so that algorithm can be executed on a separate thread/actor.
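
To illustrate one possible rendezvous style (a bounded blocking queue between a parser thread and a consumer), here is a small sketch. StatementSink is a stand-in interface written for this example, loosely shaped like a sink with send and close; it is not the RIOT interface itself, strings stand in for triples, and QueueSink and parseOnThread are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Stand-in for a parser output sink; hypothetical, not Jena's interface.
interface StatementSink {
    void send(String statement);
    void close();
}

// Rendezvous through a bounded queue: the parser thread sends, the consumer takes.
final class QueueSink implements StatementSink {
    // Unique marker object, compared by identity, so no real statement can match it.
    private static final String END = new String("<end-of-stream>");
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

    @Override public void send(String statement) { put(statement); }
    @Override public void close() { put(END); }

    private void put(String s) {
        try {
            queue.put(s); // blocks the parser thread when the consumer falls behind
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        }
    }

    // Take statements until the end marker arrives; runs on the consumer's thread.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        try {
            for (String s = queue.take(); s != END; s = queue.take()) out.add(s);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    // Run a "parser" (any code that writes to a sink) on its own thread.
    static QueueSink parseOnThread(Consumer<StatementSink> parser) {
        QueueSink sink = new QueueSink();
        Thread t = new Thread(() -> {
            try {
                parser.accept(sink);
            } finally {
                sink.close(); // always signal the end, even if parsing fails
            }
        }, "parser");
        t.setDaemon(true);
        t.start();
        return sink;
    }
}
```

The bounded queue gives back-pressure for free, at the price of one dedicated thread per parse, which is exactly the cost the issue wants to avoid for slow sources; an actor mailbox would replace the queue in an actor-based design.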

What rendezvous style would you suggest?

This does not seem to be "priority major" and until there is a patch available I suggest not marking it as such.

                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219582#comment-13219582 ] 

Henry Story commented on JENA-203:
----------------------------------

And now a non blocking Turtle parser is available here too

https://github.com/betehess/pimp-my-rdf/blob/d64ae11514f4bd8402c0857cb29c203ec821bd67/n3/src/main/scala/Turtle.scala

with more detailed discussion on the W3C mailing list 

http://lists.w3.org/Archives/Public/public-rdf-comments/2012Feb/0043.html
                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254633#comment-13254633 ] 

Henry Story commented on JENA-203:
----------------------------------

I think the correct data structure to look at is the Iteratee one from Functional programming.

Here I wrote an RDFIteratee trait that has two implementations: one synchronous (JenaSyncRDFIteratee) and the other asynchronous (JenaRdfXmlAsync)

   https://github.com/bblfish/Play20/blob/webid/framework/src/webid/src/main/scala/webid/rdf/RDFIteratee.scala

This can then be used to write some very elegant code which can evolve as better asynchronous parsers come along

   https://github.com/bblfish/Play20/blob/webid/framework/src/webid/src/main/scala/webid/GraphCache.scala#L133

More documentation on Iteratees 

  https://github.com/playframework/Play20/wiki/Iteratees
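
For readers who have not met the pattern, here is a much simplified sketch of the iteratee idea. It is not the Play API and not the RDFIteratee trait linked above; the names Iteratee, TakeIteratee and Enumerators.run are made up for the illustration. An iteratee is an immutable consumer state: feeding it an element returns the next state, and the producer stops as soon as the state reports that it is done.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified iteratee: a consumer state that is fed input step by step.
interface Iteratee<E, A> {
    boolean isDone();              // true when no further input is wanted
    A result();                    // what has been computed so far
    Iteratee<E, A> feed(E element); // returns the next state; never waits for input
}

// Collects elements until 'limit' is reached, then reports that it is done.
final class TakeIteratee<E> implements Iteratee<E, List<E>> {
    private final List<E> seen;
    private final int limit;

    TakeIteratee(int limit) {
        this(Collections.<E>emptyList(), limit);
    }

    private TakeIteratee(List<E> seen, int limit) {
        this.seen = seen;
        this.limit = limit;
    }

    @Override public boolean isDone() { return seen.size() >= limit; }
    @Override public List<E> result() { return seen; }

    @Override public Iteratee<E, List<E>> feed(E element) {
        if (isDone()) return this;   // ignore input once finished
        List<E> next = new ArrayList<>(seen);
        next.add(element);
        return new TakeIteratee<>(Collections.unmodifiableList(next), limit);
    }
}

final class Enumerators {
    // Feed elements one at a time and stop as soon as the iteratee is done.
    static <E, A> Iteratee<E, A> run(Iterable<E> input, Iteratee<E, A> start) {
        Iteratee<E, A> state = start;
        for (E element : input) {
            if (state.isDone()) break;
            state = state.feed(element);
        }
        return state;
    }
}
```

Since each state is just a value, the producer is free to call feed from an IO callback whenever the next chunk arrives, which is what makes the pattern fit non-blocking input.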

Henry

                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220997#comment-13220997 ] 

Henry Story commented on JENA-203:
----------------------------------

I think at present it is a lot slower than the Jena and Sesame readers. There is probably still (I hope) a lot of optimisation that can be done... I learnt a lot doing it, but one does get to see in the end what the advantages of xml are...  :-)
                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224111#comment-13224111 ] 

Henry Story commented on JENA-203:
----------------------------------

A lot slower means it is currently 10x slower. Small changes can make big differences in such parsers, but I won't have the time to tweak it now. If people would like to see how much they can improve it, they are welcome to.
                
[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Posted by "Henry Story (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197953#comment-13197953 ] 

Henry Story commented on JENA-203:
----------------------------------

Ah I forgot this was discussed on the mailing list here:
  http://mail-archives.apache.org/mod_mbox/incubator-jena-users/201201.mbox/%3C54563B60-702E-4748-B19E-9C3A0EDFBB1D%40bblfish.net%3E


                