Posted to dev@jena.apache.org by "Rob Vesse (Updated) (JIRA)" <ji...@apache.org> on 2012/01/06 19:18:39 UTC

[jira] [Updated] (JENA-187) TSVInput parses all at once rather than streaming

     [ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated JENA-187:
---------------------------

    Attachment: TsvInputStreaming.patch

This patch modifies TSVInput so that it parses only the header row and then generates a ResultSetStream using a new QueryIterator implementation (TSVInputIterator) which will parse result rows on demand.

This allows large TSV result sets to be streamed with low memory usage.
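
To illustrate the idea of reading the header eagerly and the rows lazily, here is a minimal sketch in plain Java. It is not the code from the attached patch and does not use any ARQ classes: the class name StreamingTsvSketch, its header() method, and the choice of IllegalStateException for a bad row are all assumptions made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only: this is NOT the TSVInputIterator from the patch.
class StreamingTsvSketch implements Iterator<List<String>> {
    private final BufferedReader reader;
    private final List<String> header; // column names, read once up front
    private String nextLine;           // one-line lookahead; null once input is exhausted
    private boolean primed = false;    // true when nextLine holds an unread lookahead

    StreamingTsvSketch(BufferedReader reader) throws IOException {
        this.reader = reader;
        // Only the header row is consumed at construction time.
        String first = reader.readLine();
        if (first == null) {
            throw new IllegalArgumentException("Missing TSV header row");
        }
        this.header = Arrays.asList(first.split("\t", -1));
    }

    List<String> header() {
        return header;
    }

    @Override
    public boolean hasNext() {
        if (!primed) {
            try {
                nextLine = reader.readLine(); // pull exactly one more line on demand
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            primed = true;
        }
        return nextLine != null;
    }

    @Override
    public List<String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String[] cells = nextLine.split("\t", -1); // -1 keeps trailing empty cells
        primed = false;
        if (cells.length != header.size()) {
            // Assumed exception type, chosen for this sketch only.
            throw new IllegalStateException(
                "Expected " + header.size() + " columns but found " + cells.length);
        }
        return Arrays.asList(cells);
    }

    public static void main(String[] args) throws IOException {
        String tsv = "?x\t?y\n<http://example.org/a>\t\"1\"\n<http://example.org/b>\t\"2\"\n";
        StreamingTsvSketch rows =
            new StreamingTsvSketch(new BufferedReader(new StringReader(tsv)));
        System.out.println(rows.header());
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

Because at most one lookahead line is buffered at a time, memory use in this sketch depends on the length of a single row rather than on the number of rows in the result set.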

This patch also fixes the unit tests around result set formatting so that they actually check for isomorphism of results for the round-trippable formats (XML, JSON and TSV), and updates them to reflect the change in the exception type thrown by TSVInputIterator when a bad result row is encountered.
                
> TSVInput parses all at once rather than streaming
> -------------------------------------------------
>
>                 Key: JENA-187
>                 URL: https://issues.apache.org/jira/browse/JENA-187
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.9.0
>         Environment: Any
>            Reporter: Rob Vesse
>            Priority: Critical
>              Labels: parsing, sparql, tsv
>         Attachments: TsvInputStreaming.patch
>
>
> TSVInput parses TSV result sets all at once into memory and then wraps them in a query iterator. This is very naive and results in an OutOfMemoryError once you have a large number of results.
> I will submit a patch that addresses this later today, once I've written the code to fix it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira