Posted to dev@jena.apache.org by "Rob Vesse (Created) (JIRA)" <ji...@apache.org> on 2012/01/06 18:04:39 UTC
[jira] [Created] (JENA-187) TSVInput parses all at once rather than streaming
TSVInput parses all at once rather than streaming
-------------------------------------------------
Key: JENA-187
URL: https://issues.apache.org/jira/browse/JENA-187
Project: Jena
Issue Type: Bug
Components: ARQ
Affects Versions: ARQ 2.9.0
Environment: Any
Reporter: Rob Vesse
Priority: Critical
TSVInput parses an entire TSV result set into memory at once and only then wraps the rows in a query iterator. This naive approach results in an OutOfMemoryError once the result set is large.
I will submit a patch that addresses this later today, once I've written the code to fix it.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (JENA-187) TSVInput parses all at once rather than streaming
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183374#comment-13183374 ]
Hudson commented on JENA-187:
-----------------------------
Integrated in Jena_ARQ #406 (See [https://builds.apache.org/job/Jena_ARQ/406/])
JENA-187 - Stream based parsing for TSVInput
andy :
Files :
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/sparql/resultset/TSVInput.java
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/sparql/resultset/TSVInputIterator.java
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/atlas/io/IO.java
* /incubator/jena/Jena2/ARQ/trunk/src/test/java/com/hp/hpl/jena/sparql/resultset/TestResultSetFormat1.java
* /incubator/jena/Jena2/ARQ/trunk/src/test/java/com/hp/hpl/jena/sparql/resultset/TestResultSetFormat2.java
> TSVInput parses all at once rather than streaming
> -------------------------------------------------
>
> Key: JENA-187
> URL: https://issues.apache.org/jira/browse/JENA-187
> Project: Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: ARQ 2.9.0
> Environment: Any
> Reporter: Rob Vesse
> Assignee: Andy Seaborne
> Priority: Critical
> Labels: parsing, sparql, tsv
> Fix For: ARQ 2.9.1
>
> Attachments: TsvInputStreaming.patch
>
>
> TSVInput parses an entire TSV result set into memory at once and only then wraps the rows in a query iterator. This naive approach results in an OutOfMemoryError once the result set is large.
> I will submit a patch that addresses this later today, once I've written the code to fix it.
[jira] [Resolved] (JENA-187) TSVInput parses all at once rather than streaming
Posted by "Andy Seaborne (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne resolved JENA-187.
--------------------------------
Resolution: Fixed
Patch applied - many thanks.
[jira] [Closed] (JENA-187) TSVInput parses all at once rather than streaming
Posted by "Andy Seaborne (Closed) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne closed JENA-187.
------------------------------
[jira] [Updated] (JENA-187) TSVInput parses all at once rather than streaming
Posted by "Andy Seaborne (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne updated JENA-187:
-------------------------------
Fix Version/s: ARQ 2.9.1
Assignee: Andy Seaborne
[jira] [Updated] (JENA-187) TSVInput parses all at once rather than streaming
Posted by "Rob Vesse (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rob Vesse updated JENA-187:
---------------------------
Attachment: TsvInputStreaming.patch
This patch modifies TSVInput so that it parses only the header row and then generates a ResultSetStream using a new QueryIterator implementation (TSVInputIterator), which parses result rows on demand.
This provides low-memory streaming of TSV results.
This patch also fixes the unit tests around result set formatting so that they actually check for isomorphism of results for round-trippable formats (XML, JSON and TSV), and so that they reflect the change in the exception type thrown by TSVInputIterator when a bad result row is encountered.
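
The header-first, rows-on-demand approach described above can be sketched roughly as follows. This is a minimal illustration in plain Java and not the code from TsvInputStreaming.patch: the class name StreamingTsvRows, the use of a Map of strings per row, and the choice of IllegalStateException for a bad row are all assumptions made for the example, and no ARQ classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the streaming idea; not the actual TSVInputIterator. */
final class StreamingTsvRows implements Iterator<Map<String, String>> {
    private final BufferedReader reader;
    private final List<String> vars = new ArrayList<>();
    private String pending;    // next raw line, read but not yet parsed
    private boolean finished;  // true once the reader is exhausted

    StreamingTsvRows(BufferedReader reader) {
        this.reader = reader;
        // Only the header row is parsed up front.
        String header = readLine();
        if (header == null) {
            throw new IllegalArgumentException("Missing TSV header row");
        }
        for (String name : header.split("\t", -1)) {
            // Variable names are written as ?name in the header; drop the '?'.
            vars.add(name.startsWith("?") ? name.substring(1) : name);
        }
    }

    List<String> vars() {
        return vars;
    }

    @Override
    public boolean hasNext() {
        if (finished) return false;
        if (pending == null) {
            pending = readLine();  // read at most one line ahead
            if (pending == null) finished = true;
        }
        return pending != null;
    }

    @Override
    public Map<String, String> next() {
        if (!hasNext()) throw new NoSuchElementException();
        String line = pending;
        pending = null;
        // Keep trailing empty cells so unbound variables are counted.
        String[] cells = line.split("\t", -1);
        if (cells.length != vars.size()) {
            throw new IllegalStateException(
                "Expected " + vars.size() + " columns but found " + cells.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < cells.length; i++) {
            if (!cells[i].isEmpty()) {   // empty cell means the variable is unbound
                row.put(vars.get(i), cells[i]);
            }
        }
        return row;
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because only one unparsed line is buffered at a time, memory use in this sketch does not grow with the number of result rows, which is the property the patch is after. A malformed row is reported only when the iterator reaches it, not when the input is opened.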