Posted to dev@jena.apache.org by "Rob Vesse (Created) (JIRA)" <ji...@apache.org> on 2012/01/06 18:04:39 UTC

[jira] [Created] (JENA-187) TSVInput parses all at once rather than streaming

TSVInput parses all at once rather than streaming
-------------------------------------------------

                 Key: JENA-187
                 URL: https://issues.apache.org/jira/browse/JENA-187
             Project: Jena
          Issue Type: Bug
          Components: ARQ
    Affects Versions: ARQ 2.9.0
         Environment: Any
            Reporter: Rob Vesse
            Priority: Critical


TSVInput parses an entire TSV result set into memory and only then wraps it in a query iterator. This is very naive and results in an OutOfMemoryError once the result set is large.

I will submit a patch that addresses this later today, once I've written the fix.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (JENA-187) TSVInput parses all at once rather than streaming

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183374#comment-13183374 ] 

Hudson commented on JENA-187:
-----------------------------

Integrated in Jena_ARQ #406 (See [https://builds.apache.org/job/Jena_ARQ/406/])
    JENA-187 - Stream based parsing for TSVInput

andy : 
Files : 
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/sparql/resultset/TSVInput.java
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/com/hp/hpl/jena/sparql/resultset/TSVInputIterator.java
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/atlas/io/IO.java
* /incubator/jena/Jena2/ARQ/trunk/src/test/java/com/hp/hpl/jena/sparql/resultset/TestResultSetFormat1.java
* /incubator/jena/Jena2/ARQ/trunk/src/test/java/com/hp/hpl/jena/sparql/resultset/TestResultSetFormat2.java

                
> TSVInput parses all at once rather than streaming
> -------------------------------------------------
>
>                 Key: JENA-187
>                 URL: https://issues.apache.org/jira/browse/JENA-187
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: ARQ 2.9.0
>         Environment: Any
>            Reporter: Rob Vesse
>            Assignee: Andy Seaborne
>            Priority: Critical
>              Labels: parsing, sparql, tsv
>             Fix For: ARQ 2.9.1
>
>         Attachments: TsvInputStreaming.patch
>
>
> TSVInput parses an entire TSV result set into memory and only then wraps it in a query iterator. This is very naive and results in an OutOfMemoryError once the result set is large.
> I will submit a patch that addresses this later today, once I've written the fix.


        

[jira] [Resolved] (JENA-187) TSVInput parses all at once rather than streaming

Posted by "Andy Seaborne (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne resolved JENA-187.
--------------------------------

    Resolution: Fixed

Patch applied - many thanks.
                

        

[jira] [Closed] (JENA-187) TSVInput parses all at once rather than streaming

Posted by "Andy Seaborne (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne closed JENA-187.
------------------------------

    

        

[jira] [Updated] (JENA-187) TSVInput parses all at once rather than streaming

Posted by "Andy Seaborne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne updated JENA-187:
-------------------------------

    Fix Version/s: ARQ 2.9.1
         Assignee: Andy Seaborne
    

        

[jira] [Updated] (JENA-187) TSVInput parses all at once rather than streaming

Posted by "Rob Vesse (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/JENA-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated JENA-187:
---------------------------

    Attachment: TsvInputStreaming.patch

This patch modifies TSVInput so that it parses only the header row and then builds a ResultSetStream over a new QueryIterator implementation (TSVInputIterator), which parses result rows on demand.

This gives low-memory streaming of TSV results.

The patch also fixes the unit tests for result set formatting so that they actually check isomorphism of results for the round-trippable formats (XML, JSON and TSV), and updates them to reflect the change in the exception type thrown by TSVInputIterator when a bad result row is encountered.
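
For readers without the attachment to hand, the approach described above can be sketched roughly as follows. This is not the code from TsvInputStreaming.patch and it does not use any Jena classes: StreamingTsvRows, vars() and the use of plain string maps for rows are hypothetical names and simplifications chosen for illustration. The choice of IllegalStateException for a bad row is likewise an assumption, since the thread does not say which exception type TSVInputIterator throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical stand-in for a streaming TSV row iterator (not Jena's
 * TSVInputIterator). Only the header is read up front; each data row is
 * read and split when the caller asks for it.
 */
final class StreamingTsvRows implements Iterator<Map<String, String>> {
    private final BufferedReader reader;
    private final List<String> vars;
    private Map<String, String> pending; // one row parsed ahead, or null
    private boolean finished = false;

    StreamingTsvRows(BufferedReader reader) throws IOException {
        this.reader = reader;
        String header = reader.readLine(); // the only eager read
        if (header == null) {
            throw new IllegalArgumentException("TSV input has no header row");
        }
        this.vars = Arrays.asList(header.split("\t", -1));
    }

    /** Column names exactly as they appear in the header row. */
    List<String> vars() {
        return vars;
    }

    @Override
    public boolean hasNext() {
        if (pending != null) return true;
        if (finished) return false;
        try {
            String line = reader.readLine();
            if (line == null) { // end of input: release the reader
                finished = true;
                reader.close();
                return false;
            }
            // Limit -1 keeps trailing empty cells so column counts stay honest.
            String[] cells = line.split("\t", -1);
            if (cells.length != vars.size()) {
                // Assumption: the exception type here is illustrative only.
                throw new IllegalStateException("Expected " + vars.size()
                        + " columns but found " + cells.length);
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < cells.length; i++) {
                row.put(vars.get(i), cells[i]);
            }
            pending = row;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Map<String, String> next() {
        if (!hasNext()) throw new NoSuchElementException();
        Map<String, String> row = pending;
        pending = null;
        return row;
    }

    public static void main(String[] args) throws IOException {
        String tsv = "?x\t?y\n<http://example/a>\t\"1\"\n<http://example/b>\t\"2\"\n";
        StreamingTsvRows rows =
                new StreamingTsvRows(new BufferedReader(new StringReader(tsv)));
        System.out.println(rows.vars());
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

At most one parsed row is held at a time, so memory use does not grow with the number of results. The trade-off is that a malformed row is only reported when the consumer reaches it, rather than when the input is first opened.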
                
