Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2012/11/21 23:20:58 UTC

[jira] [Created] (JENA-352) Vast numbers of bNodes can overwhelm the parser

Andy Seaborne created JENA-352:
----------------------------------

             Summary: Vast numbers of bNodes can overwhelm the parser
                 Key: JENA-352
                 URL: https://issues.apache.org/jira/browse/JENA-352
             Project: Apache Jena
          Issue Type: Bug
          Components: RIOT, TDB
            Reporter: Andy Seaborne
            Priority: Minor


The parsers need to keep a map from bNode label to bNode, and with unusual data this map can grow too large. Because it takes unusual data to trigger the problem, the issue is rated as "minor".

outline of solution: 

1/ Switch to a bNode allocation scheme that picks a large random number per file and concatenates or XORs it with the claimed bNode label. This generates a unique label without any state build-up.

2/ (Turtle) don't remember [] bnodes past their usage scope.

3/ Partial: keep a sliding window of bNode label mappings.

e.g.
http://mail-archives.apache.org/mod_mbox/jena-users/201112.mbox/%3C4EDFE45F.6090202@apache.org%3E
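Option 3 can be sketched as a bounded least-recently-used map. The class name SlidingWindowLabelMap, the capacity parameter and the use of random UUIDs as identifiers are all assumptions for illustration, not Jena code. The sketch also shows why the option is only partial: memory stays bounded, but a label that falls out of the window and is then seen again gets a new identifier, and so silently becomes a different bNode.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of option 3: a sliding window of bNode label mappings.
class SlidingWindowLabelMap {
    private final Map<String, String> window;

    SlidingWindowLabelMap(int capacity) {
        // accessOrder = true: iteration runs from least- to most-recently used,
        // so removeEldestEntry drops the label unused for the longest time.
        this.window = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the identifier for a label, allocating a fresh one if the label
    // is not (or is no longer) in the window.
    String get(String label) {
        return window.computeIfAbsent(label, l -> UUID.randomUUID().toString());
    }

    int size() {
        return window.size();
    }
}
```

With a capacity of 2, reading labels a, b, a, c evicts b. A later b in the data would then be treated as a new bNode.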

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (JENA-352) Vast numbers of bNodes can overwhelm the parser

Posted by "Andy Seaborne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/JENA-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509057#comment-13509057 ] 

Andy Seaborne commented on JENA-352:
------------------------------------

Alternative approach:

RIOT now handles pseudo-URIs of the form <_:XYZ>, using XYZ as the internal identifier for the bnode.

This has two uses:

1/ Use with dumps to restore exactly the old data (NB RIOT writes bnodes as _:BXYZ, i.e. a leading "B" followed by an encoded label).

2/ Processing large loads: either so the data can be split, or simply to load a very large file with bNodes.

Does not apply to RDF/XML.
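The pseudo-URI handling described above can be sketched roughly as follows. This is not RIOT's code. The class PseudoUriBNodes and its method bnodeId are hypothetical names, and the sketch only shows the idea: an IRI whose text starts with "_:" is treated as a bNode, and the rest of the text becomes its identifier, so no label-to-bNode map is kept.

```java
import java.util.Optional;

// Hypothetical sketch: recognising <_:XYZ> pseudo-URIs (not Jena's actual API).
class PseudoUriBNodes {
    static final String PREFIX = "_:";

    // Returns the bNode identifier for a pseudo-URI, or empty for an ordinary IRI.
    static Optional<String> bnodeId(String iriText) {
        if (iriText.startsWith(PREFIX) && iriText.length() > PREFIX.length()) {
            return Optional.of(iriText.substring(PREFIX.length()));
        }
        return Optional.empty();
    }
}
```

The identifier comes straight from the data, so <_:b1> names the same bNode in every file that mentions it. That is the property that makes restoring a dump or splitting a large load work.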

As this is only a partial solution, I've left the JIRA open.

The seed+XOR-the-label approach (i.e. option 1) is better.
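The seed+XOR approach can be sketched under some stated assumptions. The seed is 128 bits drawn once per parse from SecureRandom. The label is first hashed with MD5 so that labels of any length can be XORed with a seed of fixed width. The class and method names are hypothetical. The only state is the seed, so memory use does not grow with the number of bNodes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical sketch of option 1: per-parse random seed XOR hash(label).
class SeededLabelAllocator {
    private final byte[] seed = new byte[16];

    SeededLabelAllocator() {
        new SecureRandom().nextBytes(seed); // one fresh seed per file/parse run
    }

    // Same label within the same run -> same id; no per-label state is kept.
    String allocate(String label) {
        byte[] h = md5(label.getBytes(StandardCharsets.UTF_8)); // 16 bytes
        StringBuilder sb = new StringBuilder(32);
        for (int i = 0; i < seed.length; i++) {
            sb.append(String.format("%02x", (seed[i] ^ h[i]) & 0xff));
        }
        return sb.toString();
    }

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Plain concatenation of seed and label would also work and cannot collide, but its identifiers grow with the label length. The hash-then-XOR variant keeps identifiers at a fixed width, at the cost of a negligible collision risk.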
