
Posted to dev@clerezza.apache.org by "Hasan (JIRA)" <ji...@apache.org> on 2011/01/13 21:08:48 UTC

[jira] Created: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files
------------------------------------------------------------------------------------------

                 Key: CLEREZZA-395
                 URL: https://issues.apache.org/jira/browse/CLEREZZA-395
             Project: Clerezza
          Issue Type: Improvement
            Reporter: Hasan


Free memory shrinks every time RDF files are parsed.
The problem seems to lie in the JenaGraphAdaptor class.
It has a member:
final BidiMap<BNode, Node> tria2JenaBNodes = new BidiMapImpl<BNode, Node>();

which grows each time a serialized graph gets parsed.

My experiments with my test data show:

At the end of the 1st parsing: Size of tria2JenaBNodes = 87200
At the end of the 2nd parsing: Size of tria2JenaBNodes = 130800
At the end of the 3rd parsing: Size of tria2JenaBNodes = 174400


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Hasan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982571#action_12982571 ] 

Hasan commented on CLEREZZA-395:
--------------------------------

Hi All,

I would like to draw your attention to the problems we faced due to the
growing memory consumption of the bidirectional bnode mapping.
Although the RDF specification does not prescribe how to implement a BNode,
I think that in practice an implementation in which we can assign an
identifier to a BNode is really useful (see the comments of Rupert below).

Shouldn't we extend the Clerezza BNode implementation with this feature?

Kind regards
Hasan

On Sat, Jan 15, 2011 at 7:58 PM, Rupert Westenthaler (JIRA) <jira@apache.org




[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982852#action_12982852 ] 

Reto Bachmann-Gmür commented on CLEREZZA-395:
---------------------------------------------

Today I committed a change to the Jena adapter so that existing bnodes or bnodes produced by Jena parsers no longer cause an entry in the bidi-map. A further improvement would be to keep only weak references in the bidi-map and to discard map entries once the BNode instances are no longer referenced.
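
The weak-reference idea mentioned above could look roughly like the following. This is only a sketch under assumptions: WeakKeyBidiMap is a hypothetical class, not part of Clerezza or of the committed change, and it relies on the keys (the BNode instances) using the identity-based equals() and hashCode() of java.lang.Object.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper, not part of Clerezza: a bidirectional map that holds
// its keys only weakly, so an entry can vanish once the key is unreachable.
class WeakKeyBidiMap<K, V> {
    // key -> value; WeakHashMap drops the entry after the key is collected
    private final Map<K, V> forward = new WeakHashMap<K, V>();
    // value -> weakly referenced key, so this side never keeps a key alive
    private final Map<V, WeakReference<K>> backward = new HashMap<V, WeakReference<K>>();

    synchronized void put(K key, V value) {
        V previous = forward.put(key, value);
        if (previous != null) {
            backward.remove(previous); // drop the outdated reverse entry
        }
        backward.put(value, new WeakReference<K>(key));
    }

    synchronized V get(K key) {
        return forward.get(key);
    }

    // Returns the key for a value, or null if it is unknown or already collected.
    synchronized K getKey(V value) {
        WeakReference<K> ref = backward.get(value);
        if (ref == null) {
            return null;
        }
        K key = ref.get();
        if (key == null) {
            backward.remove(value); // stale entry: the key is gone
        }
        return key;
    }

    // Removes reverse entries whose key has been collected; returns how many remain.
    synchronized int purge() {
        Iterator<WeakReference<K>> it = backward.values().iterator();
        while (it.hasNext()) {
            if (it.next().get() == null) {
                it.remove();
            }
        }
        return backward.size();
    }
}
```

Two caveats apply to this sketch: a value must not hold a strong reference to its key, otherwise the entry is never discarded, and the reverse map keeps a value until getKey() or purge() notices that its key is gone.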


[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982150#action_12982150 ] 

Rupert Westenthaler commented on CLEREZZA-395:
----------------------------------------------

I think the only reason the map is needed is that the Clerezza BNode does not allow an ID to be defined.

If BNode had an ID, one could set this ID to the same value as Node.getBlankNodeId().toString(). If the BNode implementation then used this ID in its hashCode() and equals() implementations, having multiple instances for one and the same blank node in the Jena graph should no longer be a problem, and the bidirectional mapping would no longer be needed.

BNode could still use the hashCode() and equals() implementations of java.lang.Object if no ID is passed to the constructor, so existing code should not be affected.

WDYT
Rupert Westenthaler
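
The suggestion in the comment above can be illustrated with a minimal sketch. IdBNode is a hypothetical stand-in: the Clerezza BNode has no constructor taking an ID, and the label "b0" used below is made up.

```java
// Hypothetical stand-in for a BNode with an optional ID; the Clerezza
// BNode discussed in this thread has no such constructor.
class IdBNode {
    private final String id; // null means: fall back to object identity

    IdBNode() {
        this(null);
    }

    IdBNode(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (id == null || !(other instanceof IdBNode)) {
            return false; // without an ID only the same instance is equal
        }
        return id.equals(((IdBNode) other).id);
    }

    @Override
    public int hashCode() {
        // identity hash without an ID, otherwise the hash of the ID
        return id == null ? System.identityHashCode(this) : id.hashCode();
    }
}
```

With such a class, new IdBNode("b0").equals(new IdBNode("b0")) holds, so an adapter could create instances on the fly from the Jena label instead of remembering them in a map, while two nodes created with the no-argument constructor stay distinct as before.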


[jira] Closed: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Hasan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hasan closed CLEREZZA-395.
--------------------------

    Resolution: Fixed
      Assignee: Hasan

The clear method is overridden so that the bnodes mapping can also be cleared.
However, I think there is still a memory leak in the map.
Consider the following:
If triples containing bnodes are removed, and afterwards those bnodes no longer occur anywhere in the graph, their entries stay in the map and leak.
This problem, if what I describe above is correct, is however not part of this issue and should be addressed in a separate issue.
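
One conceivable way to handle the removal scenario described above is to prune the mapping whenever a triple is removed. The following is only a sketch under assumptions: BNodeMapPruner, its Triple class, the plain Set standing in for the graph and the Map standing in for tria2JenaBNodes are hypothetical simplifications, not the JenaGraphAdaptor code, and the sketch ignores references to a bnode that the application may still hold.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical simplification, not the JenaGraphAdaptor code.
class BNodeMapPruner {
    // Minimal triple of opaque nodes; equality is object identity.
    static final class Triple {
        final Object subject, predicate, object;

        Triple(Object subject, Object predicate, Object object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Removes the triple, then drops the mapping entries of its subject and
    // object if they no longer occur anywhere in the graph.
    static boolean removeAndPrune(Set<Triple> graph, Map<Object, String> mapping, Triple triple) {
        if (!graph.remove(triple)) {
            return false; // the triple was not in the graph
        }
        pruneIfUnused(graph, mapping, triple.subject);
        pruneIfUnused(graph, mapping, triple.object);
        return true;
    }

    private static void pruneIfUnused(Set<Triple> graph, Map<Object, String> mapping, Object node) {
        if (!mapping.containsKey(node)) {
            return; // not a mapped bnode
        }
        for (Triple t : graph) {
            if (t.subject == node || t.object == node) {
                return; // still used by another triple
            }
        }
        mapping.remove(node);
    }
}
```

The linear scan per removal is deliberately naive; it only shows where the entries would have to be dropped.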



[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Hasan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981591#action_12981591 ] 

Hasan commented on CLEREZZA-395:
--------------------------------

The claim is valid only if the parsed file is meant to replace the existing MGraph.
Otherwise, the mapping may grow.


[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982851#action_12982851 ] 

Reto Bachmann-Gmür commented on CLEREZZA-395:
---------------------------------------------

Hi Rupert,

You're right that a bnode instance can reference bnodes in different triple collections; in terms of RDF these are obviously not the same bnode. Nevertheless, when you merge the two triple collections, the two bnodes represented by one instance become the same bnode.

From a technical perspective, the difference is the life span of the bnode reference: as soon as the bnode instance becomes eligible for garbage collection, the storage provider knows that the bnode in question no longer has an intrinsic identity alien to RDF.

As long as we support such a triple-centric API we need to be able to point to a bnode at least while "drawing" a graph, but if this pointer had no age limit the storage layer would have to keep redundant information forever.

Say the following statements are executed (with an empty graph1 and !rupert1.equals(rupert2)):

graph1.add(new TripleImpl(rupert1, firstName, new PlainLiteralImpl("Rupert")));
graph1.add(new TripleImpl(rupert2, firstName, new PlainLiteralImpl("Rupert")));

After these two statements graph1 is clearly not lean, yet the implementation cannot remove the redundancy as long as the following statements could still be added:

graph1.add(new TripleImpl(rupert1, lastName, new PlainLiteralImpl("Westenthaler")));
graph1.add(new TripleImpl(rupert2, lastName, new PlainLiteralImpl("Murdoch")));

If you don't add the latter two statements, the store is free to remove the redundancy once no object of the application references the bnode any more. If bnode identity were determined by a bnode label, the storage layer would never know for sure that nobody will attempt to reference the node by that ID.

Cheers,
reto


[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Reto Bachmann-Gmür (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982586#action_12982586 ] 

Reto Bachmann-Gmür commented on CLEREZZA-395:
---------------------------------------------

Hi,

We can certainly improve the current implementation by reducing the mapping to bnodes that do not originate from the Jena adapter. But I'm strictly against providing means to store or transfer a bnode (reference) without its context, as this would defeat the purpose of bnodes.

Cheers,
reto


[jira] Updated: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Tsuyoshi Ito (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ito updated CLEREZZA-395:
----------------------------------

    Comment: was deleted

(was: Is that not problematic to invoke tria2JenaBNodes.clear(); in the clear method?

what happens if applications use references to these bnodes while clear is invoked?
)


[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982653#action_12982653 ] 

Rupert Westenthaler commented on CLEREZZA-395:
----------------------------------------------

Hi Reto, all

What do you mean by "store" and "transfer"?
 (1) persistent storage (e.g. Jena TDB) and export (e.g. RDF/XML serialization), or also
 (2) storage and CRUD operations while working with several MGraphs on the API level.
I completely agree with (1), but I am unsure about (2): I understand the potential danger, but I have also used things like that for a lot of stuff in the past years (e.g. by using http://www.openrdf.org/doc/sesame2/api/index.html?org/openrdf/repository/util/RDFInserter.html that preserves BNode IDs).

Let me point out that the operations described in (2) are possible with the current implementation.
Here is a small example of what I refer to (written here in the text editor, so no guarantee that it would compile):

MGraph graph1 = new SimpleMGraph();
MGraph graph2 = new SimpleMGraph();
//I think it should even work with a Jena graph because the bidi map provides mappings for BNodes

//Because a BNode can be created without a graph, there is no such thing as the context of a BNode
String FOAF = "http://xmlns.com/foaf/0.1/"; //FOAF namespace, not defined in the original snippet
BNode rupertInfo = new BNode();
BNode retoInfo = new BNode();
UriRef name = new UriRef(FOAF+"name");
UriRef knows = new UriRef(FOAF+"knows");

//add operations do not create new instances of BNode ... so there is still no context
graph1.add(new TripleImpl(rupertInfo, name, new PlainLiteralImpl("Rupert Westenthaler")));
graph1.add(new TripleImpl(rupertInfo, knows, retoInfo));

graph2.add(new TripleImpl(retoInfo, name, new PlainLiteralImpl("Reto Bachmann-Gmur")));
//"rupertInfo" is now in two graphs (2 contexts?)
graph2.add(new TripleImpl(retoInfo, knows, rupertInfo));

//So now let's have some fun with the BNodes
//search for all knows in graph1 -> OK (because within the same context)
Iterator<Triple> rupertsFriends = graph1.filter(rupertInfo, knows, null);
//collect the triples first: adding to graph1 while iterating over it would not work
List<Triple> toAdd = new ArrayList<Triple>();
//query for all information of the results (BNodes) in graph2 -> NOT OK?!
while(rupertsFriends.hasNext()){
  Resource friend = rupertsFriends.next().getObject();
  if(friend instanceof NonLiteral){ //filter() expects a NonLiteral subject
    Iterator<Triple> friendInfos = graph2.filter((NonLiteral) friend, null, null);
    while(friendInfos.hasNext()){
      toAdd.add(friendInfos.next());
    }
  }
}
//add them to the BNode in graph1 -> NOT OK?!
graph1.addAll(toAdd);

This works because
 - BNode does not override equals, and the equals implementation of java.lang.Object compares references
 - one instance of a BNode is shared between the two graphs
 - the performAdd method (at least that of SimpleTripleCollection) does not create new instances for added BNodes
So if the goal is to completely avoid sharing BNodes between graph instances, the current implementation would need to change.
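
The three points above can be reduced to a self-contained sketch. The names are hypothetical: Blank stands in for a BNode that does not override equals, and plain sets stand in for the two graphs, so this is not the Clerezza API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins; plain sets play the role of the two graphs.
class SharedBNodeDemo {
    // Like the BNode described above, this class inherits equals() and
    // hashCode() from java.lang.Object, so equality means "same instance".
    static final class Blank { }

    // True if the very same node instance is contained in both graphs.
    static boolean inBoth(Set<Blank> graph1, Set<Blank> graph2, Blank node) {
        return graph1.contains(node) && graph2.contains(node);
    }

    static boolean demo() {
        Set<Blank> graph1 = new HashSet<Blank>();
        Set<Blank> graph2 = new HashSet<Blank>();
        Blank shared = new Blank();
        graph1.add(shared); // the add stores the instance itself, no copy
        graph2.add(shared);
        Blank other = new Blank();
        graph1.add(other);  // only in graph1
        return inBoth(graph1, graph2, shared) && !inBoth(graph1, graph2, other);
    }
}
```

If an add operation copied the node into a new instance, the lookup in the second set would fail, which is the kind of change the last sentence above refers to.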

In conclusion I would like to point out that adding an ID to BNode, as suggested in my first comment, would not change anything from a technical perspective. However, I clearly understand that adding a constructor like BNode(String bNodeID) to the public API would encourage wrong usage of BNodes by users, which might cause a lot of trouble if they are not aware of the consequences.

best
Rupert Westenthaler 


[jira] Commented: (CLEREZZA-395) bnodes mapping in JenaGraphAdaptor should not keep growing with every parsing of rdf files

Posted by "Tsuyoshi Ito (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CLEREZZA-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981676#action_12981676 ] 

Tsuyoshi Ito commented on CLEREZZA-395:
---------------------------------------

Isn't it problematic to invoke tria2JenaBNodes.clear() in the clear method?

What happens if applications still hold references to these bnodes when clear is invoked?

