Posted to dev@marmotta.apache.org by Alan Melville <al...@gmail.com> on 2013/12/10 05:14:12 UTC

Non unique IntArray / SPOC cache Key

Hello all,

I have encountered a situation whereby the same cache key is generated for
different triple statements. The current key generation is based on hash
codes (see IntArray.createSPOCKey(s, p, o, c)), which unfortunately are not
guaranteed to be unique across different node types.

In my particular example I have a KiWiLiteral node with a URI string value.
I also have another KiWiUriResource node with the same value for its URI
(messy I know :( ).

<rdf:Description rdf:about="http://vocabulary.curriculum.edu.au/access/10">
	<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
	<skos:prefLabel xml:lang="en">Visual independence</skos:prefLabel>
	<skos:topConceptOf rdf:resource="http://vocabulary.curriculum.edu.au/access"/>
	<skos:topConceptOf>http://vocabulary.curriculum.edu.au/access</skos:topConceptOf>
</rdf:Description>


KiWiLiteral.hashCode() is based solely on its label/content and
KiWiUriResource.hashCode() is based solely on its URI, hence the two
different nodes generate the same hashCode. That in itself is fine, but
IntArray.createSPOCKey(s, p, o, c) then generates the same key for both
statements, and thus KiWiValueFactory.tripleRegistry may return the wrong
KiWiTriple (see KiWiValueFactory.createStatement(s, p, o, c, con)).
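
To make the collision concrete, here is a minimal sketch that reproduces it outside Marmotta. UriNode, LiteralNode and hashOnlyKey are hypothetical stand-ins, not the real KiWiUriResource, KiWiLiteral or IntArray.createSPOCKey; the only assumption taken from the description above is that the key is built from the hash codes of s, p, o and c. The predicate URI is likewise just an illustrative value.

```java
import java.util.Arrays;
import java.util.Objects;

public class SpocKeyCollision {

    // Hypothetical stand-in for KiWiUriResource: hash depends only on the URI string.
    static final class UriNode {
        final String uri;
        UriNode(String uri) { this.uri = uri; }
        @Override public int hashCode() { return uri.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof UriNode && ((UriNode) o).uri.equals(uri);
        }
    }

    // Hypothetical stand-in for KiWiLiteral: hash depends only on the label.
    static final class LiteralNode {
        final String label;
        LiteralNode(String label) { this.label = label; }
        @Override public int hashCode() { return label.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof LiteralNode && ((LiteralNode) o).label.equals(label);
        }
    }

    // Assumed shape of a hash-only key: one int per statement component.
    static int[] hashOnlyKey(Object s, Object p, Object o, Object c) {
        return new int[] {
            Objects.hashCode(s), Objects.hashCode(p), Objects.hashCode(o), Objects.hashCode(c)
        };
    }

    // True when two different statements end up with identical keys.
    public static boolean keysCollide() {
        UriNode s = new UriNode("http://vocabulary.curriculum.edu.au/access/10");
        UriNode p = new UriNode("http://www.w3.org/2004/02/skos/core#topConceptOf");
        String value = "http://vocabulary.curriculum.edu.au/access";
        UriNode uriObject = new UriNode(value);             // object given as a resource
        LiteralNode literalObject = new LiteralNode(value); // object given as a literal

        int[] k1 = hashOnlyKey(s, p, uriObject, null);
        int[] k2 = hashOnlyKey(s, p, literalObject, null);
        return !uriObject.equals(literalObject) && Arrays.equals(k1, k2);
    }

    public static void main(String[] args) {
        System.out.println("keys collide: " + keysCollide());
    }
}
```

The two object nodes are not equal, yet the keys are, so any registry keyed only on these ints cannot tell the two statements apart.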


Possible solutions:

1) Update the hashCode in the KiWiNode types so that it differs across node types?
2) Adjust the IntArray.createSPOCKey implementation?
3) Other?

Happy to raise a Jira ticket if the dev community feels this needs
attention.

Kind regards
Al

Re: Non unique IntArray / SPOC cache Key

Posted by Sebastian Schaffert <se...@gmail.com>.
Hi Alan,

I had to revert the patch because I forgot that I cannot change the way the
hashCode is computed: the Sesame API defines how this is supposed to be
done. I therefore changed the implementation of IntArray instead. I'll also
write a test for this issue, but I'd be grateful if you could try again with
the most recent snapshot.
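
For readers following along, one way a key can be changed without touching hashCode() is to add a node-type tag next to each hash. The sketch below only illustrates that idea and is not the actual change made to IntArray; Kind, Node and typedKey are hypothetical names.

```java
import java.util.Arrays;

public class TypedSpocKey {

    // Hypothetical discriminator for the node type.
    enum Kind { URI, LITERAL }

    // Hypothetical node: hashCode() stays value-based, as the API contract requires.
    static final class Node {
        final Kind kind;
        final String value;
        Node(Kind kind, String value) { this.kind = kind; this.value = value; }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof Node
                && ((Node) o).kind == kind
                && ((Node) o).value.equals(value);
        }
    }

    // Two ints per component: a type tag followed by the unchanged hash code.
    // A null component (e.g. no context) is encoded as tag -1, hash 0.
    static int[] typedKey(Node... spoc) {
        int[] key = new int[spoc.length * 2];
        for (int i = 0; i < spoc.length; i++) {
            Node n = spoc[i];
            key[2 * i] = (n == null) ? -1 : n.kind.ordinal();
            key[2 * i + 1] = (n == null) ? 0 : n.hashCode();
        }
        return key;
    }

    // True when a URI object and a literal object with the same text get different keys.
    public static boolean distinguishes() {
        Node s = new Node(Kind.URI, "http://vocabulary.curriculum.edu.au/access/10");
        Node p = new Node(Kind.URI, "http://www.w3.org/2004/02/skos/core#topConceptOf");
        String value = "http://vocabulary.curriculum.edu.au/access";
        int[] k1 = typedKey(s, p, new Node(Kind.URI, value), null);
        int[] k2 = typedKey(s, p, new Node(Kind.LITERAL, value), null);
        return !Arrays.equals(k1, k2);
    }

    public static void main(String[] args) {
        System.out.println("keys differ: " + distinguishes());
    }
}
```

Note that such a key is still hash-based: two different values of the same kind can share a hash code, so a registry using it would still need to compare the actual statement components on lookup.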

Greetings,

Sebastian



Re: Non unique IntArray / SPOC cache Key

Posted by Sebastian Schaffert <se...@gmail.com>.
Hi Al,

Thanks for pointing this out; this is clearly a bug. For the sake of
documentation I have created an issue at

https://issues.apache.org/jira/browse/MARMOTTA-401


I have also created a patch in the latest development branch that should
fix the issue. Can you please check with a new snapshot from Git whether it
now works for you?

Greetings,

Sebastian

