Posted to users@jena.apache.org by souri datta <so...@gmail.com> on 2015/04/24 21:00:00 UTC
Creating consistent ids for blank nodes
Hi,
When I parse an N-Quad using Jena, is it possible to keep the blank
node id the same after parsing?
For example, if I have an N-Quad with a blank node subject like
_:bsomerandomid <ns:pred> <http://foo.com> <foo> .
then after parsing, when I call
subject.getBlankNodeId(), it returns a randomly generated string which is
different from "_:bsomerandomid".
Is it possible to set some flag so that I get the same blank node id as is
present in the input quad?
Thanks,
Souri
Code used:

private ParserProfileBase profile = new ParserProfileBase(
        new Prologue(null, IRIResolver.createNoResolve()),
        ErrorHandlerFactory.errorHandlerStrictSilent(),
        LabelToNode.createUseLabelEncoded());
profile.setStrictMode(true);

List<Quad> output = new ArrayList<>();
Tokenizer tokenizer = TokenizerFactory.makeTokenizerString(line);
LangNQuads parser = new LangNQuads(tokenizer, profile, null);
while (parser.hasNext()) {
    output.add(parser.next());
}
output.get(0).getSubject().getBlankNodeId();
Re: Creating consistent ids for blank nodes
Posted by David Moss <ad...@gmail.com>.
As far as I am aware blank nodes are called blank for a reason. If you want a named resource, why not just create one?
Re: Creating consistent ids for blank nodes
Posted by David Moss <ad...@gmail.com>.
Is that function guaranteed to return the same blank node every time it is used or is it a temporary coincidence? My reading indicates you can't depend on getting the same blank node ID every time and it's pretty much impossible across systems.
When I've dealt with such data I march through it allocating the attributes of each blank node to a new resource and delete the blank node. That way I am guaranteed my resource will be there in future, and with the same ID.
Perhaps I'm too paranoid, but better safe than sorry.
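The approach described above (replacing each blank node with a named resource) could be sketched roughly as follows. This is a minimal illustration in plain Java, not Jena code: the class name BlankNodeRenamer, the docScope parameter, and the use of a name-based UUID are all assumptions made for the example. It also assumes that "_:" never appears inside an IRI or a literal in the input, which a real N-Quads parser would not need to assume.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites blank node labels in an N-Quads line
// into stable <urn:uuid:...> IRIs.
class BlankNodeRenamer {
    // Naive pattern for a blank node label such as _:b0.
    // Assumption: "_:" does not occur inside IRIs or literals.
    private static final Pattern BNODE = Pattern.compile("_:([A-Za-z0-9]+)");

    // Derives a name-based UUID from the document scope plus the label, so the
    // same label in the same document always maps to the same IRI.
    static String stableIri(String docScope, String label) {
        byte[] name = (docScope + "#" + label).getBytes(StandardCharsets.UTF_8);
        return "<urn:uuid:" + UUID.nameUUIDFromBytes(name) + ">";
    }

    // Replaces every blank node label in the line with its stable IRI.
    static String rename(String docScope, String line) {
        Matcher m = BNODE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String iri = stableIri(docScope, m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(iri));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String quad = "_:bsomerandomid <ns:pred> <http://foo.com> <foo> .";
        // The document scope keeps "_:a" from two different files apart.
        System.out.println(rename("file-1.nq", quad));
        System.out.println(rename("file-2.nq", quad));
    }
}
```

Including a per-document scope in the name is a deliberate choice here: it keeps the same label from two different files mapped to different resources, while repeated runs over the same file give the same IRI.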
> On 26 Apr 2015, at 7:49 am, souri datta <so...@gmail.com> wrote:
>
> David,
> In my use case I don't have control over the n-quads generation. So, I
> cannot create named resources instead of blank nodes.
>
> Andy,
> After looking into the Jena source code, I was able to find a nice utility
> that solves my problem.
>
> Here is the interesting part of the new code:
>
> Node sNode = output.get(0).getSubject();
> if (sNode.isBlank()) {
>     System.out.println(NodeFmtLib.str(sNode));
> }
>
> NodeFmtLib.str() returns the same blank node id as specified in the
> n-quad.
>
> --Souri
>
Re: Creating consistent ids for blank nodes
Posted by souri datta <so...@gmail.com>.
David,
In my use case I don't have control over the n-quads generation. So, I
cannot create named resources instead of blank nodes.
Andy,
After looking into the Jena source code, I was able to find a nice utility
that solves my problem.
Here is the interesting part of the new code:

Node sNode = output.get(0).getSubject();
if (sNode.isBlank()) {
    System.out.println(NodeFmtLib.str(sNode));
}

NodeFmtLib.str() returns the same blank node id as specified in the
n-quad.
--Souri
Re: Creating consistent ids for blank nodes
Posted by Andy Seaborne <an...@apache.org>.
Hi,
As David says, blank nodes are not for identification, so maybe you need
some kind of stable label like a <urn:uuid:....>
Warning - if you work with the blank node labels as given you will not be
following RDF. "_:a" in one file is different to "_:a" in another file,
or even in the same file if it is read twice.
With your code (*) I get
bsomerandomid
The "_:" is not part of the blank node id.
"createUseLabelEncoded" generates legal blank node labels directly
related to the string given.
The correct form generates a globally unique id that never clashes with
any other (well, it's related to a 122 bit random number).
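To illustrate the scoping rule, here is a rough plain-Java sketch, not Jena's actual LabelToNode implementation. The class name LabelScope is made up for the example, and a random (version 4) UUID, which carries 122 random bits, stands in for the internal id; that choice is an assumption of the sketch rather than something taken from the Jena source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of one parse run: labels are only meaningful inside it.
class LabelScope {
    private final Map<String, String> ids = new HashMap<>();

    // Returns the internal id for a label, allocating a fresh random id
    // the first time the label is seen in this scope.
    String idFor(String label) {
        return ids.computeIfAbsent(label, l -> UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        LabelScope firstRead = new LabelScope();
        LabelScope secondRead = new LabelScope();

        // Same label, same read: one blank node.
        System.out.println(firstRead.idFor("a").equals(firstRead.idFor("a")));

        // Same label, different read (even of the same file): a different
        // blank node, because each scope allocates its own random id.
        System.out.println(firstRead.idFor("a").equals(secondRead.idFor("a")));

        // java.util.UUID.randomUUID() produces version 4 (random) UUIDs.
        System.out.println(UUID.randomUUID().version());
    }
}
```

In this model, keeping the label as given amounts to sharing one scope across every file and every read, which is exactly what the warning above is about.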
Please take care with code next time - HTML gets converted and formatted
HTML is messed up.
Andy