Posted to users@jena.apache.org by souri datta <so...@gmail.com> on 2015/04/24 21:00:00 UTC

Creating consistent ids for blank nodes

Hi,
When I parse an N-Quad using Jena, is it possible to keep the blank
node id the same after parsing?

For example, if I have an N-Quad with a blank node like

_:bsomerandomid   <ns:pred>    <http://foo.com>  <foo> .

then after parsing, when I call

subject.getBlankNodeId(), it returns a randomly generated string which is
different from "_:bsomerandomid".

Is it possible to set some flag so that I get the same blank node id as
in the input quad?


Thanks,
Souri



Code used:

private ParserProfileBase profile = new ParserProfileBase(
            new Prologue(null, IRIResolver.createNoResolve()),
            ErrorHandlerFactory.errorHandlerStrictSilent(),
            LabelToNode.createUseLabelEncoded());
    profile.setStrictMode(true);
}
List<Quad> output = new ArrayList<>();
Tokenizer tokenizer = TokenizerFactory.makeTokenizerString(line);
LangNQuads parser = new LangNQuads(tokenizer, profile, null);
while (parser.hasNext()) {
    output.add(parser.next());
}
output.get(0).getSubject().getBlankNodeId();

Re: Creating consistent ids for blank nodes

Posted by David Moss <ad...@gmail.com>.
As far as I am aware blank nodes are called blank for a reason. If you want a named resource, why not just create one? 


Re: Creating consistent ids for blank nodes

Posted by David Moss <ad...@gmail.com>.
Is that function guaranteed to return the same blank node every time it is used or is it a temporary coincidence? My reading indicates you can't depend on getting the same blank node ID every time and it's pretty much impossible across systems.

When I've dealt with such data I march through it allocating the attributes of each blank node to a new resource and delete the blank node. That way I am guaranteed my resource will be there in future, and with the same ID.

Perhaps I'm too paranoid, but better safe than sorry.
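
A minimal sketch of that replacement step, written against plain N-Quads text rather than the Jena API. The class name BlankNodeSkolemizer, the sourceId parameter and the simplified label pattern are all assumptions made for illustration; a real implementation would walk parsed quads instead of using a regular expression, since this one would also rewrite a "_:" token that happens to appear inside a literal. Each label is mapped to a name-based UUID derived from the source identifier plus the label, so the same input always yields the same <urn:uuid:...> resource.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Jena: rewrites blank node labels as stable urn:uuid IRIs. */
class BlankNodeSkolemizer {
    // Simplified label pattern (an assumption): "_:" at a token boundary, followed by
    // letters, digits and underscores, optionally joined by '.' or '-'.
    private static final Pattern LABEL =
            Pattern.compile("(?<!\\S)_:([A-Za-z0-9_]+(?:[.-]+[A-Za-z0-9_]+)*)");

    /**
     * Replaces every blank node label in one N-Quads line with a urn:uuid IRI.
     * sourceId identifies the document, so "_:a" from two different documents
     * ends up as two different resources.
     */
    static String skolemize(String line, String sourceId) {
        Matcher m = LABEL.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = sourceId + "#" + m.group(1);
            // Name-based (version 3) UUID: deterministic for a given key.
            UUID id = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
            m.appendReplacement(out, Matcher.quoteReplacement("<urn:uuid:" + id + ">"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String quad = "_:bsomerandomid <http://example.org/pred> <http://foo.com> <http://example.org/g> .";
        // Running this twice prints the same urn:uuid subject both times.
        System.out.println(skolemize(quad, "file-1.nq"));
    }
}
```

Including the source identifier in the key keeps the rewrite consistent with RDF: equal labels from different documents are not merged into one resource.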

> On 26 Apr 2015, at 7:49 am, souri datta <so...@gmail.com> wrote:
> 
> David,
> In my use case I don't have control over the n-quads generation. So, I
> cannot create named resources instead of blank nodes.
> 
> Andy,
> After looking into the Jena source code, I was able to find a nice utility
> that solves my problem.
> 
> Here is the interesting part of the new code:
> 
> Node sNode = output.get(0).getSubject();
> if (sNode.isBlank()) {
>     System.out.println(NodeFmtLib.str(sNode));
> }
> 
> NodeFmtLib.str() returns the same blank node id as specified in the
> N-Quad.
> 
> --Souri

Re: Creating consistent ids for blank nodes

Posted by souri datta <so...@gmail.com>.
David,
 In my use case I don't have control over the n-quads generation. So, I
cannot create named resources instead of blank nodes.

Andy,
 After looking into the Jena source code, I was able to find a nice utility
that solves my problem.

Here is the interesting part of the new code:

Node sNode = output.get(0).getSubject();
if (sNode.isBlank()) {
    System.out.println(NodeFmtLib.str(sNode));
}

NodeFmtLib.str() returns the same blank node id as specified in the
N-Quad.

--Souri


Re: Creating consistent ids for blank nodes

Posted by Andy Seaborne <an...@apache.org>.
On 24/04/15 20:00, souri datta wrote:
> Hi,
>   When  I am parsing an N-quad using Jena, is it possible to keep the blank
> node id same after parsing?
>
> For e.g., if I have a blank n-quad like
>
> _:bsomerandomid   <ns:pred>    <http://foo.com>  <foo> .
>
> After parsing, when I do
>
> subject.getBlankNodeId() it returns a randomly generated string which is
> different from "_:bsomerandomid" .
>
> Is it possible to set some flags so that I get the same blank node id as
> present in the input quad?

Hi,

As David says, blank nodes are not for identification, so maybe you need
some kind of stable label like a <urn:uuid:....>

Warning - if you work with the blank node labels as given you will not be
following RDF.  "_:a" in one file is different from "_:a" in another file,
or even from "_:a" when the same file is read twice.

With your code (*) I get

bsomerandomid

The "_:" is not part of the blank node id.

"createUseLabelEncoded" generates legal blank node labels directly 
related to the string given.

The correct form generates a globally unique id that never clashes with
any other (well, it's related to a 122-bit random number).
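
To make the difference between the two behaviours concrete, here is a small model in plain Java. It is only an illustration and not Jena's LabelToNode implementation: the class LabelScopeSketch and its members are hypothetical names. It relies on java.util.UUID.randomUUID(), which produces version 4 UUIDs carrying 122 random bits; whether Jena uses exactly that call is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Illustrative model only; hypothetical names, not Jena's LabelToNode. */
class LabelScopeSketch {

    /** One instance per parse run: a label is stable within the run, fresh across runs. */
    static final class FreshPerParse {
        private final Map<String, String> seen = new HashMap<>();

        String idFor(String label) {
            // First sighting of a label in this run allocates a random (version 4) UUID.
            return seen.computeIfAbsent(label, k -> UUID.randomUUID().toString());
        }
    }

    /** Label-derived policy: the id is a pure function of the label text. */
    static String labelDerived(String label) {
        return label;
    }

    public static void main(String[] args) {
        FreshPerParse firstRead = new FreshPerParse();
        FreshPerParse secondRead = new FreshPerParse();

        // Same label, same parse run: the ids are equal.
        System.out.println(firstRead.idFor("a").equals(firstRead.idFor("a")));
        // Same label, two parse runs: the ids differ (barring a UUID collision).
        System.out.println(firstRead.idFor("a").equals(secondRead.idFor("a")));
        // Label-derived: repeats on every run, which is what RDF does not promise.
        System.out.println(labelDerived("bsomerandomid"));
    }
}
```

The per-run map is the part that matches the RDF rule quoted above: labels only identify a node inside a single document read.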

Please take care with code next time - HTML gets converted and formatted 
HTML is messed up.

	Andy
