Posted to users@jena.apache.org by David Moss <ad...@gmail.com> on 2013/08/15 01:03:22 UTC

Naming entities

This is a fairly basic question, but how do others go about naming entities in an RDF graph?

The semantic web evangelists are keen on URIs that mean something, e.g. <http://admoss.info/David_Moss>.

This sounds great but in practice it doesn't scale. There are many people named David Moss in the world.

It is possible to have URIs such as <http://admoss.info/David_Moss1> <http://admoss.info/David_Moss2> ... <http://admoss.info/David_Moss249>, but a human reader cannot tell them apart. It also becomes a problem to track the highest number used for each entity name so that additions can be made to the graph.

I first tried using blank nodes as entity identifiers, but they are no good for the purpose: searching is difficult, and they are not supposed to be used outside the environment in which they are created. They are meant to be internal-only references for the convenience of the machine. They are also the antithesis of human readable.

I currently maintain a next_id entity in my graph, and I read and update its value to obtain entity names, ending up with <http://admoss.info/person22>, <http://admoss.info/organisation23>, <http://admoss.info/Building24> and so on.
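
For illustration only, the counter scheme can be sketched in Python roughly as follows. The class name SequentialNamer and its mint method are hypothetical, and the counter is held in memory here, whereas in the setup described above it is stored in the graph and would need to be read and updated atomically.

```python
class SequentialNamer:
    """Mints URIs such as <base>person22 from one shared counter (hypothetical helper)."""

    def __init__(self, base, next_id=1):
        self.base = base
        # Assumption: held in memory here; in the post this value lives in the graph.
        self.next_id = next_id

    def mint(self, kind):
        """Return a new URI for the given kind and advance the counter."""
        uri = f"{self.base}{kind}{self.next_id}"
        self.next_id += 1
        return uri


namer = SequentialNamer("http://admoss.info/", next_id=22)
print(namer.mint("person"))        # http://admoss.info/person22
print(namer.mint("organisation"))  # http://admoss.info/organisation23
print(namer.mint("Building"))      # http://admoss.info/Building24
```

The single counter is shared by every entity type, which is why the numbers in the example URIs run on from one type to the next.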

This is not exactly human readable, but I can't think of any naming policy that maintains the dream of human readable identifiers yet scales.

How are others addressing this issue?

Re: Naming entities

Posted by Rob Walpole <ro...@gmail.com>.

+1 for UUIDs, but with some domain information, e.g.
http://my.base.uri/person/{uuid} or http://my.base.uri/employee/{uuid}

In RDF terms you can put the more human readable information in the label
and render it where required, e.g.

<http://my.base.uri/person/{uuid-goes-here}> a foaf:Person ;
    rdfs:label "David Moss" .
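
A rough Python sketch of that pattern, using only the standard uuid module, might look like the following. The function names mint_uri and describe_person are made up for illustration, the base URI is the placeholder used above, and the foaf: and rdfs: prefix declarations are assumed to exist elsewhere.

```python
import uuid

BASE = "http://my.base.uri/"  # placeholder base URI, as in the message above


def mint_uri(kind):
    """Return a new URI of the form <BASE><kind>/<random UUID> (hypothetical helper)."""
    return f"{BASE}{kind}/{uuid.uuid4()}"


def describe_person(name):
    """Return (uri, turtle) for a foaf:Person that carries a readable label."""
    uri = mint_uri("person")
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    turtle = f'<{uri}> a foaf:Person ;\n    rdfs:label "{escaped}" .'
    return uri, turtle


uri, turtle = describe_person("David Moss")
print(turtle)
```

The URI stays opaque and unique, while anything meant for people to read travels in rdfs:label.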


On Thu, Aug 15, 2013 at 1:32 PM, Martynas Jusevičius
<ma...@graphity.org> wrote:

> Where uniqueness is more important than readability, I would go with UUIDs.

-- 

Rob Walpole
Email robkwalpole@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole

Re: Naming entities

Posted by Martynas Jusevičius <ma...@graphity.org>.

Where uniqueness is more important than readability, I would go with UUIDs.

RE: Naming entities

Posted by Ed Swing <Ed...@sas.com>.

I think it would depend on the scale and the domain you're aiming for. If you're trying to represent ALL people in the world, then few solutions would scale (at least until there's a universal personal ID or URI). But you could construct a (mostly) unique ID and prefix it with a base URI for your system. For instance, you could use a base URI + email address (e.g., http://my.base.uri/John.Doe_AT_gmail.com). You'd have to ensure your data couldn't be accessed by spammers in this case, of course. Facebook or Twitter home pages (or both) might also work.
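
As a rough sketch of the email-based idea in Python (standard library only): the function name uri_from_email is hypothetical, the base URI is the placeholder from the example above, and percent-encoding the remaining characters is an assumption of this sketch.

```python
from urllib.parse import quote

BASE = "http://my.base.uri/"  # placeholder base URI from the example above


def uri_from_email(email):
    """Build a URI from an email address, writing '@' as '_AT_' (hypothetical helper)."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    # Assumption: percent-encode anything that is not an unreserved URI character.
    return BASE + quote(f"{local}_AT_{domain}", safe="")


print(uri_from_email("John.Doe@gmail.com"))  # http://my.base.uri/John.Doe_AT_gmail.com
```

Note that the address is still recoverable from the URI, so the caution about spammers applies to this sketch as well.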

While it might not link to a canonical representation, the simple fact is that there is no global canonical representation for individuals, much less a global registry.

In my case, my domains are either for internal use (employees of a company), major public figures (politicians, etc.), or individuals in a particular business (e.g., medical). For the employees, the employee ID can provide a unique representation. Public figures are relatively easy: use the URL of their Wikipedia page. Particular businesses might have some sort of internal ID (this varies, of course).
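
To make the per-domain choice concrete, here is a small Python sketch. The helper names, the base URI, the employee ID format and the example page title are all assumptions for illustration; the Wikipedia helper assumes English Wikipedia page URLs, which use underscores in place of spaces.

```python
from urllib.parse import quote

BASE = "http://my.base.uri/"  # hypothetical base URI for internally minted names


def employee_uri(employee_id):
    """Name an employee by the ID the company already assigns (hypothetical helper)."""
    return f"{BASE}employee/{quote(str(employee_id), safe='')}"


def public_figure_uri(page_title):
    """Reuse the person's English Wikipedia page URL as the identifier."""
    return "https://en.wikipedia.org/wiki/" + quote(page_title.replace(" ", "_"), safe="")


print(employee_uri("E1042"))                 # http://my.base.uri/employee/E1042
print(public_figure_uri("Tim Berners-Lee"))  # https://en.wikipedia.org/wiki/Tim_Berners-Lee
```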
