Posted to users@jena.apache.org by Marcel Ferrante <ma...@gmail.com> on 2012/02/22 20:01:35 UTC

Adding Semantics to WORDPRESS with JENA

Hi everyone,

we are developing a plugin that adds semantics to the WordPress CMS,
and we are thinking about integrating Jena with WordPress to store the metadata.

For example, we have a WordPress MU site about music artists. The artists, songs,
albums, etc. are represented as posts. So a custom post type is similar to a "class",
a post is similar to an instance, and custom fields are similar to attributes.

Here you can see more about custom posts:
http://vimeo.com/32661608  http://vimeo.com/10187055

And here you can see the WordPress ER diagram:
http://www.inqbation.com/wp-content/uploads/2011/05/WP3.0-ERD.png

So we can allow users to create semantic relations between posts, links,
terms... any resource in WordPress.

We actually created an obvious triple table to store the statements, like
this:

+----+---wp_posts---+-------------+
+-id-+--post_title--+-post_author-+
| 11 | car          | 12          |
| 14 | fusca        | 32          |
| 23 | my fusca     | 43          |


+----+---wp_images---+---------------+
+-id-+-----title-----+------url------+
| 9  | fusca photo1  | ../fusca1.jpg |


+----+-wp_nodes-+-------------------------------------------------+---------+
+-id-+--wp_id---+----------------------value----------------------+-literal-+
| 1  | 11       | posts                                           | 0       |
| 2  | 14       | posts                                           | 0       |
| 3  | 23       | posts                                           | 0       |
| 4  | 9        | images                                          | 0       |
| 5  |          | http://purl.org/dc/terms/creator                | 0       |
| 6  |          | http://www.w3.org/2000/01/rdf-schema#subClassOf | 0       |
| 7  |          | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | 0       |
| 8  |          | has_photo                                       | 0       |
| 9  |          | mileage                                         | 0       |
| 10 |          | 1981                                            | 1       |
| 11 | 43       | users                                           | 0       |

When wp_id is not null, value is a WordPress resource (table or
table field).
When literal equals 1, value is a literal.

+----+-wp_spo--+----+
+-id-+-s--+-p--+-o--+
| 1  | 2  | 6  | 1  |
| 1  | 3  | 7  | 2  |
| 1  | 3  | 5  | 11 |
| 1  | 3  | 8  | 4  |
| 1  | 3  | 9  | 10 |
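To make the layout above concrete, here is a minimal sketch of the same node-table plus triple-table idea, using Python's built-in sqlite3 in place of MySQL. The column types, the omission of the wp_spo id column, and the function names build_store and decode_triples are assumptions made for illustration; only three node rows and one triple from the example are loaded.

```python
import sqlite3

def build_store():
    """Create an in-memory copy of part of the wp_nodes / wp_spo example."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE wp_nodes (
            id      INTEGER PRIMARY KEY,
            wp_id   INTEGER,            -- NULL for external resources and literals
            value   TEXT NOT NULL,
            literal INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE wp_spo (s INTEGER, p INTEGER, o INTEGER);
    """)
    db.executemany("INSERT INTO wp_nodes VALUES (?, ?, ?, ?)", [
        (3, 23, "posts", 0),        # the post "my fusca"
        (9, None, "mileage", 0),    # a property
        (10, None, "1981", 1),      # a literal value
    ])
    db.execute("INSERT INTO wp_spo VALUES (3, 9, 10)")
    return db

def decode_triples(db):
    """Join each s/p/o id back to its node row to get readable triples."""
    return db.execute("""
        SELECT s.value, s.wp_id, p.value, o.value, o.literal
        FROM wp_spo AS t
        JOIN wp_nodes AS s ON s.id = t.s
        JOIN wp_nodes AS p ON p.id = t.p
        JOIN wp_nodes AS o ON o.id = t.o
    """).fetchall()

print(decode_triples(build_store()))
```

Every triple pattern costs one join per position that has to be turned back into text, so indexes on the s, p and o columns are likely to matter more than the exact column layout.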

The questions are:
1. Does this model perform well?
2. How do we store this using Jena SDB?

The new documentation is incomplete (it doesn't explain each
field, its datatype, and how it works):
http://incubator.apache.org/jena/documentation/sdb/database_layouts.html

This one is more detailed:
http://jena.sourceforge.net/DB/layout.html
But it's the old one.

So, how does Jena SDB store namespaces, literals, and graphs?
I'd like any documentation, even an article, that explains why this model
is better than previous models (like the "Efficient RDF Storage and Retrieval
in Jena2" article).

Maybe someone could send a MySQL dump with a little RDF graph stored.
Even if we don't use it, it could help us improve our model.

This could be used in many applications, like:

- a simple semantic annotation tool with a familiar interface
- semantic portals or knowledge management systems
- an RDF explorer (for wiki.dbpedia.org/Downloads, for example)

You could say there are already many semantic tools.
Yes, but which one has more than 70 million sites around the world?
http://en.wordpress.com/stats/

Should I send this issue to the dev mailing list?

Thanks in advance,
Marcel



-- 
Marcel Ferrante Silva
"The Power of Ideas"
skype: marcelferrante
msn/gtalk: marcelf@gmail.com

Re: Adding Semantics to WORDPRESS with JENA

Posted by Andy Seaborne <an...@apache.org>.
On 23/02/12 11:59, Marcel Ferrante wrote:
> Hi Andy,
>
> It's me again ;)

So it is :-)

>
>
>>   I don't think I understand this -
>>
>> 2 6 1 seems to be   posts subclassof  posts
>>
>> but shouldn't it be the same S and O id?
>>
>
> You are right! Excuse me. I missed the g column.

It's not the g col I'm pointing out - you seem to have two "posts" with 
different ids.  That's going to get confusing (i.e. wrong) in RDF.
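One way to avoid giving the same term two ids is to look nodes up by value before inserting them. The sketch below assumes a simplified nodes table with a UNIQUE constraint and uses Python's sqlite3; the table name, the helper node_id, and the example.org URI are hypothetical and not part of the plugin or of SDB.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE nodes (
        id      INTEGER PRIMARY KEY,
        value   TEXT NOT NULL,
        literal INTEGER NOT NULL,
        UNIQUE (value, literal)
    )
""")

def node_id(value, literal=False):
    """Return the id for a term, inserting it only if it is not stored yet."""
    db.execute(
        "INSERT OR IGNORE INTO nodes (value, literal) VALUES (?, ?)",
        (value, int(literal)),
    )
    row = db.execute(
        "SELECT id FROM nodes WHERE value = ? AND literal = ?",
        (value, int(literal)),
    ).fetchone()
    return row[0]

# The same URI always maps to the same id; a different term gets a new one.
first = node_id("http://example.org/wp#posts")
second = node_id("http://example.org/wp#posts")
other = node_id("1981", literal=True)
print(first == second, first != other)
```

With this rule, a triple such as "posts subClassOf posts" would carry the same id in the subject and object columns.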

>
> +----+----+-wp_spo--+----+
> +-id-+-g--+-s--+-p--+-o--+
> | 1  | 1  | 2  | 6  | 1  |
> | 2  | 1  | 13 | 7  | 2  |
> | 3  | 1  | 13 | 5  | 11 |
>
> And we could include hash and lang in wp_nodes table:
>
> +----+-wp_nodes-+-------+---------+------+-------+
> +-id-+--wp_id---+-value-+-literal-+-lang-+-hash--+
> | 1  | 11       | posts | 0       | 1    | 23423 |
> | 2  | 14       | posts | 0       | 1    | 54523 |
>
>
>
>> 3 7 2 seems to be   posts rdf:type  posts
>> 3 9 10 seems to be   posts mileage 19811
>>
>> so I guess I haven't guessed the foreign key relationships correctly.
>>
>> The second triple means:
>
> <rdf:Description rdf:about="http://www.sellcars.com.br/wp#my-fusca">
>   <mileage>19811</mileage>
> </rdf:Description>
>
> The wp_nodes is a bridge to wordpress resources: wp_id is the id of the
> original table.
> In wp_nodes we store all: wp resources, external resources and literals...
>
> Question: is it better (for performance) to separate them into two different
> tables (like wp_resources and wp_literals)?

Probably makes no difference.  If you do split them, it is more
complicated as you have to know which id refers to which table.  id=789
may be a literal or a URI and to find it you may end up looking in both.
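As a rough illustration of that extra bookkeeping, the sketch below uses plain Python dicts as stand-ins for database tables. The ids, the example.org URIs, and the function names are made up for the example.

```python
# Split layout: an id alone does not say which table holds the node.
resources = {
    1: "http://example.org/wp#my-fusca",
    2: "http://example.org/wp#mileage",
}
literals = {789: "1981"}

def lookup_split(node_id):
    """With two tables, resolving an id may mean probing both of them."""
    if node_id in resources:
        return ("uri", resources[node_id])
    if node_id in literals:
        return ("literal", literals[node_id])
    raise KeyError(node_id)

# Single layout: one lookup, with the kind stored next to the value.
nodes = {
    1: ("uri", "http://example.org/wp#my-fusca"),
    2: ("uri", "http://example.org/wp#mileage"),
    789: ("literal", "1981"),
}

def lookup_single(node_id):
    """One table means one probe, whatever kind of node the id refers to."""
    return nodes[node_id]

print(lookup_split(789), lookup_single(789))
```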

	Andy

>
> Thanks
> Marcel
>


Re: Adding Semantics to WORDPRESS with JENA

Posted by Marcel Ferrante <ma...@gmail.com>.
Hi Andy,

It's me again ;)


>  I don't think I understand this -
>
> 2 6 1 seems to be   posts subclassof  posts
>
> but shouldn't it be the same S and O id?
>

You are right! Excuse me. I missed the g column.

+----+----+-wp_spo--+----+
+-id-+-g--+-s--+-p--+-o--+
| 1  | 1  | 2  | 6  | 1  |
| 2  | 1  | 13 | 7  | 2  |
| 3  | 1  | 13 | 5  | 11 |

And we could include hash and lang in wp_nodes table:

+----+-wp_nodes-+-------+---------+------+-------+
+-id-+--wp_id---+-value-+-literal-+-lang-+-hash--+
| 1  | 11       | posts | 0       | 1    | 23423 |
| 2  | 14       | posts | 0       | 1    | 54523 |



> 3 7 2 seems to be   posts rdf:type  posts
> 3 9 10 seems to be   posts mileage 19811
>
> so I guess I haven't guessed the foreign key relationships correctly.
>
> The second triple means:

<rdf:Description rdf:about="http://www.sellcars.com.br/wp#my-fusca">
  <mileage>19811</mileage>
</rdf:Description>

The wp_nodes is a bridge to wordpress resources: wp_id is the id of the
original table.
In wp_nodes we store all: wp resources, external resources and literals...

Question: is it better (for performance) to separate them into two different
tables (like wp_resources and wp_literals)?

Thanks
Marcel

-- 
Marcel Ferrante Silva
"The Power of Ideas"
skype: marcelferrante
msn/gtalk: marcelf@gmail.com

Re: Adding Semantics to WORDPRESS with JENA

Posted by Andy Seaborne <an...@apache.org>.
On 22/02/12 19:01, Marcel Ferrante wrote:

> We actually created an obvious triple table to store the statements, like
> this:

> +----+---wp_posts---+-------------+
> +-id-+--post_title--+-post_author-+
> | 11 | car          | 12          |
> | 14 | fusca        | 32          |
> | 23 | my fusca     | 43          |
>
>
> +----+---wp_images---+---------------+
> +-id-+-----title-----+------url------+
> | 9  | fusca photo1  | ../fusca1.jpg |
>
>
> +----+-wp_nodes-+-------------------------------------------------+---------+
> +-id-+--wp_id---+----------------------value----------------------+-literal-+
> | 1  | 11       | posts                                           | 0       |
> | 2  | 14       | posts                                           | 0       |
> | 3  | 23       | posts                                           | 0       |
> | 4  | 9        | images                                          | 0       |
> | 5  |          | http://purl.org/dc/terms/creator                | 0       |
> | 6  |          | http://www.w3.org/2000/01/rdf-schema#subClassOf | 0       |
> | 7  |          | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | 0       |
> | 8  |          | has_photo                                       | 0       |
> | 9  |          | mileage                                         | 0       |
> | 10 |          | 1981                                            | 1       |
> | 11 | 43       | users                                           | 0       |
>
> When wp_id is not null, value is a WordPress resource (table or
> table field).
> When literal equals 1, value is a literal.
>
> +----+-wp_spo--+----+
> +-id-+-s--+-p--+-o--+
> | 1  | 2  | 6  | 1  |
> | 1  | 3  | 7  | 2  |
> | 1  | 3  | 5  | 11 |
> | 1  | 3  | 8  | 4  |
> | 1  | 3  | 9  | 10 |

I don't think I understand this -

2 6 1 seems to be   posts subclassof  posts

but shouldn't it be the same S and O id?

3 7 2 seems to be   posts rdf:type  posts
3 9 10 seems to be   posts mileage 19811

so I guess I haven't guessed the foreign key relationships correctly.

(it got rather damaged by email as well)

	Andy

Re: Adding Semantics to WORDPRESS with JENA

Posted by Andy Seaborne <an...@apache.org>.
On 22/02/12 19:01, Marcel Ferrante wrote:
> The questions are:
> 1. Does this model perform well?
> 2. How do we store this using Jena SDB?

You should consider using SDB through the API.  This would mean you
don't have to worry about the internal details.  Maybe even store the
details in a separate store and use SPARQL to access it (e.g. Fuseki as
the database layer).
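As a hedged sketch of the "SPARQL as the database layer" idea, the snippet below builds a query request in the form defined by the SPARQL 1.1 Protocol, using only Python's standard library. The endpoint URL (a local Fuseki server with a dataset named "wp") and the helper name sparql_request are assumptions; the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: a local Fuseki server with a dataset called "wp".
ENDPOINT = "http://localhost:3030/wp/query"

def sparql_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol query as a form-encoded POST."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )

req = sparql_request(
    "SELECT ?p ?o WHERE { <http://www.sellcars.com.br/wp#my-fusca> ?p ?o }"
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it to a running server.
```

With this arrangement the plugin would only exchange queries and results over HTTP, and the table layout stays an internal detail of the store.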

> The new documentation is incomplete (it doesn't explain each
> field, its datatype, and how it works):

The G/S/P/O are ids (Id or Hash, depending on choice) into the node table.
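To illustrate the difference between the two choices: a sequence id is handed out by the database when a node is inserted, while a hash id can be computed from the term itself without a lookup. The sketch below shows the hash idea only. It is not SDB's actual hash function; the use of SHA-256, the field order, and the name node_hash are all assumptions.

```python
import hashlib

def node_hash(lexical, lang="", datatype="", is_literal=False):
    """Derive a signed 64-bit id from the term itself (illustrative only)."""
    kind = "L" if is_literal else "U"
    key = "|".join([kind, lexical, lang, datatype]).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 8 bytes so the value fits in a BIGINT column.
    return int.from_bytes(digest[:8], "big", signed=True)

a = node_hash("http://purl.org/dc/terms/creator")
b = node_hash("http://purl.org/dc/terms/creator")
c = node_hash("1981", is_literal=True)
print(a == b, a != c)
```

Because the id depends only on the term, a loader can fill the triple table without first reading ids back from the node table.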

> http://incubator.apache.org/jena/documentation/sdb/database_layouts.html

The exact layout details can be found in the source code for the
particular database you wish to use.  There is slight variation between
different engines.

	Andy