Posted to users@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2011/11/03 16:52:47 UTC

Giraph: anyone?

Hi,
I wonder if anyone here has had a look at Apache Giraph (in the
incubator), a Google Pregel clone: http://incubator.apache.org/giraph/

I am curious to know if and how this could be used to implement a
rule-based inference engine. :-)

Pregel seems to me a better model/architecture (than MapReduce) for
things such as the RETE algorithm.

Having said that, a thing you can easily do with MapReduce is to
distribute all your vocabularies/ontologies to all your nodes (via
DistributedCache), load them in RAM (they are typically not that
big) and then perform inference in parallel.
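
To make the per-node step concrete, here is a minimal sketch of what each worker could do once the vocabulary is in RAM. It is not the tdbloader3 or RIOT code: the class PerTripleRdfs, its methods and the "ex:" names are hypothetical, triples are plain lists of three strings, and only rdfs:subClassOf and rdfs:subPropertyOf are covered. The Hadoop wiring (DistributedCache, Mapper) is left out on purpose.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a small vocabulary is held in memory and each
// instance triple is expanded on its own, as a map task could do.
final class PerTripleRdfs {
    static final String RDF_TYPE = "rdf:type";

    private final Map<String, Set<String>> superClasses;     // class -> all superclasses
    private final Map<String, Set<String>> superProperties;  // property -> all superproperties

    // Both arguments hold the direct subClassOf / subPropertyOf links. They are
    // closed once up front, which is cheap because vocabularies are small.
    PerTripleRdfs(Map<String, Set<String>> subClassOf, Map<String, Set<String>> subPropertyOf) {
        this.superClasses = closure(subClassOf);
        this.superProperties = closure(subPropertyOf);
    }

    // Follows the direct links from every key until nothing new is reached.
    static Map<String, Set<String>> closure(Map<String, Set<String>> direct) {
        Map<String, Set<String>> closed = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : direct.entrySet()) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> todo = new ArrayDeque<>(entry.getValue());
            while (!todo.isEmpty()) {
                String next = todo.pop();
                if (seen.add(next)) {
                    todo.addAll(direct.getOrDefault(next, Collections.emptySet()));
                }
            }
            closed.put(entry.getKey(), seen);
        }
        return closed;
    }

    // Expands one triple using only the vocabulary; no other instance triple is consulted.
    Set<List<String>> infer(String s, String p, String o) {
        Set<List<String>> out = new LinkedHashSet<>();
        out.add(Arrays.asList(s, p, o));
        if (RDF_TYPE.equals(p)) {
            for (String c : superClasses.getOrDefault(o, Collections.emptySet())) {
                out.add(Arrays.asList(s, RDF_TYPE, c));
            }
        } else {
            for (String q : superProperties.getOrDefault(p, Collections.emptySet())) {
                out.add(Arrays.asList(s, q, o));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subClassOf = new HashMap<>();
        subClassOf.put("ex:Cat", Collections.singleton("ex:Mammal"));
        subClassOf.put("ex:Mammal", Collections.singleton("ex:Animal"));
        Map<String, Set<String>> subPropertyOf = new HashMap<>();
        PerTripleRdfs rdfs = new PerTripleRdfs(subClassOf, subPropertyOf);
        for (List<String> t : rdfs.infer("ex:tom", RDF_TYPE, "ex:Cat")) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

Because infer() never looks at a second instance triple, the input can be split across nodes in any way, which is what makes this step fit MapReduce.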

An example of this is here: [1,2]. It uses the RIOT infer approach,
but I was wondering whether I could just use the Jena inference engine
there, and whether that would work on one triple at a time. I don't
think everything works in this case.

However, perhaps there is a way to group the RDF data so that
inference would work as if all the instance data were available.
Any idea/suggestion?

While I am here: I was looking at the implementation of the RIOT infer
command yesterday, and I found this comment [3]:

  * TDB Infer
  *   RDFS
  *   owl:sameAs (in T-Box, not A-Box)
  *   owl:equivalentClass, owl:equivalentProperty
  *   owl:TransitiveProperty, owl:SymmetricProperty

(By the way, there are interesting comments in the RIOT infer command. ;-))

I see how RDFS is implemented (very elegant and compact).

I can see how we could do similar things for:

  - owl:equivalentClass
  - owl:equivalentProperty
  - owl:SymmetricProperty
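
A minimal sketch of how these three could be handled one triple at a time, in the same spirit as the RDFS case. This is an illustration only, not RIOT's implementation: the class PerTripleOwlFragments, its method names and the example IRIs are hypothetical, and the T-Box is assumed to be small enough to keep in memory.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: equivalence groups and symmetric properties are
// T-Box facts, so each instance triple can be expanded in isolation.
final class PerTripleOwlFragments {
    static final String RDF_TYPE = "rdf:type";

    private final Map<String, Set<String>> classGroups = new HashMap<>();
    private final Map<String, Set<String>> propertyGroups = new HashMap<>();
    private final Set<String> symmetricProperties = new HashSet<>();

    void equivalentClass(String a, String b) { equate(classGroups, a, b); }
    void equivalentProperty(String a, String b) { equate(propertyGroups, a, b); }
    void symmetricProperty(String p) { symmetricProperties.add(p); }

    // Merges the groups of a and b so that every member maps to one shared set.
    private static void equate(Map<String, Set<String>> groups, String a, String b) {
        Set<String> merged = new LinkedHashSet<>();
        merged.add(a);
        merged.add(b);
        merged.addAll(groups.getOrDefault(a, Collections.emptySet()));
        merged.addAll(groups.getOrDefault(b, Collections.emptySet()));
        for (String member : merged) {
            groups.put(member, merged);
        }
    }

    // Expands one triple; the result always contains the input triple itself.
    Set<List<String>> infer(String s, String p, String o) {
        Set<List<String>> out = new LinkedHashSet<>();
        out.add(Arrays.asList(s, p, o));
        if (RDF_TYPE.equals(p)) {
            for (String c : classGroups.getOrDefault(o, Collections.emptySet())) {
                out.add(Arrays.asList(s, RDF_TYPE, c));
            }
            return out;
        }
        Set<String> props = new LinkedHashSet<>();
        props.add(p);
        props.addAll(propertyGroups.getOrDefault(p, Collections.emptySet()));
        // If any equivalent property is symmetric, all of them behave symmetrically.
        boolean symmetric = false;
        for (String q : props) {
            symmetric |= symmetricProperties.contains(q);
        }
        for (String q : props) {
            out.add(Arrays.asList(s, q, o));
            if (symmetric) {
                out.add(Arrays.asList(o, q, s));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PerTripleOwlFragments owl = new PerTripleOwlFragments();
        owl.equivalentProperty("ex:knows", "foaf:knows");
        owl.symmetricProperty("foaf:knows");
        for (List<String> t : owl.infer("ex:a", "ex:knows", "ex:b")) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

All three rules have a single instance triple in their body, which is why they fit the streaming style.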

I think I know how to do:

  - owl:sameAs (in T-Box, not A-Box)

But, I do not really see how we could do owl:TransitiveProperty in a
similar streaming fashion. Is that possible?
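
The difficulty can be made concrete: with p transitive, (a p c) follows from (a p b) and (b p c), which are two instance triples, so a processor that sees one triple at a time has nothing to join against. A minimal sketch of the buffering alternative, assuming all edges of one transitive property fit in memory (the class TransitiveClosureSketch and its method are hypothetical, and edges are lists of two strings):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: unlike the per-triple rules, this needs every edge of
// the transitive property before it can emit the complete result.
final class TransitiveClosureSketch {

    // Returns every (from, to) pair reachable over the given (from, to) edges.
    static Set<List<String>> close(Collection<List<String>> edges) {
        Map<String, Set<String>> successors = new HashMap<>();
        for (List<String> e : edges) {
            successors.computeIfAbsent(e.get(0), k -> new HashSet<>()).add(e.get(1));
        }
        Set<List<String>> closed = new LinkedHashSet<>();
        for (String start : successors.keySet()) {
            Set<String> seen = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>(successors.get(start));
            while (!todo.isEmpty()) {
                String next = todo.pop();
                if (seen.add(next)) {
                    closed.add(Arrays.asList(start, next));
                    todo.addAll(successors.getOrDefault(next, Collections.emptySet()));
                }
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        List<List<String>> edges = Arrays.asList(
                Arrays.asList("ex:a", "ex:b"),
                Arrays.asList("ex:b", "ex:c"),
                Arrays.asList("ex:c", "ex:d"));
        for (List<String> pair : close(edges)) {
            System.out.println(pair.get(0) + " -> " + pair.get(1));
        }
    }
}
```

In a MapReduce setting the same join would presumably need the data grouped by the shared node and repeated until no new triples appear, rather than a single pass.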

I was also wondering whether those "fragments" of OWL should be added
to the existing Java classes, or whether we should keep RDFS and OWL
separate and have an InferenceProcessorOWL.java.

Cheers,
Paolo

  [1] https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferDriver.java
  [2] https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferMapper.java
  [3] http://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/main/java/riotcmd/infer.java

Re: Giraph: anyone?

Posted by Paolo Castagna <ca...@googlemail.com>.
Stephen Allen wrote:
> Hi Paulo,
> 
> You may be interested in this paper from ISWC 2009: "Parallel
> Materialization of the Finite RDFS Closure for Hundreds of Millions of
> Triples" [1].
> 
> -Stephen

Thank you Stephen.

Paolo (*)


(*) the Italian version, not the Brazilian one :-)

> 
> [1] http://data.semanticweb.org/pdfs/iswc/2009/paper241.pdf
> 
> 


Re: Giraph: anyone?

Posted by Stephen Allen <sa...@apache.org>.
Hi Paulo,

You may be interested in this paper from ISWC 2009: "Parallel
Materialization of the Finite RDFS Closure for Hundreds of Millions of
Triples" [1].

-Stephen

[1] http://data.semanticweb.org/pdfs/iswc/2009/paper241.pdf

