Posted to users@marmotta.apache.org by Diego Benna <di...@gmail.com> on 2013/08/21 12:58:49 UTC

Hadoop+HBase and Apache Marmotta for a distributed RDF database

Hello,
My name is Diego Benna and I'm a computer science student at the
University of Padua. I am working on my master's thesis, studying Apache
Marmotta and the Semantic Web.
I found your contact details on http://www.wikier.org/ and in an online guide
where you present Apache Marmotta. I'm also studying Hadoop and tools that
allow a system to scale to large amounts of data.
I have a problem, and I'm asking you because I saw that you are an expert in
the field: I need to create a scalable system that crawls and collects
RDF data.
Hadoop suits me for scalability and Apache Marmotta suits me as a database
for RDF, but I do not know how to integrate the two. Hadoop uses HDFS as a
distributed filesystem and HBase as a database, but Apache Marmotta does
not support HBase. What do you advise me to do? Is it possible to
integrate Apache Marmotta with HBase to handle large amounts of RDF data,
with SPARQL as the query language?
Thank you in advance for your availability.

Diego Benna

Re: Hadoop+HBase and Apache Marmotta for a distributed RDF database

Posted by Sergio Fernández <wi...@apache.org>.
Hi Diego

On 21/08/13 12:58, Diego Benna wrote:
> My name is Diego Benna and I'm a computer science student at the
> University of Padua. I am working on my master's thesis, studying Apache
> Marmotta and the Semantic Web.
> I found your contact details on http://www.wikier.org/ and in an online guide
> where you present Apache Marmotta. I'm also studying Hadoop and tools that
> allow a system to scale to large amounts of data.
> I have a problem, and I'm asking you because I saw that you are an expert in
> the field: I need to create a scalable system that crawls and collects
> RDF data.
> Hadoop suits me for scalability and Apache Marmotta suits me as a database
> for RDF, but I do not know how to integrate the two. Hadoop uses HDFS as a
> distributed filesystem and HBase as a database, but Apache Marmotta does
> not support HBase. What do you advise me to do? Is it possible to
> integrate Apache Marmotta with HBase to handle large amounts of RDF data,
> with SPARQL as the query language?

The short answer is no. But there is also a long version you might find 
more interesting:

Currently Marmotta only works on top of KiWi, a triple store built on a 
relational database. Further details at:

http://marmotta.incubator.apache.org/kiwi/introduction.html

But one of the mid-term goals is to be fully Sesame-based, allowing us 
to move to other triple stores that support that RDF API. Then you 
would be able to use stuff (prototypes) you can find out there, such as:

http://github.com/editice/sesame-hbase
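To make the idea concrete, here is a minimal sketch of the Sesame (OpenRDF 2.x) Repository API, using the bundled in-memory store. Being "fully Sesame-based" would mean that a different Sail implementation, e.g. an HBase-backed one like sesame-hbase, could replace `MemoryStore` below without changing the rest of the code. (This is an illustration of the Sesame API, not of Marmotta's current internals.)

```java
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class SesameSketch {
    public static void main(String[] args) throws Exception {
        // The Sail is the pluggable storage layer; swap MemoryStore for any
        // other Sail implementation to change the backend.
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection con = repo.getConnection();
        try {
            // Add one triple, then query it back with SPARQL.
            ValueFactory vf = con.getValueFactory();
            URI subj = vf.createURI("http://example.org/diego");
            URI pred = vf.createURI("http://xmlns.com/foaf/0.1/name");
            con.add(subj, pred, vf.createLiteral("Diego Benna"));

            TupleQuery query = con.prepareTupleQuery(QueryLanguage.SPARQL,
                "SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }");
            TupleQueryResult result = query.evaluate();
            while (result.hasNext()) {
                BindingSet bs = result.next();
                System.out.println(bs.getValue("name").stringValue());
            }
            result.close();
        } finally {
            con.close();
            repo.shutDown();
        }
    }
}
```

The client code above never touches the storage backend directly, which is exactly why a Sesame-based Marmotta could sit on top of a distributed store.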

Also, a few years ago the HBase project was considering support for 
SPARQL, but AFAIK no actual progress has been made; see:

http://issues.apache.org/jira/browse/HBASE-2433

You might also find this paper interesting in case you'd like to work on that:

http://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf

Hope this helps.

Cheers,

-- 
Sergio Fernández