Posted to user@hbase.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2008/10/21 03:02:21 UTC

A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Hi all,

We made this RDF proposal a good long time ago. Now we'd like to
settle down to researching it again. I have attached our proposal;
we'd love to hear your feedback and stories!

Thanks.
-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Stuart Sierra <ma...@stuartsierra.com>.
On Mon, Oct 20, 2008 at 9:02 PM, Edward J. Yoon <ed...@apache.org> wrote:
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!

Hello, Edward,
I'm very glad to see this idea moving forward.  Two comments:

An essential feature, for me, would be the ability to write custom
MapReduce jobs to process RDF, independent of the RDF query processor.
 That way I could plug in my own inference engine, rules engine, or
graph transformer.
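
A minimal sketch of what such a custom job might look like, assuming
triples are plain (subject, predicate, object) tuples and using an
in-process stand-in for the MapReduce runtime rather than Hadoop
itself. The rule shown is the standard RDFS subclass rule (if x has
type C and C is a subclass of D, then x has type D); the run_job
helper and the ex: names are hypothetical and are not taken from the
proposal.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def map_triple(triple):
    """Key every relevant triple by the class C that the rule joins on."""
    s, p, o = triple
    if p == RDF_TYPE:
        yield o, ("instance", s)      # (x rdf:type C)        -> key C
    elif p == SUBCLASS_OF:
        yield s, ("superclass", o)    # (C rdfs:subClassOf D) -> key C


def reduce_class(cls, values):
    """For one class C, pair each instance with each direct superclass."""
    instances = [v for tag, v in values if tag == "instance"]
    supers = [v for tag, v in values if tag == "superclass"]
    for x in instances:
        for d in supers:
            yield (x, RDF_TYPE, d)


def run_job(records, mapper, reducer):
    """Hypothetical local stand-in for one map/shuffle/reduce pass."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


# Illustrative data only.
triples = [
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
]

inferred = run_job(triples, map_triple, reduce_class)
print(inferred)
```

One pass derives only the direct consequence (ex:rex is an ex:Mammal).
Reaching ex:Animal would mean feeding the inferred triples back in and
running the job again until nothing new appears, which is exactly the
kind of control a custom job gives that a fixed query processor may
not.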

I'd also like to see re-use of existing APIs wherever possible, like
JRDF or RDF2Go.  It may be worth examining other large-scale RDF
databases like Mulgara to see if any code can be reused.
-Stuart

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by "Edward J. Yoon" <ed...@apache.org>.
Thanks for all the feedback and stories.
I also got a lot of insightful feedback via private mail. Wow!

OK, I hope to hear from you again next time!

/Edward

On Tue, Oct 21, 2008 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Oh, Sorry for our mistake, "It will be one of the Apache Incubator
> Projects" should be "It will be proposed to the Apache Incubator
> Project".
>
> Thanks. :)
>
> On Tue, Oct 21, 2008 at 10:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>> --
>> Best regards, Edward J. Yoon
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>
>
>
> --
> Best regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, sorry for our mistake: "It will be one of the Apache Incubator
Projects" should be "It will be proposed to the Apache Incubator
Project".

Thanks. :)

On Tue, Oct 21, 2008 at 10:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
> Hi all,
>
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
>
> Thanks.
> --
> Best regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Hyunsik Choi <hy...@korea.ac.kr>.
Hi Colin,

I'm a member of the RDF proposal team. I have one question about
Metaweb. Do you intend to make Metaweb's software open source?

Hyunsik Choi

On Mon, 2008-10-20 at 18:23 -0700, Colin Evans wrote:
> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat 
> files, and have written a simple query language and planner that 
> executes the queries with chained map-reduce jobs.  This approach works 
> well for warehousing triple data, and doesn't require HBase.  Queries 
> may take a few minutes to execute, but the system scales for very large 
> datasets and result sets because it doesn't try to resolve queries in 
> memory.  We're currently testing with more than 150MM triples and have 
> been happy with the results.
> 
> -Colin
> 
> 
> Edward J. Yoon wrote:
> > Hi all,
> >
> > This RDF proposal is a good long time ago. Now we'd like to settle
> > down to research again. I attached our proposal, We'd love to hear
> > your feedback & stories!!
> >
> > Thanks.
> >   
> 
-- 
-----------------------------------------------------------------
Hyunsik Choi (Ph.D Student)

Laboratory of Prof. Yon Dohn Chung
Database & Information Systems Group
Dept. of Computer Science & Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea

TEL : +82-2-3290-3580
-----------------------------------------------------------------


Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by "Edward J. Yoon" <ed...@apache.org>.
Any feedback?

We also need feedback from the core committers.

/Edward

On Tue, Oct 21, 2008 at 3:13 PM, Hyunsik Choi <hy...@gmail.com> wrote:
> Although we proposed the system for RDF data, we actually are
> considering more general system for graph data model. Actually, many
> data in real world can be represented graph data model. In particular,
> besides web data some data domains (i.e., biological data, chemical
> data, social networks, and so on) are rather represented as graph data.
>
> What do you think about that?
>
> --
> Hyunsik Choi
> Database & Information Systems Lab, Korea University
>
>
> Edward J. Yoon wrote:
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>>
>
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Hyunsik Choi <hy...@gmail.com>.
Although we proposed the system for RDF data, we are actually
considering a more general system for the graph data model. Much
real-world data can be represented in a graph data model. In
particular, besides web data, data in some domains (e.g., biological
data, chemical data, social networks, and so on) is more naturally
represented as graph data.

What do you think about that?

-- 
Hyunsik Choi
Database & Information Systems Lab, Korea University


Edward J. Yoon wrote:
> Hi all,
>
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
>
> Thanks.
>   


Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, I remember freebase.com, which was mentioned by Barney Pell
(Powerset CTO) at a meeting at our company (NHN Corp.).

Hmm, the two approaches seem slightly different. However, I hope we
can work together in the near future, if possible.

/Edward

On Tue, Oct 21, 2008 at 1:41 PM, Colin Evans <co...@metaweb.com> wrote:
>
> We've got a lot of open source projects related to Hadoop and to our graph
> data available at http://research.freebase.com, but we aren't planning on
> open sourcing our graph processing work around Hadoop yet.
>
>
> Hyunsik Choi wrote:
>>
>> Hi Colin,
>>
>> I'm a member of RDF proposal. I have one question as to Metaweb. Do
>> you (or your company) have a plan to make Metaweb to be open source?
>>
>> Hyunsik Choi
>>
>> -----------------------------------------------------------------
>> Hyunsik Choi (Ph.D Student)
>>
>> Laboratory of Prof. Yon Dohn Chung
>> Database & Information Systems Group
>> Dept. of Computer Science & Engineering, Korea University
>> 1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
>>
>> TEL : +82-2-3290-3580
>> -----------------------------------------------------------------
>>
>> On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <co...@metaweb.com> wrote:
>>
>>>
>>> Hi Edward,
>>> At Metaweb, we're experimenting with storing raw triples in HDFS flat
>>> files,
>>> and have written a simple query language and planner that executes the
>>> queries with chained map-reduce jobs.  This approach works well for
>>> warehousing triple data, and doesn't require HBase.  Queries may take a
>>> few
>>> minutes to execute, but the system scales for very large datasets and
>>> result
>>> sets because it doesn't try to resolve queries in memory.  We're
>>> currently
>>> testing with more than 150MM triples and have been happy with the
>>> results.
>>>
>>> -Colin
>>>
>>>
>>> Edward J. Yoon wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> This RDF proposal is a good long time ago. Now we'd like to settle
>>>> down to research again. I attached our proposal, We'd love to hear
>>>> your feedback & stories!!
>>>>
>>>> Thanks.
>>>>
>>>>
>>>
>>>
>
>



-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Colin Evans <co...@metaweb.com>.
We've got a lot of open source projects related to Hadoop and to our 
graph data available at http://research.freebase.com, but we aren't 
planning on open sourcing our graph processing work around Hadoop yet.


Hyunsik Choi wrote:
> Hi Colin,
>
> I'm a member of RDF proposal. I have one question as to Metaweb. Do
> you (or your company) have a plan to make Metaweb to be open source?
>
> Hyunsik Choi
>
> -----------------------------------------------------------------
> Hyunsik Choi (Ph.D Student)
>
> Laboratory of Prof. Yon Dohn Chung
> Database & Information Systems Group
> Dept. of Computer Science & Engineering, Korea University
> 1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
>
> TEL : +82-2-3290-3580
> -----------------------------------------------------------------
>
> On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <co...@metaweb.com> wrote:
>   
>> Hi Edward,
>> At Metaweb, we're experimenting with storing raw triples in HDFS flat files,
>> and have written a simple query language and planner that executes the
>> queries with chained map-reduce jobs.  This approach works well for
>> warehousing triple data, and doesn't require HBase.  Queries may take a few
>> minutes to execute, but the system scales for very large datasets and result
>> sets because it doesn't try to resolve queries in memory.  We're currently
>> testing with more than 150MM triples and have been happy with the results.
>>
>> -Colin
>>
>>
>> Edward J. Yoon wrote:
>>     
>>> Hi all,
>>>
>>> This RDF proposal is a good long time ago. Now we'd like to settle
>>> down to research again. I attached our proposal, We'd love to hear
>>> your feedback & stories!!
>>>
>>> Thanks.
>>>
>>>       
>>     


Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Hyunsik Choi <hy...@korea.ac.kr>.
Hi Colin,

I'm a member of the RDF proposal team. I have one question about
Metaweb. Do you (or your company) plan to make Metaweb's software
open source?

Hyunsik Choi

-----------------------------------------------------------------
Hyunsik Choi (Ph.D Student)

Laboratory of Prof. Yon Dohn Chung
Database & Information Systems Group
Dept. of Computer Science & Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea

TEL : +82-2-3290-3580
-----------------------------------------------------------------

On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <co...@metaweb.com> wrote:
> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat files,
> and have written a simple query language and planner that executes the
> queries with chained map-reduce jobs.  This approach works well for
> warehousing triple data, and doesn't require HBase.  Queries may take a few
> minutes to execute, but the system scales for very large datasets and result
> sets because it doesn't try to resolve queries in memory.  We're currently
> testing with more than 150MM triples and have been happy with the results.
>
> -Colin
>
>
> Edward J. Yoon wrote:
>>
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>>
>
>

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Ted Dunning <te...@gmail.com>.
At Veoh the recommendation data amounts to many billions of (roughly) these
triples and this approach works very well indeed, even on tiny development
clusters.

On Mon, Oct 20, 2008 at 6:23 PM, Colin Evans <co...@metaweb.com> wrote:

> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat
> files, and have written a simple query language and planner that executes
> the queries with chained map-reduce jobs.  This approach works well for
> warehousing triple data, and doesn't require HBase.  Queries may take a few
> minutes to execute, but the system scales for very large datasets and result
> sets because it doesn't try to resolve queries in memory.  We're currently
> testing with more than 150MM triples and have been happy with the results.
>
> -Colin
>
>
>
> Edward J. Yoon wrote:
>
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>>
>>
>
>


-- 
ted

Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce

Posted by Colin Evans <co...@metaweb.com>.
Hi Edward,
At Metaweb, we're experimenting with storing raw triples in HDFS flat 
files, and have written a simple query language and planner that 
executes the queries with chained map-reduce jobs.  This approach works 
well for warehousing triple data, and doesn't require HBase.  Queries 
may take a few minutes to execute, but the system scales for very large 
datasets and result sets because it doesn't try to resolve queries in 
memory.  We're currently testing with more than 150MM triples and have 
been happy with the results.
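
To make the idea concrete, here is a rough sketch of a two-pattern
query resolved by two chained jobs over a flat file of tab-separated
triples. It is only an illustration under stated assumptions: the
Metaweb query language and planner are not public, so the file layout,
the predicate names (worksAt, locatedIn), and the run_job helper are
all hypothetical, and the jobs run in-process instead of on Hadoop.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Hypothetical local stand-in for one map/shuffle/reduce pass."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


def parse(line):
    """Split one 'subject<TAB>predicate<TAB>object' line into a tuple."""
    return tuple(line.rstrip("\n").split("\t"))


# Job 1: join "?person worksAt ?org" with "?org locatedIn ?city" on ?org.
def join_map(line):
    s, p, o = parse(line)
    if p == "worksAt":
        yield o, ("person", s)
    elif p == "locatedIn":
        yield s, ("city", o)


def join_reduce(org, values):
    people = sorted(v for tag, v in values if tag == "person")
    cities = sorted(v for tag, v in values if tag == "city")
    for person in people:
        for city in cities:
            yield (person, org, city)


# Job 2: consume Job 1's output and count people per city.
def count_map(row):
    person, org, city = row
    yield city, 1


def count_reduce(city, ones):
    yield (city, sum(ones))


# Illustrative flat-file contents only.
flat_file = [
    "alice\tworksAt\tacme\n",
    "bob\tworksAt\tacme\n",
    "carol\tworksAt\tinitech\n",
    "acme\tlocatedIn\tSeoul\n",
    "initech\tlocatedIn\tAustin\n",
]

joined = run_job(flat_file, join_map, join_reduce)
counts = run_job(joined, count_map, count_reduce)
print(joined)
print(counts)
```

Each job reads the previous job's output as its input, so no stage has
to hold the whole result set at once. This toy helper does collect
each key's values in a list; a real reducer would stream them from the
shuffle instead.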

-Colin


Edward J. Yoon wrote:
> Hi all,
>
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
>
> Thanks.
>   

