Posted to user@hbase.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2008/10/21 03:02:21 UTC
A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Hi all,
This RDF proposal dates from a good while ago. Now we'd like to settle
down to the research again. I have attached our proposal. We'd love to hear
your feedback and stories!
Thanks.
--
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Stuart Sierra <ma...@stuartsierra.com>.
On Mon, Oct 20, 2008 at 9:02 PM, Edward J. Yoon <ed...@apache.org> wrote:
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
Hello, Edward,
I'm very glad to see this idea moving forward. Two comments:
An essential feature, for me, would be the ability to write custom
MapReduce jobs to process RDF, independent of the RDF query processor.
That way I could plug in my own inference engine, rules engine, or
graph transformer.
I'd also like to see re-use of existing APIs wherever possible, like
JRDF or RDF2Go. It may be worth examining other large-scale RDF
databases like Mulgara to see if any code can be reused.
-Stuart
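To illustrate the kind of custom job Stuart asks for, here is a minimal Python sketch of one inference step written as a map/reduce pair, independent of any query processor. It is not from the proposal: the function names (map_triple, reduce_class, infer_once) are hypothetical, the shuffle is simulated in memory rather than run on Hadoop, and the rule shown is the standard RDFS subclass rule (if x has type C and C is a subclass of D, then x has type D).

```python
from collections import defaultdict

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def map_triple(triple):
    """Map phase: key every relevant triple by the class both patterns share."""
    s, p, o = triple
    if p == TYPE:
        yield o, ("instance", s)      # x rdf:type C      -> key C
    elif p == SUBCLASS:
        yield s, ("superclass", o)    # C rdfs:subClassOf D -> key C

def reduce_class(cls, values):
    """Reduce phase: pair each instance of the class with each superclass."""
    instances = [v for tag, v in values if tag == "instance"]
    supers = [v for tag, v in values if tag == "superclass"]
    for inst in instances:
        for sup in supers:
            yield (inst, TYPE, sup)

def infer_once(triples):
    """Run one simulated map/shuffle/reduce round; return only new triples."""
    groups = defaultdict(list)
    for t in triples:
        for key, value in map_triple(t):
            groups[key].append(value)
    inferred = set()
    for key, values in groups.items():
        inferred.update(reduce_class(key, values))
    return inferred - set(triples)
```

A single round derives only one level of the hierarchy; a driver would rerun the job on the enlarged triple set until no new triples appear. Swapping in a different rule only means replacing the mapper and reducer.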
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by "Edward J. Yoon" <ed...@apache.org>.
Thanks for all the feedback and stories.
I also got a lot of insightful feedback via private mail. Wow!
OK, I hope to hear from you again next time!
/Edward
On Tue, Oct 21, 2008 at 10:36 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Oh, Sorry for our mistake, "It will be one of the Apache Incubator
> Projects" should be "It will be proposed to the Apache Incubator
> Project".
>
> Thanks. :)
>
> On Tue, Oct 21, 2008 at 10:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>> --
>> Best regards, Edward J. Yoon
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>
>
>
> --
> Best regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>
--
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, sorry for our mistake: "It will be one of the Apache Incubator
Projects" should be "It will be proposed to the Apache Incubator
Project".
Thanks. :)
On Tue, Oct 21, 2008 at 10:02 AM, Edward J. Yoon <ed...@apache.org> wrote:
> Hi all,
>
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
>
> Thanks.
> --
> Best regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>
--
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Hyunsik Choi <hy...@korea.ac.kr>.
Hi Colin,
I'm a member of the RDF proposal team. I have one question about Metaweb. Do you
intend to make Metaweb open source?
Hyunsik Choi
On Mon, 2008-10-20 at 18:23 -0700, Colin Evans wrote:
> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat
> files, and have written a simple query language and planner that
> executes the queries with chained map-reduce jobs. This approach works
> well for warehousing triple data, and doesn't require HBase. Queries
> may take a few minutes to execute, but the system scales for very large
> datasets and result sets because it doesn't try to resolve queries in
> memory. We're currently testing with more than 150MM triples and have
> been happy with the results.
>
> -Colin
>
>
> Edward J. Yoon wrote:
> > Hi all,
> >
> > This RDF proposal is a good long time ago. Now we'd like to settle
> > down to research again. I attached our proposal, We'd love to hear
> > your feedback & stories!!
> >
> > Thanks.
> >
>
--
-----------------------------------------------------------------
Hyunsik Choi (Ph.D Student)
Laboratory of Prof. Yon Dohn Chung
Database & Information Systems Group
Dept. of Computer Science & Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
TEL : +82-2-3290-3580
-----------------------------------------------------------------
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by "Edward J. Yoon" <ed...@apache.org>.
Any feedback?
We also need feedback from the core committers.
/Edward
On Tue, Oct 21, 2008 at 3:13 PM, Hyunsik Choi <hy...@gmail.com> wrote:
> Although we proposed the system for RDF data, we actually are
> considering more general system for graph data model. Actually, many
> data in real world can be represented graph data model. In particular,
> besides web data some data domains (i.e., biological data, chemical
> data, social networks, and so on) are rather represented as graph data.
>
> What do you think about that?
>
> --
> Hyunsik Choi
> Database & Information Systems Lab, Korea University
>
>
> Edward J. Yoon wrote:
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>>
>
>
--
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Hyunsik Choi <hy...@gmail.com>.
Although we proposed the system for RDF data, we are actually
considering a more general system for the graph data model. Much
real-world data can be represented in a graph data model. In particular,
besides web data, some data domains (e.g., biological data, chemical
data, and social networks) are more naturally represented as graph data.
What do you think about that?
--
Hyunsik Choi
Database & Information Systems Lab, Korea University
Edward J. Yoon wrote:
> Hi all,
>
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
>
> Thanks.
>
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by "Edward J. Yoon" <ed...@apache.org>.
Oh, I remember freebase.com, which was mentioned by Barney Pell
(Powerset CTO) at a meeting at our company (NHN Corp.).
Hmm, the two approaches seem slightly different. However, I hope we
can work together in the near future if possible.
/Edward
On Tue, Oct 21, 2008 at 1:41 PM, Colin Evans <co...@metaweb.com> wrote:
>
> We've got a lot of open source projects related to Hadoop and to our graph
> data available at http://research.freebase.com, but we aren't planning on
> open sourcing our graph processing work around Hadoop yet.
>
>
> Hyunsik Choi wrote:
>>
>> Hi Colin,
>>
>> I'm a member of RDF proposal. I have one question as to Metaweb. Do
>> you (or your company) have a plan to make Metaweb to be open source?
>>
>> Hyunsik Choi
>>
>> -----------------------------------------------------------------
>> Hyunsik Choi (Ph.D Student)
>>
>> Laboratory of Prof. Yon Dohn Chung
>> Database & Information Systems Group
>> Dept. of Computer Science & Engineering, Korea University
>> 1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
>>
>> TEL : +82-2-3290-3580
>> -----------------------------------------------------------------
>>
>> On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <co...@metaweb.com> wrote:
>>
>>>
>>> Hi Edward,
>>> At Metaweb, we're experimenting with storing raw triples in HDFS flat
>>> files,
>>> and have written a simple query language and planner that executes the
>>> queries with chained map-reduce jobs. This approach works well for
>>> warehousing triple data, and doesn't require HBase. Queries may take a
>>> few
>>> minutes to execute, but the system scales for very large datasets and
>>> result
>>> sets because it doesn't try to resolve queries in memory. We're
>>> currently
>>> testing with more than 150MM triples and have been happy with the
>>> results.
>>>
>>> -Colin
>>>
>>>
>>> Edward J. Yoon wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> This RDF proposal is a good long time ago. Now we'd like to settle
>>>> down to research again. I attached our proposal, We'd love to hear
>>>> your feedback & stories!!
>>>>
>>>> Thanks.
>>>>
>>>>
>>>
>>>
>
>
--
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Colin Evans <co...@metaweb.com>.
We've got a lot of open source projects related to Hadoop and to our
graph data available at http://research.freebase.com, but we aren't
planning on open sourcing our graph processing work around Hadoop yet.
Hyunsik Choi wrote:
> Hi Colin,
>
> I'm a member of RDF proposal. I have one question as to Metaweb. Do
> you (or your company) have a plan to make Metaweb to be open source?
>
> Hyunsik Choi
>
> -----------------------------------------------------------------
> Hyunsik Choi (Ph.D Student)
>
> Laboratory of Prof. Yon Dohn Chung
> Database & Information Systems Group
> Dept. of Computer Science & Engineering, Korea University
> 1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
>
> TEL : +82-2-3290-3580
> -----------------------------------------------------------------
>
> On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <co...@metaweb.com> wrote:
>
>> Hi Edward,
>> At Metaweb, we're experimenting with storing raw triples in HDFS flat files,
>> and have written a simple query language and planner that executes the
>> queries with chained map-reduce jobs. This approach works well for
>> warehousing triple data, and doesn't require HBase. Queries may take a few
>> minutes to execute, but the system scales for very large datasets and result
>> sets because it doesn't try to resolve queries in memory. We're currently
>> testing with more than 150MM triples and have been happy with the results.
>>
>> -Colin
>>
>>
>> Edward J. Yoon wrote:
>>
>>> Hi all,
>>>
>>> This RDF proposal is a good long time ago. Now we'd like to settle
>>> down to research again. I attached our proposal, We'd love to hear
>>> your feedback & stories!!
>>>
>>> Thanks.
>>>
>>>
>>
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Hyunsik Choi <hy...@korea.ac.kr>.
Hi Colin,
I'm a member of the RDF proposal team. I have one question about Metaweb. Do
you (or your company) have a plan to make Metaweb open source?
Hyunsik Choi
-----------------------------------------------------------------
Hyunsik Choi (Ph.D Student)
Laboratory of Prof. Yon Dohn Chung
Database & Information Systems Group
Dept. of Computer Science & Engineering, Korea University
1, 5-ga, Anam-dong, Seongbuk-gu, Seoul, 136-713, Republic of Korea
TEL : +82-2-3290-3580
-----------------------------------------------------------------
On Tue, Oct 21, 2008 at 10:23 AM, Colin Evans <co...@metaweb.com> wrote:
> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat files,
> and have written a simple query language and planner that executes the
> queries with chained map-reduce jobs. This approach works well for
> warehousing triple data, and doesn't require HBase. Queries may take a few
> minutes to execute, but the system scales for very large datasets and result
> sets because it doesn't try to resolve queries in memory. We're currently
> testing with more than 150MM triples and have been happy with the results.
>
> -Colin
>
>
> Edward J. Yoon wrote:
>>
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>>
>
>
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Ted Dunning <te...@gmail.com>.
At Veoh the recommendation data amounts to many billions of (roughly) these
triples and this approach works very well indeed, even on tiny development
clusters.
On Mon, Oct 20, 2008 at 6:23 PM, Colin Evans <co...@metaweb.com> wrote:
> Hi Edward,
> At Metaweb, we're experimenting with storing raw triples in HDFS flat
> files, and have written a simple query language and planner that executes
> the queries with chained map-reduce jobs. This approach works well for
> warehousing triple data, and doesn't require HBase. Queries may take a few
> minutes to execute, but the system scales for very large datasets and result
> sets because it doesn't try to resolve queries in memory. We're currently
> testing with more than 150MM triples and have been happy with the results.
>
> -Colin
>
>
>
> Edward J. Yoon wrote:
>
>> Hi all,
>>
>> This RDF proposal is a good long time ago. Now we'd like to settle
>> down to research again. I attached our proposal, We'd love to hear
>> your feedback & stories!!
>>
>> Thanks.
>>
>>
>
>
--
ted
Re: A Scale-Out RDF Store for Distributed Processing on Map/Reduce
Posted by Colin Evans <co...@metaweb.com>.
Hi Edward,
At Metaweb, we're experimenting with storing raw triples in HDFS flat
files, and have written a simple query language and planner that
executes the queries with chained map-reduce jobs. This approach works
well for warehousing triple data, and doesn't require HBase. Queries
may take a few minutes to execute, but the system scales for very large
datasets and result sets because it doesn't try to resolve queries in
memory. We're currently testing with more than 150MM triples and have
been happy with the results.
-Colin
Edward J. Yoon wrote:
> Hi all,
>
> This RDF proposal is a good long time ago. Now we'd like to settle
> down to research again. I attached our proposal, We'd love to hear
> your feedback & stories!!
>
> Thanks.
>
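The approach Colin describes (raw triples in flat files, queries executed as chained map-reduce jobs) can be pictured with a small Python sketch. This is an illustration under stated assumptions, not Metaweb's implementation: the tab-separated line format, the sample data, the predicate names, and every function name here are hypothetical, and run_job simulates the map, shuffle, and reduce phases in memory instead of on a cluster. The first job joins two triple patterns on a shared variable; the second job aggregates the first job's output.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Simulate one map-reduce job in memory: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    output = []
    for key in sorted(shuffled):
        output.extend(reducer(key, shuffled[key]))
    return output

# Hypothetical flat-file contents: one tab-separated triple per line.
LINES = [
    "alice\tlivesIn\tseoul",
    "bob\tlivesIn\tseoul",
    "carol\tlivesIn\tboston",
    "seoul\tinCountry\tkorea",
    "boston\tinCountry\tusa",
]

def parse(line):
    subject, predicate, obj = line.split("\t")
    return (subject, predicate, obj)

# Job 1: join "?person livesIn ?city" with "?city inCountry ?country" on ?city.
def join_mapper(triple):
    s, p, o = triple
    if p == "livesIn":
        yield o, ("person", s)
    elif p == "inCountry":
        yield s, ("country", o)

def join_reducer(city, values):
    people = [v for tag, v in values if tag == "person"]
    countries = [v for tag, v in values if tag == "country"]
    for person in people:
        for country in countries:
            yield (person, country)

# Job 2: count people per country, reading the output of job 1.
def count_mapper(pair):
    person, country = pair
    yield country, 1

def count_reducer(country, ones):
    yield (country, sum(ones))

def residents_per_country(lines):
    triples = [parse(line) for line in lines]
    joined = run_job(triples, join_mapper, join_reducer)
    return run_job(joined, count_mapper, count_reducer)
```

Because each job only streams records and groups them by key, nothing requires the whole dataset or result set to fit in memory, which is the property Colin credits for the scaling. The cost is latency: every extra join or aggregation adds another full pass over the data.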