
Posted to dev@hama.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2009/03/31 10:58:54 UTC

Re: Schema to store graph

Hama stores a sparse graph in HBase as a sparse adjacency matrix.
One reason is to perform matrix decomposition on large sparse
graphs. Anyway, I guess if you store the graph like that, adding an
edge between v and w only requires updating row v (to add w to v's
list of neighbors) and row w (to add v to w's list of neighbors).
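The last sentence is easier to see in code. The sketch below is not HBase client code: it is a hypothetical in-memory stand-in that models each row as a sorted map from column qualifier (a neighbor id) to a cell value, and the class and method names are illustrative only. Adding the undirected edge v-w writes one cell into row v and one into row w, and a node's neighbor list is a single-row read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a sparse adjacency matrix table:
// row key = node id, column qualifier = neighbor id, value = 1.
class SparseAdjacency {
    // Only non-zero cells are stored; a missing cell means "no edge".
    private final Map<String, TreeMap<String, Integer>> rows = new TreeMap<>();

    // Undirected edge: update row v (add w) and row w (add v).
    void addEdge(String v, String w) {
        rows.computeIfAbsent(v, k -> new TreeMap<>()).put(w, 1);
        rows.computeIfAbsent(w, k -> new TreeMap<>()).put(v, 1);
    }

    // A node's neighbors are exactly the qualifiers present in its row.
    List<String> neighbors(String v) {
        TreeMap<String, Integer> row = rows.get(v);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }

    // Total number of stored cells; the zeros of the matrix cost nothing.
    int storedCells() {
        int n = 0;
        for (TreeMap<String, Integer> row : rows.values()) {
            n += row.size();
        }
        return n;
    }
}
```

Because a node's neighbor list is its row, the width of that row grows with the node's degree, which is why the columns-per-family discussion later in this thread matters for this layout.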

Just FYI, you may also want to see:
http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html

If you have any advice for us, please let us know.

On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com> wrote:
> What would be a good schema in HBase to store information pertaining to a
> many to many graph? I was thinking of having the node id as the row key, the
> type of relation as the column family, the relation name for the column
> identifier and the actual cell containing the key of the node that is being
> connected with.
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>

-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
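Amandeep's original proposal can be sketched the same way for comparison: row key = node id, column family = relation type, qualifier = relation name, cell = key of the connected node. This is a hypothetical in-memory model (plain Java maps, not the HBase client API) that ignores cell versions, and the names are illustrative. It makes one property of that layout easy to check: a given row/family/qualifier coordinate holds a single current value, so writing a second neighbor under the same relation name replaces the first. The adjacency-matrix layout above instead puts the neighbor id in the qualifier, giving each neighbor its own cell.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the proposed schema:
// row key = node id, family = relation type, qualifier = relation name,
// cell value = key of the connected node. Cell versions are ignored.
class RelationSchema {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    void put(String node, String relationType, String relationName, String target) {
        rows.computeIfAbsent(node, k -> new TreeMap<>())
            .computeIfAbsent(relationType, k -> new TreeMap<>())
            .put(relationName, target); // same coordinate: value is replaced
    }

    // Returns the current value at (node, relationType, relationName), or null.
    String get(String node, String relationType, String relationName) {
        Map<String, Map<String, String>> families = rows.get(node);
        if (families == null) {
            return null;
        }
        Map<String, String> family = families.get(relationType);
        return family == null ? null : family.get(relationName);
    }
}
```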

Re: Schema to store graph

Posted by Dave Latham <la...@davelink.net>.

Thanks, Jonathan, that's very helpful!

Dave

On Wed, Apr 1, 2009 at 5:25 PM, Jonathan Gray <jl...@streamy.com> wrote:

> I will do my best to bring some clarity.

RE: Schema to store graph

Posted by Jonathan Gray <jl...@streamy.com>.

I will do my best to bring some clarity.

First of all, HBase 0.20 will remove most, if not all, of the "limitations"
on a single row's columns in a family.

As far as 0.19 is concerned, there are no "limits".  We have several rows
with tens of thousands of columns in a family and this does not break
anything.  The primary issue is that there are serious _performance_ issues
when the family gets big.  Nothing will suddenly stop working; some things
will just get very slow.

So the reason you see varying opinions on the issue is that there is really
no limit; things just progressively get slower and slower.  When they get
slow, and by how much, depends on the size of your columns, whether there are
multiple versions of them, and how you are querying them.  I'm not 100%
clear on which cases have the worst performance, and I'm not going to dig into
the code now as this has radically changed in 0.20, but I think things are
very bad if you specify explicit column lists, have high numbers of deletes
and/or versions, etc.  I think this also has a negative impact on row
seeking/scanning.

I suggest you run some tests and benchmarks.  Figure out what your max
is/will be, and run some performance tests.  Only you know whether the
performance hit from a high number of columns is too much.  In my
case, it was fine.  Queries on those rows did not slow down significantly
compared with rows that have fewer columns (of course they are slower,
because they read more).
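One way to "figure out what your max is/will be" for a graph stored as an adjacency matrix: the number of columns in a node's row equals that node's number of distinct neighbors, so the widest row is the maximum degree. A minimal sketch, assuming undirected edges supplied as string pairs; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: how many columns each row would get in an
// adjacency-matrix layout, given an undirected edge list.
class RowWidth {
    // Each distinct neighbor becomes one column qualifier in the node's row;
    // duplicate edges (e.g. 1-4 and 4-1) are counted once.
    static Map<String, Integer> columnsPerRow(List<String[]> edges) {
        Map<String, Set<String>> neighbors = new HashMap<>();
        for (String[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        Map<String, Integer> widths = new HashMap<>();
        for (Map.Entry<String, Set<String>> en : neighbors.entrySet()) {
            widths.put(en.getKey(), en.getValue().size());
        }
        return widths;
    }

    // The widest row, i.e. the most columns any single row would hold.
    static int maxColumnsPerRow(List<String[]> edges) {
        int max = 0;
        for (int w : columnsPerRow(edges).values()) {
            max = Math.max(max, w);
        }
        return max;
    }
}
```

Running this over a sample of real data gives a concrete number to benchmark against, rather than a guessed limit.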

And as long as things are not painfully slow, then you should be good moving
forward with 0.19 and then watch everything get 10+X faster when you upgrade
to 0.20 :)

Hope that helps.

JG

> -----Original Message-----
> From: Dave Latham [mailto:latham@davelink.net]
> Sent: Wednesday, April 01, 2009 11:58 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Schema to store graph

Re: Schema to store graph

Posted by Dave Latham <la...@davelink.net>.

Can someone clarify the issues with the number of columns per column family
that HBase 0.19 can handle?  I'm a bit confused, because I feel like there's
some conflicting information.

In this post (Dec. 20), St.Ack says low hundreds of columns per family are
recommended, and refers to a bug (I'm guessing HBASE-867):
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200812.mbox/%3C494D7D6F.2050903@duboce.net%3E

Then in this post (Dec. 21), Jonathan says they have hundreds of thousands
of columns per family in production:
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200812.mbox/%3C60569.71.177.254.11.1229821338.squirrel@webmail.streamy.com%3E

And follows (Mar. 9) with 50,000 columns:
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200903.mbox/%3C040701c9a0bd$fed90b70$fc8b2250$@com%3E

And now in this thread people are referring to a rough limit of 5000.

There are probably some differences based on resources available and what
not, but I wouldn't think it would make this level of difference.  I've
begun implementing a schema where I expect some rows to have potentially
10,000s of columns (in the same family) and want to make sure that this is
possible with HBase 0.19.  I don't at all mean to pin anyone down, I'm just
hoping someone can shed a bit more light.

Dave

On Wed, Apr 1, 2009 at 12:41 AM, stack <st...@duboce.net> wrote:

> Edward is referring to https://issues.apache.org/jira/browse/HBASE-867.

Re: Schema to store graph

Posted by stack <st...@duboce.net>.

Edward is referring to https://issues.apache.org/jira/browse/HBASE-867.  We
need to fix it for the 0.20.0 HBase release.
St.Ack

On Wed, Apr 1, 2009 at 5:02 AM, Edward J. Yoon <ed...@apache.org> wrote:

> One thing is Hbase 0.19 doesn't work with over 5,000 qualifier of one
> column so I couldn't test/benchmark for large scale.
>
> On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> > Response below
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >
> >> Hama store the sparse graph using Hbase as an sparse adjacency matrix.
> >> One of reason is to perform matrix decomposition for large sparse
> >> graphs. Anyway, I guess If you store the graph like that, you'll only
> >> need update the row 'v/w' to add v to w's/w to v's list of neighbors.
> >
> >
> > I didnt quite understand the last line here.
> >
> > I did think of a sparse matrix as well but not sure which is a better
> > approach. Thats why I posted here...
> >
> > Share about your experiences with Hama...
> >
> >>
> >>
> >> Just FYI, You also may want to see --
> >> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
> >>
> >> If you have any advice for us, Pls let us know.
> >>
> >> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com> wrote:
> >> > What would be a good schema in HBase to store information pertaining to a
> >> > many to many graph? I was thinking of having the node id as the row key, the
> >> > type of relation as the column family, the relation name for the column
> >> > identifier and the actual cell containing the key of the node that is being
> >> > connected with.
> >> >
> >> >
> >> > Amandeep Khurana
> >> > Computer Science Graduate Student
> >> > University of California, Santa Cruz
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> edwardyoon@apache.org
> >> http://blog.udanax.org
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>

Re: Schema to store graph

Posted by "Edward J. Yoon" <ed...@apache.org>.

Plus, the URL rows and the anchor family of the Webtable described in the
Bigtable paper have the same structure as above: it's the web-link graph
represented as an adjacency matrix.
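For comparison with the Webtable: in the Bigtable paper's example, each row is a page (keyed by its URL with the host name reversed, e.g. com.cnn.www), and the anchor family has one column per referring site, such as anchor:cnnsi.com, whose value is the link text. Each row therefore lists a page's incoming links, which corresponds to a column of the out-link adjacency matrix rather than a row. The following is a hypothetical in-memory sketch of that layout; the class and method names are illustrative and not from any library.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a Webtable-style "anchor" family:
// row key = linked-to page, column "anchor:<referrer>" = anchor text.
class AnchorTable {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    // A link from 'referrer' to 'target' is stored in the target's row.
    void addLink(String referrer, String target, String anchorText) {
        rows.computeIfAbsent(target, k -> new TreeMap<>())
            .put("anchor:" + referrer, anchorText);
    }

    // All anchor cells of one row: the page's incoming links.
    Map<String, String> row(String target) {
        return rows.getOrDefault(target, new TreeMap<>());
    }
}
```

Reading one row answers "who links to this page?" in a single lookup; answering "what does this page link to?" would need the transposed table, which is what the adjacency rows above store.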

On Wed, Apr 1, 2009 at 5:33 PM, Edward J. Yoon <ed...@apache.org> wrote:
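> Let's assume the graph looks like this:
>
> 1 - 2 - 3
> |  /
> 4
>
> We can represent it as:
>
>   | 1  2  3  4
> --+-----------
> 1 | 0  1  0  1
> 2 | 1  0  1  1
> 3 | 0  1  0  0
> 4 | 1  1  0  0
>
> We don't need to store the zeros; HBase is ideal for storing sparse
> matrices. So it can be simply implemented using the HBase APIs as
> described below.
>
> import java.io.IOException;
>
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.io.BatchUpdate;
> import org.apache.hadoop.hbase.util.Bytes;
>
> // Uses the pre-0.20 BatchUpdate client API.
> public class Graph {
>   // Assumes an existing table with a column family "edge:" (the family
>   // name is illustrative). Column "edge:<w>" in row v holds the matrix
>   // entry (v, w); absent cells are the zeros.
>   private static final String FAMILY = "edge:";
>   private final HTable table;
>
>   public Graph(String tableName) throws IOException {
>     table = new HTable(new HBaseConfiguration(), tableName);
>   }
>
>   // Undirected edge: update row v (add w) and row w (add v).
>   public void addEdge(String v, String w) throws IOException {
>     BatchUpdate update = new BatchUpdate(v);
>     update.put(FAMILY + w, Bytes.toBytes("1"));
>     table.commit(update);
>
>     update = new BatchUpdate(w);
>     update.put(FAMILY + v, Bytes.toBytes("1"));
>     table.commit(update);
>   }
>
>   public static void main(String[] args) throws IOException {
>     Graph graph = new Graph("graph"); // illustrative table name
>     // The four edges of the example; addEdge writes both directions.
>     graph.addEdge("1", "2");
>     graph.addEdge("1", "4");
>     graph.addEdge("2", "3");
>     graph.addEdge("2", "4");
>   }
> }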
>
> On Wed, Apr 1, 2009 at 4:47 PM, Amandeep Khurana <am...@gmail.com> wrote:
>> Right.
>>
>> Edward, I didnt understand what you were trying to say with this:
>>
>> Anyway, I guess If you store the graph like that, you'll only need update
>> the row 'v/w' to add v to w's/w to v's list of neighbors.
>>
>> Can you explain it please?
>>
>> Thanks
>> Amandeep
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
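The 4 x 4 example in the quoted message can be checked mechanically: keep only the non-zero cells, as the HBase table would, then expand them into the dense matrix. For an undirected graph, entry (i, j) must equal entry (j, i), so a symmetry check catches a mistyped row. This is a hypothetical helper in plain Java, not part of Hama or HBase; nodes are assumed to be numbered 1..n.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical check: expand a sparse, undirected edge set into the dense
// adjacency matrix and verify that the matrix is symmetric.
class DenseView {
    static int[][] toMatrix(int n, int[][] edges) {
        // Only the non-zero (row, column) cells are kept, keyed as "r,c".
        Set<String> cells = new HashSet<>();
        for (int[] e : edges) {
            cells.add(e[0] + "," + e[1]);
            cells.add(e[1] + "," + e[0]);
        }
        int[][] m = new int[n][n];
        for (int r = 1; r <= n; r++) {
            for (int c = 1; c <= n; c++) {
                m[r - 1][c - 1] = cells.contains(r + "," + c) ? 1 : 0;
            }
        }
        return m;
    }

    static boolean isSymmetric(int[][] m) {
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m.length; c++) {
                if (m[r][c] != m[c][r]) {
                    return false;
                }
            }
        }
        return true;
    }
}
```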

Re: Schema to store graph

Posted by Amandeep Khurana <am...@gmail.com>.

Alright. Got it.

Thanks.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Apr 1, 2009 at 1:33 AM, Edward J. Yoon <ed...@apache.org> wrote:

> Let's assume the graph looks like presented below:

Re: Schema to store graph

Posted by "Edward J. Yoon" <ed...@apache.org>.

Plus, The row URLs and anchor family of webTable that mentioned in
BigTable paper is same with above structure. It's the web-link graph
which is represented as an adjacency matrix.

On Wed, Apr 1, 2009 at 5:33 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Let's assume the graph looks like presented below:
>
> 1 - 2 - 3
>  | /
> 4
>
> We can now represent as:
>
>  | 1  2  3  4
> --+----------
> 1 | 0  1  0  1
> 2 | 1  0  1  1
> 3 | 0  1  0  0
> 4 | 1  0  1  0
>
> We don't need to store the zeros, Hbase is ideal in storing sparse
> matrices. So, It can be simply implemented using Hbase APIs as
> describe below.
>
> public class Graph {
>  ...
>
>  public void addEdge(String v, String w) {
>    BatchUpdate update = new BatchUpdate(v);
>    update.put(w, 1);
>    table.commit(update);
>
>    update = new BatchUpdate(w);
>    update.put(v, 1);
>    table.commit(update);
>  }
>
>  ...
>  public static void main(String[] args) {
>    Graph graph = new Graph();
>    graph.addEdge("1", "2");
>
>    graph.addEdge("1", "4");
>
>    graph.addEdge("2", "3");
>
>    graph.addEdge("2", "4");
>
>    graph.addEdge("4", "1");
>
>    graph.addEdge("4", "3");
>  }
> }
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: Schema to store graph

Posted by Amandeep Khurana <am...@gmail.com>.

Alright. Got it.

Thanks.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Apr 1, 2009 at 1:33 AM, Edward J. Yoon <ed...@apache.org>wrote:

> Let's assume the graph looks like presented below:
>
> 1 - 2 - 3
>  | /
> 4
>
> We can now represent as:
>
>  | 1  2  3  4
> --+----------
> 1 | 0  1  0  1
> 2 | 1  0  1  1
> 3 | 0  1  0  0
> 4 | 1  0  1  0
>
> We don't need to store the zeros, Hbase is ideal in storing sparse
> matrices. So, It can be simply implemented using Hbase APIs as
> describe below.
>
> public class Graph {
>  ...
>
>  public void addEdge(String v, String w) {
>    BatchUpdate update = new BatchUpdate(v);
>    update.put(w, 1);
>    table.commit(update);
>
>    update = new BatchUpdate(w);
>    update.put(v, 1);
>    table.commit(update);
>  }
>
>  ...
>  public static void main(String[] args) {
>    Graph graph = new Graph();
>    graph.addEdge("1", "2");
>
>    graph.addEdge("1", "4");
>
>    graph.addEdge("2", "3");
>
>    graph.addEdge("2", "4");
>
>    graph.addEdge("4", "1");
>
>    graph.addEdge("4", "3");
>   }
> }
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>

Re: Schema to store graph

Posted by "Edward J. Yoon" <ed...@apache.org>.

Also, the webtable described in the Bigtable paper (URLs as row keys,
plus an "anchor" column family) has the same structure as the one above.
It's the web-link graph
which is represented as an adjacency matrix.
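
To make the comparison concrete, here is a small in-memory sketch of that layout. It assumes plain Java maps in place of a real Bigtable or HBase table, and WebTableSketch and its methods are hypothetical names. The row key is the reversed host name, and each incoming link is one "anchor:<referring site>" cell holding the anchor text, using the cnnsi.com and my.look.ca example from the paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the Bigtable "webtable" layout: row key = reversed host
// name, one column "anchor:<referring site>" per incoming link, cell value =
// the anchor text. Plain Java collections, not an HBase or Bigtable API.
class WebTableSketch {
  private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

  // "www.cnn.com" -> "com.cnn.www", so pages of one domain sort together.
  static String reverseHost(String host) {
    List<String> parts = new ArrayList<>(List.of(host.split("\\.")));
    Collections.reverse(parts);
    return String.join(".", parts);
  }

  // A link from `from` to `to` becomes one cell in the row of the target page.
  void addLink(String from, String to, String anchorText) {
    rows.computeIfAbsent(reverseHost(to), k -> new TreeMap<>())
        .put("anchor:" + from, anchorText);
  }

  // Incoming links of a page: the anchor qualifiers of its row, in sorted order.
  List<String> inLinks(String host) {
    List<String> result = new ArrayList<>();
    TreeMap<String, String> row = rows.get(reverseHost(host));
    if (row == null) return result;
    for (String column : row.keySet()) {
      result.add(column.substring("anchor:".length()));
    }
    return result;
  }

  // Value of one cell, or null if the cell was never written (a "zero").
  String cell(String host, String column) {
    TreeMap<String, String> row = rows.get(reverseHost(host));
    return row == null ? null : row.get(column);
  }

  public static void main(String[] args) {
    WebTableSketch t = new WebTableSketch();
    t.addLink("cnnsi.com", "www.cnn.com", "CNN");
    t.addLink("my.look.ca", "www.cnn.com", "CNN.com");
    System.out.println(t.inLinks("www.cnn.com"));               // [cnnsi.com, my.look.ca]
    System.out.println(t.cell("www.cnn.com", "anchor:my.look.ca")); // CNN.com
  }
}
```

Because each row lists the pages that link to it, this layout holds the transpose of the out-link adjacency matrix: row j has a cell in column i exactly when page i links to page j.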

On Wed, Apr 1, 2009 at 5:33 PM, Edward J. Yoon <ed...@apache.org> wrote:
> Let's assume the graph looks like presented below:
>
> 1 - 2 - 3
>  | /
> 4
>
> We can now represent as:
>
>  | 1  2  3  4
> --+----------
> 1 | 0  1  0  1
> 2 | 1  0  1  1
> 3 | 0  1  0  0
> 4 | 1  0  1  0
>
> We don't need to store the zeros, Hbase is ideal in storing sparse
> matrices. So, It can be simply implemented using Hbase APIs as
> describe below.
>
> public class Graph {
>  ...
>
>  public void addEdge(String v, String w) {
>    BatchUpdate update = new BatchUpdate(v);
>    update.put(w, 1);
>    table.commit(update);
>
>    update = new BatchUpdate(w);
>    update.put(v, 1);
>    table.commit(update);
>  }
>
>  ...
>  public static void main(String[] args) {
>    Graph graph = new Graph();
>    graph.addEdge("1", "2");
>
>    graph.addEdge("1", "4");
>
>    graph.addEdge("2", "3");
>
>    graph.addEdge("2", "4");
>
>    graph.addEdge("4", "1");
>
>    graph.addEdge("4", "3");
>  }
> }
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: Schema to store graph

Posted by "Edward J. Yoon" <ed...@apache.org>.

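Let's assume the graph looks like the one below:

1 - 2 - 3
 | /
4

It can be represented as an adjacency matrix:

  | 1  2  3  4
--+----------
1 | 0  1  0  1
2 | 1  0  1  1
3 | 0  1  0  0
4 | 1  1  0  0

We don't need to store the zeros, and HBase is well suited to storing
sparse matrices because empty cells take no space. So the graph can be
implemented simply with the HBase API, as described below. Adding an
undirected edge v-w means updating row v (add w to v's list of
neighbors) and row w (add v to w's list of neighbors).

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.util.Bytes;

// Uses the HBase 0.19 client API (BatchUpdate / HTable.commit).
public class Graph {
  // Column family holding each vertex's neighbors. "edge:" is only an
  // example name; the table must be created with this family.
  private static final String FAMILY = "edge:";
  private static final byte[] ONE = Bytes.toBytes("1");

  private final HTable table;

  public Graph(String tableName) throws IOException {
    table = new HTable(new HBaseConfiguration(), tableName);
  }

  // Undirected edge: set cell (v, w) and cell (w, v) to 1.
  public void addEdge(String v, String w) throws IOException {
    BatchUpdate update = new BatchUpdate(v);  // row v gets column w
    update.put(FAMILY + w, ONE);
    table.commit(update);

    update = new BatchUpdate(w);              // row w gets column v
    update.put(FAMILY + v, ONE);
    table.commit(update);
  }

  public static void main(String[] args) throws IOException {
    Graph graph = new Graph("graph");  // example table name
    // The four edges of the graph above; each call writes both directions.
    graph.addEdge("1", "2");
    graph.addEdge("1", "4");
    graph.addEdge("2", "3");
    graph.addEdge("2", "4");
  }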
}
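
The layout can be checked without a running cluster. The sketch below is an illustration, not HBase code: it assumes a plain Java TreeMap can stand in for the table (row key -> column -> value), loads the four edges drawn in the diagram (1-2, 1-4, 2-3, 2-4), and expands the sparse rows back into a dense 0/1 matrix. The class name SparseAdjacency and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the adjacency table: row key -> (column -> value).
// NOT the HBase client API; it only mimics the sparse layout, where cells
// that were never written (the zeros) are simply absent.
class SparseAdjacency {
  private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

  // Undirected edge: write cell (v, w) and cell (w, v), mirroring the two
  // BatchUpdates in addEdge(). Re-adding an edge overwrites the same cells.
  void addEdge(String v, String w) {
    rows.computeIfAbsent(v, k -> new TreeMap<>()).put(w, "1");
    rows.computeIfAbsent(w, k -> new TreeMap<>()).put(v, "1");
  }

  // The neighbors of v are the column qualifiers of row v: a single row read.
  List<String> neighbors(String v) {
    TreeMap<String, String> row = rows.get(v);
    return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
  }

  // Number of cells actually stored, i.e. the non-zero entries only.
  int storedCells() {
    int n = 0;
    for (TreeMap<String, String> row : rows.values()) n += row.size();
    return n;
  }

  // Expand back to the dense matrix, treating absent cells as 0.
  int[][] toDense(List<String> ids) {
    int[][] m = new int[ids.size()][ids.size()];
    for (int i = 0; i < ids.size(); i++) {
      for (int j = 0; j < ids.size(); j++) {
        m[i][j] = neighbors(ids.get(i)).contains(ids.get(j)) ? 1 : 0;
      }
    }
    return m;
  }

  public static void main(String[] args) {
    SparseAdjacency g = new SparseAdjacency();
    g.addEdge("1", "2");
    g.addEdge("1", "4");
    g.addEdge("2", "3");
    g.addEdge("2", "4");
    List<String> ids = List.of("1", "2", "3", "4");
    for (int[] row : g.toDense(ids)) System.out.println(Arrays.toString(row));
    System.out.println("neighbors of 4: " + g.neighbors("4"));
    System.out.println("stored cells: " + g.storedCells() + " of " + ids.size() * ids.size());
  }
}
```

Running main prints the rows [0, 1, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 0, 0], then "neighbors of 4: [1, 2]" and "stored cells: 8 of 16". Under this layout a vertex's degree is the number of columns in its row, so a vertex with thousands of neighbors becomes a row with thousands of qualifiers.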

On Wed, Apr 1, 2009 at 4:47 PM, Amandeep Khurana <am...@gmail.com> wrote:
> Right.
>
> Edward, I didnt understand what you were trying to say with this:
>
> Anyway, I guess If you store the graph like that, you'll only need update
> the row 'v/w' to add v to w's/w to v's list of neighbors.
>
> Can you explain it please?
>
> Thanks
> Amandeep
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: Schema to store graph

Posted by Amandeep Khurana <am...@gmail.com>.

Right.

Edward, I didn't understand what you were trying to say with this:

Anyway, I guess If you store the graph like that, you'll only need update
the row 'v/w' to add v to w's/w to v's list of neighbors.

Can you explain it please?

Thanks
Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Tue, Mar 31, 2009 at 8:02 PM, Edward J. Yoon <ed...@apache.org>wrote:

> One thing is Hbase 0.19 doesn't work with over 5,000 qualifier of one
> column so I couldn't test/benchmark for large scale.
>
> On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> > Response below
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
> >
> >> Hama store the sparse graph using Hbase as an sparse adjacency matrix.
> >> One of reason is to perform matrix decomposition for large sparse
> >> graphs. Anyway, I guess If you store the graph like that, you'll only
> >> need update the row 'v/w' to add v to w's/w to v's list of neighbors.
> >
> >
> > I didnt quite understand the last line here.
> >
> > I did think of a sparse matrix as well but not sure which is a better
> > approach. Thats why I posted here...
> >
> > Share about your experiences with Hama...
> >
> >>
> >>
> >> Just FYI, You also may want to see --
> >> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
> >>
> >> If you have any advice for us, Pls let us know.
> >>
> >> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com>
> >> wrote:
> >> > What would be a good schema in HBase to store information pertaining
> to a
> >> > many to many graph? I was thinking of having the node id as the row
> key,
> >> the
> >> > type of relation as the column family, the relation name for the
> column
> >> > identifier and the actual cell containing the key of the node that is
> >> being
> >> > connected with.
> >> >
> >> >
> >> > Amandeep Khurana
> >> > Computer Science Graduate Student
> >> > University of California, Santa Cruz
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> edwardyoon@apache.org
> >> http://blog.udanax.org
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>

Re: Schema to store graph

Posted by stack <st...@duboce.net>.

Edward is referring to https://issues.apache.org/jira/browse/HBASE-867. We
need to fix it for the 0.20.0 HBase release.
St.Ack

On Wed, Apr 1, 2009 at 5:02 AM, Edward J. Yoon <ed...@apache.org>wrote:

> One thing is Hbase 0.19 doesn't work with over 5,000 qualifier of one
> column so I couldn't test/benchmark for large scale.
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>

Re: Schema to store graph

Posted by Amandeep Khurana <am...@gmail.com>.

Right.

Edward, I didnt understand what you were trying to say with this:

Anyway, I guess If you store the graph like that, you'll only need update
the row 'v/w' to add v to w's/w to v's list of neighbors.

Can you explain it please?

Thanks
Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Tue, Mar 31, 2009 at 8:02 PM, Edward J. Yoon <ed...@apache.org>wrote:

> One thing is Hbase 0.19 doesn't work with over 5,000 qualifier of one
> column so I couldn't test/benchmark for large scale.
>
> On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> > Response below
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
> >
> >> Hama store the sparse graph using Hbase as an sparse adjacency matrix.
> >> One of reason is to perform matrix decomposition for large sparse
> >> graphs. Anyway, I guess If you store the graph like that, you'll only
> >> need update the row 'v/w' to add v to w's/w to v's list of neighbors.
> >
> >
> > I didnt quite understand the last line here.
> >
> > I did think of a sparse matrix as well but not sure which is a better
> > approach. Thats why I posted here...
> >
> > Share about your experiences with Hama...
> >
> >>
> >>
> >> Just FYI, You also may want to see --
> >> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
> >>
> >> If you have any advice for us, Pls let us know.
> >>
> >> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com>
> >> wrote:
> >> > What would be a good schema in HBase to store information pertaining
> to a
> >> > many to many graph? I was thinking of having the node id as the row
> key,
> >> the
> >> > type of relation as the column family, the relation name for the
> column
> >> > identifier and the actual cell containing the key of the node that is
> >> being
> >> > connected with.
> >> >
> >> >
> >> > Amandeep Khurana
> >> > Computer Science Graduate Student
> >> > University of California, Santa Cruz
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> edwardyoon@apache.org
> >> http://blog.udanax.org
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>

Re: Schema to store graph

Posted by Amandeep Khurana <am...@gmail.com>.

Right.

Edward, I didnt understand what you were trying to say with this:

Anyway, I guess If you store the graph like that, you'll only need update
the row 'v/w' to add v to w's/w to v's list of neighbors.

Can you explain it please?

Thanks
Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Tue, Mar 31, 2009 at 8:02 PM, Edward J. Yoon <ed...@apache.org> wrote:

> One thing is that HBase 0.19 doesn't work with more than 5,000 qualifiers
> in one column family, so I couldn't test/benchmark at large scale.
>
> On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> > Response below
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <edwardyoon@apache.org>
> > wrote:
> >
> >> Hama stores sparse graphs in HBase as a sparse adjacency matrix.
> >> One reason is to perform matrix decomposition on large sparse
> >> graphs. Anyway, I guess if you store the graph like that, you'll only
> >> need to update the row 'v/w' to add v to w's/w to v's list of neighbors.
> >
> >
> > I didn't quite understand the last line here.
> >
> > I did think of a sparse matrix as well but I'm not sure which is the
> > better approach. That's why I posted here...
> >
> > Please share your experiences with Hama...
> >
> >>
> >>
> >> Just FYI, you may also want to see:
> >> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
> >>
> >> If you have any advice for us, please let us know.
> >>
> >> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com>
> >> wrote:
> >> > What would be a good schema in HBase to store information pertaining
> >> > to a many-to-many graph? I was thinking of having the node id as the
> >> > row key, the type of relation as the column family, the relation name
> >> > as the column qualifier, and the cell containing the key of the node
> >> > that is being connected with.
> >> >
> >> >
> >> > Amandeep Khurana
> >> > Computer Science Graduate Student
> >> > University of California, Santa Cruz
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> edwardyoon@apache.org
> >> http://blog.udanax.org
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>
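
One possible reading of Edward's remark, sketched in Python: in an adjacency-row layout (a sparse adjacency matrix with one row per node), each node's row lists its neighbors as column qualifiers, so adding an undirected edge between v and w touches exactly two rows: row v (to add w) and row w (to add v). The "table" here is a plain in-memory dict standing in for an HBase table; the helper names are hypothetical, and this is not the HBase or Hama client API.

```python
from collections import defaultdict


def make_table():
    # In-memory stand-in for an HBase table with a single (hypothetical)
    # "neighbors" family: row key -> {column qualifier -> cell value}.
    return defaultdict(dict)


def add_edge(table, v, w, weight=1):
    """Add an undirected edge v-w by updating only row v and row w."""
    table[v][w] = weight  # row v: add w to v's list of neighbors
    table[w][v] = weight  # row w: add v to w's list of neighbors


def neighbors(table, v):
    """Return v's neighbors, read from a single row (no row is created)."""
    return sorted(table.get(v, {}))


graph = make_table()
add_edge(graph, "a", "b")
add_edge(graph, "a", "c")
print(neighbors(graph, "a"))  # ['b', 'c']
print(neighbors(graph, "b"))  # ['a']
```

With this layout, reading a node's full neighbor list is a single-row read.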

Re: Schema to store graph

Posted by "Edward J. Yoon" <ed...@apache.org>.

One thing is that HBase 0.19 doesn't work with more than 5,000 qualifiers in
one column family, so I couldn't test/benchmark at large scale.
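
In an adjacency-row layout, the number of qualifiers in a node's row equals that node's degree. A quick way to tell whether a graph would run into the 5,000-qualifier problem described above is therefore to look for high-degree nodes. Below is a minimal Python sketch over an edge list. It assumes one qualifier per neighbor and undirected edges stored in both endpoints' rows. The constant and function name are illustrative: the number is the figure quoted in this message, not an HBase setting.

```python
from collections import defaultdict

# Figure quoted in this thread for HBase 0.19; not an HBase configuration value.
QUALIFIER_LIMIT = 5000


def rows_over_limit(edges, limit=QUALIFIER_LIMIT):
    """Return {node: qualifier_count} for rows holding more than `limit` qualifiers."""
    row_qualifiers = defaultdict(set)
    for v, w in edges:
        row_qualifiers[v].add(w)  # w becomes a qualifier in row v
        row_qualifiers[w].add(v)  # v becomes a qualifier in row w
    return {node: len(q) for node, q in row_qualifiers.items() if len(q) > limit}


# A star graph: one hub connected to 6,000 leaves.
star = [("hub", f"leaf{i}") for i in range(6000)]
print(rows_over_limit(star))  # {'hub': 6000}
```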

On Tue, Mar 31, 2009 at 6:04 PM, Amandeep Khurana <am...@gmail.com> wrote:
> Response below
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <ed...@apache.org> wrote:
>
>> Hama stores sparse graphs in HBase as a sparse adjacency matrix.
>> One reason is to perform matrix decomposition on large sparse
>> graphs. Anyway, I guess if you store the graph like that, you'll only
>> need to update the row 'v/w' to add v to w's/w to v's list of neighbors.
>
>
> I didn't quite understand the last line here.
>
> I did think of a sparse matrix as well but I'm not sure which is the better
> approach. That's why I posted here...
>
> Please share your experiences with Hama...
>
>>
>>
>> Just FYI, you may also want to see:
>> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
>>
>> If you have any advice for us, please let us know.
>>
>> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com>
>> wrote:
>> > What would be a good schema in HBase to store information pertaining to a
>> > many-to-many graph? I was thinking of having the node id as the row key,
>> > the type of relation as the column family, the relation name as the column
>> > qualifier, and the cell containing the key of the node that is being
>> > connected with.
>> >
>> >
>> > Amandeep Khurana
>> > Computer Science Graduate Student
>> > University of California, Santa Cruz
>> >
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> edwardyoon@apache.org
>> http://blog.udanax.org
>>
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org

Re: Schema to store graph

Posted by Amandeep Khurana <am...@gmail.com>.

Response below


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Tue, Mar 31, 2009 at 1:58 AM, Edward J. Yoon <ed...@apache.org> wrote:

> Hama stores sparse graphs in HBase as a sparse adjacency matrix.
> One reason is to perform matrix decomposition on large sparse
> graphs. Anyway, I guess if you store the graph like that, you'll only
> need to update the row 'v/w' to add v to w's/w to v's list of neighbors.


I didn't quite understand the last line here.

I did think of a sparse matrix as well but I'm not sure which is the better
approach. That's why I posted here...

Please share your experiences with Hama...

>
>
> Just FYI, you may also want to see:
> http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html
>
> If you have any advice for us, please let us know.
>
> On Tue, Mar 31, 2009 at 5:09 PM, Amandeep Khurana <am...@gmail.com>
> wrote:
> > What would be a good schema in HBase to store information pertaining to a
> > many-to-many graph? I was thinking of having the node id as the row key,
> > the type of relation as the column family, the relation name as the column
> > qualifier, and the cell containing the key of the node that is being
> > connected with.
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> edwardyoon@apache.org
> http://blog.udanax.org
>
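
Amandeep's proposed layout can be sketched as a nested mapping: row key (node id) -> column family (relation type) -> qualifier (relation name) -> cell value (key of the connected node). This is an in-memory Python model, not the HBase client API. The node ids, relation types, relation names, and helper names below are all hypothetical.

```python
from collections import defaultdict


def make_table():
    # row key (node id) -> column family (relation type)
    #   -> qualifier (relation name) -> cell value (connected node's key)
    return defaultdict(lambda: defaultdict(dict))


def put_relation(table, node, rel_type, rel_name, target):
    """Store one relation as a single cell in the node's row."""
    table[node][rel_type][rel_name] = target


def related(table, node, rel_type):
    """All (relation name, target) pairs in one family of one row."""
    return sorted(table.get(node, {}).get(rel_type, {}).items())


t = make_table()
put_relation(t, "alice", "social", "friend_of", "bob")
put_relation(t, "alice", "work", "manager_of", "carol")
print(related(t, "alice", "social"))  # [('friend_of', 'bob')]
print(related(t, "alice", "work"))    # [('manager_of', 'carol')]
```

In this layout, listing all relations of one type for a node amounts to reading one column family within one row.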
