Posted to user@flink.apache.org by Robert Yokota <ra...@gmail.com> on 2017/07/27 22:11:57 UTC

Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

In case anyone is interested, I wrote a blog on how to analyze graphs
stored in HBase with Apache Flink Gelly:

https://yokota.blog/2017/07/27/graph-analytics-on-hbase-with-hgraphdb-and-apache-flink-gelly/

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Posted by Jörn Franke <jo...@gmail.com>.

Have you checked the JanusGraph source code? It also uses HBase as a storage backend:
http://janusgraph.org/
It combines HBase with Elasticsearch for indexing. Maybe you can take inspiration from its architecture.

Generally, HBase read performance depends a lot on how the data is written to regions, the order of the data, and the choice of row key (this in turn affects how the data is read, including whether Flink can exploit locality). There is of course more detail to it, and it depends on the use case. The HBase documentation is generally rather good.
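
To make the row-key point a bit more concrete, here is a minimal sketch of one possible key layout for edges. It is only an illustration under stated assumptions: the layout (salt byte, source vertex ID, target vertex ID), the bucket count, and the function names are all hypothetical and are not taken from HGraphDB, JanusGraph, or the HBase client API. The idea is that a salt prefix spreads vertices across regions, while keeping the source ID ahead of the target ID keeps each vertex's edges contiguous, so they can be read with one bounded scan.

```python
import hashlib
import struct
from typing import Tuple

NUM_BUCKETS = 16  # assumption: number of salt buckets, e.g. one per pre-split region


def salt(vertex_id: int) -> int:
    """Map a vertex ID to a stable bucket so vertices spread across regions."""
    digest = hashlib.md5(struct.pack(">Q", vertex_id)).digest()
    return digest[0] % NUM_BUCKETS


def edge_row_key(src: int, dst: int) -> bytes:
    """Hypothetical layout: 1-byte salt | 8-byte source ID | 8-byte target ID."""
    # Big-endian unsigned packing keeps byte order consistent with numeric order.
    return struct.pack(">BQQ", salt(src), src, dst)


def vertex_scan_range(src: int) -> Tuple[bytes, bytes]:
    """Return (start, stop) keys covering all outgoing edges of src; stop is exclusive."""
    prefix = struct.pack(">BQ", salt(src), src)
    return prefix + b"\x00" * 8, prefix + b"\xff" * 8 + b"\x00"
```

Since HBase keeps rows sorted by key bytes, a scan bounded by `vertex_scan_range(42)` would cover only the rows for vertex 42's outgoing edges. Whether salting helps or hurts depends on the access pattern: it evens out writes, but a full-table read in vertex order then needs one scan per bucket.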

> On 4. Apr 2018, at 23:38, santoshg <sa...@uber.com> wrote:
> 
> Restarting this thread since it is relevant to us. We are thinking of using
> HBase/Cassandra to store graph data and then load the data from there into
> Flink/Gelly. One issue we are concerned about is read performance. So far
> we have run our tests with data residing on HDFS, and that worked fine.
>
> Is there any guidance on reading from HBase for batch jobs? Does anyone have
> experience with this approach, or any do's and don'ts to share?
> 
> Thanks
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Posted by santoshg <sa...@uber.com>.

Restarting this thread since it is relevant to us. We are thinking of using
HBase/Cassandra to store graph data and then load the data from there into
Flink/Gelly. One issue we are concerned about is read performance. So far
we have run our tests with data residing on HDFS, and that worked fine.

Is there any guidance on reading from HBase for batch jobs? Does anyone have
experience with this approach, or any do's and don'ts to share?

Thanks




Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Posted by Vasiliki Kalavri <va...@gmail.com>.

Thank you for sharing!

On 28 July 2017 at 05:01, Robert Yokota <ra...@gmail.com> wrote:

> Also Google Cloud Bigtable has such a page at
> https://cloud.google.com/bigtable/docs/integrations

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Posted by Robert Yokota <ra...@gmail.com>.

Also Google Cloud Bigtable has such a page at
https://cloud.google.com/bigtable/docs/integrations

On Thu, Jul 27, 2017 at 6:57 PM, Robert Yokota <ra...@gmail.com> wrote:

>
> One thing I really appreciate about HBase is its flexibility.  It doesn't
> enforce a schema, but also doesn't prevent you from building a schema layer
> on top.  It is very customizable, allowing you to push arbitrary code to
> the server in the form of filters and coprocessors.
>
> Not having such higher-layer features built into HBase allows it to remain
> flexible, but it does have a downside.  One complaint is that for a new
> user coming to HBase, who perhaps does want to work with things like query
> languages, schemas, secondary indices, transactions, and so forth, it can
> be daunting to research and understand what other projects in the HBase
> ecosystem can help him/her, how others have used such projects, and under
> what use cases each project might be successful or not.
>
> Perhaps a good start would be something like an "HBase ecosystem" page at
> the website that would list projects like Phoenix, Tephra, and others in
> the HBase ecosystem.  The Apache TinkerPop site has a listing of projects
> in its ecosystem at http://tinkerpop.apache.org.   I think new users
> coming to HBase aren't even aware of the larger ecosystem, and sometimes
> end up selecting alternative data stores as a result.
>
> P.S.  I'm using HBase 1.1.2

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Posted by Robert Yokota <ra...@gmail.com>.

One thing I really appreciate about HBase is its flexibility.  It doesn't
enforce a schema, but also doesn't prevent you from building a schema layer
on top.  It is very customizable, allowing you to push arbitrary code to
the server in the form of filters and coprocessors.

Not having such higher-layer features built into HBase allows it to remain
flexible, but it does have a downside.  One complaint is that for a new
user coming to HBase, who perhaps does want to work with things like query
languages, schemas, secondary indices, transactions, and so forth, it can
be daunting to research and understand what other projects in the HBase
ecosystem can help him/her, how others have used such projects, and under
what use cases each project might be successful or not.

Perhaps a good start would be something like an "HBase ecosystem" page at
the website that would list projects like Phoenix, Tephra, and others in
the HBase ecosystem.  The Apache TinkerPop site has a listing of projects
in its ecosystem at http://tinkerpop.apache.org.   I think new users coming
to HBase aren't even aware of the larger ecosystem, and sometimes end up
selecting alternative data stores as a result.

P.S.  I'm using HBase 1.1.2

On Thu, Jul 27, 2017 at 5:42 PM, Ted Yu <yu...@gmail.com> wrote:

> Interesting blog.
>
> From your experience, is there anything on the HBase side where you see room
> for improvement?
>
> Which HBase release are you using?
>
> Cheers

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Posted by Ted Yu <yu...@gmail.com>.

Interesting blog.

From your experience, is there anything on the HBase side where you see room
for improvement?

Which HBase release are you using?

Cheers

On Thu, Jul 27, 2017 at 3:11 PM, Robert Yokota <ra...@gmail.com> wrote:

> In case anyone is interested, I wrote a blog on how to analyze graphs
> stored in HBase with Apache Flink Gelly:
>
> https://yokota.blog/2017/07/27/graph-analytics-on-hbase-with-hgraphdb-and-apache-flink-gelly/
>