Posted to users@s2graph.apache.org by anuj kumar <an...@gmail.com> on 2019/01/17 11:47:45 UTC

Deployment Architecture for s2Graph

Hi,
 I am planning to use S2 Graph for our graph use cases and wanted to know
what is the preferred Deployment Architecture for the S2Graph ecosystem as
a whole?

Thanks,
-- 
*Anuj Kumar*

Any volunteers to give answers? (Was: Deployment Architecture for s2Graph)

Posted by Woonsan Ko <wo...@apache.org>.
Hi S2Graph devs,

Could anyone please give answer(s) to the user below?
The answers don't have to be perfect or long, and the question may be
quite broad. Just giving some response, possibly with links to any
existing slides or documentation, or some comments and remarks from
your experience, will help grow the community. Leaving fair questions
stale on the lists could give a wrong impression to others. Keeping
dialogues going helps.

Thanks in advance,

Woonsan

---------- Forwarded message ---------
From: anuj kumar <an...@gmail.com>
Date: Thu, Jan 17, 2019 at 8:48 PM
Subject: Deployment Architecture for s2Graph
To: <us...@s2graph.incubator.apache.org>


Hi,
 I am planning to use S2 Graph for our graph use cases and wanted to
know what is the preferred Deployment Architecture for the S2Graph
ecosystem as a whole?

Thanks,
-- 
Anuj Kumar

Re: Deployment Architecture for s2Graph

Posted by DO YUNG YOON <sh...@gmail.com>.
Hi, anuj.

As far as I know, the current stable release (v0.2.0) has basic support
for ES, but it has many bugs.
For ES support, it would be better to deploy the master branch for now.

In my opinion, the S2Graph community needs to release the next version to
catch up on bug fixes and new features, since the previous release is very
old. I am going to open a discussion on the dev mailing list about this
release.

By the way, it would be great if you could elaborate on your use cases so
we can discuss what's best.


Re: Deployment Architecture for s2Graph

Posted by anuj kumar <an...@gmail.com>.
Hi, thank you for the detailed email. This is very helpful. Is the support
for ES already available? This seems to be an important thing for us.

Thanks,
Anuj Kumar


-- 
*Anuj Kumar*

Re: Deployment Architecture for s2Graph

Posted by DO YUNG YOON <sh...@gmail.com>.
Hi anuj. Welcome to S2Graph.

Here is how we deploy S2Graph at Kakao, where it has been in production
for about 3 years.

1. Storage backend: HBase.

We use HBase as the primary storage. We chose it because we have lots of
batch-processed data that needs to be bulk-loaded without affecting the
production OLTP service. HBase also provides linear scalability, which
keeps operations simple.

reference: https://www.slideshare.net/HBaseCon/use-cases-session-5

Even though it is possible in theory to use a different storage backend,
I think the preferred backend is still HBase.
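As a minimal sketch, pointing an S2Graph server at an HBase cluster is
mostly a configuration matter. The key below is the usual HBase client
setting, but check the reference configuration for your version; the hosts
are placeholders:

```hocon
# application.conf -- illustrative values only
hbase.zookeeper.quorum = "zk1.example.com,zk2.example.com,zk3.example.com"
```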

2. Meta storage: MySQL.

S2Graph stores the schema of each Label, Service, etc. in a meta storage.

In short, S2Graph requires this schema to serialize/deserialize
user-provided vertices/edges to their internal representation, i.e. bytes
(KeyValue/Cell in the HBase world).

We set up a separate meta storage server (MySQL) to build admin tools that
keep track of all schemas. Any RDBMS that supports the JDBC interface,
such as H2 or PostgreSQL, can be used.

Even though it is possible to use the local embedded H2 (which is our
default), it is better to set up a separate server in production, since
multiple S2Graph REST servers need to connect to this meta storage.

reference:
https://schd.ws/hosted_files/apachebigdata2016/03/S2Graph-%20Apache%20Big%20Data.pdf
(pages 12-13)
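The meta storage is wired up through JDBC settings in the same
configuration file. The `db.default.*` key names below follow the
Play-style convention our deployments used, but treat them as an
assumption and check the reference configuration for your version; hosts
and credentials are placeholders:

```hocon
# application.conf -- illustrative values only
db.default.driver = "com.mysql.jdbc.Driver"
db.default.url = "jdbc:mysql://meta-db.example.com:3306/graph_dev"
db.default.user = "graph"
db.default.password = "graph"
```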

3. The bridge between OLAP and OLTP: Kafka

S2Graph can publish all incoming vertices/edges into Kafka as a WAL
(write-ahead log). This WAL can be used for streaming use cases. At Kakao,
the WAL is used for many purposes such as business intelligence and
machine learning, and most importantly the consumers do not need to know
anything about S2Graph.
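Since the WAL records are plain text on a Kafka topic, any Kafka client
can consume them. As a rough illustration, a consumer might split each
record into fields like this (the tab-separated layout and field order
here are an assumption for the sketch, not the documented format):

```python
def parse_wal_line(line):
    """Parse one hypothetical tab-separated WAL record into a dict.

    Assumed layout (illustrative only):
    timestamp <TAB> operation <TAB> element_type <TAB> from <TAB> to <TAB> label <TAB> props
    """
    fields = line.rstrip("\n").split("\t")
    timestamp, operation, element_type, src, dst, label, props = fields
    return {
        "timestamp": int(timestamp),
        "operation": operation,        # e.g. insert / delete / update
        "element_type": element_type,  # e.g. edge / vertex
        "from": src,
        "to": dst,
        "label": label,
        "props": props,                # properties as a JSON string
    }

record = parse_wal_line("1547724465000\tinsert\tedge\tu1\tu2\tfriend\t{}")
```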

4. OLAP: Spark Structured Streaming, Spark.

As mentioned in 3, we use Spark to process the WAL stream. I think any
streaming engine with a Kafka connector is a good choice here; we chose
Spark simply because we are familiar with it, nothing special. The OLAP
process is completely decoupled from S2Graph.
Whatever system is used for processing, the processed data can easily be
uploaded into HBase in either a streaming or a bulk manner.

Here is an example of how we use the WAL (the slides presented at
ApacheCon NA 2018 in Montreal).

https://docs.google.com/presentation/d/1Om_Wd0V97dKN7YP-43I35sfNdJB3JCrkruGuHVHSNl4/edit?usp=sharing
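To make the idea concrete, here is a minimal stand-in for the kind of
rollup such a streaming job might maintain over WAL records (plain Python
rather than Spark, and the record shape is assumed, so this sketches the
logic only, not our actual job):

```python
from collections import Counter

def count_edges_per_label(wal_records):
    """Count inserted edges per label -- the sort of aggregate a
    WAL-consuming streaming job might keep for BI dashboards."""
    counts = Counter()
    for rec in wal_records:
        if rec["element_type"] == "edge" and rec["operation"] == "insert":
            counts[rec["label"]] += 1
    return dict(counts)

sample = [
    {"element_type": "edge", "operation": "insert", "label": "friend"},
    {"element_type": "edge", "operation": "insert", "label": "friend"},
    {"element_type": "edge", "operation": "delete", "label": "friend"},
    {"element_type": "vertex", "operation": "insert", "label": "user"},
]
per_label = count_edges_per_label(sample)  # {"friend": 2}
```

In a real deployment the same fold would run inside the streaming engine,
with Kafka as the source and HBase as the sink.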

5. Full-text search: Elasticsearch

We are experimenting with ES to support full-text search on vertex
property values. For example, a query like "find friends of persons who
are male and aged 30" first needs to find the vertices matching
`gender:male and age:30`, then start traversing the friends relation from
those vertices. In this case, we use ES to find the vertices that match
the search.
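For illustration, the vertex-lookup half of that example could be
expressed as a standard Elasticsearch bool filter. The index layout and
field names ("gender", "age") are hypothetical; only the query shape is
the point:

```python
def vertex_filter_query(gender, age):
    """Build an Elasticsearch bool query matching vertices by property.

    Assumes (for illustration only) one ES document per vertex with
    "gender" and "age" fields.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"gender": gender}},
                    {"term": {"age": age}},
                ]
            }
        }
    }

body = vertex_filter_query("male", 30)
# body would be sent to ES (e.g. a search against a vertex index), and the
# matching vertex ids would seed the graph traversal in S2Graph.
```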


I guess 1 and 2 are required for OLTP; 3, 4, and 5 are optional but have
been very useful for us.

Thanks for asking the question, and I am more than happy to hear what your
use case is and discuss what S2Graph can do for it.


