Posted to dev@s2graph.apache.org by "DOYUNG YOON (JIRA)" <ji...@apache.org> on 2018/06/25 08:12:00 UTC

[jira] [Updated] (S2GRAPH-226) Provide example spark jobs to explain how to utilize WAL log.

     [ https://issues.apache.org/jira/browse/S2GRAPH-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DOYUNG YOON updated S2GRAPH-226:
--------------------------------
    Description: 
Even though s2graph publishes every incoming vertex and edge to Kafka, there is no example showing how to use this WAL.

I suggest adding a simple example that shows how to process the WAL, and I would like to explain which use cases such an example can benefit.

At Kakao, s2graph has been used as fact storage that records all user activities, such as clicking content, buying a product, and issuing a search query.


{noformat}
[{
	"timestamp": 1,
	"elem": "e",
	"from": "steamshon",
	"to": "s2graph",
	"label": "search_query",
	"props": {}
}, {
	"timestamp": 10,
	"elem": "e",
	"from": "steamshon",
	"to": "github.com/apache/incubator-s2graph",
	"label": "content_click",
	"props": {}
}, {
	"timestamp": 12,
	"elem": "v",
	"id": "steamshon",
	"serviceName": "s2graph",
	"columnName": "user",
	"props": {
		"gender": "M"
	}
}]
{noformat}
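A WAL consumer first has to tell edge records from vertex records. The sketch below is plain Python, not a Spark job. It assumes each Kafka message value is a single JSON object shaped like the records above, with `elem` set to `e` for edges and `v` for vertices. The actual s2graph WAL wire format may differ, and the function name `split_wal_records` is hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_wal_records(payloads: List[str]) -> Tuple[List[Record], List[Record]]:
    """Parse WAL payloads and split them into edge and vertex records.

    Assumption (hypothetical format): every payload is one JSON object shaped
    like the records above, where "elem" is "e" for an edge and "v" for a vertex.
    """
    edges: List[Record] = []
    vertices: List[Record] = []
    for payload in payloads:
        record = json.loads(payload)
        if record.get("elem") == "e":
            edges.append(record)
        elif record.get("elem") == "v":
            vertices.append(record)
        # Any other "elem" value is skipped in this sketch.
    return edges, vertices


if __name__ == "__main__":
    sample = [
        '{"timestamp": 1, "elem": "e", "from": "steamshon", "to": "s2graph", "label": "search_query", "props": {}}',
        '{"timestamp": 12, "elem": "v", "id": "steamshon", "serviceName": "s2graph", "columnName": "user", "props": {"gender": "M"}}',
    ]
    edges, vertices = split_wal_records(sample)
    print(len(edges), len(vertices))  # prints "1 1"
```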

Each activity (a label, in s2graph terms) forms its own graph, but connecting these graphs together yields much more information.

The edges above can be aggregated into their source vertex.

How the graphs are connected is up to each user. In our case, we used `user` to merge multiple graphs: every activity, such as clicking content, buying a product, or issuing a search query, uses the same `userId` for the same `user`.

Below is a simple example of the aggregated data.

{noformat}
{
	"timestamp": 10,
	"elem": "v",
	"id": "steamshon",
	"serviceName": "s2graph",
	"columnName": "user",
	"props": {
		"gender": "M",
		"edges": [{
			"timestamp": 1,
			"to": "s2graph",
			"label": "search_query",
			"props": {}
		}, {
			"timestamp": 10,
			"to": "github.com/apache/incubator-s2graph",
			"label": "content_click",
			"props": {}
		}]
	}
}
{noformat}
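The aggregation amounts to grouping edges by their source user id and attaching them to that user's vertex. In a Spark job this would typically be a group-by on the user id. The following plain-Python sketch rebuilds the example above from the three WAL records shown earlier. Several details are assumptions for illustration:
* the function name `aggregate_into_vertices`;
* the join on edge `from` == vertex `id`;
* using the latest edge timestamp as the aggregated `timestamp`, which gives 10 here and matches the example.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def aggregate_into_vertices(edges: List[Record], vertices: List[Record]) -> List[Record]:
    """Fold each user's edges into that user's vertex, keyed by the shared user id.

    Assumptions (hypothetical, not an s2graph API): edges join to vertices on
    edge["from"] == vertex["id"], and the aggregated timestamp is the latest
    edge timestamp, falling back to the vertex timestamp when there are no edges.
    """
    # Keep only the most recent vertex record per id.
    latest: Dict[str, Record] = {}
    for v in vertices:
        current = latest.get(v["id"])
        if current is None or v["timestamp"] >= current["timestamp"]:
            latest[v["id"]] = v

    # Group edges by their source vertex (the user id), dropping "from"/"elem".
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for e in edges:
        grouped[e["from"]].append(
            {"timestamp": e["timestamp"], "to": e["to"], "label": e["label"], "props": e.get("props", {})}
        )

    result: List[Record] = []
    for user_id in sorted(set(latest) | set(grouped)):
        base = latest.get(user_id, {})
        user_edges = sorted(grouped.get(user_id, []), key=lambda x: x["timestamp"])
        props = dict(base.get("props", {}))
        props["edges"] = user_edges
        timestamp = max((x["timestamp"] for x in user_edges), default=base.get("timestamp"))
        result.append({
            "timestamp": timestamp,
            "elem": "v",
            "id": user_id,
            "serviceName": base.get("serviceName"),
            "columnName": base.get("columnName"),
            "props": props,
        })
    return result
```

Keeping only the most recent vertex record per id is a simplification. A real job might instead merge properties across several vertex updates.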

This connected graph can be used not only for OLTP but also for OLAP.
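On the OLAP side, the connected form is useful because vertex properties and activities sit in one record. The sketch below is a hedged illustration: the query and the function name `activity_by_gender` are hypothetical and not part of s2graph or s2jobs. It counts activities per gender and label over aggregated vertices shaped like the example above, without a separate join between the user vertex and its edges.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def activity_by_gender(aggregated: Iterable[Dict[str, Any]]) -> Counter:
    """Count activities per (gender, label) across aggregated user vertices.

    Each vertex already carries its edges under props["edges"], so the
    vertex property "gender" and the activity labels are read from one record.
    """
    counts: Counter = Counter()
    for vertex in aggregated:
        props = vertex.get("props", {})
        gender = props.get("gender", "unknown")
        for edge in props.get("edges", []):
            counts[(gender, edge["label"])] += 1
    return counts
```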

I believe the s2graph WAL is a good way to integrate OLTP and OLAP, and adding this example would help users understand how to leverage it.

  was:

Even though s2graph publishes every incoming vertex and edge to Kafka, there is no example showing how to use this WAL.

I suggest adding a simple example that shows how to process the WAL, and I would like to explain which use cases such an example can benefit.

At Kakao, s2graph has been used as fact storage that records all user activities, such as clicking content, buying a product, and issuing a search query.


{noformat}
[{
	"timestamp": 1,
	"elem": "e",
	"from": "steamshon",
	"to": "s2graph",
	"label": "search_query",
	"props": {}
}, {
	"timestamp": 10,
	"elem": "e",
	"from": "steamshon",
	"to": "github.com/apache/incubator-s2graph",
	"label": "content_click",
	"props": {}
}, {
	"timestamp": 12,
	"elem": "v",
	"id": "steamshon",
	"serviceName": "s2graph",
	"columnName": "user",
	"props": {
		"gender": "M"
	}
}]
{noformat}

Each activity (a label, in s2graph terms) forms its own graph, but connecting these graphs together yields much more information.

The edges above can be aggregated into their source vertex.

How the graphs are connected is up to each user. In our case, we used `user` to merge multiple graphs: every activity, such as clicking content, buying a product, or issuing a search query, uses the same `userId` for the same `user`.

Below is a simple example of the aggregated data.

{noformat}
{
	"timestamp": 10,
	"elem": "v",
	"id": "steamshon",
	"serviceName": "s2graph",
	"columnName": "user",
	"props": {
		"gender": "M",
		"agg": [{
			"timestamp": 1,
			"to": "s2graph",
			"label": "search_query",
			"props": {}
		}, {
			"timestamp": 10,
			"to": "github.com/apache/incubator-s2graph",
			"label": "content_click",
			"props": {}
		}]
	}
}
{noformat}

This connected graph can be used not only for OLTP but also for OLAP.

I believe the s2graph WAL is a good way to integrate OLTP and OLAP, and adding this example would help users understand how to leverage it.


> Provide example spark jobs to explain how to utilize WAL log.
> -------------------------------------------------------------
>
>                 Key: S2GRAPH-226
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-226
>             Project: S2Graph
>          Issue Type: New Feature
>          Components: s2core, s2jobs
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Major
>   Original Estimate: 336h
>  Remaining Estimate: 336h


