Posted to dev@s2graph.apache.org by "DOYUNG YOON (JIRA)" <ji...@apache.org> on 2018/05/08 06:15:00 UTC
[jira] [Updated] (S2GRAPH-213) Abstract Query/Mutation from Storage.

     [ https://issues.apache.org/jira/browse/S2GRAPH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DOYUNG YOON updated S2GRAPH-213:
--------------------------------
    Description: 
Currently, Storage has many components to implement, and most of them are very specific to the HBase storage backend (e.g. WriteWriteConflictResolver).

Even though it is possible to add a new storage backend using the current abstraction, Storage, there is too much work that is not common to all storage backends.

I am suggesting the following simple interfaces, which are common to any storage backend to be integrated.
{noformat}
trait Fetcher {

  def init(config: Config)(implicit ec: ExecutionContext): Future[Fetcher] 
  
  def fetches(queryRequests: Seq[QueryRequest],
              prevStepEdges: Map[VertexId, Seq[EdgeWithScore]])(implicit ec: ExecutionContext): Future[Seq[StepResult]]

  def close(): Unit
}
{noformat}
{noformat}
trait Mutator {
  def mutateVertex(zkQuorum: String, 
                   vertex: S2VertexLike, 
                   withWait: Boolean)(implicit ec: ExecutionContext): Future[MutateResponse]

  def mutateStrongEdges(zkQuorum: String,
                        edges: Seq[S2EdgeLike],
                        withWait: Boolean)(implicit ec: ExecutionContext): Future[Seq[Boolean]]

  def mutateWeakEdges(zkQuorum: String,
                      edges: Seq[S2EdgeLike],
                      withWait: Boolean)(implicit ec: ExecutionContext): Future[Seq[(Int, Boolean)]]

  def incrementCounts(zkQuorum: String,
                      edges: Seq[S2EdgeLike],
                      withWait: Boolean)(implicit ec: ExecutionContext): Future[Seq[MutateResponse]]

  def updateDegree(zkQuorum: String,
                   edge: S2EdgeLike,
                   degreeVal: Long = 0)(implicit ec: ExecutionContext): Future[MutateResponse]

  def deleteAllFetchedEdgesAsyncOld(stepInnerResult: StepResult,
                                    requestTs: Long,
                                    retryNum: Int)(implicit ec: ExecutionContext): Future[Boolean]
}
{noformat}
By abstracting query/mutation as the interfaces above, it is possible to implement a JDBCFetcher and a JDBCMutator that read/write vertices and edges from/to any JDBC-enabled storage by implementing only these interfaces.
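
To illustrate how small a backend could be under this proposal, here is a minimal sketch of a `Fetcher` backed by an in-memory map. It is not the S2Graph implementation: `VertexId`, `QueryRequest`, `EdgeWithScore` and `StepResult` are simplified, hypothetical stand-ins for the real types, `Config` is reduced to a plain `Map`, and `InMemoryFetcher` is a name made up for this example.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FetcherSketch {
  // Hypothetical, simplified stand-ins for the S2Graph types named in the proposal.
  final case class VertexId(id: String)
  final case class QueryRequest(vertexId: VertexId, labelName: String)
  final case class EdgeWithScore(from: VertexId, to: VertexId, score: Double)
  final case class StepResult(edgeWithScores: Seq[EdgeWithScore])

  // Same shape as the proposed trait; Config is reduced to a Map in this sketch.
  trait Fetcher {
    def init(config: Map[String, String])(implicit ec: ExecutionContext): Future[Fetcher]

    def fetches(queryRequests: Seq[QueryRequest],
                prevStepEdges: Map[VertexId, Seq[EdgeWithScore]])(implicit ec: ExecutionContext): Future[Seq[StepResult]]

    def close(): Unit
  }

  // Serves edges from an in-memory adjacency map keyed by (source vertex, label name).
  class InMemoryFetcher(edges: Map[(VertexId, String), Seq[EdgeWithScore]]) extends Fetcher {
    def init(config: Map[String, String])(implicit ec: ExecutionContext): Future[Fetcher] =
      Future.successful(this)

    def fetches(queryRequests: Seq[QueryRequest],
                prevStepEdges: Map[VertexId, Seq[EdgeWithScore]])(implicit ec: ExecutionContext): Future[Seq[StepResult]] =
      Future {
        // One StepResult per request, in request order; prevStepEdges is ignored here.
        queryRequests.map { req =>
          StepResult(edges.getOrElse((req.vertexId, req.labelName), Seq.empty))
        }
      }

    def close(): Unit = ()
  }

  // Fetches `Friends` edges for two vertices, only one of which has any.
  def demo(): Seq[StepResult] = {
    import ExecutionContext.Implicits.global
    val alice = VertexId("alice")
    val bob = VertexId("bob")
    val fetcher = new InMemoryFetcher(
      Map((alice, "Friends") -> Seq(EdgeWithScore(alice, bob, 1.0))))
    val result = fetcher.fetches(
      Seq(QueryRequest(alice, "Friends"), QueryRequest(bob, "Friends")), Map.empty)
    Await.result(result, 5.seconds)
  }
}
```

A JDBC-backed variant would presumably keep the same three methods and replace the map lookup with a query against the target database.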

One thing to discuss is how we are going to maintain the information about which ServiceColumn/Label uses which storage implementation.

The naive solution would be to store the storage backend configuration in the ServiceColumn/Label's options field, which accepts JSON, and have the S2Graph instance maintain the mapping of which ServiceColumn/Label uses which storage implementation.

I think the above abstraction makes it possible to use a different implementation for each ServiceColumn/Label and, more importantly, lets users provide their own implementation.

For example, it becomes possible to store `User` vertices in PostgreSQL and `Friends` edges in HBase. Also, users who do not want to use S2Graph for vertices do not need to store vertices in S2Graph at all; by implementing the `Fetcher` interface they can still traverse vertices as if they were stored in S2Graph.
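
The per-ServiceColumn/Label mapping discussed above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the `Backend` trait stands in for a Fetcher/Mutator pair, `Options` stands in for the already-parsed JSON options field, and the `"storage"` key and the `BackendRegistry` class are hypothetical names, not part of S2Graph.

```scala
object BackendRegistrySketch {
  // Hypothetical stand-in for a storage implementation (a Fetcher/Mutator pair).
  trait Backend { def name: String }
  case object HBaseBackend extends Backend { val name = "hbase" }
  case object JdbcBackend extends Backend { val name = "jdbc" }

  // Stand-in for the parsed JSON options of a ServiceColumn or Label.
  // The "storage" key is an assumption made for this sketch.
  type Options = Map[String, String]

  class BackendRegistry(default: Backend, available: Map[String, Backend]) {
    // Resolve the backend named by the "storage" option, falling back to the
    // default when the option is missing or names an unknown backend.
    def resolve(options: Options): Backend =
      options.get("storage").flatMap(available.get).getOrElse(default)
  }

  val registry = new BackendRegistry(
    default = HBaseBackend,
    available = Map("hbase" -> HBaseBackend, "jdbc" -> JdbcBackend)
  )

  // `User` vertices are routed to a JDBC store such as PostgreSQL.
  val userColumnOptions: Options = Map("storage" -> "jdbc")
  // `Friends` edges carry no storage option and therefore use the default.
  val friendsLabelOptions: Options = Map.empty
}
```

Falling back to the default silently is only one possible choice; failing fast on an unknown backend name may be preferable, and that is part of what needs discussing.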

I came up with this suggestion while working on S2GRAPH-206, since model serving requires a different `Fetcher` implementation per model.

  was:

Currently, Storage has many components to implement, and most of them are very specific to HBase storage backend(ex: WriteWriteConflictResolver).

Even though it is possible to add new storage backend using current abstraction, Storage, there are too many works which is not common to all storage backend

I am suggesting following simple interface which is all common for any storage backend to be integrated with.

{noformat}
trait Fetcher {

def init(config: Config)(implicit ec: ExecutionContext): Future[Fetcher] =
Future.successful(this)

def fetches(queryRequests: Seq[QueryRequest],
prevStepEdges: Map[VertexId, Seq[EdgeWithScore]])(implicit ec: ExecutionContext): Future[Seq[StepResult]]

def close(): Unit = {}
}
{noformat}

{noformat}
trait Mutator {
def mutateVertex(zkQuorum: String, vertex: S2VertexLike, withWait: Boolean)(implicit ec: ExecutionContext): Future[MutateResponse]

def mutateStrongEdges(zkQuorum: String, _edges: Seq[S2EdgeLike], withWait: Boolean)(implicit ec: ExecutionContext): Future[Seq[Boolean]]

def mutateWeakEdges(zkQuorum: String, _edges: Seq[S2EdgeLike], withWait: Boolean)(implicit ec: ExecutionContext): Future[Seq[(Int, Boolean)]]

def incrementCounts(zkQuorum: String, edges: Seq[S2EdgeLike], withWait: Boolean)(implicit ec: ExecutionContext): Future[Seq[MutateResponse]]

def updateDegree(zkQuorum: String, edge: S2EdgeLike, degreeVal: Long = 0)(implicit ec: ExecutionContext): Future[MutateResponse]

def deleteAllFetchedEdgesAsyncOld(stepInnerResult: StepResult,
requestTs: Long,
retryNum: Int)(implicit ec: ExecutionContext): Future[Boolean]
}
{noformat}

By abstracting query/mutation as above interface, it is possible to implement JDBCFetcher and JDBCMutator that read/write vertex and edge into any JDBC enabled storage only implementing above interfaces.

One thing to discuss is how we are going to maintain the information about what ServiceColumn/Label use which storage implementation.

The naive solution would be store configuration for storage backend into ServiceColumn/Label's options field, which accepts JSON, and make S2Graph instance to maintain the mapping of what ServiceColumn/Label use which storage implementation.

I think above abstraction make it possible to use different implementation per each ServiceColumn/Label, and more importantly, the user can provide their own implementation.

For example, storing `User` vertex into Postgresql and `Friends` edges into HBase can be possible. Also, users who do not want to use S2Graph for vertex, do not need to store vertex at all, but by implementing `Fetcher` interface they can still traverse vertex as they are stored in S2Graph. 

Come up with this suggestion while working on S2GRAPH-206, since model serving requires different implementation for `Fetcher` per model.



> Abstract Query/Mutation from Storage.
> -------------------------------------
>
>                 Key: S2GRAPH-213
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-213
>             Project: S2Graph
>          Issue Type: Improvement
>          Components: s2core
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>            Priority: Major
>   Original Estimate: 336h
>  Remaining Estimate: 336h


