Posted to dev@s2graph.apache.org by DO YUNG YOON <sh...@gmail.com> on 2016/11/28 10:15:42 UTC

[DISCUSS] manage our subprojects

Hi folks.

I think we should discuss which subprojects we provide before the next
release.

Since the initial code import to Apache, we have not worked on any
subprojects other than s2core and s2rest_play.

Here is what you can find in each subproject (from our README).

   1. s2core: The core library, containing the data abstractions for graph
   entities, storage adapters and utilities.
   2. s2rest_play: The REST server built with Play framework
   <https://www.playframework.com/>, providing the write and query API.
   3. s2rest_netty: The REST server built directly using Netty,
   implementing only the query API.
   4. loader: A collection of Spark jobs for bulk loading streaming data
   into S2Graph.
   5. spark: Spark utilities for loader and s2counter_loader.
   6. s2counter_core: The core library providing data structures and logics
   for s2counter_loader.
   7. s2counter_loader: Spark streaming jobs that consume Kafka WAL logs
   and calculate various top-*K* results on-the-fly.


I suggest merging loader, spark, and s2counter_loader into one project
called s2loader, responsible for the streaming/batch utilities that work
with S2Graph.

The reason behind this is to improve the codebase (we currently have a lot
of duplicated code, and it seems quite abandoned).

Documentation is also missing, so we should provide solid documentation to
help others understand these subprojects.

Finally, there are no specs or test cases. I think adding test cases is
important because it lets us start refactoring our code into something
easily testable.

I have opened a discussion thread at
http://markmail.org/message/3j2hbfquwwybyz4e, but it has not received
enough attention, so please give any feedback on this so we can start
working on our subprojects.

Thanks.

Re: [DISCUSS] manage our subprojects

Posted by daewon <da...@apache.org>.

I agree with integrating the Spark-related projects, such as streaming and
batch.

Because the above projects have similar functionality, there is a lot of
code duplication.

Integrating the projects should reduce redundancy and make it easier to
eliminate projects we no longer need.

Once the streaming- and batch-related projects are integrated, I would like
to discuss removing the 's2rest_netty' project in order to integrate the
HTTP layer.

S2GRAPH-85 (https://issues.apache.org/jira/browse/S2GRAPH-85) had a bit of
discussion about the HTTP layer.
