Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/04/24 16:23:48 UTC

An interesting comment about "computational loci" from TinkerPop JIRA -- Matt Frantz.

Hi,

The following selectList ticket is long, but last night Matt Frantz made a side comment that I think is worth discussing on dev@.

	https://issues.apache.org/jira/browse/TINKERPOP3-639?focusedCommentId=14510009&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14510009

I put it in this Gist because, in my browser, the above link doesn't really focus on the comment (it's "lower" than where the browser takes me).

	https://gist.github.com/okram/1080927667d0a515afd3

What you propose here is a sort of denormalization of the graph. That is, replicate data so you reduce the number of "joins" (message passes) you have to do. While you will "significantly reduce the OLAP message bandwidth," what you have done in essence is greatly increase the amount of data that each machine node in the cluster must handle. Assume that your radius is 2: each compute vertex then has a representation of itself, its adjacent vertices, and their adjacent vertices (the "sphere of influence" of the compute vertex). Given that the number of edges is typically 10-100x the number of vertices in the graph (for natural graphs), the amount of data you are duplicating is (10-100x)^2 for a radius of 2. For a radius of 3, it's (10-100x)^3. And so forth...
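
To make that growth concrete, here is a back-of-the-envelope sketch; the vertex count and average degree below are assumptions, and the d^r growth is the point rather than the exact numbers:

    // Rough sketch: if the average degree is d and each compute vertex keeps a
    // copy of its r-hop neighborhood, the duplicated data per vertex grows
    // roughly as d^r. The graph size and degree below are assumed values.
    public class ReplicationEstimate {
        public static void main(String[] args) {
            long vertexCount = 1_000_000L; // assumed graph size
            int averageDegree = 10;        // "edges are typically 10-100x the vertices"
            for (int radius = 1; radius <= 3; radius++) {
                double copiesPerVertex = Math.pow(averageDegree, radius);
                System.out.printf("radius %d: ~%.0f duplicated neighbors per vertex, ~%.1e copies overall%n",
                        radius, copiesPerVertex, copiesPerVertex * vertexCount);
            }
        }
    }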

The goal you are trying to achieve is to "significantly reduce the OLAP message bandwidth." I concur this is a worthwhile goal. However, if the message pass occurs within the same machine because the two communicating vertices are physically co-located, then your messages 1) don't get serialized, 2) don't get written over the wire, and 3) don't get deserialized. Thus, the goal you are trying to achieve seems best solved by ensuring you have a good partitioning algorithm for your graph. That is, for the particular traversal you are doing, you want to make sure that the current compute vertex is sending messages to adjacents that are physically co-located with the current vertex -- measured as some local vs. remote "hit rate" (i.e. where the "cache" is the local machine).
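
As a rough illustration of that "hit rate" idea, something like the counter below could be tracked per iteration. The co-location test is hypothetical: a provider would have to expose whether two vertices live on the same worker.

    import java.util.function.BiPredicate;

    // Sketch of a local vs. remote message counter. The 'sameWorker' predicate is
    // an assumption standing in for whatever the provider knows about physical
    // vertex placement.
    public class MessageLocalityCounter {
        private long local = 0;
        private long remote = 0;

        public void record(Object senderId, Object receiverId,
                           BiPredicate<Object, Object> sameWorker) {
            if (sameWorker.test(senderId, receiverId)) {
                local++;
            } else {
                remote++;
            }
        }

        // Fraction of messages that never left the machine -- the "cache hit rate".
        public double hitRate() {
            long total = local + remote;
            return total == 0 ? 0.0 : (double) local / total;
        }
    }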

Next, you note a particular restriction on your algorithm: "the local subgraph may not contain any other loci." While that keeps the data size at the radius-1 level, I wonder about a few things:

	1. How do you cheaply determine, in a distributed fashion, whether the local subgraphs overlap?
	2. Only certain vertices will have the richness of deep subgraphs, while others will have to message pass immediately. Thus, you are back to the same "bandwidth" problem.

Thanks for the thoughts and the conversation,
Marko.

http://markorodriguez.com


Re: An interesting comment about "computational loci" from TinkerPop JIRA -- Matt Frantz.

Posted by Matt Frantz <ma...@gmail.com>.
Did some homework on PartitionStrategy in light of these suggestions.  (We
are planning to use Titan, but are unsure whether OLAP is appropriate for
all of our bulk processing needs.)  If I understand Matthias's suggestion,
we would define a partition for each of the root "A" vertices and the trees
attached to them.  At some of our customer sites, the number of roots would
be tens if not hundreds of thousands.  Is that appropriate for Titan?  Are
there disadvantages to a highly partitioned graph in the OLTP context?
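
For reference, a minimal sketch of how TinkerPop's PartitionStrategy is wired up (the partition key and partition names below are placeholders, and TinkerGraph just stands in for the real graph). Note that this strategy scopes reads and writes by a vertex property, i.e. a logical partition; physical placement of those partitions would still be up to Titan.

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.PartitionStrategy;
    import org.apache.tinkerpop.gremlin.structure.Graph;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    // Minimal PartitionStrategy sketch: every written element gets a "_partition"
    // property, and reads only see the listed partitions. Names are placeholders.
    public class PartitionStrategySketch {
        public static void main(String[] args) {
            Graph graph = TinkerGraph.open(); // stand-in for the real graph
            PartitionStrategy strategy = PartitionStrategy.build()
                    .partitionKey("_partition")
                    .writePartition("rootA-1")
                    .readPartitions("rootA-1")
                    .create();
            GraphTraversalSource g = graph.traversal().withStrategies(strategy);
            g.addV("A").iterate();                       // written into partition "rootA-1"
            System.out.println(g.V().count().next());    // reads only see "rootA-1"
        }
    }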

There are also at least two algorithms that have the same pattern
(tree-shaped subgraphs around each locus of computation).  I assume we
cannot have two different partitioning schemes without redundantly loading
the data.  Is that the case?


On Fri, Apr 24, 2015 at 10:39 AM, Matthias Broecheler <me...@matthiasb.com>
wrote:

> This is an interesting comment as it goes back to the fundamental tradeoff
> between denormalization, storage overhead and partitioning. Denormalization
> colocates the data but at the cost of data redundancy. Given their
> combinatorial nature, such redundancies quickly explode in graphs. A
> normalized graph has the lowest storage footprint but comes at the cost of
> having to do lots of "hops" - either in memory or across machine boundaries
> - which are expensive. Finally, graph partitioning attempts to keep
> frequently accessed data together while keeping it normalized but comes at
> the expense of having to compute "near optimal" partitions which is a very
> hard problem.
>
> So, the answer lies probably in the middle somewhere and depends on the use
> case. For the example that Matt gave, I would argue that partitioning is
> the best approach since you already know what to partition by (the root
> nodes) and it sounds like this would result in pretty clean partitions. How
> that is done exactly would be up to the vendor implementation (I assume).
> Titan, for instance, allows you to install a partitioning strategy to do
> such things.
> But then, the question is how that partitioning strategy could be extended
> to the OLAP world. That's a question we don't have an answer to yet. Even
> if the data is perfectly partitioning in Titan, how can we tell Spark - for
> instance - how to keep the data in this shape so that we can reuse the TP3
> OLAP Spark engine?
>
> On Fri, Apr 24, 2015 at 10:04 AM Marko Rodriguez <ok...@gmail.com>
> wrote:
>
> > Hi,
> >
> > > It sounds like what I want is for my OLAP partitioning algorithm
> (which I
> > > need to research) to be aware of this structure so as to ensure that
> > these
> > > subgraphs (trees) attached to each A vertex are always colocated with
> > their
> > > A vertex.  How is partitioning specified for a given VertexProgram?
> >
> > This is currently handled by the underlying vendor. For HadoopGraph, you
> > have key/value partitions for your HDFS data file. For
> SparkGraphComputer,
> > you have a JavaPairRDD partitioner. For GiraphGraphComputer, you have a
> > vertex partitioner. For distributed databases like Titan, there is a
> vertex
> > partitioner as well.
> >
> > HTH,
> > Marko.
> >
> > http://markorodriguez.com
>

Re: An interesting comment about "computational loci" from TinkerPop JIRA -- Matt Frantz.

Posted by Matthias Broecheler <me...@matthiasb.com>.
This is an interesting comment as it goes back to the fundamental tradeoff
between denormalization, storage overhead and partitioning. Denormalization
colocates the data but at the cost of data redundancy. Given their
combinatorial nature, such redundancies quickly explode in graphs. A
normalized graph has the lowest storage footprint but comes at the cost of
having to do lots of "hops" - either in memory or across machine boundaries
- which are expensive. Finally, graph partitioning attempts to keep
frequently accessed data together while keeping it normalized but comes at
the expense of having to compute "near optimal" partitions which is a very
hard problem.

So, the answer probably lies somewhere in the middle and depends on the use
case. For the example that Matt gave, I would argue that partitioning is
the best approach since you already know what to partition by (the root
nodes) and it sounds like this would result in pretty clean partitions. How
that is done exactly would be up to the vendor implementation (I assume).
Titan, for instance, allows you to install a partitioning strategy to do
such things.
But then, the question is how that partitioning strategy could be extended
to the OLAP world. That's a question we don't have an answer to yet. Even
if the data is perfectly partitioned in Titan, how can we tell Spark - for
instance - how to keep the data in this shape so that we can reuse the TP3
OLAP Spark engine?
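
One possible direction, offered only as a sketch and not as an existing TP3 feature: if the vertex RDD were keyed by the id of the root each vertex belongs to, a custom Spark Partitioner could keep every tree in one partition. The key-by-root step is the assumption here.

    import org.apache.spark.Partitioner;

    // Sketch of a custom Spark partitioner that hashes on the root-vertex id so a
    // whole tree lands in the same partition. Attaching the root id as the RDD key
    // is assumed to happen when the graph is loaded.
    public class RootPartitioner extends Partitioner {
        private final int partitions;

        public RootPartitioner(int partitions) {
            this.partitions = partitions;
        }

        @Override
        public int numPartitions() {
            return partitions;
        }

        @Override
        public int getPartition(Object rootId) {
            return Math.floorMod(rootId.hashCode(), partitions);
        }
    }

    // Usage sketch, on a JavaPairRDD keyed by root id:
    //   vertices.partitionBy(new RootPartitioner(64));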

On Fri, Apr 24, 2015 at 10:04 AM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
> > It sounds like what I want is for my OLAP partitioning algorithm (which I
> > need to research) to be aware of this structure so as to ensure that
> these
> > subgraphs (trees) attached to each A vertex are always colocated with
> their
> > A vertex.  How is partitioning specified for a given VertexProgram?
>
> This is currently handled by the underlying vendor. For HadoopGraph, you
> have key/value partitions for your HDFS data file. For SparkGraphComputer,
> you have a JavaPairRDD partitioner. For GiraphGraphComputer, you have a
> vertex partitioner. For distributed databases like Titan, there is a vertex
> partitioner as well.
>
> HTH,
> Marko.
>
> http://markorodriguez.com

Re: An interesting comment about "computational loci" from TinkerPop JIRA -- Matt Frantz.

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi,

> It sounds like what I want is for my OLAP partitioning algorithm (which I
> need to research) to be aware of this structure so as to ensure that these
> subgraphs (trees) attached to each A vertex are always colocated with their
> A vertex.  How is partitioning specified for a given VertexProgram?

This is currently handled by the underlying vendor. For HadoopGraph, you have key/value partitions for your HDFS data file. For SparkGraphComputer, you have a JavaPairRDD partitioner. For GiraphGraphComputer, you have a vertex partitioner. For distributed databases like Titan, there is a vertex partitioner as well.

HTH,
Marko.

http://markorodriguez.com

Re: An interesting comment about "computational loci" from TinkerPop JIRA -- Matt Frantz.

Posted by Matt Frantz <ma...@gmail.com>.
First, a few relevant features of my particular use case:

* My schema contains different types (T.label) of vertices such that at a
particular vertex of type "A", there is a subgraph that is a tree whose
root is attached to "A", and which consists of other types of vertices (B,
C, D...).
* Certain algorithms in our domain can partition naturally on the A
vertices.  That is, my OLAP query might start {{g.V().has(T.label, "A")}},
and the A vertices are my computational loci.
* The computation across the subgraph for a given A vertex is
computationally intensive and will probably dominate performance, although I
have not benchmarked it, so I could be wrong and be suffering from
premature optimization toxicity.

With such a schema, the "radius" of the VertexProgram is actually defined
as a traversal that follows specific edges from the loci vertices.  The
"radius" concept is more general than the integer that I used in my
example.  It is more like a subgraph predicate, e.g. __.out("B").out("C")...
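
A minimal sketch of that subgraph-predicate style of "radius," using the labels from the example above; the empty TinkerGraph is just a stand-in for a real graph:

    import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.out;
    import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.union;

    import java.util.List;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.structure.T;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    // Sketch: from each A locus, gather its tree-shaped neighborhood. The labels
    // A, B, C come from the example schema above; the graph here is empty.
    public class LocusRadiusSketch {
        public static void main(String[] args) {
            GraphTraversalSource g = TinkerGraph.open().traversal();
            List<?> neighborhoods = g.V().has(T.label, "A")
                    .local(union(out("B"), out("B").out("C")).fold())
                    .toList();
            System.out.println(neighborhoods);
        }
    }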

I had considered denormalizing my graph so that the trees were stored in
properties of A vertices.  That would mean that I couldn't use Gremlin to
compute on them, but it would colocate the data.  Or, if I want to continue
to use Gremlin, I could "unpack" them into an in-memory graph.
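
A sketch of that "unpack" idea: rebuild the tree as a throwaway TinkerGraph and run Gremlin over it locally. The stored format assumed here (parent/child id pairs on the A vertex) is purely hypothetical.

    import org.apache.tinkerpop.gremlin.structure.Graph;
    import org.apache.tinkerpop.gremlin.structure.T;
    import org.apache.tinkerpop.gremlin.structure.Vertex;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

    // Sketch: rebuild a tree that was denormalized onto an A vertex as an
    // in-memory TinkerGraph. The long[][] of parent/child ids is a hypothetical
    // storage format, not anything TinkerPop or Titan prescribes.
    public class TreeUnpacker {
        public static Graph unpack(long[][] parentChildEdges) {
            TinkerGraph local = TinkerGraph.open();
            for (long[] edge : parentChildEdges) {
                Vertex parent = getOrCreate(local, edge[0]);
                Vertex child = getOrCreate(local, edge[1]);
                parent.addEdge("child", child);
            }
            return local;
        }

        private static Vertex getOrCreate(TinkerGraph graph, long id) {
            return graph.traversal().V(id).tryNext()
                    .orElseGet(() -> graph.addVertex(T.id, id));
        }
    }

    // Usage sketch: Graph tree = TreeUnpacker.unpack(edges); then tree.traversal().V()...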

It sounds like what I want is for my OLAP partitioning algorithm (which I
need to research) to be aware of this structure so as to ensure that these
subgraphs (trees) attached to each A vertex are always colocated with their
A vertex.  How is partitioning specified for a given VertexProgram?

On Fri, Apr 24, 2015 at 7:23 AM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi,
>
> The following selectList ticket is long, but last night Matt Frantz has a
> side comment that I think is worth discussing on dev@.
>
>
> https://issues.apache.org/jira/browse/TINKERPOP3-639?focusedCommentId=14510009&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14510009
>
> I put it in this Gist as in my browser, the above link doesn't really
> focus on the comment (its "lower" than where the browser takes me).
>
>         https://gist.github.com/okram/1080927667d0a515afd3
>
> What you propose here is a sort of denormalization of the graph. That is,
> replicate data so you reduce the number of "joins" (message passes) you
> have to do. While you will "significantly reduce the OLAP message
> bandwidth" what you have done in essence is greatly increase the amount of
> data that each machine node in the cluster must handle. Assume that your
> radius is 2, then that means each compute vertex has a representation of
> itself and all its adjacent vertices (the "sphere of influence" of the
> compute vertex). Given that the number of edges is typically 10-100x the
> number of vertices in the graph (for natural graphs), that means that
> amount of data you are duplicating is (10-100x)^2 for a radius of 2. For a
> radius of 3, its (10-100x)^3. So forth and so on...
>
> The goal you are trying to achieve is to "significantly reduce the OLAP
> message bandwidth." I concur this is a worthwhile goal. However, if the
> message pass occurs within the same machine because the two communicating
> vertices are physically co-located, then your messages 1) don't get
> serialized, 2) don't get written over the wire, and 3) don't get
> deserialized. Thus, the goal you are trying to achieve seems best solved by
> ensuring you have a good partitioning algorithm for your graph. That is,
> for the particular traversal you are doing, you want to make sure that the
> current compute vertex is sending messages to adjacents that are physically
> co-located with the current vertex -- measured as some local vs. remote
> "hit rate" (i.e. where the "cache" is the local machine).
>
> Next, you note a particular restriction on your algorithm: "the local
> subgraph may not contain any other loci." While that maintains the radius
> of 1 data size, I wonder a few things:
>
>         1. How do you determine whether the local subgraphs overlap in a
> distributed fashion that is cheap?
>         2. Only certain vertices will have the richness of deep subgraphs,
> while others will have to message pass immediately. Thus, you are back to
> same "bandwidth" problem.
>
> Thanks for the thoughts and the conversation,
> Marko.
>
> http://markorodriguez.com
>
>