Posted to dev@tinkerpop.apache.org by Marko Rodriguez <ok...@gmail.com> on 2015/08/31 20:06:22 UTC

Diffusion Algorithms with Gremlin

Hello,

In TinkerPop 3.0.0, at the last minute, I got rid of the sack() merge operator because I was lost in how we would merge traverser sacks. Why did I get gun shy? Watch:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.withSack(1.0).V(1).outE().inV().sack()
==>1.0
==>1.0
==>1.0

So v[1] has 3 outgoing edges, so it generates three traversers, and the sack of the parent traverser is simply copied to each child. What if you want to preserve a total sack value across the traversal? That is, there can never be more than 1.0 "energy" in the graph. Well, if your edges have weights (let's say) and the outgoing weights at each vertex always sum to 1.0, that would work. *** NOTE that the TinkerGraph modern graph is not 1.0 normalized. ***

gremlin> g.withSack(1.0d).V(1).outE().sack(mult).by('weight').inV().sack()
==>0.4
==>0.5
==>1.0

Hmm... but the three sacks above sum to 1.9, not 1.0 -- if the weights are not 1.0 normalized, that sucks. And it's hard to keep such data consistent in a large graph that is constantly adding and removing edges... And what if you don't have weights at all!?
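
For contrast, here is a minimal sketch on a tiny hand-built graph whose outgoing weights do sum to 1.0 -- the vertices, edge labels and weights are made up purely to illustrate the point:

gremlin> graph = TinkerGraph.open()
gremlin> a = graph.addVertex(T.id, 1); b = graph.addVertex(T.id, 2); c = graph.addVertex(T.id, 3)
gremlin> a.addEdge('link', b, 'weight', 0.25d); a.addEdge('link', c, 'weight', 0.75d)
gremlin> h = graph.traversal()
gremlin> h.withSack(1.0d).V(1).outE().sack(mult).by('weight').inV().sack()
==>0.25
==>0.75

The two sacks sum to 1.0, so the total "energy" is conserved across the split.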

For the last week I was going down this rabbit hole of "split operators" for the withSack() source method. It was all nasty and convoluted. However, last night and into this morning, I think we can solve this simply with a "barrier consumer." Watch.

gremlin> g.withSack(1.0).V(1).local(outE().barrier(normalizeSack)).inV().sack()
==>0.3333333333333333
==>0.3333333333333333
==>0.3333333333333333

Cool. What if you want it normalized by edge weights?

gremlin> g.withSack(1.0).V(1).local(outE().sack(mult).by('weight').barrier(normalizeSack)).inV().sack()
==>0.2105263157894737
==>0.2631578947368421
==>0.5263157894736842

Notice how local(barrier()) gathers all the traversers generated by outE() for the current object and then lets you mutate them as a group. normalizeSack simply recomputes each sack as its fraction of the aggregate -- in the weighted example above, the sacks become 0.4/1.9, 0.5/1.9 and 1.0/1.9.
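
For the curious, normalizeSack here is nothing more than a Consumer<TraverserSet<?>> handed to the barrier. A minimal Groovy sketch of such a consumer, assuming numeric sacks and ignoring traverser bulk for simplicity (the name and the eventual built-in are still up in the air), could be:

normalizeSack = { set ->
    def total = set.collect { it.sack() }.sum()     // aggregate the sacks of all gathered traversers
    set.each { it.sack(it.sack() / total) }         // rescale each sack to its share of the total
} as java.util.function.Consumer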

With this, the merge operator can simply be: g.withSack(1.0, sum) -- e.g. two traversers with sacks 0.4 and 0.6 that collapse at a barrier become one traverser with sack 1.0. And then, we can support furcating and converging "energy" in the graph.

This will lead into some very very trippy (theoretical) work I've been doing with constructive and destructive wave interference models with Gremlin. We will be able to support "optical algorithms" -- refraction, diffusion, interference, etc.

Any thoughts/concerns/recommendations?

Marko.

http://markorodriguez.com


Re: Diffusion Algorithms with Gremlin

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello Matt (others),

Perhaps you have some thoughts on this issue:
	https://issues.apache.org/jira/browse/TINKERPOP3-825

Finally, here is the new sack-merge operator in action. As you can see, we can now do energy diffusions. If you make your sack value a function of sin/cos, then you can simulate wave dynamics and all the neat things that come with that (e.g. quantum probabilities due to superposition of traverser state).

// starting at marko, the traverser has a sack of 1.0.
gremlin> g.withSack(1.0d,sum).V(1).sack()
==>1.0

// taking the outgoing knows-edges from marko, the sacks of the split traversers are normalized.
gremlin> g.withSack(1.0d,sum).V(1).local(out('knows').barrier(normSack)).sack()
==>0.5
==>0.5

// because the merge operator is sum, sacks collapse via sum
gremlin> g.withSack(1.0d,sum).V(1).local(out('knows').barrier(normSack)).in('knows').barrier().sack()
==>1.0
==>1.0

Why two 1.0s?!?! The two incoming traversers land on the same vertex (marko), so the barrier merges them into a single traverser with a summed sack of 1.0 but a bulk of 2, and that bulk-2 traverser gets reported twice. That is what the ticket above is all about. The "right" solution is:

gremlin> g.withSack(1.0d,sum).V(1).local(out('knows').barrier(normSack)).in('knows').barrier().sideEffect{it.setBulk(1)}.sack()
==>1.0
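
To make the sin/cos comment above concrete, here is a heavily hedged sketch: carry the sack as a two-element [re, im] amplitude and merge by component-wise addition, which is where constructive and destructive interference would come from. The names, the list-valued sack, and the closure coercion are assumptions about the new_merge/ work, not settled API:

// amplitudes add when traversers collapse at a barrier (hypothetical helper)
amplitudeSum = { a, b -> [a[0] + b[0], a[1] + b[1]] } as java.util.function.BinaryOperator
g.withSack([1.0d, 0.0d], amplitudeSum)   // each traversal step would then rotate the [re, im] pair by some phase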

Please provide any feedback you may have,
Marko.

http://markorodriguez.com


Re: Diffusion Algorithms with Gremlin

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello,

> You say "the merge operator can easily just be: g.withSack(1.0, sum)", but
> that is the syntax for the split operator, so do you need a third parameter
> there?

There are multiple overloads. Merge is a BiFunction and split is a UnaryOperator -- see the new_merge/ branch.
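
Roughly, the overloads line up like this (a hedged sketch against the new_merge/ branch -- final signatures may differ):

g.withSack(1.0d, { it / 2 } as java.util.function.UnaryOperator)    // split only: each child gets half the parent's sack
g.withSack(1.0d, sum)                                               // merge only: colliding traversers add their sacks
g.withSack(1.0d, { it } as java.util.function.UnaryOperator, sum)   // split and merge together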

> Also, is normalizeSack a user-defined traversal?  In this case, it would be
> doing math on the floating point sack values?

No, it's a Consumer<TraverserSet<?>> used by CollectingBarrierStep.

> One of the interesting (and relevant for my application) possibilities is
> to direct the priority of the traverser execution based on what you might
> call "propagation energy".  That is, with each split or merge (or based on
> other "amplification" criteria), the traversers lose momentum, and the
> limited compute resources are applied to the traversers with the highest
> momentum.  I suppose this is a vendor-specific feature, but would there be
> a story for directing the priority of execution of the set of traversers?

You can always filter on the sack, e.g. filter{it.sack() > 0.01}.
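
As a concrete (hedged) illustration on the earlier weighted example -- the 0.25 cutoff is arbitrary -- low-energy traversers can be dropped like so:

gremlin> g.withSack(1.0d).V(1).local(outE().sack(mult).by('weight').barrier(normalizeSack)).inV().filter{it.sack() > 0.25}.sack()
==>0.2631578947368421
==>0.5263157894736842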

HTH,
Marko.

http://markorodriguez.com


Re: Diffusion Algorithms with Gremlin

Posted by Matt Frantz <ma...@gmail.com>.
Can you clarify a couple of details of this proposal?

You say "the merge operator can easily just be: g.withSack(1.0, sum)", but
that is the syntax for the split operator, so do you need a third parameter
there?

Also, is normalizeSack a user-defined traversal?  In this case, it would be
doing math on the floating point sack values?

One of the interesting (and relevant for my application) possibilities is
to direct the priority of the traverser execution based on what you might
call "propagation energy".  That is, with each split or merge (or based on
other "amplification" criteria), the traversers lose momentum, and the
limited compute resources are applied to the traversers with the highest
momentum.  I suppose this is a vendor-specific feature, but would there be
a story for directing the priority of execution of the set of traversers?
