Posted to dev@lucene.apache.org by "Dennis Gove (JIRA)" <ji...@apache.org> on 2016/04/08 21:39:25 UTC
[jira] [Commented] (SOLR-8925) Add gatherNodes Streaming Expression to support breadth first traversals
[ https://issues.apache.org/jira/browse/SOLR-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232782#comment-15232782 ]
Dennis Gove commented on SOLR-8925:
-----------------------------------
I like this. Just a couple of questions.
1. What does this do with duplicate nodes, i.e., overlapping friend networks? Will it prune those out, show the node twice, or mark a node as having multiple sources?
2. When using the scatter parameter, will the nodes be marked with which group they fall into? What if a node falls into multiple groups (kinda related to #1 above)?
3. Will a node include information about its source, i.e., why it's included in a graph?
4. If gatherNodes is doing a 'join' between friends and articles, I'd expect the tuple to be a join of the tuple found in articles and the tuple found in friends. But if "The inner gatherNodes() expression then emits the friend Tuples", I believe this is more of an intersect, i.e., give me tuples in friends which also appear in articles, using the author->user equalitor.
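To make questions 1 and 3 concrete, here is a rough sketch in plain Python. It is not Solr code and makes no claim about how gatherNodes is or will be implemented. It shows one possible answer: a node reached from several places is emitted once and carries the list of nodes it was reached from. The edge list FRIENDS, the function gather_nodes, and the sample users are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the friends collection: (user, friend) pairs.
FRIENDS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("dave", "carol"),
    ("bob", "erin"),
    ("carol", "erin"),
]


def gather_nodes(edges, roots):
    """One breadth-first hop: match roots against user, gather friend.

    Returns a dict mapping each gathered node to the sorted list of
    nodes it was reached from, so a duplicate shows up once with
    several sources rather than being emitted twice.
    """
    wanted = set(roots)
    sources = defaultdict(set)
    for user, friend in edges:
        if user in wanted:
            sources[friend].add(user)
    return {node: sorted(srcs) for node, srcs in sources.items()}


authors = ["alice", "dave"]                 # roots, as if returned by search()
branches = gather_nodes(FRIENDS, authors)   # friends of the authors
leaves = gather_nodes(FRIENDS, branches)    # friends of friends
```

In this sketch carol is reached from both alice and dave, and erin from both bob and carol, which is exactly the overlapping-network case I'm asking about.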
> Add gatherNodes Streaming Expression to support breadth first traversals
> ------------------------------------------------------------------------
>
> Key: SOLR-8925
> URL: https://issues.apache.org/jira/browse/SOLR-8925
> Project: Solr
> Issue Type: New Feature
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Fix For: 6.1
>
>
> The gatherNodes Streaming Expression is a flexible general purpose breadth first graph traversal. It uses the same parallel join under the covers as (SOLR-8888) but is much more generalized and can be used for a wide range of use cases.
> Sample syntax:
> {code}
> gatherNodes(
>     friends,
>     gatherNodes(
>         friends,
>         search(articles, q="body:(query 1)", fl="author"),
>         walk="author->user",
>         gather="friend"),
>     walk="friend->user",
>     gather="friend",
>     scatter="roots, branches, leaves"
> )
> {code}
> The expression above is evaluated as follows:
> 1) The inner search() expression is evaluated on the *articles* collection, emitting a Stream of Tuples with the author field populated.
> 2) The inner gatherNodes() expression reads the Tuples from the search() stream and traverses to the *friends* collection by performing a distributed join between the articles.author and friends.user fields. It gathers the value from the *friend* field during the join.
> 3) The inner gatherNodes() expression then emits the *friend* Tuples. By default the gatherNodes function emits only the leaves, which in this case are the *friend* tuples.
> 4) The outer gatherNodes() expression reads the *friend* Tuples and traverses again in the *friends* collection, this time performing the join between the *friend* Tuples emitted in step 3 and the friends.user field. This collects the friends of friends.
> 5) The outer gatherNodes() expression emits the entire graph that was collected. This is controlled by the "scatter" parameter. In the example the *root* nodes are the authors, the *branches* are the authors' friends and the *leaves* are the friends of friends.
> This traversal is fully distributed and cross collection.
> Like all streaming expressions, the gatherNodes expression can be combined with other streaming expressions. For example, the following expression uses a hashInnerJoin to intersect the networks of friends rooted at authors found with different queries:
> {code}
> hashInnerJoin(
>     gatherNodes(friends,
>         gatherNodes(friends,
>             search(articles, q="body:(queryA)", fl="author"),
>             walk="author->user",
>             gather="friend"),
>         walk="friend->user",
>         gather="friend",
>         scatter="branches, leaves"),
>     gatherNodes(friends,
>         gatherNodes(friends,
>             search(articles, q="body:(queryB)", fl="author"),
>             walk="author->user",
>             gather="friend"),
>         walk="friend->user",
>         gather="friend",
>         scatter="branches, leaves"),
>     on="friend"
> )
> {code}
>
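For readers unfamiliar with the join in the second example, the following is a minimal sketch in plain Python of a hash inner join on a single field. It is an illustration only, not Solr's hashInnerJoin implementation. The function hash_inner_join and the fields via_a and via_b are hypothetical names introduced here.

```python
def hash_inner_join(left, right, on):
    """Inner join two lists of dict tuples on one field via a hash table.

    The right side is indexed by the join key, then the left side is
    streamed past the index. Only keys present on both sides are emitted.
    """
    index = {}
    for tup in right:
        index.setdefault(tup[on], []).append(tup)

    joined = []
    for tup in left:
        for match in index.get(tup[on], []):
            merged = dict(match)
            merged.update(tup)  # left-side values win on field name clashes
            joined.append(merged)
    return joined


# Hypothetical friend networks gathered for queryA and queryB.
network_a = [{"friend": "bob", "via_a": "alice"},
             {"friend": "erin", "via_a": "bob"}]
network_b = [{"friend": "erin", "via_b": "carol"},
             {"friend": "frank", "via_b": "dave"}]

common = hash_inner_join(network_a, network_b, on="friend")
```

Only erin appears in both networks, so the join emits a single merged tuple for her. Each output tuple combines fields from both sides, which is the "join" behaviour referred to in question 4, as opposed to an intersect that would keep only one side's fields.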