You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Dennis Gove (JIRA)" <ji...@apache.org> on 2015/12/29 13:10:49 UTC

[jira] [Comment Edited] (SOLR-8176) Model distributed graph traversals with Streaming Expressions

    [ https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073843#comment-15073843 ] 

Dennis Gove edited comment on SOLR-8176 at 12/29/15 12:10 PM:
--------------------------------------------------------------

I've been thinking about this a little bit and one thing I keep coming back to is that there are different kinds of graph traversals and I think our model should take that into account. There are lots of types but I think the two major categories are node traversing graphs and edge traversing graphs. 

h3. Node Traversing Graphs
These are graphs where you have some set of root nodes and you want to find connected nodes with some set of criteria. For example, given a collection of geographic locations (city, county, state, country) with fields "id", "type", "parentId", "name" find all cities in NY. As a hiccup the data is not completely normalized and some cities have their county listed as their parent while some have their state listed as their parent. Ie, you do not know how many nodes are between any given city and any given state.
{code}
graph(
  geography,
  root(q="type=state AND name:ny", fl="id"),
  leaf(q="type=city", fl="id,parentId,name"),
  edge("id=parentId")
)
{code}
In this example you're starting with a set of nodes in the geography collection, all which have some relationship to each other. You select your starting (root) nodes as all states named "ny" (there could be more than one). You then define what constitutes an ending (leaf) node as all cities. And finally, you say that all edges where nodeA.id == nodeB.parentId should be followed.

This traversal can be implemented as a relatively simple iterative search following the form
{code}
frontier := search for all root nodes
leaves := empty list

while frontier is not empty
  frontierIds := list of ids of all nodes in frontier list
  leaves :append: search for all nodes whose parentId is in frontierIds and matches the leaf filter
  frontier := search for all nodes whose parentId is in frontierIds and does not match the leaf filter

{code}
In each iteration the leaves list can grow and the frontier list is replaced with the next set of nodes to consider. In the end you have a list of all leaf nodes which in some way connect to the original root nodes following the defined edge. Note that for simplicity I've left a couple of things out, including checking for already traversed nodes to avoid loops. Also, the leaf nodes are not added to the frontier but they can be. This would be useful in a situation where leaves are connected to leaves.

h3. Edge Traversal Graphs
These are graphs where you have some set of edges but the nodes themselves are relatively unimportant for traversal. For example, finding the shortest path between two nodes, or finding the minimum spanning tree for some set of nodes, or finding loops.


was (Author: dpgove):
I've been thinking about this a little bit and one thing I keep coming back to is that there are different kinds of graph traversals and I think our model should take that into account. There are lots of types but I think the two major categories are node traversing graphs and edge traversing graphs. 

h3. Node Traversing Graphs
These are graphs where you have some set of root nodes and you want to find connected nodes with some set of criteria. For example, given a collection of geographic locations (city, county, state, country) with fields "id", "type", "parentId", "name" find all cities in NY. As a hiccup the data is not completely normalized and some cities have their county listed as their parent while some have their state listed as their parent. Ie, you do not know how many nodes are between any given city and any given state.
{code}
graph(
  geography,
  root(q="type=state AND name:ny", fl="id"),
  leaf(q="type=city", fl="id,parentId,name"),
  edge("id=parentId")
)
{code}
In this example you're starting with a set of nodes in the geography collection, all which have some relationship to each other. You select your starting (root) nodes as all states named "ny" (there could be more than one). You then define what constitutes an ending (leaf) node as all cities. And finally, you say that all edges where nodeA.id == nodeB.parentId should be followed.

This traversal can be implemented as a relatively simple iterative search following the form
{code}
frontier := search for all root nodes
leaves := empty list

while frontier is not empty
  frontierIds := list of ids of all nodes in frontier list
  leaves :append: search for all nodes whose parentId is in frontierIds and matches the leaf filter
  frontier := search for all nodes whose parentId is in frontierIds and does not match the leaf filter

{code}
In each iteration the leaves list can grow and the frontier list is replaced with the next set of nodes to consider. In the end you have a list of all leaf nodes which in some way connect to the original root nodes following the defined edge. Note that for simplicity I've left a couple of things out, including checking for already traversed nodes to avoid loops. Also, the leaf nodes are not added to the frontier but they can be. This would be useful in a situation where leaves are connected to leaves.

> Model distributed graph traversals with Streaming Expressions
> -------------------------------------------------------------
>
>                 Key: SOLR-8176
>                 URL: https://issues.apache.org/jira/browse/SOLR-8176
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, SolrCloud, SolrJ
>    Affects Versions: Trunk
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>              Labels: Graph
>             Fix For: Trunk
>
>
> I think it would be useful to model a few *distributed graph traversal* use cases with Solr's *Streaming Expression* language. This ticket will explore different approaches with a goal of implementing two or three common graph traversal use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org