Posted to solr-user@lucene.apache.org by Pratik Patel <pr...@semandex.net> on 2018/06/20 17:54:40 UTC

Applying streaming expression as a filter in graph traversal expression (gatherNodes)

We can limit the scope of a graph traversal by applying a filter along the
way, as follows.

gatherNodes(emails,
            walk="johndoe@apache.org->from",
            fq="body:(solr rocks)",
            gather="to")


Is it possible to replace "body:(solr rocks)" with a streaming expression,
such as the "search" function? For example:

gatherNodes(emails,
            walk="johndoe@apache.org->from",
            fq="search(...)", // use streaming expression as filter
            gather="to")



In my case, it would improve performance significantly if this were possible.
Another approach I can think of is to save the results of the "search"
streaming expression in a variable in the pipeline and then use it in
multiple places, including the "fq" clause of "gatherNodes". Is it possible
to do something like this?
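
One client-side reading of "save the results in a variable" is to run the
search() expression once, keep the values of interest in the calling program,
and reuse them wherever they are needed. The sketch below assumes the usual
JSON shape returned by Solr's /stream handler (a "result-set" object whose
"docs" list ends with an EOF marker tuple). The field name "id" and the
sample response are hypothetical, and the field is assumed to be
single-valued.

```python
import json


def collect_values(stream_response_text, field):
    """Return the distinct values of `field` from a /stream JSON response.

    Values are returned in first-seen order. The trailing EOF marker tuple
    and tuples that lack the field are skipped.
    """
    docs = json.loads(stream_response_text)["result-set"]["docs"]
    seen = {}  # dict keys keep insertion order, so this de-duplicates in order
    for doc in docs:
        if doc.get("EOF"):
            continue
        value = doc.get(field)
        if value is not None:
            seen.setdefault(value, None)
    return list(seen)


# Hypothetical response text, shaped like the output of a search() expression.
sample = (
    '{"result-set": {"docs": ['
    '{"id": "m1"}, {"id": "m2"}, {"id": "m1"}, '
    '{"EOF": true, "RESPONSE_TIME": 5}]}}'
)
ids = collect_values(sample, "id")
```

The resulting list lives in the client, not inside the streaming pipeline, so
this is a substitute for a pipeline variable rather than an instance of one.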

Re: Applying streaming expression as a filter in graph traversal expression (gatherNodes)

Posted by Pratik Patel <pr...@semandex.net>.
Hi Joel,

Thanks for the reply!

I have indexed graph data in Solr where an "event" can have one or more
"participants". Thus, it's a graph of "participants" connected to each
other via "events". Because an event can have multiple participants, I am
indexing the graph as follows.


event------event_participant_child------participant

My end goal is this: I have a list of "events", and for that list I want
to plot a graph of "participants" by connecting them via events (which have
to be from the original list). I get this list of "events" from a search()
function, which I use as the seed expression for gatherNodes().

I am doing a two-hop graph traversal as follows.

having(
  having(
    gatherNodes(
      collection1,
      having(
        gatherNodes(
          collection1,
          // gets list of events with each node having "eventId"
          search(.....),
          // walk to event_participant_child document which has both
          // "eventId" and "participantId"
          walk=eventId->eventId,
          gather="participantId",
          trackTraversal="true", scatter="leaves",
          count(*)
        ),
        gt(count(*),0)
      ),
      walk=node->participantId,
      gather="eventId",
      // limit traversal to original list of events by using search() here??
      fq=(),
      trackTraversal="true", scatter="branches",
      count(*)
    ),
    eq(level,0)
  ),
  gt(count(*),1)
)

I am able to get the graph I want from the "ancestors" fields of the nodes
which are at level 0. Essentially, these are the events from my original
list. Using the having() function, I am able to limit the response so that
it only includes the original events. But it would be a great improvement if
I could also limit the traversal so that only events from the original list
are visited at the second hop. That is why I want to apply the original
search() function as a filter in the outer gatherNodes() function. I know
it's a long shot, but considering the potential improvement in performance,
I was curious. Please let me know if you feel there is a better approach.
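
Since gatherNodes accepts only a traditional filter query, one possible
workaround (not confirmed anywhere in this thread) is to fetch the original
event IDs first and pass them to the second hop as an ordinary fq that uses
Solr's terms query parser. The sketch below only builds the expression
string. The helper names are hypothetical, the inner expression is left as a
placeholder, and whether the local-params syntax passes through gatherNodes
unchanged on a given Solr version is an assumption to verify.

```python
def terms_filter(field, values):
    """Build a filter query string for Solr's terms query parser."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value to filter on")
    for value in values:
        # A comma is the parser's default separator and a double quote would
        # end the fq="..." parameter, so reject both rather than escape them.
        if "," in value or '"' in value:
            raise ValueError(f"unsupported character in value: {value!r}")
    return "{!terms f=" + field + "}" + ",".join(values)


def second_hop(inner_expr, event_ids):
    """Wrap inner_expr in the outer gatherNodes, limited to event_ids."""
    fq = terms_filter("eventId", event_ids)
    return (
        "gatherNodes(collection1,\n"
        f"            {inner_expr},\n"
        '            walk="node->participantId",\n'
        '            gather="eventId",\n'
        f'            fq="{fq}",\n'
        '            trackTraversal="true", scatter="branches",\n'
        "            count(*))"
    )


# "having(...)" stands in for the first-hop expression; e1 and e2 are
# made-up event IDs.
expr = second_hop("having(...)", ["e1", "e2"])
```

This costs an extra round trip to fetch the IDs, and a very long ID list
makes for a very large request, so it is probably only practical when the
original list of events is reasonably small.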


Thanks
- Pratik






On Thu, Jun 21, 2018 at 7:05 PM, Joel Bernstein <jo...@gmail.com> wrote:

> Currently the gatherNodes expression can only be filtered by a traditional
> filter query. I'm curious about the type of expression you are thinking of
> filtering by?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: Applying streaming expression as a filter in graph traversal expression (gatherNodes)

Posted by Joel Bernstein <jo...@gmail.com>.
Currently the gatherNodes expression can only be filtered by a traditional
filter query. I'm curious about the type of expression you are thinking of
filtering by?

Joel Bernstein
http://joelsolr.blogspot.com/
