Posted to solr-user@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2016/10/25 21:57:38 UTC

Graph Traversal Question

Hi,

I'm playing around with the new Graph Traversal/GatherNodes capabilities in
Solr 6.  I've been indexing Yago facts (
http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/)
which give me triples of something like subject-relationship-object (United
States -> hasCapital -> Washington DC)

My documents look like:
subject: string
relationship: string
object: string

I can do a simple gatherNodes like
http://localhost:8983/solr/default/graph?expr=gatherNodes(default,
walk="United_States->subject", gather="object") and get back the objects
that relate to the subject.  However, I don't see any way to capture what
the relationship is in the response.  IOW, the request above would just
return a node of "Washington DC", but it doesn't tell me the relationship
(i.e. I'd like to get Wash DC and hasCapital back somehow).  Is there
any way to expand the "gather" or otherwise mark up the nodes returned with
additional field attributes or maybe get additional graph info back?
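
For anyone wanting to reproduce the request above, here is a minimal sketch of building it with the expression URL-encoded. The host, port, collection name ("default"), and field names are the ones from this message; the helper names are made up for illustration and are not part of Solr.

```python
from urllib.parse import urlencode

# Assumed from the message above: a local Solr on port 8983 and a collection
# named "default" with string fields subject / relationship / object.
SOLR_GRAPH_URL = "http://localhost:8983/solr/default/graph"


def gather_nodes_expr(collection, root, walk_field, gather_field):
    """Build a gatherNodes streaming expression as a plain string.

    Hypothetical helper: it only formats text, it does not talk to Solr.
    """
    return (
        f"gatherNodes({collection}, "
        f'walk="{root}->{walk_field}", '
        f'gather="{gather_field}")'
    )


def graph_request_url(expr, base_url=SOLR_GRAPH_URL):
    """URL-encode the expression so the quotes and '>' survive the query string."""
    return base_url + "?" + urlencode({"expr": expr})


expr = gather_nodes_expr("default", "United_States", "subject", "object")
url = graph_request_url(expr)
print(expr)
print(url)
```

The encoding step matters mostly when the expression is pasted into a browser or passed on a command line, where unescaped quotes and spaces tend to get mangled.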

Thanks,
Grant

Re: Graph Traversal Question

Posted by Grant Ingersoll <gs...@apache.org>.
On Wed, Oct 26, 2016 at 10:46 AM Joel Bernstein <jo...@gmail.com> wrote:

> Grant, can you describe your use case? Currently we can filter on the
> relationship using a filter query. So I was wondering what use case would
> involve retrieving the relationship. Are you looking to discover what
> relationships are available? One of the assumptions I made was that users
> would know what relationships they wanted to traverse.
>
>
Some of this is admittedly a thought experiment of what's possible, but I
think when dealing w/ graph operations it's pretty natural to use edge
attributes as part of your calculation.  The most obvious use case of that
is a weighted graph where the edge attribute is a numerical weight (e.g. in
Yonik's example: sort/rank by rating).  For me, I'm exploring how to use KB
data (Yago, which is basically RDF triples) as part of relevance and to
answer questions.  These are commonly done in a triple store (RDF engine),
but w/ this graph stuff in Solr, I think it could be possible to do in Solr
(and quite simply at that), which significantly simplifies the overall
system.



Re: Graph Traversal Question

Posted by Joel Bernstein <jo...@gmail.com>.
Grant, can you describe your use case? Currently we can filter on the
relationship using a filter query. So I was wondering what use case would
involve retrieving the relationship. Are you looking to discover what
relationships are available? One of the assumptions I made was that users
would know what relationships they wanted to traverse.



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Oct 26, 2016 at 9:39 AM, Grant Ingersoll <gs...@apache.org>
wrote:

> The other way to think about it is: I want to put labels on the edges.  In my
> case, the label is the relationship, in your case, the label is the rating
> or author.
>
> On Wed, Oct 26, 2016 at 7:26 AM Yonik Seeley <ys...@gmail.com> wrote:
>
> > On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll <gs...@apache.org>
> > wrote:
> > > On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley <ys...@gmail.com>
> wrote:
> > >
> > > In your example below it would be akin to injecting the rating onto
> those
> > > responses as well, not just in the 'fq'.
> >
> > Gotcha... Yeah, I remember wondering how to do that myself.
> >
> > -Yonik
> >
>

Re: Graph Traversal Question

Posted by Grant Ingersoll <gs...@apache.org>.
The other way to think about it is: I want to put labels on the edges.  In my
case, the label is the relationship, in your case, the label is the rating
or author.

On Wed, Oct 26, 2016 at 7:26 AM Yonik Seeley <ys...@gmail.com> wrote:

> On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll <gs...@apache.org>
> wrote:
> > On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley <ys...@gmail.com> wrote:
> >
> > In your example below it would be akin to injecting the rating onto those
> > responses as well, not just in the 'fq'.
>
> Gotcha... Yeah, I remember wondering how to do that myself.
>
> -Yonik
>

Re: Graph Traversal Question

Posted by Yonik Seeley <ys...@gmail.com>.
On Wed, Oct 26, 2016 at 7:13 AM, Grant Ingersoll <gs...@apache.org> wrote:
> On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley <ys...@gmail.com> wrote:
>
> In your example below it would be akin to injecting the rating onto those
> responses as well, not just in the 'fq'.

Gotcha... Yeah, I remember wondering how to do that myself.

-Yonik

Re: Graph Traversal Question

Posted by Grant Ingersoll <gs...@apache.org>.
On Tue, Oct 25, 2016 at 6:26 PM Yonik Seeley <ys...@gmail.com> wrote:

> You can get the nodes that you came from by adding trackTraversal=true
>

Yeah, I've tried that.  It's not quite what I want.  That just gets me the
"subject".

What I'm trying to do is more akin to what a triple store does.

I _can_ do things like filter on the relationship, which is a good start,
but I want the relationship and the object together so that I can do
downstream work on it.

In your example below it would be akin to injecting the rating onto those
responses as well, not just in the 'fq'.


>
> A cut'n'paste example from my Lucene/Solr Revolution slides:
>
> curl $URL -d 'expr=gatherNodes(reviews,
>    search(reviews, q="user_s:Yonik AND rating_i:5",
>           fl="book_s,user_s,rating_i",sort="user_s asc"),
>    walk="book_s->book_s",
>    gather="user_s",
>    fq="rating_i:[4 TO *] -user_s:Yonik",
>    trackTraversal=true )'
>
> {"result-set":{"docs":[
>
> {"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"level":1},
>
> {"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"level":1},
> {"EOF":true,"RESPONSE_TIME":22}]}}
>
> -Yonik

Re: Graph Traversal Question

Posted by Grant Ingersoll <gs...@apache.org>.
On Tue, Oct 25, 2016 at 6:46 PM Joel Bernstein <jo...@gmail.com> wrote:

> Because the edges are unique on the subject->object there isn't currently a
> way to capture the relationship. Aggregations can be rolled up on numeric
> fields and as Yonik mentioned you can track the ancestor.
>
> It would be fairly easy to track the relationship by adding a relationship
> array that would correspond with the ancestors array for example:
>
> {"result-set":{"docs":[
> {"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"relationships":["author"],"level":1},
> {"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"relationships":["author"],"level":1},
> {"EOF":true,"RESPONSE_TIME":22}]}}
>

Right, that is what I am after!



Re: Graph Traversal Question

Posted by Joel Bernstein <jo...@gmail.com>.
Because the edges are unique on the subject->object, there isn't currently a
way to capture the relationship. Aggregations can be rolled up on numeric
fields and, as Yonik mentioned, you can track the ancestor.

It would be fairly easy to track the relationship by adding a relationships
array that would correspond with the ancestors array, for example:

{"result-set":{"docs":[
{"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"relationships":["author"],"level":1},
{"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"relationships":["author"],"level":1},
{"EOF":true,"RESPONSE_TIME":22}]}}
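
If the response did take that shape, a client could rebuild labelled edges by pairing the two arrays position by position. This is only a sketch under that assumption: the "relationships" field is the proposal above, not something gatherNodes is known to return, and the function name is hypothetical.

```python
import json

# Sample text in the PROPOSED format from this message. The "relationships"
# array is an assumption; it is not part of the current gatherNodes output.
proposed_response = """{"result-set":{"docs":[
{"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"relationships":["author"],"level":1},
{"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"relationships":["author"],"level":1},
{"EOF":true,"RESPONSE_TIME":22}]}}"""


def labelled_edges(response_text):
    """Return (ancestor, relationship, node) triples from a proposed-format response."""
    docs = json.loads(response_text)["result-set"]["docs"]
    edges = []
    for doc in docs:
        if doc.get("EOF"):
            # The final tuple only carries EOF and timing info, not a node.
            continue
        # Pair each ancestor with the relationship at the same index.
        pairs = zip(doc.get("ancestors", []), doc.get("relationships", []))
        for ancestor, relationship in pairs:
            edges.append((ancestor, relationship, doc["node"]))
    return edges


for edge in labelled_edges(proposed_response):
    print(edge)
```

Keeping the two arrays index-aligned means the client needs no extra lookup to know which label belongs to which ancestor.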

Joel Bernstein
http://joelsolr.blogspot.com/


Re: Graph Traversal Question

Posted by Yonik Seeley <ys...@gmail.com>.
You can get the nodes that you came from by adding trackTraversal=true

A cut'n'paste example from my Lucene/Solr Revolution slides:

curl $URL -d 'expr=gatherNodes(reviews,
   search(reviews, q="user_s:Yonik AND rating_i:5",
          fl="book_s,user_s,rating_i",sort="user_s asc"),
   walk="book_s->book_s",
   gather="user_s",
   fq="rating_i:[4 TO *] -user_s:Yonik",
   trackTraversal=true )'

{"result-set":{"docs":[
{"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"level":1},
{"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"level":1},
{"EOF":true,"RESPONSE_TIME":22}]}}
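
On the client side, reading that output is mostly a matter of skipping the trailing EOF tuple. A minimal sketch, using the response text above verbatim; the function name is made up for illustration.

```python
import json

# Response text copied from the trackTraversal=true example above.
traversal_response = """{"result-set":{"docs":[
{"node":"Haruka","collection":"reviews","field":"user_s","ancestors":["book1"],"level":1},
{"node":"Maria","collection":"reviews","field":"user_s","ancestors":["book2"],"level":1},
{"EOF":true,"RESPONSE_TIME":22}]}}"""


def nodes_with_ancestors(response_text):
    """Map each gathered node to the list of nodes it was reached from."""
    docs = json.loads(response_text)["result-set"]["docs"]
    return {
        doc["node"]: doc.get("ancestors", [])
        for doc in docs
        if not doc.get("EOF")  # the last tuple is an end-of-stream marker
    }


print(nodes_with_ancestors(traversal_response))
```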

-Yonik

