Posted to solr-user@lucene.apache.org by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov> on 2016/04/19 22:48:28 UTC

Streaming with facets

So, can someone clarify how faceting works with streaming expressions?

I can see how document search can return documents as it finds them, using any particular ordering desired - just a parse tree of query operators with priority queues (or something more complicated) within each query operator, so you really get the best match as you go for as long as you continue.
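To make that intuition concrete, here is a minimal sketch of best-first merging with a priority queue. It is only an illustration, not how Solr or Lucene is implemented; the `best_first` helper and the sample hits are hypothetical, and each input is assumed to be already sorted by descending score.

```python
import heapq
from itertools import islice


def best_first(*operator_streams):
    """Lazily merge score-sorted hit streams, highest score first."""
    # heapq.merge holds only the head of each stream in its heap, so the
    # caller can stop consuming at any point without draining the inputs.
    return heapq.merge(*operator_streams, key=lambda hit: hit[0], reverse=True)


# Hypothetical (score, doc id) hits from two query operators.
title_hits = [(0.9, "doc3"), (0.4, "doc1")]
body_hits = [(0.7, "doc2"), (0.2, "doc4")]

# Take only the two best matches; the remaining hits are never merged.
top2 = list(islice(best_first(iter(title_hits), iter(body_hits)), 2))
print(top2)
```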

For facet values, without knowing Solr's internals, my intuition is that Solr could stream unique facet values, but not counts of matching documents.

Even when I put on my user hat, I don't see how the Streaming API can return both facet values and documents; it looks like the results are either documents or facet values.

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH


RE: Streaming with facets

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Thanks, Yonik, that makes great sense.

My understanding of "many parts of Solr can already stream" is that not all sets of SearchHandler parameters are equal.  One set may be best for classic sub-second web search; another may be best for returning only analytically computed facets over 10k rows or more.

I now understand StreamHandler and its relation to JDBC, and I am staying away from it for now because it doesn't fit my application.

-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com] 
Sent: Tuesday, April 19, 2016 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Streaming with facets

Part of the difficulty is that "stream" and "streaming" are rather overloaded terms.  Many parts of Solr can already stream, with varying degrees of how much state is aggregated / internally collected before "streaming" starts.

Faceting can be truly streamed *if* the sort order is by the bucket value ascending, since that is the order contained in the Lucene index.  All of the rest of the bucket information can be computed on the fly as it is being sent out.  This is what the JSON Facet API does when method="stream".

We could extend the current facet streaming for other sorts... this would involve calculating & sorting the sort criteria first, and then streaming after that point (i.e. other metrics would be calculated on-the-fly as facet buckets are being streamed).

-Yonik



Re: Streaming with facets

Posted by Yonik Seeley <ys...@gmail.com>.
Part of the difficulty is that "stream" and "streaming" are rather
overloaded terms.  Many parts of Solr can already stream, with varying
degrees of how much state is aggregated / internally collected before
"streaming" starts.

Faceting can be truly streamed *if* the sort order is by the bucket
value ascending, since that is the order contained in the Lucene
index.  All of the rest of the bucket information can be computed on
the fly as it is being sent out.  This is what the JSON Facet API does
when method="stream".
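As a rough illustration of the point above (not Solr's actual code), the sketch below walks a toy inverted index in term order and emits each bucket as soon as its count is known. The index layout, the `stream_facets` name, and the sample data are assumptions made up for this example.

```python
from typing import Dict, Iterator, Set, Tuple


def stream_facets(
    index: Dict[str, Set[int]], matching_docs: Set[int]
) -> Iterator[Tuple[str, int]]:
    """Yield (bucket value, count) pairs in ascending bucket-value order."""
    # A real term dictionary is already stored in sorted order; sorting the
    # keys here only emulates that for the toy dict.
    for term in sorted(index):
        # The count is computed on the fly, just before the bucket is sent.
        count = len(index[term] & matching_docs)
        if count > 0:
            yield term, count  # emitted immediately, no buckets are buffered


# Hypothetical toy index: term -> ids of the documents containing it.
toy_index = {
    "solr": {1, 2, 5},
    "lucene": {2, 3},
    "facet": {4},
}
matches = {1, 2, 3}  # ids of the documents matching the query

for bucket in stream_facets(toy_index, matches):
    print(bucket)
```

Because the output order is the same as the storage order, the generator never needs to see a later bucket before emitting an earlier one.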

We could extend the current facet streaming for other sorts... this
would involve calculating & sorting the sort criteria first, and then
streaming after that point (i.e. other metrics would be calculated
on-the-fly as facet buckets are being streamed).
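A sketch of what that two-phase idea might look like, under the same toy-index assumptions: phase one computes and sorts only the sort criterion (here, the count), and phase two streams buckets in that order while computing a second metric lazily. The function name, the price data, and the choice of average price as the extra metric are all hypothetical.

```python
from typing import Dict, Iterator, Set, Tuple


def stream_sorted_by_count(
    index: Dict[str, Set[int]],
    matching_docs: Set[int],
    prices: Dict[int, float],
) -> Iterator[Tuple[str, int, float]]:
    """Yield (bucket, count, average price), highest count first."""
    # Phase 1: compute only the sort criterion for every bucket, then sort.
    # This part cannot be streamed, because the top bucket is unknown
    # until every count has been seen.
    counts = {term: len(docs & matching_docs) for term, docs in index.items()}
    ordered = sorted(
        (term for term, count in counts.items() if count > 0),
        key=lambda term: (-counts[term], term),  # ties broken by bucket value
    )
    # Phase 2: stream buckets in sorted order; the remaining metric is
    # computed only when its bucket is about to be emitted.
    for term in ordered:
        docs = index[term] & matching_docs
        avg_price = sum(prices[d] for d in docs) / len(docs)
        yield term, counts[term], avg_price


# Hypothetical toy data.
toy_index = {"solr": {1, 2, 5}, "lucene": {2, 3}, "facet": {3}}
matches = {1, 2, 3}
prices = {1: 10.0, 2: 20.0, 3: 30.0, 5: 50.0}

for bucket in stream_sorted_by_count(toy_index, matches, prices):
    print(bucket)
```

Only the counts are held in memory before the first bucket goes out; a consumer that stops early never pays for the average of the buckets it did not read.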

-Yonik

