Posted to solr-user@lucene.apache.org by Avlesh Singh <av...@gmail.com> on 2009/09/01 04:10:47 UTC

Re: filtering facets

>
> When I have more motivation, I will submit a patch to Solr :-)
>
Do you want to add more here? https://issues.apache.org/jira/browse/SOLR-1387

Cheers
Avlesh

On Tue, Sep 1, 2009 at 2:51 AM, Olivier H. Beauchesne <ol...@olihb.com> wrote:

> Yeah, but then I would have to retrieve *a lot* of facets. I think for now
> I'll query each subdomain separately with facet.prefix and then merge the
> results. Not ideal, but when I have more motivation, I will submit a
> patch to Solr :-)
>
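A minimal sketch of the workaround described above: one facet.prefix request
per subdomain, with the counts merged client-side. It assumes Solr's JSON
response writer (wt=json) with its default flat [term, count, ...] facet
layout. The helper names facet_params and merge_facets and the prefix list are
hypothetical, and no HTTP request is actually made here.

```python
from collections import Counter
from urllib.parse import urlencode

FIELD = "article_outlinks"

# Hypothetical list of subdomain prefixes; one request is built per prefix.
PREFIXES = ["http://en.wikipedia.org", "http://fr.wikipedia.org"]


def facet_params(query, prefix, field=FIELD):
    """Build the query string for a single facet.prefix request."""
    return urlencode({
        "q": query,
        "rows": 0,              # only the facet counts are needed
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.mincount": 1,    # skip terms with zero matches
    })


def merge_facets(responses, field=FIELD):
    """Merge facet counts from several parsed JSON responses.

    Assumes each response stores the facet as a flat list:
    [term1, count1, term2, count2, ...].
    """
    merged = Counter()
    for response in responses:
        flat = response["facet_counts"]["facet_fields"][field]
        for term, count in zip(flat[0::2], flat[1::2]):
            merged[term] += count
    # Highest counts first, as a list of (term, count) pairs.
    return merged.most_common()
```

Each string returned by facet_params would be appended to the select URL of
the Solr instance, and the parsed JSON bodies passed to merge_facets. Because
the prefixes do not overlap, summing the counts is only a safeguard.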
> Michael wrote:
>
>> You could post-process the response and remove URLs that don't match your
>> domain pattern.
>>
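The post-processing suggested above could look roughly like this. It is only a
sketch: the regular expression is a hypothetical pattern for "any subdomain of
wikipedia.org", the helper name filter_facets is made up, and the facet values
are assumed to have already been parsed into (term, count) pairs.

```python
import re

# Hypothetical pattern: wikipedia.org or any of its subdomains, http or https.
DOMAIN_RE = re.compile(
    r"^https?://([a-z0-9-]+\.)*wikipedia\.org(/|$)", re.IGNORECASE
)


def filter_facets(pairs, pattern=DOMAIN_RE):
    """Keep only the (term, count) pairs whose term matches the pattern."""
    return [(term, count) for term, count in pairs if pattern.search(term)]
```

The catch, as noted in the reply above, is that the unwanted URLs still have to
be fetched before they can be dropped, so facet.limit must be large enough (or
-1 for no limit) to include every matching value.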
>> On Mon, Aug 31, 2009 at 9:45 AM, Olivier H. Beauchesne <olivier@olihb.com> wrote:
>>
>>
>>
>>> Hi Mike,
>>>
>>> No, my problem is that the field article_outlinks is multivalued, so it
>>> contains several URLs not related to my search. I would like to facet
>>> only on URLs matching my query.
>>>
>>> For example (only one document shown, but my search targets over 1M docs):
>>>
>>> Doc1:
>>> article_url:
>>> url1.com/1
>>> url2.com/2
>>> url1.com/1
>>> url1.com/3
>>>
>>> My query is article_url:url1.com* and I facet on article_url. I want it
>>> to give me:
>>> url1.com/1 (2)
>>> url1.com/3 (1)
>>>
>>> But right now, because url2.com/2 is in the same multivalued field as
>>> the matching URLs, I get this:
>>> url1.com/1 (2)
>>> url1.com/3 (1)
>>> url2.com/2 (1)
>>>
>>> I can use facet.prefix to filter, but it's not very flexible if my URL
>>> contains a subdomain, as facet.prefix doesn't support wildcards.
>>>
>>> Thank you,
>>>
>>> Olivier
>>>
>>> Mike Topper wrote:
>>>
>>>> Hi Olivier,
>>>>
>>>> Are the facet counts on the URLs you don't want 0?
>>>>
>>>> If so, you can use facet.mincount to only return results with counts
>>>> greater than 0.
>>>>
>>>> -Mike
>>>>
>>>> Olivier H. Beauchesne wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Long time lurker, first time poster.
>>>>>
>>>>> I have a multi-valued field, let's call it article_outlinks, containing
>>>>> all outgoing URLs from a document. I want to get all matching URLs
>>>>> sorted by count.
>>>>>
>>>>> For example, I want to get all outgoing Wikipedia URLs in my documents
>>>>> sorted by count.
>>>>>
>>>>> So I execute a query like this:
>>>>> q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks
>>>>>
>>>>> But I get facets containing the other URLs in the documents. I can get
>>>>> something close by using facet.prefix=http://en.wikipedia.org but I
>>>>> want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org).
>>>>>
>>>>> Is there a way to do a search and get only the facets matching my
>>>>> query?
>>>>>
>>>>> I know facet.prefix isn't a query, but is there a way to get that
>>>>> behavior?
>>>>>
>>>>> Is it easy to extend Solr to do something like that?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Olivier
>>>>>
>>>>> Sorry for my English.
>>>>>