Posted to solr-user@lucene.apache.org by Andrew Clegg <an...@gmail.com> on 2009/10/28 19:02:15 UTC

Faceting within one document

Hi,

If I give a query that matches a single document, and facet on a particular
field, I get a list of all the terms in that field which appear in that
document.

(I also get some with a count of zero; I don't really understand where they
come from.)

Is it possible with faceting, or a similar mechanism, to get a count of how
many times each term appears within that document?

This would be really useful for building a list of top keywords within a
long document, for summarization purposes. I can do it on the client side
but it'd be nice to know if there's a quicker way.

Thanks!

Andrew.
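
For reference, the client-side approach mentioned above can be sketched in a
few lines of Python. This is only an illustration: the `top_terms` function,
its regex tokenizer and the `stopwords` argument are assumptions made for the
example, not part of Solr, and a real analyzer chain would likely tokenize
differently. Note that standard facet counts are numbers of matching
documents, so for a single-document result they cannot say how often a term
occurs inside that document.

```python
import re
from collections import Counter


def top_terms(text, n=10, stopwords=frozenset()):
    """Return the n most frequent lowercase word tokens in text."""
    # Crude tokenizer (an assumption for this sketch): runs of letters,
    # digits and apostrophes. Solr analyzers may split text differently.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


doc = "Solr facets count documents, not occurrences. Facets are per document."
print(top_terms(doc, n=3, stopwords={"are", "not", "per"}))
```

Raw counts like these favour common words, which is why a stopword list (or a
document-frequency weighting) is usually needed before the list is useful for
summarization.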

-- 
View this message in context: http://www.nabble.com/Faceting-within-one-document-tp26099278p26099278.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Faceting within one document

Posted by Andrew Clegg <an...@gmail.com>.

Are you sure? I've *never* explicitly deleted a document; I only ever
rebuild the entire index with the data import handler's "full import with
cleaning" operation.


Lance Norskog-2 wrote:
> 
> 0-value facets are left behind by docs which you have deleted. If you
> optimize, there should be no 0-value facets.


Re: Faceting within one document

Posted by Lance Norskog <go...@gmail.com>.
0-value facets are left behind by docs which you have deleted. If you
optimize, there should be no 0-value facets.
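
A minimal sketch of how an optimize could be triggered over HTTP, assuming
the XML update handler is reachable at the URL shown. The host, port and path
are hypothetical and depend on the installation; `build_optimize_request` is
a helper invented for this example. The request is only built here, not sent.

```python
import urllib.request


def build_optimize_request(update_url):
    """Build (but do not send) an HTTP POST carrying the XML optimize command."""
    return urllib.request.Request(
        update_url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical URL: adjust host, port and core path to your installation.
req = build_optimize_request("http://localhost:8983/solr/update")
print(req.get_method(), req.full_url)
# To actually run it against a live server: urllib.request.urlopen(req)
```

Optimizing rewrites the index segments, so it can be slow on a large index
and is probably best run after a full import rather than on every update.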




-- 
Lance Norskog
goksron@gmail.com

Re: Faceting within one document

Posted by Andrew Clegg <an...@gmail.com>.

Isn't the TermVectorComponent more for one document at a time, and the
TermsComponent for the whole index?
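
To illustrate the per-document side of that distinction, here is a rough
sketch of a TermVectorComponent request that asks for term frequency and
document frequency for one document. The assumptions are loud ones: the
handler path `/tvrh` and the field name `text` are hypothetical, the field
would need term vectors enabled in the schema, and the `tv.*` parameter names
should be checked against the documentation for the Solr version in use.

```python
from urllib.parse import urlencode


def term_vector_url(base_url, doc_query, field):
    """Build a URL requesting per-document term statistics for one field."""
    params = [
        ("q", doc_query),    # should match exactly one document
        ("fl", "id"),        # keep the stored-field part of the response small
        ("tv", "true"),      # switch the term vector component on
        ("tv.fl", field),    # field whose term vectors are wanted
        ("tv.tf", "true"),   # term frequency within the document
        ("tv.df", "true"),   # document frequency across the index
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical handler path, document id and field name.
print(term_vector_url("http://localhost:8983/solr/tvrh", "id:doc42", "text"))
```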

Actually -- having done some digging... What I'm really after is the most
informative terms in a given document, which should take into account global
document frequency as well as term frequency in the document at hand. I
think I can use the MoreLikeThisHandler to do this, with a bit of
experimentation...
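
The idea of weighing in-document term frequency against global document
frequency can be sketched as a simple tf-idf ranking. This is a generic,
smoothed tf-idf variant chosen for the example, not the exact formula used by
Lucene or by the MoreLikeThisHandler; the function name and the sample
numbers are made up.

```python
import math


def rank_by_tfidf(term_freqs, doc_freqs, num_docs, n=10):
    """Rank terms of one document by tf * idf, highest first."""
    scored = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        # Smoothed idf: terms found in nearly every document score close to 1.
        idf = math.log((1 + num_docs) / (1 + df)) + 1.0
        scored.append((term, tf * idf))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up statistics: "the" is frequent everywhere, "protein" is rare.
ranked = rank_by_tfidf(
    term_freqs={"the": 20, "protein": 5},
    doc_freqs={"the": 1000, "protein": 10},
    num_docs=1000,
)
print(ranked)
```

With these numbers the rarer term outranks the more frequent one, which is
the behaviour wanted for picking informative keywords.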

Thanks for the facet mincount tip BTW.

Andrew.
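
A possible starting point for that experimentation, sketched as a request URL
for the MoreLikeThisHandler with `mlt.interestingTerms=details`, which asks
for the terms it considers most significant together with their weights. The
handler path `/mlt`, the document id and the field name are hypothetical, the
handler has to be registered in solrconfig.xml, and the parameter names
should be verified against the documentation for the Solr version in use.

```python
from urllib.parse import urlencode


def interesting_terms_url(base_url, doc_query, field, min_tf=1, min_df=1):
    """Build a MoreLikeThis URL that requests the document's interesting terms."""
    params = [
        ("q", doc_query),                     # selects the source document
        ("mlt.fl", field),                    # field to take terms from
        ("mlt.mintf", str(min_tf)),           # minimum term frequency in the doc
        ("mlt.mindf", str(min_df)),           # minimum document frequency
        ("mlt.interestingTerms", "details"),  # return terms with their weights
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical handler path, document id and field name.
print(interesting_terms_url("http://localhost:8983/solr/mlt", "id:doc42", "text"))
```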


Avlesh Singh wrote:
> 
> For facets -
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount
> For terms - http://wiki.apache.org/solr/TermsComponent
> 
> Helps?
> 
> Cheers
> Avlesh


Re: Faceting within one document

Posted by Avlesh Singh <av...@gmail.com>.
For facets -
http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount
For terms - http://wiki.apache.org/solr/TermsComponent

Helps?

Cheers
Avlesh


Re: Faceting within one document

Posted by Lance Norskog <go...@gmail.com>.
Sorry, forgot that part.

On Thu, Oct 29, 2009 at 1:37 PM, Andrew Clegg <an...@gmail.com> wrote:
>
> Actually Avlesh pointed me at that, earlier in the thread. But thanks :-)



-- 
Lance Norskog
goksron@gmail.com

Re: Faceting within one document

Posted by Andrew Clegg <an...@gmail.com>.
Actually Avlesh pointed me at that, earlier in the thread. But thanks :-)


Yonik Seeley-2 wrote:
> 
> On Wed, Oct 28, 2009 at 2:02 PM, Andrew Clegg <an...@gmail.com>
> wrote:
>> If I give a query that matches a single document, and facet on a
>> particular
>> field, I get a list of all the terms in that field which appear in that
>> document.
>>
>> (I also get some with a count of zero, I don't really understand where
>> they
>> come from... ?)
> 
> By default, solr has a facet.mincount of zero, so it includes terms
> that don't match your set of documents.
> Try facet.mincount=1
> 
> -Yonik
> http://www.lucidimagination.com


Re: Faceting within one document

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Oct 28, 2009 at 2:02 PM, Andrew Clegg <an...@gmail.com> wrote:
> If I give a query that matches a single document, and facet on a particular
> field, I get a list of all the terms in that field which appear in that
> document.
>
> (I also get some with a count of zero, I don't really understand where they
> come from... ?)

By default, Solr has a facet.mincount of zero, so it includes terms
that don't match your set of documents.
Try facet.mincount=1
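
As a concrete illustration, a facet request with the minimum count set could
be built like this. It is only a sketch: the select URL, the document id and
the field name `keywords` are hypothetical, and `facet_url` is a helper
written for this example.

```python
from urllib.parse import urlencode


def facet_url(base_url, query, field, mincount=1):
    """Build a select URL that facets on one field and hides low counts."""
    params = [
        ("q", query),
        ("rows", "0"),                      # only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", str(mincount)),  # 1 drops the zero-count terms
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical URL, document id and field name.
print(facet_url("http://localhost:8983/solr/select", "id:doc42", "keywords"))
```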

-Yonik
http://www.lucidimagination.com