Posted to solr-dev@lucene.apache.org by Emmanuel Castro Santana <em...@gmail.com> on 2009/02/26 18:50:59 UTC

Is there a built in keyword report (Tag Cloud) feature on Solr ?

I am developing a Solr-based search application and need some kind of
keyword report for tag cloud generation. If anyone here has had the same
need and found a way to do it, I would really appreciate some help.
Thanks in advance
-- 
View this message in context: http://www.nabble.com/Is-there-a-built-in-keyword-report-%28Tag-Cloud%29-feature-on-Solr---tp22229677p22229677.html
Sent from the Solr - Dev mailing list archive at Nabble.com.


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Emmanuel Castro Santana <em...@gmail.com>.
Thanks for all the information; it has been really useful.
I didn't know that there were different names for tag clouds, which is also
good to know!
I don't feel really comfortable about keeping search cloud information in
the index. We would like to keep those concerns separate, and for this
purpose I think the best way would be to develop a request handler, or a
component usable inside any request handler, that stores all the query
information for search cloud generation. I have also taken a look at the
TermVectorComponent. I don't know if it would help with this issue, but it
may be useful at some point.
Thanks



Aleksander M. Stensby wrote:
> 
> Sorry, that mail got stuck in my outbox. Anyway, on a side note, I think
> it is called a search cloud when referring to top searches, and a tag
> cloud when referring to the most frequently occurring terms in the corpus,
> as Chris said.
> 
> Since you are only after creating a search cloud, I think my answer is a
> pretty straightforward, simple, and fast way of doing so.
> As Chris mentions, if you want to create a tag cloud from the words that
> a) occur frequently in the corpus or b) more advanced, are actually
> "important" to your corpus (score-based, tf-idf, etc.), you can simply use
> the TermsComponent. Since the trunk version of Solr introduces the
> TermVectorComponent, you can also retrieve information for specific search
> results.
> 
> Another thing you could do with your search cloud is to add, for instance,
> a date dimension to the Solr index where you store all the queries. You
> then get evolving search clouds out of the box: you can visualize how what
> is being searched for changes over time. Now that's a neat feature :) And
> best of all, Solr gives you this for free with facets once you have those
> queries indexed :)
> 
> Hope that helps!
> 
> Best regards,
>   Aleksander

-- 
View this message in context: http://www.nabble.com/Is-there-a-built-in-keyword-report-%28Tag-Cloud%29-feature-on-Solr---tp22229677p22251335.html


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by "Aleksander M. Stensby" <al...@integrasco.no>.
Sorry, that mail got stuck in my outbox. Anyway, on a side note, I think
it is called a search cloud when referring to top searches, and a tag
cloud when referring to the most frequently occurring terms in the corpus,
as Chris said.

Since you are only after creating a search cloud, I think my answer is a
pretty straightforward, simple, and fast way of doing so.
As Chris mentions, if you want to create a tag cloud from the words that
a) occur frequently in the corpus or b) more advanced, are actually
"important" to your corpus (score-based, tf-idf, etc.), you can simply use
the TermsComponent. Since the trunk version of Solr introduces the
TermVectorComponent, you can also retrieve information for specific search
results.

Another thing you could do with your search cloud is to add, for instance,
a date dimension to the Solr index where you store all the queries. You
then get evolving search clouds out of the box: you can visualize how what
is being searched for changes over time. Now that's a neat feature :) And
best of all, Solr gives you this for free with facets once you have those
queries indexed :)
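
One way to picture the date dimension is one facet request per time
window, each restricted by a filter query on the date field. The sketch
below only builds the request parameters. The field names "query" and
"searched_at" are hypothetical, and the inclusive range means a search
logged exactly at midnight on a month boundary would be counted in two
windows.

```python
def month_windows(year):
    """Return 12 (start, end) pairs of UTC timestamps, one per calendar month."""
    starts = [f"{year}-{m:02d}-01T00:00:00Z" for m in range(1, 13)]
    starts.append(f"{year + 1}-01-01T00:00:00Z")  # end of December
    return list(zip(starts, starts[1:]))


def window_params(start, end, query_field="query",
                  date_field="searched_at", limit=50):
    """Facet parameters for the top queries logged between start and end.

    Both field names are assumptions made for this illustration.
    """
    return {
        "q": "*:*",
        "rows": 0,                 # only the facet counts are needed
        "facet": "true",
        "facet.field": query_field,
        "facet.limit": limit,
        "fq": f"{date_field}:[{start} TO {end}]",  # restrict to one window
    }
```

Sending one such request per window and rendering each result as its own
cloud would give the "changes over time" view described above.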

Hope that helps!

Best regards,
  Aleksander


On Fri, 27 Feb 2009 08:12:19 +0100, Aleksander M. Stensby  
<al...@integrasco.no> wrote:

> To do that, your best option is to do it outside of Solr. That is, when
> someone enters a query in your web application, you store the search in,
> for instance, a database (or even in a separate Solr index).
> If you go with a Solr index for the queries, you can simply facet on the
> queries with, for instance, facet.limit=50, which will give you the 50
> most frequently entered queries.
>
> - Aleksander



-- 
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no

Please consider the environment before printing all or any of this e-mail

Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by "Aleksander M. Stensby" <al...@integrasco.no>.
To do that, your best option is to do it outside of Solr. That is, when
someone enters a query in your web application, you store the search in,
for instance, a database (or even in a separate Solr index).
If you go with a Solr index for the queries, you can simply facet on the
queries with, for instance, facet.limit=50, which will give you the 50
most frequently entered queries.
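
As a rough illustration of the facet approach, the Python sketch below
shows the request parameters and how the facet counts could be read back
from Solr's JSON response. The field name "query" is hypothetical. For
whole queries to be counted as single values, that field would need to be
an untokenized string type. The response shape assumed here is the default
flat list of alternating terms and counts.

```python
# Parameters for a facet request against the separate index of logged
# queries. "query" is a hypothetical field name.
FACET_PARAMS = {
    "q": "*:*",
    "rows": 0,            # no documents needed, only the facet counts
    "facet": "true",
    "facet.field": "query",
    "facet.limit": 50,    # top 50 most frequently entered queries
    "wt": "json",
}


def top_facet_terms(response, field):
    """Turn Solr's flat [term, count, term, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))
```

With a parsed response such as
{"facet_counts": {"facet_fields": {"query": ["tag cloud", 12, "solr", 7]}}},
top_facet_terms(response, "query") yields the (term, count) pairs in the
order Solr returned them.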

- Aleksander

On Thu, 26 Feb 2009 19:35:49 +0100, Emmanuel Castro Santana  
<em...@gmail.com> wrote:

>
> Thanks for the help
>
> "... do a *:* search and then make tag clouds from all of the facets ..."
>
> I may have not made myself clear. When I say keyword report, I mean a kind
> of a most popular tag cloud, showing in bigger sizes the most searched
> terms. Therefore I need information about how many times specific terms
> have been searched and I can't see how I could accomplish that with this
> solution....




Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Emmanuel Castro Santana <em...@gmail.com>.
Sorry about that. A tag cloud of the most searched terms is fairly common
around here.

"Solr doesn't keep any record of the searches performed, so to build a tag 
cloud based on query popularity you would need to mine your logs."

Do you know if there is already a tool or a Solr plugin for that?

Thanks


hossman wrote:
> 
> 
> : I may have not made myself clear. When I say keyword report, I mean a kind
> : of a most popular tag cloud, showing in bigger sizes the most searched
> : terms. Therefore I need information about how many times specific terms have
> : been searched and I can't see how I could accomplish that with this
> : solution.... 
> 
> you have to be more explicit about what you ask for.  I've never heard 
> anyone refer to a tag cloud as being based on how often a term is searched 
> for -- everyone i know uses the frequency of words in the corpus, 
> sometimes with a decay function to promote words mentioned in more recent 
> docs.
> 
> Solr doesn't keep any record of the searches performed, so to build a tag 
> cloud based on query popularity you would need to mine your logs.
> 
> if you want a tag cloud based on the frequency of words in your corpus, 
> the faceting approach mentioned would work -- but a simpler way to get 
> term counts for the whole index (*:*) would be the TermsComponent.  you 
> only really need the facet based solution if you want a cloud based on a 
> subset of documents, (ie: a cloud for all documents matching 
> category:computer)
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Is-there-a-built-in-keyword-report-%28Tag-Cloud%29-feature-on-Solr---tp22229677p22236934.html


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Walter Underwood <wu...@netflix.com>.
If you want a tag cloud based on query frequency, start with your
HTTP log analysis tools. Most of those generate a list of top
queries and top words in queries.
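
If no log analysis tool is at hand, the same counts can be approximated
with a few lines of Python. This is only a sketch under assumptions: it
expects access-log lines in the common "GET <url> HTTP/x" form, treats any
path ending in /select as a search request, and reads the user query from
the q parameter. Adjust all three to your own setup.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the request part of a common-format access log line.
REQUEST = re.compile(r'"GET (\S+) HTTP')


def top_queries(log_lines, n=50):
    """Count the q= parameter of search requests and return the n most common."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue
        url = urlsplit(match.group(1))
        if not url.path.endswith("/select"):  # assumed search handler path
            continue
        for q in parse_qs(url.query).get("q", []):
            counts[q.strip().lower()] += 1    # fold case so variants merge
    return counts.most_common(n)
```

Splitting each query on whitespace before counting would give the "top
words in queries" variant instead of top whole queries.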

wunder

On 2/26/09 2:54 PM, "Chris Hostetter" <ho...@fucit.org> wrote:

> 
> : I may have not made myself clear. When I say keyword report, I mean a kind
> : of a most popular tag cloud, showing in bigger sizes the most searched
> : terms. Therefore I need information about how many times specific terms have
> : been searched and I can't see how I could accomplish that with this
> : solution.... 
> 
> you have to be more explicit about what you ask for.  I've never heard
> anyone refer to a tag cloud as being based on how often a term is searched
> for -- everyone i know uses the frequency of words in the corpus,
> sometimes with a decay function to promote words mentioned in more recent
> docs.
> 
> Solr doesn't keep any record of the searches performed, so to build a tag
> cloud based on query popularity you would need to mine your logs.
> 
> if you want a tag cloud based on the frequency of words in your corpus,
> the faceting approach mentioned would work -- but a simpler way to get
> term counts for the whole index (*:*) would be the TermsComponent.  you
> only really need the facet based solution if you want a cloud based on a
> subset of documents, (ie: a cloud for all documents matching
> category:computer)
> 
> 
> 
> -Hoss
> 


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Chris Hostetter <ho...@fucit.org>.
: I may have not made myself clear. When I say keyword report, I mean a kind
: of a most popular tag cloud, showing in bigger sizes the most searched
: terms. Therefore I need information about how many times specific terms have
: been searched and I can't see how I could accomplish that with this
: solution.... 

you have to be more explicit about what you ask for.  I've never heard 
anyone refer to a tag cloud as being based on how often a term is searched 
for -- everyone i know uses the frequency of words in the corpus, 
sometimes with a decay function to promote words mentioned in more recent 
docs.

Solr doesn't keep any record of the searches performed, so to build a tag 
cloud based on query popularity you would need to mine your logs.

if you want a tag cloud based on the frequency of words in your corpus, 
the faceting approach mentioned would work -- but a simpler way to get 
term counts for the whole index (*:*) would be the TermsComponent.  you 
only really need the facet based solution if you want a cloud based on a 
subset of documents, (ie: a cloud for all documents matching 
category:computer)
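
To make the two options concrete, here is a small Python sketch that
builds both request URLs. The base URL, the /terms and /select handler
paths, and the field name passed in are assumptions about a typical setup.
The parameter names are the standard TermsComponent and faceting ones as
far as I know, so check them against your Solr version.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL


def terms_url(field, limit=50):
    """Whole-index term counts via the TermsComponent (assumed at /terms)."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.limit": limit,
        "terms.sort": "count",  # most frequent terms first
    }
    return f"{SOLR}/terms?{urlencode(params)}"


def facet_url(field, query, limit=50):
    """Term counts restricted to the documents matching `query`."""
    params = {
        "q": query,
        "rows": 0,              # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    }
    return f"{SOLR}/select?{urlencode(params)}"
```

For example, terms_url("tags") covers the whole index, while
facet_url("tags", "category:computer") covers only the matching subset.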



-Hoss


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Emmanuel Castro Santana <em...@gmail.com>.
Thanks for the help

"... do a *:* search and then make tag clouds from all of the facets ..."

I may have not made myself clear. When I say keyword report, I mean a kind
of a most popular tag cloud, showing in bigger sizes the most searched
terms. Therefore I need information about how many times specific terms have
been searched and I can't see how I could accomplish that with this
solution.... 



Walter Underwood wrote:
> 
> Oops, missed that you wanted it by facet. Never mind. --wunder

-- 
View this message in context: http://www.nabble.com/Is-there-a-built-in-keyword-report-%28Tag-Cloud%29-feature-on-Solr---tp22229677p22230655.html


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Walter Underwood <wu...@netflix.com>.
Oops, missed that you wanted it by facet. Never mind. --wunder

On 2/26/09 9:57 AM, "Walter Underwood" <wu...@netflix.com> wrote:

> That info is already available via Luke, right? --wunder


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Walter Underwood <wu...@netflix.com>.
That info is already available via Luke, right? --wunder

On 2/26/09 9:55 AM, "Robert Douglass" <ro...@robshouse.net> wrote:

> A solution that I'm considering implementing for Drupal's ApacheSolr
> module is to do a *:* search and then make tag clouds from all of the
> facets. It is pretty easy to sort all the facet terms into bins based on
> the number of documents they match, and then to translate bins to font
> sizes. Tag clouds make a nice alternate representation of facet blocks.


Re: Is there a built in keyword report (Tag Cloud) feature on Solr ?

Posted by Robert Douglass <ro...@robshouse.net>.
A solution that I'm considering implementing for Drupal's ApacheSolr
module is to do a *:* search and then make tag clouds from all of the
facets. It is pretty easy to sort all the facet terms into bins based on
the number of documents they match, and then to translate bins to font
sizes. Tag clouds make a nice alternate representation of facet blocks.
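
The binning step could look roughly like the Python sketch below. It is
one possible scheme, not the module's actual code: the count range is
split into equal-width bins, and the five font sizes are arbitrary values
chosen for illustration.

```python
def font_sizes(counts, sizes=(10, 14, 18, 24, 32)):
    """Map each term to a font size using equal-width bins over the counts.

    `counts` maps a facet term to the number of documents it matches.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    result = {}
    for term, n in counts.items():
        if span == 0:
            index = len(sizes) // 2  # all counts equal: use the middle size
        else:
            # Scale to [0, 1] and pick a bin; the maximum is clamped
            # into the last bin.
            index = min(int((n - lo) / span * len(sizes)), len(sizes) - 1)
        result[term] = sizes[index]
    return result
```

Equal-width bins are easy to reason about, but a few very frequent terms
can push everything else into the smallest size. Binning on the logarithm
of the count is a common alternative.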

Robert Douglass

The RobsHouse.net Newsletter: http://robshouse.net/newsletter/robshousenet-newsletter
Follow me on Twitter: http://twitter.com/robertDouglass

On Feb 26, 2009, at 6:50 PM, Emmanuel Castro Santana wrote:

>
> I am developing a Solr-based search application and need some kind of
> keyword report for tag cloud generation. If anyone here has had the same
> need and found a way to do it, I would really appreciate some help.
> Thanks in advance