Posted to java-user@lucene.apache.org by Claudia Grieco <gr...@crmpa.unisa.it> on 2010/11/25 12:48:18 UTC

Retrieve found keywords from document

Hi guys,

I have this problem:

I'm using Lucene to build a search engine over people's profiles.

I have a set of hobbies (say, {"reading", "singing"}) and I want to find
people who have at least one of these hobbies AND which of these hobbies
each person has.

Currently I run one search per hobby (e.g. one search for reading, one
search for singing), but since the list of hobbies is very long (200+)
I'd like to do the following:

1) Do ONE search that finds all the documents that have at least one hobby
in the text (this is easily accomplished using BooleanQuery).

2) For each document, retrieve the keywords found.

Do you have any ideas on how to do step 2?

Thank you 

Claudia


Re: Retrieve found keywords from document

Posted by Claudia Grieco <gr...@crmpa.unisa.it>.
Thanks a lot.
I used the lucene analyzer to parse the profile and everything works :)

-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com]
Sent: Thursday, 25 November 2010 14:52
To: java-user@lucene.apache.org
Subject: Re: Retrieve found keywords from document

You could parse the output from the lucene analyzer that you are using
to get hold of a list of terms and pick the ones that are hobbies.  Or
do it outside lucene using whatever string parsing technique you like.

Or take a look at the recent thread on this list on a similar topic:
"High frequency term for the searched query".


--
Ian.


On Thu, Nov 25, 2010 at 1:27 PM, Claudia Grieco <gr...@crmpa.unisa.it> wrote:
> What I call "profile" is free text (extracted from a pdf) and not the result
> of the user listing hobbies in a form
> So to store hobbies in a field called "hobbies" I have to extract hobbies
> from text first...is it possible to do it using Lucene?
>
> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Thursday, 25 November 2010 13:01
> To: java-user@lucene.apache.org
> Subject: Re: Retrieve found keywords from document
>
> Can't you just store the hobbies as standard stored fields
> (Field.Store.YES), or as a single field, call doc.get("hobbies") and
> do what you want with them?
>
> This sounds rather like faceting - if so you might want to consider
> using Solr.  http://wiki.apache.org/solr/SolrFacetingOverview
>
>
> --
> Ian.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



Re: Retrieve found keywords from document

Posted by Ian Lea <ia...@gmail.com>.
You could parse the output from the lucene analyzer that you are using
to get hold of a list of terms and pick the ones that are hobbies.  Or
do it outside lucene using whatever string parsing technique you like.

Or take a look at the recent thread on this list on a similar topic:
"High frequency term for the searched query".
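
A minimal sketch of the "do it outside Lucene" option, in plain Java (9+ for Set.of). It is only an illustration under stated assumptions: the class and method names are hypothetical, the split-and-lowercase step is a rough stand-in for an analyzer rather than Lucene's own tokenization, and hobbies are assumed to be single lowercase words.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: reports which known hobbies
// occur in a free-text profile.
final class HobbyMatcher {

    // Lowercases the text and splits on anything that is not a letter or digit
    // (an assumed, simplified substitute for an analyzer), then keeps the
    // tokens that appear in the hobby set.
    static Set<String> findHobbies(String profileText, Set<String> hobbies) {
        Set<String> found = new LinkedHashSet<>();
        for (String token : profileText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (hobbies.contains(token)) {
                found.add(token); // first-seen order is kept, repeats are dropped
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> hobbies = Set.of("reading", "singing", "chess");
        String profile = "I enjoy Reading novels, and singing in a choir. Reading is my favourite.";
        System.out.println(findHobbies(profile, hobbies)); // prints [reading, singing]
    }
}
```

Because each lookup is a hash-set check, one pass over the profile text covers all 200+ hobbies at once. Multi-word hobbies or stemming would need more than this sketch does.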


--
Ian.



Re: Retrieve found keywords from document

Posted by Claudia Grieco <gr...@crmpa.unisa.it>.
What I call a "profile" is free text (extracted from a PDF), not the result
of the user listing hobbies in a form.
So to store hobbies in a field called "hobbies" I have to extract the hobbies
from the text first... is it possible to do that using Lucene?


Re: Retrieve found keywords from document

Posted by Ian Lea <ia...@gmail.com>.
Can't you just store the hobbies as standard stored fields
(Field.Store.YES), or as a single field, call doc.get("hobbies") and
do what you want with them?

This sounds rather like faceting - if so you might want to consider
using Solr.  http://wiki.apache.org/solr/SolrFacetingOverview
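
To make the faceting comparison concrete: a facet count is the number of profiles per hobby value. The plain-Java sketch below only illustrates that idea; it is not Solr's API, the names are hypothetical, and it assumes the hobbies of each profile have already been collected into a set.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of facet counts, not Solr or Lucene code.
final class HobbyFacets {

    // Counts, for every hobby, how many profiles list it.
    // Each inner set is assumed to hold the hobbies of one profile.
    static Map<String, Integer> countByHobby(List<Set<String>> hobbiesPerProfile) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by hobby name
        for (Set<String> hobbies : hobbiesPerProfile) {
            for (String hobby : hobbies) {
                counts.merge(hobby, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Set<String>> profiles = List.of(
                Set.of("reading", "singing"),
                Set.of("reading"));
        System.out.println(countByHobby(profiles)); // prints {reading=2, singing=1}
    }
}
```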


--
Ian.
