Posted to java-user@lucene.apache.org by Kamal Najib <ka...@mytum.de> on 2009/05/18 15:20:40 UTC

how to get the word before and the word after the matched Term?

Hi all,
I want to get the word before and the word after the matched term. For example, if I have the text "The drug was freshly prepared at 4-hour intervals. Eleven courses were administered to seven patients at this dose level and no patient experienced nausea or vomiting" and the matched term is, for example, "patient", I want to get the word "level" and the word "experienced" ("and" and "no" are stop words, so I don't want to get them). I have looked at the class TermPositions, but with this class I can only get the position of the matched term. How can I get the word before and the word after it? Any suggestions?
Thank you in advance.
Kamal
--

RE: how to get the word before and the word after the matched Term?

Posted by Aditya <ad...@gmail.com>.

Following up on what Matt said, to answer your question: there is no
library call that gives you this directly.
Also have a look at the sandbox-based "highlight" code.

Best Regards,
Aditya


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



Re: how to get the word before and the word after the matched Term?

Posted by Matthew Hall <mh...@informatics.jax.org>.

Well, when you get the Document object, you have access to the fields in 
that document, including the text that was searched against.

You could simply retrieve this string, and then use simple java String 
manipulation to get what you want.
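
A rough sketch of that String-based approach, in plain Java with no Lucene classes. It assumes the field text was stored and has already been read from the Document; the class name NeighbourWords, the stop-word list, and the tokenizing regex are made up for illustration and will not match what your analyzer does exactly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NeighbourWords {
    // Hypothetical stop-word list, for this example only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "no", "the", "at", "to", "or"));

    /**
     * Returns {wordBefore, wordAfter} for the first exact occurrence of term,
     * ignoring stop words. An entry is null if there is no such neighbour;
     * the whole result is null if the term does not occur.
     */
    static String[] neighbours(String text, String term) {
        List<String> words = new ArrayList<>();
        // Split on anything that is not a letter, digit or hyphen.
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}-]+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                words.add(w);
            }
        }
        int i = words.indexOf(term.toLowerCase(Locale.ROOT));
        if (i < 0) {
            return null;
        }
        String before = i > 0 ? words.get(i - 1) : null;
        String after = i + 1 < words.size() ? words.get(i + 1) : null;
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String text = "The drug was freshly prepared at 4-hour intervals . "
                + "Eleven courses were administered to seven patients at this dose "
                + "level and no patient experienced nausea or vomiting";
        String[] n = neighbours(text, "patient");
        System.out.println(n[0] + " | " + n[1]); // prints "level | experienced"
    }
}
```

Note that this matches whole words only, so "patients" is not treated as a hit for "patient"; if the index was built with stemming, the comparison would have to be adapted.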

Matt



-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012




Re: how to get the word before and the word after the matched Term?

Posted by KK <di...@gmail.com>.

Thank you very much, Grant.
I used the WhitespaceAnalyzer and the highlighter methods provided, for
all my Unicode documents, and it's working fine. Thank you all.
The book Lucene in Action, 2nd edition helped me a lot, specifically the
examples in the highlighting section.

Thanks,
KK.

On Tue, May 26, 2009 at 4:43 PM, Grant Ingersoll <gs...@apache.org> wrote:

> On May 25, 2009, at 4:35 AM, KK wrote:
>
>> One more piece of information I would like to add:
>> I'm building the index mostly for non-English texts/documents, and searching
>> is done using Unicode UTF-8 text [it's obvious, right?]
>
> Yes, searching should be fine.

Re: how to get the word before and the word after the matched Term?

Posted by Grant Ingersoll <gs...@apache.org>.

On May 25, 2009, at 4:35 AM, KK wrote:

> One more piece of information I would like to add:
> I'm building the index mostly for non-English texts/documents, and
> searching is done using Unicode UTF-8 text [it's obvious, right?]
>


Yes, searching should be fine.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



Re: how to get the word before and the word after the matched Term?

Posted by KK <di...@gmail.com>.

One more piece of information I would like to add:
I'm building the index mostly for non-English texts/documents, and searching
is done using Unicode UTF-8 text [it's obvious, right?]

Thanks
KK


Re: how to get the word before and the word after the matched Term?

Posted by Grant Ingersoll <gs...@apache.org>.

On May 25, 2009, at 1:28 AM, KK wrote:

> Hi All.
> I want to do the same thing with, say, a window of 10/15.
> Can someone give me more details about how to do this, i.e. getting
> neighbors [both sides] of size "window"? If there are some examples, please
> point me to them or post them in the mail.
> Also, I would like to know about the term query. Is it the case that the
> term query has to be only a single term? I mean, can't we do the same thing
> where the search query is not just a term but, say, a phrase [multiple
> terms]? Now I want to extract neighbors for this matched phrase. I think
> this is the generic scenario.

Yes, see the Span*Query objects (SpanNear, SpanFirst, etc.)

> So as per the mail I have to make use of SpanQuery, TermVector and
> TermVectorMapper for these purposes, right?

That's how I've traditionally done it.

> NB: I also want to add hit highlighting after fixing the neighbor problem.
>
> Thanks,
> KK.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/


Re: how to get the word before and the word after the matched Term?

Posted by KK <di...@gmail.com>.

Hi All.
I want to do the same thing with, say, a window of 10/15.
Can someone give me more details about how to do this, i.e. getting
neighbors [both sides] of size "window"? If there are some examples, please
point me to them or post them in the mail.
Also, I would like to know about the term query. Is it the case that the term
query has to be only a single term? I mean, can't we do the same thing where
the search query is not just a term but, say, a phrase [multiple terms]? Now I
want to extract neighbors for this matched phrase. I think this is the
generic scenario.
So as per the mail I have to make use of SpanQuery, TermVector and
TermVectorMapper for these purposes, right?
NB: I also want to add hit highlighting after fixing the neighbor problem.

Thanks,
KK.

On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll <gs...@apache.org> wrote:

> See
> http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours (although
> I think you can do better than the code in the third reply by using a
> TermVectorMapper such that you can process the TermVector as it comes from
> disk.)
>
> Essentially, you need to use a combination of SpanQuery, TermVector and
> TermVectorMapper.
>
> HTH,
> Grant

Re: how to get the word before and the word after the matched Term?

Posted by Grant Ingersoll <gs...@apache.org>.

See http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours 
  (although I think you can do better than the code in the third reply  
by using a TermVectorMapper such that you can process the TermVector  
as it comes from disk.)

Essentially, you need to use a combination of SpanQuery, TermVector  
and TermVectorMapper.
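
To make the idea concrete, here is a minimal sketch of the position bookkeeping only, in plain Java with no Lucene calls. It assumes you already have the match span (start inclusive, end exclusive) from a span query and a term-to-positions map built from a term vector stored with positions; the class PositionWindow, its method names, and the sample positions are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PositionWindow {
    /**
     * Inverts term -> positions into an array indexed by position.
     * Positions with no term (for example removed stop words) stay null.
     */
    static String[] invert(Map<String, int[]> termPositions) {
        int max = -1;
        for (int[] positions : termPositions.values()) {
            for (int p : positions) {
                max = Math.max(max, p);
            }
        }
        String[] byPosition = new String[max + 1];
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            for (int p : e.getValue()) {
                byPosition[p] = e.getKey();
            }
        }
        return byPosition;
    }

    /**
     * Collects up to `window` terms on each side of the span [start, end),
     * skipping empty positions. Left neighbours come first, in text order.
     */
    static List<String> neighbours(String[] byPosition, int start, int end, int window) {
        List<String> left = new ArrayList<>();
        for (int p = start - 1; p >= 0 && left.size() < window; p--) {
            if (byPosition[p] != null) {
                left.add(0, byPosition[p]);
            }
        }
        List<String> right = new ArrayList<>();
        for (int p = end; p < byPosition.length && right.size() < window; p++) {
            if (byPosition[p] != null) {
                right.add(byPosition[p]);
            }
        }
        List<String> result = new ArrayList<>(left);
        result.addAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical positions for "dose level and no patient experienced nausea";
        // positions 2 and 3 ("and", "no") are left empty as removed stop words.
        Map<String, int[]> termVector = new HashMap<>();
        termVector.put("dose", new int[] { 0 });
        termVector.put("level", new int[] { 1 });
        termVector.put("patient", new int[] { 4 });
        termVector.put("experienced", new int[] { 5 });
        termVector.put("nausea", new int[] { 6 });

        String[] byPosition = invert(termVector);
        System.out.println(neighbours(byPosition, 4, 5, 1)); // [level, experienced]
        System.out.println(neighbours(byPosition, 4, 5, 2)); // [dose, level, experienced, nausea]
    }
}
```

Because the span is a range rather than a single position, the same lookup works for a multi-term (phrase) match, and the window size is just a parameter.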

HTH,
Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
