You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by KK <di...@gmail.com> on 2009/05/25 07:28:30 UTC

Re: how to get the word before and the word after the matched Term?

Hi All.
I want to do the same thing with say a window of 10/15.
Can some one give me more details about how to do this i.e getting
neighbors[both sides] of size "window", if some examples are there please
point me to them/post in the mail.
Also I would like to know about the term query. Is it the case that the term
query has to be only single term , I mean can'nt we do the same thing where
the search query is not just a term but say a phrase[multiple terms]. Now I
want to extract neighbors for this matched phrase. I think this is the
generic scenario.
So as per the mail I have to make use of SpanQuery, TermVector and
TermVectorMapper for these purposes, right?
NB:I also want to add hit highlighting after fixing the neighbor problem.

Thanks,
KK.

On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll <gs...@apache.org>wrote:

> See
> http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours (although
> I think you can do better than the code in the third reply by using a
> TermVectorMapper such that you can process the TermVector as it comes from
> disk.)
>
> Essentially, you need to use a combination of SpanQuery, TermVector and
> TermVectorMapper.
>
> HTH,
> Grant
>
> On May 18, 2009, at 9:20 AM, Kamal Najib wrote:
>
>  Hi all,
>> I want to  get the word before and the word after  the matched Term.For
>> Example if i have the Text " The drug was freshly prepared at 4-hour
>> intervals . Eleven courses were administered to seven patients at this dose
>> level and no patient experienced nausea or vomiting" and the matched Term
>> for example "patient" i want to get the word level and the word
>> experienced("and" and "no" are stop words, therefore i d'ont want to get
>> them.).I have looked at the Class Termposition but in this Class i can only
>> get the position of the matched Term, how can i get the word before and
>> after it, any suggestion?.
>> Thank you in advance.
>> Kamal
>> --
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: how to get the word before and the word after the matched Term?

Posted by KK <di...@gmail.com>.
Thank you very much @ Grant.
 I used the whitespaceanalyzer and other highlighter methods provided for
all unicoded docs and its working fine. Thank you all.
 The book LIA2ndEdn helped me a lot specifically the examples in the
highlighting section.

Thanks,
KK.

On Tue, May 26, 2009 at 4:43 PM, Grant Ingersoll <gs...@apache.org>wrote:

>
> On May 25, 2009, at 4:35 AM, KK wrote:
>
>  One more information I would like to add,
>> # I'm building index mostly for non-english texts/documents. and searching
>> is done using unicode utf-8 texts[its obivious, right?]
>>
>>
>
> Yes, searching should be fine.
>
>
>
>  Thanks
>> KK
>>
>> On Mon, May 25, 2009 at 10:58 AM, KK <di...@gmail.com> wrote:
>>
>>  Hi All.
>>> I want to do the same thing with say a window of 10/15.
>>> Can some one give me more details about how to do this i.e getting
>>> neighbors[both sides] of size "window", if some examples are there please
>>> point me to them/post in the mail.
>>> Also I would like to know about the term query. Is it the case that the
>>> term query has to be only single term , I mean can'nt we do the same
>>> thing
>>> where the search query is not just a term but say a phrase[multiple
>>> terms].
>>> Now I want to extract neighbors for this matched phrase. I think this is
>>> the
>>> generic scenario.
>>> So as per the mail I have to make use of SpanQuery, TermVector and
>>> TermVectorMapper for these purposes, right?
>>> NB:I also want to add hit highlighting after fixing the neighbor problem.
>>>
>>> Thanks,
>>> KK.
>>>
>>>
>>> On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll <gsingers@apache.org
>>> >wrote:
>>>
>>>  See
>>>>
>>>> http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours
>>>>  (although
>>>> I think you can do better than the code in the third reply by using a
>>>> TermVectorMapper such that you can process the TermVector as it comes
>>>> from
>>>> disk.)
>>>>
>>>> Essentially, you need to use a combination of SpanQuery, TermVector and
>>>> TermVectorMapper.
>>>>
>>>> HTH,
>>>> Grant
>>>>
>>>> On May 18, 2009, at 9:20 AM, Kamal Najib wrote:
>>>>
>>>> Hi all,
>>>>
>>>>> I want to  get the word before and the word after  the matched Term.For
>>>>> Example if i have the Text " The drug was freshly prepared at 4-hour
>>>>> intervals . Eleven courses were administered to seven patients at this
>>>>> dose
>>>>> level and no patient experienced nausea or vomiting" and the matched
>>>>> Term
>>>>> for example "patient" i want to get the word level and the word
>>>>> experienced("and" and "no" are stop words, therefore i d'ont want to
>>>>> get
>>>>> them.).I have looked at the Class Termposition but in this Class i can
>>>>> only
>>>>> get the position of the matched Term, how can i get the word before and
>>>>> after it, any suggestion?.
>>>>> Thank you in advance.
>>>>> Kamal
>>>>> --
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: how to get the word before and the word after the matched Term?

Posted by Grant Ingersoll <gs...@apache.org>.
On May 25, 2009, at 4:35 AM, KK wrote:

> One more information I would like to add,
> # I'm building index mostly for non-english texts/documents. and  
> searching
> is done using unicode utf-8 texts[its obivious, right?]
>


Yes, searching should be fine.


> Thanks
> KK
>
> On Mon, May 25, 2009 at 10:58 AM, KK <di...@gmail.com>  
> wrote:
>
>> Hi All.
>> I want to do the same thing with say a window of 10/15.
>> Can some one give me more details about how to do this i.e getting
>> neighbors[both sides] of size "window", if some examples are there  
>> please
>> point me to them/post in the mail.
>> Also I would like to know about the term query. Is it the case that  
>> the
>> term query has to be only single term , I mean can'nt we do the  
>> same thing
>> where the search query is not just a term but say a phrase[multiple  
>> terms].
>> Now I want to extract neighbors for this matched phrase. I think  
>> this is the
>> generic scenario.
>> So as per the mail I have to make use of SpanQuery, TermVector and
>> TermVectorMapper for these purposes, right?
>> NB:I also want to add hit highlighting after fixing the neighbor  
>> problem.
>>
>> Thanks,
>> KK.
>>
>>
>> On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll  
>> <gs...@apache.org>wrote:
>>
>>> See
>>> http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours 
>>>  (although
>>> I think you can do better than the code in the third reply by  
>>> using a
>>> TermVectorMapper such that you can process the TermVector as it  
>>> comes from
>>> disk.)
>>>
>>> Essentially, you need to use a combination of SpanQuery,  
>>> TermVector and
>>> TermVectorMapper.
>>>
>>> HTH,
>>> Grant
>>>
>>> On May 18, 2009, at 9:20 AM, Kamal Najib wrote:
>>>
>>> Hi all,
>>>> I want to  get the word before and the word after  the matched  
>>>> Term.For
>>>> Example if i have the Text " The drug was freshly prepared at 4- 
>>>> hour
>>>> intervals . Eleven courses were administered to seven patients at  
>>>> this dose
>>>> level and no patient experienced nausea or vomiting" and the  
>>>> matched Term
>>>> for example "patient" i want to get the word level and the word
>>>> experienced("and" and "no" are stop words, therefore i d'ont want  
>>>> to get
>>>> them.).I have looked at the Class Termposition but in this Class  
>>>> i can only
>>>> get the position of the matched Term, how can i get the word  
>>>> before and
>>>> after it, any suggestion?.
>>>> Thank you in advance.
>>>> Kamal
>>>> --
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
>>> using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to get the word before and the word after the matched Term?

Posted by KK <di...@gmail.com>.
One more information I would like to add,
# I'm building index mostly for non-english texts/documents. and searching
is done using unicode utf-8 texts[its obivious, right?]

Thanks
KK

On Mon, May 25, 2009 at 10:58 AM, KK <di...@gmail.com> wrote:

> Hi All.
> I want to do the same thing with say a window of 10/15.
> Can some one give me more details about how to do this i.e getting
> neighbors[both sides] of size "window", if some examples are there please
> point me to them/post in the mail.
> Also I would like to know about the term query. Is it the case that the
> term query has to be only single term , I mean can'nt we do the same thing
> where the search query is not just a term but say a phrase[multiple terms].
> Now I want to extract neighbors for this matched phrase. I think this is the
> generic scenario.
> So as per the mail I have to make use of SpanQuery, TermVector and
> TermVectorMapper for these purposes, right?
> NB:I also want to add hit highlighting after fixing the neighbor problem.
>
> Thanks,
> KK.
>
>
> On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll <gs...@apache.org>wrote:
>
>> See
>> http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours (although
>> I think you can do better than the code in the third reply by using a
>> TermVectorMapper such that you can process the TermVector as it comes from
>> disk.)
>>
>> Essentially, you need to use a combination of SpanQuery, TermVector and
>> TermVectorMapper.
>>
>> HTH,
>> Grant
>>
>> On May 18, 2009, at 9:20 AM, Kamal Najib wrote:
>>
>>  Hi all,
>>> I want to  get the word before and the word after  the matched Term.For
>>> Example if i have the Text " The drug was freshly prepared at 4-hour
>>> intervals . Eleven courses were administered to seven patients at this dose
>>> level and no patient experienced nausea or vomiting" and the matched Term
>>> for example "patient" i want to get the word level and the word
>>> experienced("and" and "no" are stop words, therefore i d'ont want to get
>>> them.).I have looked at the Class Termposition but in this Class i can only
>>> get the position of the matched Term, how can i get the word before and
>>> after it, any suggestion?.
>>> Thank you in advance.
>>> Kamal
>>> --
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Re: how to get the word before and the word after the matched Term?

Posted by Grant Ingersoll <gs...@apache.org>.
On May 25, 2009, at 1:28 AM, KK wrote:

> Hi All.
> I want to do the same thing with say a window of 10/15.
> Can some one give me more details about how to do this i.e getting
> neighbors[both sides] of size "window", if some examples are there  
> please
> point me to them/post in the mail.
> Also I would like to know about the term query. Is it the case that  
> the term
> query has to be only single term , I mean can'nt we do the same  
> thing where
> the search query is not just a term but say a phrase[multiple  
> terms]. Now I
> want to extract neighbors for this matched phrase. I think this is the
> generic scenario.

Yes, see the Span*Query objects (SpanNear, SpanFirst, etc.)

>
> So as per the mail I have to make use of SpanQuery, TermVector and
> TermVectorMapper for these purposes, right?

That's how I've traditionally done it.

>
> NB:I also want to add hit highlighting after fixing the neighbor  
> problem.
>
> Thanks,
> KK.
>
> On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll  
> <gs...@apache.org>wrote:
>
>> See
>> http://www.lucidimagination.com/search/document/7fe40486bc935ce4/get_term_neighbours 
>>  (although
>> I think you can do better than the code in the third reply by using a
>> TermVectorMapper such that you can process the TermVector as it  
>> comes from
>> disk.)
>>
>> Essentially, you need to use a combination of SpanQuery, TermVector  
>> and
>> TermVectorMapper.
>>
>> HTH,
>> Grant
>>
>> On May 18, 2009, at 9:20 AM, Kamal Najib wrote:
>>
>> Hi all,
>>> I want to  get the word before and the word after  the matched  
>>> Term.For
>>> Example if i have the Text " The drug was freshly prepared at 4-hour
>>> intervals . Eleven courses were administered to seven patients at  
>>> this dose
>>> level and no patient experienced nausea or vomiting" and the  
>>> matched Term
>>> for example "patient" i want to get the word level and the word
>>> experienced("and" and "no" are stop words, therefore i d'ont want  
>>> to get
>>> them.).I have looked at the Class Termposition but in this Class i  
>>> can only
>>> get the position of the matched Term, how can i get the word  
>>> before and
>>> after it, any suggestion?.
>>> Thank you in advance.
>>> Kamal
>>> --
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
>> using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org