You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2005/08/08 06:06:26 UTC

Reply Split Search Word

Hi

Luceners

Apologies.....

As I have already replied,Using Analysis I have tried on all Analyzers
(including Standard Analyzer)
But not able to achive the required COMPLETS WORD Split.

My I/p String would be a lengthy one as below

String sKey = "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis Gospodnetic"  +
"   " +
              "\"" + "Erik Hatcher" + "\""  + "  " +	"Authors of " + "\"" +
"Lucene In Action" + "\"";

The required split of complete words should return

   1) "Dough Cutting"
   2) Otis Gospodnetic
   3) "Erik Hatcher"
   4) Authors of
   5) "Lucene In Action"

Plz Note :- Words with "\"" are complete split words....

I am shure some Analyzer code inside Lucene is handling this task.


som how can one achive this task..

with regards
Karthik

-----Original Message-----
From: Mordo, Aviran (EXP N-NANNATEK) [mailto:aviran.mordo@lmco.com]
Sent: Friday, August 05, 2005 7:58 PM
To: java-user@lucene.apache.org
Subject: RE: Split Search Word


The StandardAnalyzer should work just fine with it, It will break the
search string to 5 search terms.

HTH

Aviran
http://www.aviransplace.com

  _____

From: Karthik N S [mailto:karthik@controlnet.co.in]
Sent: Friday, August 05, 2005 1:57 AM
To: LUCENE
Subject: Split Search Word



Hi Luceners

Apologies.....

I  have along Search String as given below...



SearchWord =  "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis
Gospodnetic"  +  "   " + "\"" + "Erik Hatcher" + "\""  + "  " +
                           "Authors of " + "\"" + "Lucene In Action"
+"\"";

And prior to searching the Index ,I need the Words to be Split.

SearchWord   =

   1)   "\"" + "Dough Cutting" + "\""
   2)   "Otis Gospodnetic"
   3)  "\"" + "Erik Hatcher" + "\""
   4)  "Authors of "
   5) "\"" +"Lucene In Action" +"\""

I am shure some Analyzer within Lucene is performin the task.
So some body please Tell me Howto

[ I already used Analysis/Paralysis code to check ,but no help ]




WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Reply Split Search Word

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 8, 2005, at 7:44 AM, Karthik N S wrote:
>    I  would like to reformat the Question  slightly ,
>    Words without double Quotes may also be present in the String.
>    Also I have to apply the STOP - Analyzer  to filter out common  
> English
> words appearing within.
>
> Do u mind giving me a bit of src hint for the same...
> [ I am googled out of ideas ]

Well, the Analyzers chapter in Lucene in Action will give you all the  
basics you need to write custom pieces of an analyzer :)

A custom Tokenizer is where you'll want to start with, perhaps,  
though there are other possibilities such as tokenizing quotes  
separately and then having a filter buffer when they are encountered  
and combine multiple tokens into one when within quotes.  The only  
hints I have would be examples from within Lucene's codebase itself.   
You'll need to maintain some state (whether within quotes or not),  
but otherwise it should be relatively straightforward.

To provide many more hints would require implementing it myself, I'm  
afraid.

     Erik


>
>
> with regards
> karthik
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Monday, August 08, 2005 4:23 PM
> To: java-user@lucene.apache.org
> Subject: Re: Reply Split Search Word
>
>
> To have an analyzer split that string into 1-5 as you have listed
> will require you write a custom Analyzer to tokenize with double
> quotes in mind like that.
>
>      Erik
>
> On Aug 8, 2005, at 12:06 AM, Karthik N S wrote:
>
>
>> Hi
>>
>> Luceners
>>
>> Apologies.....
>>
>> As I have already replied,Using Analysis I have tried on all  
>> Analyzers
>> (including Standard Analyzer)
>> But not able to achive the required COMPLETS WORD Split.
>>
>> My I/p String would be a lengthy one as below
>>
>> String sKey = "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis
>> Gospodnetic"  +
>> "   " +
>>               "\"" + "Erik Hatcher" + "\""  + "  " +    "Authors of
>> " + "\"" +
>> "Lucene In Action" + "\"";
>>
>> The required split of complete words should return
>>
>>    1) "Dough Cutting"
>>    2) Otis Gospodnetic
>>    3) "Erik Hatcher"
>>    4) Authors of
>>    5) "Lucene In Action"
>>
>> Plz Note :- Words with "\"" are complete split words....
>>
>> I am shure some Analyzer code inside Lucene is handling this task.
>>
>>
>> som how can one achive this task..
>>
>> with regards
>> Karthik
>>
>> -----Original Message-----
>> From: Mordo, Aviran (EXP N-NANNATEK) [mailto:aviran.mordo@lmco.com]
>> Sent: Friday, August 05, 2005 7:58 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Split Search Word
>>
>>
>> The StandardAnalyzer should work just fine with it, It will break the
>> search string to 5 search terms.
>>
>> HTH
>>
>> Aviran
>> http://www.aviransplace.com
>>
>>   _____
>>
>> From: Karthik N S [mailto:karthik@controlnet.co.in]
>> Sent: Friday, August 05, 2005 1:57 AM
>> To: LUCENE
>> Subject: Split Search Word
>>
>>
>>
>> Hi Luceners
>>
>> Apologies.....
>>
>> I  have along Search String as given below...
>>
>>
>>
>> SearchWord =  "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis
>> Gospodnetic"  +  "   " + "\"" + "Erik Hatcher" + "\""  + "  " +
>>                            "Authors of " + "\"" + "Lucene In Action"
>> +"\"";
>>
>> And prior to searching the Index ,I need the Words to be Split.
>>
>> SearchWord   =
>>
>>    1)   "\"" + "Dough Cutting" + "\""
>>    2)   "Otis Gospodnetic"
>>    3)  "\"" + "Erik Hatcher" + "\""
>>    4)  "Authors of "
>>    5) "\"" +"Lucene In Action" +"\""
>>
>> I am shure some Analyzer within Lucene is performin the task.
>> So some body please Tell me Howto
>>
>> [ I already used Analysis/Paralysis code to check ,but no help ]
>>
>>
>>
>>
>> WITH WARM REGARDS
>> HAVE A NICE DAY
>> [ N.S.KARTHIK]
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Reply Split Search Word

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Erik

   I  would like to reformat the Question  slightly ,
   Words without double Quotes may also be present in the String.
   Also I have to apply the STOP - Analyzer  to filter out common English
words appearing within.

Do u mind giving me a bit of src hint for the same...
[ I am googled out of ideas ]


with regards
karthik


-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Monday, August 08, 2005 4:23 PM
To: java-user@lucene.apache.org
Subject: Re: Reply Split Search Word


To have an analyzer split that string into 1-5 as you have listed
will require you write a custom Analyzer to tokenize with double
quotes in mind like that.

     Erik

On Aug 8, 2005, at 12:06 AM, Karthik N S wrote:

> Hi
>
> Luceners
>
> Apologies.....
>
> As I have already replied,Using Analysis I have tried on all Analyzers
> (including Standard Analyzer)
> But not able to achive the required COMPLETS WORD Split.
>
> My I/p String would be a lengthy one as below
>
> String sKey = "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis
> Gospodnetic"  +
> "   " +
>               "\"" + "Erik Hatcher" + "\""  + "  " +    "Authors of
> " + "\"" +
> "Lucene In Action" + "\"";
>
> The required split of complete words should return
>
>    1) "Dough Cutting"
>    2) Otis Gospodnetic
>    3) "Erik Hatcher"
>    4) Authors of
>    5) "Lucene In Action"
>
> Plz Note :- Words with "\"" are complete split words....
>
> I am shure some Analyzer code inside Lucene is handling this task.
>
>
> som how can one achive this task..
>
> with regards
> Karthik
>
> -----Original Message-----
> From: Mordo, Aviran (EXP N-NANNATEK) [mailto:aviran.mordo@lmco.com]
> Sent: Friday, August 05, 2005 7:58 PM
> To: java-user@lucene.apache.org
> Subject: RE: Split Search Word
>
>
> The StandardAnalyzer should work just fine with it, It will break the
> search string to 5 search terms.
>
> HTH
>
> Aviran
> http://www.aviransplace.com
>
>   _____
>
> From: Karthik N S [mailto:karthik@controlnet.co.in]
> Sent: Friday, August 05, 2005 1:57 AM
> To: LUCENE
> Subject: Split Search Word
>
>
>
> Hi Luceners
>
> Apologies.....
>
> I  have along Search String as given below...
>
>
>
> SearchWord =  "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis
> Gospodnetic"  +  "   " + "\"" + "Erik Hatcher" + "\""  + "  " +
>                            "Authors of " + "\"" + "Lucene In Action"
> +"\"";
>
> And prior to searching the Index ,I need the Words to be Split.
>
> SearchWord   =
>
>    1)   "\"" + "Dough Cutting" + "\""
>    2)   "Otis Gospodnetic"
>    3)  "\"" + "Erik Hatcher" + "\""
>    4)  "Authors of "
>    5) "\"" +"Lucene In Action" +"\""
>
> I am shure some Analyzer within Lucene is performin the task.
> So some body please Tell me Howto
>
> [ I already used Analysis/Paralysis code to check ,but no help ]
>
>
>
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Reply Split Search Word

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
To have an analyzer split that string into 1-5 as you have listed  
will require you write a custom Analyzer to tokenize with double  
quotes in mind like that.

     Erik

On Aug 8, 2005, at 12:06 AM, Karthik N S wrote:

> Hi
>
> Luceners
>
> Apologies.....
>
> As I have already replied,Using Analysis I have tried on all Analyzers
> (including Standard Analyzer)
> But not able to achive the required COMPLETS WORD Split.
>
> My I/p String would be a lengthy one as below
>
> String sKey = "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis  
> Gospodnetic"  +
> "   " +
>               "\"" + "Erik Hatcher" + "\""  + "  " +    "Authors of  
> " + "\"" +
> "Lucene In Action" + "\"";
>
> The required split of complete words should return
>
>    1) "Dough Cutting"
>    2) Otis Gospodnetic
>    3) "Erik Hatcher"
>    4) Authors of
>    5) "Lucene In Action"
>
> Plz Note :- Words with "\"" are complete split words....
>
> I am shure some Analyzer code inside Lucene is handling this task.
>
>
> som how can one achive this task..
>
> with regards
> Karthik
>
> -----Original Message-----
> From: Mordo, Aviran (EXP N-NANNATEK) [mailto:aviran.mordo@lmco.com]
> Sent: Friday, August 05, 2005 7:58 PM
> To: java-user@lucene.apache.org
> Subject: RE: Split Search Word
>
>
> The StandardAnalyzer should work just fine with it, It will break the
> search string to 5 search terms.
>
> HTH
>
> Aviran
> http://www.aviransplace.com
>
>   _____
>
> From: Karthik N S [mailto:karthik@controlnet.co.in]
> Sent: Friday, August 05, 2005 1:57 AM
> To: LUCENE
> Subject: Split Search Word
>
>
>
> Hi Luceners
>
> Apologies.....
>
> I  have along Search String as given below...
>
>
>
> SearchWord =  "\"" + "Dough Cutting" + "\"" +  "  " +  "Otis
> Gospodnetic"  +  "   " + "\"" + "Erik Hatcher" + "\""  + "  " +
>                            "Authors of " + "\"" + "Lucene In Action"
> +"\"";
>
> And prior to searching the Index ,I need the Words to be Split.
>
> SearchWord   =
>
>    1)   "\"" + "Dough Cutting" + "\""
>    2)   "Otis Gospodnetic"
>    3)  "\"" + "Erik Hatcher" + "\""
>    4)  "Authors of "
>    5) "\"" +"Lucene In Action" +"\""
>
> I am shure some Analyzer within Lucene is performin the task.
> So some body please Tell me Howto
>
> [ I already used Analysis/Paralysis code to check ,but no help ]
>
>
>
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org