Posted to java-user@lucene.apache.org by Yuta Kawadai <yu...@gmail.com> on 2010/03/11 00:52:24 UTC

surrogate pairs

Hi

Can Lucene handle surrogate pairs (and their term positions or lengths)?

Thanks,
Yuta

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: surrogate pairs

Posted by Yuta Kawadai <yu...@gmail.com>.
Sorry for not giving enough detail.

I am trying to process text that contains "surrogate pairs" in Lucene.
So I want to confirm whether Lucene (the core, Analyzers, TokenFilters,
and so on) can handle terms that contain "surrogate pairs".

Thanks,
Yuta
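
To make the question about positions and lengths concrete, here is a
minimal Java sketch (plain java.lang, not Lucene code). A character
outside the Basic Multilingual Plane, such as U+20B9F, is stored in a
Java String as a surrogate pair, i.e. two char values, so the char count
and the code point count disagree. The class and method names are made
up for illustration.

```java
final class SurrogateLengths {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnitLength(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Converts a code point index into a char (UTF-16) offset.
    static int charOffsetOfCodePoint(String s, int codePointIndex) {
        return s.offsetByCodePoints(0, codePointIndex);
    }

    public static void main(String[] args) {
        // U+20B9F is encoded as the pair D842 DF9F, followed by U+308B.
        String text = "\uD842\uDF9F\u308B";
        System.out.println(codeUnitLength(text));           // prints 3
        System.out.println(codePointLength(text));          // prints 2
        System.out.println(charOffsetOfCodePoint(text, 1)); // prints 2
    }
}
```

Any component that measures a term or computes an offset has to pick one
of these two units and use it consistently, or it may end up pointing
into the middle of a pair.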

2010/3/11 Erick Erickson <er...@gmail.com>:
> Please describe the problem you're trying to solve,
> what *you* mean by "surrogate pairs" and how you'd
> like Lucene to use them. The lack of these details
> forces any responder to guess, almost certainly
> wrongly.
>
> Best
> Erick
>
> On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yu...@gmail.com> wrote:
>
>> Hi
>>
>> Can Lucene handle surrogate pairs (and their term positions or lengths)?
>>
>> Thanks,
>> Yuta
>


Re: surrogate pairs

Posted by Erick Erickson <er...@gmail.com>.
Please describe the problem you're trying to solve,
what *you* mean by "surrogate pairs" and how you'd
like Lucene to use them. The lack of these details
forces any responder to guess, almost certainly
wrongly.

Best
Erick

On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yu...@gmail.com> wrote:

> Hi
>
> Can Lucene handle surrogate pairs (and their term positions or lengths)?
>
> Thanks,
> Yuta

Re: surrogate pairs

Posted by Simon Willnauer <si...@googlemail.com>.
On Thu, Mar 11, 2010 at 2:28 AM, Robert Muir <rc...@gmail.com> wrote:
> On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yu...@gmail.com> wrote:
>> Hi
>>
>> Can Lucene handle surrogate pairs (and their term positions or lengths)?
>>
>> Thanks,
>> Yuta
>
> Yes, just make sure you use an Analyzer that supports them...
> unfortunately most of the ones included with released versions of
> Lucene (e.g. CJKAnalyzer) will not do the right thing, hopefully in
> the next release they will.
They will do the right thing in the next release :)

simon
>
> --
> Robert Muir
> rcmuir@gmail.com


Re: surrogate pairs

Posted by David Leangen <ap...@leangen.net>.
Hi, Yuta-san,


>> I currently use my own Analyzer, which is based on "MeCab" (an open
>> source Japanese morphological analyzer).
>> I will try to modify it to support surrogate pairs.
>>
>> And I'm looking forward to the next release!

Cool!

I look forward to that. Is there a link somewhere to your project? I am very interested.


Thank you!
=David




Re: surrogate pairs

Posted by Simon Willnauer <si...@googlemail.com>.
Hi Yuta,
Are you looking for a specific analyzer like CJKAnalyzer, or for
token streams such as LowerCaseFilter?
A fair number of the token filters have already been converted to
handle surrogate pairs correctly. If you need help figuring out how
to use code from trunk, I'm happy to help.

simon
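
As an illustration of what handling surrogate pairs correctly can mean
for a lowercasing filter, the sketch below compares per-char and
per-code-point lowercasing using only java.lang. It is not the code of
Lucene's filters; CodePointLowerCase and its methods are hypothetical
names chosen for this example.

```java
final class CodePointLowerCase {
    // Per-char lowercasing: each half of a surrogate pair is looked at
    // alone, has no case mapping, and is therefore left unchanged.
    static String lowerCaseByChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Per-code-point lowercasing: a surrogate pair is decoded into one
    // code point, lowercased, and re-encoded.
    static String lowerCaseByCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "A" followed by U+10400 (DESERET CAPITAL LETTER LONG I),
        // which is encoded as the pair D801 DC00.
        String input = "A\uD801\uDC00";
        String expected = "a\uD801\uDC28"; // U+10428 is the lowercase form
        System.out.println(lowerCaseByCodePoint(input).equals(expected)); // prints true
        System.out.println(lowerCaseByChar(input).equals(expected));      // prints false
    }
}
```

The per-char version does not corrupt the text here, it just silently
skips the supplementary character, which is the kind of bug that is easy
to miss in testing.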

On Fri, Mar 12, 2010 at 5:27 AM, Yuta Kawadai <yu...@gmail.com> wrote:
> Thank you.
>
> I currently use my own Analyzer, which is based on "MeCab" (an open
> source Japanese morphological analyzer).
> I will try to modify it to support surrogate pairs.
>
> And I'm looking forward to the next release!
>
> Yuta
>
> 2010/3/11 Robert Muir <rc...@gmail.com>:
>> On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yu...@gmail.com> wrote:
>>> Hi
>>>
>>> Can Lucene handle surrogate pairs (and their term positions or lengths)?
>>>
>>> Thanks,
>>> Yuta
>>
>> Yes, just make sure you use an Analyzer that supports them...
>> unfortunately most of the ones included with released versions of
>> Lucene (e.g. CJKAnalyzer) will not do the right thing, hopefully in
>> the next release they will.
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com


Re: surrogate pairs

Posted by Yuta Kawadai <yu...@gmail.com>.
Thank you.

I currently use my own Analyzer, which is based on "MeCab" (an open
source Japanese morphological analyzer).
I will try to modify it to support surrogate pairs.

And I'm looking forward to the next release!

Yuta

2010/3/11 Robert Muir <rc...@gmail.com>:
> On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yu...@gmail.com> wrote:
>> Hi
>>
>> Can Lucene handle surrogate pairs (and their term positions or lengths)?
>>
>> Thanks,
>> Yuta
>
> Yes, just make sure you use an Analyzer that supports them...
> unfortunately most of the ones included with released versions of
> Lucene (e.g. CJKAnalyzer) will not do the right thing, hopefully in
> the next release they will.
>
> --
> Robert Muir
> rcmuir@gmail.com


Re: surrogate pairs

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Mar 10, 2010 at 6:52 PM, Yuta Kawadai <yu...@gmail.com> wrote:
> Hi
>
> Can Lucene handle surrogate pairs (and their term positions or lengths)?
>
> Thanks,
> Yuta

Yes, just make sure you use an Analyzer that supports them...
unfortunately most of the ones included with released versions of
Lucene (e.g. CJKAnalyzer) will not do the right thing, hopefully in
the next release they will.

-- 
Robert Muir
rcmuir@gmail.com
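
For readers wondering what "not doing the right thing" can look like,
here is a small Java sketch of a hypothetical single-character
tokenizer. It is an assumption made for illustration and not the actual
logic of CJKAnalyzer or any other Lucene class. Splitting on char
boundaries cuts a surrogate pair into two unpaired halves, while
stepping by code point keeps the pair together as one term.

```java
import java.util.ArrayList;
import java.util.List;

final class UnigramSketch {
    // Naive: one term per char, so a surrogate pair becomes two broken terms.
    static List<String> byChar(String s) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            terms.add(String.valueOf(s.charAt(i)));
        }
        return terms;
    }

    // Code point aware: one term per code point, pairs stay intact.
    static List<String> byCodePoint(String s) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            terms.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return terms;
    }

    public static void main(String[] args) {
        // U+20B9F (encoded as D842 DF9F) followed by U+308B.
        String text = "\uD842\uDF9F\u308B";
        System.out.println(byChar(text).size());      // prints 3
        System.out.println(byCodePoint(text).size()); // prints 2
    }
}
```

Terms produced by the naive version contain lone surrogates, which are
not valid text on their own and are unlikely to match what a user types
in a query.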
