Posted to java-user@lucene.apache.org by yamo93 <ya...@gmail.com> on 2012/07/25 15:01:56 UTC

Question on ElisionFilter with d'

Hello,

I'm using ElisionFilter to index French text.
The filter works but ignores the letter d followed by an apostrophe
(example: d'une).

Is this expected behaviour, or is it an issue?

Regards,
Yann.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Question on ElisionFilter with d'

Posted by Ian Lea <ia...@gmail.com>.
Ah, OK.  I thought you were saying it was removing d' when you thought
it shouldn't.  Sounds like a bug to me but I don't know enough about
it to express a strong opinion.


--
Ian.


On Wed, Jul 25, 2012 at 3:56 PM, yamo93 <ya...@gmail.com> wrote:
> Thanks for replying,
>
> The problem is that the filter doesn't remove d' (or c' either).
> Shall I open an issue in JIRA?


Re: Question on ElisionFilter with d'

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Jul 26, 2012 at 4:10 AM, yamo93 <ya...@gmail.com> wrote:
> A possible workaround would be to call this constructor
> ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles).
>

That's the way: just supply the list you want.

> But I don't understand why "d" and "c" are not present in the default
> articles.
>

It's just that, historically, that was the default list in the file.
This list should really be removed from the filter and moved to
FrenchAnalyzer, as this filter is not only used for French:
https://issues.apache.org/jira/browse/LUCENE-3884

-- 
lucidimagination.com



Re: Question on ElisionFilter with d'

Posted by yamo93 <ya...@gmail.com>.
Hi,

Sorry, I forgot the most important detail: I use Lucene 3.6.

Here is my code:

    tokenStream = new ElisionFilter(Version.LUCENE_36, tokenStream);

I looked at the source code of ElisionFilter, and DEFAULT_ARTICLES
doesn't contain "d" or "c", which it would need in order to handle terms
like "d'une" or "c'est".

A possible workaround would be to call this constructor 
ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles).
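
To illustrate what supplying a custom article set changes, here is a
minimal sketch in plain Java with no Lucene dependency. It is not
Lucene's implementation: the class ElisionSketch, the helper strip() and
the constant ASSUMED_DEFAULTS are hypothetical names, and the contents
of the default list are an assumption (the only thing established in
this thread is that "d" and "c" are missing from it).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ElisionSketch {

    // ASSUMPTION: stand-in for DEFAULT_ARTICLES in Lucene 3.6.
    // The thread only confirms that "d" and "c" are absent.
    static final Set<String> ASSUMED_DEFAULTS =
        new HashSet<String>(Arrays.asList("l", "m", "t", "qu", "n", "s", "j"));

    // Strips "<article>'" from the start of a term when the text before
    // the first apostrophe is in the given article set.
    static String strip(String term, Set<String> articles) {
        int idx = -1;
        for (int i = 0; i < term.length(); i++) {
            char ch = term.charAt(i);
            if (ch == '\'' || ch == '\u2019') { // ASCII or typographic apostrophe
                idx = i;
                break;
            }
        }
        if (idx > 0) {
            String prefix = term.substring(0, idx).toLowerCase(Locale.ROOT);
            if (articles.contains(prefix)) {
                return term.substring(idx + 1);
            }
        }
        return term; // no apostrophe, or prefix not in the set
    }

    public static void main(String[] args) {
        // The workaround: the same list plus "d" and "c".
        Set<String> extended = new HashSet<String>(ASSUMED_DEFAULTS);
        extended.addAll(Arrays.asList("d", "c"));

        System.out.println(strip("l'avion", ASSUMED_DEFAULTS)); // avion
        System.out.println(strip("d'une", ASSUMED_DEFAULTS));   // d'une (unchanged)
        System.out.println(strip("d'une", extended));           // une
        System.out.println(strip("c'est", extended));           // est
    }
}
```

With Lucene itself, the equivalent would be to build a similar set and
pass it as the articles argument of the three-argument constructor
quoted above.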

But I don't understand why "d" and "c" are not present in the default
articles.

Yann.

On 07/26/2012 03:52 AM, Jack Krupansky wrote:
> The filter should work (remove the letter and apostrophe).
>
> Could you supply an exact code fragment that shows the literal term, 
> the code invoking the filter, and the exact literal output?
>
> And, which release of Lucene?
>
> -- Jack Krupansky



Re: Question on ElisionFilter with d'

Posted by Jack Krupansky <ja...@basetechnology.com>.
The filter should work (remove the letter and apostrophe).

Could you supply an exact code fragment that shows the literal term, the 
code invoking the filter, and the exact literal output?

And, which release of Lucene?

-- Jack Krupansky

-----Original Message----- 
From: yamo93
Sent: Wednesday, July 25, 2012 9:56 AM
To: java-user@lucene.apache.org
Subject: Re: Question on ElisionFilter with d'

Thanks for replying,

The problem is that the filter doesn't remove d' (or c' either).
Shall I open an issue in JIRA?



Re: Question on ElisionFilter with d'

Posted by yamo93 <ya...@gmail.com>.
Thanks for replying,

The problem is that the filter doesn't remove d' (or c' either).
Shall I open an issue in JIRA?

On 07/25/2012 04:36 PM, Ian Lea wrote:
> I bet it's expected.  From http://en.wikipedia.org/wiki/Elision_(French)
>
> In written French, elision (both phonetic and orthographic) is
> obligatory for the following words:
> ...
>
> the preposition de
>   ...
>   Le père d'Albert vient d'arriver.
>
>
>
> So surely the removal of d' is correct.
>
>
> --
> Ian.


Re: Question on ElisionFilter with d'

Posted by Ian Lea <ia...@gmail.com>.
I bet it's expected.  From http://en.wikipedia.org/wiki/Elision_(French)

In written French, elision (both phonetic and orthographic) is
obligatory for the following words:
...

the preposition de
 ...
 Le père d'Albert vient d'arriver.



So surely the removal of d' is correct.


--
Ian.


On Wed, Jul 25, 2012 at 2:01 PM, yamo93 <ya...@gmail.com> wrote:
> Hello,
>
> I'm using ElisionFilter to index French text.
> The filter works but ignores the letter d followed by an apostrophe (example:
> d'une).
>
> Is this expected behaviour, or is it an issue?
>
> Regards,
> Yann.