Posted to dev@lucene.apache.org by "Sanne Grinovero (JIRA)" <ji...@apache.org> on 2015/06/30 13:40:05 UTC

[jira] [Commented] (LUCENE-6212) Remove IndexWriter's per-document analyzer add/updateDocument APIs

    [ https://issues.apache.org/jira/browse/LUCENE-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608156#comment-14608156 ] 

Sanne Grinovero commented on LUCENE-6212:
-----------------------------------------

Hello,
I understand there are good reasons to prevent this for the "average user", but I would beg you to restore the functionality for those who know what they are doing.

There are perfectly valid use cases for using a different Analyzer at query time than at indexing time. For example, when synonyms are handled at indexing time, you don't need to apply the substitutions again at query time.
Beyond synonyms, it's also possible to have text from different sources that has been pre-processed in different ways, and so needs to be tokenized differently to produce consistent output.
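
To make the synonym case concrete, here is a toy sketch in plain Java. It is not Lucene code: the class name, the two analysis methods, and the one-entry synonym table are all hypothetical and exist only for illustration. It shows that once synonyms are expanded while indexing, a simpler query-time analysis still matches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only; none of these names are Lucene APIs.
class SynonymAtIndexTime {
    // Hypothetical synonym table, for illustration.
    static final Map<String, List<String>> SYNONYMS =
            Map.of("car", List.of("automobile"));

    // Query-time analysis: lowercase and split on whitespace, no synonyms.
    static List<String> analyzeQuery(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index-time analysis: same tokenization, plus synonym expansion.
    static List<String> analyzeForIndex(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : analyzeQuery(text)) {
            tokens.add(t);
            tokens.addAll(SYNONYMS.getOrDefault(t, List.of()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> indexed = new HashSet<>(analyzeForIndex("Red car"));
        // The synonym was stored at index time, so the plain query analysis matches.
        System.out.println(indexed.containsAll(analyzeQuery("automobile")));
        System.out.println(indexed.containsAll(analyzeQuery("red CAR")));
    }
}
```

Applying the synonym expansion a second time at query time would add nothing here, which is why the two analysis chains can legitimately differ.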

I love the idea of Lucene becoming stricter about consistent schema choices, but I would hope we could limit that to field types and encoding, while Analyzer mappings keep a bit more flexibility?

Would you accept a patch to overload
{code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>){code}
with the expert version:
{code}org.apache.lucene.index.IndexWriter.updateDocument(Term, Iterable<? extends IndexableField>, Analyzer overrideAnalyzer){code} ?

That would greatly help me migrate to Lucene 5. My alternative is to close and reopen the IndexWriter for each Analyzer change, but that would have a significant performance impact; I'd rather cheat and pass a mutable Analyzer instance, even if that would prevent me from using the IndexWriter concurrently.
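
The "mutable Analyzer" workaround can be sketched with a toy model in plain Java. Nothing below is Lucene API: `SwitchableAnalyzer`, `ToyWriter`, and `MutableAnalyzerDemo` are hypothetical stand-ins, and an "analyzer" is reduced to a function from text to tokens. The point is only to show why the swap and the write have to be serialized, which is what rules out concurrent use of the writer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only; none of these names are Lucene APIs.
class MutableAnalyzerDemo {
    static final Function<String, List<String>> WHITESPACE =
            s -> Arrays.asList(s.trim().split("\\s+"));
    static final Function<String, List<String>> COMMA =
            s -> Arrays.asList(s.split("\\s*,\\s*"));

    private final SwitchableAnalyzer analyzer = new SwitchableAnalyzer(WHITESPACE);
    private final ToyWriter writer = new ToyWriter(analyzer);

    // Swapping the delegate and writing must happen as one atomic step;
    // otherwise another thread could swap the delegate in between.
    synchronized void update(Function<String, List<String>> perDocument,
                             String id, String text) {
        analyzer.setDelegate(perDocument);
        writer.updateDocument(id, text);
    }

    Map<String, List<String>> docs() {
        return writer.docs;
    }

    public static void main(String[] args) {
        MutableAnalyzerDemo demo = new MutableAnalyzerDemo();
        demo.update(WHITESPACE, "1", "red car");
        demo.update(COMMA, "2", "red car, blue bike");
        System.out.println(demo.docs());
    }
}

// Hypothetical analyzer whose behaviour can be changed after construction.
class SwitchableAnalyzer implements Function<String, List<String>> {
    private Function<String, List<String>> delegate;

    SwitchableAnalyzer(Function<String, List<String>> initial) {
        this.delegate = initial;
    }

    void setDelegate(Function<String, List<String>> next) {
        this.delegate = next;
    }

    @Override
    public List<String> apply(String text) {
        return delegate.apply(text);
    }
}

// Hypothetical writer that is bound to a single analyzer up front.
class ToyWriter {
    private final Function<String, List<String>> analyzer;
    final Map<String, List<String>> docs = new HashMap<>();

    ToyWriter(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    void updateDocument(String id, String text) {
        docs.put(id, analyzer.apply(text));
    }
}
```

In this model every update goes through one lock, so the writer is effectively single-threaded. An overload that takes the analyzer as an argument would avoid the shared mutable state altogether.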

> Remove IndexWriter's per-document analyzer add/updateDocument APIs
> ------------------------------------------------------------------
>
>                 Key: LUCENE-6212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 5.1, Trunk
>
>         Attachments: LUCENE-6212.patch
>
>
> IndexWriter already takes an analyzer up-front (via
> IndexWriterConfig), but it also allows you to specify a different one
> for each add/updateDocument.
> I think this is quite dangerous/trappy since it means you can easily
> index tokens for that document that don't match at search-time based
> on the search-time analyzer.
> I think we should remove this trap in 5.0.



