Posted to dev@lucene.apache.org by "Mark Harwood (JIRA)" <ji...@apache.org> on 2007/03/18 13:08:10 UTC

[jira] Created: (LUCENE-835) An IndexReader with run-time support for synonyms

An IndexReader with run-time support for synonyms
-------------------------------------------------

                 Key: LUCENE-835
                 URL: https://issues.apache.org/jira/browse/LUCENE-835
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Index
    Affects Versions: 2.1
            Reporter: Mark Harwood
         Assigned To: Mark Harwood


These classes add synonym support for the terms of an existing index.

Analyzers can inject synonyms at query-parse time or at index time, but neither approach is always a satisfactory way of supporting synonyms:

* Index-time injection of synonyms is less flexible because changing the lists of synonyms requires an index rebuild.
* Query-parse-time injection is awkward because special support is required in the parser/query logic to recognise and cater for the tokens that appear in the same position. Additionally, any statistical analysis of the index content via TermEnum/TermDocs etc. does not consider the synonyms unless specific code is added.

What is perhaps more useful is a transparent wrapper for the IndexReader that provides a synonym-ized view of the index without requiring specialised support in the calling code. The TermEnum/TermDocs interfaces all remain the same, but behind the scenes synonyms are considered and applied silently.

The classes supplied here provide this "virtual" view of the index. All queries, and any other code that examines the index through the special reader, benefit from this view without requiring specialised code. A JUnit test illustrates this code in action.
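
As a rough illustration of the wrapper idea only: the sketch below is not the attached SynonymIndexReader code and does not use the Lucene API. The names TermLookup, ToyIndex and SynonymView are hypothetical, and the "index" is a plain in-memory map. It assumes that a synonym-aware view simply merges the postings of every term in a synonym group, so callers keep using the same lookup interface and term statistics reflect the synonyms too.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the read side of an index: term -> doc ids. */
interface TermLookup {
    /** Ids of documents containing the term, ascending (empty if none). */
    List<Integer> docs(String term);

    /** Number of documents containing the term. */
    default int docFreq(String term) {
        return docs(term).size();
    }
}

/** Toy in-memory inverted index; an assumption for illustration, not Lucene. */
final class ToyIndex implements TermLookup {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Records that the document with the given id contains the given terms. */
    void add(int docId, String... terms) {
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    @Override
    public List<Integer> docs(String term) {
        TreeSet<Integer> ids = postings.get(term);
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}

/** Wrapper exposing a synonym-aware view through the unchanged interface. */
final class SynonymView implements TermLookup {
    private final TermLookup delegate;
    private final Map<String, Set<String>> groups = new HashMap<>();

    SynonymView(TermLookup delegate) {
        this.delegate = delegate;
    }

    /**
     * Declares the terms to be mutual synonyms. The underlying index is not
     * touched, so groups can change without a rebuild. For simplicity a term
     * belongs to one group only; a later call replaces an earlier one.
     */
    void addSynonyms(String... terms) {
        Set<String> group = new TreeSet<>();
        for (String t : terms) {
            group.add(t);
        }
        for (String t : terms) {
            groups.put(t, group);
        }
    }

    @Override
    public List<Integer> docs(String term) {
        Set<String> group = groups.get(term);
        if (group == null) {
            return delegate.docs(term); // no synonyms: pass straight through
        }
        // Merge the postings of every synonym; TreeSet removes duplicates
        // and keeps the doc ids in ascending order.
        TreeSet<Integer> merged = new TreeSet<>();
        for (String t : group) {
            merged.addAll(delegate.docs(t));
        }
        return new ArrayList<>(merged);
    }
}
```

With documents 1 ("car") and 2 ("automobile") in a ToyIndex and the two terms registered as synonyms on a SynonymView, a lookup for either term through the view returns both documents, while the wrapped index itself is unchanged.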



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-835) An IndexReader with run-time support for synonyms

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-835:
--------------------------------

    Attachment: TestSynonymIndexReader.java



[jira] Updated: (LUCENE-835) An IndexReader with run-time support for synonyms

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-835:
--------------------------------

    Attachment: Synonym.java



[jira] Updated: (LUCENE-835) An IndexReader with run-time support for synonyms

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-835:
--------------------------------

    Attachment: SynonymIndexReader.java



[jira] Updated: (LUCENE-835) An IndexReader with run-time support for synonyms

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-835:
--------------------------------

    Attachment: SynonymSet.java



[jira] Commented: (LUCENE-835) An IndexReader with run-time support for synonyms

Posted by "Benjamin Henriet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484139 ] 

Benjamin Henriet commented on LUCENE-835:
-----------------------------------------

Hi Mark,
Thank you for your work. You said: "Query-parse-time injection is awkward because special support is required in the parser/query logic to recognise and cater for the tokens that appear in the same position." Is there an implementation of this "special support"? I have a similar problem with Dutch word decompounding: at query time I would like to decompound a word like "hulparbeider" into "hulparbeider" OR "hulp" OR "arbeider", but the parsed query contains only a single word group: "hulparbeider hulp arbeider".
Can you give me any tips?
Thank you
Benjamin
