Posted to dev@lucene.apache.org by "Elmer Garduno (JIRA)" <ji...@apache.org> on 2011/06/10 21:36:59 UTC

[jira] [Created] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert
----------------------------------------------------------------------------------

                 Key: SOLR-2584
                 URL: https://issues.apache.org/jira/browse/SOLR-2584
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 3.3, 4.0
            Reporter: Elmer Garduno
            Priority: Minor


Hi folks, 

I think that UIMAUpdateRequestProcessor should have a parameter to avoid duplicate values in the updated field.

A typical use case is:

If you are using DictionaryAnnotator and a term matches more than once, it is added to the mapped field once per match (twice for two matches). Since we do not preserve the position of each annotation, I think we should add a parameter to avoid inserting duplicates.
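
To make the use case concrete, here is a minimal sketch of how repeated matches turn into repeated field values. ToyDictionaryAnnotator is a hypothetical stand-in written for this illustration; it is not the UIMA DictionaryAnnotator API, and the whitespace/punctuation tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a dictionary-based annotator; not the UIMA API.
class ToyDictionaryAnnotator {
    private final Set<String> dictionary;

    ToyDictionaryAnnotator(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Emits one value per match, in text order, and discards the match position.
    List<String> annotate(String text) {
        List<String> values = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (dictionary.contains(token)) {
                values.add(token);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        ToyDictionaryAnnotator annotator =
            new ToyDictionaryAnnotator(new HashSet<>(Arrays.asList("Solr", "Lucene")));
        // "Lucene" matches twice, so the mapped field would receive it twice.
        System.out.println(annotator.annotate("Solr builds on Lucene, and Lucene is a library"));
        // prints [Solr, Lucene, Lucene]
    }
}
```

Because only the matched term is kept, the second "Lucene" is indistinguishable from the first once it reaches the field.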

What do you think? I've already implemented this for branch 3x; I'm writing some tests and will submit a patch.

Regards



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-2584:
---------------------------------

    Affects Version/s: 1.4.1
        Fix Version/s: 4.0
                       3.4
             Assignee: Koji Sekiguchi

> Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-2584
>                 URL: https://issues.apache.org/jira/browse/SOLR-2584
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.4.1, 3.3, 4.0
>            Reporter: Elmer Garduno
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>              Labels: uima
>             Fix For: 3.4, 4.0
>
>         Attachments: SOLR-2584.patch, SOLR-2584.patch, SOLR-2584.patch
>
>



[jira] [Updated] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elmer Garduno updated SOLR-2584:
--------------------------------

    Attachment: SOLR-2584.patch

Added test cases and fixed an error.




[jira] [Commented] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047793#comment-13047793 ] 

Koji Sekiguchi commented on SOLR-2584:
--------------------------------------

Alternatively, we could implement this in a new update processor and place it after the UIMA update processor in the chain.

Either way, I would like to have this feature.
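
The chain idea can be sketched roughly as follows. This is a simplified, hypothetical model: the Stage interface and the helper names are invented for illustration, and Solr's real UpdateRequestProcessor API has a different shape. The point is only that de-duplication becomes a separate stage that runs after the annotating stage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical, simplified model of a processor chain; not Solr's API.
class ChainSketch {
    interface Stage {
        List<String> process(List<String> fieldValues);
    }

    // Stands in for the UIMA stage: appends annotation values, possibly with repeats.
    static Stage appendAnnotations(final List<String> annotations) {
        return values -> {
            List<String> out = new ArrayList<>(values);
            out.addAll(annotations);
            return out;
        };
    }

    // The separate de-duplicating stage, placed after the annotating stage.
    static Stage removeDuplicates() {
        return values -> new ArrayList<>(new LinkedHashSet<>(values));
    }

    // Runs the stages in order, feeding each stage the previous stage's output.
    static List<String> run(List<Stage> chain, List<String> initial) {
        List<String> current = initial;
        for (Stage stage : chain) {
            current = stage.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Stage> chain = Arrays.asList(
            appendAnnotations(Arrays.asList("Lucene", "Solr", "Lucene")),
            removeDuplicates());
        System.out.println(run(chain, new ArrayList<String>())); // prints [Lucene, Solr]
    }
}
```

A separate stage like this would also work after any other processor that can produce repeated values, not only the UIMA one.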




[jira] [Updated] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-2584:
---------------------------------

    Attachment: SOLR-2584.patch

Thanks Elmer for the patch!

I made some fixes in the attached patch:

* removed the unused checkNumDocs() from the test
* used <lst/> for the fields parameter
* used a List instead of a Set in processAdd() to keep the sequence of values in a multiValued field, and added a check for this to the test case
* added a null check in processAdd(), as fields can be null
* added prettify to the javadoc
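
As a rough illustration of the List and null-check points, here is a minimal sketch that is not taken from the patch. The document is modelled as a plain Map rather than a SolrInputDocument, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the document is a plain Map, not a SolrInputDocument.
class UniqFieldsSketch {
    // Removes repeated values from the named fields, keeping the first occurrence
    // of each value so the original sequence of the multiValued field survives.
    static void uniqFields(Map<String, List<Object>> doc, List<String> fields) {
        if (fields == null) {
            return; // no fields configured: leave the document untouched
        }
        for (String field : fields) {
            List<Object> values = doc.get(field);
            if (values == null) {
                continue; // field absent from this document
            }
            List<Object> unique = new ArrayList<>();
            for (Object value : values) {
                if (!unique.contains(value)) {
                    unique.add(value);
                }
            }
            doc.put(field, unique);
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("entity", new ArrayList<Object>(Arrays.asList("Solr", "Lucene", "Solr", "UIMA")));
        doc.put("title", new ArrayList<Object>(Arrays.asList("a", "a")));
        uniqFields(doc, Arrays.asList("entity"));
        System.out.println(doc); // prints {entity=[Solr, Lucene, UIMA], title=[a, a]}
    }
}
```

Collecting into a List with a contains() check keeps insertion order, which a plain HashSet does not guarantee. The check is linear per value, which should be acceptable for the short value lists a field typically holds.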




[jira] [Resolved] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-2584.
----------------------------------

    Resolution: Fixed

Committed to trunk and 3x.




[jira] [Updated] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elmer Garduno updated SOLR-2584:
--------------------------------

    Attachment: SOLR-2584.patch

UniqFieldsUpdateProcessor removes duplicate values from the content of the specified fields. It is useful after an UpdateRequestProcessor that could generate duplicate values for a field.






[jira] [Commented] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068150#comment-13068150 ] 

Elmer Garduno commented on SOLR-2584:
-------------------------------------

Thanks, Koji.




[jira] [Commented] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert

Posted by "Elmer Garduno (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067840#comment-13067840 ] 

Elmer Garduno commented on SOLR-2584:
-------------------------------------

Koji, I followed your approach and implemented it using an UpdateRequestProcessor.

I'm submitting the patch for branch 3x.

