Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2011/06/16 00:38:47 UTC

[jira] [Created] (SOLR-2599) FieldCopy Update Processor

FieldCopy Update Processor
--------------------------

                 Key: SOLR-2599
                 URL: https://issues.apache.org/jira/browse/SOLR-2599
             Project: Solr
          Issue Type: New Feature
          Components: update
            Reporter: Jan Høydahl


Need an UpdateProcessor which can copy and move fields

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2599) FieldCopy Update Processor

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121327#comment-13121327 ] 

Jan Høydahl commented on SOLR-2599:
-----------------------------------

Perhaps {{multival}} should be renamed {{multiValued}} to comply with schema lingo?

Also, if I make it (optionally) schema aware, I can make {{multiValued}} behavior the default when the {{dest}} field is multivalued. Perhaps it also makes sense to allow {{append}} together with {{multiValued}}, so that all {{source}} fields are concatenated into one string and that string is added as a single field value, instead of each {{source}} being added as its own value?

The reason I want to be able to disable strict schema checking is the case where a processor only creates intermediate fields that we know will be removed from the {{SolrInputDocument}} before indexing, so we should be free to name them whatever we like without causing an error. Unfortunately, {{ExtractingRequestHandler}} is too strict here and would benefit from an {{enforceSchema=false}} option.
                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>         Attachments: SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Assigned] (SOLR-2599) FieldCopy Update Processor

Posted by "Jan Høydahl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl reassigned SOLR-2599:
---------------------------------

    Assignee: Jan Høydahl

> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Resolved] (SOLR-2599) FieldCopy Update Processor

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2599.
----------------------------

    Resolution: Fixed

I went ahead and committed my patch.

(one of the beauties of adding more UpdateProcessors like this is that they can be mixed and matched, so if folks have ideas about alternative configuration/behavior we can always add more processors with different names)

Committed revision 1350050. - trunk
Committed revision 1350051. - 4x

                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-2599-hoss.patch, SOLR-2599.patch, SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Updated] (SOLR-2599) FieldCopy Update Processor

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2599:
------------------------------

    Fix Version/s: 4.0
                   3.6
    
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Updated] (SOLR-2599) FieldCopy Update Processor

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2599:
------------------------------

    Attachment: SOLR-2599.patch

Here's the processor. It's been in production for some time at a customer.

Sample config as follows:
{code}
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">category</str>
  <str name="dest">category_s</str>
</processor>
{code}

To move (rename) a field:
{code}
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">LastModified</str>
  <str name="dest">last_modified</str>
  <bool name="move">true</bool>
</processor>
{code}

To append to an existing field:
{code}
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">lastname firstname</str>
  <str name="dest">fullname</str>
  <bool name="append">true</bool>
  <str name="append.delim">, </str>
</processor>
{code}

To append as values to a multivalued field, with an optional size cap:
{code}
<processor class="solr.FieldCopyProcessorFactory">
  <str name="source">title body</str>
  <str name="dest">text</str>
  <bool name="multival">true</bool>
  <int name="maxChars">100</int>
</processor>
{code}

                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>         Attachments: SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Commented] (SOLR-2599) FieldCopy Update Processor

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228347#comment-13228347 ] 

Jan Høydahl commented on SOLR-2599:
-----------------------------------

@Hoss, you have not incorporated this in your SOLR-2802, have you? I'd like to get this in, but have not had time to fully investigate your base classes yet. Can we put this in as is and refactor later? If so, what parameter names should change in order to have the same external API after refactoring?
                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2599.patch, SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Updated] (SOLR-2599) CloneFieldUpdateProcessor (copyField-esque equivalent)

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2599:
---------------------------

    Summary: CloneFieldUpdateProcessor (copyField-esque equivalent)  (was: FieldCopy Update Processor)
    
> CloneFieldUpdateProcessor (copyField-esque equivalent)
> ------------------------------------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-2599-hoss.patch, SOLR-2599.patch, SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Updated] (SOLR-2599) FieldCopy Update Processor

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2599:
---------------------------

    Attachment: SOLR-2599-hoss.patch

Jan: inspired by your patch and tests, i hacked up a new version that incorporates all my previous comments...

* CloneFieldUpdateProcessorFactory 
** handles just the core field cloning
** source can be a simple field name, or the various "selector" style args from FieldMutatingUpdateProcessorFactory
* TruncateFieldUpdateProcessorFactory
** extends FieldMutatingUpdateProcessorFactory
** implements the 'max chars' style logic
* IgnoreFieldUpdateProcessorFactory
** extends FieldMutatingUpdateProcessorFactory
** removes fields from the document

...take a look at the javadocs and test case and lemme know what you think.  I'm pretty sure combinations of these three processors cover all of the examples from your test case.
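
As a rough sketch of how such a combination might look in solrconfig.xml, the "append as values to a multivalued field, with a size cap" case could be written as a clone step followed by a truncate step. The chain name is made up for illustration, and the parameter names ({{source}}, {{dest}}, {{fieldName}}, {{maxLength}}) are assumed from the javadocs of these factories, so they should be checked against the patch before relying on this:

{code}
<!-- hypothetical chain name, for illustration only -->
<updateRequestProcessorChain name="clone-then-truncate">
  <!-- copy the values of title and body into the multivalued field text -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="source">body</str>
    <str name="dest">text</str>
  </processor>
  <!-- cap each CharSequence value in text at 100 characters -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">text</str>
    <int name="maxLength">100</int>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

A move (rename) would presumably be the same kind of clone step followed by an IgnoreFieldUpdateProcessorFactory whose {{fieldName}} is the source field.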
                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-2599-hoss.patch, SOLR-2599.patch, SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Updated] (SOLR-2599) FieldCopy Update Processor

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2599:
------------------------------

    Attachment: SOLR-2599.patch

New patch. Renamed multival -> multiValued

Any comments on functionality, naming or conventions before I prepare for commit?
                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2599.patch, SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields



[jira] [Commented] (SOLR-2599) FieldCopy Update Processor

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281347#comment-13281347 ] 

Hoss Man commented on SOLR-2599:
--------------------------------

Jan:

I did not incorporate any sort of copy field equivalent in the SOLR-2802 work, but i did implement the "append" logic as a processor (see below)

Comments on your patch...

* my personal pref would be to use a slightly different name... (maybe "CloneFieldUpdateProcessor"?) to help differentiate it from {{<copyField/>}} and reduce the likelihood of confusion during casual discussion in email/irc (ie: "I'm copying field A to B..."; "wait, are you FieldCopy-ing or CopyField-ing?")
* as mentioned in SOLR-2825 + SOLR-3095, you shouldn't need to explicitly handle "enabled" in the individual processors
* i would eliminate the append, append.delim, and multiValued options and only support the multiValued=true behavior - if they want the append logic they can combine this processor with the ConcatFieldUpdateProcessorFactory
* instead of a "move=true" boolean config, i think it would be more clear what the behavior/alternatives are if we used an "action=clone|rename" config, with the default being "clone"
* instead of the simple whitespace separated "source" field name config, it would be nice if we could reuse the field name selector syntax options from FieldMutatingUpdateProcessorFactory (multiple fieldName, fieldRegex, typeName, and typeClass as well as excludes of any/all of those)
* need to think carefully about how maxChars should work:
** what if the source values aren't Strings? they could easily be numbers or dates, so it seems like a bad idea to convert them to strings just because they are copied/renamed.
** even if all we worry about is strings, should it be maxChars per value, maxChars per source field, or total maxChars in dest?
*** the specifics need to be documented
** personally: i would suggest ripping out the maxChars option and making it a distinct processor that can be configured later in the chain.  if we leave it in, then i think it's really important that it should be ignored or throw an error unless the value implements CharSequence, and not forcibly toString() every copied value. (so this processor will still be useful with numeric values)
* need to think carefully about field boosts:
** either we should try to preserve/combine them on move/copy, or we should make sure we explicitly blow them away
** either way we need to document it
** if i'm reading the patch correctly it currently obliterates the boost on the dest field in all cases, even if there are no source values to copy, and ignores any boost on any source field, but we should double check that.
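
To make the suggestion about dropping append/append.delim concrete, here is a sketch of how the fullname example from the original patch comment might look as two processors. It assumes the clone processor ends up registered as solr.CloneFieldUpdateProcessorFactory with {{source}}/{{dest}} params, and that ConcatFieldUpdateProcessorFactory takes {{fieldName}} and {{delimiter}}; the chain name is hypothetical:

{code}
<!-- hypothetical chain name, for illustration only -->
<updateRequestProcessorChain name="fullname-append">
  <!-- first collect both source values in the dest field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">lastname</str>
    <str name="source">firstname</str>
    <str name="dest">fullname</str>
  </processor>
  <!-- then join the collected values into one string -->
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">fullname</str>
    <str name="delimiter">, </str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}

The order of the joined values would follow whatever order the values land in fullname, which may not match the order the sources are listed in the config, so that would need checking.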
                
> FieldCopy Update Processor
> --------------------------
>
>                 Key: SOLR-2599
>                 URL: https://issues.apache.org/jira/browse/SOLR-2599
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-2599.patch, SOLR-2599.patch
>
>
> Need an UpdateProcessor which can copy and move fields
