Posted to solr-dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2010/02/13 07:09:27 UTC

[jira] Created: (SOLR-1772) UpdateProcessor to prune "empty" values

UpdateProcessor to prune "empty" values
---------------------------------------

                 Key: SOLR-1772
                 URL: https://issues.apache.org/jira/browse/SOLR-1772
             Project: Solr
          Issue Type: Wish
            Reporter: Hoss Man


Users seem to frequently get confused when some FieldTypes (typically the numeric ones) complain about invalid field values when they inadvertently index an empty string.

It would be cool to provide an UpdateProcessor that makes it easy to strip out any fields being added as empty values. It could be configured using field (and/or field type) names or globs to select/ignore certain fields. I haven't thought it through all that hard.
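
To make the idea concrete, here is a minimal sketch of the pruning logic in plain Java. It deliberately does not use Solr's UpdateProcessor API: the class name EmptyValuePruner, the Map-based document model, and the "*_i" style globs are all hypothetical, and "empty" is assumed to mean null or a zero-length string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only: a document is modeled as field name -> values.
class EmptyValuePruner {
    private final List<Pattern> selectors = new ArrayList<>();

    // Each glob selects field names to prune; '*' matches any run of characters.
    EmptyValuePruner(String... fieldGlobs) {
        for (String glob : fieldGlobs) {
            selectors.add(globToPattern(glob));
        }
    }

    // Translate a simple glob into a regex, quoting everything except '*'.
    static Pattern globToPattern(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    private boolean isSelected(String fieldName) {
        for (Pattern p : selectors) {
            if (p.matcher(fieldName).matches()) {
                return true;
            }
        }
        return false;
    }

    // Assumption: "empty" means null or a zero-length character sequence.
    private static boolean isEmpty(Object value) {
        return value == null
            || (value instanceof CharSequence && ((CharSequence) value).length() == 0);
    }

    // Returns a copy of the document with empty values removed from selected
    // fields; a selected field left with no values is dropped entirely.
    Map<String, List<Object>> prune(Map<String, List<Object>> doc) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> entry : doc.entrySet()) {
            if (!isSelected(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
                continue;
            }
            List<Object> kept = new ArrayList<>();
            for (Object value : entry.getValue()) {
                if (!isEmpty(value)) {
                    kept.add(value);
                }
            }
            if (!kept.isEmpty()) {
                result.put(entry.getKey(), kept);
            }
        }
        return result;
    }
}
```

With a pruner built as new EmptyValuePruner("*_i"), a document whose price_i field holds only "" would lose that field, while an empty title field would pass through untouched because it matches no selector.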

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1772) UpdateProcessor to prune "empty" values

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836173#action_12836173 ] 

Yonik Seeley commented on SOLR-1772:
------------------------------------

Going forward this should be less of an issue since the new numeric types all do validation on their input.



[jira] Commented: (SOLR-1772) UpdateProcessor to prune "empty" values

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836933#action_12836933 ] 

Hoss Man commented on SOLR-1772:
--------------------------------

Actually, my point is that the new FieldTypes make it *more* of an issue (in the eyes of end users), because now Solr errors out on empty (numeric) field values. Having an UpdateProcessor like this would be an easy solution for people who just want a simple way to tell Solr to ignore empty fields (with certain names, or certain types).



[jira] Commented: (SOLR-1772) UpdateProcessor to prune "empty" values

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839812#action_12839812 ] 

Hoss Man commented on SOLR-1772:
--------------------------------

bq. I'd almost rather see the default behavior changed rather than to put another configurable component in the chain that would slow things down (slightly) for everyone.

That seems backwards -- if FieldType(s) start checking for the empty string, that's a few extra cycles of cost that everyone spends even if their indexing clients are already well behaved and only send "real" values.

Adding it as an optional UpdateProcessor makes it something that only people who need hand-holding have to spend cycles on.

bq. ... confused that the empty string was being indexed at all, for fields that aren't even numbers. They thought this was equivalent to not sending it any value. I haven't verified this first hand but I believe it.

Nope: there are many use cases for both strings and numbers where you may need to "skip" a value in a multiValued field -- parallel arrays and such. ... it's actually one of the main situations we still have where IntField comes in handy (besides just supporting completely legacy Lucene indexes)
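
A small sketch of the parallel-arrays case, in plain Java rather than Solr code: the class ParallelValues, its methods, and the sample names below are hypothetical. It shows how an empty string acts as a placeholder that keeps two multi-valued lists aligned by position, and how pruning it shifts later values onto the wrong entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: two multi-valued fields used as parallel arrays.
class ParallelValues {
    // Look up the value stored at the same position as the given key.
    // Returns null if the key is absent or the value list is too short.
    static String valueFor(List<String> keys, List<String> values, String key) {
        int index = keys.indexOf(key);
        if (index < 0 || index >= values.size()) {
            return null;
        }
        return values.get(index);
    }

    // Drop zero-length strings, the way a blanket "prune empty values" step would.
    static List<String> pruneEmpty(List<String> values) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            if (value != null && !value.isEmpty()) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

Given names [alice, bob, carol, dave] and nicknames [al, "", caz, d], the lookup for carol finds caz. After pruneEmpty removes bob's placeholder, the same lookup finds d instead, which is why pruning would need to be opt-in per field rather than universal.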



[jira] Commented: (SOLR-1772) UpdateProcessor to prune "empty" values

Posted by "David Smiley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837939#action_12837939 ] 

David Smiley commented on SOLR-1772:
------------------------------------

I was just talking to a Solr user who was confused that the empty string was being indexed at all, for fields that aren't even numbers.  They thought this was equivalent to not sending it any value.  I haven't verified this first hand but I believe it.  Perhaps it would make sense for all field types to ignore an empty string by default, and then, if you needed to for some weird reason I can't contemplate, you could disable that behavior for a certain field.  Does that make sense?



[jira] Commented: (SOLR-1772) UpdateProcessor to prune "empty" values

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838008#action_12838008 ] 

Yonik Seeley commented on SOLR-1772:
------------------------------------

David - I think that could make sense.  I'd almost rather see the default behavior changed rather than to put another configurable component in the chain that would slow things down (slightly) for everyone.
