Posted to solr-dev@lucene.apache.org by "Will Johnson (JIRA)" <ji...@apache.org> on 2007/04/27 16:21:15 UTC

[jira] Created: (SOLR-217) schema option to ignore unused fields

schema option to ignore unused fields
-------------------------------------

                 Key: SOLR-217
                 URL: https://issues.apache.org/jira/browse/SOLR-217
             Project: Solr
          Issue Type: Improvement
          Components: update
    Affects Versions: 1.2
            Reporter: Will Johnson
            Priority: Minor
             Fix For: 1.2
         Attachments: ignoreUnnamedFields.patch

One thing that causes problems for me (and I assume others) is that Solr is schema-strict: unknown fields cause Solr to throw exceptions, and there is no way to relax this constraint.  This can cause all sorts of serious problems if you have automated feeding applications that do things like SELECT * FROM table1, or where you want to add other fields to the document for processing purposes before sending it to Solr but don't want to deal with 'cleanup'.
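
For reference, the client-side 'cleanup' this issue wants to avoid looks roughly like the sketch below: every feeder has to know the schema's field names and strip everything else before posting. The class, method, and field names here are made up for illustration; none of this is Solr code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical feeder-side helper; not part of Solr.
class FeedCleanupSketch {
    // Returns a copy of the row containing only the columns the schema declares.
    static Map<String, Object> keepKnownFields(Map<String, Object> row, Set<String> schemaFields) {
        Map<String, Object> doc = new LinkedHashMap<>(row);
        doc.keySet().retainAll(schemaFields); // drops every column the schema does not name
        return doc;
    }

    public static void main(String[] args) {
        Set<String> schemaFields = new HashSet<>(Arrays.asList("id", "title"));
        Map<String, Object> row = new LinkedHashMap<>();   // imagine one row of a SELECT * result
        row.put("id", 1);
        row.put("title", "hello");
        row.put("internal_audit_ts", "2007-04-27");        // a column the schema has never heard of
        System.out.println(keepKnownFields(row, schemaFields).keySet());
    }
}
```

The catch is that the set of known field names has to be kept in sync with schema.xml by hand in every feeding application, which is the maintenance burden a schema-side option would remove.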

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Updated: (SOLR-217) schema option to ignore unused fields

Posted by Chris Hostetter <ho...@fucit.org>.
And apparently i just replied to the wrong email, which just goes to show
i'm not in shape to run svn commit...

: as far as i'm concerned it's ready to commit ... but i've been sick, and i
: have moral objections to using my committer bit while:
:   1) sleep deprived
:   2) intoxicated
:   3) medicated
:   4) suffering from headaches
:
: ..since i've been 2 for 4 the past few days, i've been holding off.

	...

: Any update on this?  I'm one little * away from having a clean
: build/test.
:
: - will
:
: -----Original Message-----
: From: Hoss Man (JIRA) [mailto:jira@apache.org]
: Sent: Tuesday, May 01, 2007 7:42 PM
: To: solr-dev@lucene.apache.org
: Subject: [jira] Updated: (SOLR-217) schema option to ignore unused
: fields
:
:
:      [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
:
: Hoss Man updated SOLR-217:
: --------------------------
:
:     Attachment: ignoreUnnamedFields_v3.patch
:
: added a simple test to the existing patch.
:
: one thing to note is that this will result in the field being "ignored"
: if you try to query on it as well ... but this is a more general problem
: of what to do when people try to query on an unindexed field (see
: SOLR-223)
:
: will commit in a day or so barring objections
:



-Hoss


RE: [jira] Updated: (SOLR-217) schema option to ignore unused fields

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
Any update on this?  I'm one little * away from having a clean
build/test.

- will

-----Original Message-----
From: Hoss Man (JIRA) [mailto:jira@apache.org] 
Sent: Tuesday, May 01, 2007 7:42 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Updated: (SOLR-217) schema option to ignore unused
fields


     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-217:
--------------------------

    Attachment: ignoreUnnamedFields_v3.patch

added a simple test to the existing patch.

one thing to note is that this will result in the field being "ignored"
if you try to query on it as well ... but this is a more general problem
of what to do when people try to query on an unindexed field (see
SOLR-223)

will commit in a day or so barring objections



RE: [jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
So are you proposing that the DocumentBuilder check those properties on
the field before it adds the field or do we need to add checks
everywhere else to make sure nothing happens?  

I'm happy to make either change and resubmit the patch.

- will

-----Original Message-----
From: Erik Hatcher (JIRA) [mailto:jira@apache.org] 
Sent: Friday, April 27, 2007 12:11 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-217) schema option to ignore unused
fields


    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492332 ]

Erik Hatcher commented on SOLR-217:
-----------------------------------

I like Yonik's suggestion of allowing unstored+unindexed fields to be
no-op.



RE: [jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by Will Johnson <wj...@GETCONNECTED.COM>.
I agree, the default schema should preserve the strictness of the
existing core as it's already helped me figure out more than a few
problems.  Having the documented option to bypass that error is also
nice.  

Fyi:  the second patch does include a log.finest() message about
ignoring the field.  I wasn't sure what level would be appropriate but
that was the same level used in the rest of the class.
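
On the log level question: assuming the class logs through java.util.logging (which the log.finest() call suggests), FINEST sits below the default INFO threshold, so the "ignoring field" message stays invisible unless someone deliberately turns logging up. A small standalone check, using a made-up logger name:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Standalone illustration of java.util.logging levels; the logger name is hypothetical.
class FinestLevelSketch {
    // True if a message at the given level would pass this logger's level check.
    static boolean wouldLog(Logger log, Level level) {
        return log.isLoggable(level);
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch.DocumentBuilder");
        log.setLevel(Level.INFO);
        System.out.println("FINEST passes at INFO: " + wouldLog(log, Level.FINEST));
        log.setLevel(Level.FINEST);
        System.out.println("FINEST passes at FINEST: " + wouldLog(log, Level.FINEST));
    }
}
```

Note that passing the logger's check is not enough for the message to be printed; the attached handlers have their own levels, which would need lowering as well.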

- will

-----Original Message-----
From: J.J. Larrea (JIRA) [mailto:jira@apache.org] 
Sent: Friday, April 27, 2007 2:54 PM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-217) schema option to ignore unused
fields


    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492369 ]

J.J. Larrea commented on SOLR-217:
----------------------------------

+1 to Hoss' elaboration of Yonik's suggested approach, except that for
backward compatibility (where we DO want an error for unknown fields)
schema.xml should probably read something like:

   <!-- since fields of this type are by default not stored or indexed, any data added to
         them will be ignored outright
     -->
   <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
   ...
   <!-- uncomment the following to ignore any fields that don't already match an existing
          field name or dynamic field, rather than reporting them as an error.
          alternately, change the type="ignored" to some other type e.g. "text" if you want
          unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->




[jira] Updated: (SOLR-217) schema option to ignore unused fields

Posted by "Will Johnson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-217:
------------------------------

    Attachment: ignoreUnnamedFields_v3.patch

v3 patch included.  this version of the patch also takes into account the suggested example/solr/conf/schema.xml changes.  



[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "Will Johnson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492326 ] 

Will Johnson commented on SOLR-217:
-----------------------------------

i was actually taking this requirement from the other enterprise search
engines that i've worked with that do this by default.  i.e., solr is
different in this case.  your *->nothing method sounds good as well but it
doesn't seem as obvious to someone reading the schema or trying to feed
data.  you might also run into problems later on when there are other
properties for 'things to do' for fields other than indexing and searching.

- will






[jira] Updated: (SOLR-217) schema option to ignore unused fields

Posted by "Will Johnson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-217:
------------------------------

    Attachment: ignoreUnnamedFields.patch

the attached patch solves this problem by adding a new option to schema.xml to allow unnamed fields, including those that don't match dynamic fields, to be ignored.  the default is false if the attribute is missing, which is consistent with existing Solr functionality.  if you want to enable this feature the schema.xml would look like:

....  blah blah blah ...
<fields ignoreUnnamedFields="true">
....  blah blah blah ...
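
Roughly, the flag would gate unknown fields as in the sketch below. This is a simplified illustration and not the patch itself: the class and method names are invented, and dynamic field matching is left out to keep it short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of an ignoreUnnamedFields-style switch; not the attached patch.
class IgnoreUnnamedFieldsSketch {
    private final Set<String> knownFields;
    private final boolean ignoreUnnamedFields;

    IgnoreUnnamedFieldsSketch(Set<String> knownFields, boolean ignoreUnnamedFields) {
        this.knownFields = knownFields;
        this.ignoreUnnamedFields = ignoreUnnamedFields;
    }

    // Returns true if the field should be added to the document, false if it is silently skipped.
    // With the flag off (the default), an unknown field is still an error.
    boolean accept(String fieldName) {
        if (knownFields.contains(fieldName)) return true;
        if (ignoreUnnamedFields) return false;
        throw new IllegalArgumentException("unknown field '" + fieldName + "'");
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("id", "title"));
        IgnoreUnnamedFieldsSketch lenient = new IgnoreUnnamedFieldsSketch(known, true);
        System.out.println("title accepted: " + lenient.accept("title"));
        System.out.println("junk accepted: " + lenient.accept("junk"));
    }
}
```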



Re: [jira] Assigned: (SOLR-217) schema option to ignore unused fields

Posted by Ryan McKinley <ry...@gmail.com>.
Chris Hostetter wrote:
> as far as i'm concerned it's ready to commit ... but i've been sick, and i
> have moral objections to using my committer bit while:
>   1) sleep deprived
>   2) intoxicated
>   3) medicated
>   4) suffering from headaches
> 
> ..since i've been 2 for 4 the past few days, i've been holding off.
> 

I think we should add this to:
http://wiki.apache.org/solr/CommitPolicy

:)






Re: [jira] Assigned: (SOLR-217) schema option to ignore unused fields

Posted by Chris Hostetter <ho...@fucit.org>.
as far as i'm concerned it's ready to commit ... but i've been sick, and i
have moral objections to using my committer bit while:
  1) sleep deprived
  2) intoxicated
  3) medicated
  4) suffering from headaches

..since i've been 2 for 4 the past few days, i've been holding off.

: Date: Tue, 1 May 2007 16:42:15 -0700 (PDT)
: From: "Hoss Man (JIRA)" <ji...@apache.org>
: Reply-To: solr-dev@lucene.apache.org
: To: solr-dev@lucene.apache.org
: Subject: [jira] Assigned: (SOLR-217) schema option to ignore unused fields
:
:
:      [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
:
: Hoss Man reassigned SOLR-217:
: -----------------------------
:
:     Assignee: Hoss Man
:



-Hoss


[jira] Assigned: (SOLR-217) schema option to ignore unused fields

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reassigned SOLR-217:
-----------------------------

    Assignee: Hoss Man



[jira] Updated: (SOLR-217) schema option to ignore unused fields

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-217:
--------------------------

    Attachment: ignoreUnnamedFields_v3.patch

added a simple test to the existing patch.

one thing to note is that this will result in the field being "ignored" if you try to query on it as well ... but this is a more general problem of what to do when people try to query on an unindexed field (see SOLR-223)

will commit in a day or so barring objections



[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "Will Johnson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492664 ] 

Will Johnson commented on SOLR-217:
-----------------------------------

since we now have required fields (http://issues.apache.org/jira/browse/SOLR-181) any chance we can have ignored fields as well?  let me know if something else needs to be done to the patch but as far as i can tell the code works and people seem to agree that it's the correct approach.

- will



[jira] Updated: (SOLR-217) schema option to ignore unused fields

Posted by "Will Johnson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-217:
------------------------------

    Attachment: ignoreNonIndexedNonStoredField.patch

I like that solution and I can definitely see the advantages of having
dumb_*=ignored and so on.  How does this patch sound instead of the
previous:


public Field createField(SchemaField field, String externalVal, float boost) {
    String val;
    try {
      val = toInternal(externalVal);
    } catch (NumberFormatException e) {
      throw new SolrException(500, "Error while creating field '" + field + "' from value '" + externalVal + "'", e, false);
    }
    if (val==null) return null;
    if (!field.indexed() && !field.stored()) {
        log.finest("Ignoring unindexed/unstored field: " + field);
        return null;
    }

    ... blah blah blah....
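
To show the effect of that early return end to end, here is a self-contained sketch. It uses stand-in types rather than Solr's SchemaField or Lucene's Field, and it assumes the caller skips a null result instead of adding it to the document; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT Solr's SchemaField or Lucene's Field.
class IgnoredFieldSketch {
    static final class SchemaFieldStub {
        final String name;
        final boolean indexed;
        final boolean stored;

        SchemaFieldStub(String name, boolean indexed, boolean stored) {
            this.name = name;
            this.indexed = indexed;
            this.stored = stored;
        }
    }

    // Returns null for a field that is neither indexed nor stored, mirroring the early return above.
    static String createField(SchemaFieldStub field, String value) {
        if (value == null) return null;
        if (!field.indexed && !field.stored) return null; // nothing to index, nothing to store: no-op
        return field.name + "=" + value;
    }

    // Assumption: the caller drops null results, so an ignored field never reaches the document.
    static List<String> buildDocument(SchemaFieldStub[] fields, String[] values) {
        List<String> doc = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String created = createField(fields[i], values[i]);
            if (created != null) doc.add(created);
        }
        return doc;
    }

    public static void main(String[] args) {
        SchemaFieldStub id = new SchemaFieldStub("id", true, true);
        SchemaFieldStub scratch = new SchemaFieldStub("scratch", false, false);
        System.out.println(buildDocument(new SchemaFieldStub[] {id, scratch}, new String[] {"42", "x"}));
    }
}
```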


- will








[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492332 ] 

Erik Hatcher commented on SOLR-217:
-----------------------------------

I like Yonik's suggestion of allowing unstored+unindexed fields to be no-op.



[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492665 ] 

Yonik Seeley commented on SOLR-217:
-----------------------------------

Will, could you please add the last patch again, and click "Grant License to ASF"?





[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492357 ] 

Hoss Man commented on SOLR-217:
-------------------------------

whatever mechanism we may add for supporting something like this, the default if unspecified should definitely be an error ... if Solr is asked to index data it doesn't know what to do with, it should complain, rather than silently ignoring it ... this will help people with typos in their schema or indexing code find their problems faster.

As for the proposed solutions: my initial reaction to reading the comments so far was to agree with Will: having an explicit true/false option makes it much clearer to people reading the schema what's going on ... but in thinking about the possible use cases I prefer yonik's approach: leveraging the existing field/dynamicField syntax will allow people to not only say "any unknown field should be ignored" but also "field XXXX should be ignored" and "any unknown field that starts with S_* should be ignored"

(there's also the question as to what should happen if i did have a stored="true" dynamicField of "*" and i set ignoreUnnamedFields="true")


For the example config, we might want to do something like this to make it more obvious what's going on, and to serve as a recommended config style...

   <!-- since fields of this type are by default not stored or indexed, any data added to 
         them will be ignored outright
     -->
   <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
   ...
   <!-- ignore any fields that don't already match an existing field name or dynamic field -->
   <dynamicField name="*" type="ignored" />
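
To make the lookup order concrete, here is a rough Java sketch. It is not Solr's actual schema code, and the class and method names are hypothetical. It assumes patterns carry a single leading or trailing "*", and that longer patterns are tried before shorter ones so that a bare "*" only acts as a catch-all.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr's IndexSchema: resolves a field name to a field type name.
class FieldResolutionSketch {
    private final Map<String, String> explicitFields = new HashMap<>();
    // Each entry is {pattern, typeName}; a pattern has one leading or trailing '*'.
    private final List<String[]> dynamicFields = new ArrayList<>();

    void addField(String name, String type) {
        explicitFields.put(name, type);
    }

    void addDynamicField(String pattern, String type) {
        dynamicFields.add(new String[] {pattern, type});
        // Assumption: longer (more specific) patterns are tried first, so "*" is the last resort.
        dynamicFields.sort(Comparator.comparingInt((String[] d) -> -d[0].length()));
    }

    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(name);
    }

    // Explicit fields win, then dynamic patterns; with no match at all it is still an error.
    String resolveType(String name) {
        String type = explicitFields.get(name);
        if (type != null) return type;
        for (String[] d : dynamicFields) {
            if (matches(d[0], name)) return d[1];
        }
        throw new IllegalArgumentException("unknown field '" + name + "'");
    }

    public static void main(String[] args) {
        FieldResolutionSketch schema = new FieldResolutionSketch();
        schema.addField("id", "string");
        schema.addDynamicField("*_t", "text");
        schema.addDynamicField("*", "ignored");
        System.out.println(schema.resolveType("id"));
        System.out.println(schema.resolveType("title_t"));
        System.out.println(schema.resolveType("whatever"));
    }
}
```

Without the "*" entry the last lookup throws, which keeps the strict default; adding patterns such as "S_*" with the ignored type covers the narrower cases mentioned above.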





[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "J.J. Larrea (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492369 ] 

J.J. Larrea commented on SOLR-217:
----------------------------------

+1 to Hoss' elaboration of Yonik's suggested approach, except that for backward compatibility (where we DO want an error for unknown fields) schema.xml should probably read something like:

   <!-- since fields of this type are by default not stored or indexed, any data added to
         them will be ignored outright
     -->
   <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
   ...
   <!-- uncomment the following to ignore any fields that don't already match an existing
          field name or dynamic field, rather than reporting them as an error.
          alternately, change the type="ignored" to some other type e.g. "text" if you want
          unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->


> schema option to ignore unused fields
> -------------------------------------
>
>                 Key: SOLR-217
>                 URL: https://issues.apache.org/jira/browse/SOLR-217
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.2
>            Reporter: Will Johnson
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: ignoreNonIndexedNonStoredField.patch, ignoreUnnamedFields.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is schema-strict in that unknown fields cause solr to throw exceptions and there is no way to relax this constraint.  this can cause all sorts of serious problems if you have automated feeding applications that do things like SELECT * FROM table1 or where you want to add other fields to the document for processing purposes before sending them to solr but don't want to deal with 'cleanup'



[jira] Commented: (SOLR-217) schema option to ignore unused fields

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492324 ] 

Yonik Seeley commented on SOLR-217:
-----------------------------------

This is a unique enough requirement that I'm not sure an additional configuration switch is warranted.

However, another solution might be to allow fields to be unstored *and* unindexed (essentially doing nothing).  That would allow you to map a dynamic field of "*" to an unstored + unindexed field.
It would also allow people to transition schemas + older clients.  They could change the old field to unstored + unindexed and use a copyField to move it to the new field.
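
A rough sketch of that transition, assuming a hypothetical old field "title_old" that older clients still send, a new field "title", and existing "string" and "text" field types (none of these names come from the patch):

   <!-- old field: still accepted from older clients, but neither indexed nor stored itself -->
   <field name="title_old" type="string" indexed="false" stored="false" />
   <!-- new field that actually holds the data -->
   <field name="title" type="text" indexed="true" stored="true" />
   <!-- copy whatever older clients send in title_old into title -->
   <copyField source="title_old" dest="title" />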





[jira] Resolved: (SOLR-217) schema option to ignore unused fields

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-217.
---------------------------

    Resolution: Fixed

committed r536278

> schema option to ignore unused fields
> -------------------------------------
>
>                 Key: SOLR-217
>                 URL: https://issues.apache.org/jira/browse/SOLR-217
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.2
>            Reporter: Will Johnson
>         Assigned To: Hoss Man
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: ignoreNonIndexedNonStoredField.patch, ignoreUnnamedFields.patch, ignoreUnnamedFields_v3.patch, ignoreUnnamedFields_v3.patch
>
>
> One thing that causes problems for me (and i assume others) is that Solr is schema-strict in that unknown fields cause solr to throw exceptions and there is no way to relax this constraint.  this can cause all sorts of serious problems if you have automated feeding applications that do things like SELECT * FROM table1 or where you want to add other fields to the document for processing purposes before sending them to solr but don't want to deal with 'cleanup'
