You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2011/12/14 19:39:54 UTC

ExternalFileFields and FloatType

All:

Currently, there's this code in ExternalFileField.init

    String ftypeS = getArg("valType", args);
    if (ftypeS!=null) {
      ftype = schema.getFieldTypes().get(ftypeS);
      if (ftype==null || !(ftype instanceof FloatField)) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Only float (FloatField) is currently supported as external field
type.  got '" + ftypeS + "'");
      }
    }


Defining an EFF with valType="pfloat" works, but pfloat is old, really
old. Solr throws an error on "float", which is a TrieFloat type. But
now that Trie fields support sortMissingFirst/Last, does it make sense
to allow the "float" type too? I'm guessing here that
sortMissingFirst/Last is why EFFs are restricted this way, but I
haven't really dug into it.

So as a proof-of-concept, I hacked this in to see if it would work and
"it seems to work just fine" (tm) but I didn't want to look any deeper
until I ran it by folks.

      if (ftype==null ||
          (! (ftype instanceof TrieFloatField) && !(ftype instanceof
FloatField))) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Only float (FloatField) is currently supported as external field
type.  got '" + ftypeS + "'");
      }

Should I raise a JIRA? And anybody got any cautions about this?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: ExternalFileFields and FloatType

Posted by Chris Hostetter <ho...@fucit.org>.
: valType is NOT optional at all, at least in the 3x code line.
: You get errors like this on startup if you leave it out:
: 
: Dec 14, 2011 2:07:48 PM org.apache.solr.common.SolrException log
: SEVERE: org.apache.solr.common.SolrException: Missing parameter
: 'valType' for FieldType=eff_float{keyField=id, defVal=1}
: 	at org.apache.solr.schema.FieldType.getArg(FieldType.java:121)

ah, interesting .. it looks like it was intended to be optional, but 
"getArg" fails if it's not specified (note that ExternalFileField 
explicitly checks if valType is not null, but doesn't error if it is.

: So I think I now propose to remove that check altogether. The code should
: still work whether the valType is specified or not  and make note that it
: ignored for now. This is really no change in behavior, just documenting what
: happens currently.
: 
: I'm assuming that we should still work with schemas that define it rather
: than break someone's schema if it is there by, for instance, using a 3.5
: schema file with valType in an EFF field with new code and the proposed
: change.

exactly.  I would just change that valType parsing to stop using the 
getArg method (and pull it from the params directly) and add a comment 
about this conversation and the hypothetical use for valType in the 
future.

I guess the main take away is that if someone specifies valType and it 
does *not* point to a FloatField, we should still probably error (don't 
want them assuming they can use ints or doubles or longs or dates or 
strings etc...) but in all our examples / docs we should just stop 
bothering to advertise valType.



-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: ExternalFileFields and FloatType

Posted by Chris Hostetter <ho...@fucit.org>.
: But that's just it. There's no way for the EFF to point to any underlying type
: at all! Still, I can easily be persuaded that it'd be a bad thing to

I'm talking about configuration.

* this should be valid, because it has been valid in the past and we don't 
want to break existing configs (and it doesn't hurt anything to let people 
be explicit that they expect a 'float' style ValueSource)...

  <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
  <fieldType name="file" keyField="id" defVal="1" 
             class="solr.ExternalFileField" valType="float"/>

* This should *not* be valid, because it has not been valid in the past 
and implies it will do something special that it will not really do...

  <fieldType name="long" class="solr.TrieLongField" omitNorms="true"/>
  <fieldType name="file" keyField="id" defVal="1"
             class="solr.ExternalFileField" valType="long"/>

* This should be valid (yes that means we need to change the init code) 
and should be what we change all of the examples to be, since valType is a 
No-Op anyway so let's stop advertising it...

  <fieldType name="file" keyField="id" defVal="1" 
             class="solr.ExternalFileField" />

: make valType optional and document it as a no-op currently. We'll still error if
: they make it anything except FloatField or TrieFloatField.

...or just stop documenting valType entirely.  No reason to document it 
until it actually does something useful.



-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: ExternalFileFields and FloatType

Posted by Erick Erickson <er...@gmail.com>.
"I guess the main take away is that if someone specifies valType and it
does *not* point to a FloatField, we should still probably error"

But that's just it. There's no way for the EFF to point to any underlying type
at all! Still, I can easily be persuaded that it'd be a bad thing to
allow, say, a
"string" type (which I just tested and it works just fine BTW) and later
pull the rug out from under someone if we get more strict.

I think I understand how this came to pass. There are tests (see schema11.xml
and TestFunctionQuery) that should have tripped this trigger. But schema11.xml
has
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
where more recent schemas have
<fieldType name="float" class="solr.TrieFloatField" omitNorms="true"/>

And since TrieFloatField isn't derived from FloatField, things go boom.

So, the take-away here is that my original hack is actually kind of
OK, but we need to
make valType optional and document it as a no-op currently. We'll still error if
they make it anything except FloatField or TrieFloatField.

I'll open a JIRA real soon now.


On Wed, Dec 14, 2011 at 2:54 PM, Erick Erickson <er...@gmail.com> wrote:
> valType is NOT optional at all, at least in the 3x code line.
> You get errors like this on startup if you leave it out:
>
> Dec 14, 2011 2:07:48 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Missing parameter
> 'valType' for FieldType=eff_float{keyField=id, defVal=1}
>        at org.apache.solr.schema.FieldType.getArg(FieldType.java:121)
>
>
> and Solr goes downhill from there.
>
> And now I see why I prompt with questions before diving in *too* far.
> Your comments made me look at the rest of the class (I mean it's
> another 25 lines down in the file, after all....) and there's this little
> gem of a method:
>
>  @Override
>  public ValueSource getValueSource(SchemaField field, QParser parser) {
>    // default key field to unique key
>    SchemaField keyField = keyFieldName==null ?
> schema.getUniqueKeyField() : schema.getField(keyFieldName);
>    return new FileFloatSource(field, keyField, defVal, parser);
>  }
>
> So the valType is, as you say, completely irrelevant because a FileFloatSource
> is returned no matter what.
>
> So I think I now propose to remove that check altogether. The code should
> still work whether the valType is specified or not  and make note that it
> ignored for now. This is really no change in behavior, just documenting what
> happens currently.
>
> I'm assuming that we should still work with schemas that define it rather
> than break someone's schema if it is there by, for instance, using a 3.5
> schema file with valType in an EFF field with new code and the proposed
> change.
>
>
> On Wed, Dec 14, 2011 at 1:59 PM, Chris Hostetter
> <ho...@fucit.org> wrote:
>>
>> :       if (ftype==null || !(ftype instanceof FloatField)) {
>> :         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>> : "Only float (FloatField) is currently supported as external field
>> : type.  got '" + ftypeS + "'");
>>        ...
>> : now that Trie fields support sortMissingFirst/Last, does it make sense
>> : to allow the "float" type too? I'm guessing here that
>> : sortMissingFirst/Last is why EFFs are restricted this way, but I
>> : haven't really dug into it.
>>
>> I don't think i've ever looked closely at the internals of this field type
>> before, but I'm not sure that it really matters what field types are
>> allowed there -- it's not used for anything as far as i can tell.  I can't
>> see any reason why the "valType" param exists at all (let alone the
>> insistence that it refer to a FloatField) the ExternalFileField completely
>> ignores the ftype once it has it.
>>
>> (The fact that valType is totally optional suggests that the limitation
>> isn't really that big of a deal anyway)
>>
>> I suspect the intent was that ExternalFileField *should* ultimatley be
>> able to return any "type" of ValueSource, and that the specified valType
>> would then dictate the ValueSource impl returned (so that if you used an
>> "IntField" you would get a ValueSource that "natively" dealt with int
>> values from the file, but if you used "DoubleField" it would natively
>> deal with double values.
>>
>> changing that validation code to also accept TrieFloatField
>> doesn't seem like it would practically make any different w/o also
>> changing a lot about how the actaul value source is returned -- either
>> by adding more File_____Source classes or my generalizing FileFloatSource.
>> barring actual changes to those classes, a better way to deal with the "no
>> one uses FloatField anymore so this limitation is confusing" problem is to
>> just stop promoting/suggestion people use the valType at all (until the
>> day comes when it's actually useful for something) and focus on
>> documenting the fact that ExternalFileFields reads and returns
>> values of type "float" from the specified file.
>>
>>
>>
>> -Hoss
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: ExternalFileFields and FloatType

Posted by Erick Erickson <er...@gmail.com>.
valType is NOT optional at all, at least in the 3x code line.
You get errors like this on startup if you leave it out:

Dec 14, 2011 2:07:48 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Missing parameter
'valType' for FieldType=eff_float{keyField=id, defVal=1}
	at org.apache.solr.schema.FieldType.getArg(FieldType.java:121)


and Solr goes downhill from there.

And now I see why I prompt with questions before diving in *too* far.
Your comments made me look at the rest of the class (I mean it's
another 25 lines down in the file, after all....) and there's this little
gem of a method:

  @Override
  public ValueSource getValueSource(SchemaField field, QParser parser) {
    // default key field to unique key
    SchemaField keyField = keyFieldName==null ?
schema.getUniqueKeyField() : schema.getField(keyFieldName);
    return new FileFloatSource(field, keyField, defVal, parser);
  }

So the valType is, as you say, completely irrelevant because a FileFloatSource
is returned no matter what.

So I think I now propose to remove that check altogether. The code should
still work whether the valType is specified or not  and make note that it
ignored for now. This is really no change in behavior, just documenting what
happens currently.

I'm assuming that we should still work with schemas that define it rather
than break someone's schema if it is there by, for instance, using a 3.5
schema file with valType in an EFF field with new code and the proposed
change.


On Wed, Dec 14, 2011 at 1:59 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> :       if (ftype==null || !(ftype instanceof FloatField)) {
> :         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> : "Only float (FloatField) is currently supported as external field
> : type.  got '" + ftypeS + "'");
>        ...
> : now that Trie fields support sortMissingFirst/Last, does it make sense
> : to allow the "float" type too? I'm guessing here that
> : sortMissingFirst/Last is why EFFs are restricted this way, but I
> : haven't really dug into it.
>
> I don't think i've ever looked closely at the internals of this field type
> before, but I'm not sure that it really matters what field types are
> allowed there -- it's not used for anything as far as i can tell.  I can't
> see any reason why the "valType" param exists at all (let alone the
> insistence that it refer to a FloatField) the ExternalFileField completely
> ignores the ftype once it has it.
>
> (The fact that valType is totally optional suggests that the limitation
> isn't really that big of a deal anyway)
>
> I suspect the intent was that ExternalFileField *should* ultimatley be
> able to return any "type" of ValueSource, and that the specified valType
> would then dictate the ValueSource impl returned (so that if you used an
> "IntField" you would get a ValueSource that "natively" dealt with int
> values from the file, but if you used "DoubleField" it would natively
> deal with double values.
>
> changing that validation code to also accept TrieFloatField
> doesn't seem like it would practically make any different w/o also
> changing a lot about how the actaul value source is returned -- either
> by adding more File_____Source classes or my generalizing FileFloatSource.
> barring actual changes to those classes, a better way to deal with the "no
> one uses FloatField anymore so this limitation is confusing" problem is to
> just stop promoting/suggestion people use the valType at all (until the
> day comes when it's actually useful for something) and focus on
> documenting the fact that ExternalFileFields reads and returns
> values of type "float" from the specified file.
>
>
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: ExternalFileFields and FloatType

Posted by Chris Hostetter <ho...@fucit.org>.
:       if (ftype==null || !(ftype instanceof FloatField)) {
:         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
: "Only float (FloatField) is currently supported as external field
: type.  got '" + ftypeS + "'");
	...
: now that Trie fields support sortMissingFirst/Last, does it make sense
: to allow the "float" type too? I'm guessing here that
: sortMissingFirst/Last is why EFFs are restricted this way, but I
: haven't really dug into it.

I don't think i've ever looked closely at the internals of this field type 
before, but I'm not sure that it really matters what field types are 
allowed there -- it's not used for anything as far as i can tell.  I can't 
see any reason why the "valType" param exists at all (let alone the 
insistence that it refer to a FloatField) the ExternalFileField completely 
ignores the ftype once it has it.

(The fact that valType is totally optional suggests that the limitation 
isn't really that big of a deal anyway)

I suspect the intent was that ExternalFileField *should* ultimatley be 
able to return any "type" of ValueSource, and that the specified valType 
would then dictate the ValueSource impl returned (so that if you used an 
"IntField" you would get a ValueSource that "natively" dealt with int 
values from the file, but if you used "DoubleField" it would natively 
deal with double values.

changing that validation code to also accept TrieFloatField 
doesn't seem like it would practically make any different w/o also 
changing a lot about how the actaul value source is returned -- either 
by adding more File_____Source classes or my generalizing FileFloatSource.  
barring actual changes to those classes, a better way to deal with the "no 
one uses FloatField anymore so this limitation is confusing" problem is to 
just stop promoting/suggestion people use the valType at all (until the 
day comes when it's actually useful for something) and focus on 
documenting the fact that ExternalFileFields reads and returns 
values of type "float" from the specified file.



-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org