You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2011/12/14 19:39:54 UTC
ExternalFileFields and FloatType
All:
Currently, there's this code in ExternalFileField.init
String ftypeS = getArg("valType", args);
if (ftypeS!=null) {
ftype = schema.getFieldTypes().get(ftypeS);
if (ftype==null || !(ftype instanceof FloatField)) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Only float (FloatField) is currently supported as external field
type. got '" + ftypeS + "'");
}
}
Defining an EFF with valType="pfloat" works, but pfloat is old, really
old. Solr throws an error on "float", which is a TrieFloat type. But
now that Trie fields support sortMissingFirst/Last, does it make sense
to allow the "float" type too? I'm guessing here that
sortMissingFirst/Last is why EFFs are restricted this way, but I
haven't really dug into it.
So as a proof-of-concept, I hacked this in to see if it would work and
"it seems to work just fine" (tm) but I didn't want to look any deeper
until I ran it by folks.
if (ftype==null ||
(! (ftype instanceof TrieFloatField) && !(ftype instanceof
FloatField))) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
"Only float (FloatField) is currently supported as external field
type. got '" + ftypeS + "'");
}
Should I raise a JIRA? And anybody got any cautions about this?
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: ExternalFileFields and FloatType
Posted by Chris Hostetter <ho...@fucit.org>.
: valType is NOT optional at all, at least in the 3x code line.
: You get errors like this on startup if you leave it out:
:
: Dec 14, 2011 2:07:48 PM org.apache.solr.common.SolrException log
: SEVERE: org.apache.solr.common.SolrException: Missing parameter
: 'valType' for FieldType=eff_float{keyField=id, defVal=1}
: at org.apache.solr.schema.FieldType.getArg(FieldType.java:121)
ah, interesting .. it looks like it was intended to be optional, but
"getArg" fails if it's not specified (note that ExternalFileField
explicitly checks if valType is not null, but doesn't error if it is.
: So I think I now propose to remove that check altogether. The code should
: still work whether the valType is specified or not and make note that it
: ignored for now. This is really no change in behavior, just documenting what
: happens currently.
:
: I'm assuming that we should still work with schemas that define it rather
: than break someone's schema if it is there by, for instance, using a 3.5
: schema file with valType in an EFF field with new code and the proposed
: change.
exactly. I would just change that valType parsing to stop using the
getArg method (and pull it from the params directly) and add a comment
about this conversation and the hypothetical use for valType in the
future.
I guess the main take away is that if someone specifies valType and it
does *not* point to a FloatField, we should still probably error (don't
want them assuming they can use ints or doubles or longs or dates or
strings etc...) but in all our examples / docs we should just stop
bothering to advertise valType.
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: ExternalFileFields and FloatType
Posted by Chris Hostetter <ho...@fucit.org>.
: But that's just it. There's no way for the EFF to point to any underlying type
: at all! Still, I can easily be persuaded that it'd be a bad thing to
I'm talking about configuration.
* this should be valid, because it has been valid in the past and we don't
want to break existing configs (and it doesn't hurt anything to let people
be explicit that they expect a 'float' style ValueSource)...
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="file" keyField="id" defVal="1"
class="solr.ExternalFileField" valType="float"/>
* This should *not* be valid, because it has not been valid in the past
and implies it will do something special that it will not really do...
<fieldType name="long" class="solr.TrieLongField" omitNorms="true"/>
<fieldType name="file" keyField="id" defVal="1"
class="solr.ExternalFileField" valType="long"/>
* This should be valid (yes that means we need to change the init code)
and should be what we change all of the examples to be, since valType is a
No-Op anyway so let's stop advertising it...
<fieldType name="file" keyField="id" defVal="1"
class="solr.ExternalFileField" />
: make valType optional and document it as a no-op currently. We'll still error if
: they make it anything except FloatField or TrieFloatField.
...or just stop documenting valType entirely. No reason to document it
until it actually does something useful.
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: ExternalFileFields and FloatType
Posted by Erick Erickson <er...@gmail.com>.
"I guess the main take away is that if someone specifies valType and it
does *not* point to a FloatField, we should still probably error"
But that's just it. There's no way for the EFF to point to any underlying type
at all! Still, I can easily be persuaded that it'd be a bad thing to
allow, say, a
"string" type (which I just tested and it works just fine BTW) and later
pull the rug out from under someone if we get more strict.
I think I understand how this came to pass. There are tests (see schema11.xml
and TestFunctionQuery) that should have tripped this trigger. But schema11.xml
has
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
where more recent schemas have
<fieldType name="float" class="solr.TrieFloatField" omitNorms="true"/>
And since TrieFloatField isn't derived from FloatField, things go boom.
So, the take-away here is that my original hack is actually kind of
OK, but we need to
make valType optional and document it as a no-op currently. We'll still error if
they make it anything except FloatField or TrieFloatField.
I'll open a JIRA real soon now.
On Wed, Dec 14, 2011 at 2:54 PM, Erick Erickson <er...@gmail.com> wrote:
> valType is NOT optional at all, at least in the 3x code line.
> You get errors like this on startup if you leave it out:
>
> Dec 14, 2011 2:07:48 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Missing parameter
> 'valType' for FieldType=eff_float{keyField=id, defVal=1}
> at org.apache.solr.schema.FieldType.getArg(FieldType.java:121)
>
>
> and Solr goes downhill from there.
>
> And now I see why I prompt with questions before diving in *too* far.
> Your comments made me look at the rest of the class (I mean it's
> another 25 lines down in the file, after all....) and there's this little
> gem of a method:
>
> @Override
> public ValueSource getValueSource(SchemaField field, QParser parser) {
> // default key field to unique key
> SchemaField keyField = keyFieldName==null ?
> schema.getUniqueKeyField() : schema.getField(keyFieldName);
> return new FileFloatSource(field, keyField, defVal, parser);
> }
>
> So the valType is, as you say, completely irrelevant because a FileFloatSource
> is returned no matter what.
>
> So I think I now propose to remove that check altogether. The code should
> still work whether the valType is specified or not and make note that it
> ignored for now. This is really no change in behavior, just documenting what
> happens currently.
>
> I'm assuming that we should still work with schemas that define it rather
> than break someone's schema if it is there by, for instance, using a 3.5
> schema file with valType in an EFF field with new code and the proposed
> change.
>
>
> On Wed, Dec 14, 2011 at 1:59 PM, Chris Hostetter
> <ho...@fucit.org> wrote:
>>
>> : if (ftype==null || !(ftype instanceof FloatField)) {
>> : throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
>> : "Only float (FloatField) is currently supported as external field
>> : type. got '" + ftypeS + "'");
>> ...
>> : now that Trie fields support sortMissingFirst/Last, does it make sense
>> : to allow the "float" type too? I'm guessing here that
>> : sortMissingFirst/Last is why EFFs are restricted this way, but I
>> : haven't really dug into it.
>>
>> I don't think i've ever looked closely at the internals of this field type
>> before, but I'm not sure that it really matters what field types are
>> allowed there -- it's not used for anything as far as i can tell. I can't
>> see any reason why the "valType" param exists at all (let alone the
>> insistence that it refer to a FloatField) the ExternalFileField completely
>> ignores the ftype once it has it.
>>
>> (The fact that valType is totally optional suggests that the limitation
>> isn't really that big of a deal anyway)
>>
>> I suspect the intent was that ExternalFileField *should* ultimatley be
>> able to return any "type" of ValueSource, and that the specified valType
>> would then dictate the ValueSource impl returned (so that if you used an
>> "IntField" you would get a ValueSource that "natively" dealt with int
>> values from the file, but if you used "DoubleField" it would natively
>> deal with double values.
>>
>> changing that validation code to also accept TrieFloatField
>> doesn't seem like it would practically make any different w/o also
>> changing a lot about how the actaul value source is returned -- either
>> by adding more File_____Source classes or my generalizing FileFloatSource.
>> barring actual changes to those classes, a better way to deal with the "no
>> one uses FloatField anymore so this limitation is confusing" problem is to
>> just stop promoting/suggestion people use the valType at all (until the
>> day comes when it's actually useful for something) and focus on
>> documenting the fact that ExternalFileFields reads and returns
>> values of type "float" from the specified file.
>>
>>
>>
>> -Hoss
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: ExternalFileFields and FloatType
Posted by Erick Erickson <er...@gmail.com>.
valType is NOT optional at all, at least in the 3x code line.
You get errors like this on startup if you leave it out:
Dec 14, 2011 2:07:48 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Missing parameter
'valType' for FieldType=eff_float{keyField=id, defVal=1}
at org.apache.solr.schema.FieldType.getArg(FieldType.java:121)
and Solr goes downhill from there.
And now I see why I prompt with questions before diving in *too* far.
Your comments made me look at the rest of the class (I mean it's
another 25 lines down in the file, after all....) and there's this little
gem of a method:
@Override
public ValueSource getValueSource(SchemaField field, QParser parser) {
// default key field to unique key
SchemaField keyField = keyFieldName==null ?
schema.getUniqueKeyField() : schema.getField(keyFieldName);
return new FileFloatSource(field, keyField, defVal, parser);
}
So the valType is, as you say, completely irrelevant because a FileFloatSource
is returned no matter what.
So I think I now propose to remove that check altogether. The code should
still work whether the valType is specified or not and make note that it
ignored for now. This is really no change in behavior, just documenting what
happens currently.
I'm assuming that we should still work with schemas that define it rather
than break someone's schema if it is there by, for instance, using a 3.5
schema file with valType in an EFF field with new code and the proposed
change.
On Wed, Dec 14, 2011 at 1:59 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : if (ftype==null || !(ftype instanceof FloatField)) {
> : throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> : "Only float (FloatField) is currently supported as external field
> : type. got '" + ftypeS + "'");
> ...
> : now that Trie fields support sortMissingFirst/Last, does it make sense
> : to allow the "float" type too? I'm guessing here that
> : sortMissingFirst/Last is why EFFs are restricted this way, but I
> : haven't really dug into it.
>
> I don't think i've ever looked closely at the internals of this field type
> before, but I'm not sure that it really matters what field types are
> allowed there -- it's not used for anything as far as i can tell. I can't
> see any reason why the "valType" param exists at all (let alone the
> insistence that it refer to a FloatField) the ExternalFileField completely
> ignores the ftype once it has it.
>
> (The fact that valType is totally optional suggests that the limitation
> isn't really that big of a deal anyway)
>
> I suspect the intent was that ExternalFileField *should* ultimatley be
> able to return any "type" of ValueSource, and that the specified valType
> would then dictate the ValueSource impl returned (so that if you used an
> "IntField" you would get a ValueSource that "natively" dealt with int
> values from the file, but if you used "DoubleField" it would natively
> deal with double values.
>
> changing that validation code to also accept TrieFloatField
> doesn't seem like it would practically make any different w/o also
> changing a lot about how the actaul value source is returned -- either
> by adding more File_____Source classes or my generalizing FileFloatSource.
> barring actual changes to those classes, a better way to deal with the "no
> one uses FloatField anymore so this limitation is confusing" problem is to
> just stop promoting/suggestion people use the valType at all (until the
> day comes when it's actually useful for something) and focus on
> documenting the fact that ExternalFileFields reads and returns
> values of type "float" from the specified file.
>
>
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: ExternalFileFields and FloatType
Posted by Chris Hostetter <ho...@fucit.org>.
: if (ftype==null || !(ftype instanceof FloatField)) {
: throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
: "Only float (FloatField) is currently supported as external field
: type. got '" + ftypeS + "'");
...
: now that Trie fields support sortMissingFirst/Last, does it make sense
: to allow the "float" type too? I'm guessing here that
: sortMissingFirst/Last is why EFFs are restricted this way, but I
: haven't really dug into it.
I don't think i've ever looked closely at the internals of this field type
before, but I'm not sure that it really matters what field types are
allowed there -- it's not used for anything as far as i can tell. I can't
see any reason why the "valType" param exists at all (let alone the
insistence that it refer to a FloatField) the ExternalFileField completely
ignores the ftype once it has it.
(The fact that valType is totally optional suggests that the limitation
isn't really that big of a deal anyway)
I suspect the intent was that ExternalFileField *should* ultimatley be
able to return any "type" of ValueSource, and that the specified valType
would then dictate the ValueSource impl returned (so that if you used an
"IntField" you would get a ValueSource that "natively" dealt with int
values from the file, but if you used "DoubleField" it would natively
deal with double values.
changing that validation code to also accept TrieFloatField
doesn't seem like it would practically make any different w/o also
changing a lot about how the actaul value source is returned -- either
by adding more File_____Source classes or my generalizing FileFloatSource.
barring actual changes to those classes, a better way to deal with the "no
one uses FloatField anymore so this limitation is confusing" problem is to
just stop promoting/suggestion people use the valType at all (until the
day comes when it's actually useful for something) and focus on
documenting the fact that ExternalFileFields reads and returns
values of type "float" from the specified file.
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org