Posted to solr-user@lucene.apache.org by Mark Allan <ma...@ed.ac.uk> on 2010/07/02 15:51:47 UTC
Modifications to AbstractSubTypeFieldType
Hi folks,
I've made a few small changes to the AbstractSubTypeFieldType class to
allow users to define distinct field types for each subfield. This
enables us to define complex data types in the schema.
For example, we have our own subclass of the CoordinateFieldType
called TemporalCoverage (I've spoken about this recently on the
mailing list) where we store a start and end date for an event but now
we can store a name for the event as well.
<fieldType name="temporal"
class="uk.ac.edina.solr.schema.TemporalCoverage" dimension="3"
subFieldSuffix="_ti,_ti,_s"/>
In this example, the start and end dates get stored as trie-coded
integers and the event name as a string.
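A minimal sketch of how a comma-separated subFieldSuffix value such as "_ti,_ti,_s" might be turned into one suffix per dimension. This is not the code from the patch. The class name SubFieldSuffixes is hypothetical. The rule that a single suffix is reused for every dimension is an assumption, chosen so that an existing solr.PointType declaration with one suffix would keep working.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Solr or of the patch discussed here.
final class SubFieldSuffixes {
    private SubFieldSuffixes() {}

    /** Returns exactly one suffix per dimension, e.g. "_ti,_ti,_s" with dimension 3. */
    static List<String> parse(String attr, int dimension) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension must be at least 1");
        }
        if (attr == null || attr.trim().isEmpty()) {
            throw new IllegalArgumentException("subFieldSuffix is required");
        }
        List<String> suffixes = new ArrayList<>();
        // A limit of -1 keeps trailing empty entries, so "_ti," is rejected below.
        for (String part : attr.split(",", -1)) {
            String s = part.trim();
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty suffix in \"" + attr + "\"");
            }
            suffixes.add(s);
        }
        // Assumption: a single suffix means "use it for every dimension" (PointType style).
        if (suffixes.size() == 1) {
            return Collections.nCopies(dimension, suffixes.get(0));
        }
        if (suffixes.size() != dimension) {
            throw new IllegalArgumentException("expected 1 or " + dimension
                    + " suffixes but found " + suffixes.size());
        }
        return Collections.unmodifiableList(suffixes);
    }
}
```

Rejecting a mismatched count at schema-load time, rather than at indexing time, would surface a typo in the fieldType declaration early.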
As usual, it's up to your own subclass to do sanity checking on the
input to ensure the right number and type of subfields are there in
the document field.
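As a rough illustration of the sanity checking a subclass might do, the following sketch splits a value such as "1914,1918, First World War" into its three parts and checks the count and the types. The class name TemporalValue and the start-not-after-end check are assumptions, not part of TemporalCoverage.

```java
// Hypothetical value object; not the TemporalCoverage code from this thread.
final class TemporalValue {
    final int start;
    final int end;
    final String name;

    private TemporalValue(int start, int end, String name) {
        this.start = start;
        this.end = end;
        this.name = name;
    }

    /** Parses "start,end, name", e.g. "1914,1918, First World War". */
    static TemporalValue parse(String raw) {
        // A limit of 3 keeps any commas inside the event name in the third part.
        String[] parts = raw.split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected start,end,name but got \"" + raw + "\"");
        }
        int start = parseYear(parts[0], raw);
        int end = parseYear(parts[1], raw);
        // Assumed extra check: the range must not run backwards.
        if (start > end) {
            throw new IllegalArgumentException("start after end in \"" + raw + "\"");
        }
        String name = parts[2].trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing event name in \"" + raw + "\"");
        }
        return new TemporalValue(start, end, name);
    }

    private static int parseYear(String s, String raw) {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("non-integer date in \"" + raw + "\"", e);
        }
    }
}
```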
We now store documents like this:
<doc>
<field name="id">15250</field>
<field name="name">Events of the 20th Century</field>
<field name="description">Film covering a variety of important
events in the 20th Century.</field>
<field name="daterange">1914,1918, First World War</field>
<field name="daterange1">1939,1945, Second World War</field>
<field name="daterange2">1957,1969, Space Race</field>
<field name="daterange3">1990,2000, random date</field>
...
</doc>
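Per-subfield suffixes matter because each subfield is indexed under its own generated name, and the suffix decides which dynamic field (and hence which field type, e.g. trie int for _ti or string for _s) picks it up. The sketch below assumes a naming scheme of field name, underscore, subfield index, then suffix (e.g. daterange1_0_ti). The exact names Solr generates may differ by version, and SubFieldExpander is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the "<field>_<index><suffix>" naming is an assumption.
final class SubFieldExpander {
    private SubFieldExpander() {}

    /** Maps each generated subfield name to its trimmed value, in subfield order. */
    static Map<String, String> expand(String fieldName, String rawValue, List<String> suffixes) {
        // Split into at most suffixes.size() parts; the last part keeps any extra commas.
        String[] parts = rawValue.split(",", suffixes.size());
        if (parts.length != suffixes.size()) {
            throw new IllegalArgumentException("expected " + suffixes.size()
                    + " subfields in \"" + rawValue + "\"");
        }
        Map<String, String> subfields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            subfields.put(fieldName + "_" + i + suffixes.get(i), parts[i].trim());
        }
        return subfields;
    }
}
```

With the suffixes _ti,_ti,_s, the daterange1 value above would expand to daterange1_0_ti = 1939, daterange1_1_ti = 1945 and daterange1_2_s = Second World War under this assumed naming.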
The changes to AbstractSubTypeFieldType do not have any adverse
effects on the solr.PointType class, so I'd quite like to suggest it
gets included in the main solr source code. Where can I send a patch
for someone to evaluate or should I just attach it to the issue in
JIRA and see what happens?
https://issues.apache.org/jira/browse/SOLR-1131
Mark
PS. Is the solr-dev mailing list dead? There's nothing in the archives
since April.
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Re: Modifications to AbstractSubTypeFieldType
Posted by Mark Allan <ma...@ed.ac.uk>.
On 3 Jul 2010, at 1:50 am, Chris Hostetter wrote:
> : The changes to AbstractSubTypeFieldType do not have any adverse effects on the
> : solr.PointType class, so I'd quite like to suggest it gets included in the
> : main solr source code. Where can I send a patch for someone to evaluate or
> : should I just attach it to the issue in JIRA and see what happens?
> : https://issues.apache.org/jira/browse/SOLR-1131
>
> please open a new Jira issue.
OK, done.
https://issues.apache.org/jira/browse/SOLR-1986
> I'm not too familiar with AbstractSubTypeFieldType, but your improvement
> sounds pretty good to me on the surface ... i'm just wondering if we
> should have a simpler way of specifying the suffix when dimension is
> really large.
Yes, I wondered that myself but wasn't sure which way to go.
I thought about something like this:
<fieldType name="temporal"
class="uk.ac.edina.solr.schema.TemporalCoverage" dimension="3">
<subFieldSuffix>_ti</subFieldSuffix>
<subFieldSuffix>_ti</subFieldSuffix>
<subFieldSuffix>_s</subFieldSuffix>
</fieldType>
but it doesn't really seem to help much. If anything, it probably
makes it *less* readable.
Mark
Re: Modifications to AbstractSubTypeFieldType
Posted by Chris Hostetter <ho...@fucit.org>.
: The changes to AbstractSubTypeFieldType do not have any adverse effects on the
: solr.PointType class, so I'd quite like to suggest it gets included in the
: main solr source code. Where can I send a patch for someone to evaluate or
: should I just attach it to the issue in JIRA and see what happens?
: https://issues.apache.org/jira/browse/SOLR-1131
please open a new Jira issue.
I'm not too familiar with AbstractSubTypeFieldType, but your improvement
sounds pretty good to me on the surface ... i'm just wondering if we
should have a simpler way of specifying the suffix when dimension is
really large.
-Hoss
Re: Modifications to AbstractSubTypeFieldType
Posted by Lance Norskog <go...@gmail.com>.
Compound types are young and will probably mutate. I will do my own
hack until things settle down.
Lance
On Mon, Jul 12, 2010 at 12:47 AM, Mark Allan <ma...@ed.ac.uk> wrote:
> On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
>>
>> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>>
>>> Originally, I had intended that it was just for one Field Sub Type,
>>> thinking that if we ever wanted multiple sub types, that a new, separate
>>> class would be needed
>>
>>
>> Right - this was my original thinking too. AbstractSubTypeFieldType
>> is only a convenience class to create compound types... people can do
>> it other ways.
>
> Just for clarification, does that mean my modifications won't be included?
> If so, can you let me know so that I can extract the changes and maintain
> them in a different package structure from the main Solr code please.
>
> Cheers
> Mark
--
Lance Norskog
goksron@gmail.com
Re: Modifications to AbstractSubTypeFieldType
Posted by Mark Allan <ma...@ed.ac.uk>.
On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll <gs...@apache.org> wrote:
>> Originally, I had intended that it was just for one Field Sub Type,
>> thinking that if we ever wanted multiple sub types, that a new,
>> separate class would be needed
>
>
> Right - this was my original thinking too. AbstractSubTypeFieldType
> is only a convenience class to create compound types... people can do
> it other ways.
Just for clarification, does that mean my modifications won't be
included? If so, can you let me know so that I can extract the
changes and maintain them in a different package structure from the
main Solr code please.
Cheers
Mark
Re: Modifications to AbstractSubTypeFieldType
Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed
Right - this was my original thinking too. AbstractSubTypeFieldType
is only a convenience class to create compound types... people can do
it other ways.
-Yonik
http://www.lucidimagination.com
Re: Modifications to AbstractSubTypeFieldType
Posted by Mark Allan <ma...@ed.ac.uk>.
Currently our only requirement is to be able to search on the
numerical part of the daterange field, so our field type overrides
getRangeQuery and getFieldQuery to consider only the first two
subfields. If we wanted to be able to search the name subfield as
well, I suppose we could check whether the user's search term
included any non-numeric characters and, if so, search the name
subfield instead.
I imagine, however, that it would be harder to include an ability to
search /specific/ subfields without inventing some new query syntax.
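The routing idea in the first paragraph above could look roughly like this: a purely numeric term goes to the two date subfields, anything else to the name subfield. SubFieldRouter and the subfield indices are hypothetical, and the query actually built against those subfields (inside getFieldQuery or getRangeQuery) is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical routing rule for a three-part temporal field (start, end, name).
final class SubFieldRouter {
    static final int START = 0;
    static final int END = 1;
    static final int NAME = 2;

    private SubFieldRouter() {}

    /** Returns the subfield indices a search term should be run against. */
    static List<Integer> targetSubfields(String term) {
        String t = term.trim();
        // Digits only: treat the term as a year and query the two date subfields.
        if (t.matches("\\d+")) {
            return Arrays.asList(START, END);
        }
        // Any non-numeric character (or an empty term): query the name subfield.
        return Collections.singletonList(NAME);
    }
}
```

A term such as "1945a" would go to the name subfield under this rule, which may or may not be what a user expects.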
Mark
On 7 Jul 2010, at 1:15 pm, Grant Ingersoll wrote:
> This looks reasonable. I'll take a look at the patch. Originally,
> I had intended that it was just for one Field Sub Type, thinking
> that if we ever wanted multiple sub types, that a new, separate
> class would be needed, but if this proves to be clean this way, then
> I see no reason not to incorporate it. Besides, I could see
> extending PointType, etc. to be NamedPointType, for example.
>
> I'm curious, Mark, how are you searching those fields? What types
> of queries are you generating?
>
> -Grant
>
> On Jul 2, 2010, at 9:51 AM, Mark Allan wrote:
>> [...]
Re: Modifications to AbstractSubTypeFieldType
Posted by Grant Ingersoll <gs...@apache.org>.
This looks reasonable. I'll take a look at the patch. Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed, but if this proves to be clean this way, then I see no reason not to incorporate it. Besides, I could see extending PointType, etc. to be NamedPointType, for example.
I'm curious, Mark, how are you searching those fields? What types of queries are you generating?
-Grant
On Jul 2, 2010, at 9:51 AM, Mark Allan wrote:
> [...]
Re: Modifications to AbstractSubTypeFieldType
Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Jul 2, 2010 at 9:51 AM, Mark Allan <ma...@ed.ac.uk> wrote:
[...]
> The changes to AbstractSubTypeFieldType do not have any adverse effects on
> the solr.PointType class, so I'd quite like to suggest it gets included in
> the main solr source code. Where can I send a patch for someone to evaluate
> or should I just attach it to the issue in JIRA and see what happens?
> https://issues.apache.org/jira/browse/SOLR-1131
Open up a new JIRA issue and attach your suggested patch.
> PS. Is the solr-dev mailing list dead? There's nothing in the archives since
> April.
Lucene and Solr have merged (one development project, two downloads).
The new dev mailing list is dev@lucene.apache.org (you should have
been subscribed to dev if you were previously subscribed to solr-dev).
http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/
-Yonik
http://www.lucidimagination.com