Posted to solr-user@lucene.apache.org by Mark Allan <ma...@ed.ac.uk> on 2010/07/02 15:51:47 UTC

Modifications to AbstractSubTypeFieldType

Hi folks,

I've made a few small changes to the AbstractSubTypeFieldType class to  
allow users to define distinct field types for each subfield.  This  
enables us to define complex data types in the schema.

For example, we have our own subclass of the CoordinateFieldType  
called TemporalCoverage (I've spoken about this recently on the  
mailing list) where we store a start and end date for an event but now  
we can store a name for the event as well.

<fieldType name="temporal"  
class="uk.ac.edina.solr.schema.TemporalCoverage" dimension="3"  
subFieldSuffix="_ti,_ti,_s"/>

In this example, the start and end dates get stored as trie-coded
integers and the event name as a string.
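
A rough sketch of how a comma-separated subFieldSuffix value might be
resolved against dimension. This is plain Java with no Solr
dependencies and is not the code in the patch; the class and method
names are hypothetical, and the rule that a single suffix applies to
every subfield is an assumption meant to mirror the existing
single-suffix form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr or of the patch.
final class SubFieldSuffixes {

    /** Resolves a subFieldSuffix attribute value into one suffix per dimension. */
    static List<String> resolve(String attribute, int dimension) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension must be at least 1");
        }
        List<String> suffixes = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings, so "_ti," is rejected below.
        for (String part : attribute.split(",", -1)) {
            String suffix = part.trim();
            if (suffix.isEmpty()) {
                throw new IllegalArgumentException("empty suffix in: " + attribute);
            }
            suffixes.add(suffix);
        }
        // Assumption: one suffix on its own is reused for every subfield.
        if (suffixes.size() == 1) {
            return Collections.nCopies(dimension, suffixes.get(0));
        }
        if (suffixes.size() != dimension) {
            throw new IllegalArgumentException(
                "expected " + dimension + " suffixes but got " + suffixes.size());
        }
        return suffixes;
    }

    public static void main(String[] args) {
        System.out.println(resolve("_ti,_ti,_s", 3));
    }
}
```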

As usual, it's up to your own subclass to do sanity checking on the  
input to ensure the right number and type of subfields are there in  
the document field.

We now store documents like this:
<doc>
   <field name="id">15250</field>
   <field name="name">Events of the 20th Century</field>
   <field name="description">Film covering a variety of important  
events in the 20th Century.</field>
   <field name="daterange">1914,1918, First World War</field>
   <field name="daterange1">1939,1945, Second World War</field>
   <field name="daterange2">1957,1969, Space Race</field>
   <field name="daterange3">1990,2000, random date</field>
   ...
</doc>
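
To illustrate the kind of sanity checking a subclass might do on a
value such as "1914,1918, First World War", here is a minimal sketch
in plain Java. It is not the TemporalCoverage implementation; the
class name is hypothetical, and the rules it enforces (exactly three
parts, integer years, start not after end, non-empty name) are
assumptions.

```java
// Hypothetical value holder for illustration; not the actual TemporalCoverage class.
final class TemporalCoverageValue {
    final int start;
    final int end;
    final String name;

    private TemporalCoverageValue(int start, int end, String name) {
        this.start = start;
        this.end = end;
        this.name = name;
    }

    /** Parses "start,end,name" and rejects input with the wrong number or type of parts. */
    static TemporalCoverageValue parse(String raw) {
        // A limit of 3 leaves any further commas inside the name part.
        String[] parts = raw.split(",", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected start,end,name but got: " + raw);
        }
        int start;
        int end;
        try {
            start = Integer.parseInt(parts[0].trim());
            end = Integer.parseInt(parts[1].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("start and end must be integers: " + raw, e);
        }
        // Assumed rule: a range may not end before it starts.
        if (start > end) {
            throw new IllegalArgumentException("start is after end: " + raw);
        }
        String name = parts[2].trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing name: " + raw);
        }
        return new TemporalCoverageValue(start, end, name);
    }

    public static void main(String[] args) {
        TemporalCoverageValue v = parse("1914,1918, First World War");
        System.out.println(v.start + " to " + v.end + ": " + v.name);
    }
}
```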

The changes to AbstractSubTypeFieldType do not have any adverse
effects on the solr.PointType class, so I'd quite like to suggest they
get included in the main Solr source code.  Where can I send a patch
for someone to evaluate, or should I just attach it to the issue in
JIRA and see what happens?
	https://issues.apache.org/jira/browse/SOLR-1131

Mark

PS. Is the solr-dev mailing list dead? There's nothing in the archives  
since April.


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: Modifications to AbstractSubTypeFieldType

Posted by Mark Allan <ma...@ed.ac.uk>.
On 3 Jul 2010, at 1:50 am, Chris Hostetter wrote:

> : The changes to AbstractSubTypeFieldType do not have any adverse effects on the
> : solr.PointType class, so I'd quite like to suggest it gets included in the
> : main solr source code.  Where can I send a patch for someone to evaluate or
> : should I just attach it to the issue in JIRA and see what happens?
> : 	https://issues.apache.org/jira/browse/SOLR-1131
>
> please open a new Jira issue.

OK, done.
https://issues.apache.org/jira/browse/SOLR-1986

> I'm not too familiar with AbstractSubTypeFieldType, but your improvement
> sounds pretty good to me on the surface ... i'm just wondering if we
> should have a simpler way of specifying the suffix when dimension is
> really large.

Yes, I wondered that myself but wasn't sure which way to go.

I thought about something like this:

<fieldType name="temporal"  
class="uk.ac.edina.solr.schema.TemporalCoverage" dimension="3">
	<subFieldSuffix>_ti</subFieldSuffix>
	<subFieldSuffix>_ti</subFieldSuffix>
	<subFieldSuffix>_s</subFieldSuffix>
</fieldType>

but it doesn't really seem to help much. If anything, it probably  
makes it *less* readable.

Mark



Re: Modifications to AbstractSubTypeFieldType

Posted by Chris Hostetter <ho...@fucit.org>.
: The changes to AbstractSubTypeFieldType do not have any adverse effects on the
: solr.PointType class, so I'd quite like to suggest it gets included in the
: main solr source code.  Where can I send a patch for someone to evaluate or
: should I just attach it to the issue in JIRA and see what happens?
: 	https://issues.apache.org/jira/browse/SOLR-1131

please open a new Jira issue.

I'm not too familiar with AbstractSubTypeFieldType, but your improvement 
sounds pretty good to me on the surface ... i'm just wondering if we 
should have a simpler way of specifying the suffix when dimension is 
really large.


-Hoss


Re: Modifications to AbstractSubTypeFieldType

Posted by Lance Norskog <go...@gmail.com>.
Compound types are young and will probably mutate. I will do my own
hack until things settle down.

Lance

On Mon, Jul 12, 2010 at 12:47 AM, Mark Allan <ma...@ed.ac.uk> wrote:
> On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
>>
>> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll <gs...@apache.org>
>> wrote:
>>>
>>> Originally, I had intended that it was just for one Field Sub Type,
>>> thinking that if we ever wanted multiple sub types, that a new, separate
>>> class would be needed
>>
>>
>> Right - this was my original thinking too.  AbstractSubTypeFieldType
>> is only a convenience class to create compound types... people can do
>> it other ways.
>
> Just for clarification, does that mean my modifications won't be included?
>  If so, can you let me know so that I can extract the changes and maintain
> them in a different package structure from the main Solr code please.
>
> Cheers
> Mark



-- 
Lance Norskog
goksron@gmail.com

Re: Modifications to AbstractSubTypeFieldType

Posted by Mark Allan <ma...@ed.ac.uk>.
On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>> Originally, I had intended that it was just for one Field Sub Type,  
>> thinking that if we ever wanted multiple sub types, that a new,  
>> separate class would be needed
>
>
> Right - this was my original thinking too.  AbstractSubTypeFieldType
> is only a convenience class to create compound types... people can do
> it other ways.

Just for clarification, does that mean my modifications won't be  
included?  If so, can you let me know so that I can extract the  
changes and maintain them in a different package structure from the  
main Solr code please.

Cheers
Mark



Re: Modifications to AbstractSubTypeFieldType

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed


Right - this was my original thinking too.  AbstractSubTypeFieldType
is only a convenience class to create compound types... people can do
it other ways.

-Yonik
http://www.lucidimagination.com

Re: Modifications to AbstractSubTypeFieldType

Posted by Mark Allan <ma...@ed.ac.uk>.
Currently our only requirement is to be able to search on the
numerical part of the daterange field, so our field type overrides
getRangeQuery and getFieldQuery to consider only the first two
subfields.  If we wanted to be able to search the name subfield as
well, I suppose we could check whether the user's search term
includes any non-numeric characters and, if so, search the name
subfield instead.
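
A minimal sketch of that check in plain Java, assuming a term made up
only of ASCII digits should go to the date subfields and anything else
to the name subfield. The class, enum and method names are
hypothetical; this is not what our getFieldQuery override currently
does.

```java
// Hypothetical routing helper for illustration; no Solr classes are involved.
final class SubFieldRouter {

    enum Target { DATE_SUBFIELDS, NAME_SUBFIELD }

    /** Picks the subfields to search based on whether the term is purely numeric. */
    static Target route(String term) {
        String trimmed = term.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            // Any character outside 0-9 sends the term to the name subfield.
            if (c < '0' || c > '9') {
                return Target.NAME_SUBFIELD;
            }
        }
        return Target.DATE_SUBFIELDS;
    }

    public static void main(String[] args) {
        System.out.println(route("1939"));
        System.out.println(route("Space Race"));
    }
}
```

Note that a plain digit test also sends terms such as -500 or
1914-1918 to the name subfield, so a real field type would need to
decide how to treat those.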

I imagine, however, that it would be harder to include an ability to  
search /specific/ subfields without inventing some new query syntax.

Mark

On 7 Jul 2010, at 1:15 pm, Grant Ingersoll wrote:

> This looks reasonable.  I'll take a look at the patch.  Originally,  
> I had intended that it was just for one Field Sub Type, thinking  
> that if we ever wanted multiple sub types, that a new, separate  
> class would be needed, but if this proves to be clean this way, then  
> I see no reason not to incorporate it.  Besides, I could see  
> extending PointType, etc. to be NamedPointType, for example.
>
> I'm curious, Mark, how are you searching those fields?  What types  
> of queries are you generating?
>
> -Grant
>




Re: Modifications to AbstractSubTypeFieldType

Posted by Grant Ingersoll <gs...@apache.org>.
This looks reasonable.  I'll take a look at the patch.  Originally, I had intended that it was just for one Field Sub Type, thinking that if we ever wanted multiple sub types, that a new, separate class would be needed, but if this proves to be clean this way, then I see no reason not to incorporate it.  Besides, I could see extending PointType, etc. to be NamedPointType, for example.

I'm curious, Mark, how are you searching those fields?  What types of queries are you generating?

-Grant



Re: Modifications to AbstractSubTypeFieldType

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Jul 2, 2010 at 9:51 AM, Mark Allan <ma...@ed.ac.uk> wrote:
[...]
> The changes to AbstractSubTypeFieldType do not have any adverse effects on
> the solr.PointType class, so I'd quite like to suggest it gets included in
> the main solr source code.  Where can I send a patch for someone to evaluate
> or should I just attach it to the issue in JIRA and see what happens?
>        https://issues.apache.org/jira/browse/SOLR-1131

Open up a new JIRA issue and attach your suggested patch.

> PS. Is the solr-dev mailing list dead? There's nothing in the archives since
> April.

Lucene and Solr have merged (one development project, two downloads).
The new dev mailing list is dev@lucene.apache.org (you should have
been subscribed to dev if you were previously subscribed to solr-dev).

http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/

-Yonik
http://www.lucidimagination.com