Posted to solr-user@lucene.apache.org by eakarsu <ea...@gmail.com> on 2013/11/11 03:49:45 UTC

Unit of dimension for solr field

I would like to have a SOLR field that supports multiple units of dimension.
Suppose we store the memory size of a computer in a SOLR field. It can have
values such as 256 MB, 512 MB, or 1 GB, using MB and GB units. The same
applies to hard drive sizes: 256 MB, 50 GB, or 3 TB, using MB, GB and TB units.

How can I store these units of dimension along with the values themselves? I
would like to run range queries on such fields, e.g. "bring me desktops that
have 256M-1G of memory".

I would appreciate any guidance.

Thanks

Erol Akarsu



--
View this message in context: http://lucene.472066.n3.nabble.com/Unit-of-dimension-for-solr-field-tp4100209.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Unit of dimension for solr field

Posted by Erick Erickson <er...@gmail.com>.
Yep, doing this outside Solr at ingestion should be a simpler model if
you already have an external ingestion method. Otherwise a custom
update processor would be reasonably easy.

Best,
Erick


On Tue, Nov 12, 2013 at 8:04 AM, eakarsu <ea...@gmail.com> wrote:


Re: Unit of dimension for solr field

Posted by eakarsu <ea...@gmail.com>.
Erick,

I haven't written any SOLR plugin before, so it will take time to understand
the concepts.

This approach is simpler to implement, and I think it does not require writing
any SOLR plugin, right? An outside process parses the values with their units
and prepares the 2 fields as you described.

Erol Akarsu




Re: Unit of dimension for solr field

Posted by Erick Erickson <er...@gmail.com>.
You seem to be consistently missing the problem that your queries will not
work as expected. How would you do a range query without writing some kind
of custom code that looked at the payloads to determine the normalized
units?

The simplest way to do this is probably to have your ingestion side normalize
the values. Put the original (complete with units) in a field that has
indexed="false"; this will only be used for showing in the results list.

_Also_ add the normalized value to another field that you set to
indexed="true" and stored="false". That will allow range searches,
faceting, etc.
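A minimal sketch of the ingestion-side normalization described above, in Python. The field names memory_display and memory_bytes are hypothetical, and the sketch assumes binary multipliers (1 KB = 1024 bytes), which the thread does not settle. It only builds the document dictionary; sending it to Solr is left to whatever client the ingestion process already uses.

```python
import re

# Binary multipliers (1 KB = 1024 bytes) are an assumption; adjust if the
# source data uses decimal (SI) units instead.
MULTIPLIER = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Accepts forms such as "256 MB", "50GB", "3TB" or "256M" (trailing B optional).
SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*$", re.IGNORECASE)


def to_bytes(raw: str) -> int:
    """Parse a size string into an integer byte count."""
    match = SIZE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognised size: {raw!r}")
    number, prefix = match.groups()
    return int(float(number) * MULTIPLIER[prefix.upper()])


def build_doc(doc_id: str, raw_memory: str) -> dict:
    """Build a document carrying both the display value and the normalized value."""
    return {
        "id": doc_id,
        # Original string with units; meant for a field with indexed="false".
        "memory_display": raw_memory,
        # Normalized byte count; meant for a field with indexed="true", stored="false".
        "memory_bytes": to_bytes(raw_memory),
    }


print(build_doc("pc-1", "512 MB"))
```

A byte count for 3 TB (about 3.3 x 10^12) exceeds the range of a 32-bit int, so a normalized byte field like this would need a long type rather than an int type.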

HTH,
Erick


On Mon, Nov 11, 2013 at 2:36 PM, eakarsu <ea...@gmail.com> wrote:


Re: Unit of dimension for solr field

Posted by eakarsu <ea...@gmail.com>.
Can DelimitedPayloadTokenFilterFactory be used to store the unit of dimension
information? This factory class can store extra information (a payload) with
each token of a field.




Re: Unit of dimension for solr field

Posted by Jack Krupansky <ja...@basetechnology.com>.
A custom token filter may indeed be the right way to go, but an alternative 
is the combination of an update processor and a query preprocessor.

The update processor, which could be a JavaScript script, could normalize the
string into a simple integer byte count. You might also want to keep
separate fields, one for the raw string and one for the final byte count. A
JavaScript script would be a lot easier to develop than a custom token
filter.

A query preprocessor could do two things: first, apply the same string to byte
count normalization as the update processor; second, generate a range query.
So, for example, a query for 0.5 TB could match 512 GB, 500 GB, etc., with
[500000000000 TO 549755813888].
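A rough, application-level sketch of the query preprocessing step, in Python. It assumes a query value may mean either a decimal (SI) or a binary size and spans both readings, which is one way to get 0.5 TB to match both 500 GB (decimal) and 512 GB (binary). The field name memory_bytes is hypothetical, and the sketch assumes the index stores plain byte counts.

```python
import re
from decimal import Decimal

# Query values like "0.5 TB", "512GB" or "1 G"; a unit prefix is required.
SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT])B?\s*$", re.IGNORECASE)
POWER = {"K": 1, "M": 2, "G": 3, "T": 4}


def size_to_range_query(field: str, raw: str) -> str:
    """Turn a human-readable size into a byte-count range query.

    The lower bound reads the unit as decimal (1 KB = 1000 bytes) and the
    upper bound reads it as binary (1 KB = 1024 bytes), so both conventions match.
    """
    match = SIZE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognised size: {raw!r}")
    number = Decimal(match.group(1))  # Decimal avoids float rounding on inputs like "0.1"
    power = POWER[match.group(2).upper()]
    low = int(number * 1000**power)
    high = int(number * 1024**power)
    return f"{field}:[{low} TO {high}]"


print(size_to_range_query("memory_bytes", "0.5 TB"))
# prints memory_bytes:[500000000000 TO 549755813888]
```

Whether to span both conventions, or to pick one and widen the range by a tolerance instead, is a design choice the application would need to make.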

Technically, you could implement a query preprocessor as a Solr search
component plugin, but if that sounds like too much effort, an
application-level implementation would probably be easier to master.

-- Jack Krupansky

-----Original Message----- 
From: Ryan Cutter
Sent: Monday, November 11, 2013 10:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Unit of dimension for solr field



Re: Unit of dimension for solr field

Posted by eakarsu <ea...@gmail.com>.
Ryan and Upayavira,

Is there an example skeleton for doing this in schema.xml and solrconfig.xml?
Or an example Java class that would help me build the UnitResolvingFilterFactory
class?

Thanks

Erol Akarsu




Re: Unit of dimension for solr field

Posted by Ryan Cutter <ry...@gmail.com>.
I think Upayavira's suggestion of writing a filter factory fits what you're
asking for.  However, the other end of cleverness is to simply use
solr.TrieIntField and store everything in MB.  So for 1TB you'd
write 1048576.  A range query for 256MB to 1GB would be field:[256 TO 1024].
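A small Python sketch of the store-everything-in-MB idea, assuming binary units (1 GB = 1024 MB) and whole-number inputs. The helper names and the field name memory_mb are hypothetical. It converts both ends of a range such as the "256M-1G" from the original question into an MB range query.

```python
import re

# Whole-number sizes such as "256MB", "1 GB", "3TB" or "256M"; binary units assumed.
SIZE_PATTERN = re.compile(r"^\s*(\d+)\s*([MGT])B?\s*$", re.IGNORECASE)
MB_PER_UNIT = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_mb(raw: str) -> int:
    """Convert a size string to an integer number of megabytes."""
    match = SIZE_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognised size: {raw!r}")
    return int(match.group(1)) * MB_PER_UNIT[match.group(2).upper()]


def mb_range_query(field: str, size_range: str) -> str:
    """Build a range query from a "low-high" string such as "256M-1G"."""
    low_raw, high_raw = size_range.split("-")
    return f"{field}:[{to_mb(low_raw)} TO {to_mb(high_raw)}]"


print(to_mb("1TB"))                            # prints 1048576
print(mb_range_query("memory_mb", "256M-1G"))  # prints memory_mb:[256 TO 1024]
```

Counting in MB keeps values small: a 32-bit int holds up to 2,147,483,647 MB, roughly 2 PB, which is why an int field is enough here while a raw byte count would not be.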

Conversion from MB to your displayed unit (2TB, for example) would happen
in the application layer.  But using trie ints would be simple and
efficient.
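The application-layer conversion back to a display unit might look like the following Python sketch. It assumes binary units and simply picks the largest unit that keeps the value at or above 1, rounding to two decimal places; the function name is hypothetical.

```python
def mb_to_display(mb: int) -> str:
    """Format a megabyte count using the largest fitting unit (binary units assumed)."""
    for unit, size in (("TB", 1024 * 1024), ("GB", 1024)):
        if mb >= size:
            # Two decimals, then drop trailing zeros and a dangling decimal point.
            text = f"{mb / size:.2f}".rstrip("0").rstrip(".")
            return f"{text} {unit}"
    return f"{mb} MB"


for value in (256, 1536, 2 * 1024 * 1024):
    print(mb_to_display(value))  # prints 256 MB, 1.5 GB and 2 TB on separate lines
```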

- Ryan


On Mon, Nov 11, 2013 at 7:06 AM, eakarsu <ea...@gmail.com> wrote:


Re: Unit of dimension for solr field

Posted by eakarsu <ea...@gmail.com>.
Thanks, Upayavira.

That seems like too much work. I will have several more fields that will
have unit values.
Is there a quicker way of implementing it?

SOLR comes with a Currency field by default. Can we use it, creating a
conversion rate table for each field? What I am expecting from units is
similar to the currency field.

Erol Akarsu





Re: Unit of dimension for solr field

Posted by Upayavira <uv...@odoko.co.uk>.
It really depends upon how clever you want to be.

If I were to do it, I would push two versions into Solr, one with MB or
GB in it, for display, and another, resolved to a number, for faceting and
querying, i.e. do the work outside Solr.

If you did want to be clever, you could use a KeywordTokenizer in an
analysis chain (it spits out just a single token) and then write your
own UnitResolvingFilterFactory, which you could configure with mappings
such as KB->1024, and which spits out integer or float values.
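A real UnitResolvingFilterFactory would be a Java class built on Lucene's token filter API; the Python sketch below only illustrates the per-token rewriting logic such a filter might apply. The class name, the parser for the KB->1024 mapping syntax and the zero-padding are all hypothetical choices. The padding is there because the indexed terms of an analyzed text field compare as strings, so unpadded numbers would sort "9" after "10" and would likely break range queries.

```python
import re


class UnitResolver:
    """Hypothetical sketch of the token rewrite a custom unit-resolving
    filter could perform; this is not Lucene's API."""

    TOKEN_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")

    def __init__(self, mapping: str, width: int = 20) -> None:
        # Parse a mapping such as "KB->1024,MB->1048576" into {"KB": 1024, ...}.
        self.multipliers = {}
        for entry in mapping.split(","):
            unit, factor = entry.split("->")
            self.multipliers[unit.strip().upper()] = int(factor)
        self.width = width

    def resolve(self, token: str) -> str:
        # With KeywordTokenizer the whole value, e.g. "512 MB", arrives as one token.
        match = self.TOKEN_PATTERN.match(token)
        if match is None:
            return token  # leave unrecognised input untouched
        number, unit = match.groups()
        factor = self.multipliers.get(unit.upper())
        if factor is None:
            return token
        # Truncate to a whole number and zero-pad so string order equals numeric order.
        return str(int(float(number) * factor)).zfill(self.width)


resolver = UnitResolver("KB->1024,MB->1048576,GB->1073741824")
print(resolver.resolve("512 MB"))
```

Since query strings go through the same analysis, the bounds of a range query would be rewritten the same way before matching.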

This should then work for querying, as querystring terms would be
analysed. It would be neat because the stored field values would include
the pretty units, whilst the indexed values would be pure numbers. You
could use the field for range faceting, but you would get the indexed
value, i.e. without the units, as faceting uses the indexed value, not
the stored one.

Upayavira

On Mon, Nov 11, 2013, at 02:49 AM, eakarsu wrote: