Posted to dev@lucene.apache.org by "Edwards, Joshua" <Jo...@capitalone.com> on 2014/08/14 22:10:32 UTC

Processing on a numeric fieldType?

Hello -

I want to perform range searches on some numeric data.  The catch is that the numeric data is sometimes spelled out, i.e. "one hundred" instead of 100.  I have created a filter that converts the textual representation into a numeric one.  However, I can't add the filter to a numeric field, as numeric fields do not support filters, and if I store my data in a text field, then the range query doesn't work correctly (it compares values alphanumerically instead of numerically).  I also attempted to use a copyField, but it appears that the copy is performed before my processing occurs, so it throws an Exception because it attempts to copy "one hundred" into the numeric field instead of the processed value of 100.

Can anyone please advise me on how to work through this issue?

Thanks,
Josh Edwards
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed.  If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
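
[Editor's illustration] The range-query symptom described in the question above can be reproduced outside of Solr. The sketch below is plain Python, not Lucene or Solr code, and the sample values and function names are made up for illustration. It only shows why a range over strings and a range over integers select different values.

```python
# Toy illustration only: plain Python, not Lucene or Solr code.
# The sample values and helper names are hypothetical.
values = ["9", "25", "100", "1000"]


def in_text_range(value, low, high):
    """Range test on strings: compares character by character."""
    return low <= value <= high


def in_numeric_range(value, low, high):
    """Range test on integers: compares by magnitude."""
    return low <= int(value) <= high


# The "same" range of 10 to 200, once over strings and once over integers.
text_hits = [v for v in values if in_text_range(v, "10", "200")]
numeric_hits = [v for v in values if in_numeric_range(v, 10, 200)]

print(text_hits)     # ['100', '1000']
print(numeric_hits)  # ['25', '100']
```

The string comparison wrongly admits "1000" and rejects "25", which is the kind of mismatch a range query on a text field would show.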

RE: Processing on a numeric fieldType?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> BTW, I chatted with Uwe quite some time ago about the possibility of
> allowing numeric fields to allow analysis chains and it would be difficult. Trust
> me, I'll take his word for it ;).

The problem is mainly that the *Tokenizer* creates binary tokens that cannot be modified in TokenFilters. The only thing that could be added is a CharFilter (which runs before the Tokenizer). Solr does not currently support that on numeric fields, but it could.

> I wonder if Jack's idea would be easier to implement with something like
> FieldMutatingUpdateProcessor?

I would suggest doing this, too. Does this also work on the query side?
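
[Editor's illustration] The ordering described in this reply (a CharFilter runs before the Tokenizer, and the numeric Tokenizer emits binary tokens) can be modelled with a toy pipeline. This is a rough sketch in plain Python under loud assumptions: the function names are hypothetical, the word mapping is a two-entry stand-in, and the `struct` packing is an arbitrary placeholder rather than Lucene's actual numeric term encoding.

```python
import struct


def char_filter(raw_text):
    """Stage 1: rewrite raw characters before tokenization (toy mapping)."""
    replacements = {"one hundred": "100", "twelve": "12"}
    cleaned = raw_text.strip()
    return replacements.get(cleaned.lower(), cleaned)


def numeric_tokenizer(text):
    """Stage 2: parse the text as an int and emit one opaque binary token.

    The big-endian packing is a placeholder, not Lucene's real encoding.
    """
    return [struct.pack(">i", int(text))]


def analyze(raw_text):
    """Run the stages in order: the character rewrite first, then the tokenizer."""
    return numeric_tokenizer(char_filter(raw_text))


tokens = analyze("one hundred")
print(tokens)

# Without the character-level stage, the tokenizer never gets a parseable value.
try:
    numeric_tokenizer("one hundred")
    failed_without_char_filter = False
except ValueError:
    failed_without_char_filter = True
print(failed_without_char_filter)
```

In this model, anything placed after `numeric_tokenizer` would only see packed bytes, which is why the rewrite has to happen at the character stage or earlier, for example in an update processor.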

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Processing on a numeric fieldType?

Posted by Erick Erickson <er...@gmail.com>.
BTW, I chatted with Uwe quite some time ago about the possibility
of allowing numeric fields to allow analysis chains and it would be
difficult. Trust me, I'll take his word for it ;).

I wonder if Jack's idea would be easier to implement with something
like FieldMutatingUpdateProcessor?


Re: Processing on a numeric fieldType?

Posted by Jack Krupansky <ja...@basetechnology.com>.
And it would be worth filing a Jira for a “flexible numeric field type” in Lucene that could parse the common natural language representations of numbers. I mean, we’re just giving you WORKAROUNDS.

-- Jack Krupansky

RE: Processing on a numeric fieldType?

Posted by "Edwards, Joshua" <Jo...@capitalone.com>.
Thanks guys, I will try that today!

Josh Edwards

Re: Processing on a numeric fieldType?

Posted by Erik Hatcher <er...@gmail.com>.
And within an update script you can even call out to your own analysis just like Solr indexing does internally.  See slide #10 here: http://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks

	Erik


Re: Processing on a numeric fieldType?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Write an update request processor to massage the data as you see fit. It’s easy to write a JavaScript snippet with the stateless script update processor.

See plenty of examples in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky
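
[Editor's illustration] The kind of massaging suggested above could look roughly like the sketch below. It is plain Python rather than an actual Solr update script, and everything in it is an assumption made for illustration: the helper names `words_to_int` and `normalize_document`, the field names `amount_text` and `amount_i`, and the use of a dict as a stand-in for a document. It handles only simple English number phrases.

```python
# Hypothetical helpers; a plain dict stands in for a document. Not Solr API code.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1000, "million": 1000000}


def words_to_int(text):
    """Convert '42' or 'one hundred' to an int; raise ValueError otherwise."""
    cleaned = text.strip().lower().replace("-", " ")
    if cleaned.isdigit():
        return int(cleaned)
    tokens = [t for t in cleaned.split() if t != "and"]
    if not tokens:
        raise ValueError("empty number text")
    total = 0    # completed groups, e.g. the thousands
    current = 0  # the group being built, always below one thousand
    for token in tokens:
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = max(current, 1) * 100
        elif token in SCALES:
            total += max(current, 1) * SCALES[token]
            current = 0
        else:
            raise ValueError("unrecognized number word: " + token)
    return total + current


def normalize_document(doc, source_field, target_field):
    """Return a copy of doc with target_field holding the parsed integer."""
    updated = dict(doc)
    updated[target_field] = words_to_int(str(doc[source_field]))
    return updated


doc = {"id": "1", "amount_text": "one hundred"}
result = normalize_document(doc, "amount_text", "amount_i")
print(result["amount_i"])  # 100
```

An update processor only sees documents as they are indexed, so if users may also type spelled-out numbers in range queries, the same conversion would have to be applied to the query bounds before the query is sent.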
