Posted to solr-user@lucene.apache.org by Koushik Mitra <Ko...@infosys.com> on 2009/04/28 10:17:00 UTC

Getting incorrect value while trying to extract content from xlsx

Hi,

I was trying to extract content from an xlsx file for indexing.
However, I am getting a Julian date value for a cell with a date format, and '1.0' in place of '100%'.
I want to retain the values as they appear in the xlsx file.

Any solution would be appreciated.

Thanks,
Koushik

**************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely 
for the use of the addressee(s). If you are not the intended recipient, please 
notify the sender by e-mail and delete the original message. Further, you are not 
to copy, disclose, or distribute this e-mail or its contents to any other person and 
any such actions are unlawful. This e-mail may contain viruses. Infosys has taken 
every reasonable precaution to minimize this risk, but is not liable for any damage 
you may sustain as a result of any virus in this e-mail. You should carry out your 
own virus checks before opening the e-mail or attachment. Infosys reserves the 
right to monitor and review the content of all messages sent to or from this e-mail 
address. Messages sent to or from this e-mail address may be stored on the 
Infosys e-mail system.
***INFOSYS******** End of Disclaimer ********INFOSYS***

Re: Getting incorrect value while trying to extract content from xlsx

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Koushik,

You didn't say much about how you are doing the extraction. Note that Solr itself doesn't extract anything from spreadsheets, even though it has a component (known as Solr Cell) that provides an interface for it. The actual extraction is done by a tool called Tika or, more precisely, by POI, which Tika uses for Office formats. Both are separate Apache projects. Asking on their mailing lists may get you to a solution faster.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Getting incorrect value while trying to extract content from xlsx

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
How are you indexing it? A sample of the CSV file would be helpful.
Note that while the CSV update handler is very convenient and very
fast, it doesn't offer much in the way of data massaging or
transformation, so you may need to pre-format the data for Solr
ingestion, or write a programmatic indexer that does this.

	Erik
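
For anyone going the pre-formatting route, here is a minimal sketch of what that step could look like. It is not taken from Solr, Tika, or POI; the function names are hypothetical, and it assumes the workbook uses Excel's default 1900 date system, that the date serial is 61 or higher, and that the percentage cell uses a plain '0%' format. Under those assumptions, the raw number extracted for a date cell is a day count, and '100%' is stored as the fraction 1.0.

```python
from datetime import date, timedelta

# Assumption: 1900 date system. Using 1899-12-30 as day 0 absorbs Excel's
# fictitious 1900-02-29, so this is only valid for serials >= 61.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_iso_date(serial):
    """Convert an Excel serial day number to a YYYY-MM-DD string.

    Any fractional part (time of day) is discarded.
    """
    return (EXCEL_EPOCH + timedelta(days=int(serial))).isoformat()


def fraction_to_percent(value):
    """Render a stored fraction such as 1.0 as a whole-number percentage."""
    return f"{round(value * 100)}%"


if __name__ == "__main__":
    print(serial_to_iso_date(39931))   # serial for the day this thread started
    print(fraction_to_percent(1.0))
```

Values converted this way can then be written to the CSV (or sent by a programmatic indexer) in the form you want Solr to store. If the cells use other number formats, the conversion would need to follow those formats instead.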
