Posted to user@poi.apache.org by crane <80...@qq.com> on 2015/08/14 10:06:40 UTC

sax get nothing but numbers from a large excel2007 and sharedStrings.xml is empty

Hi, 

I am using the Apache POI SAX (event) API to process an xlsx file (167 MB,
around 7 lakh, i.e. 700,000, records).

I want to upload this data to a database. However, the parser returns nothing
but numbers. Strings cannot be read unless the xlsx file is edited and saved
first. I have a lot of xlsx files like this to process every day.

As far as I know, strings are stored in sharedStrings.xml.
However, sharedStrings.xml is only 1 KB before I edit and save the xlsx file.
<http://apache-poi.1045710.n5.nabble.com/file/n5719797/before.jpg> 

After that, it's 90.8 KB.
<http://apache-poi.1045710.n5.nabble.com/file/n5719797/after.jpg> 

Could anyone please help me resolve this issue?

Thanks in advance. 



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/sax-get-nothing-but-numbers-from-a-large-excel2007-and-sharedStrings-xml-is-empty-tp5719797.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: sax get nothing but numbers from a large excel2007 and sharedStrings.xml is empty

Posted by crane <80...@qq.com>.
Thank you very much! In fact, you helped me a lot.

With your explanation, I now understand the design philosophy of Excel and
the callback mechanism of SAX.

As I am rather busy at work, I have solved this problem temporarily with a VB
script. I will try to implement another interface to handle the inline
notification, following your guidance, when I am free.

Thank you! Good luck!




--
View this message in context: http://apache-poi.1045710.n5.nabble.com/sax-get-nothing-but-numbers-from-a-large-excel2007-and-sharedStrings-xml-is-empty-tp5719797p5719829.html


Re: sax get nothing but numbers from a large excel2007 and sharedStrings.xml is empty

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Thanks for letting us know you found a solution to the problem.

Also, I realised that I did not answer one of your questions. The shared
strings table grew in size once you had opened the file and re-saved it using
Excel because Excel would have populated the shared strings table. Typically,
Excel will seek to minimise the file size, and one of the ways it does this is
to remove any duplicated strings by utilising the shared strings table.

It should be quite possible to use POI to get at inline strings. As I said,
I have no direct experience with the streaming API but have used parsers in
the past. Typically, there is a class that handles the markup via a callback
mechanism. Methods will be called when the opening and closing tags of an
element are encountered. It would be possible to create logic within that
class to watch for the inline notification and then act accordingly; i.e. not
look in the shared strings table but read the value directly from the relevant
element.
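
To make the idea above concrete, here is a minimal sketch of such a callback
class. It uses only the JDK's SAX classes, not POI itself, and the class and
method names (InlineStringSketch, readCells) are invented for illustration.
It assumes that inline cells are marked with t="inlineStr" on the c element
and carry their text in an is/t child, that the markup uses no namespace
prefix, and it does not resolve shared string indexes at all.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects one text value per cell, reading inline strings directly. */
class InlineStringSketch extends DefaultHandler {
    final List<String> cells = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String cellType;    // value of the t attribute on the current <c>, may be null
    private boolean capturing;  // true while inside <v> or inside an inline <t>

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("c")) {
            // A new cell starts: remember its type and reset the text buffer.
            cellType = atts.getValue("t");
            text.setLength(0);
        } else if (qName.equals("v")
                || (qName.equals("t") && "inlineStr".equals(cellType))) {
            // <v> holds numbers (or a shared string index, not resolved here);
            // <t> inside an inline-string cell holds the text itself.
            capturing = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("v") || qName.equals("t")) {
            capturing = false;
        } else if (qName.equals("c")) {
            // The cell is complete: record whatever text was collected.
            cells.add(text.toString());
            cellType = null;
        }
    }

    /** Parses a fragment of sheet markup and returns the collected cell values. */
    static List<String> readCells(String sheetXml) throws Exception {
        InlineStringSketch handler = new InlineStringSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return handler.cells;
    }
}
```

For a real file, the same handler could be given the input stream of a sheet
part instead of a string. A cell with t="s" would still need the usual lookup
in the shared strings table, which this sketch leaves out.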



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/sax-get-nothing-but-numbers-from-a-large-excel2007-and-sharedStrings-xml-is-empty-tp5719797p5719814.html


Re: sax get nothing but numbers from a large excel2007 and sharedStrings.xml is empty

Posted by crane <80...@qq.com>.
Thank you very much! I found those strings in the files you mentioned.

The files I am working with were provided by another company, so I
don't know how they are produced.

I have fixed this problem with VBS. Thank you very much all the same.



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/sax-get-nothing-but-numbers-from-a-large-excel2007-and-sharedStrings-xml-is-empty-tp5719797p5719810.html


Re: sax get nothing but numbers from a large excel2007 and sharedStrings.xml is empty

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
It is usual for strings to be stored in the shared strings table but not
compulsory. Strings may be stored inline, that is, as part of the main
markup for the worksheet and not in the shared strings table.

How are the files you are working with produced: using Excel, POI or some
other software? Obviously, you understand that an xlsx file is simply zipped
XML. May I ask you to look into the worksheets folder, open one of the
files called sheetn.xml (where n is an integer) and take a look at the
markup? It might be that the strings are stored inline, and you will be able
to see that in the markup for each cell.

In truth, I have never needed to use the streaming API and it is possible
that all you need to do is to modify it for inline strings, if this is
indeed the problem here. Anyway, the first step is to check for inline
strings by looking at the markup for a sheet.
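
For files too large to open comfortably in an editor, the check described
above could also be done programmatically. The following is only a rough
sketch using the JDK's zip classes; the names InlineStringCheck,
hasInlineStrings and report are made up for illustration. It assumes the
sheets live under xl/worksheets/ in the package, that the markup is UTF-8,
and that inline cells carry the attribute exactly as t="inlineStr" with
double quotes, so treat it as a heuristic rather than a proper parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: reports which worksheet parts appear to use inline strings. */
class InlineStringCheck {
    // Assumed spelling of the inline-string marker on a cell element.
    private static final String MARKER = "t=\"inlineStr\"";

    /** Streams the markup in chunks and returns true if the marker occurs anywhere. */
    static boolean hasInlineStrings(InputStream sheetXml) throws IOException {
        Reader reader = new InputStreamReader(sheetXml, StandardCharsets.UTF_8);
        char[] buf = new char[8192];
        String tail = "";  // end of the previous chunk, so a marker split across chunks is found
        int n;
        while ((n = reader.read(buf)) != -1) {
            String chunk = tail + new String(buf, 0, n);
            if (chunk.contains(MARKER)) {
                return true;
            }
            tail = chunk.substring(Math.max(0, chunk.length() - (MARKER.length() - 1)));
        }
        return false;
    }

    /** Prints one line per worksheet part found in the given xlsx file. */
    static void report(String xlsxPath) throws IOException {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    try (InputStream in = zip.getInputStream(entry)) {
                        System.out.println(name + ": inline strings = " + hasInlineStrings(in));
                    }
                }
            }
        }
    }
}
```

Calling InlineStringCheck.report with the path to one of the problem files
should print a line for each sheet. If any of them reports true, the strings
are most likely stored inline rather than in the shared strings table.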



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/sax-get-nothing-but-numbers-from-a-large-excel2007-and-sharedStrings-xml-is-empty-tp5719797p5719799.html