You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by Mahesh Ganapathy <mg...@gmail.com> on 2010/03/02 19:44:01 UTC

Re: [PROBLEM] POI and Excel : First Quote in Cell

Excel always uses a leading ' to denote the rest of the value as string. This prevents excel from auto formatting numerical data that as anything else (date/ time or auto round off) for instance, if we have a value 0901245e81 excel tends to format this as 9*10^86. Once this happens, the actual value is lost even if you change the cell format to string. Excel provides this leading ' to denote "pre-formatted" values and the cell format is set to string. To have a leading ' in the cell value, one must escape that with another ' as ''.
Hope this helps. 
Sent via BlackBerry by AT&T

-----Original Message-----
From: MSB <ma...@tiscali.co.uk>
Date: Sat, 27 Feb 2010 04:43:56 
To: <us...@poi.apache.org>
Subject: Re: [PROBLEM] POI and Excel : First Quote in Cell


Sorry to say that this will not help solve your immediate problem Timothee
but it is interesting nonetheless and may - that is may - point the way to a
solution. Today, I did have the chance to do a little bit of digging around
and I used the xml based file format as that makes it far easier to open the
file using simple tools - a zip utility and a text editor are all that is
needed - and the results of this investigation were quite interesting.

Firstly, I created a workbook using Excel that contained a single worksheet,
and I populated three cells of this worksheet, cells A1, B1 and C1. Into
cell A1, I entered the string VLA into cell B1 the string 'ULA (that is a
single apostrophe followed by the letters U, L and A) and, finally, in cell
C1 I entered ''WLA (that is two apostrophes followed by the letters W, L and
A). Viewing the worksheet using Excel, the value displayed in the first cell
was VLA, in the second was ULA and in the third cell was 'WLA. In the latter
two cases, cells B1 and C1 Excel had removed the leading apostrophe from the
value I entered and modified it somewhat. Clicking on each cell in turn
caused the cells value to be echoed into the editing bar above the worksheet
and in all cases, the value displayed here echoed what I had entered into
the cell originally, VLA, 'ULA and ''WLA for cells A1, B1 and C1
respectively, demonstrating that there is a difference between the value
entered into the cell and what is displayed.

My next step was to unzip the xlsx file so that I could look into the
contents of the file directly - this would have been much harder to do if
the workbook had been stored in the older binary format but it is not
impossible. The contents of the cells are often stored in a file called
sharedStrings.xml so I opened this file firstly and found the following;

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
count="3" uniqueCount="3">
   <si>
      <t>VLA</t>
   </si>
   <si>
      <t>ULA</t>
   </si>
   <si>
      <t>'WLA</t>
   </si>
</sst>

which I think shows that the values recorded as being stored in the cells
are VLA, ULA and 'WLA. Where they were added, the leading apostrophes have
been removed and it seems safe to conclude, therefore, that POI was
reporting the cells contents accurately.

Next, I wanted to test whether my hypothesis was correct and that the style
applied to the cell by Excel when it recognises that an apostrophe is the
first character entered into a cell is somehow 'special'. Therefore, I
looked into the styles.xml file and found this;

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
<styleSheet
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
   <fonts count="1">
      
         <sz val="11" /> 
         <color theme="1" /> 
         <name val="Calibri" /> 
         <family val="2" /> 
         <scheme val="minor" /> 
      
   </fonts>
   <fills count="2">
      <fill>
         <patternFill patternType="none" /> 
      </fill>
      <fill>
         <patternFill patternType="gray125" /> 
      </fill>
   </fills>
   <borders count="1">
      <border>
         <left /> 
         <right /> 
         <top /> 
         <bottom /> 
         <diagonal /> 
      </border>
   </borders>
   <cellStyleXfs count="1">
      <xf numFmtId="0" fontId="0" fillId="0" borderId="0" /> 
   </cellStyleXfs>
   <cellXfs count="2">
      <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" /> 
      <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"
quotePrefix="1" /> 
   </cellXfs>
   <cellStyles count="1">
      <cellStyle name="Normal" xfId="0" builtinId="0" /> 
   </cellStyles>
   <dxfs count="0" /> 
   <tableStyles count="0" defaultTableStyle="TableStyleMedium9"
defaultPivotStyle="PivotStyleLight16" /> 
</styleSheet>

To my mind, the key line is this one;

<xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" quotePrefix="1"
/>

which I am going to assume - and it is an assumption as it is not clear to
me how the style is linked to the cell and its contents - determines how the
leading apostrophe is actually handled.

I did something similar with the same file converted into the older binary
format. This time, I used a hex editor rather than POI to look into the file
and found that, just as in the example above, the values VLA, ULA and 'WLA
were stored in the cells. Unfortunately, I could not look at the styles as
easily so I do not know if there is similarly a style that determines
whether or not the leading apostrophe - or indeed quotation symbol - will or
should be displayed but I suspect there is. Running POI across that same
binary file and asking it to print out the hash code of the cell style
object applied to the cells I found that one cell style was applied to cell
A1 and another to both B1 and C1. Therefore, I cannot prove this hypothesis
but I think that the style needs to indicate whether or not there should
be/is a leading apostrophe or quotation symbol and that this is something
which POI does not currently do. This test does also demonstrate there is a
possible workaround - if you want to see values in cells preceded by an
apostrophe and wish to see that value reported accurately by POI, then use
two leading apostrophes.

Yours

Mark B

PS. I think that I have just confirmed the role and purpose of the
quotePrefix="1" attribute that I found on this line 

<xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" quotePrefix="1"
/>

of the styles.xml file. Simply, I wrote some code to unzip the archive,
change that attribute from quotePrefix="1" to quotePrefix="0" and then
recreate the archive again. Next, I opened the .xlsx file using Excel and
found that this time the contents of the cells and the value displayed in
the editing bar when the cell was selected were the same - VLA, ULA and 'WLA
respectively. Will try to experiment with other values to see if two, for
example, causes the leading apostrophe to be displayed in the cell.


Chevrier, Timothée wrote:
> 
> Hello,
> 
> Currently I encounter a strange problem and I don't know how to solve it
> using POI. I'm not sure I can. 
> 
> In my cell i've the value ['BLA]. The cell type is text. In POI, i can't
> catch the first apostrophe. It's not part of the value.
> Can we avoid this behaviour using POI ?
> Does any body as a solution to catch this first quote (apostrophe) in a
> text cell ?
> 
> Just for your information.
> Here is what I get in a sample Read/system.out program : 
> Debut du traitement d'imports de fichiers Excel
> LIGNE 1 : Cellule sans quote
> cell.getCellType() : 1
> cell.getStringCellValue() : BLA
> cell.toString() : BLA
> cell.getCellComment() : null
> cell.getCellStyle() : org.apache.poi.hssf.usermodel.HSSFCellStyle@1b9142d0
> cell.getRichStringCellValue() : BLA
> cell.getHyperlink() : null
> LIGNE 2 : Cellule avec quote STANDARD
> cell.getCellType() : 1
> cell.getStringCellValue() : BLA
> cell.toString() : BLA
> cell.getCellComment() : null
> cell.getCellStyle() : org.apache.poi.hssf.usermodel.HSSFCellStyle@5a9aa9f7
> cell.getRichStringCellValue() : BLA
> cell.getHyperlink() : null
> LIGNE 3 : Cellule avec quote TEXTE
> cell.getCellType() : 1
> cell.getStringCellValue() : BLA
> cell.toString() : BLA
> cell.getCellComment() : null
> cell.getCellStyle() : org.apache.poi.hssf.usermodel.HSSFCellStyle@8f166529
> cell.getRichStringCellValue() : BLA
> cell.getHyperlink() : null
> LIGNE 4 : Cellule avec quote PERSO
> cell.getCellType() : 1
> cell.getStringCellValue() : BLA
> cell.toString() : BLA
> cell.getCellComment() : null
> cell.getCellStyle() : org.apache.poi.hssf.usermodel.HSSFCellStyle@8f166529
> cell.getRichStringCellValue() : BLA
> cell.getHyperlink() : null
> LIGNE 5 : Cellule avec quote STANDARD et numérique
> cell.getCellType() : 1
> cell.getStringCellValue() : 99
> cell.toString() : 99
> cell.getCellComment() : null
> cell.getCellStyle() : org.apache.poi.hssf.usermodel.HSSFCellStyle@5a9aa9f7
> cell.getRichStringCellValue() : 99
> cell.getHyperlink() : null
> 
> 
> I've tried several POI classes to read my Excel workbook (version 3.6):
> - HSSFCell
> - ExcelExtractor
> - Row and HSSFListener
> 
> Just FYI, I've also tried JExcel 2.6.
> 
> Thanks for your help,
> 
> Regards,
> Timothée CHEVRIER
> 
> ---------------------------------------------------------------------------
> 
> LOGICA
> 
> 
> 
> T. : +33 (0)2 14 37 80 78
> VOIP : 14078
> timothee.chevrier@logica.com
> www.logica.com/france
> 
> 
> Please help Logica to respect the environment by not printing this email 
> / Pour contribuer comme Logica au respect de l'environnement, merci de ne
> pas imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und
> helfen Sie so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a
> Logica a respeitar o ambiente nao imprimindo este correio electronico.
> 
> 
> 
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. It may contain proprietary material, confidential
> information and/or be subject to legal privilege. It should not be copied,
> disclosed to, retained or used by, any other party. If you are not an
> intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender. Thank you.
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 
> 
> 

-- 
View this message in context: http://old.nabble.com/-PROBLEM--POI-and-Excel-%3A-First-Quote-in-Cell-tp27715413p27727677.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org