Posted to user@poi.apache.org by smh821025 <sm...@yahoo.com.cn> on 2010/11/04 09:35:35 UTC

Re: HSSFCell API changes - I do not see setEncoding() in 3.2

I am using POI 3.6, but I also get unreadable characters. How should I handle this?
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3249690.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Matthew Carey <ma...@ssl.co.uk>.
Should of course be 

java -Dfile.encoding=UTF-8 -jar my.jar params
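
One way to confirm that the property actually took effect is to have the
application report the charset the JVM picked up. A minimal sketch; the
class name CharsetCheck is invented for illustration and is not part of POI
or opencsv:

```java
import java.nio.charset.Charset;

// Hypothetical helper: reports which charset the JVM uses by default.
final class CharsetCheck {
    // Name of the charset used by APIs such as FileReader or
    // InputStreamReader when no charset is passed explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running this once from the command line and once from the server process
should show whether the two environments really differ.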



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p5713464.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Matthew Carey <ma...@ssl.co.uk>.
I fixed the issue, which I think came from the encoding being picked up
from the LANG or locale of the user running the server, by setting the
file.encoding system property explicitly on the command line.

java -Dfile.encoding=UTF-8 my.jar params



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p5713463.html


RE: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Concur with Nick.  How are you initializing the Reader that you pass to: 

CSVReader r = new au.com.bytecode.opencsv.CSVReader(reader);
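
For reference, a Reader that does not depend on the platform default can be
built by naming the charset explicitly. This is only a sketch, assuming the
CSV file really is UTF-8; Utf8ReaderDemo and its methods are made-up names,
and a byte array stands in for the file on disk:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of opencsv or POI.
final class Utf8ReaderDemo {
    // Wraps a byte stream in a Reader that always decodes UTF-8,
    // whatever file.encoding, LANG, or the user's locale say.
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Reads the first line through the explicit-charset Reader.
    static String firstLine(InputStream in) {
        try (BufferedReader r = new BufferedReader(utf8Reader(in))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a CSV file: "caf\u00e9,1" encoded as UTF-8 bytes.
        byte[] csv = "caf\u00e9,1\n".getBytes(StandardCharsets.UTF_8);
        String line = firstLine(new ByteArrayInputStream(csv));
        System.out.println(line.length() + " characters read");
    }
}
```

The Reader returned by utf8Reader() is the kind of object that would then be
handed to the CSVReader(Reader) constructor.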

-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org] 
Sent: Friday, August 02, 2013 1:01 PM
To: POI Users List
Subject: Re: HSSFCell API changes - I do not see setEncoding() in 3.2

On Fri, 2 Aug 2013, Matthew Carey wrote:
> I have written an application that takes an xls file as a template and reads
> a csv file and outputs another xls file containing the data and images
> referenced in the csv file in rows. It uses the formatting from the template
> file.
>
> When my POI-using application's jar file is run from the command line, a
> correctly formatted xls file is generated from an xls file used as a
> template, and the imported UTF-8 data produces correctly accented characters
> in the resulting xls file.

I'm minded to blame your CSV reading here. POI works only on Java strings, 
which are unicode. POI sorts out any encoding parts for you when writing 
to the binary file format.

When reading from a csv file, you need to turn the bytes into the correct 
characters if you want to get valid Java strings out. Sounds like you're 
not doing that right in some cases... Easiest way to be sure is add some 
debugging that prints out a character at a time the values in one of your 
accented strings, and ensure they're correct

Nick



Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 2 Aug 2013, Matthew Carey wrote:
> I have written an application that takes an xls file as a template and reads
> a csv file and outputs another xls file containing the data and images
> referenced in the csv file in rows. It uses the formatting from the template
> file.
>
> When my POI-using application's jar file is run from the command line, a
> correctly formatted xls file is generated from an xls file used as a
> template, and the imported UTF-8 data produces correctly accented characters
> in the resulting xls file.

I'm minded to blame your CSV reading here. POI works only on Java strings, 
which are unicode. POI sorts out any encoding parts for you when writing 
to the binary file format.

When reading from a csv file, you need to turn the bytes into the correct 
characters if you want to get valid Java strings out. Sounds like you're 
not doing that right in some cases... Easiest way to be sure is add some 
debugging that prints out a character at a time the values in one of your 
accented strings, and ensure they're correct

Nick
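
The character-at-a-time debugging described above could look something like
this. It is a sketch using only the JDK; the class and method names are
invented for illustration:

```java
// Hypothetical debugging helper: prints the code points of a string.
final class CodePointDump {
    // Formats each code point of s as U+XXXX, separated by spaces,
    // so that mis-decoded text is easy to spot.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded e-acute is a single code point.
        System.out.println(dump("\u00e9"));
        // The same UTF-8 bytes decoded as ISO-8859-1 show up as two.
        System.out.println(dump("\u00c3\u00a9"));
    }
}
```

If a value that should contain one accented letter dumps as two code points
such as U+00C3 U+00A9, the damage happened while the bytes were being turned
into a Java string, before POI ever saw the text.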



Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Matthew Carey <ma...@ssl.co.uk>.
Using POI 3.9 on CentOS release 5.9 (Final), I get strange issues with
encoding.

I have written an application that takes an xls file as a template and reads
a csv file and outputs another xls file containing the data and images
referenced in the csv file in rows. It uses the formatting from the template
file.

When my POI-using application's jar file is run from the command line, a
correctly formatted xls file is generated from an xls file used as a
template, and the imported UTF-8 data produces correctly accented characters
in the resulting xls file.

When the same application is run by a server process running as root on
the same box, the character encoding of the generated cells is ISO-8859-1
and the accented characters become pairs of unknown characters in the xls.

I imagine that it is picking up some Locale setting and doing the wrong
thing. It would be nice to be able to override this.

Now, it could be the CSV library (au.com.bytecode.opencsv) rather than
POI.
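
The "pairs of unknown characters" symptom is what one would expect if UTF-8
bytes are decoded as ISO-8859-1 somewhere along the way. The following
sketch reproduces that effect with the JDK alone; it makes no claim about
where the mis-decoding happens in the application, and the class name is
made up:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not part of POI or opencsv.
final class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes as ISO-8859-1,
    // imitating UTF-8 input being read with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e-acute, one character
        String broken = misdecode(original);
        // Prints "1 char -> 2 chars"
        System.out.println(original.length() + " char -> " + broken.length() + " chars");
    }
}
```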




--
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p5713419.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 11 Feb 2011, Mark Beardsley wrote:
> The other thought I had is could it be concerned with the language 
> settings of the machine? I do not know if this is the case but could 
> that change the appearance of a character?

I've just created an excel file with an alpha and beta in it, and both 
came out as expected

One thing I noticed was that if I entered some characters in a "silly" 
font, then changed the font back to a normal one, then I got some text. On 
your problematic beta cell, try setting the font to something like arial 
and see what your cell shows then

Nick



Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
I am not sure whether you could do it by opening the file with a hex editor
and sifting through the information it dumps to screen. There is also the
BiffViewer utility that is bundled with the API. This might be useful to you
as it extracts and displays information about the various records that make
up the file. Sifting through that may expose what the character is, but
there remains, of course, the concern that if POI is 'mangling' the
character (which I doubt, as Nick has explained) the same will happen here.
The other thought I had is could it be concerned with the language settings
of the machine? I do not know if this is the case, but could that change the
appearance of a character?

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3380798.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by apollo <am...@bcm.tmc.edu>.
The data is in the old .xls file format.

-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3379574.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
May I ask which type of file you are dealing with? Is it one of the older
binary format files (ending with .xls) or one of the newer OOXML files
(ending with something like .xlsx)? If the latter, you can take a look
directly at the contents of the file to check that POI is retrieving them
properly. Simply copy the file, change the copy's extension to .zip and then
unzip it into a folder. Then locate a file called sharedStrings.xml and open
it in a simple text editor. This will allow you to look directly at the
contents of the cells, and you can quickly scan them to see whether the beta
character is there or whether it is, as Nick suggested, the font that is
performing the transformation for you.

Now, to get at font information from within the API itself, you will need to
obtain a reference to the style applied to the cell and call its
getFontIndex() method. There are different methods defined for the different
workbook types; this one is defined for both in the SS model, which I refer
to here as I do not know the format you are targeting. You can then pass the
short value this method returns to the getFontAt() method of the workbook
object, which will return a Font object (actually an instance of either
HSSFFont or XSSFFont, depending upon which file type you are dealing with)
that you can interrogate for information such as its name. The method above
relates to coding using the SS model, and you may need to make modifications
depending upon whether you are using the HSSF model or the XSSF one.

Yours

Mark B
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3378914.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by apollo <am...@bcm.tmc.edu>.
Unfortunately I cannot change the Excel file. I receive it with data already
entered and need to transfer it to tables.
Now, I do not know what is stored in the Excel file, so I take your word for
it, i.e. that it is the b I see.
Both Windows and Mac show the β in Excel. So, given an Excel cell, how would
I programmatically know that the cell value is a special (Greek) character?
From what you describe, do I need to get the font associated with the cell?
Is that the way? Any pointers? Thanks.

-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3378563.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 9 Feb 2011, apollo wrote:
> So there is of course a font being used to "show" me the b, which means 
> POI understands that the cell value is "\u03b2"

I think it's the other way around. The Excel file stores the literal b 
value. There's then a font which, when it sees the letter b, displays a beta 
sign (much like how Wingdings displays funky characters when you type 
normal letters in). POI is simply showing you what's in the file; the 
magic is entirely within the Windows fonts...

I'd suggest you try setting a real font (not a special symbol one), then 
input a beta, that should store the correct value for you.

Nick
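
If the workbook cannot be changed, one conceivable workaround is to
translate the stored letters yourself once the cell's font name is known. The
sketch below ASSUMES the font in question is the common "Symbol" font, in
which a, b, g and d are drawn as alpha, beta, gamma and delta; the thread
never says which font is actually involved. SymbolFontMapper, its table and
its method are invented for illustration, and the table is deliberately
partial:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of POI.
final class SymbolFontMapper {
    // Partial Latin-to-Greek table for the Symbol font (assumed font).
    private static final Map<Character, Character> SYMBOL_TO_GREEK = new HashMap<>();
    static {
        SYMBOL_TO_GREEK.put('a', '\u03b1'); // alpha
        SYMBOL_TO_GREEK.put('b', '\u03b2'); // beta
        SYMBOL_TO_GREEK.put('g', '\u03b3'); // gamma
        SYMBOL_TO_GREEK.put('d', '\u03b4'); // delta
    }

    // Returns the stored text unchanged unless the font is Symbol, in
    // which case mapped letters are replaced by the Greek code points.
    static String toUnicode(String stored, String fontName) {
        if (!"Symbol".equalsIgnoreCase(fontName)) {
            return stored;
        }
        StringBuilder sb = new StringBuilder(stored.length());
        for (char c : stored.toCharArray()) {
            Character greek = SYMBOL_TO_GREEK.get(c);
            sb.append(greek != null ? greek.charValue() : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicode("b", "Symbol").equals("\u03b2")); // true
        System.out.println(toUnicode("b", "Arial"));                   // b
    }
}
```

The design choice here is to leave the text alone for every other font, so
ordinary cells containing a plain b are not rewritten by accident.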



Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by apollo <am...@bcm.tmc.edu>.
So there is of course a font being used to "show" me the b, which means POI
understands that the cell value is "\u03b2" and it attempts to show me a
logical value. Should the POI API not give the user the option of what to
do with it instead of guessing? Is there a way to control this Excel-to-Java
conversion in POI?

-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3378538.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Nick Burch <ni...@alfresco.com>.
On Tue, 8 Feb 2011, apollo wrote:
> I have a cell that has Greek glyphs. When I call cell.getCellValue() or 
> RichTextValue, the string I get is "interpreted" for me; for example, β 
> appears as b. Why is that so? b is not \u03b2. Is there some conversion 
> that takes place? Can I control this?

You get back the value that Excel stored in the cell. I wonder if there's 
something going on with the font used, where perhaps it renders the letter 
b as the beta symbol?

Nick

Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by apollo <am...@bcm.tmc.edu>.
I have a cell that has Greek glyphs. When I call cell.getCellValue() or
RichTextValue, the string I get is "interpreted" for me; for example, β
appears as b. Why is that so? b is not \u03b2. Is there some conversion that
takes place? Can I control this?
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3377019.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 27 Dec 2010, sanjaychandak wrote:
> There was no problem in jakarta-poi 3.0, where we used the setEncoding
> method, i.e. setEncoding(HSSFCell.ENCODING_UTF_16). Since we upgraded to
> 3.6, Polish characters no longer render and the following exception is
> thrown during the export to Excel:
>
> org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.

HSSF should now do the correct thing when it comes to deciding on the 
encoding of a cell, based on the contents. XSSF is XML based, and the low 
level XML parser in the JVM does all the hard work for us.

From your exception, I'm guessing you're using XSSF rather than HSSF? What 
did you do to get that exception? It doesn't look like the sort of thing 
one should generally get, except when monkeying around with the low-level 
XML stream.

Nick
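
For what it is worth, this kind of parse error can be provoked with the JDK
alone when XML declares UTF-8 but was actually written in another encoding.
The sketch below is one possible cause, not a diagnosis of the report
above; the class, the sample document and the choice of ISO-8859-1 are all
assumptions made for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demonstration class, not part of POI.
final class XmlEncodingDemo {
    // XML that claims UTF-8 but is really encoded as ISO-8859-1.
    static byte[] mislabelledXml() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><city>Krak\u00f3w</city>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lets the parser trust the declaration; false means parsing failed.
    static boolean parsesAsDeclared(byte[] bytes) {
        try {
            newBuilder().parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (IOException | SAXException e) {
            return false;
        }
    }

    // Decodes the bytes with the given charset first, so the parser is
    // handed characters and does not rely on the encoding declaration.
    static String parseDecodedAs(byte[] bytes, Charset cs) {
        try {
            InputSource src = new InputSource(
                    new InputStreamReader(new ByteArrayInputStream(bytes), cs));
            return newBuilder().parse(src).getDocumentElement().getTextContent();
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = mislabelledXml();
        System.out.println("parses as declared: " + parsesAsDeclared(xml));
        System.out.println("decoded explicitly: "
                + parseDecodedAs(xml, StandardCharsets.ISO_8859_1).length() + " characters");
    }
}
```

If something like this is the cause, the fix belongs wherever the XML is
produced or read, not in the calls that set cell values.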



Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by sanjaychandak <sa...@hotmail.com>.
Hi,

I am using jakarta-poi 3.6 and we have a problem rendering Polish
characters when exporting to Excel.

There was no problem in jakarta-poi 3.0, where we used the setEncoding
method, i.e. setEncoding(HSSFCell.ENCODING_UTF_16). Since we upgraded to
3.6, Polish characters no longer render and the following exception is
thrown during the export to Excel:

org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.

I just checked the 3.6 API and the method is no longer there. It says there
is no need to call setEncoding (the method is deprecated) because the
encoding has been set automatically since jakarta-poi 3.2.

Does anyone else have problems rendering Polish characters, and what is the
solution to get rid of the above exception (org.xml.sax.SAXParseException:
Invalid byte 2 of 3-byte UTF-8 sequence.)? When I downgrade to 3.0 it works
fine with the setEncoding method (as above), but with 3.6 it throws the
parse exception (as above) for the same XML file.

Thanks
Sanjay
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/HSSFCell-API-changes-I-do-not-see-setEncoding-in-3-2-tp2303497p3319301.html


Re: HSSFCell API changes - I do not see setEncoding() in 3.2

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 4 Nov 2010, smh821025 wrote:
> I am using POI 3.6, but I also get unreadable characters. How should I handle this?

Newer versions of POI handle unicode characters just fine, there's no need 
for you to do anything special. When you set a java string, the string is 
unicode and POI handles it fine. When you get back text, it's a java 
string and again handles unicode fine. If you're having issues, it's 
almost certainly with getting unicode in or out of your java program, in 
which case you'll want to go read a java unicode tutorial for how to fix 
your issues.

Nick