Posted to user@poi.apache.org by jonmikelm <jm...@ibermatica.com> on 2012/10/29 12:02:21 UTC

Encoding problems reading excel files

Hi all,

We are having problems reading accented characters from an Excel file. We are
using the SS API to read both xls and xlsx files.

We started having problems when we migrated our application to a Tomcat
configured with UTF-8 encoding.

Is there a way to tell POI that it is reading a CP1252-encoded file?

Does POI take care of encodings, or does it just take them from the
environment values?

I don't get on well with encodings, they always make me lose a lot of
time...

thanks & regards

Jon Mikel





--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Encoding-problems-reading-excel-files-tp5711320.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Encoding problems reading excel files

Posted by jonmikelm <jm...@ibermatica.com>.
I forgot to say that we do close the FileOutputStream after writing the
workbook.

thanks again,

Jon Mikel





Re: Encoding problems reading excel files

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 29 Oct 2012, jonmikelm wrote:
> Our program has 3 input excel files that generate a 4th output excel 
> file. I thought the problem occurred when we read the files, but it 
> happens when we write a workbook to disk.

Have you tried creating a simple program that creates a new Excel file, 
writes some accented characters from their unicode escape values 
(eg \u00e9) so that you know it's not an input problem, and writes that 
out? I think that should work. Does it? Then, try opening the input 
template file and saving it. Does that behave, or do things go wrong then? 
Now, try opening the template and adding some strings from their unicode 
escape sequences. Is that ok?

Nick
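
When narrowing this down, it can help to inspect strings without relying on
the console or log encoding at all. The following is only a minimal sketch
using the plain JDK (no POI calls); the class name UnicodeDump and the method
escape are hypothetical names chosen for illustration. It rewrites every
non-ASCII character as its unicode escape, so a value read from a cell can be
compared with the expected escape sequence even if the terminal shows it
wrongly.

```java
public class UnicodeDump {

    /**
     * Returns the input with every non-ASCII char replaced by a
     * Java-style unicode escape (hypothetical helper, not part of POI).
     */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);                                   // plain ASCII is kept as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c));   // e.g. U+00E9 becomes a 6-char escape
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Built from an escape, so the source file encoding does not matter.
        String value = "caf\u00e9";
        // The printed text is pure ASCII, so the console encoding does not matter either.
        System.out.println(escape(value));
    }
}
```

If the escaped form of a value read from a workbook matches what was
expected, the string in memory is fine and the corruption is happening
somewhere after that point.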



Re: Encoding problems reading excel files

Posted by jonmikelm <jm...@ibermatica.com>.
Hi Nick, thanks for the response.

You were right, the problem is in the output, but in our case, the output is
another excel file...

Our program has 3 input excel files that generate a 4th output excel file. I
thought the problem occurred when we read the files, but it happens when we
write a workbook to disk.

I'll try to briefly explain our program's flow. We combine data from two
input Excel files with data from an external database. We use a third input
Excel file as a template, and after filling it with the previous data we
write it to disk.

// START

// We load two input Excel files
Workbook inputwb1 = WorkbookFactory.create(excelInputStream1);
Workbook inputwb2 = WorkbookFactory.create(excelInputStream2);

// We read both workbooks and combine the data with other data obtained
// from a database
...
...
// ACCENTS ARE CORRECT HERE

// We load the third input Excel file (the one that works as a template)
// into another workbook object
Workbook templatewb = WorkbookFactory.create(templateInputStream);

// We fill the template workbook cells with the data from the first two
// input Excel files and the database
...
...
// We apply no encoding treatment to the data anywhere

// We write the template workbook to a file
String outputExcelPath = "c:\\path\\dummy.xls";
FileOutputStream fos = new FileOutputStream(outputExcelPath);
templatewb.write(fos);
// We close the stream after writing
fos.close();

// END

When we open the generated Excel file, all the accents are corrupt. I have
noticed that if the input template file contains an accent (for example, in a
column header), it also gets corrupted in the output Excel file.

It seems like the output is not being written in the right encoding. As far
as I have seen in the documentation, it is not possible to set the encoding
when we write a workbook to disk.

Any other clues on how to solve this problem?

thanks & regards

Jon Mikel








Re: Encoding problems reading excel files

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 29 Oct 2012, jonmikelm wrote:
> Is there a way to tell POI that it is reading a CP1252-encoded file?

The files should include that information

> Does POI take care of encodings, or does it just take them from the
> environment values?

POI turns bytes into Java Strings based on the info in the files

> I don't get on well with encodings, they always make me lose a lot of
> time...

Make sure your *output* is correct. Most POI encoding confusion is 
actually people printing out debug, html etc in the wrong encoding!

Nick
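
To illustrate the kind of output mismatch described above, here is a small
sketch using only the JDK. It is not taken from the thread, and the class
name MojibakeDemo and the method decodeAs are hypothetical. It encodes a
string with one charset and decodes the resulting bytes with another, which
is effectively what happens when text written as UTF-8 is displayed by
something that assumes CP1252, or the other way round.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /** Encodes text with one charset, then decodes the bytes with another. */
    public static String decodeAs(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String original = "\u00e9"; // e with acute accent

        // UTF-8 stores this char as the two bytes C3 A9. Read as CP1252,
        // those bytes become two chars: U+00C3 followed by U+00A9.
        String garbled = decodeAs(original, StandardCharsets.UTF_8, cp1252);
        System.out.println(garbled.length()); // prints 2

        // CP1252 stores the same char as the single byte E9, which is not
        // valid UTF-8 on its own, so decoding yields the replacement char U+FFFD.
        String replaced = decodeAs(original, cp1252, StandardCharsets.UTF_8);
        System.out.println(replaced.equals("\ufffd")); // prints true

        // With the same charset on both sides the text survives unchanged.
        String intact = decodeAs(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original)); // prints true
    }
}
```

The in-memory Java String is the same in all three cases. Only the step that
turns it into bytes, or bytes back into text, changes the result, so that
step is the place to look when accents appear corrupted in logs, HTML pages
or console output.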