You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@poi.apache.org by "chaitanya.gorugontula@kkr.com" <ch...@kkr.com> on 2010/12/17 15:44:46 UTC

Encoding Issue - POI 3.6

Hi,

I am using POI version 3.6 . I am trying to read some excel data from cells
through java code and save to mySQL database.
The cell that I am reading, I am expecting characters from multiple
languages which includes Turkish. Currently the table column that I store
data into in mySQL is having 'latin1' encoding and the windows machine on
which the java program is running is having the default character encoding
set to 'windows-1252'.

I changed the mySQL column encoding to UTF-8 and tried to ovveride the
default encoding type that java is using also to UTF-8, though I see many
suggesting that all Strings in java are UTF-16 already and hence must be
able to support quite a few character types.

After all these changes, when I am trying to read a simple Excel cell with
Turkish characters in it and print them out on the console, I see them
getting garbled.

It would be really helpful, if someone can point me in where I might be
doing a mistake or what I need to change to support multiple character sets
when reading from Excel. Is there anyway that I can try telling POI to read
a specific cell using a particular encoding. I learnt from one of existing
posts that POI is smart enough to figure it out itself, which is best
encoding to use to read the data from the cell.

Thanks,
Chai
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/Encoding-Issue-POI-3-6-tp3309486p3309486.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Encoding Issue - POI 3.6

Posted by "chaitanya.gorugontula@kkr.com" <ch...@kkr.com>.
Thanks Nan...

I have written a much simpler program to read just one cell from an Excel
Spreadsheet and print it out onto the console and 1 cell... contains some
text with turkish characters. I am printing the text of 

Erkan-i Harp Sokak No:4 Tünel / Beyoğlu / İstanbul

Just reading first cell and printing out the value. The text is getting
garbled when I print it on the console.
Let me know, if you see anything wrong in the way I am handling this.

Code
-----

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFDateUtil;

import java.io.FileInputStream;
import java.io.FileNotFoundException;

public class Test {
    public static void main(String[] args) throws Exception {
        String filename = "c:\\svn-datadump\\1.xls";

        FileInputStream fis = null;
        try {
            fis = new FileInputStream(filename);

            HSSFWorkbook workbook = new HSSFWorkbook(fis);
            HSSFSheet sheet = workbook.getSheetAt(0);

            //
            // Read a cell the first cell on the sheet.
            //
            HSSFCell cell = sheet.getRow(0).getCell(0);
		System.out.println("cell value read is ["+cell.getStringCellValue()+"]");

            //
            // Using HSSFDateUtil to check if a cell contains a date.
            //
            //if (HSSFDateUtil.isCellDateFormatted(cell)) {
            //    System.out.println("The cell contains a date value: " +
cell.getDateCellValue());
            //}
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            if (fis != null) {
                fis.close();
            }
        }
    }
}
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/Encoding-Issue-POI-3-6-tp3309486p3309864.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Encoding Issue - POI 3.6

Posted by Nan Lei <na...@gmail.com>.
I think the way of 2003 is similar, just use HSSF* instead of XSSF*,for
example:

private static List<Student> readFromXLS2003(String filePath) {
File excelFile = null;
InputStream is = null;
String cellStr = null;
List<Student> studentList = new ArrayList<Student>();
Student student = null;
try {
excelFile = new File(filePath);
is = new FileInputStream(excelFile);
HSSFWorkbook workbook2003 = new HSSFWorkbook(is);
HSSFSheet sheet = workbook2003.getSheetAt(0);
                        for (int i = 1; i <= sheet.getLastRowNum(); i++) {
student = new Student();
HSSFRow row = sheet.getRow(i);
if (row == null) {
continue;
}
for (int j = 0; j < row.getLastCellNum(); j++) {
HSSFCell cell = row.getCell(j);
if (cell == null) {
cellStr = "";
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_BOOLEAN) {
cellStr = String.valueOf(cell.getBooleanCellValue());
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
cellStr = cell.getNumericCellValue() + "";
} else {
cellStr = cell.getStringCellValue();
}
if (j == 0) {
student.setName(cellStr);
} else if (j == 1) {
student.setGender(cellStr);
} else if (j == 2) {
student.setAge(new Double(cellStr).intValue());
} else if (j == 3) {
student.setSclass(cellStr);
} else {
student.setScore(new Double(cellStr).intValue());
}
}
studentList.add(student);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (is != null) {
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return studentList;
}


and the Student bean is:
public class Student {
private String name;
private String gender;
private int age;
private String sclass;
private int score;

public Student() {
super();
}

public Student(String name, String gender, int age, String sclass, int
score) {
super();
this.name = name;
this.gender = gender;
this.age = age;
this.sclass = sclass;
this.score = score;
}
//getters and setters
}

The xls file has 5 rows, the 1st is the
header(Name,gender,age,class&score),the rest is the students' infomation.

My Environment is Eclipse 3.5 & JDK 1.6.21, in the eclipse, I set all the
encoding properties into UTF-8, so the charset is no problem.

And in the console, it is normal too.

Best Wishes!
Nan Lei

Re: Encoding Issue - POI 3.6

Posted by "chaitanya.gorugontula@kkr.com" <ch...@kkr.com>.
Hi Nan,

thanks for the reply. I am sorry I dint mention in my earlier post that I
was working with MS Office 2003.
As of now, we are not supporting MS Office 2007. So, I am using all "HSSF"
stuff...

Could you clarify if you had similar success handling characters by reading
them from Office 2003 ?
My Environment  - MS Excel 2003 - POI 3.6 - Windows XP Machine - mySQL DB. 

I think saving to DB in the right format is not the issue now. I am not able
to read the data and print it on to the console in the right format, without
characters getting garbled.

Thanks,
Chai
-- 
View this message in context: http://apache-poi.1045710.n5.nabble.com/Encoding-Issue-POI-3-6-tp3309486p3309570.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Encoding Issue - POI 3.6

Posted by Nan Lei <na...@gmail.com>.
Hi,

I am a Chinese developer, I use UTF-8 as the charset for the coding and
database(MySQL InnoDB engine). So there is no problem for the Chinese
character.

For the coding, from workbook,sheet,row to cell, read the cell value as
Boolean, Number and String.

For example,
public static List<Student> readFromXLSX2007(String filePath) {
File excelFile = null;
InputStream is = null;
String cellStr = null;
List<Student> studentList = new ArrayList<Student>();
Student student = null;
try {
excelFile = new File(filePath);
is = new FileInputStream(excelFile);
XSSFWorkbook workbook2007 = new XSSFWorkbook(is);
XSSFSheet sheet = workbook2007.getSheetAt(0);
for (int i = 1; i <= sheet.getLastRowNum(); i++) {
student = new Student();
XSSFRow row = sheet.getRow(i);
if (row == null) {
continue;
}
for (int j = 0; j < row.getLastCellNum(); j++) {
XSSFCell cell = row.getCell(j);
if (cell == null) {
cellStr = "";
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_BOOLEAN) {
cellStr = String.valueOf(cell.getBooleanCellValue());
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
cellStr = cell.getNumericCellValue() + "";
} else {
cellStr = cell.getStringCellValue();
}
if (j == 0) {
student.setName(cellStr);
} else if (j == 1) {
student.setGender(cellStr);
} else if (j == 2) {
student.setAge(new Double(cellStr).intValue());
} else if (j == 3) {
student.setSclass(cellStr);
} else {
student.setScore(new Double(cellStr).intValue());
}
}
studentList.add(student);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (is != null) {
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return studentList;
}

I think UTF-8 is good choice.

Best Wishes!
Nan Lei