Posted to user@poi.apache.org by Jussi Koiranen <ju...@solita.fi> on 2003/08/27 09:26:33 UTC

'ä', 'ö' and 'å' with WordDocument

I am reading a Word document with org.apache.poi.hdf.extractor.WordDocument as
follows:

    import java.io.StringWriter;
    import org.apache.poi.hdf.extractor.WordDocument;

    WordDocument wordDoc = new WordDocument("test.doc");
    StringWriter strWriter = new StringWriter();
    wordDoc.writeAllText(strWriter);
    System.out.println(strWriter); // for debugging

But ä, ö and å are not read correctly from the Word document (test.doc).
Am I doing something wrong, or is this a bug?

I have tested this with the jakarta-poi-1.5.1-final-bin and
jakarta-poi-1.8.0-dev-bin packages.

Jussi Koiranen


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-user-help@jakarta.apache.org


Re: 'ä', 'ö' and 'å' with WordDocument

Posted by Jussi Koiranen <ju...@solita.fi>.
I downloaded the sources from CVS and compiled them,
and now 'ä', 'ö' and 'å' are working.

I don't know why it didn't work with the release version.

Jussi Koiranen

----- Original Message ----- 
From: "Ryan Ackley" <sa...@cfl.rr.com>
To: "POI Users List" <po...@jakarta.apache.org>
Sent: Wednesday, August 27, 2003 5:23 PM
Subject: Re: 'ä', 'ö' and 'å' with WordDocument


> Your output may not be using the correct character encoding. Character
> encoding determines how Java converts bytes to characters. A StringWriter
> only holds characters, and Java Strings are UTF-16 internally, but the
> default encoding used when bytes are converted (for example when printing
> to the console) depends on the platform. One way to test this is to step
> through the code, find the actual bytes being read from the Word document,
> and then check at http://www.unicode.org whether they are the correct
> bytes for 'ä', 'ö' and 'å'. If they are correct, then the encoding you are
> using is wrong.
>
> If it's our fault, I will try to address this issue in a future release.
>
> Ryan


Re: 'ä', 'ö' and 'å' with WordDocument

Posted by Ryan Ackley <sa...@cfl.rr.com>.
Your output may not be using the correct character encoding. Character
encoding determines how Java converts bytes to characters. A StringWriter
only holds characters, and Java Strings are UTF-16 internally, but the
default encoding used when bytes are converted (for example when printing
to the console) depends on the platform. One way to test this is to step
through the code, find the actual bytes being read from the Word document,
and then check at http://www.unicode.org whether they are the correct
bytes for 'ä', 'ö' and 'å'. If they are correct, then the encoding you are
using is wrong.
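The byte check described above can also be tried without a debugger, using only the standard library. The sketch below (EncodingCheck, hexBytes and misdecode are hypothetical names, not POI API) prints the bytes a string produces in a given encoding and shows what happens when bytes written in one encoding are decoded as another:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    // Hex dump of the bytes that text produces in the given charset, e.g. "c3 a4"
    static String hexBytes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode with one charset and decode with another: what a reader that
    // assumes the wrong encoding ends up doing
    static String misdecode(String text, Charset written, Charset assumed) {
        return new String(text.getBytes(written), assumed);
    }
}
```

For example, hexBytes("ä", StandardCharsets.UTF_8) gives "c3 a4" while hexBytes("ä", StandardCharsets.ISO_8859_1) gives "e4", and misdecode("äöå", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1) gives "Ã¤Ã¶Ã¥", the typical symptom of a mismatched encoding.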

If it's our fault, I will try to address this issue in a future release.

Ryan

----- Original Message ----- 
From: "Jussi Koiranen" <ju...@solita.fi>
To: <po...@jakarta.apache.org>
Sent: Wednesday, August 27, 2003 3:26 AM
Subject: 'ä', 'ö' and 'å' with WordDocument


> I am reading a Word document with org.apache.poi.hdf.extractor.WordDocument
> as follows:
>
>     WordDocument wordDoc = new WordDocument("test.doc");
>     StringWriter strWriter = new StringWriter();
>     wordDoc.writeAllText(strWriter);
>     System.out.println(strWriter); // for debugging
>
> But ä, ö and å are not read correctly from the Word document (test.doc).
> Am I doing something wrong, or is this a bug?
>
> I have tested this with the jakarta-poi-1.5.1-final-bin and
> jakarta-poi-1.8.0-dev-bin packages.
>
> Jussi Koiranen


Re: How to convert .xls file to .csv format file using POI

Posted by Avik Sengupta <av...@apache.org>.
Oh, well... you said "few". :)

Processing Excel files on the server side is what POI does best, so
yes, it is certainly possible to do this with POI. However, POI is a file
format reader, so it doesn't have any application-level code in it
(and for good reason too, IMO!)

So POI will give you programmatic access to the Excel file, and it's your
job to extract the data. The complexity of that code is primarily a
function of the complexity of the sheet. For a simple sheet, it should
be about five lines of code. Check the POI website, particularly the FAQ
and HSSF sections, and see the examples directory of the distribution.
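Once POI has given you each cell's value as text, writing CSV is mostly a matter of quoting. A minimal sketch in plain Java, assuming one row's cell values have already been collected as strings (in recent POI releases, DataFormatter.formatCellValue(Cell) is one way to get them); CsvRow, escape and toLine are hypothetical names. It follows the common RFC 4180 convention of quoting fields that contain commas, quotes or line breaks and doubling embedded quotes:

```java
import java.util.List;

final class CsvRow {
    // Quote a field only if it contains a comma, a double quote or a line break;
    // embedded double quotes are doubled (the RFC 4180 convention)
    static String escape(String field) {
        if (field == null) {
            return ""; // treat a missing cell as an empty field
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join one row's cell values (already formatted as text) into a CSV line
    static String toLine(List<String> cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(cells.get(i)));
        }
        return sb.toString();
    }
}
```

For example, toLine(Arrays.asList("1", "x,y", null)) gives 1,"x,y", with the missing cell written as an empty field. Keeping the quoting in one place also makes it easy to pick out only selected columns before joining.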

Note also that POI is best used for reading if you can have a modicum
of control over the format of the sheets sent to you. It is certainly
possible to throw a sheet at POI that makes it barf, but if you do
proper error handling, you should be OK. And yes, lots of people DO use
it in production. 

If you get stuck, ask on the lists. 

Regards
-
Avik


On Wed, 2003-08-27 at 18:55, Ashwani Mayur wrote:
> Avik,
> Thanks for the quick reply. My client gets approximately 500 files every
> day on a Unix file server and would like us to process them based on some
> logic and then load the data into an Oracle database.
> I was wondering if it would be possible to convert the files
> programmatically to CSV format using POI?
> 
> Thanks
> Ashwani
-- 
Avik Sengupta <av...@apache.org>




Re: How to convert .xls file to .csv format file using POI

Posted by Ashwani Mayur <as...@fanniemae.com>.
Avik,
Thanks for the quick reply. My client gets approximately 500 files every
day on a Unix file server and would like us to process them based on some
logic and then load the data into an Oracle database.
I was wondering if it would be possible to convert the files
programmatically to CSV format using POI?

Thanks
Ashwani


Avik Sengupta wrote:

> There is a Save As function in Excel... won't that do?
> --
> Avik Sengupta <av...@apache.org>


Re: How to convert .xls file to .csv format file using POI

Posted by Ashwani Mayur <as...@fanniemae.com>.
Amon,

I was moving in that direction and ran into a few problems.
1. In the usermodel, I was not able to read the formula fields. In fact, I
could not find any function in the usermodel that would read the value of
a formula cell for me (I am sure there is some valid reason for that).
The getCellFormula() function gives me the formula text, and even that
does not work when the user has copy-pasted the formula from one row to
the rest of the rows.
From one of the archived mails I learned that this is a known issue with
formula fields in the present versions (it might get fixed in future).

2. When I tried the eventmodel, I was able to read the formula values
without any problem, but date fields were the problem. There are various
subclasses of the org.apache.poi.hssf.record.Record class, including
FormulaRecord, but I am having a hard time reading the date fields because
I could not find a DateRecord or equivalent class for date-formatted
fields.
In the utility class there is one function to read the date format, but it
takes an HSSFCell as input, and I still have not figured out how to pass a
record to this utility class to get the date field.
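Excel has no separate date type: a date is stored as a number of days (the fraction is the time of day), and only the cell's number format marks it as a date. In the event model such a value therefore arrives as an ordinary number, for example from a NumberRecord. Depending on your POI version, its date utility class may already convert such a double (getJavaDate(double) in HSSFDateUtil, later DateUtil); the hand-written sketch below (ExcelDates and fromSerial are hypothetical names) only illustrates what that conversion involves. It assumes the default 1900 date system, not the 1904 system some workbooks use, and drops the time of day. Deciding whether a number is a date at all still requires looking up the cell's format, which is not shown here.

```java
import java.time.LocalDate;

final class ExcelDates {
    // Effective epoch for serials from 61 on, which already absorb
    // Excel's fictitious 29 Feb 1900
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);

    // Convert an Excel serial number (1900 date system) to a date; the
    // fractional part, which holds the time of day, is dropped
    static LocalDate fromSerial(double serial) {
        long days = (long) Math.floor(serial);
        if (days < 1) {
            throw new IllegalArgumentException("serial before 1900-01-01: " + serial);
        }
        if (days == 60) {
            // Excel treats 1900 as a leap year, so serial 60 is the non-existent 29 Feb 1900
            throw new IllegalArgumentException("serial 60 is Excel's fictitious 1900-02-29");
        }
        // Serials 1..59 come before the fictitious day, so they need one extra day
        return days < 60 ? EPOCH.plusDays(days + 1) : EPOCH.plusDays(days);
    }
}
```

For example, fromSerial(36526.75) gives 2000-01-01 (the .75, i.e. 18:00, is dropped), and serial 61 is 1900-03-01 because Excel counts a non-existent 29 February 1900 as serial 60.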

Also, as I already mentioned, my requirement is only to read the data
from selected columns, not to do any formula manipulation, so I thought
that converting to CSV could be one way to read it.

Once again, I do appreciate the hard work put in by the folks on this
project and the quick replies from everyone.

Ashwani







Re: How to convert .xls file to .csv format file using POI

Posted by Avik Sengupta <av...@apache.org>.
There is a Save As function in Excel... won't that do?
On Wed, 2003-08-27 at 18:14, Ashwani Mayur wrote:
> I am seeking help to convert a few of my .xls files to .csv format.
> Is there any existing function in POI for this job?
> 
> Thanks for any help
> Ashwani
-- 
Avik Sengupta <av...@apache.org>




How to convert .xls file to .csv format file using POI

Posted by Ashwani Mayur <as...@fanniemae.com>.
I am seeking help to convert a few of my .xls files to .csv format.
Is there any existing function in POI for this job?

Thanks for any help
Ashwani





Re: 'ä', 'ö' and 'å' with WordDocument

Posted by Mark Fortner <ph...@mindspring.com>.
I wonder if the problem is the way in which you're writing the text
out. Could you try this again with this snippet of code I found on
javaalmanac.com:
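import java.io.*;

// Write the text explicitly as UTF-8 rather than in the platform default
// encoding; aString holds the text to write out
try (Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream("outfilename"), "UTF-8"))) {
    out.write(aString);
} catch (IOException e) {
    // also catches UnsupportedEncodingException, a subclass of IOException
    e.printStackTrace();
}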
This would at least ensure that you're using the right encoding to write
things out.

The other thing you could do is add a unit test to your code that
asserts whether the proper Unicode code point is returned. You would
create a simple Word document with ä in it, for example, and then assert
whether the Unicode value returned was U+00E4.
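A unit test of that kind needs the code points of the extracted text rather than how the console happens to display them. A minimal standard-library sketch (CodePoints and describe are hypothetical names, not POI API); in a JUnit test it could be used as assertEquals("U+00E4", CodePoints.describe(extracted)), where extracted is a hypothetical variable holding the text read from the test document:

```java
final class CodePoints {
    // Describe every code point of text as U+XXXX, separated by spaces
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }
}
```

For example, describe("ä") gives "U+00E4". If the test sees "U+00C3 U+00A4" instead, the UTF-8 bytes for ä were decoded as ISO-8859-1 somewhere along the way.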

Hope this helps,

Mark




On Wednesday, August 27, 2003, at 02:26 AM, Jussi Koiranen wrote:

> I am reading a Word document with org.apache.poi.hdf.extractor.WordDocument
> as follows:
>
>     WordDocument wordDoc = new WordDocument("test.doc");
>     StringWriter strWriter = new StringWriter();
>     wordDoc.writeAllText(strWriter);
>     System.out.println(strWriter); // for debugging
>
> But ä, ö and å are not read correctly from the Word document (test.doc).
> Am I doing something wrong, or is this a bug?
>
> I have tested this with the jakarta-poi-1.5.1-final-bin and
> jakarta-poi-1.8.0-dev-bin packages.
>
> Jussi Koiranen