Posted to user@poi.apache.org by Rainer Schwarze <rs...@admadic.de> on 2007/02/01 01:08:41 UTC

Re: Identify document language???

At 14:59 30.01.2007, Thang To wrote:
>Hello all,
>
>Is there any way to identify the document language? I know one can retrieve
>the 'language' property in CustomProperties but this property is usually
>unset. When this property is blank, Word is still able to detect the correct
>language of the document.

Hello Thang,

As far as I know, Word can set different languages for arbitrary regions of
text. (I "liked" the language autodetection feature a couple of years ago:
it decided my German technical writing was French, and a few lines further
down that I was now writing Swedish or something like that... I only
noticed because of the strange quotation marks Word "auto-corrected".)

I have not looked into the language details, so I'm not sure which is the
right source of information.

You might first try HWPFDocument.getFileInformationBlock().getLid(), but
this may return the language of the Word installation that created the file.
Maybe it's the default document language; I don't know for sure right now.

If the results look strange, try CharacterProperties.getLidDefault(), or
...getLidFE() for Far East versions. To retrieve a CharacterProperties
instance, get a Range from the document and do something like
range.getCharacterRun(index).cloneProperties().
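
Since the language IDs are stored per character run, one possible way to get a
single answer for the whole document is to tally them and pick the ID that
covers the most characters. The following is only a sketch of that idea in
plain Java, without any POI calls: the class DominantLid and its method are
hypothetical, and it assumes you have already collected one LID per run (for
instance via getLidDefault() as described above) together with the length of
each run's text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of POI.
final class DominantLid {

    /**
     * Returns the LID that covers the most characters, or -1 if there are no runs.
     * lids[i] is the language ID of run i, lengths[i] the number of characters in it.
     * If two LIDs cover the same number of characters, either may be returned.
     */
    static int dominant(int[] lids, int[] lengths) {
        if (lids.length != lengths.length) {
            throw new IllegalArgumentException("lids and lengths must have the same size");
        }
        // Sum the character counts per LID.
        Map<Integer, Integer> charsPerLid = new HashMap<>();
        for (int i = 0; i < lids.length; i++) {
            charsPerLid.merge(lids[i], lengths[i], Integer::sum);
        }
        // Pick the LID with the largest total.
        int best = -1;
        int bestCount = -1;
        for (Map.Entry<Integer, Integer> e : charsPerLid.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three runs: German (0x0407), French (0x040C), German again.
        int[] lids = {0x0407, 0x040C, 0x0407};
        int[] lengths = {10, 25, 20};
        System.out.printf("0x%04X%n", dominant(lids, lengths));
    }
}
```

Weighting by character count rather than by number of runs keeps a few short,
wrongly detected runs from outvoting the bulk of the text.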

Best wishes,
Rainer


---------------------------------------------------------------------
To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


RE: Identify document language???

Posted by Rainer Schwarze <rs...@admadic.de>.
At 01:05 03.02.2007, Thang To wrote:
>Dear Rainer,
>
>Thank you very much for your response.
>
>One more question, the getLid() method returns an integer. How can I map
>that number to the corresponding language name?

Hi Thang,

You can go to www.wotsit.org and look up the Word format specification.
Download the files and look at the section on the CHP structure. The FIB
structure is probably also of interest. But beware: the specification is
wrong in some places, so you have to check things yourself to be sure.
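
As a rough illustration of the mapping itself: the number is a Windows language
identifier, so a small lookup table is enough for the common cases. The sketch
below is not part of POI; the class name LidNames is made up for this example,
and the table is only a short excerpt of the standard identifiers that you
would extend with the values you actually encounter.

```java
import java.util.Map;

// Hypothetical helper, not part of POI: maps a few Windows language IDs to names.
final class LidNames {

    // Short excerpt of the Windows language identifier table; extend as needed.
    private static final Map<Integer, String> NAMES = Map.of(
            0x0407, "German (Germany)",
            0x0409, "English (United States)",
            0x040C, "French (France)",
            0x041D, "Swedish (Sweden)",
            0x0809, "English (United Kingdom)");

    static String nameOf(int lid) {
        // Keep only the low 16 bits in case the value arrives sign-extended.
        int key = lid & 0xFFFF;
        String name = NAMES.get(key);
        return name != null ? name : String.format("Unknown (0x%04X)", key);
    }

    public static void main(String[] args) {
        System.out.println(nameOf(1031));   // decimal form of 0x0407
        System.out.println(nameOf(0x0409));
    }
}
```

Unknown values are returned in hexadecimal so they can be looked up in the
specification by hand.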

Best wishes,
Rainer



RE: Identify document language???

Posted by Thang To <th...@uea.ac.uk>.
Dear Rainer,

Thank you very much for your response.

One more question, the getLid() method returns an integer. How can I map
that number to the corresponding language name?

Best regards,
Thang

