Posted to user@poi.apache.org by Branden Visser <mr...@gmail.com> on 2016/07/19 22:37:37 UTC

Finding document language?

Hi all,

Does anyone know the best way to get the document language for both
XWPF and HWPF documents?

I'm guessing if it's File > Properties > Custom, then it can be
extracted from the custom properties in XWPF, but is there a similar
API available in HWPF?

Thanks,
Branden

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Finding document language?

Posted by Branden Visser <mr...@gmail.com>.
Thanks again Timothy, that info is helpful. It sounds like HWPF simply
doesn't have any document-level language setting. FWIW, I've found
that the language code of the character runs is, for the most part,
set quite reliably.
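
If the per-run codes are the only signal available, one rough way to turn
them into a single "document language" is to pick the code that covers the
most text. The following is only a sketch under that assumption: the class
and method names are made up for illustration, it uses plain arrays rather
than POI types, and the caller is assumed to have collected each run's
language code (e.g. from CharacterRun.getLanguageCode()) and text length
beforehand.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper, not part of POI.
final class DominantRunLanguage {

    /**
     * Returns the language code that covers the most characters.
     * lcids[i] is the language code of run i, lengths[i] is the number of
     * characters in that run. Ties go to the code that was seen first.
     */
    static OptionalInt dominant(int[] lcids, int[] lengths) {
        if (lcids.length != lengths.length) {
            throw new IllegalArgumentException("lcids and lengths must have the same size");
        }
        // LinkedHashMap keeps first-seen order so that ties are deterministic.
        Map<Integer, Long> weightByCode = new LinkedHashMap<>();
        for (int i = 0; i < lcids.length; i++) {
            weightByCode.merge(lcids[i], (long) lengths[i], Long::sum);
        }
        OptionalInt best = OptionalInt.empty();
        long bestWeight = -1;
        for (Map.Entry<Integer, Long> e : weightByCode.entrySet()) {
            if (e.getValue() > bestWeight) {
                bestWeight = e.getValue();
                best = OptionalInt.of(e.getKey());
            }
        }
        return best; // empty when there were no runs at all
    }
}
```

For example, three runs with codes 1033, 1036, 1033 (the Windows LCIDs for
en-US and fr-FR) and lengths 10, 25, 20 give 1033, because it covers 30
characters against 25. The sketch treats every code alike, so any value that
stands for "no language set" would need to be filtered out by the caller.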

Additionally, it seems that for XWPF the custom properties do retain
the document language when it is set in Word, which is nice. Whether
it's accurate or useful is another matter ;)

All the best,
Branden




RE: Finding document language?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Again, this may miss the mark, since it gives the language per run rather than for the document as a whole.

This [1] points out how to get the language from each run in HWPF: CharacterRun.getLanguageCode();

In XWPF, the lang can be stored in the run's properties:
<w:r><w:rPr><w:lang w:bidi="ar-QA"/></w:rPr><w:t xml:space="preserve">here is the text</w:t></w:r>

[1] http://stackoverflow.com/questions/28904283/generate-a-word-document-using-different-languages
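
To make the XWPF side concrete, here is a minimal sketch that lists the
w:lang values found in a document.xml stream. It deliberately uses only the
JDK's XML parser rather than any POI API, so the class and method names are
made up for illustration. It collects every w:lang element it finds, which
may include ones on paragraph marks as well as on runs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of POI.
final class RunLangScanner {
    // WordprocessingML main namespace (the "w:" prefix).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns entries such as "bidi=ar-QA" for each attribute set on each w:lang. */
    static List<String> runLanguages(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(documentXml);

            List<String> found = new ArrayList<>();
            NodeList langs = doc.getElementsByTagNameNS(W_NS, "lang");
            for (int i = 0; i < langs.getLength(); i++) {
                Element lang = (Element) langs.item(i);
                // w:lang can carry up to three language tags.
                for (String attr : new String[] {"val", "eastAsia", "bidi"}) {
                    String value = lang.getAttributeNS(W_NS, attr);
                    if (!value.isEmpty()) {
                        found.add(attr + "=" + value);
                    }
                }
            }
            return found;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document.xml", e);
        }
    }
}
```

For a .docx file, the stream would typically come from the word/document.xml
entry of the package, e.g. via java.util.zip.ZipFile. Wrapping the run above
in a root element that declares the w: namespace should yield the single
entry "bidi=ar-QA".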




Re: Finding document language?

Posted by Branden Visser <mr...@gmail.com>.
Hi Timothy, thanks for your reply.

I'm not trying to detect what language a document is written in; I'm
just trying to see whether the language of the document was set and,
if so, what it was set to. That said, do you recall how to get the
language metadata?

Thanks,
Branden




RE: Finding document language?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
This doesn't answer your question on HWPF.

The last time I looked at this, a few years ago, I figured out how to get the language via OLE, but it was so rarely populated that it was better to run language identification on the extracted content. For language identification in Java, consider optimaize or yalder.



