Posted to dev@tika.apache.org by A Z <st...@live.co.uk> on 2013/03/07 09:28:40 UTC
Questions about java TIKA project.
//-----------------------------------------------------------------------------------
I notice that the Apache Tika project provides Java support for a range of
file formats, including various Office file formats.
I also notice that you are building on POI (presumably 3.9).
-POI has shortfalls around HWPFDocument objects (Microsoft Word
.doc files). One cannot easily insert
org.apache.poi.hwpf.usermodel.Picture
objects into the document and save it successfully.
The
setFtcAscii(int ftcAscii)
setFtcFE(int ftcFE)
functions don't make it easy to alter font information in an HWPFDocument.
Their names do not make their purpose evident, and the int values for fonts
are not evident either, as they aren't included as fields in a companion class.
-Is your project about addressing these sorts of shortfalls inside POI?
//-----------------------------------------------------------------------------------
-Similarly, I want more support for dealing with *.rtf files, particularly
to insert text and images, and not simply append them. I also want the ability
to read images out of *.rtf files. Are these going to be dealt with?
//-----------------------------------------------------------------------------------
Re: Questions about java TIKA project.
Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 7 Mar 2013, A Z wrote:
> I also notice that you are building on POI (presumably 3.9).
>
> -POI has shortfalls around HWPFDocument objects; Microsoft Word
> .doc files. One may not really easily insert
>
> org.apache.poi.hwpf.usermodel.Picture
Apache Tika only reads files in through the various libraries it uses, so
write/change support in libraries like Apache POI doesn't affect Tika.
If these limitations in POI do affect you, then the best bet is to ask for
advice from the Apache POI community, and work up patches to add in the
missing features!
> -Similarly, I want more support for dealing with *.rtf files. Particularly
> to insert text and images, and not simply append them.
Again, Tika is only interested in reading data out of RTF files, not
making changes to them, so that sort of thing is out of scope.
Nick
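
To illustrate the read-only kind of access described in the reply, here is a minimal sketch that pulls the plain text out of an RTF stream and never writes anything back. It uses the JDK's own javax.swing.text.rtf.RTFEditorKit as a stand-in, not Tika's RTF parser, and the class name RtfTextReader and the sample RTF string are made up for the example. RTFEditorKit does not extract embedded images, so this only covers text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class for this example; not part of Tika or POI.
public class RtfTextReader {

    // Reads the plain text out of an RTF stream. The source is only read,
    // never modified, which mirrors the "extract only" scope described above.
    static String extractText(InputStream rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(rtf, doc, 0);                   // parse RTF into an in-memory document
        return doc.getText(0, doc.getLength());  // formatting is dropped, text is kept
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document: one font table entry and one paragraph.
        String sample = "{\\rtf1\\ansi{\\fonttbl{\\f0\\fswiss Helvetica;}}\\f0 Hello, RTF!\\par}";
        byte[] bytes = sample.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            System.out.println(extractText(in).trim());
        }
    }
}
```

Inserting text or images at a given position in an existing RTF or .doc file is a different problem (document writing), which is why the reply points to the Apache POI community for it.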