Posted to dev@tika.apache.org by Julien Nioche <li...@gmail.com> on 2011/04/06 16:30:31 UTC

Invisible text displayed for headings in doc files

Hi guys,

We are currently getting duplicated text for headings extracted from .doc
files, e.g.:

*<p class="index_Heading"><b>29. No Partnership or Agency</b><b> XE "29. No
Partnership or Agency" </b></p>*

XE appears to be an index-entry field code in MS Word (see
http://taxonomist.tripod.com/indexing/wordflags.html), but I don't think it
should be displayed.
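For illustration only, here is a minimal sketch of the kind of filtering I have in mind. It assumes the raw .doc text still contains Word's field marker characters (0x13 field begin, 0x14 separator, 0x15 field end). An XE field has an instruction part but no visible result, so dropping instruction text while keeping results should hide it. `FieldCodeStripper` and `strip` are hypothetical names and are not part of the Tika or POI APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: removes Word field instructions (e.g. XE entries) from raw .doc text. */
public final class FieldCodeStripper {
    // Field marker characters used in the Word binary (.doc) text stream.
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    private FieldCodeStripper() {}

    /** Drops field instruction text but keeps field results (handles nested fields). */
    public static String strip(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        // One entry per open field: true while that field is still in its instruction part.
        Deque<Boolean> open = new ArrayDeque<>();
        int instructionDepth = 0; // number of open fields still in their instruction part
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == FIELD_BEGIN) {
                open.push(Boolean.TRUE);
                instructionDepth++;
            } else if (c == FIELD_SEPARATOR) {
                if (!open.isEmpty() && open.peek()) {
                    open.pop();
                    open.push(Boolean.FALSE); // the visible result part follows
                    instructionDepth--;
                }
            } else if (c == FIELD_END) {
                if (!open.isEmpty() && open.pop()) {
                    instructionDepth--; // field closed with no result part (e.g. XE)
                }
            } else if (instructionDepth == 0) {
                out.append(c); // outside any instruction: visible text
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "29. No Partnership or Agency"
                + FIELD_BEGIN + " XE \"29. No Partnership or Agency\" " + FIELD_END;
        System.out.println(strip(raw)); // prints the heading once, without the XE entry
    }
}
```

Fields that do have a result (e.g. a PAGE number) keep only the result text, so this would not lose visible content.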

Have I missed a parameter somewhere that could be used to hide these fields,
or should I open a JIRA?

BTW, does the class name vary from one user to another (depending on the
stylesheet), or is it consistent?

Thanks

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com