Posted to dev@tika.apache.org by Julien Nioche <li...@gmail.com> on 2011/04/06 16:30:31 UTC
Invisible text displayed for headings in doc files
Hi guys,
We are currently getting duplicated text for headings extracted from .doc
files, e.g.
*<p class="index_Heading"><b>29. No Partnership or Agency</b><b> XE "29. No
Partnership or Agency" </b></p>*
XE seems to be the index entry field code in MS Word (see
http://taxonomist.tripod.com/indexing/wordflags.html), but I don't think it
should be displayed.
Have I missed a parameter somewhere that could be used to hide these things
or shall I open a JIRA?
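
In the meantime, the field text could probably be stripped in a
post-processing step. The following is only a minimal sketch, assuming the
field always appears in the simple form shown above (XE followed by one quoted
string, with no switches or nested quotes). XeFieldStripper and strip are
hypothetical names for illustration, not part of Tika.

```java
import java.util.regex.Pattern;

public class XeFieldStripper {
    // Matches the simple form of the index entry field as it appears in the
    // extracted text: XE "entry text". Assumption: no switches, no nested quotes.
    private static final Pattern XE_FIELD =
            Pattern.compile("\\s*\\bXE\\s+\"[^\"]*\"\\s*");

    // Matches a <b> element left holding only whitespace once the field is gone.
    private static final Pattern EMPTY_BOLD = Pattern.compile("<b>\\s*</b>");

    // Replaces each XE field with a single space, then drops emptied <b> elements.
    public static String strip(String xhtml) {
        String withoutFields = XE_FIELD.matcher(xhtml).replaceAll(" ");
        return EMPTY_BOLD.matcher(withoutFields).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "<p class=\"index_Heading\"><b>29. No Partnership or Agency</b>"
                + "<b> XE \"29. No Partnership or Agency\" </b></p>";
        System.out.println(strip(in));
    }
}
```

This works on the serialized output as a string, so it is a stopgap rather
than a fix; hiding the field inside the parser itself would be cleaner.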
BTW, does the class name vary from one user to another (depending on the
stylesheet) or is it consistent?
Thanks
Julien
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com