Posted to user@poi.apache.org by kyopedlr <ky...@yahoo.com> on 2011/09/06 23:03:39 UTC

Identifying a tab in .doc

Greetings All.

I am having some issues using the WordExtractor to remove content from my
word document. When I use:

"
HWPFDocument doc = new HWPFDocument( stream );
WordExtractor word = new WordExtractor(doc);
word.getText();
"

It removes the tabs I have placed at the beginning of any paragraph in my
Word document. It does not matter whether I have one or two tabs at the
beginning of the paragraph. It seems to be trimming the entire paragraph.

Is there any way to disable the trimming or to detect the tabs in the
HWPFDocument?

Thanks.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Identifying-a-tab-in-doc-tp4776070p4776070.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: Identifying a tab in .doc

Posted by Nick Burch <ni...@alfresco.com>.

On Tue, 6 Sep 2011, kyopedlr wrote:
> I am having some issues using the WordExtractor to remove content from my
> word document. When I use:
>
> "
> HWPFDocument doc = new HWPFDocument( stream );
> WordExtractor word = new WordExtractor(doc);
> word.getText();

If you want full control over the text you get, I'd suggest you don't use
the WordExtractor. Instead, get the paragraphs from the HWPFDocument
directly, and fetch the text (and any formatting you want) from those.
That way nothing is trimmed or altered before you see it.

Nick
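
A rough sketch of the approach suggested above, for anyone finding this thread later. The POI calls shown in the comments (HWPFDocument.getRange(), Range.numParagraphs(), Range.getParagraph(int) and text()) are the usual way to walk the paragraphs; the LeadingTabs class itself is a hypothetical helper, not part of POI. It assumes the tabs are stored in the document as literal tab characters. If Word recorded them as paragraph indentation instead, they will not show up in the text and you would need to look at the paragraph's formatting properties.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of POI) that counts the tab characters at
 * the start of a paragraph's raw text.
 *
 * Assumed way of obtaining the raw text with HWPF, shown as comments so
 * that this class compiles without POI on the classpath:
 *
 *   HWPFDocument doc = new HWPFDocument(stream);
 *   Range range = doc.getRange();
 *   for (int i = 0; i < range.numParagraphs(); i++) {
 *       String raw = range.getParagraph(i).text();
 *       int tabs = LeadingTabs.count(raw);
 *   }
 */
final class LeadingTabs {

    private LeadingTabs() {
        // static helper only, no instances
    }

    /** Returns how many consecutive tab characters begin the given text. */
    static int count(String paragraphText) {
        int n = 0;
        while (n < paragraphText.length() && paragraphText.charAt(n) == '\t') {
            n++;
        }
        return n;
    }

    /** Applies count() to each paragraph, preserving the input order. */
    static List<Integer> countAll(List<String> paragraphs) {
        List<Integer> result = new ArrayList<>();
        for (String p : paragraphs) {
            result.add(count(p));
        }
        return result;
    }
}
```

Only tabs at the very start of the paragraph are counted; a tab later in the line is ignored. The raw paragraph text normally still carries its trailing paragraph mark, so trim the end yourself if you need to, but leave the start alone.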
