Posted to user@poi.apache.org by "Polk, Scott W" <Sc...@Pearson.com> on 2013/02/07 18:25:13 UTC
Failing to read text in bookmarks for Word
I am attempting to read text out of a table cell that is bookmarked
(yes, the table cell is bookmarked, not the text inside the table cell)
using HWPF. The results I am receiving are incorrect.
The document (.doc) is set up with 1 table with 2 rows and 4 cells in
each row. Each cell is bookmarked, and each bookmark is named cell1,
cell2, cell3, etc. for testing purposes. Each cell (or bookmark) has
text that represents the row and column like "R1 C1", "R1 C2", "R1 C3",
"R1 C4", "R2 C1", etc.
When I use the following code:
    POIFSFileSystem poifs = new POIFSFileSystem(new FileInputStream(path));
    HWPFDocument wdDoc = new HWPFDocument(poifs);

    // get a list of all bookmarks in the document
    Bookmarks bookmarks = wdDoc.getBookmarks();
    for (int i = 0; i < bookmarks.getBookmarksCount(); i++)
    {
        Bookmark bkm = bookmarks.getBookmark(i);
        Range bkmRange = new Range(bkm.getStart(), bkm.getEnd(), wdDoc);
        System.out.println(bkm.getName());
        System.out.println(" Start: " + bkm.getStart());
        System.out.println(" End: " + bkm.getEnd());
        System.out.println(" Text: " + bkmRange.text());
    }
I receive the following results (the underscores represent End of Cell
and/or End of Row markers):
    cell1
      Start: 0
      End: 25
      Text: R1 C1_R1 C2_R1 C3_R1 C4__
    cell2
      Start: 0
      End: 25
      Text: R1 C1_R1 C2_R1 C3_R1 C4__
    cell3
      Start: 0
      End: 25
      Text: R1 C1_R1 C2_R1 C3_R1 C4__
    cell4
      Start: 0
      End: 25
      Text: R1 C1_R1 C2_R1 C3_R1 C4__
    cell5
      Start: 25
      End: 50
      Text: R2 C1_R2 C2_R2 C3_R2 C4__
    cell6
      Start: 25
      End: 50
      Text: R2 C1_R2 C2_R2 C3_R2 C4__
    cell7
      Start: 25
      End: 50
      Text: R2 C1_R2 C2_R2 C3_R2 C4__
    cell8
      Start: 25
      End: 50
      Text: R2 C1_R2 C2_R2 C3_R2 C4__
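The reported offsets are at least self-consistent with a whole row: four cells of 5 characters plus one end-of-cell mark each, plus one end-of-row mark, gives 4 * (5 + 1) + 1 = 25. As a possible stopgap (not a fix for the bookmark offsets themselves), the row text could be split on those marks to recover a single cell. The sketch below is plain Java with no POI calls. It assumes the marks shown as underscores above are the character U+0007, that the row text ends with exactly one end-of-row mark, and that the column index is known some other way (for this test document, from the bookmark name). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RowTextSplitter {
    // Assumption: end-of-cell and end-of-row marks are both U+0007.
    static final char CELL_MARK = '\u0007';

    /** Splits the text of one table row into the text of each cell. */
    static List<String> splitCells(String rowText) {
        List<String> cells = new ArrayList<>();
        // Ignore the last character, assumed to be the end-of-row mark,
        // so that exactly one mark remains per cell.
        int end = rowText.length() - 1;
        int start = 0;
        for (int i = 0; i < end; i++) {
            if (rowText.charAt(i) == CELL_MARK) {
                cells.add(rowText.substring(start, i));
                start = i + 1;
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Row 1 of the test document, with the marks written out explicitly.
        String row1 = "R1 C1\u0007R1 C2\u0007R1 C3\u0007R1 C4\u0007\u0007";
        List<String> cells = splitCells(row1);
        // Bookmark "cell3" is the third cell of row 1, i.e. column index 2.
        System.out.println(cells.get(2)); // prints "R1 C3"
    }
}
```

This only helps when each bookmark is known to cover exactly one cell and its column can be derived independently. It does not explain why getStart() and getEnd() return the row boundaries.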
How do I get the text of only the bookmarked cell rather than the text
of the entire row? The start and end offsets are clearly wrong: every
bookmark reports the span of its whole row (0 to 25 for row 1, 25 to 50
for row 2). I've been trying to figure this out for quite
some time and have attempted to get an answer 3 other times with no
responses (one attempt was to this mailing list). I am not in any real
rush to get this done since I have a .NET tool built for the time being
that does something similar to this using Word automation (very
slooowwww). Would someone PLEASE help me figure this out (yes, I am
begging)? I will gladly post or attach my test document for anyone to
use. Just tell me where to post it.
-Scott