Posted to user@poi.apache.org by "Polk, Scott W" <Sc...@Pearson.com> on 2013/02/07 18:25:13 UTC

Failing to read text in bookmarks for Word

I am attempting to read text out of a table cell that is bookmarked
(yes, the table cell is bookmarked, not the text inside the table cell)
using HWPF.  The results I am receiving are incorrect.

 

The document (.doc) is set up with 1 table with 2 rows and 4 cells in
each row.  Each cell is bookmarked, and each bookmark is named cell1,
cell2, cell3, etc. for testing purposes.  Each cell (or bookmark) has
text that represents the row and column like "R1 C1", "R1 C2", "R1 C3",
"R1 C4", "R2 C1", etc.

 

When I use the following code:

 

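    import java.io.FileInputStream;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.Bookmark;
    import org.apache.poi.hwpf.usermodel.Bookmarks;
    import org.apache.poi.hwpf.usermodel.Range;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

    // path is the location of the test .doc file
    POIFSFileSystem poifs = new POIFSFileSystem(new FileInputStream(path));
    HWPFDocument wdDoc = new HWPFDocument(poifs);

    // get a list of all bookmarks in the document
    Bookmarks bookmarks = wdDoc.getBookmarks();
    for (int i = 0; i < bookmarks.getBookmarksCount(); i++) {
        Bookmark bkm = bookmarks.getBookmark(i);

        // build a Range from the offsets the bookmark reports
        Range bkmRange = new Range(bkm.getStart(), bkm.getEnd(), wdDoc);

        System.out.println(bkm.getName());
        System.out.println("  Start: " + bkm.getStart());
        System.out.println("  End: " + bkm.getEnd());
        System.out.println("  Text: " + bkmRange.text());
    }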

 

I receive the following results (the underscores represent End of Cell
and/or End of Row markers):

 

cell1
  Start: 0
  End: 25
  Text: R1 C1_R1 C2_R1 C3_R1 C4__
cell2
  Start: 0
  End: 25
  Text: R1 C1_R1 C2_R1 C3_R1 C4__
cell3
  Start: 0
  End: 25
  Text: R1 C1_R1 C2_R1 C3_R1 C4__
cell4
  Start: 0
  End: 25
  Text: R1 C1_R1 C2_R1 C3_R1 C4__
cell5
  Start: 25
  End: 50
  Text: R2 C1_R2 C2_R2 C3_R2 C4__
cell6
  Start: 25
  End: 50
  Text: R2 C1_R2 C2_R2 C3_R2 C4__
cell7
  Start: 25
  End: 50
  Text: R2 C1_R2 C2_R2 C3_R2 C4__
cell8
  Start: 25
  End: 50
  Text: R2 C1_R2 C2_R2 C3_R2 C4__
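
To make the offsets easier to check, here is a minimal sketch in plain Java
(no POI calls).  It assumes each underscore in the output stands for Word's
single cell/row mark character (U+0007), with one mark after every cell and
one extra mark at the end of the row.  The names RowOffsetCheck, rowText and
splitCells are made up for this illustration.  Under that assumption one row
is exactly 25 characters long, which matches the Start/End pairs 0-25 and
25-50: each bookmark seems to report the span of its whole row.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, no POI. Assumes each "_" in the output
// stands for Word's cell/row mark, the single character U+0007.
class RowOffsetCheck {
    static final char CELL_MARK = '\u0007';

    // Rebuild the text of one table row: every cell is followed by a
    // cell mark, and the row is closed by one extra mark.
    static String rowText(String... cells) {
        StringBuilder sb = new StringBuilder();
        for (String c : cells) {
            sb.append(c).append(CELL_MARK);
        }
        return sb.append(CELL_MARK).toString();
    }

    // Cut a row's text at each mark. Empty segments (such as the one in
    // front of the trailing row mark) are skipped, so empty cells would
    // be dropped by this simple version.
    static List<String> splitCells(String row) {
        List<String> cells = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < row.length(); i++) {
            if (row.charAt(i) == CELL_MARK) {
                if (i > start) {
                    cells.add(row.substring(start, i));
                }
                start = i + 1;
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String row1 = rowText("R1 C1", "R1 C2", "R1 C3", "R1 C4");
        System.out.println(row1.length());     // 25
        System.out.println(splitCells(row1));  // [R1 C1, R1 C2, R1 C3, R1 C4]
    }
}
```

Splitting the row text like this is only a way to inspect what comes back.
It does not say which of the four cells a given bookmark was placed on, so
it is not a fix for the offsets themselves.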

 

How do I get the text of only the bookmarked cell rather than the text of
the entire row?  The start and end offsets are clearly wrong: every
bookmark in a row reports the offsets of the whole row (0 to 25 for cell1
through cell4, 25 to 50 for cell5 through cell8).  I have been trying to
figure this out for quite some time and have asked 3 other times with no
responses (one attempt was to this mailing list).  I am not in any real
rush to get this done, since for the time being I have a .NET tool that
does something similar using Word automation (very slooowwww).  Would
someone PLEASE help me figure this out (yes, I am begging)?  I will gladly
post or attach my test document for anyone to use.  Just tell me where to
post it.

 

-Scott