Posted to user@poi.apache.org by semendian <se...@mail.ru> on 2012/06/06 12:59:04 UTC

Replacing the contents of the bookmarks

How do I replace the contents of bookmarks in a document (docx and doc)?
The bookmarks are in both body text and tables.
I need to insert text in place of the bookmarks.
I would be grateful for a sample showing how to do it.
Please help me.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-contents-of-the-bookmarks-tp5710052.html
Sent from the POI - User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Replacing the value of the bookmarks

Posted by Sander Postma <sa...@yahoo.com>.
In addition, replace this.document.getParagraphs() with collectParagraphs():




--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5724923.html


Re: Replacing the value of the bookmarks

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
This version now handles bookmarks in tables and strips out any and all text
that may be contained within the bookmark. I still do not know whether this
is correct, however.

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTMarkupRange;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;
import java.util.List;
import java.util.Iterator;
import org.apache.xmlbeans.XmlCursor;
import org.w3c.dom.Node;

/**
 * Second attempt at inserting text at a bookmark defined within a Word
 * document. Note that there is one SERIOUS limitation with the code as it
 * stands, at least only one as far as I am aware: nested bookmarks.
 * 
 * It is possible to create a document and to nest one bookmark within
 * another. Typically, a bookmark is inserted into a piece of text, that text
 * is then selected and another bookmark is added to the selection. The XML
 * markup might look something like this:
 * 
 * <pre>
 * <w:p w:rsidR="00945150" w:rsidRDefault="00945150">
 *   <w:r>
 *     <w:t xml:space="preserve">
 *     Imagine I want to insert one bookmark at the start of this 
 *     </w:t>
 *   </w:r>
 *     <w:bookmarkStart w:id="0" w:name="OUTER"/>
 *       <w:r>
 *         <w:t xml:space="preserve">piece of text and another just 
 *         </w:t>
 *     </w:r>
 *   <w:proofErr w:type="gramStart"/>
 *   <w:r>
 *     <w:t xml:space="preserve">here 
 *     </w:t>
 *   </w:r>
 *   <w:bookmarkStart w:id="1" w:name="INNER"/>
 *   <w:bookmarkEnd w:id="1"/>
 *     <w:r>
 *       <w:t>.
 *       </w:t>
 *     </w:r>
 *   <w:bookmarkEnd w:id="0"/>
 *   <w:proofErr w:type="gramEnd"/>
 * </w:p>
 * </pre>
 * 
 * In this case the code's usual behaviour, which is to remove any nodes found
 * between the bookmarkStart and bookmarkEnd nodes, will simply result in the
 * 'inner' or nested bookmark being removed. So, is the default behaviour
 * correct? If it is, then the code needs to be amended to handle nested
 * bookmarks and a decision must be made about just how to handle them.
 *
 * @author Mark Beardsley
 * @version 0.20 10th June 2012
 */
public class DOCXTest {
    
    private XWPFDocument document = null;
    
    public DOCXTest() {
    }
    
    /**
     * Opens a Word OOXML file.
     * 
     * @param filename An instance of the String class that encapsulates the
     *        path to and name of a Word OOXML (.docx) file.
     * @throws IOException  Thrown if a problem occurs within the underlying
     *         file system.
     */
    public final void openFile(String filename) throws IOException {
        File file = null;
        FileInputStream fis = null;
        try {
            // Simply open the file and store a reference to it in the
            // 'document' instance variable.
            file = new File(filename);
            fis = new FileInputStream(file);
            this.document = new XWPFDocument(fis);
        }
        finally {
            try {
                if(fis != null) {
                    fis.close();
                    fis = null;
                }
            }
            catch(IOException ioEx) {
                // Swallow this exception. It would have occurred only
                // when releasing the file handle and should not pose
                // problems to later processing.
            }
        }
    }
    
    /**
     * Saves a Word OOXML file away under the name, and to the location, 
     * specified.
     * 
     * @param filename An instance of the String class that encapsulates the
     *        name of the file and the location into which it should be stored.
     * @throws IOException  Thrown if a problem occurs in the underlying file
     *         system.
     */
    public final void saveAs(String filename) throws IOException {
        File file = null;
        FileOutputStream fos = null;
        try {
            file = new File(filename);
            fos = new FileOutputStream(file);
            this.document.write(fos);
        }
        finally {
            if(fos != null) {
                fos.close();
                fos = null;
            }
        }
    }
    
    /**
     * Inserts a value at a location within the Word document specified by a
     * named bookmark.
     * 
     * @param bookmarkName An instance of the String class that encapsulates
     *        the name of the bookmark. Note that case is important: the case
     *        of the bookmark's name within the document and that of the value
     *        passed to this parameter must match.
     * @param bookmarkValue An instance of the String class that encapsulates
     *        the value that should be inserted into the document at the
     *        location specified by the bookmark.
     */
    public final void insertAtBookmark(String bookmarkName,
            String bookmarkValue) {
        List<XWPFTable> tableList = null;
        Iterator<XWPFTable> tableIter = null;
        List<XWPFTableRow> rowList = null;
        Iterator<XWPFTableRow> rowIter = null;
        List<XWPFTableCell> cellList = null;
        Iterator<XWPFTableCell> cellIter = null;
        XWPFTable table = null;
        XWPFTableRow row = null;
        XWPFTableCell cell = null;
        
        // Firstly, deal with any paragraphs in the body of the document.
        this.procParaList(this.document.getParagraphs(), bookmarkName,
                bookmarkValue);
        
        // Then check to see if there are any bookmarks in table cells. To do
        // this it is necessary to get at the list of paragraphs 'stored'
        // within the individual table cell, hence this code which gets the
        // tables from the document, the rows from each table, the cells from
        // each row and the paragraphs from each cell.
        tableList = this.document.getTables();
        tableIter = tableList.iterator();
        while(tableIter.hasNext()) {
            table = tableIter.next();
            rowList = table.getRows();
            rowIter = rowList.iterator();
            while(rowIter.hasNext()) {
                row = rowIter.next();
                cellList = row.getTableCells();
                cellIter = cellList.iterator();
                while(cellIter.hasNext()) {
                    cell = cellIter.next();
                    this.procParaList(cell.getParagraphs(),
                            bookmarkName,
                            bookmarkValue);
                }
            }
        }
    }
    
    /**
     * Inserts text into the document at the position indicated by a specific
     * bookmark. Note that the current implementation does not take account
     * of nested bookmarks, that is bookmarks that contain other bookmarks.
     * Note also that any text contained within the bookmark itself will be
     * removed.
     * 
     * @param paraList An instance of a class that implements the List
     *        interface and which encapsulates references to one or more
     *        instances of the XWPFParagraph class.
     * @param bookmarkName An instance of the String class that encapsulates
     *        the name of the bookmark that identifies the position within the
     *        document where some text should be inserted.
     * @param bookmarkValue An instance of the String class that encapsulates
     *        the text that should be inserted at the location specified by
     *        the bookmark.
     */
    private final void procParaList(List<XWPFParagraph> paraList,
            String bookmarkName, String bookmarkValue) {
        Iterator<XWPFParagraph> paraIter = null;
        XWPFParagraph para = null;
        List<CTBookmark> bookmarkList = null;
        Iterator<CTBookmark> bookmarkIter = null;
        CTBookmark bookmark = null;
        XWPFRun run = null;
        Node nextNode = null;

        // Get an Iterator to step through the contents of the paragraph list.
        paraIter = paraList.iterator();
        while(paraIter.hasNext()) {
            // Get the paragraph, a list of CTBookmark objects and an Iterator
            // to step through the list of CTBookmarks.
            para = paraIter.next();
            bookmarkList = para.getCTP().getBookmarkStartList();
            bookmarkIter = bookmarkList.iterator();
            
            while(bookmarkIter.hasNext()) {
                // Get a bookmark and check its name. If the name of the
                // bookmark matches the name the user has specified...
                bookmark = bookmarkIter.next();
                if(bookmark.getName().equals(bookmarkName)) {
                    // ...create the text run to insert and set its text
                    // content and then insert that text into the document.
                    run = para.createRun();
                    run.setText(bookmarkValue);
                    // The new run should be inserted between the bookmarkStart
                    // and bookmarkEnd nodes, so find the bookmarkEnd node.
                    // Note that we are looking for the next sibling of the
                    // bookmarkStart node as it does not contain any child
                    // nodes as far as I am aware.
                    nextNode = bookmark.getDomNode().getNextSibling();
                    // If the next node is not the bookmarkEnd node, then step
                    // along the sibling nodes until the bookmarkEnd node is
                    // found. As the code is here, it will remove anything it
                    // finds between the start and end nodes. This, of course,
                    // completely sidesteps the issues surrounding bookmarks
                    // that contain other bookmarks, which I understand can
                    // happen. The null check stops the loop, rather than
                    // throwing a NullPointerException, when the bookmarkEnd
                    // node is not a sibling, for example when the bookmark
                    // ends in a different paragraph.
                    while(nextNode != null
                            && !nextNode.getNodeName().contains("bookmarkEnd")) {
                        para.getCTP().getDomNode().removeChild(nextNode);
                        nextNode = bookmark.getDomNode().getNextSibling();
                    }
                    
                    // Finally, insert the new run node into the document
                    // between the bookmarkStart and the bookmarkEnd nodes.
                    para.getCTP().getDomNode().insertBefore(
                            run.getCTR().getDomNode(),
                            nextNode);
                }
            }
        }
    }
    
    public static void main(String[] args) {
        try {
            DOCXTest docxTest = new DOCXTest();
            docxTest.openFile("C:/temp/Doc1.docx");
            docxTest.insertAtBookmark("WHOLE_WORD",
                    "This should be inserted at the WHOLE_WORD bookmark.");
            docxTest.insertAtBookmark("MARK_ONE",
                    "..and this at the MARK_ONE bookmark.");
            docxTest.saveAs("C:/temp/Doc1 With Bookmarks Updated.docx");
        }
        catch(Exception ex) {
            System.out.println("Caught a: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows:.....");
            ex.printStackTrace(System.out);
        }
    }
}
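
For what it is worth, the nested bookmark problem described in the doc block can be explored without POI at all, because the node juggling is plain org.w3c.dom. The sketch below is only an assumption about one reasonable policy and is not taken from the thread: it matches the bookmarkEnd by its w:id instead of stopping at the first bookmarkEnd it meets, removes only the w:r runs in between, and leaves any nested bookmark markers in place. The class name BookmarkDomSketch and the method replaceBookmarkText are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BookmarkDomSketch {

    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses a namespace-aware DOM from a String of XML. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /**
     * Replaces the runs between the named bookmarkStart and the bookmarkEnd
     * carrying the same w:id. Returns false if the bookmark, or its matching
     * end, is not found among the start node's following siblings.
     */
    public static boolean replaceBookmarkText(Document doc, String name,
            String text) {
        NodeList starts = doc.getElementsByTagNameNS(W_NS, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            if (!name.equals(start.getAttributeNS(W_NS, "name"))) {
                continue;
            }
            String id = start.getAttributeNS(W_NS, "id");
            // First pass: locate the matching end without changing anything.
            Node end = start.getNextSibling();
            while (end != null && !isEndWithId(end, id)) {
                end = end.getNextSibling();
            }
            if (end == null) {
                return false; // the end lives elsewhere, e.g. another paragraph
            }
            // Second pass: remove only the runs, keep nested bookmark markers.
            Node node = start.getNextSibling();
            while (node != end) {
                Node next = node.getNextSibling();
                if ("r".equals(node.getLocalName())) {
                    node.getParentNode().removeChild(node);
                }
                node = next;
            }
            // Build <w:r><w:t>text</w:t></w:r> and place it before the end.
            Element run = doc.createElementNS(W_NS, "w:r");
            Element t = doc.createElementNS(W_NS, "w:t");
            t.setTextContent(text);
            run.appendChild(t);
            end.getParentNode().insertBefore(run, end);
            return true;
        }
        return false;
    }

    /** True if the node is a w:bookmarkEnd element with the given w:id. */
    static boolean isEndWithId(Node node, String id) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && "bookmarkEnd".equals(node.getLocalName())
                && id.equals(((Element) node).getAttributeNS(W_NS, "id"));
    }
}
```

Nothing is modified until the matching end has been found, so a bookmark that ends in another paragraph is left untouched rather than half deleted. Whether the same calls behave identically on the nodes POI returns from getDomNode() is an assumption I have not checked.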

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710121.html


Re: Replacing the value of the bookmarks

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Have a go with this; as always, the main method shows how to call the code and
I have mentioned the current limitations in the doc block.

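import java.io.*;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;

/**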
 * First attempt at inserting text at a bookmark defined within a Word
 * document. Note that there are a few SERIOUS limitations with the code as it
 * stands:
 * 
 * Firstly, it will not insert text at a bookmark defined within a table
 * cell. It should be possible to remedy this failing by simply iterating
 * through the tables the document contains in much the same way that this
 * implementation iterates through the document's paragraphs. TO DO.
 * 
 * Secondly, it is possible to use bookmarks in a couple of ways. They can
 * simply be inserted into the document as place markers or it is possible to
 * select a character, word or number of words and stipulate that these serve
 * as a bookmark. I am assuming that, in the latter case, the value that is to
 * be inserted at the bookmark will replace all of these selected characters
 * and, as yet, the code does not do this. TO DO.
 * 
 * Modifications to follow.
 *
 * @author Mark Beardsley
 * @version 0.1 10th June 2012
 */
public class DOCXTest {
    
    private XWPFDocument document = null;
    
    public DOCXTest() {
    }
    
    public final void openFile(String filename) throws IOException {
        File file = null;
        FileInputStream fis = null;
        try {
            file = new File(filename);
            fis = new FileInputStream(file);
            this.document = new XWPFDocument(fis);
        }
        finally {
            try {
                if(fis != null) {
                    fis.close();
                    fis = null;
                }
            }
            catch(IOException ioEx) {
                // Swallow this exception. It would have occurred only
                // when releasing the file handle and should not pose
                // problems to later processing.
            }
        }
    }
    
    public final void saveAs(String filename) throws IOException {
        File file = null;
        FileOutputStream fos = null;
        try {
            file = new File(filename);
            fos = new FileOutputStream(file);
            this.document.write(fos);
        }
        finally {
            if(fos != null) {
                fos.close();
                fos = null;
            }
        }
    }
    
    public final void insertAtBookmark(String bookmarkName,
            String bookmarkValue) {
        List<XWPFParagraph> paraList = null;
        Iterator<XWPFParagraph> paraIter = null;
        XWPFParagraph para = null;
        List<CTBookmark> bookmarkList = null;
        Iterator<CTBookmark> bookmarkIter = null;
        CTBookmark bookmark = null;
        XWPFRun run = null;
        
        paraList = this.document.getParagraphs();
        paraIter = paraList.iterator();
            
        while(paraIter.hasNext()) {
            para = paraIter.next();
                
            bookmarkList = para.getCTP().getBookmarkStartList();
            bookmarkIter = bookmarkList.iterator();
                
            while(bookmarkIter.hasNext()) {
                bookmark = bookmarkIter.next();
                if(bookmark.getName().equals(bookmarkName)) {
                    run = para.createRun();
                    run.setText(bookmarkValue);
                    para.getCTP().getDomNode().insertBefore(
                            run.getCTR().getDomNode(),
                            bookmark.getDomNode());
                }
            }
        }
    }
    
    public static void main(String[] args) {
        try {
            DOCXTest docxTest = new DOCXTest();
            docxTest.openFile("C:/temp/Doc1.docx");
            docxTest.insertAtBookmark("WHOLE_WORD",
                    "This should be inserted at the WHOLE_WORD bookmark.");
            docxTest.insertAtBookmark("MARK_ONE",
                    "..and this at the MARK_ONE bookmark.");
            docxTest.saveAs("C:/temp/Doc1 With Bookmarks Updated.docx");
        }
        catch(Exception ex) {
            System.out.println("Caught a: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows:.....");
            ex.printStackTrace(System.out);
        }
    }
}

I have tested it against a document with bookmarks inserted into paragraphs
and tables. As indicated, it will not work with tables yet but the fix ought
to be fairly straightforward (famous last words!!).

If my memory is not playing too many tricks, I think you said you had
documents where bookmarks with the same name appeared more than once; even
more than once in the same paragraph? Should this be the case, I am keen to
see if the code works because I was not able to create such a test document.
The copy of Word that I have access to does not allow me to insert two
bookmarks with the same name into the text and so I have been unable to try
it out against this scenario.

Will you let me know the results you have please?

Yours

Mark B

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710115.html


Re: Replacing the value of the bookmarks

Posted by semendian <se...@mail.ru>.
Thank you, Mark, for your concern.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710107.html


Re: Replacing the value of the bookmarks

Posted by PoojaGerera <Po...@Infosys.com>.
Hi All,

I am looking to delete whatever is contained between the bookmark start and
bookmark end tags in an XWPF document using Apache POI.
The problem I am facing is that the data contained in the bookmark spans
multiple pages, and hence the bookmark start tag appears in one paragraph
whereas the bookmark end tag appears in some other paragraph.
As a result, I am not able to delete the entire chunk contained between the
bookmark start and bookmark end tags.

Can someone advise whether there is some way to go about it?

Any pointers would be of great help!

Many Thanks,
Pooja
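
One way to think about a bookmark whose start and end sit in different paragraphs is to walk the XML in document order rather than sibling by sibling. The sketch below does that with plain org.w3c.dom on a small hand-written fragment; it has not been tried against POI, and the class name SpanningBookmarkSketch and the method clearBookmark are hypothetical. It collects every w:r found after the named bookmarkStart and before the bookmarkEnd with the same w:id, then removes them, leaving the (possibly empty) paragraphs in place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SpanningBookmarkSketch {

    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses a namespace-aware DOM from a String of XML. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /**
     * Removes every w:r between the named bookmark's start and its matching
     * end, even when they are in different paragraphs. Returns the number of
     * runs removed, or -1 if the bookmark or its matching end was not found.
     */
    public static int clearBookmark(Document doc, String name) {
        // "*" matches any namespace and any local name; the resulting list
        // is in document order, which is what makes the walk work.
        NodeList all = doc.getElementsByTagNameNS("*", "*");
        List<Element> doomed = new ArrayList<>();
        String id = null;       // w:id of the bookmark once its start is seen
        boolean closed = false; // true once the matching end is seen
        for (int i = 0; i < all.getLength() && !closed; i++) {
            Element e = (Element) all.item(i);
            if (!W_NS.equals(e.getNamespaceURI())) {
                continue;
            }
            String local = e.getLocalName();
            if (id == null) {
                if ("bookmarkStart".equals(local)
                        && name.equals(e.getAttributeNS(W_NS, "name"))) {
                    id = e.getAttributeNS(W_NS, "id");
                }
            } else if ("bookmarkEnd".equals(local)
                    && id.equals(e.getAttributeNS(W_NS, "id"))) {
                closed = true;
            } else if ("r".equals(local)) {
                doomed.add(e);
            }
        }
        if (!closed) {
            return -1; // nothing is removed unless both ends were found
        }
        // The NodeList is live, so removal happens only after the walk.
        for (Element run : doomed) {
            run.getParentNode().removeChild(run);
        }
        return doomed.size();
    }
}
```

Deciding what to do with paragraphs that end up empty, or with tables caught inside the bookmark, is a separate policy question that this sketch deliberately leaves open.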



--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5713404.html


Re: Replacing the value of the bookmarks

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
I think that to insert text at a bookmark using XWPF should be a very similar
process to using HWPF. Having said that, I need to look more closely at it
and am currently having 'issues' with the API.

After a bit of digging, I did find the UNO code that I referred to but all
it currently does is file transformations and search and replace. It ought
to be possible to use many of the same objects to implement the 'insert at
bookmark' functionality and I will take a look at that, if I have the time,
this weekend. POI would be a far better solution and so I will try looking
at that first.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710099.html


Re: Replacing the value of the bookmarks

Posted by semendian <se...@mail.ru>.
My code will run on Linux and Windows.

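This is the code I used for doc:

// Imports added for completeness; 'in' and 'out' are the input and output
// streams, which are opened elsewhere.
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Bookmark;
import org.apache.poi.hwpf.usermodel.Range;

final HWPFDocument doc = new HWPFDocument(in);

int count = 1;

for (int i = 0; i < doc.getBookmarks().getBookmarksCount(); i++) {
    System.out.println(" Size: " + doc.getBookmarks().getBookmarksCount());

    final Bookmark bookmark = doc.getBookmarks().getBookmark(i);

    // Skip bookmarks whose names start with an underscore.
    // Alternative test: bookmarkName.equals(bookmark.getName())
    if (bookmark.getName() != null && !bookmark.getName().startsWith("_")) {
        System.out.println(count + " | " + bookmark.getName());
        count++;

        final String pValue = "****";

        final Range range = new Range(bookmark.getStart(),
                bookmark.getEnd(), doc);

        // Replace the bookmarked text, or insert if the bookmark is empty.
        if (range.text().length() > 0) {
            range.replaceText(pValue, false);
        } else {
            range.insertAfter(pValue);
        }
    }
}

doc.write(out);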

It does not work when the bookmark is in a table.

How do I insert the run between the start and end bookmark elements?

I would like to solve the problem with POI if that is possible.
If it is impossible, then I will use UNO and Open/LibreOffice.

I would be glad of any suggestions, advice and code samples.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710092.html


Re: Replacing the value of the bookmarks

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Can I ask how you are going to be running the code please? The reason I ask
is that I do have code that will insert text at bookmarks perfectly
successfully but it relies upon OLE/COM. This means that you will be
restricted to running the code on a Windows box (I have still not tested it
on Linux under Wine even though I am assured it is possible to run Office
that way).

If all you are creating is a utility that you will run on a Windows
box then the code may be satisfactory and you are quite welcome to it. Just
let me know. Furthermore, it should be possible to achieve something very
similar using UNO and Open/LibreOffice. At one time, I had an application
that used this API to accomplish something similar but that was two PCs ago
and I cannot seem to find the code now. It is there, somewhere, and I will
continue to look for it. The advantage of the UNO approach is, of course,
platform independence - anything that will run Open/LibreOffice will run the
UNO client app. Also, it should be possible to get your hands on the URE
(UNO Runtime Environment) so that the full Open/LibreOffice application does
not need to be deployed.

Yours

Mark B

PS. It should be possible to insert text at bookmarks using HWPF. I had a
quick play and, as it is possible to get at the start and end offsets of
both bookmarks and character runs, it should be possible, in principle at
least, to step through the character runs, get their start and end offsets,
look to see if they 'encapsulate' a bookmark, find out what the bookmark's
name is and substitute the required text. I have put together a very simple
piece of code that sort of does this but only with a single bookmark in a
single run of text. I suspect it would be necessary to re-save the document
after each substitution, however, as modifying one range may corrupt others.
XWPF is a different matter; I have not made any progress with that although
it should be possible to simply insert the run between the start and end
bookmark elements.
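
The worry that modifying one range may corrupt others comes down to offsets: once text nearer the start of the document grows or shrinks, every later start and end offset is stale. A common way around that, sketched here on a plain String rather than on HWPF, is to apply the substitutions from the highest start offset down to the lowest. The class OffsetReplaceSketch, its Mark type and the method replaceMarks are hypothetical, the marks are assumed not to overlap, and whether HWPF's own ranges behave the same way is an assumption I have not verified.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetReplaceSketch {

    /** A named span of text: start is inclusive, end is exclusive. */
    public static final class Mark {
        final String name;
        final int start;
        final int end;

        public Mark(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Replaces the text covered by each mark with the given value. An empty
     * mark (start == end) simply has the value inserted at that position.
     * Marks are assumed not to overlap.
     */
    public static String replaceMarks(String text, List<Mark> marks,
            String value) {
        // Work on a copy sorted by descending start offset, so that each
        // replacement only shifts text that has already been dealt with.
        List<Mark> ordered = new ArrayList<>(marks);
        ordered.sort(Comparator.comparingInt((Mark m) -> m.start).reversed());

        StringBuilder sb = new StringBuilder(text);
        for (Mark m : ordered) {
            sb.replace(m.start, m.end, value);
        }
        return sb.toString();
    }
}
```

Working backwards means the offsets recorded before the first change stay valid for every mark that is still to be processed, so there is no need to re-read the text after each substitution in this simplified setting.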

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710085.html


Re: Replacing the value of the bookmarks

Posted by Mark Beardsley <ma...@tiscali.co.uk>.
Things have moved on a little now and the code in this thread -
http://apache-poi.1045710.n5.nabble.com/How-to-read-the-value-of-bookmarks-docx-td5710184.html
- also works where table cells have been bookmarked.

It is not the final version though because I have found out that there are
lots of little wrinkles to deal with - for example the markup that
identifies bookmarked cells and bookmarked columns differs only by the
placement of the bookmarkEnd tag - and I am slowly working to identify all
of these variations and create a single code base that will work for all
instances; if that is even possible. Time is short at the moment - even
though the rain has made it a little difficult to get our work done - and so
I cannot promise when this will be completed. I am also looking at
alternatives to inserting text - inserting a table or a picture at a bookmark
for instance - but I do not yet have a clear picture in my mind of how I am
going to proceed with this. There is also the problem of the Raspberry Pi
that I want to have a play with; I am even looking into the possibility of
building a cluster of these little boards, that is if I can ever get my
hands on them.

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710406.html


Re: Replacing the value of the bookmarks

Posted by semendian <se...@mail.ru>.
Thanks!

--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Replacing-the-value-of-the-bookmarks-tp5710052p5710405.html