Posted to user@poi.apache.org by Fabián Avilés Martínez <fa...@gmv.com> on 2009/11/23 16:42:34 UTC

Modify word document

Hi all,
	I have a Word document that serves as a template. The template contains some tokenized words, which have to be replaced, and the result has to be saved into another file. The original file has some features, like a header and footer, images, etc. The resulting file has to be the same, but with the replaced words. I am trying to do this with the code below, but it does not work.

public ByteArrayOutputStream processFile(final InputStream is, final Map<String, String> replacementText)
        throws IOException {
        Set<String> keys = replacementText.keySet();
        try {
            POIFSFileSystem poifs = new POIFSFileSystem(is);
            HWPFDocument document = new HWPFDocument(poifs);
            Range range = document.getRange();

            for (int i = 0; i < range.numParagraphs(); i++) {
                String newTxt = range.getParagraph(i).text();
                String oldTxt = range.getParagraph(i).text();
                for (Iterator<String> it = keys.iterator(); it.hasNext();) {
                    String key = it.next();
                    if (newTxt.contains(key)) {
                        newTxt = replacePlaceholders(key, replacementText.get(key), newTxt);
                    }
                }
                if (!oldTxt.equals(newTxt)) {
                    range.getParagraph(i).replaceText(oldTxt, newTxt);
                }
            }

            // Save the document away.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            document.write(bos);
            bos.flush();
            bos.close();
            return bos;
        } catch (IOException e) {
            logger.error("Error processing the Word file: " + e);
            throw new IOException("Error processing the Word file", e);
        } finally {
            if (is != null) {
                is.close();
            }
        }
    }

Any help, please?

Thanks in advance, Fabi.



______________________
This message including any attachments may contain confidential 
information, according to our Information Security Management System,
 and intended solely for a specific individual to whom they are addressed.
 Any unauthorised copy, disclosure or distribution of this message
 is strictly forbidden. If you have received this transmission in error,
 please notify the sender immediately and delete it.

______________________


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: Modify word document

Posted by Fabián Avilés Martínez <fa...@gmv.com>.
Thank you so much. I'm going to try it, and I will tell you the results.

-----Original message-----
From: MSB [mailto:markbrdsly@tiscali.co.uk]
Sent: Tuesday, 24 November 2009 8:43
To: user@poi.apache.org
Subject: Re: Modify word document


You have not dug down far enough into the structure of the document yet, I am
afraid. All of the formatting information is stored (encapsulated) within
the CharacterRun class, and you need to perform the replacements at that
level.

I do not have any suitable code at hand as I type this so what follows will
need to be converted into Java and tested;

Open the Word document.
Get the overall Range for the document.
Get the number of Paragraph objects the Range contains.
Iterate through the Paragraphs and, for each Paragraph:
    Get the CharacterRun(s) the Paragraph contains.
    Call the method to replace the search term with the replacement text on
    each CharacterRun.
Save the modified document away again.
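
The steps above can be sketched in Java roughly as follows. To keep the sketch runnable without the POI jars, it uses a hypothetical in-memory stand-in (a list of paragraphs, each a list of run texts) rather than HWPF's Range, Paragraph and CharacterRun classes; the class and method names are assumptions made for illustration only. With HWPF, the same nested loop would call the corresponding methods on those classes and then write the document out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a document: each paragraph is a list of run texts.
// This is NOT the HWPF API; it only mirrors the
// Range -> Paragraph -> CharacterRun nesting described in the steps.
final class RunLevelReplaceSketch {

    /** Replaces every search term inside each run, one run at a time. */
    static List<List<String>> replaceInRuns(List<List<String>> paragraphs,
                                            Map<String, String> terms) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> paragraph : paragraphs) {      // iterate the paragraphs
            List<String> newRuns = new ArrayList<>();
            for (String run : paragraph) {               // iterate the runs
                String text = run;
                for (Map.Entry<String, String> e : terms.entrySet()) {
                    // Literal (non-regex) replacement within this run only.
                    text = text.replace(e.getKey(), e.getValue());
                }
                newRuns.add(text);
            }
            result.add(newRuns);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(
                List.of("Dear ", "${name}", ","),
                List.of("Your order ${id} has shipped."));
        Map<String, String> terms = Map.of("${name}", "Fabi", "${id}", "42");
        System.out.println(replaceInRuns(doc, terms));
    }
}
```

Because each run is handled on its own, a search term that straddles two runs is never seen by this loop.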

You do, however, face a couple of problems with this. It has been a long time
since I tried to write a search and replace routine using HWPF, and I could
not get it to work if the replacement text was longer than the search term.
In that case, HWPF threw an exception and would not allow me to complete the
process; but that problem could well have been addressed by now, as it was
well known and caused by faulty bounds checking within the Range class. Only
testing will prove or disprove this for you, I am afraid.

Secondly, the CharacterRun class encapsulates a piece of text with common
properties. So, imagine that we are searching for the phrase 'search term'
and that the word 'search' has been emboldened whilst the word 'term' has
been left as normal text, then my suggested approach will not work. That is
because the words search and term will be held in different CharacterRun(s).
If you do hit this problem, then I am afraid you will have to write code
that searches for the term at the Paragraph level and that identifies where
the search terms can be found and recovers the CharacterRun(s) that
encapsulate them. Once you have these, you can modify the runs or create and
substitute new ones but I have to admit that I have never tried to do this
myself. Instead I chose to automate Word using OLE and to explore the
possibilities offered by OpenOffice's UNO interface. Both options did work
but threw up other problems that proved more limiting (in terms of
architecture and platform). If you can get it to work, HWPF offers the
better solution IMO.
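
A minimal sketch of the paragraph-level search described above, assuming the run texts of one paragraph are available as plain strings (a hypothetical stand-in, not the HWPF API; the class and method names are made up for illustration). It joins the runs, finds the search term in the joined text, and reports the indices of the runs that the match overlaps. What to do with those runs afterwards (modify them or substitute new ones) is left open.

```java
import java.util.ArrayList;
import java.util.List;

final class RunSpanFinder {

    /**
     * Returns the indices of the runs overlapped by the first occurrence of
     * term in the concatenated run texts, or an empty list if it is absent.
     */
    static List<Integer> runsCovering(List<String> runs, String term) {
        List<Integer> covering = new ArrayList<>();
        if (term.isEmpty()) {
            return covering;
        }
        String paragraphText = String.join("", runs);
        int matchStart = paragraphText.indexOf(term);
        if (matchStart < 0) {
            return covering;
        }
        int matchEnd = matchStart + term.length();        // exclusive
        int runStart = 0;
        for (int i = 0; i < runs.size(); i++) {
            int runEnd = runStart + runs.get(i).length(); // exclusive
            // Two half-open intervals overlap when each starts before the other ends.
            if (runStart < matchEnd && matchStart < runEnd) {
                covering.add(i);
            }
            runStart = runEnd;
        }
        return covering;
    }

    public static void main(String[] args) {
        // 'search' is bold and ' term' is plain, so they sit in separate runs.
        List<String> runs = List.of("Find the ", "search", " term", " here.");
        System.out.println(runsCovering(runs, "search term")); // prints [1, 2]
    }
}
```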

Yours

Mark B



-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Modify word document

Posted by MSB <ma...@tiscali.co.uk>.
Thanks for that, I have seen people asking for just this sort of information
before on the list. Can I assume you have been able to get something to
work?

Yours

Mark B


Fabián Avilés Martínez wrote:
> 
> Hi Mark, version 3.2-FINAL is accessible in public Maven repositories;
> these are the dependencies:
> 
> <dependency>
>     <groupId>org.apache.poi</groupId>
>     <artifactId>poi</artifactId>
>     <version>3.2-FINAL</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.poi</groupId>
>     <artifactId>poi-scratchpad</artifactId>
>     <version>3.2-FINAL</version>
> </dependency>
> 
> 
> Thanks, Fabi.
> 
> -----Original message-----
> From: MSB [mailto:markbrdsly@tiscali.co.uk]
> Sent: Tuesday, 24 November 2009 17:27
> To: user@poi.apache.org
> Subject: RE: Modify word document
> 
> 
> You are welcome.
> 
> If you do not have access to 3.2 FINAL of the API, it is possible to
> download older releases from here -
> http://archive.apache.org/dist/poi/release/bin/. I must admit that I do not
> know what changes were made to HWPF between 3.2 and 3.5, so I cannot say why
> the formatting information is being lost and can only hope that you will be
> able to revert to using 3.2 FINAL for this project.
> 
> All that you will need to do is to ensure that both the scratchpad and POI
> archives are in your classpath and you should be able to successfully
> compile and run the code. Any problems, just let me know.
> 
> Yours
> 
> Mark B
> 
> 
> 
> Fabián Avilés Martínez wrote:
>>
>> Wow, that's great. At least I have a new direction to work with. I have been
>> struggling with this for at least three days. I cannot try it today, but
>> tomorrow it will be the first thing I do. I will tell you the
>> results.
>>
>> Thank you so much.
>>
>> -----Original message-----
>> From: MSB [mailto:markbrdsly@tiscali.co.uk]
>> Sent: Tuesday, 24 November 2009 16:51
>> To: user@poi.apache.org
>> Subject: RE: Modify word document
>>
>>
>> I have had the chance to play around with some code and I have to admit that
>> I was wrong, on two counts.
>>
>> Firstly, if you do drill down to the level of the CharacterRun and perform a
>> replacement operation there, you will not retain the formatting applied to
>> the text; furthermore, it seems to fail completely: no replacements will be
>> made in the document at all. To have the search term be successfully
>> replaced, you DO need to operate at the Paragraph level.
>>
>> Secondly, if the search term is shorter than the replacement term, then HWPF
>> will throw an exception. It seems quite happy to work if the replacement
>> term is equal to or longer - in terms of the number of characters - than the
>> search term.
>>
>> Please see the code I have attached below;
>>
>> /* ====================================================================
>>    Licensed to the Apache Software Foundation (ASF) under one or more
>>    contributor license agreements.  See the NOTICE file distributed with
>>    this work for additional information regarding copyright ownership.
>>    The ASF licenses this file to You under the Apache License, Version 2.0
>>    (the "License"); you may not use this file except in compliance with
>>    the License.  You may obtain a copy of the License at
>>
>>        http://www.apache.org/licenses/LICENSE-2.0
>>
>>    Unless required by applicable law or agreed to in writing, software
>>    distributed under the License is distributed on an "AS IS" BASIS,
>>    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>    See the License for the specific language governing permissions and
>>    limitations under the License.
>> ==================================================================== */
>>
>> package newsearchreplace;
>>
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.FileOutputStream;
>> import java.io.FileNotFoundException;
>> import java.io.IOException;
>> import java.util.HashMap;
>> import java.util.Set;
>>
>> import org.apache.poi.hwpf.HWPFDocument;
>> import org.apache.poi.hwpf.usermodel.Range;
>> import org.apache.poi.hwpf.usermodel.Paragraph;
>> import org.apache.poi.hwpf.usermodel.CharacterRun;
>>
>>
>> /**
>>  *
>>  * @author win Mark Beardsley [msb at apache.org]
>>  * @version 1.00
>>  */
>> public class SearchReplace {
>>
>>     private HashMap<String, String> searchTerms = null;
>>     private Set<String> searchKeys = null;
>>     private HWPFDocument wordDocument = null;
>>
>>     public SearchReplace() {
>>         searchTerms = new HashMap<String, String>();
>>         // The first String is the text that will be searched for; the second
>>         // is what will be used to replace it. Of course, it is possible to
>>         // create more than one search term/replacement text pairing.
>>         searchTerms.put("replace", "tester");
>>         searchKeys = searchTerms.keySet();
>>     }
>>
>>     public void openTemplate(String filename)
>>             throws FileNotFoundException, IOException {
>>         File file = null;
>>         FileInputStream fis = null;
>>         try {
>>             file = new File(filename);
>>             fis = new FileInputStream(file);
>>             this.wordDocument = new HWPFDocument(fis);
>>         }
>>         finally {
>>             if(fis != null) {
>>                 try {
>>                     fis.close();
>>                     fis = null;
>>                 }
>>                 catch(Exception ex) {
>>                     // I G N O R E
>>                 }
>>             }
>>         }
>>     }
>>
>>     public void searchAndReplace() {
>>         Range docRange = this.wordDocument.getRange();
>>         int numParas = docRange.numParagraphs();
>>         for(int i = 0; i < numParas; i++) {
>>             Paragraph para = docRange.getParagraph(i);
>>             int numCharRuns = para.numCharacterRuns();
>>             for(int j = 0; j < numCharRuns; j++) {
>>                 CharacterRun charRun = para.getCharacterRun(j);
>>                 String text = charRun.text();
>>                 for(String key : this.searchKeys) {
>>                     if(text.contains(key)) {
>>                         String replacementTerm = this.searchTerms.get(key);
>>                         // Range.replaceText takes the text to find first,
>>                         // then its replacement.
>>                         charRun.replaceText(key, replacementTerm);
>>                         System.out.println("Found: " + key + " in " + text
>>                                 + ". Will replace with: " + replacementTerm);
>>                     }
>>                 }
>>             }
>>         }
>>
>>     }
>>
>>     public void searchReplace() {
>>         Range docRange = this.wordDocument.getRange();
>>         int numParas = docRange.numParagraphs();
>>         for(int i = 0; i < numParas; i++) {
>>             Paragraph para = docRange.getParagraph(i);
>>             String text = para.text();
>>             for(String key : this.searchKeys) {
>>                 if(text.contains(key)) {
>>                     String replacementTerm = this.searchTerms.get(key);
>>                     para.replaceText(key, replacementTerm);
>>                 }
>>             }
>>         }
>>     }
>>
>>     public void saveResults(String filename)
>>             throws FileNotFoundException, IOException {
>>         File file = null;
>>         FileOutputStream fos = null;
>>         try {
>>             file = new File(filename);
>>             fos = new FileOutputStream(file);
>>             this.wordDocument.write(fos);
>>         }
>>         finally {
>>             if(fos != null) {
>>                 try {
>>                     fos.close();
>>                     fos = null;
>>                 }
>>                 catch(Exception ex) {
>>                     // I G N O R E
>>                 }
>>             }
>>         }
>>     }
>>
>>     /**
>>      * @param args the command line arguments
>>      */
>>     public static void main(String[] args) {
>>         try {
>>             SearchReplace sr = new SearchReplace();
>>             sr.openTemplate("C:/temp/Test Document.doc");
>>             sr.searchAndReplace();
>>             //sr.searchReplace();
>>             sr.saveResults("C:/temp/New Updated Document.doc");
>>         }
>>         catch(Exception ex) {
>>             System.out.println("Caught an: " + ex.getClass().getName());
>>             System.out.println("Message: " + ex.getMessage());
>>             System.out.println("Stacktrace follows............");
>>             ex.printStackTrace(System.out);
>>         }
>>     }
>> }
>>
>> More particularly, look at the main method. If you comment out the
>> sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then
>> the code will work successfully. But, and this is a BIG but, it will only
>> work if you compile and run it against 3.2 FINAL of the API. I have found
>> that later versions seem to 'drop' or lose the formatting information
>> completely; to convince yourself of this, just modify the main method so
>> that it contains only these lines of code;
>>
>> SearchReplace sr = new SearchReplace();
>> sr.openTemplate("C:/temp/Test Document.doc");
>> sr.saveResults("C:/temp/New Updated Document.doc");
>>
>> If you run that against versions later than 3.2 FINAL, you should see that
>> the copy of the original document that this produces loses all of its
>> formatting.
>>
>> Yours
>>
>> Mark B
>>
>> PS. I guess that it should go without saying, but you will need to replace
>> the paths and document names passed to the openTemplate() and saveResults()
>> methods so that they point to locations and files that exist on your machine.
>>
>> PPS. Forgive the lack of comments, please. I hope that it is apparent just
>> what the methods do.
>>
>>
>> Fabián Avilés Martínez wrote:
>>>
>>> Hi, as I told you, I have tried it, but with the same result: the
>>> resulting file is corrupted, or at least that is what MS Word says. My next
>>> approach is to create a copy of the file and make the modifications within
>>> that copy. My problem is that I do not know how to save the modifications
>>> made to the charRuns of the paragraphs. What I mean is to persist the
>>> modifications in the resulting file, without having to copy it, calling
>>> document.write(outputStream).
>>>
>>> My code is:
>>>
>>> public File processFile(final InputStream is, final Map<String, String>
>>> replacementText) throws IOException {
>>>         Set<String> keys = replacementText.keySet();
>>>         try {
>>>             // Makes a copy of the file.
>>>             File res = copyfile(is);
>>>             InputStream auxIs = new FileInputStream(res);
>>>             POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>>>             HWPFDocument document = new HWPFDocument(poifs);
>>>             Range range = document.getRange();
>>>
>>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>>                 Paragraph paragraph = range.getParagraph(i);
>>>                 int numCharRuns = paragraph.numCharacterRuns();
>>>                 for (int j = 0; j < numCharRuns; j++) {
>>>                     CharacterRun charRun = paragraph.getCharacterRun(j);
>>>                     for (Iterator<String> it = keys.iterator();
>>> it.hasNext();) {
>>>                         String key = it.next();
>>>                         if (charRun.text().contains(key)) {
>>>                             String value = replacementText.get(key);
>>>                             charRun.replaceText(key, value);
>>>                             range = document.getRange();
>>>                             paragraph = range.getParagraph(i);
>>>                             charRun = paragraph.getCharacterRun(j);
>>>                         }
>>>                     }
>>>                 }
>>>             }
>>>             is.close();
>>>             return res;
>>>         } catch (IOException e) {
>>>             logger.error("Error processing the Word file: " + e);
>>>             throw new IOException("Error processing the Word file", e);
>>>         } finally {
>>>             if (is != null) {
>>>                 is.close();
>>>             }
>>>         }
>>>     }
>>>
>>>
>>> Thanks in advance, Fabi.
>>>
>>
> 

-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26514349.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: Modify word document

Posted by Fabián Avilés Martínez <fa...@gmv.com>.
Hi Mark, version 3.2-FINAL is accessible in the public Maven repositories. These are the dependencies:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.2-FINAL</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.2-FINAL</version>
</dependency>


Thanks, Fabi.



RE: Modify word document

Posted by MSB <ma...@tiscali.co.uk>.
You are welcome.

If you do not have access to 3.2 FINAL of the API, it is possible to
download older releases from here -
http://archive.apache.org/dist/poi/release/bin/. I must admit that I do not
know what changes were made to HWPF between 3.2 and 3.5, so I cannot say why
the formatting information is being lost; I can only hope that you will be
able to revert to using 3.2 FINAL for this project.

All that you will need to do is to ensure that both the scratchpad and POI
archives are in your classpath and you should be able to successfully
compile and run the code. Any problems, just let me know.
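
One quick way to confirm that both archives really are on the classpath is to try loading one class from each. The following is only a minimal sketch: the class name PoiClasspathCheck and the helper isOnClasspath() are made up for illustration, and it assumes that POIFSFileSystem lives in the core POI jar and HWPFDocument in the scratchpad jar.

```java
public class PoiClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initialisers.
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: one representative class per jar is enough to detect the jar.
        System.out.println("POI core jar:       "
                + isOnClasspath("org.apache.poi.poifs.filesystem.POIFSFileSystem"));
        System.out.println("POI scratchpad jar: "
                + isOnClasspath("org.apache.poi.hwpf.HWPFDocument"));
    }
}
```

If either line prints false, the corresponding jar is probably missing from the classpath used to run the program.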

Yours

Mark B



Fabián Avilés Martínez wrote:
> 
> Wow, that's great. At least I have a new direction to work with. I have been
> struggling with this for at least three days. I cannot try it today, but
> tomorrow it will be the first thing I do. I will tell you the
> results.
> 
> Thank you so much.
> 
> -----Original Message-----
> From: MSB [mailto:markbrdsly@tiscali.co.uk]
> Sent: Tuesday, 24 November 2009 16:51
> To: user@poi.apache.org
> Subject: RE: Modify word document
> 
> 
> I have had the chance to play around with some code and I have to admit
> that I was wrong, on two counts.
> 
> Firstly, if you do drill down to the level of the CharacterRun and perform
> a replacement operation there, you will not retain the formatting applied
> to the text; furthermore, it seems to fail completely: no replacements will
> be made in the document at all. To have the search term be successfully
> replaced, you DO need to operate at the Paragraph level.
> 
> Secondly, if the search term is shorter than the replacement term, then
> HWPF will throw an exception. It seems quite happy to work if the
> replacement term is equal to or longer - in terms of the number of
> characters - than the search term.
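
Since the behaviour described above depends on the relative lengths of the search term and its replacement, it may help to list that relationship for every pair before touching the document. This is only a sketch using plain Java strings, not HWPF; the class ReplacementLengthCheck and its methods are hypothetical names, and the "${name}" pair is an invented example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplacementLengthCheck {

    /** Returns replacement length minus search term length, in characters. */
    public static int lengthDelta(String searchTerm, String replacement) {
        return replacement.length() - searchTerm.length();
    }

    /** Describes the replacement relative to the search term: shorter, equal or longer. */
    public static String describe(String searchTerm, String replacement) {
        int delta = lengthDelta(searchTerm, replacement);
        if (delta < 0) {
            return "shorter";
        }
        if (delta > 0) {
            return "longer";
        }
        return "equal";
    }

    public static void main(String[] args) {
        Map<String, String> terms = new LinkedHashMap<String, String>();
        terms.put("replace", "tester");   // the pair used in the attached code
        terms.put("${name}", "Fabi");     // hypothetical extra pair
        for (Map.Entry<String, String> e : terms.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()
                    + ": replacement is " + describe(e.getKey(), e.getValue()));
        }
    }
}
```

Running this against the real search terms shows at a glance which pairs fall into which case, so they can be tested separately against the HWPF version in use.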
> 
> Please see the code I have attached below;
> 
> /* ====================================================================
>    Licensed to the Apache Software Foundation (ASF) under one or more
>    contributor license agreements.  See the NOTICE file distributed with
>    this work for additional information regarding copyright ownership.
>    The ASF licenses this file to You under the Apache License, Version 2.0
>    (the "License"); you may not use this file except in compliance with
>    the License.  You may obtain a copy of the License at
> 
>        http://www.apache.org/licenses/LICENSE-2.0
> 
>    Unless required by applicable law or agreed to in writing, software
>    distributed under the License is distributed on an "AS IS" BASIS,
>    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>    See the License for the specific language governing permissions and
>    limitations under the License.
> ==================================================================== */
> 
> package newsearchreplace;
> 
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.FileNotFoundException;
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Set;
> 
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.CharacterRun;
> 
> 
> /**
>  *
>  * @author win Mark Beardsley [msb at apache.org]
>  * @version 1.00
>  */
> public class SearchReplace {
> 
>     private HashMap<String, String> searchTerms = null;
>     private Set<String> searchKeys = null;
>     private HWPFDocument wordDocument = null;
> 
>     public SearchReplace() {
>         searchTerms = new HashMap<String, String>();
>         // The first String is the text that will be searched for; the second
>         // is what will be used to replace it. It is possible to add more than
>         // one search term/replacement text pairing.
>         searchTerms.put("replace", "tester");
>         searchKeys = searchTerms.keySet();
>     }
> 
>     public void openTemplate(String filename) throws FileNotFoundException, IOException {
>         File file = null;
>         FileInputStream fis = null;
>         try {
>             file = new File(filename);
>             fis = new FileInputStream(file);
>             this.wordDocument = new HWPFDocument(fis);
>         }
>         finally {
>             if(fis != null) {
>                 try {
>                     fis.close();
>                     fis = null;
>                 }
>                 catch(Exception ex) {
>                     // I G N O R E
>                 }
>             }
>         }
>     }
> 
>     public void searchAndReplace() {
>         Range docRange = this.wordDocument.getRange();
>         int numParas = docRange.numParagraphs();
>         for(int i = 0; i < numParas; i++) {
>             Paragraph para = docRange.getParagraph(i);
>             int numCharRuns = para.numCharacterRuns();
>             for(int j = 0; j < numCharRuns; j++) {
>                 CharacterRun charRun = para.getCharacterRun(j);
>                 String text = charRun.text();
>                 for(String key : this.searchKeys) {
>                     if(text.contains(key)) {
>                         String replacementTerm = this.searchTerms.get(key);
>                         // Range.replaceText() takes the placeholder first and
>                         // its replacement value second.
>                         charRun.replaceText(key, replacementTerm);
>                         System.out.println("Found: " + key + " in " + text
>                                 + ". Will replace with: " + replacementTerm);
>                     }
>                 }
>             }
>         }
> 
>     }
> 
>     public void searchReplace() {
>         Range docRange = this.wordDocument.getRange();
>         int numParas = docRange.numParagraphs();
>         for(int i = 0; i < numParas; i++) {
>             Paragraph para = docRange.getParagraph(i);
>             String text = para.text();
>             for(String key : this.searchKeys) {
>                 if(text.contains(key)) {
>                     String replacementTerm = this.searchTerms.get(key);
>                     para.replaceText(key, replacementTerm);
>                 }
>             }
>         }
>     }
> 
>     public void saveResults(String filename) throws FileNotFoundException, IOException {
>         File file = null;
>         FileOutputStream fos = null;
>         try {
>             file = new File(filename);
>             fos = new FileOutputStream(file);
>             this.wordDocument.write(fos);
>         }
>         finally {
>             if(fos != null) {
>                 try {
>                     fos.close();
>                     fos = null;
>                 }
>                 catch(Exception ex) {
>                     // I G N O R E
>                 }
>             }
>         }
>     }
> 
>     /**
>      * @param args the command line arguments
>      */
>     public static void main(String[] args) {
>         try {
>             SearchReplace sr = new SearchReplace();
>             sr.openTemplate("C:/temp/Test Document.doc");
>             sr.searchAndReplace();
>             //sr.searchReplace();
>             sr.saveResults("C:/temp/New Updated Document.doc");
>         }
>         catch(Exception ex) {
>             System.out.println("Caught an: " + ex.getClass().getName());
>             System.out.println("Message: " + ex.getMessage());
>             System.out.println("Stacktrace follows............");
>             ex.printStackTrace(System.out);
>         }
>     }
> }
> 
> More particularly, look at the main method. If you comment out the
> sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then
> the code will work successfully. But, and this is a BIG but, it will only
> work if you compile and run it against 3.2 FINAL of the API. I have found
> that later versions seem to 'drop' or lose the formatting information
> completely; to convince yourself of this, just modify the main method so
> that it contains only these lines of code:
> 
> SearchReplace sr = new SearchReplace();
> sr.openTemplate("C:/temp/Test Document.doc");
> sr.saveResults("C:/temp/New Updated Document.doc");
> 
> If you run that against versions later than 3.2 FINAL, you should see that
> the copy of the original document that this produces loses all of its
> formatting.
> 
> Yours
> 
> Mark B
> 
> PS. I guess that it should go without saying, but you will need to replace
> the paths and document names passed to the openTemplate() and saveResults()
> methods so that they point to locations and files that exist on your machine.
> 
> PPS. Forgive the lack of comments please. I hope that it is apparent just
> what the methods do.
> 
> 
> Fabián Avilés Martínez wrote:
>>
>> Hi, as I told you, I have tried it, but with the same result: the
>> resulting file is corrupted, or at least that is what MS Word says. My next
>> approach is to create a copy of the file and make the modifications within
>> that copy. My problem is that I do not know how to save the modifications
>> made to the charRuns of the paragraphs. What I mean is that I want to
>> persist the modifications in the resulting file, without having to copy it,
>> calling document.write(outputStream).
>>
>> My code is:
>>
>> public File processFile(final InputStream is, final Map<String, String>
>> replacementText) throws IOException {
>>         Set<String> keys = replacementText.keySet();
>>         try {
>>             // Makes a copy of the file.
>>             File res = copyfile(is);
>>             InputStream auxIs = new FileInputStream(res);
>>             POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>>             HWPFDocument document = new HWPFDocument(poifs);
>>             Range range = document.getRange();
>>
>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>                 Paragraph paragraph = range.getParagraph(i);
>>                 int numCharRuns = paragraph.numCharacterRuns();
>>                 for (int j = 0; j < numCharRuns; j++) {
>>                     CharacterRun charRun = paragraph.getCharacterRun(j);
>>                     for (Iterator<String> it = keys.iterator();
>> it.hasNext();) {
>>                         String key = it.next();
>>                         if (charRun.text().contains(key)) {
>>                             String value = replacementText.get(key);
>>                             charRun.replaceText(key, value);
>>                             range = document.getRange();
>>                             paragraph = range.getParagraph(i);
>>                             charRun = paragraph.getCharacterRun(j);
>>                         }
>>                     }
>>                 }
>>             }
>>             is.close();
>>             return res;
>>         } catch (IOException e) {
>>             logger.error("Error procesando el fichero WORD: " + e);
>>             throw new IOException("Error procesando el fichero WORD");
>>         } finally {
>>             if (is != null) {
>>                 is.close();
>>             }
>>         }
>>     }
>>
>>
>> Thanks in advance, Fabi.
>>
>> -----Original message-----
>> From: MSB [mailto:markbrdsly@tiscali.co.uk]
>> Sent: Tuesday, 24 November 2009 8:43
>> To: user@poi.apache.org
>> Subject: Re: Modify word document
>>
>>
>> You have not dug down far enough into the structure of the document yet, I
>> am afraid. All of the formatting information is stored (encapsulated)
>> within the CharacterRun class, and you need to perform the replacements at
>> that level.
>>
>> I do not have any suitable code at hand as I type this, so what follows
>> will need to be converted into Java and tested:
>>
>> Open the Word document.
>> Get the overall Range for the document.
>> Get the number of Paragraph objects the Range contains.
>> Iterate through the Paragraphs and for each Paragraph
>>     Get the CharacterRun(s) the Paragraph contains.
>>     Call the method to replace the search term with the replacement text
>>     on the CharacterRun.
>> Save the modified document away again.
>>
>> You do however face a couple of problems with this. It has been a long
>> time since I tried to write a search and replace routine using HWPF, and I
>> could not get it to work if the replacement text was longer than the search
>> term. In that case, HWPF threw an exception and would not allow me to
>> complete the process; but that problem could well have been addressed by
>> now, as it was well known and caused by faulty bounds checking within the
>> Range class. Only testing will prove or disprove this for you, I am afraid.
>>
>> Secondly, the CharacterRun class encapsulates a piece of text with common
>> properties. So, imagine that we are searching for the phrase 'search term'
>> and that the word 'search' has been emboldened whilst the word 'term' has
>> been left as normal text; then my suggested approach will not work. That is
>> because the words 'search' and 'term' will be held in different
>> CharacterRun(s). If you do hit this problem, then I am afraid you will have
>> to write code that searches for the term at the Paragraph level, identifies
>> where the search terms can be found, and recovers the CharacterRun(s) that
>> encapsulate them. Once you have these, you can modify the runs or create
>> and substitute new ones, but I have to admit that I have never tried to do
>> this myself. Instead I chose to automate Word using OLE and to explore the
>> possibilities offered by OpenOffice's UNO interface. Both options did work
>> but threw up other problems that proved more limiting (in terms of
>> architecture and platform). If you can get it to work, HWPF offers the
>> better solution IMO.
>>
>> Yours
>>
>> Mark B
>>
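The paragraph-level search described above can be sketched without HWPF at all. What follows is a minimal sketch, assuming the text of each run has already been collected in document order (for example from CharacterRun.text()). The class RunLocator and its locate() method are hypothetical names and are not part of the POI API. The sketch only reports which runs a match touches; how to modify or substitute those runs is left open, as it is in the message above.

```java
import java.util.List;

/** Hypothetical helper, not part of HWPF: maps a match in paragraph text to run indices. */
final class RunLocator {

    private RunLocator() {
    }

    /**
     * @param runTexts the text of each run, in document order
     * @param term     the search term
     * @return {firstRunIndex, lastRunIndex} for the first match, or null if there is none
     */
    static int[] locate(List<String> runTexts, String term) {
        if (term.isEmpty()) {
            return null;
        }
        // Concatenating the runs gives the text of the whole paragraph.
        StringBuilder paragraph = new StringBuilder();
        for (String run : runTexts) {
            paragraph.append(run);
        }
        int start = paragraph.indexOf(term);
        if (start < 0) {
            return null;
        }
        int end = start + term.length() - 1; // inclusive offset of the last matched character

        int first = -1;
        int last = -1;
        int offset = 0; // paragraph offset of the current run's first character
        for (int i = 0; i < runTexts.size(); i++) {
            int runEnd = offset + runTexts.get(i).length() - 1;
            if (first < 0 && start <= runEnd) {
                first = i; // the match begins inside this run
            }
            if (end <= runEnd) {
                last = i; // the match ends inside this run
                break;
            }
            offset = runEnd + 1;
        }
        return new int[] { first, last };
    }
}
```

For the runs "Please ", "search", " term", " here" and the term "search term", locate() returns the indices 1 and 2, that is, the emboldened run and the plain run that would both have to be handled.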
>>
>> Fabián Avilés Martínez wrote:
>>>
>>> Hi all,
>>>      I have a Word document that serves as a template. In this template
>>> there are some tokenized words, which have to be replaced, and the result
>>> has to be saved into another file. The original file has some properties,
>>> like header and footer, images, etc. The resulting file has to be the
>>> same, but with the modified words. I am trying it with the code below, but
>>> it does not work.
>>>
>>> public ByteArrayOutputStream processFile(final InputStream is, final
>>> Map<String, String> replacementText)
>>>         throws IOException {
>>>         Set<String> keys = replacementText.keySet();
>>>         try {
>>>             POIFSFileSystem poifs = new POIFSFileSystem(is);
>>>             HWPFDocument document = new HWPFDocument(poifs);
>>>             Range range = document.getRange();
>>>
>>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>>                 String newTxt = range.getParagraph(i).text();
>>>                 String oldTxt = range.getParagraph(i).text();
>>>                 for (Iterator<String> it = keys.iterator();
>>> it.hasNext();)
>>> {
>>>                     String key = it.next();
>>>                     if (newTxt.contains(key)) {
>>>                         newTxt = replacePlaceholders(key,
>>> replacementText.get(key), newTxt);
>>>                     }
>>>                 }
>>>                 if (!oldTxt.equals(newTxt)) {
>>>                     range.getParagraph(i).replaceText(oldTxt, newTxt);
>>>                 }
>>>             }
>>>
>>>             // Save the document away.
>>>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>>             document.write(bos);
>>>             bos.flush();
>>>             bos.close();
>>>             return bos;
>>>         } catch (IOException e) {
>>>             logger.error("Error procesando el fichero WORD: " + e);
>>>             throw new IOException("Error procesando el fichero WORD");
>>>         } finally {
>>>             if (is != null) {
>>>                 is.close();
>>>             }
>>>         }
>>>     }
>>>
>>> Any help, please?
>>>
>>> Thanks in advance, Fabi.
>>>
>>>
>>>
>>> ______________________
>>> This message including any attachments may contain confidential
>>> information, according to our Information Security Management System,
>>>  and intended solely for a specific individual to whom they are
>>> addressed.
>>>  Any unauthorised copy, disclosure or distribution of this message
>>>  is strictly forbidden. If you have received this transmission in error,
>>>  please notify the sender immediately and delete it.
>>>
>>> ______________________
>>> Este mensaje, y en su caso, cualquier fichero anexo al mismo,
>>>  puede contener informacion clasificada por su emisor como confidencial
>>>  en el marco de su Sistema de Gestion de Seguridad de la
>>> Informacion siendo para uso exclusivo del destinatario, quedando
>>> prohibida su divulgacion copia o distribucion a terceros sin la
>>> autorizacion expresa del remitente. Si Vd. ha recibido este mensaje
>>>  erroneamente, se ruega lo notifique al remitente y proceda a su
>>> borrado.
>>> Gracias por su colaboracion.
>>>
>>> ______________________
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: user-help@poi.apache.org
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>>
>>
> 
> --
> View this message in context:
> http://old.nabble.com/Modify-word-document-tp26480450p26498333.html
> Sent from the POI - User mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26498547.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Modify word document

Posted by Fabián Avilés Martínez <fa...@gmv.com>.
Wow, that's great. At least I have a new direction to work with. I have been struggling with this for at least three days. I cannot try it today, but tomorrow it will be the first thing I do. I will tell you the results.

Thank you so much.

-----Original message-----
From: MSB [mailto:markbrdsly@tiscali.co.uk]
Sent: Tuesday, 24 November 2009 16:51
To: user@poi.apache.org
Subject: RE: Modify word document


I have had the chance to play around with some code and I have to admit that
I was wrong, on two counts.

Firstly, if you do drill down to the level of the CharacterRun and perform a
replacement operation there, you will not retain the formatting applied to
the text; furthermore, it seems to fail completely: no replacements will be
made in the document at all. To have the search term be successfully
replaced, you DO need to operate at the Paragraph level.

Secondly, if the search term is shorter than the replacement term, then HWPF
will throw an exception. It seems quite happy to work if the replacement
term is equal to or longer - in terms of the number of characters - than the
search term.
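
Since the outcome appears to depend on how the length of the replacement compares with the length of the search term, it may help to list that difference for every pair before calling replaceText(). This is a minimal sketch in plain Java; ReplacementLengths and lengthDeltas() are hypothetical names, and the code makes no claim about which sign of the difference HWPF accepts, which only testing against your POI version will show.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of HWPF: reports how much each replacement changes the length. */
final class ReplacementLengths {

    private ReplacementLengths() {
    }

    /**
     * @param searchTerms map of search term to replacement text
     * @return map of search term to (replacement length - search term length);
     *         negative means the replacement is shorter, positive means it is longer
     */
    static Map<String, Integer> lengthDeltas(Map<String, String> searchTerms) {
        Map<String, Integer> deltas = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, String> entry : searchTerms.entrySet()) {
            int delta = entry.getValue().length() - entry.getKey().length();
            deltas.put(entry.getKey(), delta);
        }
        return deltas;
    }
}
```

For the pair used in the code in this message, "replace" and "tester", the difference is -1: the replacement is one character shorter than the search term.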

Please see the code I have attached below;

/* ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
==================================================================== */

package newsearchreplace;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.CharacterRun;


/**
 *
 * @author win Mark Beardsley [msb at apache.org]
 * @version 1.00
 */
public class SearchReplace {

    private HashMap<String, String> searchTerms = null;
    private Set<String> searchKeys = null;
    private HWPFDocument wordDocument = null;

    public SearchReplace() {
        searchTerms = new HashMap<String, String>();
        // The first String is the text that will be searched for, the second
        // is what will be used to replace it. Of course, it is possible to
        // create more than one search term/replacement text pairing.
        searchTerms.put("replace", "tester");
        searchKeys = searchTerms.keySet();
    }

    public void openTemplate(String filename) throws FileNotFoundException,
IOException {
        File file = null;
        FileInputStream fis = null;
        try {
            file = new File(filename);
            fis = new FileInputStream(file);
            this.wordDocument = new HWPFDocument(fis);
        }
        finally {
            if(fis != null) {
                try {
                    fis.close();
                    fis = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    public void searchAndReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            int numCharRuns = para.numCharacterRuns();
            for(int j = 0; j < numCharRuns; j++) {
                CharacterRun charRun = para.getCharacterRun(j);
                String text = charRun.text();
                for(String key : this.searchKeys) {
                    if(text.contains(key)) {
                        String replacementTerm = this.searchTerms.get(key);
                        // replaceText() takes the text to find first, then its replacement.
                        charRun.replaceText(key, replacementTerm);
                        System.out.println("Found: " + key + " in " + text
                                + ". Will replace with: " + replacementTerm);
                    }
                }
            }
        }

    }

    public void searchReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            String text = para.text();
            for(String key : this.searchKeys) {
                if(text.contains(key)) {
                    String replacementTerm = this.searchTerms.get(key);
                    para.replaceText(key, replacementTerm);
                }
            }
        }
    }

    public void saveResults(String filename) throws FileNotFoundException,
IOException {
        File file = null;
        FileOutputStream fos = null;
        try {
            file = new File(filename);
            fos = new FileOutputStream(file);
            this.wordDocument.write(fos);
        }
        finally {
            if(fos != null) {
                try {
                    fos.close();
                    fos = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            SearchReplace sr = new SearchReplace();
            sr.openTemplate("C:/temp/Test Document.doc");
            sr.searchAndReplace();
            //sr.searchReplace();
            sr.saveResults("C:/temp/New Updated Document.doc");
        }
        catch(Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows............");
            ex.printStackTrace(System.out);
        }
    }
}

More particularly, look at the main method. If you comment out the
sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then
the code will work successfully. But, and this is a BIG but, it will only work
if you compile and run it against version 3.2 FINAL of the API. I have found
that later versions seem to 'drop' or lose the formatting information
completely. To convince yourself of this, just modify the main method so that
it contains only these lines of code:

SearchReplace sr = new SearchReplace();
sr.openTemplate("C:/temp/Test Document.doc");
sr.saveResults("C:/temp/New Updated Document.doc");

If you run that against versions later than 3.2 FINAL, you should see that
the copy of the original document that this produces loses all of its
formatting.

Yours

Mark B

PS. I guess that it should go without saying, but you will need to replace
the paths and document names passed to the openTemplate() and saveResults()
methods so that they point to locations and files that exist on your machine.

PPS. Forgive the lack of comments please. I hope that it is apparent just
what the methods do.



--
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26498333.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Modify word document

Posted by Fabián Avilés Martínez <fa...@gmv.com>.
Hi Mark,
        I could not get it to work. I think I am going to give up and try it with the OpenOffice API instead, using UNO. Thanks for your effort and answers.

Yours, Fabi.

Fabián Avilés Martínez
Área de desarrollo de software /
Software Development Area



GMV SOLUCIONES
GLOBALES INTERNET, S.A.
Avda. Américo Vespucio
Edificio Cartuja, Bloque E, 1ª Pta.
E-41092 Sevilla
Tel. +34 95 408 80 60
Fax +34 95 408 12 33
www.gmv.com
www.gmv-sgi.com


-----Original message-----
From: MSB [mailto:markbrdsly@tiscali.co.uk]
Sent: Tuesday, 24 November 2009 16:51
To: user@poi.apache.org
Subject: RE: Modify word document


I have had the chance to play around with some code and I have to admit that
I was wrong, on two counts.

Firstly, if you do drill down to the level of the CharacterRun and perform a
replacement operation there, you will not retain the formatting applied to
the text; furthermore, it seems to fail completely: no replacements will be
made in the document at all. To have the search term be successfully
replaced, you DO need to operate at the Paragraph level.

Secondly, if the search term is shorter than the replacement term, then HWPF
will throw an exception. It seems quite happy to work if the replacement
term is equal to or longer - in terms of the number of characters - than the
search term.

Please see the code I have attached below;

/* ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
==================================================================== */

package newsearchreplace;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.CharacterRun;


/**
 *
 * @author win Mark Beardsley [msb at apache.org]
 * @version 1.00
 */
public class SearchReplace {

    private HashMap<String, String> searchTerms = null;
    private Set<String> searchKeys = null;
    private HWPFDocument wordDocument = null;

    public SearchReplace() {
        searchTerms = new HashMap<String, String>();
        // The first String is the text that will be searched for, the
second is what will be used to
        // replace it. Of course, it is possible to create more than one
search term, replacement text
        // pairing.
        searchTerms.put("replace", "tester");
        searchKeys = searchTerms.keySet();
    }

    public void openTemplate(String filename) throws FileNotFoundException,
IOException {
        File file = null;
        FileInputStream fis = null;
        try {
            file = new File(filename);
            fis = new FileInputStream(file);
            this.wordDocument = new HWPFDocument(fis);
        }
        finally {
            if(fis != null) {
                try {
                    fis.close();
                    fis = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    public void searchAndReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            int numCharRuns = para.numCharacterRuns();
            for(int j = 0; j < numCharRuns; j++) {
                CharacterRun charRun = para.getCharacterRun(j);
                String text = charRun.text();
                for(String key : this.searchKeys) {
                    if(text.contains(key)) {
                        String replacementTerm = this.searchTerms.get(key);
                        charRun.replaceText(replacementTerm, key);
                        System.out.println("Found: " + key + " in " + text +
". Will replace with: " + replacementTerm);
                    }
                }
            }
        }

    }

    public void searchReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            String text = para.text();
            for(String key : this.searchKeys) {
                if(text.contains(key)) {
                    String replacementTerm = this.searchTerms.get(key);
                    para.replaceText(key, replacementTerm);
                }
            }
        }
    }

    public void saveResults(String filename) throws FileNotFoundException,
IOException {
        File file = null;
        FileOutputStream fos = null;
        try {
            file = new File(filename);
            fos = new FileOutputStream(file);
            this.wordDocument.write(fos);
        }
        finally {
            if(fos != null) {
                try {
                    fos.close();
                    fos = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            SearchReplace sr = new SearchReplace();
            sr.openTemplate("C:/temp/Test Document.doc");
            sr.searchAndReplace();
            //sr.searchReplace();
            sr.saveResults("C:/temp/New Updated Document.doc");
        }
        catch(Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows............");
            ex.printStackTrace(System.out);
        }
    }
}

More particularly, look at the main method. If you comment out the
sr.searchAndReplace() and un-comment the sr.searchReplace() line, then the
code will work successfully. But, and this is a BIG but, it will only work
if you compile and run it against 3.2 FINAL of the API. I have found that
later versions seem to 'drop' or lose the formatting information completely;
to convince yourself of this, just modify the main method so that it
contains only these lines of code;

SearchReplace sr = new SearchReplace();
sr.openTemplate("C:/temp/Test Document.doc");
sr.saveResults("C:/temp/New Updated Document.doc");

If you run that against versions later than 3.2 FINAL, you should see that
the copy of the original document that this produces loses all of it's
formatting.

Yours

Mark B

PS. I guess that it should go without saying, you will need to replace the
paths to and document names passed to the openTemplate() and saveResults()
methods to point to locations and files that exist on your machine.

PPS Forgive the lack of comments please. I hope that the it is apparant just
what the methods do.


Fabián Avilés Martínez wrote:
>
> Hi, as I told you, I have tried it, but with the same result, the
> resulting file is corrupted, that is what MSWord says. My next approach is
> to create a copy file, and do modifications within this file. My problem
> is that I do not know how to save modifications done in the charRuns of
> the paragraphs, what I mean is to persist modifications done in the
> resulting file, without have to coopy it, calling
> document.write(outputStream)
>
> My code is:
>
> public File processFile(final InputStream is, final Map<String, String>
> replacementText) throws IOException {
>         Set<String> keys = replacementText.keySet();
>         try {
>             // Makes a copy of the file.
>             File res = copyfile(is);
>             InputStream auxIs = new FileInputStream(res);
>             POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>             HWPFDocument document = new HWPFDocument(poifs);
>             Range range = document.getRange();
>
>             for (int i = 0; i < range.numParagraphs(); i++) {
>                 Paragraph paragraph = range.getParagraph(i);
>                 int numCharRuns = paragraph.numCharacterRuns();
>                 for (int j = 0; j < numCharRuns; j++) {
>                     CharacterRun charRun = paragraph.getCharacterRun(j);
>                     for (Iterator<String> it = keys.iterator();
> it.hasNext();) {
>                         String key = it.next();
>                         if (charRun.text().contains(key)) {
>                             String value = replacementText.get(key);
>                             charRun.replaceText(key, value);
>                             range = document.getRange();
>                             paragraph = range.getParagraph(i);
>                             charRun = paragraph.getCharacterRun(j);
>                         }
>                     }
>                 }
>             }
>             is.close();
>             return res;
>         } catch (IOException e) {
>             logger.error("Error procesando el fichero WORD: " + e);
>             throw new IOException("Error procesando el fichero WORD");
>         } finally {
>             if (is != null) {
>                 is.close();
>             }
>         }
>     }
>
>
> Thanks in advance, Fabi.
>
> -----Mensaje original-----
> De: MSB [mailto:markbrdsly@tiscali.co.uk]
> Enviado el: martes, 24 de noviembre de 2009 8:43
> Para: user@poi.apache.org
> Asunto: Re: Modify word document
>
>
> You have not dug down far enough into the structure of the document yet I
> am
> afraid - all of the formatting information is stopred (encapsulated)
> within
> the CharacterRun class and you need to perform the repllacements at that
> level.
>
> I do not have any suitable code at hand as I type this so what follows
> will
> need to be converted into Java and tested;
>
> Open the Word document.
> Get the overall Range for the document.
> Get the number of Paragraph objects the Range contains.
> Iterate through the Pargraphs and for each Pargraph
>     Get the CharacterRun(s) the Paragraph contains.
>     Call the method to replace the search term with the replacement text
> on
> the CharacterRun
> Save the modified document away again.
>
> You do however face a couple of problems with this. It has been a long
> time
> since I tried to write a search and replace routine using HWPF and I could
> not get it to work if the replacement text was longer that the search
> term.
> In that case, HWPF threw an exception and would not allow me to complete
> the
> process; but that problem could well have been addressed by now as it was
> well known and caused by faulty bounds checking within the Range class.
> Only
> testing will prove or disprove this for you I am afraid.
>
> Secondly, the CharacterRun class encapsulates a piece of text with common
> properties. So, imagine that we are searching for the phrase 'search term'
> and that the word 'search' has been emboldened whilst the word 'term' has
> been left as normal text, then my suggested approach will not work. That
> is
> because the words search and term will be held in different
> CharacterRun(s).
> If you do hit this problem, then I am afraid you will have to write code
> that searches for the term at the Paragraph level and that identifies
> where
> the search terms can be found and recovers the CharacterRun(s) that
> encapsulate them. Once you have these, you can modify the runs or create
> and
> substitute new ones but I have to admit that I have never tried to do this
> myself. Instead I chose to automate Word using OLE and to explore the
> possibilities offered by OpenOffices UNO interface. Both options did work
> but threw up other problems that proved more limiting (in terms of
> architecture and platform). If you can get it to work, HWPF offers the
> better solution IMO.
>
> Yours
>
> Mark B
>
>
> Fabián Avilés Martínez wrote:
>>
>> Hi all,
>>      I have a Word document, as a template: In this template there are some
>> tokenized words, which have to be modified and the result has to be saved
>> into another file. The original file has some properties, like header and
>> footer, images, etc. The resulting file has to be the same, but with the
>> modified words. I am trying it with the code below, but it does not work.
>>
>> public ByteArrayOutputStream processFile(final InputStream is, final
>> Map<String, String> replacementText)
>>         throws IOException {
>>         Set<String> keys = replacementText.keySet();
>>         try {
>>             POIFSFileSystem poifs = new POIFSFileSystem(is);
>>             HWPFDocument document = new HWPFDocument(poifs);
>>             Range range = document.getRange();
>>
>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>                 String newTxt = range.getParagraph(i).text();
>>                 String oldTxt = range.getParagraph(i).text();
>>                 for (Iterator<String> it = keys.iterator();
>> it.hasNext();)
>> {
>>                     String key = it.next();
>>                     if (newTxt.contains(key)) {
>>                         newTxt = replacePlaceholders(key,
>> replacementText.get(key), newTxt);
>>                     }
>>                 }
>>                 if (!oldTxt.equals(newTxt)) {
>>                     range.getParagraph(i).replaceText(oldTxt, newTxt);
>>                 }
>>             }
>>
>>             // Save the document away.
>>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>             document.write(bos);
>>             bos.flush();
>>             bos.close();
>>             return bos;
>>         } catch (IOException e) {
>>             logger.error("Error procesando el fichero WORD: " + e);
>>             throw new IOException("Error procesando el fichero WORD");
>>         } finally {
>>             if (is != null) {
>>                 is.close();
>>             }
>>         }
>>     }
>>
>> Any help, please?
>>
>> Thanks in advance, Fabi.
>>
>>
>>
>> ______________________
>> This message including any attachments may contain confidential
>> information, according to our Information Security Management System,
>>  and intended solely for a specific individual to whom they are
>> addressed.
>>  Any unauthorised copy, disclosure or distribution of this message
>>  is strictly forbidden. If you have received this transmission in error,
>>  please notify the sender immediately and delete it.
>>
>> ______________________
>> Este mensaje, y en su caso, cualquier fichero anexo al mismo,
>>  puede contener informacion clasificada por su emisor como confidencial
>>  en el marco de su Sistema de Gestion de Seguridad de la
>> Informacion siendo para uso exclusivo del destinatario, quedando
>> prohibida su divulgacion copia o distribucion a terceros sin la
>> autorizacion expresa del remitente. Si Vd. ha recibido este mensaje
>>  erroneamente, se ruega lo notifique al remitente y proceda a su borrado.
>> Gracias por su colaboracion.
>>
>> ______________________
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
> Sent from the POI - User mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>
> ______________________
> This message including any attachments may contain confidential
> information, according to our Information Security Management System,
>  and intended solely for a specific individual to whom they are addressed.
>  Any unauthorised copy, disclosure or distribution of this message
>  is strictly forbidden. If you have received this transmission in error,
>  please notify the sender immediately and delete it.
>
> ______________________
> Este mensaje, y en su caso, cualquier fichero anexo al mismo,
>  puede contener informacion clasificada por su emisor como confidencial
>  en el marco de su Sistema de Gestion de Seguridad de la
> Informacion siendo para uso exclusivo del destinatario, quedando
> prohibida su divulgacion copia o distribucion a terceros sin la
> autorizacion expresa del remitente. Si Vd. ha recibido este mensaje
>  erroneamente, se ruega lo notifique al remitente y proceda a su borrado.
> Gracias por su colaboracion.
>
> ______________________
>
>
>

--
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26498333.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


______________________
This message including any attachments may contain confidential 
information, according to our Information Security Management System,
 and intended solely for a specific individual to whom they are addressed.
 Any unauthorised copy, disclosure or distribution of this message
 is strictly forbidden. If you have received this transmission in error,
 please notify the sender immediately and delete it.

______________________
Este mensaje, y en su caso, cualquier fichero anexo al mismo,
 puede contener informacion clasificada por su emisor como confidencial
 en el marco de su Sistema de Gestion de Seguridad de la 
Informacion siendo para uso exclusivo del destinatario, quedando 
prohibida su divulgacion copia o distribucion a terceros sin la 
autorizacion expresa del remitente. Si Vd. ha recibido este mensaje 
 erroneamente, se ruega lo notifique al remitente y proceda a su borrado. 
Gracias por su colaboracion.

______________________


RE: Modify word document

Posted by MSB <ma...@tiscali.co.uk>.
I have had the chance to play around with some code and I have to admit that
I was wrong, on two counts.

Firstly, if you do drill down to the level of the CharacterRun and perform a
replacement operation there, you will not retain the formatting applied to
the text. Furthermore, it seems to fail completely; no replacements will be
made in the document at all. To have the search term be successfully
replaced, you DO need to operate at the Paragraph level.

Secondly, if the search term is shorter than the replacement term, then HWPF
will throw an exception. It seems quite happy to work if the replacement
term is equal to or longer - in terms of the number of characters - than the
search term.
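
Since the outcome appears to depend on how the lengths of the two terms compare, it may help to sort the search/replacement pairs by that relationship before running them against HWPF. What follows is only a minimal sketch in plain Java with no POI calls; the class name LengthCheck, the method classify() and the extra test pairs are hypothetical and not part of the attached code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LengthCheck {

    /** Classifies a replacement as "shorter", "equal" or "longer" relative to its search term. */
    static String classify(String searchTerm, String replacement) {
        int delta = replacement.length() - searchTerm.length();
        if (delta < 0) {
            return "shorter";
        }
        return delta == 0 ? "equal" : "longer";
    }

    public static void main(String[] args) {
        // Hypothetical test pairs; "replace" -> "tester" is the pair used in SearchReplace.
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        pairs.put("replace", "tester");
        pairs.put("token", "value");
        pairs.put("id", "identifier");
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue() + ": replacement is "
                    + classify(e.getKey(), e.getValue()));
        }
    }
}
```

Running one pair from each group against the same template should show which length relationship, if any, makes HWPF throw.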

Please see the code I have attached below:

/* ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
==================================================================== */

package newsearchreplace;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.CharacterRun;


/**
 *
 * @author win Mark Beardsley [msb at apache.org]
 * @version 1.00
 */
public class SearchReplace {

    private HashMap<String, String> searchTerms = null;
    private Set<String> searchKeys = null;
    private HWPFDocument wordDocument = null;

    public SearchReplace() {
        searchTerms = new HashMap<String, String>();
        // The first String is the text that will be searched for, the
        // second is what will be used to replace it. Of course, it is
        // possible to create more than one search term, replacement text
        // pairing.
        searchTerms.put("replace", "tester");
        searchKeys = searchTerms.keySet();
    }

    public void openTemplate(String filename) throws FileNotFoundException,
            IOException {
        File file = null;
        FileInputStream fis = null;
        try {
            file = new File(filename);
            fis = new FileInputStream(file);
            this.wordDocument = new HWPFDocument(fis);
        }
        finally {
            if(fis != null) {
                try {
                    fis.close();
                    fis = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    public void searchAndReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            int numCharRuns = para.numCharacterRuns();
            for(int j = 0; j < numCharRuns; j++) {
                CharacterRun charRun = para.getCharacterRun(j);
                String text = charRun.text();
                for(String key : this.searchKeys) {
                    if(text.contains(key)) {
                        String replacementTerm = this.searchTerms.get(key);
                        // replaceText() takes the search term first, then its replacement.
                        charRun.replaceText(key, replacementTerm);
                        System.out.println("Found: " + key + " in " + text +
                                ". Will replace with: " + replacementTerm);
                    }
                }
            }
        }

    }

    public void searchReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            String text = para.text();
            for(String key : this.searchKeys) {
                if(text.contains(key)) {
                    String replacementTerm = this.searchTerms.get(key);
                    para.replaceText(key, replacementTerm);
                }
            }
        }
    }

    public void saveResults(String filename) throws FileNotFoundException,
            IOException {
        File file = null;
        FileOutputStream fos = null;
        try {
            file = new File(filename);
            fos = new FileOutputStream(file);
            this.wordDocument.write(fos);
        }
        finally {
            if(fos != null) {
                try {
                    fos.close();
                    fos = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            SearchReplace sr = new SearchReplace();
            sr.openTemplate("C:/temp/Test Document.doc");
            sr.searchAndReplace();
            //sr.searchReplace();
            sr.saveResults("C:/temp/New Updated Document.doc");
        }
        catch(Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows............");
            ex.printStackTrace(System.out);
        }
    }
}

More particularly, look at the main method. If you comment out the
sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then the
code will work successfully. But, and this is a BIG but, it will only work
if you compile and run it against 3.2 FINAL of the API. I have found that
later versions seem to 'drop' or lose the formatting information completely.
To convince yourself of this, just modify the main method so that it
contains only these lines of code:

SearchReplace sr = new SearchReplace();
sr.openTemplate("C:/temp/Test Document.doc");
sr.saveResults("C:/temp/New Updated Document.doc");

If you run that against versions later than 3.2 FINAL, you should see that
the copy of the original document that this produces loses all of its
formatting.

Yours

Mark B

PS. I guess that it should go without saying, but you will need to replace the
paths and document names passed to the openTemplate() and saveResults()
methods so that they point to locations and files that exist on your machine.

PPS. Forgive the lack of comments, please. I hope that it is apparent just
what the methods do.
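
The quoted message below raises the case where a search term is split across several CharacterRuns, so that it has to be found at the Paragraph level and then mapped back to the runs that hold it. A minimal sketch of that lookup step in plain Java follows. It works only on a list of run texts and makes no POI calls; the class name RunLocator and the method runsCovering() are hypothetical, and modifying the runs themselves is not attempted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunLocator {

    /**
     * Returns the indices of the runs that hold the first occurrence of term
     * in the concatenated run texts, or an empty list if it is not found.
     */
    static List<Integer> runsCovering(List<String> runTexts, String term) {
        List<Integer> hits = new ArrayList<Integer>();
        if (term.isEmpty()) {
            return hits;
        }
        // Rebuild the paragraph text by joining the run texts in order.
        StringBuilder paragraph = new StringBuilder();
        for (String runText : runTexts) {
            paragraph.append(runText);
        }
        int start = paragraph.indexOf(term);
        if (start < 0) {
            return hits;
        }
        int end = start + term.length(); // exclusive
        int runStart = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            int runEnd = runStart + runTexts.get(i).length();
            // Keep the run if [runStart, runEnd) overlaps the match [start, end).
            if (runStart < end && runEnd > start) {
                hits.add(i);
            }
            runStart = runEnd;
        }
        return hits;
    }

    public static void main(String[] args) {
        // 'search' and ' term' sit in separate runs, as if one word were bold.
        List<String> runs = Arrays.asList("Please enter a ", "search", " term", " here.");
        System.out.println(runsCovering(runs, "search term")); // prints [1, 2]
    }
}
```

With real documents, the run texts would presumably come from CharacterRun.text() for each run in the Paragraph, and the returned indices would identify the runs that need to be modified or replaced.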


Fabián Avilés Martínez wrote:
> 
> Hi, as I told you, I have tried it, but with the same result: the
> resulting file is corrupted, or so MS Word says. My next approach is
> to create a copy of the file and make the modifications within that copy.
> My problem is that I do not know how to save the modifications made to the
> charRuns of the paragraphs; that is, how to persist the modifications in
> the resulting file without having to copy it by calling
> document.write(outputStream).
> 
> My code is:
> 
> public File processFile(final InputStream is, final Map<String, String> replacementText) throws IOException {
>         Set<String> keys = replacementText.keySet();
>         try {
>             // Makes a copy of the file.
>             File res = copyfile(is);
>             InputStream auxIs = new FileInputStream(res);
>             POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>             HWPFDocument document = new HWPFDocument(poifs);
>             Range range = document.getRange();
> 
>             for (int i = 0; i < range.numParagraphs(); i++) {
>                 Paragraph paragraph = range.getParagraph(i);
>                 int numCharRuns = paragraph.numCharacterRuns();
>                 for (int j = 0; j < numCharRuns; j++) {
>                     CharacterRun charRun = paragraph.getCharacterRun(j);
>                     for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>                         String key = it.next();
>                         if (charRun.text().contains(key)) {
>                             String value = replacementText.get(key);
>                             charRun.replaceText(key, value);
>                             range = document.getRange();
>                             paragraph = range.getParagraph(i);
>                             charRun = paragraph.getCharacterRun(j);
>                         }
>                     }
>                 }
>             }
>             is.close();
>             return res;
>         } catch (IOException e) {
>             logger.error("Error processing the Word file: " + e);
>             throw new IOException("Error processing the Word file");
>         } finally {
>             if (is != null) {
>                 is.close();
>             }
>         }
>     }
> 
> 
> Thanks in advance, Fabi.
> 
> -----Original message-----
> From: MSB [mailto:markbrdsly@tiscali.co.uk]
> Sent: Tuesday, 24 November 2009 8:43
> To: user@poi.apache.org
> Subject: Re: Modify word document
> 
> 
> You have not dug down far enough into the structure of the document yet, I
> am afraid. All of the formatting information is stored (encapsulated) within
> the CharacterRun class, and you need to perform the replacements at that
> level.
> 
> I do not have any suitable code at hand as I type this, so what follows will
> need to be converted into Java and tested:
> 
> Open the Word document.
> Get the overall Range for the document.
> Get the number of Paragraph objects the Range contains.
> Iterate through the Paragraphs and for each Paragraph
>     Get the CharacterRun(s) the Paragraph contains.
>     Call the method to replace the search term with the replacement text
>     on the CharacterRun.
> Save the modified document away again.
> 
> You do, however, face a couple of problems with this. Firstly, it has been a
> long time since I tried to write a search and replace routine using HWPF,
> and I could not get it to work if the replacement text was longer than the
> search term. In that case, HWPF threw an exception and would not allow me to
> complete the process; but that problem could well have been addressed by
> now, as it was well known and caused by faulty bounds checking within the
> Range class. Only testing will prove or disprove this for you, I am afraid.
> 
> Secondly, the CharacterRun class encapsulates a piece of text with common
> properties. So, if we are searching for the phrase 'search term' and the
> word 'search' has been emboldened whilst the word 'term' has been left as
> normal text, then my suggested approach will not work. That is because the
> words 'search' and 'term' will be held in different CharacterRun(s).
> If you do hit this problem, then I am afraid you will have to write code
> that searches for the term at the Paragraph level, identifies where the
> search terms can be found, and recovers the CharacterRun(s) that
> encapsulate them. Once you have these, you can modify the runs or create and
> substitute new ones, but I have to admit that I have never tried to do this
> myself. Instead I chose to automate Word using OLE and to explore the
> possibilities offered by OpenOffice's UNO interface. Both options did work
> but threw up other problems that proved more limiting (in terms of
> architecture and platform). If you can get it to work, HWPF offers the
> better solution IMO.
> 
> Yours
> 
> Mark B
> 
> 
> Fabián Avilés Martínez wrote:
>> 
>> Hi all,
>> 	I have a Word document, as a template: In this template there are some
>> tokenized words, which have to be modified and the result has to be saved
>> into another file. The original file has some properties, like header and
>> footer, images, etc. The resulting file has to be the same, but with the
>> modified words. I am trying it with the code below, but it does not work.
>> 
>> public ByteArrayOutputStream processFile(final InputStream is, final Map<String, String> replacementText)
>>         throws IOException {
>>         Set<String> keys = replacementText.keySet();
>>         try {
>>             POIFSFileSystem poifs = new POIFSFileSystem(is);
>>             HWPFDocument document = new HWPFDocument(poifs);
>>             Range range = document.getRange();
>> 
>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>                 String newTxt = range.getParagraph(i).text();
>>                 String oldTxt = range.getParagraph(i).text();
>>                 for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>                     String key = it.next();
>>                     if (newTxt.contains(key)) {
>>                         newTxt = replacePlaceholders(key, replacementText.get(key), newTxt);
>>                     }
>>                 }
>>                 if (!oldTxt.equals(newTxt)) {
>>                     range.getParagraph(i).replaceText(oldTxt, newTxt);
>>                 }
>>             }
>> 
>>             // Save the document away.
>>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>             document.write(bos);
>>             bos.flush();
>>             bos.close();
>>             return bos;
>>         } catch (IOException e) {
>>             logger.error("Error processing the Word file: " + e);
>>             throw new IOException("Error processing the Word file");
>>         } finally {
>>             if (is != null) {
>>                 is.close();
>>             }
>>         }
>>     }
>> 
>> Any help, please?
>> 
>> Thanks in advance, Fabi.
>> 
>> 
>> 

-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26498333.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Modify word document

Posted by MSB <ma...@tiscali.co.uk>.
I am surprised at that because I was able to create a search/replace routine
that worked as long as the search term was longer than the replacement text.
That limitation aside, and as far as I am aware, it has not yet been
possible to create a fully working search/replace routine using HWPF as the
API is still very immature.

One problem you are likely to face when replacing the CharacterRun(s) is
that HWPF will not allow you to make more than one modification to the
formatting of the text in the 'new' CharacterRun - at least, it did not the
last time I tried it. I was able to set the font, for example, OR to set the
colour of the font, but if I tried to do both HWPF threw an exception. If I
have the time later today when I am back at 'my' PC, I will try to look out
the test code I was putting together to demonstrate how to use the API to
create Word documents and run the tests again to see if I get the same
problems.

The other problem you are likely to face is locating the CharacterRun that
has to be replaced and then actually substituting one run for another. I
have never tried to do this and imagine that they are maintained within the
Paragraph object as a list, but I do not know if it is possible to get at
the index number of the CharacterRun you intend to replace so that you can
place the new CharacterRun into the correct location.

Further, this may well corrupt at least one of the pointers that the
Document maintains. Word .doc files are composed of one or more streams of
data and each stream can be thought of as a linked list. The File
Information Block (FIB) stores important information about the location of
the streams in the file - just as an example, it records where the
document's text starts and where it ends. Now, imagine what might happen if
we change the amount of text in the file and do not set that value stored in
the FIB correctly. This is what I fear could happen if there is no existing
mechanism allowing us to swap out CharacterRun(s) from Paragraph(s).
Further, what knock-on effects could this have for the other linked lists?
It may well render all of the pointers used to establish those links
inaccurate.
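
To make the worry about stale pointers concrete, here is a toy sketch in
plain Java. It is NOT the real .doc layout and it does not use HWPF; the
class OffsetTable and its methods are hypothetical names invented for this
illustration. It assumes only what is described above: some stored value
records where the text ends, and that value has to move whenever the amount
of text changes.

```java
/** Toy model only: a text buffer plus stored offsets that must track every edit. */
final class OffsetTable {
    private final StringBuilder text;
    private final int[] offsets; // hypothetical stored positions, e.g. "text ends here"

    OffsetTable(String text, int... offsets) {
        this.text = new StringBuilder(text);
        this.offsets = offsets.clone();
    }

    /**
     * Replaces the first occurrence of search and shifts every stored offset
     * that lies at or beyond the end of the replaced span.
     */
    boolean replace(String search, String replacement) {
        int start = text.indexOf(search);
        if (start < 0) {
            return false;
        }
        int end = start + search.length();
        text.replace(start, end, replacement);
        int delta = replacement.length() - search.length();
        for (int i = 0; i < offsets.length; i++) {
            if (offsets[i] >= end) {
                offsets[i] += delta; // skipping this step leaves the offset stale
            }
        }
        return true;
    }

    String text() {
        return text.toString();
    }

    int offset(int index) {
        return offsets[index];
    }

    public static void main(String[] args) {
        String original = "Dear ${name}, hello";
        OffsetTable doc = new OffsetTable(original, original.length());
        doc.replace("${name}", "Alexandria");
        // The stored end-of-text offset should still equal the text length.
        System.out.println(doc.offset(0) == doc.text().length());
    }
}
```

If the loop that shifts the offsets is left out, the stored end-of-text
value no longer matches the text, which is the kind of inconsistency I fear
could make Word report the file as corrupted.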

Of course, I could very well be wrong as I am typing this without access to
the javadoc and do not know if there is a method defined on the Paragraph
class to allow us to insert/delete a CharacterRun.

Yours

Mark B



-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26495362.html
Sent from the POI - User mailing list archive at Nabble.com.




RE: Modify word document

Posted by Fabián Avilés Martínez <fa...@gmv.com>.
Hi, as I told you, I have tried it, but with the same result: the resulting file is corrupted, or at least that is what MS Word says. My next approach is to create a copy of the file and make the modifications within that copy. My problem is that I do not know how to save the modifications made to the charRuns of the paragraphs; what I mean is how to persist the modifications in the resulting file without having to copy it by calling document.write(outputStream).

My code is:

public File processFile(final InputStream is, final Map<String, String> replacementText) throws IOException {
        Set<String> keys = replacementText.keySet();
        try {
            // Makes a copy of the file (copyfile(...) is a helper not shown here).
            File res = copyfile(is);
            InputStream auxIs = new FileInputStream(res);
            POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
            HWPFDocument document = new HWPFDocument(poifs);
            Range range = document.getRange();

            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph paragraph = range.getParagraph(i);
                int numCharRuns = paragraph.numCharacterRuns();
                for (int j = 0; j < numCharRuns; j++) {
                    CharacterRun charRun = paragraph.getCharacterRun(j);
                    for (Iterator<String> it = keys.iterator(); it.hasNext();) {
                        String key = it.next();
                        if (charRun.text().contains(key)) {
                            String value = replacementText.get(key);
                            // Replace the token inside this run only.
                            charRun.replaceText(key, value);
                            // Re-read the range, paragraph and run after the change.
                            range = document.getRange();
                            paragraph = range.getParagraph(i);
                            charRun = paragraph.getCharacterRun(j);
                        }
                    }
                }
            }
            // Note: nothing here writes 'document' back to 'res'.
            return res;
        } catch (IOException e) {
            logger.error("Error processing the Word file: " + e);
            throw new IOException("Error processing the Word file");
        } finally {
            // The input stream is closed here, so no separate close() is needed above.
            if (is != null) {
                is.close();
            }
        }
    }


Thanks in advance, Fabi.



Re: Modify word document

Posted by MSB <ma...@tiscali.co.uk>.
You have not dug down far enough into the structure of the document yet, I
am afraid - all of the formatting information is stored (encapsulated)
within the CharacterRun class and you need to perform the replacements at
that level.

I do not have any suitable code at hand as I type this so what follows will
need to be converted into Java and tested:

Open the Word document.
Get the overall Range for the document.
Get the number of Paragraph objects the Range contains.
Iterate through the Paragraphs and for each Paragraph
    Get the CharacterRun(s) the Paragraph contains.
    Call the method to replace the search term with the replacement text on
    the CharacterRun.
Save the modified document away again.

You do, however, face a couple of problems with this. It has been a long
time since I tried to write a search and replace routine using HWPF and I
could not get it to work if the replacement text was longer than the search
term. In that case, HWPF threw an exception and would not allow me to
complete the process; but that problem could well have been addressed by now
as it was well known and caused by faulty bounds checking within the Range
class. Only testing will prove or disprove this for you, I am afraid.

Secondly, the CharacterRun class encapsulates a piece of text with common
properties. So, imagine that we are searching for the phrase 'search term'
and that the word 'search' has been emboldened whilst the word 'term' has
been left as normal text; then my suggested approach will not work. That is
because the words 'search' and 'term' will be held in different
CharacterRun(s). If you do hit this problem, then I am afraid you will have
to write code that searches for the term at the Paragraph level, identifies
where the search terms can be found and recovers the CharacterRun(s) that
encapsulate them. Once you have these, you can modify the runs or create and
substitute new ones, but I have to admit that I have never tried to do this
myself. Instead I chose to automate Word using OLE and to explore the
possibilities offered by OpenOffice's UNO interface. Both options did work
but threw up other problems that proved more limiting (in terms of
architecture and platform). If you can get it to work, HWPF offers the
better solution IMO.
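
The paragraph-level search described above can be sketched in plain Java,
independent of HWPF. This is only an untested-against-Word illustration of
the bookkeeping: RunLocator and locate are hypothetical names, and the list
of strings stands in for the text of each CharacterRun in one Paragraph (in
HWPF that text would presumably come from calling text() on each run). The
method finds the term in the joined paragraph text and reports the first and
last run that overlap the match.

```java
import java.util.Arrays;
import java.util.List;

/** Maps a paragraph-level match back to the runs that contain it. */
final class RunLocator {

    /**
     * Returns {firstRun, lastRun} for the first occurrence of term in the
     * concatenated run texts, or null if the term is empty or not found.
     */
    static int[] locate(List<String> runTexts, String term) {
        if (term.isEmpty()) {
            return null;
        }
        StringBuilder paragraph = new StringBuilder();
        for (String run : runTexts) {
            paragraph.append(run);
        }
        int matchStart = paragraph.indexOf(term);
        if (matchStart < 0) {
            return null;
        }
        int matchEnd = matchStart + term.length(); // exclusive
        int first = -1;
        int last = -1;
        int runStart = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            int runEnd = runStart + runTexts.get(i).length(); // exclusive
            // A run belongs to the match if the two half-open intervals intersect.
            if (runStart < matchEnd && runEnd > matchStart) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
            runStart = runEnd;
        }
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        // 'search' and ' term' sit in different runs, as in the bold example above.
        List<String> runs = Arrays.asList("Please enter a ", "search", " term", " here.");
        System.out.println(Arrays.toString(locate(runs, "search term")));
    }
}
```

Once the first and last run are known, the replacement still has to be
applied to those runs through whatever HWPF offers, and that part is exactly
what I have not tried.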

Yours

Mark B



-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
Sent from the POI - User mailing list archive at Nabble.com.

