Posted to user@poi.apache.org by Keith Denny <ke...@gmail.com> on 2014/09/05 04:49:52 UTC

POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Hello,

  I am attempting to use POI to support a document/template tool and am
receiving unexpected results when parsing an XWPFDocument.
Specifically, when I review each line of text, the String returned
by the XWPFRun.getText() call is not the same text that is visible in the
actual document.  Here are my specific details:

*Simple Use Case*
- Create a MS Word 2010 document, i.e. Test.docx  NOTE:  Although I am
basically doing something similar to MS templates, I am not using a .dotx
file; rather, my starting point is a .docx file.
- In the document, insert a *<<TAG>>* placeholder, such as 'Dear <<CLIENT_NAME>>'
   NOTE:  In the Word document, the characters 'Dear
<<CLIENT_NAME>>' all appear on a single line
- The *<<TAG>>* is a placeholder that will be dynamically replaced by a
custom document management system.  In this case, there is a system entity
tag with the identifier <<CLIENT_NAME>>.  When the document is parsed,
the code checks whether an entity tag, such as <<CLIENT_NAME>>,
exists in the document and replaces it with a real runtime value.

*Simplified Code:*
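import java.io.InputStream;
import java.util.LinkedHashMap;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// mContent and req come from the surrounding application code.
InputStream in = mContent.getBinaryStream();
XWPFDocument _doc = new XWPFDocument(in);
// Fetch the tag-to-value map once instead of once per run.
LinkedHashMap<String, String> _entityMap =
        (LinkedHashMap<String, String>) req.getSession().getAttribute("ENTITY_MAP");
for (XWPFParagraph p : _doc.getParagraphs()) {
    for (XWPFRun r : p.getRuns()) {
        String text = r.getText(0);
        if (text != null) {
            for (String key : _entityMap.keySet()) {
                String tag = key.trim();
                if (text.contains(tag)) {
                    // Replace the tag within this run's own text.
                    text = text.replace(tag, _entityMap.get(key));
                    r.setText(text, 0);
                }
            }
        }
    }
}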

*Results:*
One call to r.getText(0) returns only '<<CLIENT_'; therefore, there's no
match when comparing against the entity tag <<CLIENT_NAME>>.  The
following call to r.getText(0) returns only 'NAME>>'.  Again, obviously, no
match.

Sometimes, r.getText(0) returns <<CLIENT_NAME and leaves the trailing ">>"
for the next call to r.getText(0).  Again, obviously, no match.

Sometimes, some tags do get returned by XWPFRun.getText() and the
substitution occurs as planned.

*Questions*

1. If the literal string of characters in the actual MS Word document exists
on one single line of text, why does XWPFRun.getText() return the line as
multiple fragments of text?

2.  How do I ensure that I get the actual line, as it exists in the MS Word
document, in POI so I can inspect and replace key text?

Any help would be greatly appreciated.  Thank you in advance for your
feedback.

Sincerely,
Keith G. Denny

Re: POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Posted by "Keith G. Denny" <ke...@gmail.com>.
Good to know.  Thank you for your assistance.  Enjoy your weekend.

Respectfully, 
Keith G. Denny

Sent from my mobile device

> On Sep 5, 2014, at 2:50 PM, Nick Burch <ap...@gagravarr.org> wrote:
> 
>> On Fri, 5 Sep 2014, Keith Denny wrote:
>> With a Run, I can set the text of the acquired Run.  But, I don't see where I can reset the text of the Paragraph if I get all the Paragraph text from either getParagraphText or just getText.  It would definitely be preferred if I could do it with a setter such as setParagraphText or setText.  Am I overlooking a method like that?
> 
> I'm on a train right now, but IIRC there is a method on an XWPF paragraph that'll zap all the runs and replace them with a single new one with the given text. Check the source code and you ought to be able to find it!
> 
> Nick
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
> 


Re: POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 5 Sep 2014, Keith Denny wrote:
> With a Run, I can set the text of the acquired Run.  But, I don't see 
> where I can reset the text of the Paragraph if I get all the Paragraph 
> text from either getParagraphText or just getText.  It would definitely 
> be preferred if I could do it with a setter such as setParagraphText or 
> setText.  Am I overlooking a method like that?

I'm on a train right now, but IIRC there is a method on an XWPF paragraph
that'll zap all the runs and replace them with a single new one with the
given text. Check the source code and you ought to be able to find it!

Nick


Re: POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Posted by Keith Denny <ke...@gmail.com>.
With a Run, I can set the text of the acquired Run.  But, I don't see where
I can reset the text of the Paragraph if I get all the Paragraph text from
either getParagraphText or just getText.   It would definitely be preferred
if I could do it with a setter such as setParagraphText or setText.  Am I
overlooking a method like that?


On Fri, Sep 5, 2014 at 2:20 PM, Nick Burch <ap...@gagravarr.org> wrote:

> On Fri, 5 Sep 2014, Keith Denny wrote:
>
>> Thank you for confirming the functionality.  Basically, I'll have to
>> assemble the Paragraph lines from all the Runs and then inspect the
>> assembled Paragraph full text for my translation/substitution routine.
>>
>
> The paragraph object itself can give you the overall paragraph text, why
> not use that?
>
>> In essence, I think I will have to remove all the Runs after assembling
>> them at runtime, translate/make substitutions, and then add a single Run
>> back to the Paragraph with the whole text that was assembled.  Is there a
>> limit to the size of a given Run?
>>
>
> Nope. Runs are created by Word to handle adjacent blocks of text that need
> different formatting, and sometimes when it thinks there's a risk that they
> might need it later, or might once have needed it. If the paragraph is
> supposed to all be the same, there's something to be said for squashing it
> down to just one run, then modifying the text in that!
>
>
> Nick

Re: POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 5 Sep 2014, Keith Denny wrote:
> Thank you for confirming the functionality.  Basically, I'll have to 
> assemble the Paragraph lines from all the Runs and then inspect the 
> assembled Paragraph full text for my translation/substitution routine.

The paragraph object itself can give you the overall paragraph text, why 
not use that?

> In essence, I think I will have to remove all the Runs after assembling
> them at runtime, translate/make substitutions, and then add a single Run
> back to the Paragraph with the whole text that was assembled.  Is there a
> limit to the size of a given Run?

Nope. Runs are created by Word to handle adjacent blocks of text that need
different formatting, and sometimes when it thinks there's a risk that
they might need it later, or might once have needed it. If the paragraph
is supposed to all be the same, there's something to be said for squashing
it down to just one run, then modifying the text in that!
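
A minimal sketch of the "squash to one run" idea, written in plain Java
with no POI dependency: the run texts are modeled as a list of strings, as
XWPFRun.getText(0) might return them. SquashRuns and squashAndReplace are
hypothetical names invented for this illustration, not POI API. In real
code the resulting string would presumably go into a single run, for
example after removing the existing runs with XWPFParagraph.removeRun(int)
and adding a fresh one with createRun(); that step is an assumption and is
not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: models a paragraph's runs as plain strings.
final class SquashRuns {

    // Joins all run texts in order, then applies every tag substitution
    // to the joined text. Runs with no text (null) are skipped.
    static String squashAndReplace(List<String> runTexts, Map<String, String> entities) {
        StringBuilder sb = new StringBuilder();
        for (String t : runTexts) {
            if (t != null) {
                sb.append(t);
            }
        }
        String text = sb.toString();
        for (Map.Entry<String, String> e : entities.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        // Tags deliberately split across runs, as reported in this thread.
        List<String> runs = Arrays.asList(
                "Dear ", "<<CLIENT_NAME", ">>", ", your ID is <<CLIENT_", "ID>>.");
        Map<String, String> entities = new LinkedHashMap<>();
        entities.put("<<CLIENT_NAME>>", "Jane Doe");
        entities.put("<<CLIENT_ID>>", "42");
        System.out.println(squashAndReplace(runs, entities));
        // prints: Dear Jane Doe, your ID is 42.
    }
}
```

The trade-off is that a single run carries a single set of formatting, so
any bold or italic fragments inside the paragraph would be lost. That is
why this only suits paragraphs that are meant to look uniform.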

Nick


Re: POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Posted by Keith Denny <ke...@gmail.com>.
Nick,

Thank you for confirming the functionality.  Basically, I'll have to
assemble the Paragraph lines from all the Runs and then inspect the
assembled Paragraph full text for my translation/substitution routine.

In essence, I think I will have to remove all the Runs after assembling
them at runtime, translate/make substitutions, and then add a single Run
back to the Paragraph with the whole text that was assembled.  Is there a
limit to the size of a given Run?

Thanks,
Keith


On Fri, Sep 5, 2014 at 6:14 AM, Nick Burch <ap...@gagravarr.org> wrote:

> On Thu, 4 Sep 2014, Keith Denny wrote:
>
>> *Results:*
>> One call to r.getText(0) returns only '<<CLIENT_'; therefore, there's no
>> match with the comparison check of the entity tag of <<CLIENT_NAME>>.  The
>> following call to r.getText(0) returns only 'NAME>>'.  Again, obviously,
>> no match.
>>
>
> This is normal. That's just how the Word file format works. A given run
> contains text that is all styled the same. A paragraph is made up of
> possibly multiple runs, each run having text of the same style; each
> subsequent run may or may not have a different style.
>
> It all depends on the history of the file, and what mood Word was in when
> creating it.
>
>> 2.  How do I ensure that I get the actual line, as it exists in the MS
>> Word document, in POI so I can inspect and replace key text?
>>
>
> Fetch the text at the paragraph level, then work out which run(s) to
> change within that, taking into account that a given bit of text could
> well be spread across multiple runs.
>
> Nick

Re: POI 3.10.1 XWPFRun getText() Does Not Return Full Line of Text

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 4 Sep 2014, Keith Denny wrote:
> *Results:*
> One call to r.getText(0) returns only '<<CLIENT_'; therefore, there's no
> match with the comparison check of the entity tag of <<CLIENT_NAME>>.  The
> following call to r.getText(0) returns only 'NAME>>'.  Again, obviously, no
> match.

This is normal. That's just how the Word file format works. A given run
contains text that is all styled the same. A paragraph is made up of
possibly multiple runs, each run having text of the same style; each
subsequent run may or may not have a different style.

It all depends on the history of the file, and what mood Word was in when
creating it.
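
To make that concrete, here is a small sketch in plain Java (no POI
dependency) that models a paragraph as the list of strings its runs might
return from getText(0). The class SplitTagDemo and both helper methods are
hypothetical names made up for this illustration. The split used in the
example is the one reported earlier in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each list element stands for one run's text.
final class SplitTagDemo {

    // True only if some single run contains the whole tag, which is
    // what a per-run contains() check effectively tests.
    static boolean foundInSingleRun(List<String> runTexts, String tag) {
        for (String t : runTexts) {
            if (t != null && t.contains(tag)) {
                return true;
            }
        }
        return false;
    }

    // Concatenates the run texts in order, skipping runs with no text.
    static String joinRuns(List<String> runTexts) {
        StringBuilder sb = new StringBuilder();
        for (String t : runTexts) {
            if (t != null) {
                sb.append(t);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Dear ", "<<CLIENT_", "NAME>>");
        String tag = "<<CLIENT_NAME>>";
        System.out.println(foundInSingleRun(runs, tag));   // prints: false
        System.out.println(joinRuns(runs).contains(tag));  // prints: true
    }
}
```

The tag is invisible to a run-by-run search but shows up once the run
texts are joined, which is why the search has to happen at the paragraph
level.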

> 2.  How do I ensure that I get the actual line, as it exists in the MS Word
> document, in POI so I can inspect and replace key text?

Fetch the text at the paragraph level, then work out which run(s) to
change within that, taking into account that a given bit of text could
well be spread across multiple runs.
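
One way to work out which runs to change could look like the sketch below.
It is plain Java operating on a list of run texts rather than on POI
objects; RunSpanReplace and replaceFirst are hypothetical names, not POI
API. It assumes each run holds a single text element, so that in real code
runs.get(i) would correspond to getRuns().get(i).getText(0) and
runs.set(i, s) to setText(s, 0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: replaces a tag that may span several runs.
final class RunSpanReplace {

    // Finds the first occurrence of tag in the joined run texts, writes
    // value into the first run the match touches, and deletes the matched
    // characters from every other run it touches. Returns false if the
    // tag is empty or not present.
    static boolean replaceFirst(List<String> runs, String tag, String value) {
        if (tag.isEmpty()) {
            return false;
        }
        StringBuilder joined = new StringBuilder();
        for (String t : runs) {
            joined.append(t == null ? "" : t);
        }
        int start = joined.indexOf(tag);
        if (start < 0) {
            return false;
        }
        int end = start + tag.length();
        int pos = 0;               // offset of the current run in the joined text
        boolean inserted = false;  // has value been written yet?
        for (int i = 0; i < runs.size(); i++) {
            String t = runs.get(i) == null ? "" : runs.get(i);
            int runStart = pos;
            int runEnd = pos + t.length();
            pos = runEnd;
            if (runEnd <= start || runStart >= end) {
                continue;          // this run does not overlap the match
            }
            int from = Math.max(start, runStart) - runStart;
            int to = Math.min(end, runEnd) - runStart;
            String replacement = inserted ? "" : value;
            inserted = true;
            runs.set(i, t.substring(0, from) + replacement + t.substring(to));
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(
                Arrays.asList("Dear ", "<<CLIENT_", "NAME>>", "!"));
        replaceFirst(runs, "<<CLIENT_NAME>>", "Acme");
        System.out.println(runs); // prints: [Dear , Acme, , !]
    }
}
```

With this approach the replacement value takes on the formatting of the
first run the tag touched, and the other runs keep their own formatting.
Runs left empty can stay in place or be removed afterwards.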

Nick
