Posted to user@poi.apache.org by Nick Burch <ap...@gagravarr.org> on 2021/10/23 08:27:21 UTC

Re: org.apache.poi.xwpf.usermodel.XWPFDocument Set start/end Page to extract text

On Sat, 23 Oct 2021, nskarthik wrote:
> Process : POI 4.1.2 ,jdk15 ,win10

You should consider upgrading to Apache POI 5.0 - there have been quite a 
few fixes since then, see http://poi.apache.org/changes.html#5.0.0

> Question : Extract Text only from MS-Word docx/doc from specific
> pages  ( Start Page / End Page ) defined.

Not possible. The pages aren't stored in the Word file formats. Your only 
way to know what is on each page is to render it, which we don't support.
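
As a rough illustration of what a .docx does contain, here is a minimal sketch that does not use POI at all. It assumes only that a .docx is a zip archive whose body lives in word/document.xml, and that WordprocessingML marks explicit page breaks with <w:br w:type="page"/> and may carry <w:lastRenderedPageBreak/> hints left over from the last time Word laid the file out. Those hints can be missing or stale, so the resulting chunks are only approximate "pages". The class and method names (PageBreakSplitter, splitOnPageBreaks, splitDocx) are hypothetical, and the sketch does not handle the binary .doc format, headers, footers or footnotes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Apache POI.
public class PageBreakSplitter {
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Splits the text of a word/document.xml stream at page-break markers. */
    public static List<String> splitOnPageBreaks(InputStream documentXml)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader xml = factory.createXMLStreamReader(documentXml);

        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inText = false; // true while inside a <w:t> element

        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && W_NS.equals(xml.getNamespaceURI())) {
                String name = xml.getLocalName();
                if (name.equals("t")) {
                    inText = true;
                } else if (name.equals("lastRenderedPageBreak")
                        || (name.equals("br")
                            && "page".equals(xml.getAttributeValue(W_NS, "type")))) {
                    // Explicit break or cached layout hint: start a new chunk.
                    chunks.add(current.toString());
                    current.setLength(0);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && W_NS.equals(xml.getNamespaceURI())) {
                String name = xml.getLocalName();
                if (name.equals("t")) {
                    inText = false;
                } else if (name.equals("p")) {
                    current.append('\n'); // one line per paragraph
                }
            } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                current.append(xml.getText());
            }
        }
        chunks.add(current.toString());
        return chunks;
    }

    /** Opens a .docx as a zip archive and splits its main document part. */
    public static List<String> splitDocx(String path)
            throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return splitOnPageBreaks(in);
            }
        }
    }
}
```

Calling PageBreakSplitter.splitDocx("some.docx") returns one string per chunk. A document with no explicit breaks and no cached hints comes back as a single chunk, which is the situation described above: without rendering, there is nothing in the file that says where a page ends.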

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: org.apache.poi.xwpf.usermodel.XWPFDocument Set start/end Page to extract text

Posted by nskarthik <ns...@gmail.com>.
Hi,

Suggestion provided:
> Not possible. The pages aren't stored in the Word file formats. Your only
> way to know what is on each page is to render it, which we don't support.

Process: Reading a large doc/docx file in full is very costly and unnecessary.

Question: If Start Page / End Page is not supported, then how do you plan
to support new page creation while writing to a doc? Do you simply dump the
text into the doc/docx and let it flow over however many pages it needs?


with regards
karthik



