Posted to user@poi.apache.org by Rajasekar <sr...@gmail.com> on 2013/05/08 09:13:47 UTC
Re: Extract Text with style/type information
Hi
It is possible to get style information from both doc and docx files. Use
HWPF to read a doc file and XWPF to read a docx file. As an example, I have
given some code below.
This code handles docx files only:
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BodyElementType;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class DocxStyleDump
{
    public static void main(String[] args) throws IOException
    {
        XWPFDocument doc = new XWPFDocument(new FileInputStream("File Path"));
        // Walk the body elements in document order; non-paragraph elements (such as tables) are skipped.
        for (IBodyElement element : doc.getBodyElements())
        {
            if (element.getElementType() == BodyElementType.PARAGRAPH)
            {
                XWPFParagraph paragraph = (XWPFParagraph) element;
                // System.out.println(paragraph.getStyleID() + " <---> " + paragraph.getStyle() + " <-----> " + paragraph.getText());
                // getStyle() returns the paragraph's style ID, or null if no style is set.
                String styleName = paragraph.getStyle();
                System.out.println(styleName);
            }
        }
    }
}
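
The value printed by the code above is a style ID (for example "Heading1"), which can differ from the name shown in Word. A minimal sketch of resolving the ID to the style's name through the document's style table follows. The class name DocxStyleNames and the method styleNameOf are hypothetical helpers, not part of POI, and falling back to the raw ID when no style table or matching entry exists is an assumption of this sketch.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFStyle;
import org.apache.poi.xwpf.usermodel.XWPFStyles;

public class DocxStyleNames {

    // Hypothetical helper: returns the style name for a paragraph,
    // the raw style ID if the name cannot be resolved,
    // or null if the paragraph has no style set.
    static String styleNameOf(XWPFDocument doc, XWPFParagraph paragraph) {
        String styleId = paragraph.getStyleID();
        if (styleId == null) {
            return null;
        }
        XWPFStyles styles = doc.getStyles();
        if (styles == null) {
            // The document has no style table; fall back to the ID.
            return styleId;
        }
        XWPFStyle style = styles.getStyle(styleId);
        return (style != null) ? style.getName() : styleId;
    }
}
```

Inside the paragraph loop this could be called as styleNameOf(doc, paragraph) in place of paragraph.getStyle().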
For a doc file, the way to get style information is different.
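
For the doc case, a rough sketch using HWPF (shipped in the poi-scratchpad jar) might look like the following. It looks up each paragraph's style index in the document's style sheet to get the style name. The class name DocStyleDump and the helpers stripParagraphMark and dumpStyles are hypothetical, and "File Path" is a placeholder just as in the docx code. Treat this as an illustration under those assumptions, not as tested code for every Word version.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.StyleDescription;
import org.apache.poi.hwpf.model.StyleSheet;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class DocStyleDump {

    // HWPF paragraph text usually ends with the paragraph mark ('\r');
    // drop it so the printed output stays on one line.
    static String stripParagraphMark(String text) {
        return text.endsWith("\r") ? text.substring(0, text.length() - 1) : text;
    }

    // Prints "<style name> <-----> <text>" for every paragraph of a .doc file.
    static void dumpStyles(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            HWPFDocument doc = new HWPFDocument(in);
            StyleSheet styleSheet = doc.getStyleSheet();
            Range range = doc.getRange();
            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph paragraph = range.getParagraph(i);
                // The paragraph stores an index into the style sheet, not a name.
                StyleDescription style = styleSheet.getStyleDescription(paragraph.getStyleIndex());
                String styleName = (style != null) ? style.getName() : null;
                System.out.println(styleName + " <-----> " + stripParagraphMark(paragraph.text()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "File Path" is a placeholder; pass a real .doc path as the first argument.
        dumpStyles(args.length > 0 ? args[0] : "File Path");
    }
}
```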
--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Extract-Text-with-style-type-information-tp2304876p5712654.html