Posted to user@poi.apache.org by Rajasekar <sr...@gmail.com> on 2013/05/08 09:13:47 UTC

Re: Extract Text with style/type information

Hi 

It is possible to get style information from both doc and docx files. Read
doc files with HWPF and docx files with XWPF. As an example, I have given
some code below.

This code extracts style information from docx files only:

    import java.io.FileInputStream;
    import org.apache.poi.xwpf.usermodel.BodyElementType;
    import org.apache.poi.xwpf.usermodel.IBodyElement;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;

    // Open the docx file ("File Path" is a placeholder).
    XWPFDocument document = new XWPFDocument(new FileInputStream("File Path"));

    // Walk the body elements in document order and handle paragraphs only.
    for (IBodyElement element : document.getBodyElements()) {
        if (element.getElementType() == BodyElementType.PARAGRAPH) {
            XWPFParagraph paragraph = (XWPFParagraph) element;

            // getStyle() returns null when the paragraph has no explicit style.
            String styleName = paragraph.getStyle();

            // System.out.println(paragraph.getStyleID() + " <---> "
            //         + paragraph.getStyle() + " <-----> " + paragraph.getText());
            System.out.println(styleName);
        }
    }

For doc files, the style information has to be obtained in a different way ...
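
As a rough sketch of the doc side (not taken from the original post, so treat it as an assumption about how HWPF could be used here): each HWPF paragraph carries a style index, which can be looked up in the document's style sheet to get the style name. The class name DocStyleDump and the helper formatLine are made up for illustration, and the input path is taken from the first command-line argument.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.StyleDescription;
import org.apache.poi.hwpf.model.StyleSheet;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class DocStyleDump {

    // Hypothetical helper: joins a style name and the trimmed paragraph text.
    static String formatLine(String styleName, String text) {
        return styleName + " <---> " + text.trim();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of a .doc file.
        try (InputStream in = new FileInputStream(args[0])) {
            HWPFDocument document = new HWPFDocument(in);
            StyleSheet styles = document.getStyleSheet();
            Range range = document.getRange();

            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph paragraph = range.getParagraph(i);

                // Resolve the paragraph's style index to a style description.
                StyleDescription style =
                        styles.getStyleDescription(paragraph.getStyleIndex());

                // Guard against a missing description rather than assume one exists.
                String name = (style != null) ? style.getName() : null;

                System.out.println(formatLine(String.valueOf(name), paragraph.text()));
            }
        }
    }
}
```

This needs the HWPF classes on the classpath (the poi-scratchpad jar in the usual POI distribution). Paragraphs inside tables are part of the same range, so they are listed too.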





--
View this message in context: http://apache-poi.1045710.n5.nabble.com/Extract-Text-with-style-type-information-tp2304876p5712654.html
Sent from the POI - User mailing list archive at Nabble.com.
