Posted to users@pdfbox.apache.org by Поникаровский Константин <um...@unilib.neva.ru> on 2010/04/21 14:34:49 UTC

Too many spaces in some documents

When I extract the text of the whole document in one call (.NET code):

///////////////////////////////////////////////

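java.io.StringWriter w = new java.io.StringWriter();
StreamWriter sw = new StreamWriter(fs);      // fs: an output stream opened earlier
PDFTextStripper stripper = new PDFTextStripper("UTF-8");
stripper.setSortByPosition(false);
stripper.setStartPage(1);                    // PDFBox page numbers start at 1
stripper.setEndPage(1000);                   // covers documents of up to 1000 pages
stripper.writeText(document, w);             // extract all pages in a single call
sw.Write(w.toString());
sw.Flush();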

//////////////////////////////////////////

I see too many spaces inside and between words.



But if I read one page after another:

////////////////////////////////////////////////

for (int k = 1; k <= document.getPageCount(); ++k)
{
    // extract one page at a time, appending to the same writer
    stripper.setStartPage(k);
    stripper.setEndPage(k);
    stripper.writeText(document, w);
}

///////////////////////////////////////////////////

there are no additional spaces.
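To measure the difference, you could compare the two extraction results for the same file. Below is a minimal sketch in Java, the language PDFBox itself is written in. It assumes both outputs are already available as strings. The class `SpaceDiff` and its methods are hypothetical helpers and are not part of PDFBox.

```java
// Hypothetical diagnostic helper, not part of PDFBox: compares the
// spaces in two text extraction results of the same document.
public final class SpaceDiff {

    // Counts the ' ' characters in the text.
    static int countSpaces(String text) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                n++;
            }
        }
        return n;
    }

    // Counts the places where two or more spaces occur in a row.
    static int multiSpaceRuns(String text) {
        int runs = 0;
        int i = 0;
        while (i < text.length()) {
            if (text.charAt(i) == ' ' && i + 1 < text.length() && text.charAt(i + 1) == ' ') {
                runs++;
                // skip the rest of this run of spaces
                while (i < text.length() && text.charAt(i) == ' ') {
                    i++;
                }
            } else {
                i++;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // made-up sample strings, not real PDFBox output
        String wholeDocument = "T h e  quick   fox";
        String pageByPage = "The quick fox";
        System.out.println("extra spaces: "
                + (countSpaces(wholeDocument) - countSpaces(pageByPage)));
        System.out.println("multi-space runs: " + multiSpaceRuns(wholeDocument));
    }
}
```

Run this on the whole-document output and the page-by-page output of an affected file. The result should show roughly how many extra spaces the single-call extraction adds.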

The problem has appeared in some PDF files since PDFBox version 1.0.



Can you explain why?

Thanks.

 Konstantin