Posted to users@pdfbox.apache.org by Поникаровский Константин <um...@unilib.neva.ru> on 2010/04/21 14:34:49 UTC
Too many spaces in some documents
When I read the whole document in one pass (.NET code):
///////////////////////////////////////////////
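// document: an already loaded PDDocument; fs: the already opened output FileStream
java.io.StringWriter w = new java.io.StringWriter();
StreamWriter sw = new StreamWriter(fs);
PDFTextStripper stripper = new PDFTextStripper("UTF-8");
stripper.setSortByPosition(false);
stripper.setStartPage(1);          // PDFBox page numbers start at 1
stripper.setEndPage(1000);         // upper bound meant to cover every page
stripper.writeText(document, w);   // extract all pages in a single call
sw.Write(w.toString());
sw.Flush();                        // make sure the text actually reaches the file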
//////////////////////////////////////////
I see too many spaces inside and between words.
But if I read one page at a time:
////////////////////////////////////////////////
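// same document and stripper as above, with a fresh (empty) StringWriter w
for (int k = 1; k <= document.getPageCount(); ++k)
{
    stripper.setStartPage(k);
    stripper.setEndPage(k);
    stripper.writeText(document, w);   // appends page k to w
}
sw.Write(w.toString());
sw.Flush();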
///////////////////////////////////////////////////
there are no extra spaces.
The problem shows up with some PDF files, starting with PDFBox version 1.0.
Can you explain why?
Thanks.
Konstantin