Posted to users@pdfbox.apache.org by Jérôme Wacongne <ch...@c4-soft.com> on 2011/03/04 11:24:56 UTC

pdfbox 1.5.0 regressions ?

Hi,

I have tried to move from 1.4.0 to 1.5.0 but have noticed unexpected
changes:
- looks much slower
- some line breaks are not present any more (see "1. Introduction" in
provided example)
- some characters are not extracted correctly anymore (see © copyright char
on first line in provided example). But this might be me misconfiguring the
charset when calling the 'parse(charset)' method
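
A minimal Java sketch of how a charset mismatch alone could garble the © sign. It does not call PDFBox: the class `CharsetMismatchDemo` and the helper `reinterpret` are hypothetical names used only for illustration, and whether this is what happens in the 'parse(charset)' call is an assumption that the thread does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encode text with one charset, then decode the same bytes with another,
    // mimicking a reader that is configured with the wrong charset.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // the copyright sign
        String garbled = reinterpret(copyright,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // Print code points so the result does not depend on the console encoding.
        garbled.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

If the extracted © shows up as two characters (U+00C2 followed by U+00A9), a UTF-8 versus ISO-8859-1 mismatch is a likely cause; a different kind of corruption would point elsewhere.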

Best regards,
ch4mp

p.s.
Source pdf is too big according to this users mailing list policy. So request
it if needed for testing ;)

JIRA problem

Posted by Thomas Fischer <fi...@aon.at>.
Hello,

I have serious problems with the upload procedure at JIRA.
I suppose some kind of JavaScript now drives the upload process. In any case, after I choose the file to upload in a standard file chooser window, the JavaScript runs for ages and sometimes fails; I have no idea what it's doing.
One file was accepted after I changed the name from "Test2-1.4.txt" to "Test2.1.4.txt", but the upload then choked for five minutes on Test2.pdf, eventually giving "Unknown error occurred uploading file."
(Mac OS X 10.6.6, Safari 5.0.3)
A second try succeeded though. What's going on?

Cheers
Thomas


Re: pdfbox 1.5.0 regressions ?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 04.03.2011 17:07, Thomas Fischer wrote:
> Hello,
>
> first tests with 1.5 showed that it lost the ability to recognise the kind of ligatures created with TeX:
>
> 1.4         1.5
> official    ocial
> effort      e ort
> fields      elds
> first       rst

Please create an issue on JIRA and attach a sample pdf if possible.

> On 04.03.2011 at 11:24, Jérôme Wacongne wrote:
>
>> Hi,
>>
>> I have tried to move from 1.4.0 to 1.5.0 but have noticed unexpected changes:
>> - looks much slower
>> - some line breaks are not present any more (see "1. Introduction" in provided example)
>> - some characters are not extracted correctly anymore (see © copyright char on first line in provided example). But this might be me misconfiguring the charset when calling the 'parse(charset)' method
>>
>> Best regards,
>> ch4mp
>>
>> p.s.
>> Source pdf is too big according to this users mailing list policy. So request it if needed for testing ;)
>> <1_5.txt><1_4.txt>

BR
Andreas Lehmkühler


Re: pdfbox 1.5.0 regressions ?

Posted by Thomas Fischer <fi...@aon.at>.
Hello,

first tests with 1.5 showed that it lost the ability to recognise the kind of ligatures created with TeX:

1.4         1.5
official    ocial
effort      e ort
fields      elds
first       rst
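
A hedged sketch of the expected result, in Java: if the extracted text still carried the Unicode ligature code points (U+FB00 "ff", U+FB01 "fi", U+FB03 "ffi"), compatibility normalization with the JDK's `java.text.Normalizer` would expand them to plain letters. That the ligatures arrive as these code points is an assumption (TeX fonts often use font-specific encodings), and this is not a description of how PDFBox 1.4 or 1.5 handles them; `LigatureDemo` and `expandLigatures` are hypothetical names.

```java
import java.text.Normalizer;

class LigatureDemo {
    // NFKC normalization replaces compatibility characters, including the
    // Latin ligatures U+FB00..U+FB04, with their plain-letter equivalents.
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The four words from the table, written with ligature code points.
        String[] samples = {"o\uFB03cial", "e\uFB00ort", "\uFB01elds", "\uFB01rst"};
        for (String s : samples) {
            System.out.println(expandLigatures(s));
        }
    }
}
```

Under that assumption the four samples come out as official, effort, fields and first, matching the 1.4 column.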

Best
Thomas


On 04.03.2011 at 11:24, Jérôme Wacongne wrote:

> Hi,
> 
> I have tried to move from 1.4.0 to 1.5.0 but have noticed unexpected changes:
> - looks much slower
> - some line breaks are not present any more (see "1. Introduction" in provided example)
> - some characters are not extracted correctly anymore (see © copyright char on first line in provided example). But this might be me misconfiguring the charset when calling the 'parse(charset)' method
> 
> Best regards,
> ch4mp
> 
> p.s.
> Source pdf is too big according to this users mailing list policy. So request it if needed for testing ;)
> <1_5.txt><1_4.txt>


Re: pdfbox 1.5.0 regressions ?

Posted by Jukka Zitting <jz...@adobe.com>.
Hi,

On 03/04/2011 11:24 AM, Jérôme Wacongne wrote:
> I have tried to move from 1.4.0 to 1.5.0 but have noticed unexpected
> changes:
> - looks much slower
> - some line breaks are not present any more (see "1. Introduction" in
> provided example)
> - some characters are not extracted correctly anymore (see © copyright
> char on first line in provided example). But this might be me
> misconfiguring the charset when calling the 'parse(charset)' method

Can you please file bug reports about these problems in the issue 
tracker at https://issues.apache.org/jira/browse/PDFBOX? You can also 
attach example PDFs there.

-- 
Jukka Zitting