Posted to dev@pdfbox.apache.org by Tilman Hausherr <TH...@t-online.de> on 2015/07/01 21:22:26 UTC

Re: PDFBox 1.8.10 release

On 30.06.2015 at 12:20, Andreas Lehmkühler wrote:
> Hi,
>
> there are again a number of solved issues and I'm thinking about a new
> bugfix release. How about a new one next week, maybe later if someone
> wants to get some additional things done before?

I have only one thing I'd like to test, with Tim Allison, before a 
release: there's a line in PDFTextStripper

if ((wordSpacing == 0) || (wordSpacing == Float.NaN))

however wordSpacing == Float.NaN is always false. So I'd like to find 
out if there is any difference in using what the developer probably 
intended, which is

if ((wordSpacing == 0) || (Float.isNaN(wordSpacing)))
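
A minimal sketch of the difference between the two conditions, using only standard Java. The class and method names are made up for illustration and are not PDFBox code; only the two boolean expressions come from the lines above.

```java
// Illustrative only: NanCheckDemo is a hypothetical class, not part of PDFBox.
class NanCheckDemo {
    // The condition as currently written. In Java, NaN compares unequal to
    // every value, including itself, so the second operand is always false.
    static boolean originalCheck(float wordSpacing) {
        return (wordSpacing == 0) || (wordSpacing == Float.NaN);
    }

    // The condition the developer probably intended.
    static boolean intendedCheck(float wordSpacing) {
        return (wordSpacing == 0) || Float.isNaN(wordSpacing);
    }

    public static void main(String[] args) {
        float[] samples = {0f, 1.5f, Float.NaN};
        for (float s : samples) {
            System.out.println(s + ": original=" + originalCheck(s)
                    + ", intended=" + intendedCheck(s));
        }
    }
}
```

The two checks agree for every input except NaN, so any change in extracted text would have to come from documents where wordSpacing actually ends up as NaN.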

(BCC to Tim)

Tilman

Re: PDFBox 1.8.10 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 09.07.2015 at 18:40, Tilman Hausherr wrote:
>>
>>
>> This table will likely be wrecked, but let me know if you’d like me 
>> to post it somewhere:
> Thanks, I think I get it. I can identify the files from what you posted.


I checked these files and can't find any difference in the ExtractText 
output, so I'll make the change. Thanks for your help!

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org


Re: PDFBox 1.8.10 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 09.07.2015 at 18:25, Allison, Timothy B. wrote:
> 9 files out of ~240k pdfs in govdocs1 had very, very minor differences.  None of the differences were actual words.
>
>
>
> This table will likely be wrecked, but let me know if you’d like me to post it somewhere:
Thanks, I think I get it. I can identify the files from what you posted.

Tilman



RE: PDFBox 1.8.10 release

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Nine files out of ~240k PDFs in govdocs1 had very, very minor differences.  None of the differences were actual words.



This table will likely be wrecked, but let me know if you’d like me to post it somewhere:


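FILE_PATH: 095/095028.pdf
TOKEN_COUNT_A: 99708
TOKEN_COUNT_B: 99880
UNIQUE_TOKEN_COUNT_A: 8216
UNIQUE_TOKEN_COUNT_B: 8244
TOP_N_WORDS_A: the: 6621 | and: 4111 | of: 3361 | in: 2470 | to: 1792 | a: 1414 | are: 981 | is: 863 | for: 849 | area: 669
TOP_N_WORDS_B: the: 6621 | and: 4111 | of: 3361 | in: 2470 | to: 1792 | a: 1414 | are: 981 | is: 863 | for: 849 | area: 669
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: bc: 6 | cb: 5 | bm: 3 | ied: 2 | ec: 2 | gi: 1 | fg: 1 | fd: 1 | edbb: 1 | bd: 1
TOP_10_MORE_IN_A: -
TOP_10_MORE_IN_B: c: 18 | d: 18 | b: 17 | f: 13 | de: 11 | h: 8 | bc: 6 | e: 6 | cb: 5 | m: 5
DICE_COEFFICIENT: 0.998299
OVERLAP: 0.999138

FILE_PATH: 167/167852.pdf
TOKEN_COUNT_A: 38313
TOKEN_COUNT_B: 39154
UNIQUE_TOKEN_COUNT_A: 6035
UNIQUE_TOKEN_COUNT_B: 6101
TOP_N_WORDS_A: wkh: 2000 | ri: 1201 | dqg: 1091 | wr: 907 | d: 776 | lq: 582 | lv: 531 | iru: 494 | h: 411 | 6: 378
TOP_N_WORDS_B: wkh: 2035 | ri: 1221 | dqg: 1115 | wr: 922 | d: 792 | lq: 589 | lv: 539 | iru: 509 | h: 417 | 6: 385
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: dpswrq: 2 | 2uelwlqj: 2 | prghudwh: 1 | odfwlf: 1 | lqiudvwuxfwxuhv: 1 | lplw: 1 | hqdeohv: 1 | gurvskhuh: 1 | 526: 1 | 3krwrphwu: 1
TOP_10_MORE_IN_A: -
TOP_10_MORE_IN_B: wkh: 35 | dqg: 24 | ri: 20 | d: 16 | iru: 15 | wr: 15 | eh: 12 | plvvlrqv: 12 | 0lfur0dsv: 11 | odxqfk: 11
DICE_COEFFICIENT: 0.994562
OVERLAP: 0.989144

FILE_PATH: 552/552762.pdf
TOKEN_COUNT_A: 157799
TOKEN_COUNT_B: 157798
UNIQUE_TOKEN_COUNT_A: 8156
UNIQUE_TOKEN_COUNT_B: 8156
TOP_N_WORDS_A: the: 10333 | and: 4951 | to: 4614 | of: 4531 | comment: 3204 | in: 2935 | a: 2392 | that: 1990 | for: 1769 | no: 1759
TOP_N_WORDS_B: the: 10333 | and: 4951 | to: 4614 | of: 4531 | comment: 3204 | in: 2935 | a: 2392 | that: 1990 | for: 1769 | no: 1759
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: -
TOP_10_MORE_IN_A: s: 1
TOP_10_MORE_IN_B: -
DICE_COEFFICIENT: 1
OVERLAP: 0.999997

FILE_PATH: 575/575190.pdf
TOKEN_COUNT_A: 1127
TOKEN_COUNT_B: 1128
UNIQUE_TOKEN_COUNT_A: 260
UNIQUE_TOKEN_COUNT_B: 261
TOP_N_WORDS_A: y: 63 | r: 57 | o: 57 | a: 39 | p: 38 | e: 38 | acs: 24 | l: 19 | i: 19 | n: 19
TOP_N_WORDS_B: y: 63 | r: 57 | o: 57 | a: 39 | p: 38 | e: 38 | acs: 24 | l: 19 | i: 19 | n: 19
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: æ: 1
TOP_10_MORE_IN_A: -
TOP_10_MORE_IN_B: æ: 1
DICE_COEFFICIENT: 0.998081
OVERLAP: 0.999557

FILE_PATH: 660/660406.pdf
TOKEN_COUNT_A: 2434
TOKEN_COUNT_B: 2437
UNIQUE_TOKEN_COUNT_A: 1084
UNIQUE_TOKEN_COUNT_B: 1085
TOP_N_WORDS_A: the: 117 | a: 86 | to: 65 | of: 59 | and: 54 | in: 53 | for: 38 | with: 28 | says: 18 | year: 18
TOP_N_WORDS_B: the: 117 | a: 86 | to: 65 | of: 59 | and: 54 | in: 53 | for: 38 | with: 28 | says: 18 | year: 18
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: zat: 1
TOP_10_MORE_IN_A: at: 1
TOP_10_MORE_IN_B: z: 3 | zat: 1
DICE_COEFFICIENT: 0.999539
OVERLAP: 0.998973

FILE_PATH: 660/660684.pdf
TOKEN_COUNT_A: 21803
TOKEN_COUNT_B: 21776
UNIQUE_TOKEN_COUNT_A: 2268
UNIQUE_TOKEN_COUNT_B: 2268
TOP_N_WORDS_A: the: 1056 | of: 764 | benefits: 651 | and: 531 | to: 492 | for: 452 | a: 357 | in: 350 | disabled: 246 | would: 216
TOP_N_WORDS_B: the: 1056 | of: 764 | benefits: 651 | and: 531 | to: 492 | for: 452 | a: 357 | in: 350 | disabled: 246 | would: 216
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: -
TOP_10_MORE_IN_A: 9:27
TOP_10_MORE_IN_B: -
DICE_COEFFICIENT: 1
OVERLAP: 0.99938

FILE_PATH: 729/729805.pdf
TOKEN_COUNT_A: 11261
TOKEN_COUNT_B: 11266
UNIQUE_TOKEN_COUNT_A: 1866
UNIQUE_TOKEN_COUNT_B: 1866
TOP_N_WORDS_A: the: 500 | and: 456 | to: 327 | ipv6: 320 | of: 318 | in: 177 | for: 177 | a: 170 | internet: 127 | address: 120
TOP_N_WORDS_B: the: 500 | and: 456 | to: 327 | ipv6: 320 | of: 318 | in: 177 | for: 177 | a: 170 | internet: 127 | address: 120
TOP_10_UNIQUE_TOKEN_DIFFS_A: -
TOP_10_UNIQUE_TOKEN_DIFFS_B: -
TOP_10_MORE_IN_A: -
TOP_10_MORE_IN_B: z: 5
DICE_COEFFICIENT: 1
OVERLAP: 0.999778

FILE_PATH: 792/792201.pdf
TOKEN_COUNT_A: 1268
TOKEN_COUNT_B: 1265
UNIQUE_TOKEN_COUNT_A: 255
UNIQUE_TOKEN_COUNT_B: 254
TOP_N_WORDS_A: 05: 123 | 06: 78 | 04: 60 | 10: 41 | 8: 39 | 5: 36 | 7: 27 | 12: 27 | 1: 26 | 6: 24
TOP_N_WORDS_B: 05: 123 | 06: 78 | 04: 60 | 10: 41 | 8: 39 | 5: 36 | 7: 27 | 12: 27 | 1: 26 | 6: 24
TOP_10_UNIQUE_TOKEN_DIFFS_A: r: 3
TOP_10_UNIQUE_TOKEN_DIFFS_B: -
TOP_10_MORE_IN_A: r: 3
TOP_10_MORE_IN_B: -
DICE_COEFFICIENT: 0.998035
OVERLAP: 0.998816

FILE_PATH: 999/999419.pdf
TOKEN_COUNT_A: 18917
TOKEN_COUNT_B: 18917
UNIQUE_TOKEN_COUNT_A: 1291
UNIQUE_TOKEN_COUNT_B: 1290
TOP_N_WORDS_A: 0: 5920 | 1: 1161 | 2: 957 | 5: 657 | e: 650 | 4: 547 | 9: 436 | 3: 425 | 6: 411 | 8: 408
TOP_N_WORDS_B: 0: 5920 | 1: 1161 | 2: 957 | 5: 657 | e: 650 | 4: 547 | 9: 436 | 3: 425 | 6: 411 | 8: 408
TOP_10_UNIQUE_TOKEN_DIFFS_A: í9,150: 1 | í8,600: 1 | í13,200: 1
TOP_10_UNIQUE_TOKEN_DIFFS_B: 9,150: 1 | 8,600: 1
TOP_10_MORE_IN_A: í13,200: 1 | í8,600: 1 | í9,150: 1
TOP_10_MORE_IN_B: 13,200: 1 | 8,600: 1 | 9,150: 1
DICE_COEFFICIENT: 0.998063
OVERLAP: 0.999841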
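
The post does not say how DICE_COEFFICIENT and OVERLAP are computed. The reported figures are consistent with a Dice coefficient over the sets of unique tokens and an overlap over token occurrences; for 575/575190.pdf, 2*260/(260+261) = 0.998081 and 2*1127/(1127+1128) = 0.999557. The sketch below works under that assumption. It is not the evaluation code that produced the table; the class name, the method names and the whitespace tokenizer are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, assuming the definitions described above.
class TokenSimilarity {
    // Count lower-cased, whitespace-separated tokens (a simplification).
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Dice coefficient over unique tokens: 2 * |A ∩ B| / (|A| + |B|).
    // Returns NaN if both inputs are empty.
    static double dice(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> shared = new HashSet<>(a.keySet());
        shared.retainAll(b.keySet());
        return 2.0 * shared.size() / (a.size() + b.size());
    }

    // Overlap over occurrences: 2 * sum(min(countA, countB)) / (totalA + totalB).
    // Returns NaN if both inputs are empty.
    static double overlap(Map<String, Integer> a, Map<String, Integer> b) {
        long shared = 0;
        long total = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            shared += Math.min(e.getValue(), b.getOrDefault(e.getKey(), 0));
            total += e.getValue();
        }
        for (int c : b.values()) {
            total += c;
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = count("a a b");
        Map<String, Integer> b = count("a a b c");
        System.out.println("dice=" + dice(a, b) + " overlap=" + overlap(a, b));
    }
}
```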









RE: PDFBox 1.8.10 release

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Done and launched.



Re: PDFBox 1.8.10 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 08.07.2015 at 04:20, Allison, Timothy B. wrote:
> Had to dig into code to make sure that our extension of PDFTextStripper winds up calling the code that you are interested in.  I think it does, so, yes, all we'd have to do is two builds, one with and one without the change.
>
> Should I make the change locally or do you plan to commit?

Locally would be best, as it is really just 1 line, and I haven't 
created an issue yet.

Tilman



RE: PDFBox 1.8.10 release

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Had to dig into code to make sure that our extension of PDFTextStripper winds up calling the code that you are interested in.  I think it does, so, yes, all we'd have to do is two builds, one with and one without the change.

Should I make the change locally or do you plan to commit?

Thank you!



Re: PDFBox 1.8.10 release

Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.07.2015 at 19:16, Allison, Timothy B. wrote:
> Will create separate wrapper that relies solely on PDFTextStripper instead of what we currently do now.  Results in a few days...

This sounds like work. Isn't it enough to run one version before the 
change and one after it, and to display the differences as a table 
like you already do?
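
A rough sketch of such a before/after comparison, for illustration only. It assumes the extracted text of each build is already available as a string (for example from PDFTextStripper in each build) and reports which tokens occur more often on one side. The class name, the method names and the whitespace tokenizer are hypothetical and are not taken from the tooling used for the govdocs1 runs.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for comparing two text extractions of the same file.
class ExtractionDiff {
    // Count lower-cased, whitespace-separated tokens (a simplification).
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Tokens that occur more often in `a` than in `b`, mapped to the surplus.
    static Map<String, Integer> moreIn(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> surplus = new TreeMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int diff = e.getValue() - b.getOrDefault(e.getKey(), 0);
            if (diff > 0) {
                surplus.put(e.getKey(), diff);
            }
        }
        return surplus;
    }

    public static void main(String[] args) {
        // Stand-ins for the text extracted before and after the one-line change.
        Map<String, Integer> before = count("the cat sat");
        Map<String, Integer> after = count("the cat z sat z");
        System.out.println("more before: " + moreIn(before, after));
        System.out.println("more after:  " + moreIn(after, before));
    }
}
```

Files for which both maps come out empty have identical token counts in the two builds and can be left out of the table.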

Tilman



RE: PDFBox 1.8.10 release

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Will create separate wrapper that relies solely on PDFTextStripper instead of what we currently do now.  Results in a few days...

Thank you, Tilman, for pinging me. :)



Re: PDFBox 1.8.10 release

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,

> On 1 July 2015 at 21:22, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> 
> On 30.06.2015 at 12:20, Andreas Lehmkühler wrote:
> > Hi,
> >
> > there are again a number of solved issues and I'm thinking about a new
> > bugfix release. How about a new one next week, maybe later if someone
> > wants to get some additional things done before?
> 
> I have only one thing I'd like to test, with Tim Allison, before a 
> release: there's a line in PDFTextStripper
I'm not in a hurry ... 



BR
Andreas
