Posted to dev@pdfbox.apache.org by Tilman Hausherr <TH...@t-online.de> on 2015/07/01 21:22:26 UTC
Re: PDFBox 1.8.10 release
On 30.06.2015 at 12:20, Andreas Lehmkühler wrote:
> Hi,
>
> there are again a number of solved issues and I'm thinking about a new
> bugfix release. How about a new one next week, maybe later if someone
> wants to get some additional things done before?
I have only one thing I'd like to test, with Tim Allison, before a
release: there's a line in PDFTextStripper
if ((wordSpacing == 0) || (wordSpacing == Float.NaN))
however wordSpacing == Float.NaN is always false. So I'd like to find
out if there is any difference in using what the developer probably
intended, which is
if ((wordSpacing == 0) || (Float.isNaN(wordSpacing)))
(BCC to Tim)
Tilman
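The NaN comparison discussed above can be checked in isolation. The following is a minimal sketch, not the PDFBox source: the class and method names are hypothetical, and only the two `if` conditions are taken from the message. In Java, NaN compares unequal to every value including itself, so `==` can never detect it, while `Float.isNaN` can.

```java
// Hypothetical standalone class; only the two conditions come from the thread.
final class NaNSpacingCheck {

    // Condition as quoted from the existing code: the NaN comparison never matches,
    // so this is effectively just (wordSpacing == 0).
    static boolean originalCondition(float wordSpacing) {
        return (wordSpacing == 0) || (wordSpacing == Float.NaN);
    }

    // Proposed condition: Float.isNaN is the reliable way to detect NaN.
    static boolean proposedCondition(float wordSpacing) {
        return (wordSpacing == 0) || (Float.isNaN(wordSpacing));
    }

    public static void main(String[] args) {
        float[] samples = {0f, 1.5f, Float.NaN};
        for (float sample : samples) {
            System.out.println(sample
                    + " original=" + originalCondition(sample)
                    + " proposed=" + proposedCondition(sample));
        }
    }
}
```

The two conditions agree for every input except NaN, which is why a before/after comparison of extraction results is enough to judge the change.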
Re: PDFBox 1.8.10 release
Posted by Tilman Hausherr <TH...@t-online.de>.
On 09.07.2015 at 18:40, Tilman Hausherr wrote:
>>
>>
>> This table will likely be wrecked, but let me know if you’d like me
>> to post it somewhere:
> Thanks, I think I get it. I can identify the files from what you posted.
I checked these files and can't find any difference in ExtractText, so
I'll make the change. Thanks for your help!
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org
Re: PDFBox 1.8.10 release
Posted by Tilman Hausherr <TH...@t-online.de>.
On 09.07.2015 at 18:25, Allison, Timothy B. wrote:
> 9 files out of ~240k pdfs in govdocs1 had very, very minor differences. None of the differences were actual words.
>
>
>
> This table will likely be wrecked, but let me know if you’d like me to post it somewhere:
Thanks, I think I get it. I can identify the files from what you posted.
Tilman
RE: PDFBox 1.8.10 release
Posted by "Allison, Timothy B." <ta...@mitre.org>.
9 files out of ~240k pdfs in govdocs1 had very, very minor differences. None of the differences were actual words.
This table will likely be wrecked, but let me know if you’d like me to post it somewhere:
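Columns: FILE_PATH, TOKEN_COUNT_A, TOKEN_COUNT_B, UNIQUE_TOKEN_COUNT_A, UNIQUE_TOKEN_COUNT_B, TOP_N_WORDS_A, TOP_B_WORDS_B, TOP_10_UNIQUE_TOKEN_DIFFS_A, TOP_10_UNIQUE_TOKEN_DIFFS_B, TOP_10_MORE_IN_A, TOP_10_MORE_IN_B, DICE_COEFFICIENT, OVERLAP

FILE_PATH: 095/095028.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 99708 / 99880
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 8216 / 8244
TOP_N_WORDS_A: the: 6621 | and: 4111 | of: 3361 | in: 2470 | to: 1792 | a: 1414 | are: 981 | is: 863 | for: 849 | area: 669
TOP_B_WORDS_B: the: 6621 | and: 4111 | of: 3361 | in: 2470 | to: 1792 | a: 1414 | are: 981 | is: 863 | for: 849 | area: 669
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  bc: 6 | cb: 5 | bm: 3 | ied: 2 | ec: 2 | gi: 1 | fg: 1 | fd: 1 | edbb: 1 | bd: 1
  c: 18 | d: 18 | b: 17 | f: 13 | de: 11 | h: 8 | bc: 6 | e: 6 | cb: 5 | m: 5
DICE_COEFFICIENT: 0.998299
OVERLAP: 0.999138

FILE_PATH: 167/167852.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 38313 / 39154
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 6035 / 6101
TOP_N_WORDS_A: wkh: 2000 | ri: 1201 | dqg: 1091 | wr: 907 | d: 776 | lq: 582 | lv: 531 | iru: 494 | h: 411 | 6: 378
TOP_B_WORDS_B: wkh: 2035 | ri: 1221 | dqg: 1115 | wr: 922 | d: 792 | lq: 589 | lv: 539 | iru: 509 | h: 417 | 6: 385
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  dpswrq: 2 | 2uelwlqj: 2 | prghudwh: 1 | odfwlf: 1 | lqiudvwuxfwxuhv: 1 | lplw: 1 | hqdeohv: 1 | gurvskhuh: 1 | 526: 1 | 3krwrphwu: 1
  wkh: 35 | dqg: 24 | ri: 20 | d: 16 | iru: 15 | wr: 15 | eh: 12 | plvvlrqv: 12 | 0lfur0dsv: 11 | odxqfk: 11
DICE_COEFFICIENT: 0.994562
OVERLAP: 0.989144

FILE_PATH: 552/552762.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 157799 / 157798
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 8156 / 8156
TOP_N_WORDS_A: the: 10333 | and: 4951 | to: 4614 | of: 4531 | comment: 3204 | in: 2935 | a: 2392 | that: 1990 | for: 1769 | no: 1759
TOP_B_WORDS_B: the: 10333 | and: 4951 | to: 4614 | of: 4531 | comment: 3204 | in: 2935 | a: 2392 | that: 1990 | for: 1769 | no: 1759
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  s: 1
DICE_COEFFICIENT: 1
OVERLAP: 0.999997

FILE_PATH: 575/575190.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 1127 / 1128
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 260 / 261
TOP_N_WORDS_A: y: 63 | r: 57 | o: 57 | a: 39 | p: 38 | e: 38 | acs: 24 | l: 19 | i: 19 | n: 19
TOP_B_WORDS_B: y: 63 | r: 57 | o: 57 | a: 39 | p: 38 | e: 38 | acs: 24 | l: 19 | i: 19 | n: 19
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  æ: 1
  æ: 1
DICE_COEFFICIENT: 0.998081
OVERLAP: 0.999557

FILE_PATH: 660/660406.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 2434 / 2437
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 1084 / 1085
TOP_N_WORDS_A: the: 117 | a: 86 | to: 65 | of: 59 | and: 54 | in: 53 | for: 38 | with: 28 | says: 18 | year: 18
TOP_B_WORDS_B: the: 117 | a: 86 | to: 65 | of: 59 | and: 54 | in: 53 | for: 38 | with: 28 | says: 18 | year: 18
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  zat: 1
  at: 1
  z: 3 | zat: 1
DICE_COEFFICIENT: 0.999539
OVERLAP: 0.998973

FILE_PATH: 660/660684.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 21803 / 21776
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 2268 / 2268
TOP_N_WORDS_A: the: 1056 | of: 764 | benefits: 651 | and: 531 | to: 492 | for: 452 | a: 357 | in: 350 | disabled: 246 | would: 216
TOP_B_WORDS_B: the: 1056 | of: 764 | benefits: 651 | and: 531 | to: 492 | for: 452 | a: 357 | in: 350 | disabled: 246 | would: 216
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  9:27
DICE_COEFFICIENT: 1
OVERLAP: 0.99938

FILE_PATH: 729/729805.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 11261 / 11266
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 1866 / 1866
TOP_N_WORDS_A: the: 500 | and: 456 | to: 327 | ipv6: 320 | of: 318 | in: 177 | for: 177 | a: 170 | internet: 127 | address: 120
TOP_B_WORDS_B: the: 500 | and: 456 | to: 327 | ipv6: 320 | of: 318 | in: 177 | for: 177 | a: 170 | internet: 127 | address: 120
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  z: 5
DICE_COEFFICIENT: 1
OVERLAP: 0.999778

FILE_PATH: 792/792201.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 1268 / 1265
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 255 / 254
TOP_N_WORDS_A: 05: 123 | 06: 78 | 04: 60 | 10: 41 | 8: 39 | 5: 36 | 7: 27 | 12: 27 | 1: 26 | 6: 24
TOP_B_WORDS_B: 05: 123 | 06: 78 | 04: 60 | 10: 41 | 8: 39 | 5: 36 | 7: 27 | 12: 27 | 1: 26 | 6: 24
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  r: 3
  r: 3
DICE_COEFFICIENT: 0.998035
OVERLAP: 0.998816

FILE_PATH: 999/999419.pdf
TOKEN_COUNT_A / TOKEN_COUNT_B: 18917 / 18917
UNIQUE_TOKEN_COUNT_A / UNIQUE_TOKEN_COUNT_B: 1291 / 1290
TOP_N_WORDS_A: 0: 5920 | 1: 1161 | 2: 957 | 5: 657 | e: 650 | 4: 547 | 9: 436 | 3: 425 | 6: 411 | 8: 408
TOP_B_WORDS_B: 0: 5920 | 1: 1161 | 2: 957 | 5: 657 | e: 650 | 4: 547 | 9: 436 | 3: 425 | 6: 411 | 8: 408
Non-empty token diff cells (TOP_10_UNIQUE_TOKEN_DIFFS_A/B, TOP_10_MORE_IN_A/B), in column order:
  í9,150: 1 | í8,600: 1 | í13,200: 1
  9,150: 1 | 8,600: 1
  í13,200: 1 | í8,600: 1 | í9,150: 1
  13,200: 1 | 8,600: 1 | 9,150: 1
DICE_COEFFICIENT: 0.998063
OVERLAP: 0.999841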
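The message does not say how DICE_COEFFICIENT and OVERLAP are computed. The sketch below assumes two definitions: Dice over the sets of unique tokens, and overlap over token counts. Under those assumptions the numbers reported for 095/095028.pdf come out as 2 * 8216 / (8216 + 8244) = 0.998299 and 2 * 99708 / (99708 + 99880) = 0.999138, provided every token of run A also occurs at least as often in run B. The class and method names are hypothetical and are not taken from the tool that produced the table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the formulas are assumptions, not the original tool's code.
final class TokenSimilarity {

    // Count how often each token occurs.
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed Dice: 2 * |unique(A) intersect unique(B)| / (|unique(A)| + |unique(B)|).
    static double dice(Map<String, Integer> a, Map<String, Integer> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> shared = new HashSet<>(a.keySet());
        shared.retainAll(b.keySet());
        return 2.0 * shared.size() / (a.size() + b.size());
    }

    // Assumed overlap: 2 * sum of min(countA, countB) per token / (totalA + totalB).
    static double overlap(Map<String, Integer> a, Map<String, Integer> b) {
        long totalA = 0;
        long totalB = 0;
        long shared = 0;
        for (int count : a.values()) {
            totalA += count;
        }
        for (int count : b.values()) {
            totalB += count;
        }
        if (totalA + totalB == 0) {
            return 1.0;
        }
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            shared += Math.min(entry.getValue(), b.getOrDefault(entry.getKey(), 0));
        }
        return 2.0 * shared / (totalA + totalB);
    }

    public static void main(String[] args) {
        Map<String, Integer> runA = countTokens(Arrays.asList("a", "b", "b"));
        Map<String, Integer> runB = countTokens(Arrays.asList("a", "b", "b", "c"));
        System.out.println("dice=" + dice(runA, runB) + " overlap=" + overlap(runA, runB));
    }
}
```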
RE: PDFBox 1.8.10 release
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Done and launched.
Re: PDFBox 1.8.10 release
Posted by Tilman Hausherr <TH...@t-online.de>.
On 08.07.2015 at 04:20, Allison, Timothy B. wrote:
> Had to dig into code to make sure that our extension of PDFTextStripper winds up calling the code that you are interested in. I think it does, so, yes, all we'd have to do is two builds, one with and one without the change.
>
> Should I make the change locally or do you plan to commit?
Locally would be best, as it is really just 1 line, and I haven't
created an issue yet.
Tilman
RE: PDFBox 1.8.10 release
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Had to dig into code to make sure that our extension of PDFTextStripper winds up calling the code that you are interested in. I think it does, so, yes, all we'd have to do is two builds, one with and one without the change.
Should I make the change locally or do you plan to commit?
Thank you!
Re: PDFBox 1.8.10 release
Posted by Tilman Hausherr <TH...@t-online.de>.
On 07.07.2015 at 19:16, Allison, Timothy B. wrote:
> Will create separate wrapper that relies solely on PDFTextStripper instead of what we currently do now. Results in a few days...
This sounds like work. Isn't all that is needed to run a version before
the change, one after the change, and display the differences as a table
like you already do?
Tilman
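The before/after comparison suggested here can be sketched as a diff of token counts between two extraction runs. This is only an illustration under assumptions: the class and method names are hypothetical, the inputs are token-count maps from a "before" and an "after" build, and the result mirrors what the TOP_10_MORE_IN_A and TOP_10_MORE_IN_B columns appear to report.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for comparing two extraction runs of the same file.
final class ExtractionDiff {

    // Tokens that occur more often in 'first' than in 'second',
    // largest surplus first, ties broken alphabetically, at most 'limit' entries.
    static List<Map.Entry<String, Integer>> moreIn(Map<String, Integer> first,
                                                   Map<String, Integer> second,
                                                   int limit) {
        List<Map.Entry<String, Integer>> surplus = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : first.entrySet()) {
            int diff = entry.getValue() - second.getOrDefault(entry.getKey(), 0);
            if (diff > 0) {
                surplus.add(new AbstractMap.SimpleEntry<>(entry.getKey(), diff));
            }
        }
        surplus.sort((x, y) -> {
            int byDiff = Integer.compare(y.getValue(), x.getValue());
            return byDiff != 0 ? byDiff : x.getKey().compareTo(y.getKey());
        });
        return new ArrayList<>(surplus.subList(0, Math.min(limit, surplus.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> before = new HashMap<>();
        before.put("the", 3);
        before.put("z", 5);
        Map<String, Integer> after = new HashMap<>();
        after.put("the", 3);
        after.put("z", 2);
        after.put("at", 1);
        System.out.println("more in before: " + moreIn(before, after, 10));
        System.out.println("more in after:  " + moreIn(after, before, 10));
    }
}
```

Running both builds over the same corpus and keeping only the files where either list is non-empty gives a table of the kind already produced in this thread.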
RE: PDFBox 1.8.10 release
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Will create a separate wrapper that relies solely on PDFTextStripper instead of what we do now. Results in a few days...
Thank you, Tilman, for pinging me. :)
Re: PDFBox 1.8.10 release
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
> On 1 July 2015 at 21:22, Tilman Hausherr <TH...@t-online.de> wrote:
>
> On 30.06.2015 at 12:20, Andreas Lehmkühler wrote:
> > Hi,
> >
> > there are again a number of solved issues and I'm thinking about a new
> > bugfix release. How about a new one next week, maybe later if someone
> > wants to get some additional things done before?
>
> I have only one thing I'd like to test, with Tim Allison, before a
> release: there's a line in PDFTextStripper
I'm not in a hurry ...
>
> if ((wordSpacing == 0) || (wordSpacing == Float.NaN))
>
> however wordSpacing == Float.NaN is always false. So I'd like to find
> out if there is any difference in using what the developer probably
> intended, which is
>
> if ((wordSpacing == 0) || (Float.isNaN(wordSpacing)))
>
> (BCC to Tim)
>
> Tilman
BR
Andreas