Posted to users@pdfbox.apache.org by Manuel Aristarán <ja...@jazzido.com> on 2020/07/09 02:10:51 UTC

Bounty offer: Upgrade PDFBox in Tabula

Hi!

I'm one of the maintainers of Tabula [0].

Due to some changes in PDFBox, we've been running on 2.0.15 for some time
now, and we would love to keep Tabula updated with the newest version of
our favorite library :)

Last year, Tilman Hausherr graciously submitted a PR [1] that updated
PDFBox to 2.0.19, but unfortunately it broke a few tests, apparently because
the font measurement heuristics had changed. Text measurement is critical
for Tabula, so we chose to stay on the latest compatible version.

We want to offer a $200 USD bounty to fix the issue. We run entirely on
donations, and have funds available for this [2]. The goal is to update
Tabula to use PDFBox 2.0.20, and the requirement is that the test suite
passes in its entirety.

If you're interested, please get in touch with me at manuel@jazzido.com

Thanks!


[0] https://tabula.technology
[1] https://github.com/tabulapdf/tabula-java/pull/325
[2] https://opencollective.com/tabulapdf

--
Manuel Aristarán
http://jazzido.com

Re: Bounty offer: Upgrade PDFBox in Tabula

Posted by Manuel Aristarán <ma...@jazzido.com>.
This was faster than I expected: Tilman contributed changes to PDFBox [0]
and Tabula [1], thus making us compatible with the newest version of PDFBox.

As soon as 2.0.21 is released, we'll release a new version of Tabula.

Thanks!

[0] https://svn.apache.org/viewvc/pdfbox/branches/2.0/pdfbox/src/main/java/org/apache/pdfbox/text/LegacyPDFStreamEngine.java?r1=1879751&r2=1879750&pathrev=1879751
[1] https://github.com/tabulapdf/tabula-java/pull/325#issuecomment-615896790

On Thu, Jul 9, 2020 at 12:24 AM Tilman Hausherr <TH...@t-online.de>
wrote:

> Yeah, I remember that one. I even tried to find the problem and then did
> something else. Or maybe the IDE crashed, so the window was no longer
> open and I forgot.
>
> I did not even go far enough to find out whether the old text extraction
> was the "good" one or the new one.
>
> Coincidentally, there is an issue
> https://issues.apache.org/jira/browse/PDFBOX-4909
> that may make it easier to get back to the old height calculation.
>
> Tilman (works for free here)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>
>

Re: Bounty offer: Upgrade PDFBox in Tabula

Posted by Tilman Hausherr <TH...@t-online.de>.
Yeah, I remember that one. I even tried to find the problem and then did
something else. Or maybe the IDE crashed, so the window was no longer
open and I forgot.

I did not even go far enough to find out whether the old text extraction 
was the "good" one or the new one.

Coincidentally, there is an issue
https://issues.apache.org/jira/browse/PDFBOX-4909
that may make it easier to get back to the old height calculation.

Tilman (works for free here)
