Posted to dev@pdfbox.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2012/05/03 21:04:06 UTC

1.7 release?

Any guestimates for a 1.7.0 release?

It's been a long time (9 months) since 1.6.0... and I count ~203
commits since 1.6.0.

Mike McCandless

http://blog.mikemccandless.com

Re: 1.7 release?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Fri, May 4, 2012 at 9:46 AM, Timo Boehme <ti...@ontochem.com> wrote:
> On 03.05.2012 21:04, Michael McCandless wrote:
>
>> Any guestimates for a 1.7.0 release?
>>
>> It's been a long time (9 months) since 1.6.0... and I count ~203
>> commits since 1.6.0.
>
> There was already some discussion about it (see "Re: Next release(s)?"
> dating from 2012-04-10) and it is clear that a new version (probably 1.7.0)
> should be released soon. However I think we will wait until the project lead
> is back online.

Ahh, super, I missed that discussion (but went and read it now). Thanks!

Mike McCandless

http://blog.mikemccandless.com

Re: 1.7 release?

Posted by Timo Boehme <ti...@ontochem.com>.
On 14.05.2012 10:11, Maruan Sahyoun wrote:
> ...
> WRT 1.7 I agree with Timo that the enhancements made so far do
> justify a new release, especially the new NonSequentialParser Timo
> created, which has already proven to solve a number of the issues
> raised. Maybe this could be the default for the time being?

I wouldn't make it the default since it will change which documents can be 
processed and which throw an exception. While it should be a big step 
forward for most documents, there might be some strange/broken 
documents for which the standard parser succeeded using workarounds and 
the new one will fail.
One possibility would be to write a wrapper (as was proposed in 
PDFBOX-1199) which first uses the new parser and falls back to the old 
one in case of an error.
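
A minimal sketch of the wrapper idea above, assuming a hypothetical `DocumentParser` interface (it is not a PDFBox class, and the thread does not specify the API proposed in PDFBOX-1199). It also assumes that broken documents may surface as unchecked exceptions, so both `IOException` and `RuntimeException` trigger the fallback:

```java
import java.io.File;
import java.io.IOException;

public class FallbackParserSketch {

    // Hypothetical abstraction: anything that can load a document of type T from a file.
    public interface DocumentParser<T> {
        T parse(File file) throws IOException;
    }

    // Returns a parser that tries `primary` (e.g. the new non-sequential parser) first and,
    // if it throws, retries the same file with `fallback` (e.g. the old sequential parser).
    public static <T> DocumentParser<T> withFallback(DocumentParser<T> primary,
                                                     DocumentParser<T> fallback) {
        return file -> {
            try {
                return primary.parse(file);
            } catch (IOException | RuntimeException primaryFailure) {
                try {
                    return fallback.parse(file);
                } catch (IOException | RuntimeException fallbackFailure) {
                    // Both failed: report the fallback error, keeping the first one attached.
                    fallbackFailure.addSuppressed(primaryFailure);
                    throw fallbackFailure;
                }
            }
        };
    }
}
```

Attaching the primary failure as a suppressed exception keeps the new parser's error visible, which would help in finding documents that only the old parser's workarounds can handle.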

Another issue is that the new parser needs a file as input for random 
access while the old parser also accepts a stream. This could be tackled 
by creating a temporary file from the stream and using it as input.
I could add this in the next few days.
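
A minimal sketch of the temporary-file approach, assuming Java 7 or later for `java.nio.file`; the class and method names are hypothetical and not part of PDFBox. The resulting file can then be opened with any random-access reader such as `java.io.RandomAccessFile`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToTempFileSketch {

    // Copies a non-seekable InputStream into a temporary file so that a parser
    // requiring random access can read it. The caller must delete the file when done.
    public static Path bufferToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input-", ".pdf");
        try {
            // createTempFile already created the (empty) file, so allow overwriting it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp);      // do not leave a half-written file behind
            throw e;
        }
    }
}
```

The trade-off is extra disk I/O and a cleanup obligation for the caller, in exchange for letting a single random-access parser serve both file and stream inputs.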

Two further issues:
- a method/constructor parameter for specifying the password of
   encrypted documents still needs to be added
- signed documents are not tested; I suspect that the signature
   string will also be decrypted, which is wrong as far as I understand
   the spec; the standard parser's decryption has code to prevent this,
   but it relies on all objects already being loaded, so I need another
   way to detect which strings not to decrypt
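
One way to avoid needing all objects loaded is to decide locally, from the enclosing dictionary alone. The sketch below assumes dictionaries are modelled as plain `Map<String, Object>` with names as Strings without the leading `/`, and that a signature dictionary is recognised by `/Type /Sig`; all names are hypothetical. Because `/Type` may appear after `/Contents`, the check is meant to run once the enclosing dictionary has been fully parsed. If `/Type` is absent, a real implementation would need another hint, for example reaching the dictionary through the `/V` entry of a field whose `/FT` is `/Sig`:

```java
import java.util.Map;

public class SignatureDecryptionSketch {

    // Decides whether a string value stored under `key` in `enclosingDict` should be decrypted.
    // Meant to be called after the enclosing dictionary is fully parsed, since /Type may
    // appear after /Contents. Only the signature dictionary's /Contents value is exempted.
    public static boolean shouldDecryptString(Map<String, Object> enclosingDict, String key) {
        if (enclosingDict == null) {
            return true;                    // not directly inside a dictionary
        }
        boolean isSignatureDict = "Sig".equals(enclosingDict.get("Type"));
        return !(isSignatureDict && "Contents".equals(key));
    }
}
```

Note that `/Contents` in other dictionaries (an annotation's text, for instance) is still decrypted; only the signature value is skipped.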

Thus, in order to release a stable 1.7 in a short time frame, I would 
propose keeping the old parser as the default but recommending the new 
parser where possible. Once all issues are resolved we may release a 1.8 
with the new parser as the default.


Best regards,

Timo

-- 

  Timo Boehme
  OntoChem GmbH
  H.-Damerow-Str. 4
  06120 Halle/Saale
  T: +49 345 4780474
  F: +49 345 4780471
  timo.boehme@ontochem.com

_____________________________________________________________________

  OntoChem GmbH
  Managing Director: Dr. Lutz Weber
  Registered office: Halle / Saale
  Register court: Stendal
  Registration number: HRB 215461
_____________________________________________________________________


Re: 1.7 release?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
The new parser is, unfortunately, still at an early stage and not yet helpful in any way. This week I wanted to complete the SimpleParser, which takes the tokens from the PDF Lexer and creates the COS-level objects. All of this is still in preparation for the ConformingParser.

WRT 1.7 I agree with Timo that the enhancements made so far justify a new release, especially the new NonSequentialParser Timo created, which has already proven to solve a number of the issues raised. Maybe this could be the default for the time being?

regards
 
Maruan

On 14.05.2012 at 09:54, Timo Boehme wrote:

> Hi,
> 
> On 13.05.2012 10:24, Andreas Lehmkuehler wrote:
>> On 07.05.2012 10:50, Timo Boehme wrote:
> ...
>>> In my opinion there are already a number of improvements in current trunk
>>> compared to 1.6 and there is no reason to not release another 1.8 before
>>> PDFBOX-1000 is really ready. As I see it we should bump the version to
>>> 2.0 if PDFBOX-1000 finally lands.
>> I was just thinking of a kind of beta version of the new parser, so that
>> people can test it without building their own version.
> 
> As I see it, we are not there yet. However, Maruan is the only one who knows its current state.
> 
> ...
>>> Nevertheless I'd like to have your opinion on a release and expertise
>>> doing it :-)
>> The release process uses the Maven release plugin and therefore it is
>> quite easy to perform. If you are interested in acting as release
>> manager you have to provide a key which will be used to sign the
>> release. This key should be signed by at least one member of "The Apache
>> Web of Trust", see [1] and [2].
> 
> Thanks for the pointers. Since I'm currently a bit short on time, I really appreciate that you're volunteering as RM.
> 
>> I'll volunteer as RM for the next release. What do you think about
>> cutting the release one week from now, on the 22nd? As I won't be
>> available in the first 2 weeks of June, the next reasonable target date
>> could be June 26th, if we need some more time to include more stuff.
> 
> 22nd is perfect for me.
> 
> 
> Best regards,
> 
> Timo
> 
> 


Re: 1.7 release?

Posted by Timo Boehme <ti...@ontochem.com>.
Hi,

On 13.05.2012 10:24, Andreas Lehmkuehler wrote:
> On 07.05.2012 10:50, Timo Boehme wrote:
...
>> In my opinion there are already a number of improvements in current trunk
>> compared to 1.6 and there is no reason to not release another 1.8 before
>> PDFBOX-1000 is really ready. As I see it we should bump the version to
>> 2.0 if PDFBOX-1000 finally lands.
> I was just thinking of a kind of beta version of the new parser, so that
> people can test it without building their own version.

As I see it, we are not there yet. However, Maruan is the only one 
who knows its current state.

...
>> Nevertheless I'd like to have your opinion on a release and expertise
>> doing it :-)
> The release process uses the Maven release plugin and therefore it is
> quite easy to perform. If you are interested in acting as release
> manager you have to provide a key which will be used to sign the
> release. This key should be signed by at least one member of "The Apache
> Web of Trust", see [1] and [2].

Thanks for the pointers. Since I'm currently a bit short on time, I 
really appreciate that you're volunteering as RM.

> I'll volunteer as RM for the next release. What do you think about
> cutting the release one week from now, on the 22nd? As I won't be
> available in the first 2 weeks of June, the next reasonable target date
> could be June 26th, if we need some more time to include more stuff.

22nd is perfect for me.


Best regards,

Timo



Re: 1.7 release?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 20.05.2012 18:46, Andreas Lehmkuehler wrote:
> On 13.05.2012 10:24, Andreas Lehmkuehler wrote:
>
>> .....
>> I'll volunteer as RM for the next release. What do you think about cutting the
>> release one week from now, on the 22nd? As I won't be available in the first 2
>> weeks of June, the next reasonable target date could be June 26th, if we need
>> some more time to include more stuff.
> As there weren't any objections, I'll cut the release in 2 days, on Tuesday the
> 22nd, unless something (unexpected) comes up in the meantime.
Due to a recently started discussion about a new implementation of the preflight 
module, I'm postponing my plan to cut a new release by another 2 days.

BR
Andreas Lehmkühler



Re: 1.7 release?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, May 22, 2012 at 8:40 AM, Jukka Zitting <ju...@gmail.com> wrote:
> I just realized that there are some related changes in Tika that I
> should port to the parser class we now have in PDFBox. I'll take care
> of that within the next few hours.

I'm done with this, so +1 to proceeding with the release.

It turned out that the PDFParser class in Tika had evolved more than
I'd expected, so the easiest solution was to just revert the
PDFBOX-1132 changes and move any improvements we'd made within PDFBox
back to Tika.

Let's see whether I or someone else will later have more time to
resurrect PDFBOX-1132; until then it's probably best to leave the
PDFParser class in Tika.

BR,

Jukka Zitting

Re: 1.7 release?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi Andreas,

On Sun, May 20, 2012 at 6:46 PM, Andreas Lehmkuehler <an...@lehmi.de> wrote:
> As there weren't any objections, I'll cut the release in 2 days, on Tuesday
> the 22nd, unless something (unexpected) comes up in the meantime.

I just realized that there are some related changes in Tika that I
should port to the parser class we now have in PDFBox. I'll take care
of that within the next few hours.

BR,

Jukka Zitting

Re: 1.7 release?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
On 13.05.2012 10:24, Andreas Lehmkuehler wrote:

> .....
> I'll volunteer as RM for the next release. What do you think about cutting the
> release one week from now, on the 22nd? As I won't be available in the first 2
> weeks of June, the next reasonable target date could be June 26th, if we need
> some more time to include more stuff.
As there weren't any objections, I'll cut the release in 2 days, on Tuesday the
22nd, unless something (unexpected) comes up in the meantime.

BR
Andreas Lehmkühler

Re: 1.7 release?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 07.05.2012 10:50, Timo Boehme wrote:
> Hi,
>
> On 06.05.2012 16:46, Andreas Lehmkuehler wrote:
>> On 04.05.2012 15:46, Timo Boehme wrote:
>>> On 03.05.2012 21:04, Michael McCandless wrote:
>>>> Any guestimates for a 1.7.0 release?
>>>>
>>>> It's been a long time (9 months) since 1.6.0... and I count ~203
>>>> commits since 1.6.0.
>>>
>>> There was already some discussion about it (see "Re: Next
>>> release(s)?" dating from 2012-04-10) and it is clear that a new
>>> version (probably 1.7.0) should be released soon.
>> IMHO there are some things which should be done first: integrate
>> Maruan's latest patch (PDFBOX-1000), improve the TTF-Parser (PDFBOX-490)
>
> In my opinion there are already a number of improvements in current trunk
> compared to 1.6 and there is no reason to not release another 1.8 before
> PDFBOX-1000 is really ready. As I see it we should bump the version to 2.0 if
> PDFBOX-1000 finally lands.
I was just thinking of a kind of beta version of the new parser, so that people 
can test it without building their own version.

> Thus I would vote for only adding stuff already in pipeline and bug fixes in
> order to do a release in the next few weeks.
I fully agree.

>>> However I think we will wait until the project lead is back online.
>> I guess you are addressing me as PMC Chair. I'm afraid there is a
>> misunderstanding I'd like to clarify.
>>
>> There is no concept of leadership within the ASF. An Apache project is
>> led by the PMC [1]. The PMC Chair [2] is just the speaker of the project
>> and acts as the interface to the board of the foundation. All PMC members
>> [3], including the chair, are equal and each of them has one vote.
>
> Point taken.
I just wanted to avoid the mistaken impression that anyone other than the PMC 
governs the project. :-)

> Nevertheless I'd like to have your opinion on a release and expertise doing it :-)
The release process uses the Maven release plugin and therefore it is quite easy 
to perform. If you are interested in acting as release manager you have to 
provide a key which will be used to sign the release. This key should be signed 
by at least one member of "The Apache Web of Trust", see [1] and [2].

I'll volunteer as RM for the next release. What do you think about cutting the 
release one week from now, on the 22nd? As I won't be available in the first 2 
weeks of June, the next reasonable target date could be June 26th, if we need 
some more time to include more stuff.

> Best regards
> Timo

BR
Andreas Lehmkühler

[1] http://www.apache.org/dev/release-signing.html
[2] http://www.apache.org/dev/release-signing.html#apache-wot

Re: 1.7 release?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, May 7, 2012 at 4:50 AM, Timo Boehme <ti...@ontochem.com> wrote:

> In my opinion there are already a number of improvements in current trunk
> compared to 1.6

+1

> and there is no reason to not release another 1.8 before
> PDFBOX-1000 is really ready. As I see it we should bump the version to 2.0
> if PDFBOX-1000 finally lands.
> Thus I would vote for only adding stuff already in pipeline and bug fixes in
> order to do a release in the next few weeks.

+1

In general, releasing should not have to wait for patches to be
committed, and release time isn't the time to suddenly commit a bunch
of last-minute patches. It should rather be the reverse: right after
a release is when you should commit the big changes; this way they
have the most time to "bake" in trunk, uncovering issues.

It's best if what's committed is always kept in a releasable state;
this way on any given morning someone could wake up and cut a release
candidate.

If there are truly blocker bugs then they should be marked that way in Jira...

Mike McCandless

http://blog.mikemccandless.com

Re: 1.7 release?

Posted by Timo Boehme <ti...@ontochem.com>.
Hi,

On 06.05.2012 16:46, Andreas Lehmkuehler wrote:
> On 04.05.2012 15:46, Timo Boehme wrote:
>> On 03.05.2012 21:04, Michael McCandless wrote:
>>> Any guestimates for a 1.7.0 release?
>>>
>>> It's been a long time (9 months) since 1.6.0... and I count ~203
>>> commits since 1.6.0.
>>
>> There was already some discussion about it (see "Re: Next
>> release(s)?" dating from 2012-04-10) and it is clear that a new
>> version (probably 1.7.0) should be released soon.
> IMHO there are some things which should be done first: integrate
> Maruan's latest patch (PDFBOX-1000), improve the TTF-Parser (PDFBOX-490)

In my opinion there are already a number of improvements in the current 
trunk compared to 1.6 and there is no reason to not release another 1.8 
before PDFBOX-1000 is really ready. As I see it we should bump the 
version to 2.0 if PDFBOX-1000 finally lands.
Thus I would vote for only adding stuff already in the pipeline and bug 
fixes in order to do a release in the next few weeks.

>> However I think we will wait until the project lead is back online.
> I guess you are addressing me as PMC Chair. I'm afraid there is a
> misunderstanding I'd like to clarify.
>
> There is no concept of leadership within the ASF. An Apache project is
> led by the PMC [1]. The PMC Chair [2] is just the speaker of the project
> and acts as the interface to the board of the foundation. All PMC members
> [3], including the chair, are equal and each of them has one vote.

Point taken. Nevertheless I'd like to have your opinion on a release and 
expertise doing it :-)


Best regards
Timo



Re: 1.7 release?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Before integrating the current work on PDFBOX-1000 I would prefer to:

- make sure the lexer is using the new IO classes
- move some parts to the (new) SimpleParser, since some keywords, for example, are already handled in the lexer, which is more than the lexer should do IMO
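
The division of labour described above (and in the SimpleParser description earlier in the thread) can be sketched as a lexer that only emits raw tokens and a parser that alone interprets keywords such as `true`, `false` and `null`. This is a toy illustration under stated assumptions, not PDFBox's actual lexer or SimpleParser: it covers only numbers, names, keywords, arrays and dictionaries (no strings, hex strings, indirect references or streams), and uses plain Java values instead of COS objects:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LexerParserSketch {

    // Token kinds produced by the lexer. Keywords stay uninterpreted at this stage.
    public enum Kind { NUMBER, NAME, KEYWORD, ARRAY_START, ARRAY_END, DICT_START, DICT_END }

    public static final class Token {
        public final Kind kind;
        public final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Lexer: only splits the input into tokens; it does not decide what "true" or "null" mean.
    public static List<Token> lex(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (s.startsWith("<<", i)) { out.add(new Token(Kind.DICT_START, "<<")); i += 2; continue; }
            if (s.startsWith(">>", i)) { out.add(new Token(Kind.DICT_END, ">>")); i += 2; continue; }
            if (c == '[') { out.add(new Token(Kind.ARRAY_START, "[")); i++; continue; }
            if (c == ']') { out.add(new Token(Kind.ARRAY_END, "]")); i++; continue; }
            int start = i;
            if (c == '/') {                         // name: "/Type" -> NAME("Type")
                i++;
                while (i < s.length() && isRegular(s.charAt(i))) i++;
                out.add(new Token(Kind.NAME, s.substring(start + 1, i)));
            } else if (c == '-' || c == '+' || c == '.' || Character.isDigit(c)) {
                i++;
                while (i < s.length() && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.')) i++;
                out.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else {                                // any other run of regular characters
                while (i < s.length() && isRegular(s.charAt(i))) i++;
                if (i == start) throw new IllegalArgumentException("unsupported character: " + c);
                out.add(new Token(Kind.KEYWORD, s.substring(start, i)));
            }
        }
        return out;
    }

    // Regular characters: not whitespace and not a PDF delimiter.
    static boolean isRegular(char c) {
        return !Character.isWhitespace(c) && "()<>[]{}/%".indexOf(c) < 0;
    }

    // Parser: turns tokens into values and is the only place that interprets keywords.
    private static final class Parser {
        private final List<Token> tokens;
        private int pos = 0;

        Parser(List<Token> tokens) { this.tokens = tokens; }

        Object parseObject() {
            Token t = tokens.get(pos++);
            switch (t.kind) {
                case NUMBER:
                    return t.text.contains(".") ? (Object) Double.valueOf(t.text) : (Object) Long.valueOf(t.text);
                case NAME:
                    return "/" + t.text;
                case KEYWORD:
                    if (t.text.equals("true")) return Boolean.TRUE;
                    if (t.text.equals("false")) return Boolean.FALSE;
                    if (t.text.equals("null")) return null;
                    throw new IllegalStateException("unsupported keyword: " + t.text);
                case ARRAY_START: {
                    List<Object> list = new ArrayList<>();
                    while (tokens.get(pos).kind != Kind.ARRAY_END) list.add(parseObject());
                    pos++;                          // consume ']'
                    return list;
                }
                case DICT_START: {
                    Map<String, Object> dict = new LinkedHashMap<>();
                    while (tokens.get(pos).kind != Kind.DICT_END) {
                        Token key = tokens.get(pos++);
                        if (key.kind != Kind.NAME) throw new IllegalStateException("dictionary key must be a name");
                        dict.put(key.text, parseObject());
                    }
                    pos++;                          // consume '>>'
                    return dict;
                }
                default:
                    throw new IllegalStateException("unexpected token: " + t.text);
            }
        }
    }

    public static Object parse(String input) {
        return new Parser(lex(input)).parseObject();
    }
}
```

With this split, adding support for a new keyword (or changing how one is treated) touches only the parser, and the lexer stays a pure tokenizer that is easier to port to new IO classes.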

regards

Maruan

On 06.05.2012 at 16:46, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
> 
> On 04.05.2012 15:46, Timo Boehme wrote:
>> On 03.05.2012 21:04, Michael McCandless wrote:
>>> Any guestimates for a 1.7.0 release?
>>> 
>>> It's been a long time (9 months) since 1.6.0... and I count ~203
>>> commits since 1.6.0.
>> 
>> There was already some discussion about it (see "Re: Next release(s)?" dating
>> from 2012-04-10) and it is clear that a new version (probably 1.7.0) should be
>> released soon.
> IMHO there are some things which should be done first: integrate Maruan's latest patch (PDFBOX-1000), improve the TTF-Parser (PDFBOX-490) ....
> 
>> However I think we will wait until the project lead is back online.
> I guess you are addressing me as PMC Chair. I'm afraid there is a
> misunderstanding I'd like to clarify.
> 
> There is no concept of leadership within the ASF. An Apache project is led by the PMC [1]. The PMC Chair [2] is just the speaker of the project and acts as the interface to the board of the foundation. All PMC members [3], including the chair, are equal and each of them has one vote.
> 
>> Kind regards,
>> Timo
> 
> BR
> Andreas Lehmkühler
> 
> [1] http://www.apache.org/foundation/how-it-works.html#pmc
> [2] http://www.apache.org/foundation/how-it-works.html#pmc-chair
> [3] http://www.apache.org/foundation/how-it-works.html#pmc-members

Re: 1.7 release?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 04.05.2012 15:46, Timo Boehme wrote:
> On 03.05.2012 21:04, Michael McCandless wrote:
>> Any guestimates for a 1.7.0 release?
>>
>> It's been a long time (9 months) since 1.6.0... and I count ~203
>> commits since 1.6.0.
>
> There was already some discussion about it (see "Re: Next release(s)?" dating
> from 2012-04-10) and it is clear that a new version (probably 1.7.0) should be
> released soon.
IMHO there are some things which should be done first: integrate Maruan's latest 
patch (PDFBOX-1000), improve the TTF-Parser (PDFBOX-490) ....

> However I think we will wait until the project lead is back online.
I guess you are addressing me as PMC Chair. I'm afraid there is a
misunderstanding I'd like to clarify.

There is no concept of leadership within the ASF. An Apache project is led by 
the PMC [1]. The PMC Chair [2] is just the speaker of the project and acts as 
the interface to the board of the foundation. All PMC members [3], including the 
chair, are equal and each of them has one vote.

> Kind regards,
> Timo

BR
Andreas Lehmkühler

[1] http://www.apache.org/foundation/how-it-works.html#pmc
[2] http://www.apache.org/foundation/how-it-works.html#pmc-chair
[3] http://www.apache.org/foundation/how-it-works.html#pmc-members

Re: 1.7 release?

Posted by Timo Boehme <ti...@ontochem.com>.
On 03.05.2012 21:04, Michael McCandless wrote:
> Any guestimates for a 1.7.0 release?
>
> It's been a long time (9 months) since 1.6.0... and I count ~203
> commits since 1.6.0.

There was already some discussion about it (see "Re: Next release(s)?" 
dating from 2012-04-10) and it is clear that a new version (probably 
1.7.0) should be released soon. However I think we will wait until the 
project lead is back online.


Kind regards,
Timo
