Posted to users@pdfbox.apache.org by Andreas Lehmkuehler <le...@apache.org> on 2010/12/20 18:28:40 UTC

[ANNOUNCE] Apache PDFBox 1.4.0 released

The Apache PDFBox community is pleased to announce the release of
Apache PDFBox version 1.4.0. The release is available for download at:

    http://pdfbox.apache.org/download.html

See the full release notes below for details about this release.


Release Notes -- Apache PDFBox -- Version 1.4.0

Introduction
------------

PDFBox is an open source Java library for working with PDF documents.

This is an incremental feature release based on the earlier 1.x releases.
This release contains many improvements and fixes especially related to
text extraction, AES decryption and malformed PDFs.
For more details on these changes and all the other fixes and improvements
included in this release, please refer to the following issues on the
PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.

New Features

   [PDFBOX-865] - Optional Content Groups (OCGs aka layers): initial support
   [PDFBOX-913] - Add program which decompresses object streams

Improvements

   [PDFBOX-521] - Improved PDF Text Extraction that notes paragraph boundaries
   [PDFBOX-885] - Add constructors from super class to PDFTextStripperByArea
                  to support encoding
   [PDFBOX-893] - Performance improvement in PDFStreamEngine and Matrix
                  (patch included)
   [PDFBOX-909] - Add support for a 6 element matrix
   [PDFBOX-914] - Using TextToPDF to create a PDF from the empty string
                  produces unreadable PDF file (patch included)

Bug Fixes

   [PDFBOX-28] - Splitting a PDF creates unnecessarily large chunks
   [PDFBOX-671] - Cannot use PDFToImage to convert Chinese PDF pages into images.
   [PDFBOX-751] - Text Extraction truncates last character when image page
                  has sideways text
   [PDFBOX-759] - Special characters not extracted
   [PDFBOX-779] - All English characters and some Chinese words are separated
                  by a space
   [PDFBOX-806] - Failure to extract dc:description when the value is the node text
   [PDFBOX-854] - PDPageContentStream.drawString() doesn't work with all PDFs
   [PDFBOX-872] - ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading
                  corrupt stream
   [PDFBOX-881] - Incorrect output when word spacing is achieved by matrix
                  translation
   [PDFBOX-883] - Special characters are not correctly handled anymore when
                  printing or exporting to image
   [PDFBOX-887] - CCITTFaxDecodeFilter doesn't use the abbreviated names for
                  image parameters
   [PDFBOX-888] - Decrypt doesn't allow more than 3 args
   [PDFBOX-889] - Empty page causes NPE in importPage
   [PDFBOX-896] - PDFViewer doesn't render landscape mode correctly
   [PDFBOX-897] - NullPointerException PDFFont#getEncodingFromFont with a PDF
                  book because Type1Encoding is null
   [PDFBOX-898] - COSStreamArray NullPointerException. firstStream is null if
                  COSArray contains no items
   [PDFBOX-900] - ArrayIndexOutOfBoundsException with extracting labels from
                  malformed document
   [PDFBOX-902] - ClassCastException caused by unhandled Markup Annotations.
   [PDFBOX-907] - Encrypted Key not correctly calculated when the meta data is
                  not encrypted
   [PDFBOX-910] - certain sequences (such as endstrea[^m]) are eaten by
                  BaseParser#readUntilEndStream
   [PDFBOX-918] - Can't parse PDF
   [PDFBOX-921] - NumberFormatException when parsing a type1 font

Release Contents
----------------

This release consists of a single source archive packaged as a zip file.
The archive can be unpacked with the jar tool from your JDK installation.
See the README.txt file for instructions on how to build this release.

The source archive is accompanied by SHA1 and MD5 checksums and a PGP
signature that you can use to verify the authenticity of your download.
The public key used for the PGP signature can be found at
https://svn.apache.org/repos/asf/pdfbox/KEYS.
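
As a rough illustration of the checksum step, the sketch below computes a
SHA-1 or MD5 digest with the JDK's java.security.MessageDigest and compares
it with a published value. The class name ChecksumCheck and the file names in
the usage line are assumptions made for this example and are not part of the
release. A matching checksum only suggests the download is intact; the PGP
signature check is still what establishes authenticity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not shipped with PDFBox.
public class ChecksumCheck {

    /** Hex-encodes the digest of everything readable from the stream. */
    public static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm); // "SHA-1" or "MD5"
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares digests, ignoring case and whitespace around the published value. */
    public static boolean matches(String computedHex, String publishedHex) {
        return computedHex.equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: java ChecksumCheck <SHA-1|MD5> <archive> <checksum-file>");
            System.exit(2);
        }
        String computed;
        try (InputStream in = Files.newInputStream(Paths.get(args[1]))) {
            computed = hexDigest(in, args[0]);
        }
        // Assumes the checksum file starts with the hex digest; anything after
        // the first whitespace (for example a file name) is ignored.
        String published = new String(Files.readAllBytes(Paths.get(args[2])),
                StandardCharsets.US_ASCII).trim().split("\\s+")[0];
        System.out.println(matches(computed, published) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

Assuming the archive and its checksum file were saved under the hypothetical
names below, it could be run as:

    java ChecksumCheck SHA-1 pdfbox-1.4.0-src.zip pdfbox-1.4.0-src.zip.sha1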

About Apache PDFBox
-------------------

Apache PDFBox is an open source Java library for working with PDF documents.
This project allows creation of new PDF documents, manipulation of existing
documents and the ability to extract content from documents. Apache PDFBox
also includes several command line utilities. Apache PDFBox is published
under the Apache License, Version 2.0.

For more information, visit http://pdfbox.apache.org/

About The Apache Software Foundation
------------------------------------

Established in 1999, The Apache Software Foundation provides organizational,
legal, and financial support for more than 100 freely-available,
collaboratively-developed Open Source projects. The pragmatic Apache License
enables individual and commercial users to easily deploy Apache software;
the Foundation's intellectual property framework limits the legal exposure
of its 2,500+ contributors.

For more information, visit http://www.apache.org/

RE: [ANNOUNCE] Apache PDFBox 1.4.0 released

Posted by Jukka Zitting <jz...@adobe.com>.
Hi,

From: Orion Poplawski [mailto:orion@cora.nwra.com]
> Well, I'm trying to build a package for Fedora.  Full build logs
> with -e are at http://www.cora.nwra.com/~orion/fedora/rpmbuild.log

You're using Fedora's mvn-jpp instead of the official Maven version, so I suspect you're hitting some compatibility issue between them. If the build works with normal mvn, then you should probably contact the Fedora people for ideas on how to fix this.

BR,

Jukka Zitting


Re: [ANNOUNCE] Apache PDFBox 1.4.0 released

Posted by Orion Poplawski <or...@cora.nwra.com>.
On 12/21/2010 11:39 PM, Andreas Lehmkuehler wrote:
> Hi,
>
> Am 22.12.2010 00:17, schrieb Orion Poplawski:
>> On 12/20/2010 10:28 AM, Andreas Lehmkuehler wrote:
>>> The Apache PDFBox community is pleased to announce the release of
>>> Apache PDFBox version 1.4.0. The release is available for download at:
>>>
>>> http://pdfbox.apache.org/download.html
>>
>> I get the following error trying to build from source:
>>
>> [INFO] [bundle:bundle {execution: default-bundle}]
>> [ERROR] Error building bundle org.apache.pdfbox:pdfbox:bundle:1.4.0 : Input file
>> does not exist: target/maven-shared-archive-resources/META-INF
>> [ERROR] Error(s) found in bundle configuration
>>
>>
>> I'm not sure where this file is supposed to come from.
>
> Everything works fine here. What exactly did you try to do? Have you tried
> adding the "-e" switch to get more information about the build process?
>
> BR
> Andreas Lehmkühler

Well, I'm trying to build a package for Fedora.  Full build logs with -e are 
at http://www.cora.nwra.com/~orion/fedora/rpmbuild.log

There is not much of use in them, even with -e.

The file is referenced here:

       <plugin>
         <groupId>org.apache.felix</groupId>
         <artifactId>maven-bundle-plugin</artifactId>
         <version>2.0.1</version>
         <extensions>true</extensions>
         <configuration>
           <instructions>
             <Bundle-DocURL>http://pdfbox.apache.org/</Bundle-DocURL>
             <Include-Resource>
               {maven-resources},
               META-INF=target/maven-shared-archive-resources/META-INF,
               org/apache/pdfbox/resources=target/classes/org/apache/pdfbox/resources
             </Include-Resource>
           </instructions>
         </configuration>
       </plugin>

But I can't imagine where that META-INF file is supposed to come from.
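
One way to narrow this down is to list which Include-Resource sources are
actually missing on disk before the bundle goal runs. The sketch below is
only an illustration under stated assumptions, not part of the PDFBox build:
the class IncludeResourceCheck and its methods are hypothetical, and the
parsing handles only the simple comma-separated dest=source form seen in the
snippet above, skipping {...} clauses such as {maven-resources}. In a stock
Maven build, target/maven-shared-archive-resources is normally created by the
maven-remote-resources-plugin, so a missing entry may mean that plugin did
not run; that is a guess about this build, not something the log confirms.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic; not part of PDFBox or the maven-bundle-plugin.
public class IncludeResourceCheck {

    /** Extracts the source side of each "dest=source" clause, skipping {...} placeholders. */
    public static List<String> sourcePaths(String includeResource) {
        List<String> sources = new ArrayList<>();
        for (String clause : includeResource.split(",")) {
            String c = clause.trim();
            if (c.isEmpty() || c.startsWith("{")) {
                continue;
            }
            int eq = c.indexOf('=');
            sources.add(eq >= 0 ? c.substring(eq + 1).trim() : c);
        }
        return sources;
    }

    /** Returns the source paths that do not exist relative to the project directory. */
    public static List<String> missing(File projectDir, String includeResource) {
        List<String> result = new ArrayList<>();
        for (String src : sourcePaths(includeResource)) {
            if (!new File(projectDir, src).exists()) {
                result.add(src);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Instruction text copied from the plugin configuration quoted above.
        String instruction = "{maven-resources},"
                + " META-INF=target/maven-shared-archive-resources/META-INF,"
                + " org/apache/pdfbox/resources=target/classes/org/apache/pdfbox/resources";
        File projectDir = new File(args.length > 0 ? args[0] : ".");
        for (String src : missing(projectDir, instruction)) {
            System.out.println("missing: " + src);
        }
    }
}
```

Run from the module directory after the earlier build phases have completed,
it prints one "missing:" line per source path that is absent.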

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

Re: [ANNOUNCE] Apache PDFBox 1.4.0 released

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

Am 22.12.2010 00:17, schrieb Orion Poplawski:
> On 12/20/2010 10:28 AM, Andreas Lehmkuehler wrote:
>> The Apache PDFBox community is pleased to announce the release of
>> Apache PDFBox version 1.4.0. The release is available for download at:
>>
>> http://pdfbox.apache.org/download.html
>
> I get the following error trying to build from source:
>
> [INFO] [bundle:bundle {execution: default-bundle}]
> [ERROR] Error building bundle org.apache.pdfbox:pdfbox:bundle:1.4.0 : Input file
> does not exist: target/maven-shared-archive-resources/META-INF
> [ERROR] Error(s) found in bundle configuration
>
>
> I'm not sure where this file is supposed to come from.

Everything works fine here. What exactly did you try to do? Have you tried
adding the "-e" switch to get more information about the build process?

BR
Andreas Lehmkühler

Re: [ANNOUNCE] Apache PDFBox 1.4.0 released

Posted by Orion Poplawski <or...@cora.nwra.com>.
On 12/20/2010 10:28 AM, Andreas Lehmkuehler wrote:
> The Apache PDFBox community is pleased to announce the release of
> Apache PDFBox version 1.4.0. The release is available for download at:
>
> http://pdfbox.apache.org/download.html

I get the following error trying to build from source:

[INFO] [bundle:bundle {execution: default-bundle}]
[ERROR] Error building bundle org.apache.pdfbox:pdfbox:bundle:1.4.0 : Input 
file does not exist: target/maven-shared-archive-resources/META-INF
[ERROR] Error(s) found in bundle configuration


I'm not sure where this file is supposed to come from.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

Re: [ANNOUNCE] Apache PDFBox 1.4.0 released

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

Am 20.12.2010 18:33, schrieb Kevin Brown:
> Awesome, thanks!
>
> I don't see anything about the DocumentCatalog having the setVersion method.
> Did that make it into this release?
Yes. It is part of PDFBOX-865 [1].

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-865

Re: [ANNOUNCE] Apache PDFBox 1.4.0 released

Posted by Kevin Brown <kb...@gmail.com>.
Awesome, thanks!

I don't see anything about the DocumentCatalog having the setVersion method.
Did that make it into this release?

