Posted to dev@pdfbox.apache.org by "Amit Maheshwari (Jira)" <ji...@apache.org> on 2019/09/04 08:51:00 UTC
[jira] [Created] (PDFBOX-4642) I'd like to know about the dependencies of PDF Box (2.0.12.0)

Amit Maheshwari created PDFBOX-4642:
---------------------------------------

             Summary: I'd like to know about the dependencies of PDF Box (2.0.12.0) 
                 Key: PDFBOX-4642
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4642
             Project: PDFBox
          Issue Type: Wish
          Components: Text extraction
    Affects Versions: 2.0.12
            Reporter: Amit Maheshwari
         Attachments: PDFBox.NET-1.8.9.zip

We have built a .NET version of PDFBox 2.0.12.0 using IKVM, and we are using it to extract text and form fields.

Currently we include the following dependencies:

BCProv.JDK15on
Commons.Logging
Commons.Logging.Javadoc
DiffUtils
Fontbox
Hamcrest.Core
IKVM.OpenJDK.Core
IKVM.OpenJDK.Security
IKVM.OpenJDK.SwingAWT
IKVM.OpenJDK.Text
IKVM.OpenJDK.Util
IKVM.Reflection
IKVM.Runtime
jcl-over-slf4j-1.7.6

 

Recently we ran into an issue while extracting text from a PDF (see the stack trace below):

System.IO.FileNotFoundException: Could not load file or assembly 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58' or one of its dependencies. The system cannot find the file specified.
File name: 'IKVM.OpenJDK.Media, Version=7.2.4630.5, Culture=neutral, PublicKeyToken=13235d27fcbfff58'
   at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(InputStream , OutputStream , Int32 )
   at org.apache.pdfbox.filter.LZWFilter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index)
   at org.apache.pdfbox.filter.Filter.decode(InputStream encoded, OutputStream decoded, COSDictionary parameters, Int32 index, DecodeOptions options)
   at org.apache.pdfbox.cos.COSInputStream.create(List , COSDictionary , InputStream , ScratchFile , DecodeOptions )
   at org.apache.pdfbox.cos.COSStream.createInputStream(DecodeOptions options)
   at org.apache.pdfbox.cos.COSStream.createInputStream()
   at org.apache.pdfbox.pdmodel.PDPage.getContents()
   at org.apache.pdfbox.pdfparser.PDFStreamParser..ctor(PDContentStream contentStream)
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDContentStream )
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDContentStream )
   at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDPage page)
   at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(PDPage )
   at org.apache.pdfbox.text.PDFTextStripper.processPage(PDPage page)
   at org.apache.pdfbox.text.PDFTextStripper.processPages(PDPageTree pages)
   at org.apache.pdfbox.text.PDFTextStripper.writeText(PDDocument doc, Writer outputStream)
   at org.apache.pdfbox.text.PDFTextStripper.getText(PDDocument doc)

 

We managed to get text extraction working after adding these two DLLs to the folder where the PDFBox DLL resides:

IKVM.OpenJDK.Media.dll 
IKVM.AWT.WinForms.dll
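
A check like this could be scripted so that a missing assembly is noticed before runtime. The following is only a minimal sketch: the function name find_missing_dlls and the EXPECTED_DLLS list are illustrative and not part of PDFBox or IKVM. It also assumes that each IKVM assembly named in this report ships as "<name>.dll" in the same folder as the PDFBox DLL. It only tests whether the files are present, not whether a given DLL is actually needed.

```python
from pathlib import Path

# Assumption: every IKVM assembly mentioned in this report is deployed
# as "<assembly name>.dll" next to the PDFBox DLL.
EXPECTED_DLLS = [
    "IKVM.OpenJDK.Core.dll",
    "IKVM.OpenJDK.Security.dll",
    "IKVM.OpenJDK.SwingAWT.dll",
    "IKVM.OpenJDK.Text.dll",
    "IKVM.OpenJDK.Util.dll",
    "IKVM.Reflection.dll",
    "IKVM.Runtime.dll",
    "IKVM.OpenJDK.Media.dll",
    "IKVM.AWT.WinForms.dll",
]


def find_missing_dlls(folder, expected=EXPECTED_DLLS):
    """Return the names from `expected` that are not files in `folder`."""
    # Compare case-insensitively, since Windows file names ignore case.
    present = {p.name.lower() for p in Path(folder).iterdir() if p.is_file()}
    return [name for name in expected if name.lower() not in present]
```

Calling find_missing_dlls with the path of the deployment folder returns a list of the expected file names that are absent; an empty list means every listed file was found.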

 

Later we searched for information about the dependencies and found this site: [http://www.squarepdf.net/pdfbox-in-net]

We are also attaching a zip of it.

 

We found many other DLLs that we are not currently using.

So I was wondering whether we need all of these DLLs or only some specific ones.

Also, if possible, could we get a brief description of how the different DLLs are used (and what kinds of problems can occur if they are missing)?

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
