Posted to dev@pdfbox.apache.org by Subham Tripathi <su...@gmail.com> on 2015/07/14 12:58:17 UTC

Understanding PDFBox

Hi All,
I wish to contribute to Apache PDFBox, but first I am trying to understand
the codebase. I am finding it very tough to understand, as I cannot find
any flow to follow.
Is there any documentation from which I can get some high-level insight
into PDFBox?

-- 
Best Regards,
Subham Tripathi

Re: Understanding PDFBox

Posted by John Hewson <jo...@jahewson.com>.
The book “Developing with PDF” provides a short and gentle introduction to the
PDF format.

We have a brief architectural summary of PDFBox at:

http://pdfbox.apache.org/1.8/architecture.html

But in general, to make sense of PDFBox, you’ll need to understand the PDF spec.

— John



Re: Understanding PDFBox

Posted by khyrul Bashar <kh...@gmail.com>.
Hi Subham,
I'm a GSoC student with PDFBox this year, working on improving the
PDFDebugger (issue <https://issues.apache.org/jira/browse/PDFBOX-2530>).
Before applying for the project, I had to become familiar with the code
base. I was a bit puzzled at first, but now I have a basic understanding of
it, even though I'm not working on the main PDFBox module. Here is what I've
done so far to get comfortable with PDFBox.

Read the PDF specification, at least enough to get a head start.
https://www.adobe.com/devnet/pdf/pdf_reference.html
Read the API documentation (Javadocs).
https://pdfbox.apache.org/docs/2.0.0-SNAPSHOT/javadocs/
Play with the example code.
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/

There are other things to do before you can contribute, which I think
the committers can describe more specifically.

Regards
Khyrul Bashar


Re: Understanding PDFBox

Posted by Tilman Hausherr <TH...@t-online.de>.
On 14.07.2015 at 12:58, Subham Tripathi wrote:
> Hi All,
> I wish to contribute to Apache PDFBox, but first I am trying to understand
> the codebase. I am finding it very tough to understand, as I cannot find
> any flow to follow.
> Is there any documentation from which I can get some high-level insight
> into PDFBox?
>


Look at the examples... and start from there. Then look at an unsolved 
issue :-)

If this is about getting coding practice, google for BATIK-1109
<https://issues.apache.org/jira/browse/BATIK-1109> and BATIK-1110
<https://issues.apache.org/jira/browse/BATIK-1110>. One of the bugs can
probably be fixed with a few lines (although some debugging is needed to
see how signed / unsigned values are handled there); the other one involves
using code from PDFBox, but in the way BATIK uses it. Both bugs have been
fixed in PDFBox, but not in BATIK (from which PDFBox took some code).
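
For anyone unfamiliar with the signed / unsigned issue mentioned above, here
is a generic sketch of the usual pitfall in Java. It is not taken from the
BATIK or PDFBox sources, and the class and method names are made up for
illustration. It assumes the typical case of building a 16-bit big-endian
value from two bytes read out of a binary file.

```java
// Illustration only: hypothetical names, not code from BATIK or PDFBox.
class UnsignedBytesSketch {

    // Naive version: Java bytes are signed, so a byte >= 0x80 is
    // sign-extended to a negative int and its high bits leak into the result.
    public static int readUInt16Naive(byte hi, byte lo) {
        return (hi << 8) | lo;
    }

    // Masking with 0xFF first makes each byte behave as an unsigned 0..255.
    public static int readUInt16(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        byte hi = (byte) 0x01;
        byte lo = (byte) 0x80;
        System.out.println(readUInt16Naive(hi, lo)); // prints -128
        System.out.println(readUInt16(hi, lo));      // prints 384 (0x0180)
    }
}
```

Stepping through code like this in a debugger with byte values of 0x80 and
above is usually the quickest way to see whether a parser is affected.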

Tilman