Posted to dev@pdfbox.apache.org by LynX <_L...@bk.ru> on 2011/05/06 21:01:55 UTC

HTML formatting

Hello,

There are several similar issues concerning formatting after PDF to
HTML conversion:

https://issues.apache.org/jira/browse/PDFBOX-6
https://issues.apache.org/jira/browse/PDFBOX-271

I would like to work on them (I see that rrufai has already done some
work, but the PDFBox code has changed since then, so I may need to make
some additional changes). However, the severity of all these issues is
minor, and there have been no comments on them for a long time.
That's why I am not sure whether it makes sense to work on them. If not,
maybe they can be closed?

Thank you,
LX

Re: HTML formatting

Posted by Raimi Rufai <rr...@gmail.com>.
Hi LX,

PDF to HTML conversion is a fascinating set of problems. One of the
hard nuts to crack is preserving tables and columns in multi-column
documents. I haven't looked at the code for a long time.

It'll be nice to jump back in.

Regards,

Raimi

-- 
«To develop software is to build a machine simply by describing it.»
(Michael A. Jackson -- not the singer)