You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2011/03/02 12:55:14 UTC

DO NOT REPLY [Bug 50852] New: [PATCH] Improve generation of PDFs with accessibility information

https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

           Summary: [PATCH] Improve generation of PDFs with accessibility
                    information
           Product: Fop
           Version: 1.1dev
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: martin.koegler@brz.gv.at


The current accessibility implemenation does not scale and is slow. The
following implementation is memory efficient (only structure information of
pages in processing is kept in memory) and faster.

 [PATCH 01/20] container element for structure tree

Adds object for storing the structure tree

 [PATCH 02/20] Add methods for storing structure tree to area tree
 [PATCH 03/20] Helper functions for StructureElement construction
 [PATCH 04/20] Parse role attributes
 [PATCH 05/20] Implement mapping functions of FO objects to structure tree
 [PATCH 06/20] Store new structure tree in area tree
 [PATCH 07/20] Handle structure tree only in new format
 [PATCH 08/20] Remove unused code

These patches switch to the new internal format for the structure tree. Only
the structure tree of unfinished page sequences is kept in memory, so it the
overhead for documents with not too large page sequences is limited. The
content of the structure-tree tag in area-tree / intermediate XML is slightly
changed:

The page-sequences shows up as tag, so that it possible to give it roles [eg.
Part does not  make sense for a one page document].

 [PATCH 09/20] Workaround: Some test cases don't like ptr

I don't recommend to add this patch, as the additional ptr attributes do no
harm - but someone has to do some work on the testsuite otherwise.

 [PATCH 10/20] Avoid overhead of creating writers

Adding structure information to a 100MB PDF can mean adding 300-400 MB of
structure information =>  Multi Mio of additional dictionaries are written =>
So many BufferedWriter are created. Getting rid of both stream representation
improves performance large document and simplifies code as there is no Writer
to forget to flush anymore.

This patch is not required for the following patches.

 [PATCH 11/20] Add support for clearing objects at write time
 [PATCH 12/20] Add support for lazy object number assignment
 [PATCH 13/20] Improve PDFArray/Dictionary
 [PATCH 14/20] Free structure tree ID map at the end of the page sequence
 [PATCH 15/20] Don't write empty leaf structure to PDF

Structure information must be written in between - freeing resources after
writing reduces memory pressure. Assigning objects number latter allows to 
prune empty structure elements.

 [PATCH 16/20] Simplify references to the text

Use a more compact reference representation, if possible

-----------------------------------------
Enhancements:

 [PATCH 17/20] Dupplicate static content structures for each pages

Static regions, which are put on multiple pages, generate strange results.
Duplicating them in the structure tree yields to more logical results.

 [PATCH 18/20] Generate shorter PDF documents

Structure information means much larger PDFs - some bytes can be safed.

 [PATCH 19/20] New roles for accessibility

Role "hidden":
Kept whole subtree out of structure tree.
Role "inherit":
Hide this node in the structure tree and place content directly in the parent.

Both functions are necessary to create "beautiful" structure trees. The role
names need to be discussed.

 [PATCH 20/20] Support role for Flow & PageSequence

Allow changing the appearance of Flow/PageSequence in the structure tree.
Especially on small documents, the Structure Doc->Part->Sect can not be
appropriate.

PageSequence allows only assigning a tag name, but (because of
implementation/performance issues) it can not be removed.
Flow/static-content can be changed in every way.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #17 from Martin K <ma...@brz.gv.at> 2011-03-02 07:08:25 EST ---
Created an attachment (id=26717)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26717)
Patch 17/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #22 from Martin K <ma...@brz.gv.at> 2011-06-16 21:36:22 UTC ---
I'll split my answers.

Add "Add support for lazy object number assignment":

The codes creates an PDF structure tree containing objects, which will be
trimmed away (eg. because they are empty leaf objects).

Without lazy object number assignment such objects will get an object number
too, as they are assigned while creating these objects. It will increase the
xref size and the xref entries for such numbers will be incorrect (as there is
no support for free object numbers).

Lazy assignment is the simplest solution for this problem without rewriting
larger parts of the code. For such objects, the assignment will happen, when
the first reference to such objects (or the object itself) is written into the
PDF.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #23 from Martin K <ma...@brz.gv.at> 2011-06-16 21:51:39 UTC ---
Add "empty table cell problem":

This should be easy to solve:
In PDFStructElement.prepare() 
change 
        if (kids.length() == 0) {

If the test is extended to only trigger, if the current structure element is
not a table cell [getStructureType() not in (TD, TH, ..) && kids.length() ==
0], an empty table cell will be put in the structure tree.

Maybe PDFStructElement should get a trim empty leaf flag and the decision moved
to the FOToPDFRoleMap.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #24 from Martin K <ma...@brz.gv.at> 2011-06-16 22:32:37 UTC ---
Add "Avoid overhead of creating writers":

By default, nearly each FO tag results in an PDF structure element [= object],
so there are lots of OutputStreamWriters created.

The trunk implementation with two objects (Writer+Stream) has the weak point,
that forgetting to flush the writer before using the stream can lead to
incorrect PDFs.

As OutputStream.write only supports bytes, any forgotten encode will result in
an syntax error => My solution will catch errors at compile time.

> I think the problem of encoding text into the output should be solved by 
> defining a specialized PDFOutputStream that would be able to stream both 
> String and bytes. That PDFOutputStream would be passed around to objects 
> that then wouldn’t have to handle their own wrapper or make calls to 
> encode. Does that make sense?

If the PDFOutputStream is persistent (eg, not recreated for each batch of
objects), it is an interesting idea.

The current code creates lots of CountingOutputStreams too, which would be
needed to be wrapped in a PDFOutputStream. To overcome this, the new
PDFOutputStream could count the total number of written bytes. Instead of using
a CountingOutputStream, the code can store the start number of written bytes
and use the difference to the end number of written bytes.

This is an optional enhancement, therefore I would currently like to
concentrate on the other patches.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #9 from Martin K <ma...@brz.gv.at> 2011-03-02 07:02:29 EST ---
Created an attachment (id=26709)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26709)
Patch 09/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #1 from Martin K <ma...@brz.gv.at> 2011-03-02 06:56:43 EST ---
Created an attachment (id=26701)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26701)
Patch 01/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #32 from Vincent Hennebert <vh...@gmail.com> 2012-04-24 14:53:44 UTC ---
(In reply to comment #31)
> (In reply to comment #30)
> > vincent/peter, what is the status of this patch with your recent accessibility
> > improvements? have these patches been applied? if not, then do expect to apply
> > them? (when?)
> 
> vincent/peter?

Some of those patches may still be worth applying as they may bring
improvements in areas that were not touched by the work on accessibility. I'll
need to have another look.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #4 from Martin K <ma...@brz.gv.at> 2011-03-02 06:59:03 EST ---
Created an attachment (id=26704)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26704)
Patch 04/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #2 from Martin K <ma...@brz.gv.at> 2011-03-02 06:57:28 EST ---
Created an attachment (id=26702)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26702)
Patch 02/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26717|0                           |1
        is obsolete|                            |

--- Comment #37 from Vincent Hennebert <vh...@gmail.com> ---
Comment on attachment 26717
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26717
Patch 17/20

I must mark this patch as invalid. The structure tree corresponds to the
logical structure of the document and is not meant to reflect the way it is
laid out on pages. So the static-content cannot be duplicated in the structure
tree as many times as there are pages in the final output.

As hinted by the PDF Reference, the proper way to deal with content that is
repeated over pages is to treat it as artifact. This will become possible once
patches #20 and artifact have been handled.

Vincent

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #26 from Vincent Hennebert <vh...@gmail.com> 2011-06-17 11:15:57 UTC ---
(In reply to comment #23)
> Add "empty table cell problem":
> 
> This should be easy to solve:
> In PDFStructElement.prepare() 
> change 
>         if (kids.length() == 0) {
> 
> If the test is extended to only trigger, if the current structure element is
> not a table cell [getStructureType() not in (TD, TH, ..) && kids.length() ==
> 0], an empty table cell will be put in the structure tree.

I tried commenting out all the code in that method that removes elements and it
does not work.

Anyway, as described in the mailing list discussion that I linked to in comment
#21, I think the structure tree should not be trimmed, as there is structural
information there that would be necessary for a round-trip process. But in the
current design there is no way to create the right order of text and other
structure elements.


> Maybe PDFStructElement should get a trim empty leaf flag and the decision moved
> to the FOToPDFRoleMap.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #29 from Glenn Adams <gl...@skynav.com> 2012-04-08 09:11:08 UTC ---
increase priority due to presence of a patch

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #28 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:43:46 UTC ---
resetting P2 open bugs to P3 pending further review

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #12 from Martin K <ma...@brz.gv.at> 2011-03-02 07:04:43 EST ---
Created an attachment (id=26712)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26712)
Patch 12/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #27 from Martin K <ma...@brz.gv.at> 2011-06-18 07:51:42 UTC ---
Created attachment 27171
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27171
Handle empty nodes

The attached patch should be able to handle both corner cases for empty
blocks/table-cells (at least my testcase).

Empty blocks can have a meaning in the structure tree. Or they are created for
technical reasons without any meaning. I created an interface for this policy
decision in FOToPDFRoleMap. The patch includes a simple test policy - the final
policy will require further discussion.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #25 from Martin K <ma...@brz.gv.at> 2011-06-16 23:11:44 UTC ---
"Add support for clearing objects at write time":

The structure tree is one big document wide tree. The root is written after the
last page. 

The structure tree has circular links (parent=>child, child=>parent) and
external links to tree nodes (eg. from the per page element list). 

If we don't want to keep the whole thing into memory, we need to write parts of
the tree and then purge all references to them.

Without this patch, a write of structure elements would be:
* add the object to the PDF document
* callback to the caller do the actual write or get the OutputStream somehow to
do the write (The whole logic is in PDFLogicalStructureHandler, which currently
only has access to the PDF document, not the output stream).
* Traverse the written structure elements again and break the loops

With this patch, a write of structure elements is only adding the objects to
the document.
The next write operation will store the streams and call clean the objects
afterwards. For a structure element, the clean operation is mostly removing
it's links to child elements, which also results in breaking the loops.

The same logic could eg. be used for writing pages: The clean function of a
pages would be deleting the page content.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #20 from Martin K <ma...@brz.gv.at> 2011-03-02 07:10:00 EST ---
Created an attachment (id=26720)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26720)
Patch 20/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #5 from Martin K <ma...@brz.gv.at> 2011-03-02 06:59:46 EST ---
Created an attachment (id=26705)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26705)
Patch 05/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #35 from Vincent Hennebert <vh...@gmail.com> ---
Created attachment 28826
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28826&action=edit
Treat some content as artifact – Documentation

This patch is against the staging CMS website, that sooner or later will become
the official website.

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #13 from Martin K <ma...@brz.gv.at> 2011-03-02 07:05:34 EST ---
Created an attachment (id=26713)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26713)
Patch 13/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #7 from Martin K <ma...@brz.gv.at> 2011-03-02 07:01:14 EST ---
Created an attachment (id=26707)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26707)
Patch 07/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #6 from Martin K <ma...@brz.gv.at> 2011-03-02 07:00:38 EST ---
Created an attachment (id=26706)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26706)
Patch 06/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #16 from Martin K <ma...@brz.gv.at> 2011-03-02 07:07:29 EST ---
Created an attachment (id=26716)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26716)
Patch 16/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #19 from Martin K <ma...@brz.gv.at> 2011-03-02 07:09:37 EST ---
Created an attachment (id=26719)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26719)
Patch 19/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26701|0                           |1
        is obsolete|                            |
  Attachment #26702|0                           |1
        is obsolete|                            |
  Attachment #26703|0                           |1
        is obsolete|                            |
  Attachment #26704|0                           |1
        is obsolete|                            |
  Attachment #26705|0                           |1
        is obsolete|                            |
  Attachment #26706|0                           |1
        is obsolete|                            |
  Attachment #26707|0                           |1
        is obsolete|                            |
  Attachment #26708|0                           |1
        is obsolete|                            |
  Attachment #26709|0                           |1
        is obsolete|                            |
  Attachment #26710|0                           |1
        is obsolete|                            |
  Attachment #26711|0                           |1
        is obsolete|                            |
  Attachment #26712|0                           |1
        is obsolete|                            |
  Attachment #26713|0                           |1
        is obsolete|                            |
  Attachment #26714|0                           |1
        is obsolete|                            |
  Attachment #26715|0                           |1
        is obsolete|                            |
  Attachment #27171|0                           |1
        is obsolete|                            |

--- Comment #33 from Vincent Hennebert <vh...@gmail.com> ---
Created attachment 28824
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28824&action=edit
Mark no longer relevant patches as obsolete

The accessibility code has been refactored to fix the fundamental limitation
that empty content was not properly handled. It also suppresses the need for an
intermediate representation of the structure tree and directly generates the
appropriate PDF structure elements from the FO tree.

Marking as obsolete the patches that are no longer relevant in the context of
the new implementation. (Adding an empty patch is the quickest way I found to
avoid editing each attachment individually.)

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #11 from Martin K <ma...@brz.gv.at> 2011-03-02 07:04:19 EST ---
Created an attachment (id=26711)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26711)
Patch 11/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #14 from Martin K <ma...@brz.gv.at> 2011-03-02 07:06:11 EST ---
Created an attachment (id=26714)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26714)
Patch 14/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #40 from Vincent Hennebert <vh...@gmail.com> ---
I applied the artifact patch in rev. 1343632:
http://svn.apache.org/viewvc?rev=1343632&view=rev

This fixes the last remaining issue for this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #8 from Martin K <ma...@brz.gv.at> 2011-03-02 07:01:59 EST ---
Created an attachment (id=26708)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26708)
Patch 08/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26720|0                           |1
        is obsolete|                            |

--- Comment #39 from Vincent Hennebert <vh...@gmail.com> ---
Comment on attachment 26720
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26720
Patch 20/20

I applied the changes suggested by this patch in rev. 1342680:
http://svn.apache.org/viewvc?rev=1342680&view=rev

I adapted the changes to the new accessibility code and I also implemented
support for the intermediate format.

Thanks,
Vincent

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #28824|0                           |1
        is obsolete|                            |

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #10 from Martin K <ma...@brz.gv.at> 2011-03-02 07:03:35 EST ---
Created an attachment (id=26710)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26710)
Patch 10/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #21 from Vincent Hennebert <vh...@gmail.com> 2011-06-16 17:58:06 UTC ---
Hi Martin,

I finally got round to having a look at your patch. First, I’d like to 
thank you for having taken the time to create 20 smaller self-contained 
patches. This made the review much easier. So thanks for that!

The new data structure is indeed much more efficient than the DOM that 
FOP manipulates at the moment. Unfortunately, this doesn’t solve (AFAIU) 
the fundamental problem that we recently discovered: empty content may 
be the cause for a wrong final structure tree. Take the following table:

  <fo:table width="100%" table-layout="fixed">
    <fo:table-body>
      <fo:table-row>
        <fo:table-cell>
          <fo:block>Cell 1.1</fo:block>
        </fo:table-cell>
        <fo:table-cell>
          <fo:block>Cell 1.2</fo:block>
        </fo:table-cell>
      </fo:table-row>
      <fo:table-row>
        <fo:table-cell>
          <fo:block/>
        </fo:table-cell>
        <fo:table-cell>
          <fo:block>Cell 2.2</fo:block>
        </fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>

The content of the first cell in the second row is empty, which will 
result into a TR structure element having only one TD kid for the second 
cell; that TD element will be mistakenly interpreted by a screen reader 
as belonging to the first column.

See also discussion here:
http://markmail.org/message/mn7jdbxmjdq7ey52

To solve this we need to integrate the handling of the structure tree 
into the normal processing chain (FO tree -> Layout Engine -> Area tree) 
instead of bypassing it.


That said, I have a few comments and questions relating to some of your 
specific patches:
[PATCH 10/20] Avoid overhead of creating writers
I can imagine that if there are a lot of PDF objects to stream, creating 
an instance of BufferedWriter and OutputStreamWriter for each of them 
may have quite some performance impact. However replacing them with 
calls to PDFObject.encode everywhere it is necessary is not really an 
option. This makes the code difficult to read and maintain, and is 
error-prone as it’s very easy to miss one call somewhere.

I think the problem of encoding text into the output should be solved by 
defining a specialized PDFOutputStream that would be able to stream both 
String and bytes. That PDFOutputStream would be passed around to objects 
that then wouldn’t have to handle their own wrapper or make calls to 
encode. Does that make sense?

[PATCH 11/20] Add support for clearing objects at write time
I’m wondering why this is necessary? Isn’t it just possible to null out 
references to the objects and let the garbage collector do the work?

[PATCH 12/20] Add support for lazy object number assignment
Same here, what’s exactly the purpose of lazy object number assignment?


Thanks,
Vincent

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |vhennebert@gmail.com

--- Comment #30 from Glenn Adams <gl...@skynav.com> 2012-04-08 19:15:53 UTC ---
vincent/peter, what is the status of this patch with your recent accessibility
improvements? have these patches been applied? if not, then do expect to apply
them? (when?)

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #3 from Martin K <ma...@brz.gv.at> 2011-03-02 06:58:14 EST ---
Created an attachment (id=26703)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26703)
Patch 03/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26718|0                           |1
        is obsolete|                            |

--- Comment #36 from Vincent Hennebert <vh...@gmail.com> ---
Comment on attachment 26718
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26718
Patch 18/20

Applied patch #18 in rev. 1341992:
http://svn.apache.org/viewvc?rev=1341992&view=rev

I left the criteria for compact output in PDFDictionary to 2, as we have to
decide on a size and I don't think the savings are significant anyway (2 bytes
per entry).

Thanks for the suggestion,
Vincent

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26716|0                           |1
        is obsolete|                            |

--- Comment #38 from Vincent Hennebert <vh...@gmail.com> ---
Comment on attachment 26716
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=26716
Patch 16/20

This patch does no longer apply after the modifications brought to the
accessibility code. Feel free to submit an updated patch, although the matter
has been made slightly more complicated by the presence of placeholder
structure elements. Also, it would be good to improve encapsulation,
immutability and robustness by avoiding some of the tests (instanceof
PDFDictionary, type equal to "/MCR"). Maybe a method like addMarkedContentKid
on PDFStructElem would do. That would also avoid creating an instance of
PDFDictionary, only to discard it later on because the reference to the kid can
be simplified.

This simplification has been made less critical by the implementation of object
streams, that allow to achieve a much more significant reduction of the file
size. It may be of interest in cases where for some reason the user has to
stick to PDF 1.4. But I'm not sure how common such a use case is.

Thanks,
Vincent

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #18 from Martin K <ma...@brz.gv.at> 2011-03-02 07:09:11 EST ---
Created an attachment (id=26718)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26718)
Patch 18/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #31 from Glenn Adams <ga...@apache.org> 2012-04-24 05:53:04 UTC ---
(In reply to comment #30)
> vincent/peter, what is the status of this patch with your recent accessibility
> improvements? have these patches been applied? if not, then do expect to apply
> them? (when?)

vincent/peter?

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26719|0                           |1
        is obsolete|                            |

--- Comment #34 from Vincent Hennebert <vh...@gmail.com> ---
Created attachment 28825
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28825&action=edit
Treat some content as artifact

This is an adaptation of patch #19 to the new accessibility code. I chose the
value 'artifact' instead of 'hidden' to follow the naming convention of the PDF
Reference.

I also chose to implement this special property value on fo:wrapper only, for
simplicity. It should be easy to wrap any content that must be treated as
artifact into an appropriate fo:wrapper element.

I didn't implement the 'inherit' value, as I believe the PDF Reference already
provides a standard structure type ('NonStruct') that allows to achieve a
similar result. The node is not removed from the structure tree but the end
result should be the same.

Any feedback is welcome. If I don't hear anything within 3 working days, I'll
assume lazy consensus and apply the patch.

Thanks,
Vincent

-- 
You are receiving this mail because:
You are the assignee for the bug.

DO NOT REPLY [Bug 50852] [PATCH] Improve generation of PDFs with accessibility information

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=50852

--- Comment #15 from Martin K <ma...@brz.gv.at> 2011-03-02 07:06:52 EST ---
Created an attachment (id=26715)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26715)
Patch 15/20

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.