You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2009/02/12 11:36:32 UTC

DO NOT REPLY [Bug 46705] New: Enhancement: PDF Accessibility

https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

           Summary: Enhancement: PDF Accessibility
           Product: Fop
           Version: 1.0dev
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: pdf
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: jost.klopfstein@gmail.com


Created an attachment (id=23250)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23250)
PDF accessibility

See attached html file and patch


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #26 from benoit.wiart@gmail.com 2009-10-01 09:03:41 PDT ---
the NPE is at the line
structElemType.put(ptr, structElem.get("S").toString());

structElem.get("S") returns null

it seems to be caused by FOToPDFRoleMap.mapFormattingObject(s, parent) : 
the STANDARD_MAPPINGS map does not know table-and-caption and table-caption

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #32 from Vincent Hennebert <vh...@gmail.com> 2010-01-11 08:21:23 UTC ---
Separate bug (48518) has been opened about the ID discussion. Closing this one.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #15 from Jost Klopfstein <jo...@gmail.com>  2009-02-19 02:23:01 PST ---
Created an attachment (id=23278)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23278)
added compiled stylesheets for improved performance in server environments
(suggested by Jeremias)


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #20 from Vincent Hennebert <vh...@gmail.com> 2009-09-17 09:07:20 PDT ---
Note: I've been asked to look at integrating this patch into Trunk, but also to
implement tagged PDF in the legacy PDFRenderer. More later.

Vincent

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #22 from Vincent Hennebert <vh...@gmail.com> 2009-10-01 04:02:45 PDT ---
(In reply to comment #21)
> I've just noticed that the structure tree stored in the IF XML is not
> re-parsed. Running FOP with the IF as input (-afin) and the -a switch even
> leads to a NPE. So tagged PDF is currently not supported when processing the FO
> document in two steps, first to the intermediate format then to PDF.

This has now been addressed:
http://svn.apache.org/viewvc?rev=819585&view=rev

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #12 from Andreas L. Delmelle <ad...@apache.org>  2009-02-18 11:28:58 PST ---
(In reply to comment #11)
> 
> On Andreas' comments: I didn't realize that the spec says to initialize the
> "id" property to an initial value. Of course, it would make sense to use that
> instead of the foi:ptr. I am, however, concerned about the performance
> implications for a collision detection which I don't think should be ignored.

Well, what I'm personally a bit concerned about is the case where you have:
1) auto-generated ids by the 'main' stylesheet
2) additional auto-generated ids by the 'pre-processing' stylesheet

Since we have two passes, are we still 100% certain that the XSLT processor
will always return unique values that are used nowhere as an id elsewhere in
the document? 
For explicit ids, this problem would also exist, but AFAIK, no one specifies
explicit ids like "N897654", which is a format returned by some generate-id()
implementations.

If the answer to my above question is yes, then there is no real overhead (on
the FOP-side) involved for collision-detection, since the XSLT processor will
take care of that (or even the XML parser: Is 'id' not a standard XML
attribute? Or does it have nothing to do with the standard 'id()' XPath
function? I'd have to check in the respective specifications to be sure...)

> Also, the area tree and AT XML will increase in size of all elements have IDs.

True, but then again, doesn't the foi:ptr also have to be carried through to
the area tree or IF?
Maybe to address this concern, we could limit the case where the ids are
actually auto-generated to this one: when accessibility is enabled. IOW, if and
when it is needed (not default behavior).

> BTW, Andreas' proposal for adding the IDs is not entirely correct. Not all FO
> element take an "id" property. 

Again, very true, but we do implement getId() in FObj. That currently means
that, although the property may not apply to a given element, like
fo:declarations, as long as the Java object extends FObj, the id property will
be bound and available if it is specified. 
Checking closer, Declarations.java overrides bind() rather than relying on the
superclass implementation. Removing that empty implementation would be enough
to make it work.

Not 100% compliant, but it seems like this actually may have added value here.

> Maybe this should be done inside the FO tree instead of the addPtr.xsl.

What I was thinking indeed. Something like a dedicated IDPropertyMaker that
extends StringProperty.Maker, and would offer an implementation for make(FObj,
PropertyList) that returns a 'proper' initial value. I once implemented it as
such (and committed it, but it was a bit too quick and caused other issues, so
it was reverted almost immediately)

Downside is that, if we implement it like that in the FO tree, it would also be
necessary to implement the b) part in my earlier response, as the ids will
never be available in the raw FO source, so reduceFOTree.xsl would not yet
'see' them (unless they would be added by a pre-processing stylesheet, like
Jost proposes)


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705


Jeremias Maerki <je...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #23261|0                           |1
           is patch|                            |
  Attachment #23261|0                           |1
        is obsolete|                            |




-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #14 from Andreas L. Delmelle <ad...@apache.org>  2009-02-18 15:22:09 PST ---
(In reply to comment #13)
> (In reply to comment #12)
> 
> I would assume: no.

FWIW, just checked the XML and XPath specifications, and that one question
regarding 'id' is answered. If one specifies an 'xml:id', that would be a
different matter entirely. Since, in XSL-FO, the attribute is in the default
namespace (think 'fo:id'), they have nothing to do with each other.

I agree with your assumption. There seems to be absolutely no guarantee that
generate-id() returns unique values across the two passes. Even worse, if the
processor would implement generate-id() as a deterministic function (for
example: if the id value is solely based on an index which is incremented with
every element), then in both passes, generate-id() would always return
something like "N0" for the root node, and the risk of collisions could become
significant...

Placing the stress on the XSLT processor could offer a way out:

[in addPtr.xsl]
<!-- key for the explicit ids -->
<xsl:key name="idKey" match="fo:*" use="@id" />
...
<xsl:template name="addId">
  <xsl:param name="checkId" select="generate-id()" />
  <xsl:choose>
    <xsl:when test="key('idKey',$checkId)">
       <xsl:call-template name="addId">
         <xsl:with-param name="checkId"
select="generate-id(key('idKey',$checkId))" />
       </xsl:template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$checkId" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
...
<xsl:attribute name="id">
  <xsl:call-template name="addId" />
</xsl:attribute>
...

Untested for performance. The mere addition of the key could be problematic, as
the first call to key('idKey',...) cannot be evaluated unless the entire source
document is scanned.
I keep looking at that recursive call, and wonder whether that could lead to
problems. Strictly speaking, if the id starting the recursion will always be
unique (by definition so), then the recursion should always stop at some point,
and the path should always run over different nodes.
The depth will, in the most extreme case, be equal to the maximum number of
collisions, which is the same as the number of nodes that already have an
explicit id, or the size of the xsl:key map.

Assuming that the explicit ids added in the first pass adhere to the uniqueness
constraint:
If, for any given node, the recursion would go through all possible collisions,
then that would mean that the collision-values all refer to each other. That
is, every value returned by generate-id() for a node in the key-map, is already
specified as an id on another node in that same map. That can happen at most N
times for N nodes with explicit ids, and it already happened once, for a node
not in that map, to trigger the recursion. So, there will then be one of those
N nodes for which the generated id is not yet occupied. Moreover, this most
extreme case can present itself at most for one node in the document. The more
collisions with nodes not in the map, the less deep the recursions will go.

> <snip/>
> 
> Given all these observations, I think it would make sense to take the XSLT
> approach first (add an XSLT-generated ID if non exists) and then gather
> experience with this approach. This is easy to implement and can be extended
> later if we see the need.

Agreed.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #25 from benoit.wiart@gmail.com 2009-10-01 08:19:20 PDT ---
table-and-caption and table-caption are not supported by FOP but they do not
crash the engine
On the Accessibility branch, a file with a table-and-caption / table-caption
will generate a stacktrace (NPE) because foi:ptr is missing on those elements.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #9 from Jost Klopfstein <jo...@gmail.com>  2009-02-17 22:10:25 PST ---
Hi Andreas,

The approach with the @id seems a bit risky. The @ptr replacement has to have
unique values. What happens if a user has a few empty @id in his FO or multiple
times the same value for @id?

Cheers, Jost


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #4 from Jeremias Maerki <je...@apache.org>  2009-02-16 05:54:06 PST ---
Generally, the patch looks pretty good to me, besides the NPE which I know Jost
is already working on a fix for. That particular bug occurs because the
stylesheets get loaded via java.io.File instead of Class.getResource(AsStream).

I've got a few nits which I'll gladly address myself after we've applied to
patch to a temporary branch:

- FOUserAgent: set/getResult are not very speaking names for what they do. It
might also be better to just save a DOM instead of a byte array.
- area/inline/Image: ptr is normally set as a trait in the area tree but not
for the Image area. That can be homogenized.
- I'd rename fox:alt to a more speaking fox:alt-text.
- Some classes have just non-semantic changes (like commented code that is
never used or unnecessary "final" modifiers which look like they have come from
"Save Actions" in Eclipse). I'll remove those while processing the patch to
reduce the noise.
- There are some backwards-incompatible changes in the Renderer implementations
which are avoidable.
- I'd like to avoid adding the "ptr" attribute to the IFPainter methods since
most implementations don't support that anyway. I'd rather do it via the
IFContext like I've done for the foreign attributes.
- Static code analysis indicates a bug in PDFImageHandlerSVG concerning the
save/restore pairs.
- I'd move the FO -> PDF Struct Type mapping (in PDFStructElem) out of the PDF
library into the render/pdf package (separation of concerns and opening an
option for a later implementation of a custom role map).

Please speak up if anyone objects to any of these proposed changes.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705


Jeremias Maerki <je...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |PatchAvailable
         OS/Version|Windows XP                  |All
           Platform|PC                          |All
            Summary|Enhancement: PDF            |[PATCH] Enhancement: PDF
                   |Accessibility               |Accessibility




--- Comment #2 from Jeremias Maerki <je...@apache.org>  2009-02-12 02:54:04 PST ---
Note: the patch is against the new IF branch:
http://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_AreaTreeNewDesign.

Related Wiki pages:
http://wiki.apache.org/xmlgraphics-fop/PDF_Accessibility
http://wiki.apache.org/xmlgraphics-fop/LogicalStructure


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #30 from Vincent Hennebert <vh...@gmail.com> 2009-10-09 02:29:20 PDT ---
Jost or Jeremias,

In the PDFPainter.prepareImageMCID method, there is a "fix for Acro Checker"
comment. Do you have a sample FO file triggering that bug in Acrobat? I haven't
been able to reproduce it.

Thanks,
Vincent

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #31 from Vincent Hennebert <vh...@gmail.com> 2009-10-13 10:15:35 UTC ---
Hi BenoƮt,

(In reply to comment #25)
> table-and-caption and table-caption are not supported by FOP but they do not
> crash the engine
> On the Accessibility branch, a file with a table-and-caption / table-caption
> will generate a stacktrace (NPE) because foi:ptr is missing on those elements.

This has been fixed in revision 824845:
http://svn.apache.org/viewvc?rev=824845&view=rev

Thanks for reporting the bug,
Vincent

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

benoit.wiart@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |benoit.wiart@gmail.com

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #27 from Jeremias Maerki <je...@apache.org> 2009-10-01 09:13:20 PDT ---
(In reply to comment #24)
> Are images supposed to work? I've tested a simple document that contains an
> fo:external-graphic referring to a PNG image. The alt-text isn't read aloud by
> Acrobat, and the image doesn't appear in the list of tags, nor the Order tab.

Yes, they are supposed to work. One of my accessibility test cases I've just
run through cause the alt-text for the images to be read out loud. I'm running
off a current and clean checkout. If you want to send me some PDF for analysis,
I'm happy to take a look.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705


Andreas L. Delmelle <ad...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vinayak.patil1@gmail.com




--- Comment #18 from Andreas L. Delmelle <ad...@apache.org>  2009-06-05 12:02:37 PST ---
*** Bug 47130 has been marked as a duplicate of this bug. ***

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #10 from Andreas L. Delmelle <ad...@apache.org>  2009-02-17 22:21:47 PST ---
(In reply to comment #9)

> The approach with the @id seems a bit risky. The @ptr replacement has to have
> unique values. What happens if a user has a few empty @id in his FO or multiple
> times the same value for @id?

Could be, and I'm definitely not blocking the proposal on this count. Just an
idea.
Having multiple times the same value for an id will cause an error, since @id
has to be unique for every element. If it is not specified, then it could be
added using a stylesheet, initializing it with generate-id(), just like you
propose for @ptr.

The only thing FOP currently does not do is generate and assign ids
automatically, but the uniqueness constraint is already checked for.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #3 from Chris Bowditch <bo...@hotmail.com>  2009-02-16 02:52:44 PST ---
Hi Jost,

I've just been trying out your patch. I applied the patch to a fresh check out
of the new IF branch, compiled it okay. But when I tried to generate a PDF from
the command line I got the following NPE:

INFO: Accessibility is enabled
16-Feb-2009 10:25:44 org.apache.fop.cli.Main startFOP
SEVERE: Exception
java.lang.NullPointerException
        at org.apache.fop.apps.Fop.getDefaultHandler(Fop.java:138)
        at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:126)
        at org.apache.fop.cli.Main.startFOP(Main.java:174)
        at org.apache.fop.cli.Main.main(Main.java:205)

Thanks,

Chris


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705


Jeremias Maerki <je...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #23250|0                           |1
        is obsolete|                            |




-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #17 from Jeremias Maerki <je...@apache.org>  2009-04-17 07:42:17 PST ---
Most of the feedback has now been incorporated into the development branch. One
notable exception is the discussion around replacing foi:ptr with the "id"
property but I'm running out of time to work on this. Another point that I'm
not entirely happy with is the fact that the enriched FO is fully buffered in a
byte array before FOP starts processing. An idea here would be to build a
reduced structure tree per page-sequence and attach it as such to the area tree
instead of passing it to PDFDocumentHandler through the user agent. Anyway,
when accessibility is disabled (the default), FOP does not suffer any
performance drawbacks. So, IMO, the development branch could be merged with the
Trunk. At any rate, the end-user documentation is now also in place. I've added
a hopefully comprehensive list of limitations in addition to what Jost has
written up.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #28 from Vincent Hennebert <vh...@gmail.com> 2009-10-01 09:34:41 PDT ---
(In reply to comment #27)
> (In reply to comment #24)
> > Are images supposed to work? I've tested a simple document that contains an
> > fo:external-graphic referring to a PNG image. The alt-text isn't read aloud by
> > Acrobat, and the image doesn't appear in the list of tags, nor the Order tab.
> 
> Yes, they are supposed to work. One of my accessibility test cases I've just
> run through cause the alt-text for the images to be read out loud. I'm running
> off a current and clean checkout. If you want to send me some PDF for analysis,
> I'm happy to take a look.

Must be my local changes then. My local copy is getting a bit messy. I'll have
another look with a fresh checkout.

But why are images handled like text, using marked-content sequences? IIUC they
should be handled as PDF Objects (see "PDF Object as Content Items" in section
9.6.3 of the PDF Reference, Third Edition).

Thanks,
Vincent

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #11 from Jeremias Maerki <je...@apache.org>  2009-02-18 01:20:59 PST ---
Jost asked me off-list about my opinion on Vincent's feedback. I'll put it
here. Generally good input. The merging of TransformerNode and
TransformerNodeEndProcessing is probably a matter of taste. I personally prefer
keeping the general functionality seperate from the concrete functionality of
the subclass. As mentioned before, I'd like to put this new code into a branch
before putting it in Trunk. So we can also address these points in a second
step. I assume we all agree that this is great new functionality and if it's
applied now (in a separate branch) we can all help improve the whole thing
rather than off-loading so many little things on Jost. He can of course
continue to send follow-up patches.

On Andreas' comments: I didn't realize that the spec says to initialize the
"id" property to an initial value. Of course, it would make sense to use that
instead of the foi:ptr. I am, however, concerned about the performance
implications for a collision detection which I don't think should be ignored.
Also, the area tree and AT XML will increase in size of all elements have IDs.
BTW, Andreas' proposal for adding the IDs is not entirely correct. Not all FO
element take an "id" property. Maybe this should be done inside the FO tree
instead of the addPtr.xsl.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #21 from Vincent Hennebert <vh...@gmail.com> 2009-09-22 04:59:24 PDT ---
I've just noticed that the structure tree stored in the IF XML is not
re-parsed. Running FOP with the IF as input (-afin) and the -a switch even
leads to a NPE. So tagged PDF is currently not supported when processing the FO
document in two steps, first to the intermediate format then to PDF.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #13 from Jeremias Maerki <je...@apache.org>  2009-02-18 11:50:41 PST ---
(In reply to comment #12)
<snip/>
> Since we have two passes, are we still 100% certain that the XSLT processor
> will always return unique values that are used nowhere as an id elsewhere in
> the document? 

I would assume: no.

<snip/>
> True, but then again, doesn't the foi:ptr also have to be carried through to
> the area tree or IF?
> Maybe to address this concern, we could limit the case where the ids are
> actually auto-generated to this one: when accessibility is enabled. IOW, if and
> when it is needed (not default behavior).

Agreed.

<snip/>

Given all these observations, I think it would make sense to take the XSLT
approach first (add an XSLT-generated ID if non exists) and then gather
experience with this approach. This is easy to implement and can be extended
later if we see the need.

As I've heard no objections to my proposal from Monday, yet, I'll process
Jost's patch (as is) tomorrow morning (my local time) into a branch off the IF
branch. From there, we can take it further and incorporate all the feedback.
That also makes it easier to track what is changed.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #6 from Jost Klopfstein <jo...@gmail.com>  2009-02-16 13:06:21 PST ---
Created an attachment (id=23262)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23262)
Removed unused code in area/inline/Image & related

[Jeremias:]area/inline/Image: ptr is normally set as a trait in the area tree
but not


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705


Jeremias Maerki <je...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #23262|0                           |1
           is patch|                            |
  Attachment #23262|0                           |1
        is obsolete|                            |




-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #16 from Jeremias Maerki <je...@apache.org>  2009-02-19 11:58:34 PST ---
I've applied the patch (2009-02-19 02:23 PST) to a new branch as discussed
(with modifications). The code is now under:
https://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_Accessibility
http://svn.apache.org/viewvc?rev=745949&view=rev

I'll continue working on the code when my batteries are filled again.

Jost, thanks a lot for your work here. This is great new functionality. It will
still take a bit to polish a few areas. I hope to see you become a regular
contributor now that the hardest part is over. Please don't hesitate to publish
follow-up patches against the branch (or trunk if you have other ideas). I'll
keep the issue here open until we've adressed all the feedback, after which
we'll see to merging this into trunk.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #19 from Vincent Hennebert <vh...@gmail.com>  2009-06-26 03:48:13 PST ---
I've been (slowly) reviewing the patch and I noticed that classes in the
layoutmgr package are affected by the changes. There's no reason why it should
be the case, layout has little to do with accessibility.
In fact the layout managers just pass the ptr trait over to the area tree. This
calls for a more generic way of doing this. Indeed there are other properties
that don't affect layout but only areas (color, z-index among others).

I'll try to see how this can be handled. Just thought I'd made a note for
later.

Vincent

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #7 from Vincent Hennebert <vh...@gmail.com>  2009-02-17 03:00:27 PST ---
That's great new functionality. I've only had time to look at a part of the
patch so far, so I'm posting my few comments below. More later if I have time.

In the o.a.f.accessibility package:
- why are there two classes? Only the sub-class seems to be externally used, so
it may as well be merged with the super-class. If some more general
functionality turns out to be necessary, the extraction of a common super-class
can always be done later.
- Likewise, only one constructor seems to be used ATM. If other constructors
are needed, they can also be added later on.
- AFAIU mTranHandler is set in all of the constructors. Why systematically test
it for null in the methods then?

There is an encapsulation problem in o.a.f.apps.Fop.getDefaultHandler(): the
lines of code dealing with accessibility should be moved to the accessibility
package.

In FOUserAgent:
- the "accessibility" string should be defined only once in a public final
static field (ideally somewhere inside the accessibility package)
- there's no need to explicitly put false for the accessibility option in the
constructor, as the value is tested for null in the accessibilityEnabled method

In FopFactoryConfigurator.configure():
- any reason why the accessibility option is not handled like the other
options? (using getValueAsBoolean)

And a nit about coding style: you seem to be using Hungarian notation to name
fields ('mTheClassField'). I don't have anything against this notation, but
it's not used inside the rest of FOP's codebase. For consistency we may want to
agree upon a notation. FWIW I tend to think that, thanks to modern development
environments that now highlight class members, this notation is no longer so
useful.

Thanks,
Vincent


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #29 from Jeremias Maerki <je...@apache.org> 2009-10-02 00:49:27 PDT ---
(In reply to comment #28)
> But why are images handled like text, using marked-content sequences? IIUC they
> should be handled as PDF Objects (see "PDF Object as Content Items" in section
> 9.6.3 of the PDF Reference, Third Edition).

Ideally maybe, but not necessarily so IMO. See the note on page 599 of PDF 1.4:
"If it is important to distinguish between multiple renditions of the same
XObject on the same page, they should be accessed via marked content sequences
enclosing particular invocations of the Do operator, rather than via object
references." So the pattern is not illegal. FOP can reuse the same XObject
multiple times on the same or on different pages.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #33 from Glenn Adams <gl...@skynav.com> 2012-04-01 06:26:21 UTC ---
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #23 from Vincent Hennebert <vh...@gmail.com> 2009-10-01 04:05:43 PDT ---
The number tree corresponding to the ParentTree entry in the StructTreeRoot
object is invalid, as the values are directly stored in the array instead of
being reference. See discussion on fop-dev:
http://markmail.org/thread/oijggvxzfgjyewwk

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

Re: DO NOT REPLY [Bug 46705] New: Enhancement: PDF Accessibility

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
So here's what I would propose to do with this patch:

Statistics:
- 7 new classes
- 2 new other files (XSLT stylesheets)
- 88 modified files

While this is totally new functionality it doesn't represent a full new
subsystem and the 9 new files, while essential, are making up the
smaller part of the patch. So I don't think the patch needs to go
through IP clearance. Probably a border case, though.

However, I would like to place the changes in a temporary branch (a
branch off the IF branch) to work on a few things I'd like changed. I'll
attach the details to the Bugzilla entry. That way, we can vote on the
IF branch merge into trunk now. And while that happens I can freely work
on the changes I have in mind and we can do another vote when the
accessibility functionality is ready for the merge into Trunk.

Any objections to that?

BTW, I'm still waiting for Jost's ICLA to be recorded. Before that
happens we cannot do anything. But in the meantime, maybe Jost will post
an updated patch with a fix for the problem Chris encountered.

Jeremias Maerki


Re: DO NOT REPLY [Bug 46705] New: Enhancement: PDF Accessibility

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

A few comments to this patch: A few months ago, I was asked by a client
to implement Tagged PDF support for FOP but I didn't have time to do it
myself. I was able to pass the lead over to Jost. Since then I've been
coaching Jost (in the background) to get this done. He had his reasons
to only appear on the mailing lists now. I'm hoping he will participate
in any follow-up discussions concerning accessibility and this patch.

I've already reviewed an early version of the patch and I was already
quite happy although I had some issues which Jost has now addressed (I
assume). I have yet to take a first look at the new patch. Since there
was next to no discussion about this topic here, I will of course not
just process the patch in a hurry. It's important that everyone has a
chance to review what's being proposed here. I also have to figure out
how best to approach the merge of the IF branch into Trunk, i.e. with or
without this patch to begin with. We can also put the patch into its own
branch at the beginning. All open for discussion.

Since it's a relatively big patch I'd like to have a vote about its
integration into our codebase once everyone had a chance to review this.
Jost has already submitted an ICLA to the ASF's secretary and I'm
monitoring its recording. We also have to see if the patch is big enough
to require a software grant to the ASF (like we did for the AFP renderer
or the PostScript DSC parser in XGC). I'll look into this in the next
few days.

If there are any questions, Jost is on this list, and I'll help out to
fill the gaps where I can.

That said, I think this is exciting new functionality for FOP. Once this
is integrated, I will implement PDF/A-1a functionality which another
client of mine would like.

There are also other possibilities for improvement later: Full support
for controlling the document language, optional addition of layout
attributes (like indents, text align, text decoration), or support for
the "role" property in FO (RoleMap in PDF) and probably more.


Jeremias Maerki


DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705

--- Comment #24 from Vincent Hennebert <vh...@gmail.com> 2009-10-01 04:10:58 PDT ---
Are images supposed to work? I've tested a simple document that contains an
fo:external-graphic referring to a PNG image. The alt-text isn't read aloud by
Acrobat, and the image doesn't appear in the list of tags, nor the Order tab.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #5 from Jost Klopfstein <jo...@gmail.com>  2009-02-16 12:44:31 PST ---
Created an attachment (id=23261)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23261)
Fixed NPE when used from command line


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] [PATCH] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #8 from Andreas L. Delmelle <ad...@apache.org>  2009-02-17 12:51:03 PST ---

Interesting stuff! I've also read some of the Wiki, and one thought that popped
in my head:
Do we really need the extension 'ptr'? Could this be simplified by using the
standard 'id' property perhaps? 

IIUC, I definitely think so, but that would mean introducing:
a) auto-generated and assigned initial value for the 'id' property (mandated by
the Rec, BTW: "the initial value ... is a random and unique identifier..."
--XSL-FO 7.30.8 "id")
b) a way for the FONodes to echo themselves back into SAX events, so they can
be sent through reduceFOTree.xsl (maybe that reduction could even be hardcoded
into FOP)

a) would be relatively easy to implement. I've already been playing with that
in the past. It may have a noticeable impact on performance for large
documents, if we're not careful, since the set of ids to check becomes entirely
dependent on the number of FO nodes, whereas currently, that set is limited to
the ids that have been specified.

b) would require more effort. Something we have never needed so far, but may be
a worthwhile addition, if you think about it. The emitted FO will not
correspond exactly (1-1) with the source, since shorthands will be expanded,
relative specs may have been resolved and such, but still...

That said, maybe to emulate that effect with the current proposal of a
pre-processing stylesheet (minimal effort), we could already do something like:

[in addPtr.xsl -> addId.xsl]

<xsl:template name="addId">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
    <xsl:apply-templates select="@*"/>
    <xsl:if test="not(@id)">
      <xsl:attribute name="id">
        <xsl:value-of select="generate-id()" />
      </xsl:attribute>
    <xsl:apply-templates />
  </xsl:element>
</xsl:template>

May have unwanted side-effects only in case the author happens to have an
explicit id that is identical to the value returned by generate-id() (but
that's a bit of a stretch ;-))

[in reduceFOTree.xsl]

<xsl:template match="*">
  <xsl:element name="{name()}">
    <xsl:copy-of select="@id" />
  </xsl:element>
</xsl:template>

If that could make it even simpler (one less extension attribute necessary),
I'm all for it... 
I'm not sure I'm comfortable with introducing a native extension, scattered in
so many classes, while we already have the standard 'id' property conveniently
available in FObj, which seems to serve the purpose nicely.

Another small detail: why exactly do we need the explicit value of the xml:lang
shorthand in Block.java? IIC, Block should already have the implied language
and country available (via getCommonHyphenation()). Just asking, since if you
look closely, you will notice that none of the shorthands are stored on the
FObjs (only the expanded, native FO properties), so making this exception
should have a good reason.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 46705] Enhancement: PDF Accessibility

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=46705





--- Comment #1 from Jost Klopfstein <jo...@gmail.com>  2009-02-12 02:45:47 PST ---
Created an attachment (id=23251)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=23251)
Documentation - will be merged into trunk by Jeremias


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.