You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2011/04/12 11:09:18 UTC

DO NOT REPLY [Bug 51052] New: NullPointerException with retrieve marker

https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

           Summary: NullPointerException with retrieve marker
           Product: Fop
           Version: all
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: page-master/layout
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: med1985@gmail.com


Created an attachment (id=26875)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26875)
NullPointerException 

The issue is that a NPE is thrown when a retrieve-marker retrieves a marker
with only text content, when the retrieve-marker is a child of a table-cell.
The real problem here is that there is no validation done on the retrieve
marker to ensure that the markers child is a valid child of the retrieve
markers parent (excuse the tongue twister), see XSL section 6.13.5:

"An fo:marker may contain any formatting objects that are permitted as a
replacement of any fo:retrieve-marker or fo:retrieve-table-marker that
retrieves the fo:marker's children."

Though the spec doesn't specify how to handle this error, I am proposing that
rather than throw an NPE, FOP logs a warning and ignores the marker. The spec
doesn't specifically suggest that this kind of error is recoverable, but others
might agree that throwing an NPE isn't the "proper" way to deal with this
error.

I've attached an example test FO and I'll attach a patch shortly.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #7 from Andreas L. Delmelle <ad...@apache.org> 2011-04-16 14:54:27 EDT ---
(In reply to comment #6)
> In the same sense, one could reason that the marker's child blocks, tables,
> lists etc. all should not be created, and we should store the FO source rather
> than parsing them into FONodes.
> There might be good points for, but also against.

Note: I am not necessarily against this myself. It would be pretty cool,
actually, if we were to store only the raw FO source of the marker-subtree, in
a CharBuffer, to be parsed later. At first glance, it could turn out to be
slightly more efficient in terms of memory footprint. I'd need to see proof to
be certain, but it might...

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #3 from Mehdi Houshmand <me...@gmail.com> 2011-04-14 04:08:22 EDT ---
After further looking into this, it appears as if the validateChildNode() was
never intended to validate #PCDATA (or in our its object representation
FOText). I say this because (as Andreas eluded to) FObjMixed is the progenitor
of all the classes that create FOText. And these are the only objects that can
accept #PCDATA as a child, so no validation really needs to be done on FOText,
with respect to validateChildNode().

However, o.a.f.fo.flow.Marker inherits from FObjMixed, and so when
Marker.characters() is called, it blindly creates an FOText object without
checking the validity of the #PCDATA. 

One solution would be to store the character array (or as a String) and
postpone the creation of the FOText node, until the Marker is retrieved by the
retrieve marker, and thus validation can be done. 

The point at which both the marker and the retrieve marker are both accessible
seems to be in AbstractPageSequenceLayoutManager.resolveRetrieveMarker().
Arguably, this object shouldn't be responsible for validation, so if we inspect
resolveRetrieveMarker(), we find that AbstractRetrieveMarker.bind() is
responsible for binding a Marker to a RetrieveMarker. It is possible that, just
prior to this, we validate the children of the marker and either invoke the
creation of an FOText object or trigger an event to display the error.

I thought this would be a quick fix, but now I'm not so sure. I don't know what
the ramifications of postponing the creation of the FOText node will have. Nor
if it's possible to postpone the creation since FObjMixed.flushText() seems to
be responsible for binding the FOText object to a Block, and it's class
private.

I'm afraid I don't have the time at the moment to investigate this further, but
if and when I come back to it, or if anyone else want's to take a stab at this,
it may save some time.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #6 from Andreas L. Delmelle <ad...@apache.org> 2011-04-16 14:35:44 EDT ---
(In reply to comment #5)
> Well, I guess that would depend on how this was implemented. If we were being
> puritanical, one could argue that if FOText was an object representation of
> #PCDATA (which I'm pretty sure it is), then by creating a #PCDATA child in the
> FOTree, we are creating an invalid node. 

No, we're not. #PCDATA is always a valid child node for a marker (i.e. the
content model for a marker is "(#PCDATA|%inline;|%block;)*"). 
It will only, potentially, /become/ invalid in the retrieval context.

In the same sense, one could reason that the marker's child blocks, tables,
lists etc. all should not be created, and we should store the FO source rather
than parsing them into FONodes.
There might be good points for, but also against.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #1 from Andreas L. Delmelle <ad...@apache.org> 2011-04-13 13:25:05 EDT ---
(In reply to comment #0)
> The issue is that a NPE is thrown when a retrieve-marker retrieves a marker
> with only text content, when the retrieve-marker is a child of a table-cell.
> The real problem here is that there is no validation done on the retrieve
> marker to ensure that the markers child is a valid child of the retrieve
> markers parent (excuse the tongue twister), see XSL section 6.13.5:

Indeed. Basic validation of child elements (i.e. FONode.validateChildNode()) is
skipped when the marker is cloned. That seems like an oversight.

> Though the spec doesn't specify how to handle this error, I am proposing that
> rather than throw an NPE, FOP logs a warning and ignores the marker. 

Exactly. That would remain consistent with the behavior during normal FO tree
building.
If some bare text is entered in a fo:table-cell (or basically any FO that
expects block-content), then it will be ignored, rather than reported as an
error. To remain consistent, we would have to match this behavior. So, if the
child node is a FOText and the retrieve-marker's parent is not a FObjMixed, we
should at most issue a warning, and assume 'no content'.

> The spec  doesn't specifically suggest that this kind of error is recoverable, but others
> might agree that throwing an NPE isn't the "proper" way to deal with this
> error.

Definitely. Rule of thumb: a NPE is *always* wrong. :-)

> 
> I've attached an example test FO and I'll attach a patch shortly.

Thanks!

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #10 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:41:40 UTC ---
resetting P2 open bugs to P3 pending further review

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #4 from Andreas L. Delmelle <ad...@apache.org> 2011-04-14 14:45:35 EDT ---
(In reply to comment #3)

> However, o.a.f.fo.flow.Marker inherits from FObjMixed, and so when
> Marker.characters() is called, it blindly creates an FOText object without
> checking the validity of the #PCDATA. 

Yep, and at that point you cannot judge whether this will be appropriate, or
not, in the retrieval context. It could be correct for some pages, but wrong on
others, depending on where the retrieve-marker appears.

> One solution would be to store the character array (or as a String) and
> postpone the creation of the FOText node, until the Marker is retrieved by the
> retrieve marker, and thus validation can be done. 

That might be an option. Store the char array, and later on, trigger
characters() on the retrieve-marker, if appropriate.
However, I believe it is not strictly necessary. Also, the current approach of
constructing the FOText early and using clones later, has the convenient
side-effect that the text nodes are stored in sequence with the rest of the
marker's child nodes.

Ultimately, FOText _is_ already basically just a CharBuffer with some extra
information.
The only real gain would be that the FOText property references can be avoided,
so it might still be worthwhile to investigate, but more as a performance
enhancement.

As I see it, what we definitely lack:
- a good/decent way to detect if a FO can have text children; I do not
particularly like 'instanceof FObjMixed', since that does not cover possible
extension classes that subclass FObj directly
- (more general) proper validation of the marker's immediate children against
the parent of the retrieve-marker

The first would solve this particular bug, as it was first reported. The NPE
can be avoided simply by ignoring the FOText (see further below).
The second would be more comprehensive, to avoid similar issues in the future.

> ... It is possible that, just prior to this, we validate the children of the
> marker and either invoke the creation of an FOText object or trigger
> an event to display the error.

Yes, and strictly speaking, this only needs to happen for the first level, as
the other levels are already taken care of during normal FO tree building. 
That is, if you have:

<marker>
  <inline><block>Some text</block></inline>
</marker>

The only validation that is not done, is the inline against the parent of the
retrieve-marker. The block will have been validated for the parent inline,
though, so there is no need to repeat that step for every retrieval.

> I thought this would be a quick fix, but now I'm not so sure.

If someone really needs a quick fix:

* In AbstractRetrieveMarker cloneSingleNode(), before all else, first check

        if (newParent == this && child instanceof FOText
                && !(parent instanceof FObjMixed)) {
            return;
        }

* and add a similar condition --if (parent instanceof FObjMixed) -- 
   around the call to handleWhiteSpaceFor() in cloneFromMarker().

Tested and confirmed that the attachment just renders as a blank page, without
any complaints whatsoever. No more NPE, but this does not address the key
issues mentioned above... :-/

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

Glenn Adams <ga...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #12 from Mehdi Houshmand <me...@gmail.com> 2012-04-08 09:13:30 UTC ---
(In reply to comment #11)
> (In reply to comment #0)
> > I've attached an example test FO and I'll attach a patch shortly.
> 
> mehdi, any chance you'll be posting a patch?

No patch. I was just posting this as a known bug, quite nuanced, but as always,
when I get time, I'll look further into this issue.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #5 from Mehdi Houshmand <me...@gmail.com> 2011-04-15 04:40:37 EDT ---
> That might be an option. Store the char array, and later on, trigger
> characters() on the retrieve-marker, if appropriate.
> However, I believe it is not strictly necessary. Also, the current approach of
> constructing the FOText early and using clones later, has the convenient
> side-effect that the text nodes are stored in sequence with the rest of the
> marker's child nodes.

Well, I guess that would depend on how this was implemented. If we were being
puritanical, one could argue that if FOText was an object representation of
#PCDATA (which I'm pretty sure it is), then by creating a #PCDATA child in the
FOTree, we are creating an invalid node. Regardless of whether we address this
invalidation later or not, validation should be done before binding nodes to
the FOTree; invalid nodes shouldn't be bound to the FOTree in the first place.
One way to solve this, is by postponing the creation of the FOText while still
keeping the char array. This means that the data is kept, while we're
maintaining the validity of the FOTree. Though I appreciate, if we did this,
we'd have to maintain the rest of the current behaviour so as not to introduce
a regression.

> 
> Ultimately, FOText _is_ already basically just a CharBuffer with some extra
> information.
> The only real gain would be that the FOText property references can be avoided,
> so it might still be worthwhile to investigate, but more as a performance
> enhancement.

You may be right, it might be prohibitively complex doing it the way I'm
suggesting. It'd have to be investigated, this this is less about a performance
enhancement since as you said, the creation of an FOText object is cheap.

> 
> As I see it, what we definitely lack:
> - a good/decent way to detect if a FO can have text children; I do not
> particularly like 'instanceof FObjMixed', since that does not cover possible
> extension classes that subclass FObj directly
> - (more general) proper validation of the marker's immediate children against
> the parent of the retrieve-marker

Yeah I agree, I was disinclined to use "instanceof FObjMixed".

</snip>

Thanks for the assistance, but I think we're going to avoid the "quick fix".
We've been bitten by them before.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #2 from Andreas L. Delmelle <ad...@apache.org> 2011-04-13 13:35:07 EDT ---
(In reply to comment #1)
> ... If some bare text is entered in a fo:table-cell (or basically any FO that
> expects block-content), then it will be ignored, rather than reported as an
> error...

Note: obviously only if the table-cell also contains at least an empty block.
Otherwise, an error will be thrown, but it will be a complaint about an *empty*
table-cell...

That might complicate matters slightly, as this case will also have to be
reported as such if it happens during marker-retrieval.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #8 from Andreas L. Delmelle <ad...@apache.org> 2011-04-16 16:06:21 EDT ---
(In reply to comment #6)
> > Well, I guess that would depend on how this was implemented. If we were being
> > puritanical, one could argue that if FOText was an object representation of
> > #PCDATA (which I'm pretty sure it is), then by creating a #PCDATA child in the
> > FOTree, we are creating an invalid node. 

> No, we're not. #PCDATA is always a valid child node for a marker (i.e. the
> content model for a marker is "(#PCDATA|%inline;|%block;)*"). 
> It will only, potentially, /become/ invalid in the retrieval context.

I suddenly realize this needs more explanation, as there is obviously the
remark about the retrieve-marker's parent...

Looking at it from FOP's perspective, at parse-time (i.e. when the FO tree is
built), there is no way to know when --or even if-- a marker will actually be
retrieved. Granted, we _could_ decide to throw an error if there is even the
smallest probability of a mismatch, but we would never know for certain whether
it would actually cause an error. 
I am far from convinced that this justifies the added computational complexity
of walking up the tree, and checking all static-contents for a retrieve-marker
that _might_ retrieve a particular marker.

What I mean is: it is not incorrect/invalid to create the #PCDATA node as a
child of the marker. However, to concede to your point, it is definitely
incorrect to blindly copy it, and re-bind it to the wrong parent.

(In reply tom comment #7)
> Note: I am not necessarily against this myself. It would be pretty cool,
> actually, if we were to store only the raw FO source of the marker-subtree, in
> a CharBuffer, to be parsed later. At first glance, it could turn out to be
> slightly more efficient in terms of memory footprint. I'd need to see proof to
> be certain, but it might...

... and after some tests, I can see that this is definitely not always so cool.
;-)
A lot depends on the actual structure of the subtree. FO is quite verbose, so
even a small table already costs quite some chars, which does not always weigh
up to simply instantiating the FONodes to store the data.

If we're really serious about further optimization, then in terms of footprint,
the most optimal situation may just be to create a generic MarkerDescendant
node type, and convert those into the proper FONode subclass later, if and when
they are actually retrieved.
That is: as opposed to the current approach of immediately instantiating the
proper type at parse time, and cloning those instances later, when the area
tree is built.

Strictly speaking, in the current process, some space is still taken up by the
unused references for members in flow.Block, flow.Table, FOText... That space
is actually wasted, since the specified properties/attributes are stored in a
Map that is associated with the Marker. 
If we strip the MarkerDescendants to be lean, basic FObj instances, that might
save some in larger documents with a lot of markers, especially if only a
relatively small amount are actually retrieved.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #11 from Glenn Adams <gl...@skynav.com> 2012-04-08 08:40:37 UTC ---
(In reply to comment #0)
> I've attached an example test FO and I'll attach a patch shortly.

mehdi, any chance you'll be posting a patch?

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 51052] NullPointerException with retrieve marker

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51052

--- Comment #9 from Mehdi Houshmand <me...@gmail.com> 2011-04-18 03:42:47 EDT ---
> I am far from convinced that this justifies the added computational complexity
> of walking up the tree, and checking all static-contents for a retrieve-marker
> that _might_ retrieve a particular marker.

I am of the same frame of mind, I think that's neither feasible nor really
necessary.

> A lot depends on the actual structure of the subtree. FO is quite verbose, so
> even a small table already costs quite some chars, which does not always weigh
> up to simply instantiating the FONodes to store the data.
> 
> If we're really serious about further optimization, then in terms of footprint,
> the most optimal situation may just be to create a generic MarkerDescendant
> node type, and convert those into the proper FONode subclass later, if and when
> they are actually retrieved.
> That is: as opposed to the current approach of immediately instantiating the
> proper type at parse time, and cloning those instances later, when the area
> tree is built.

I think implementing this would/could get fairly involved. I'm not sure an
intermediary object would be the proper approach. If we want the FOTree to be
an object manifestation of the FO, adding objects types that aren't a part of
the FO spec would break it's adherence with the spec.

I hadn't considered that the markers can have a hierarchy of children, or at
least I hadn't considered that this could be several levels deep rather than
just #PCDATA or a single-level block or two. That would make postponing
instantiating any children unnecessarily difficult. It would also mean that
we're parsing FO in the layout manager, which I don't think there's a precedent
for, nor is that the duty of the layout manager. I'm beginning to think the
initial approach, validating in AbstractRetrieveMarker as it binds objects to
itself may be the way to go. We may be forced to do an "instanceof FObjMixed",
to validate #PCDATA separately.

> 
> Strictly speaking, in the current process, some space is still taken up by the
> unused references for members in flow.Block, flow.Table, FOText... That space
> is actually wasted, since the specified properties/attributes are stored in a
> Map that is associated with the Marker. 
> If we strip the MarkerDescendants to be lean, basic FObj instances, that might
> save some in larger documents with a lot of markers, especially if only a
> relatively small amount are actually retrieved.

That would mean adding an object that wasn't a part of the FO spec into the
FOTree. I'm not sure that's a good idea. It would be fairly confusing for
anyone looking at this at a later date, they'd see MarkerDescendants, refer to
the spec only to find it not there. No amount of commenting code would prevent
this.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.