Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2009/10/21 10:49:39 UTC

DO NOT REPLY [Bug 48032] New: The intermediate file format has xml:space for every text element

https://issues.apache.org/bugzilla/show_bug.cgi?id=48032

           Summary: The intermediate file format has xml:space for every
                    text element
           Product: Fop
           Version: 0.95
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: page-master/layout
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: gobrien@obriensrus.co.uk


The intermediate file format that is passed between the formatter and the
renderer includes an xml:space=preserve attribute on every text element. This
wastes memory and CPU time, because the attribute could be set once at the page
element level or higher: as the spec details (see below), all elements within
the element that defines xml:space=preserve inherit the setting unless it is
overridden. The non-text elements don't care about it, so we can save the
processing and memory cost of the extra attribute entries.

==== XML Spec ======

http://www.w3.org/TR/REC-xml/

From Section 2.10

The value "default" signals that applications' default white-space processing
modes are acceptable for this element; the value "preserve" indicates the
intent that applications preserve all the white space. This declared intent is
considered to apply to all elements within the content of the element where it
is specified, unless overridden with another instance of the xml:space
attribute.
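
The inheritance rule quoted above can be illustrated with a minimal sketch. It
assumes Python's standard xml.etree.ElementTree; the element names are a
hypothetical, simplified stand-in for the intermediate format rather than its
real schema, and effective_xml_space is a helper invented for this example.
None is used here to mean "no xml:space declared on this element or any
ancestor".

```python
import xml.etree.ElementTree as ET

# ElementTree exposes attributes with the reserved xml: prefix under this name.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def effective_xml_space(root):
    """Map each element to the xml:space value that applies to it.

    An element uses its own xml:space if present, otherwise the value
    inherited from its nearest ancestor that declares one (None if no
    ancestor declares it).
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get(XML_SPACE, inherited)
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, None)
    return result

# Hypothetical, simplified element names (assumption, not the real IF schema).
doc = ET.fromstring(
    '<document>'
    '<page-sequence xml:space="preserve">'
    '<page><text>Hello  world</text></page>'
    '<page><text xml:space="default">Bye</text></page>'
    '</page-sequence>'
    '</document>'
)
modes = effective_xml_space(doc)
texts = [modes[t] for t in doc.iter("text")]
```

In this sketch the first text element inherits "preserve" from its ancestor,
while the second overrides it with "default".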

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 48032] The intermediate file format has xml:space for every text element

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #4 from Vincent Hennebert <vh...@gmail.com> 2009-11-09 03:24:49 UTC ---
Done in rev. 834020:
http://svn.apache.org/viewvc?rev=834020&view=rev

I chose to put the attribute on the <page-sequence> element instead. Most
documents don't have more than a handful of page sequences anyway.

Vincent

DO NOT REPLY [Bug 48032] The intermediate file format has xml:space for every text element

Gary O'Brien <go...@obriensrus.co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P4

DO NOT REPLY [Bug 48032] The intermediate file format has xml:space for every text element

--- Comment #1 from Vincent Hennebert <vh...@gmail.com> 2009-11-02 04:32:54 UTC ---
I've played a bit with the IF serializer to try and find out whether setting
xml:space higher up in the hierarchy would make any difference. It turns out
that there is no gain in processing time. There are other factors that come
into play and that have more impact on the process.

However, there is a non-negligible gain in file size, of around 13%.
Surprisingly enough, that difference doesn't increase with the number of pages
in the document. It's likely to be the biggest gain we can have, as I used an A4
document full of text (almost no empty space, no images) for the test. So I
guess it's worth making the change, if only to save on disk space (although that
will be significant only if you store a lot of intermediate files).
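
A rough way to reproduce this kind of size comparison is sketched below. It
assumes Python's standard xml.etree.ElementTree and a synthetic, hypothetical
IF-like tree (build_sample and hoist_xml_space are names invented for this
example, and the element and attribute names are not the real schema). It is
not FOP's serializer, so the saving it reports will differ from the figure
measured above.

```python
import copy
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def build_sample(n_texts):
    """Build a hypothetical IF-like tree with xml:space on every <text>."""
    root = ET.Element("page-sequence")
    page = ET.SubElement(root, "page")
    for i in range(n_texts):
        text = ET.SubElement(page, "text", {XML_SPACE: "preserve", "x": str(i)})
        text.text = "sample text %d" % i
    return root

def hoist_xml_space(root):
    """Return a copy with xml:space="preserve" declared once on the root.

    This only preserves meaning if no element below the root was relying
    on the application's default white-space handling.
    """
    hoisted = copy.deepcopy(root)
    for elem in hoisted.iter():
        if elem.get(XML_SPACE) == "preserve":
            del elem.attrib[XML_SPACE]
    hoisted.set(XML_SPACE, "preserve")
    return hoisted

original = build_sample(1000)
hoisted = hoist_xml_space(original)

# Compare the serialized sizes in bytes.
size_before = len(ET.tostring(original))
size_after = len(ET.tostring(hoisted))
saving = 1 - size_after / size_before
print("relative saving: %.1f%%" % (saving * 100))
```

The relative saving depends on how long the text content is compared with the
attribute, which is consistent with a text-heavy document giving close to the
largest gain.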

Question remains: is there any risk that other elements containing text may be
affected by that xml:space? Also, is there any testing framework for the
intermediate format, to test that change?

Vincent

DO NOT REPLY [Bug 48032] The intermediate file format has xml:space for every text element

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #5 from Glenn Adams <gl...@skynav.com> 2012-04-01 07:02:57 UTC ---
batch transition pre-FOP1.0 resolved+fixed bugs to closed+fixed

DO NOT REPLY [Bug 48032] The intermediate file format has xml:space for every text element

--- Comment #3 from Vincent Hennebert <vh...@gmail.com> 2009-11-03 09:33:18 UTC ---
(In reply to comment #2)
> > Question remains: is there any risk that other elements containing text may be
> > affected by that xml:space?
> 
> To answer that I want to mention why I added the xml:space=preserve in the
> first place: Editing IF in an XML editor frequently messed up the formatting
> because of pretty-printing.

A comment in the source code explaining that would have been welcome.


> FOP itself doesn't really profit from it (and can
> do without). The default behaviour is application-specific which is fine in
> FOP, but an XML editor doesn't know about that except if there's an XML Schema
> active in the XML editor which could supply the intended default behaviour. But
> I haven't tested whether that would really work.

So IIUC the xml:space attribute won't even be honoured by the XML editor if the
schema is not associated with the document when it's loaded? Unless some code
has been written in the editor to recognize the standard attributes in the xml
namespace, even if not backed by a schema.


> > Also, is there any testing framework for the intermediate format, to test that
> > change?
> 
> The only thing we do now is test using XPath statements and I don't think I've
> written any test that looks for the xml:space attribute. Essentially, you can
> simply remove the xml:space and the tests will run through as before. As said
> above, the only case where xml:space is useful is when it comes to manually
> editing the IF in an XML editor. It avoids breaking the content.

If nobody objects, I'm going to move the definition of the xml:space attribute
to the <document> element. If its only purpose is to avoid overzealous
pretty-printing by XML editors, FOP wouldn't be affected and those editors
should still behave properly.

Vincent

DO NOT REPLY [Bug 48032] The intermediate file format has xml:space for every text element

--- Comment #2 from Jeremias Maerki <je...@apache.org> 2009-11-02 08:13:48 UTC ---
> Question remains: is there any risk that other elements containing text may be
> affected by that xml:space?

To answer that I want to mention why I added the xml:space=preserve in the
first place: Editing IF in an XML editor frequently messed up the formatting
because of pretty-printing. FOP itself doesn't really profit from it (and can
do without). The default behaviour is application-specific which is fine in
FOP, but an XML editor doesn't know about that except if there's an XML Schema
active in the XML editor which could supply the intended default behaviour. But
I haven't tested whether that would really work.

> Also, is there any testing framework for the intermediate format, to test that
> change?

The only thing we do now is test using XPath statements and I don't think I've
written any test that looks for the xml:space attribute. Essentially, you can
simply remove the xml:space and the tests will run through as before. As said
above, the only case where xml:space is useful is when it comes to manually
editing the IF in an XML editor. It avoids breaking the content.
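
For illustration, an XPath-style check on where xml:space is declared could
look like the sketch below. It uses Python's standard xml.etree.ElementTree
and its limited XPath subset, not FOP's actual test harness; the sample
fragment and the helper elements_declaring_xml_space are hypothetical and only
loosely modelled on the intermediate format.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, simplified IF-like fragment (assumption, not the real schema).
SAMPLE = (
    '<document>'
    '<page-sequence xml:space="preserve">'
    '<page><text>Some  text</text></page>'
    '</page-sequence>'
    '</document>'
)

def elements_declaring_xml_space(root):
    """Return the tags of all descendants that carry an xml:space attribute.

    ElementTree's XPath subset accepts attribute names in {uri}local form.
    Note that ".//*" matches descendants only, not the root element itself.
    """
    matches = root.findall(f".//*[@{{{XML_NS}}}space]")
    return [elem.tag for elem in matches]

root = ET.fromstring(SAMPLE)
declaring = elements_declaring_xml_space(root)
```

A test along these lines would assert that the attribute appears on the
enclosing element only and no longer on individual text elements.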
