Posted to fop-dev@xmlgraphics.apache.org by ewitness - Ben Fowler <bf...@ewitness.co.uk> on 2002/02/25 14:41:11 UTC

RE: [PROPOSAL] linebreak

> >
>>I guess the reason nobody thought <fo:br/> or <fo:newline/> would be
> >required is because a U+000A will do the trick.
>
> [ snip ]
>
>In any case, a linefeed (LF) must be honoured, and result in a linebreak.
>_If_ the conditions are right. What that means is, the initial value for
>"linefeed-treatment" is "treat-as-space", which _does_ do a conversion of
>U+000A to U+0020 (space). So you would want to specify
>"linefeed-treatment='preserve'" on an ancestor flow object (possibly
>fo:root) and allow it to propagate to the FOs of interest, as it is
>inheritable. The "whitespace-*" properties do not affect the linefeed, and
>suppress-at-line-break can also be left as it is.
>
>But essentially the LF is there to accomplish what you want to do. The
>initial setting of "linefeed-treatment" acts to give us LaTeX-like
>behaviour, but unlike LaTeX we can switch to something different in this
>regard, rather than use new markup.

The answer that you gave is also to be found a few lines down
from the first URL I gave you

	4.	 Forced line-breaks are respected. Specifically, if A
	is the glyph-area generated by a fo:character whose Unicode
	character is U+000A, then A must be the last area in its
	containing subset Si.

I don't mind admitting that as an outsider to the XML standard, this
looks like a bad, even a really bad, idea.

My reading of your commentary is "Whitespace is sometimes respected,
and only a language lawyer can tell you when".

How should this be interpreted?

Do you think that HTML would be improved if the <BR> element were
replaced with a feature that said "You can get the effect of a
forced linebreak by setting 'linefeed-treatment' to 'preserve'
in the <body> of the page (or other container as required), which
causes all Unix line feeds to be rendered", instead of the <br /> element,
which is what was done?

From my POV this has an inhibiting effect on all editors and pretty
printing utilities, which must also respect existing white space
(as XSL processors do) and never introduce line feeds, in case this
setting was ever turned on. From my POV, a formatter should always
ignore the formatting of the source, unless notified that it is
preformatted as in the case of <PRE> and CDATA, exempli gratia
<URL: http://archive.ncsa.uiuc.edu/SDG/IT94/Proceedings/Autools/sperberg-mcqueen/sperberg.html >,
(1994) about half way down.

Do you happen to know whether this was ever discussed (id est objections
sought and answered) or whether this was one person's idea that
was incorporated as is?

I have a related 'issue' which is that the normalize-space( ) function
in XSL does two things. It trims leading and trailing newlines
and other whitespace, and it normalises internal white space.
I have a need for an operation that does the former, but not the latter.

(In fact I have an implementation which appears to be buggy
and replaces 'Miss A Burgrave' with 'Miss ABurgrave', but handles
'Miss A  Burgrave' correctly.)

In short, XML processors including ones that produce XML-FO files
should pass through all whitespace, and processors such as fop
which are also XML processors, but adjusted so that they do not
produce XML, should (at least in general) normalise whitespace.
Where the output file format respects whitespace then it should
be supplied as <fo:text> or as some break (as my original suggestion).
The present situation is that the latter type of processor may not
normalise whitespace, because some newlines are significant.

Incidentally, you have not made (or reported) a case against my suggestion:
unless it is harmful (or confusing) there is no real reason why both
styles of indicating significant breaks could not be used, is there?

Using FOP derived from version 0.14, I got this report when I tried
the following .fo file:

	WARNING: property 'linefeed-treatment' ignored
	WARNING: property 'linefeed-treatment' ignored
	setting up fonts
	formatting FOs into areas
	[1]
	rendering areas to PDF

(source)

	<?xml version="1.0" encoding="UTF-8"?>
	<fo:root
			xmlns:fo="http://www.w3.org/1999/XSL/Format"
			text-align="justified" font-size="12pt" font-family="serif"
			linefeed-treatment='preserve' >
		<fo:layout-master-set>
			<fo:simple-page-master
					margin-right="50pt" margin-left="100pt"
					margin-bottom="25pt" margin-top="75pt" master-name="all">
				<fo:region-body margin-bottom="50pt" />
				<fo:region-after extent="25pt" />
			</fo:simple-page-master>
		</fo:layout-master-set>
		<fo:page-sequence id="" hyphenate="true" master-name="all" language="en">
			<fo:flow flow-name="xsl-region-body">
				<fo:block linefeed-treatment='preserve'>
					Bilbo Baggins,
					Bag End,
					Underhill,
					Hobbiton,
					Westfarthing of the Shire.
				</fo:block>
			</fo:flow>
		</fo:page-sequence>
	</fo:root>

Linefeed treatment was reported as not working in June last
year, <URL: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1998 >;
I don't know whether it is now in.
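
While 'linefeed-treatment' is ignored, one workaround that does not depend
on the property at all is to put each line in its own fo:block. This is only
a sketch of that idea, not something proposed in this thread; the nested
blocks would replace the single fo:block in the source above, and the
namespace is declared here only so that the fragment stands alone:

	<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
		<!-- each child block starts on a new line, whatever the
		     formatter does with U+000A in the source -->
		<fo:block>Bilbo Baggins,</fo:block>
		<fo:block>Bag End,</fo:block>
		<fo:block>Underhill,</fo:block>
		<fo:block>Hobbiton,</fo:block>
		<fo:block>Westfarthing of the Shire.</fo:block>
	</fo:block>

The cost is that the line structure has to be made explicit by whatever
generates the .fo file, rather than being carried by the linefeeds.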

I now have a Linux installation (but not yet CVS), and so I am in a position to
start some development work on FOP. Where should I start? Is there a list
of outstanding tasks?

I wrote that a few days ago, but delayed sending it until I could
see what bugzilla could tell me. In the meantime, bugzilla has sent
me an e-mail giving no fewer than 195 issues. My search on bugzilla
reveals 18 high priority bugs.
<URL: http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&priority=High&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=& >

Nonetheless, my query remains: is there a list of issues which
people can start working on now, that won't need to be re-done
once the redesign is in place?


Ben.







---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


RE: [PROPOSAL] linebreak

Posted by ewitness - Ben Fowler <bf...@ewitness.co.uk>.
>Comments below.
>
>[ snip ]
>
>3. Final discussion comment: XSL formatters _do_ ignore the presence of
>linefeeds (in one of several different interpretations of "ignore") by
>default. By choosing 'preserve' for linefeed-treatment you _are_ basically
>doing a <PRE> operation, with respect to linefeeds. So I don't see much of a
>difference or any grounds for objection.
>
>But I do see an argument for a semantic linebreak in the source. Relying on
>linefeeds or the lack thereof in source XML is a bit problematic.

Thank you. I don't exactly have a problem with the mechanism itself,
more that it is too complicated for most people to understand without
a tutor (as I found). This can be countered by an argument (which I
accept) that .fo files are usually machine produced, and are not
pretty printed or edited. Against that, (1) the fragments that make
up an .fo file most certainly are, and (2) there is no bar to creating
a .fo file directly or by some mechanism other than XSLT.

>4. normalize-space(): The XPath function takes tabs, spaces, carriage
>returns and linefeeds and does what you say. I think that the existing
>string functions in XPath/XSLT are not sufficiently powerful to easily do
>what you wish; OTOH the activities of the XSL people to come out with a new
>XSLT and XPath include regular expressions (see
>http://www.w3.org/TR/xquery-operators/) so this is one way in which you
>could do what you want.

Thank you. I am simply not that familiar with XPath. The issue arose,
as you might have guessed, from an editor assuming that it could add any
amount of white space at the beginning or end of an element (quite
reasonable in the XML world), and I had assumed that there would be
a matching function that would remove it. Maybe I am making a mountain
out of a molehill. I feel that this is a result of trying to
standardise too early, id est without sufficient experience, or
without experience of sufficient duration.

Ben.



RE: [PROPOSAL] linebreak

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.
Comments below.

-----Original Message-----
From: ewitness - Ben Fowler [mailto:bfowler@ewitness.co.uk]
Sent: February 25, 2002 9:41 AM
To: fop-dev@xml.apache.org
Subject: RE: [PROPOSAL] linebreak


[ SNIP quoted message ]

********************************************************

My Comments:

1. Bear in mind that 'linefeed-treatment' need not be a global. You can
leave it to the initial value ('treat-as-space'), or change it to 'ignore'
if you like (whatever suits) and so the document as a whole will have LaTeX
or HTML-like behaviour when it comes to linefeeds in text.

But for specific blocks you can explicitly set the value to 'preserve', and
then you know that linefeeds in that block will be acted upon.
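
A minimal sketch of that arrangement, assuming a formatter that honours the
property (FOP at the time of this thread reportedly did not). The master name
"all" and the sample text are only illustrative, and 'master-reference' is the
attribute name from the XSL 1.0 Recommendation; older FOP builds used
'master-name' on fo:page-sequence instead:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="all">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="all">
    <fo:flow flow-name="xsl-region-body">
      <!-- initial value 'treat-as-space': the source linefeed becomes a space -->
      <fo:block>
        Ordinary running text,
        reflowed by the formatter.
      </fo:block>
      <!-- 'preserve' set on this block only: each linefeed ends a line -->
      <fo:block linefeed-treatment="preserve">Bilbo Baggins,
Bag End,
Hobbiton.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

The text of the second block starts immediately after the start tag because,
as far as I can tell, a linefeed between the tag and the first word would
itself be preserved and give an empty first line.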

2. However, I gather you don't like that much. Even if FOP worked in this
regard you still want an explicit linebreak. OK, let's operate on that
premise. And let's assume that we use the spec as it is. In this case one
option for your stylesheet is to implement the above (in comment 1):

<xsl:if test="br">
	<xsl:attribute name="linefeed-treatment">preserve</xsl:attribute>
</xsl:if>

If you have this inside each template of interest:

<xsl:template match="para">
    <fo:block>
        <xsl:if test="br">
            <xsl:attribute name="linefeed-treatment">preserve</xsl:attribute>
        </xsl:if>
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>

then the presence of a <br/> will throw the switch for that block.

You'd want to finetune this test, so as to reset the switch for descendant
blocks that do _not_ contain <br/>, but you get the idea.
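
One way that finetuning might look, as a sketch only. It assumes source
elements named 'para' and 'br' as above, writes the property explicitly in
both cases so that a nested block without a <br/> resets the inherited value,
and adds two templates that are not part of the suggestion above: one that
turns each <br/> into a U+000A, and one that turns linefeeds in the source
text into spaces so that only <br/> produces a break:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="para">
    <fo:block>
      <!-- always emit the property: 'preserve' when this para has a
           <br/> child, otherwise the initial value, which overrides
           anything inherited from an enclosing block -->
      <xsl:attribute name="linefeed-treatment">
        <xsl:choose>
          <xsl:when test="br">preserve</xsl:when>
          <xsl:otherwise>treat-as-space</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- assumption: each <br/> is emitted as a single linefeed character -->
  <xsl:template match="br">
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>

  <!-- assumption: linefeeds that merely wrap the source text become spaces -->
  <xsl:template match="para//text()">
    <xsl:value-of select="translate(., '&#xA;', ' ')"/>
  </xsl:template>

</xsl:stylesheet>

With the last template in place the switch could arguably be left on
'preserve' everywhere, since the only linefeeds reaching the formatter are
the ones generated from <br/>.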

I have no idea how expensive this approach is in terms of XSLT processing
but my gut feeling is it's probably not too bad.

3. Final discussion comment: XSL formatters _do_ ignore the presence of
linefeeds (in one of several different interpretations of "ignore") by
default. By choosing 'preserve' for linefeed-treatment you _are_ basically
doing a <PRE> operation, with respect to linefeeds. So I don't see much of a
difference or any grounds for objection.

But I do see an argument for a semantic linebreak in the source. Relying on
linefeeds or the lack thereof in source XML is a bit problematic.

4. normalize-space(): The XPath function takes tabs, spaces, carriage
returns and linefeeds and does what you say. I think that the existing
string functions in XPath/XSLT are not sufficiently powerful to easily do
what you wish; OTOH the activities of the XSL people to come out with a new
XSLT and XPath include regular expressions (see
http://www.w3.org/TR/xquery-operators/) so this is one way in which you
could do what you want.
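
For what it is worth, trimming without normalising internal whitespace can be
done in XSLT 1.0 with recursive named templates, which illustrates the "not
easily" above. The following is a sketch; the template names 'trim',
'trim-left' and 'trim-right' are made up for the example, and it relies only
on the fact that normalize-space() of a single whitespace character is the
empty string:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- drop leading whitespace, one character per recursive call -->
  <xsl:template name="trim-left">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="string-length($s) &gt; 0
                      and normalize-space(substring($s, 1, 1)) = ''">
        <xsl:call-template name="trim-left">
          <xsl:with-param name="s" select="substring($s, 2)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise><xsl:value-of select="$s"/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- drop trailing whitespace in the same way, from the other end -->
  <xsl:template name="trim-right">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="string-length($s) &gt; 0
                      and normalize-space(substring($s, string-length($s), 1)) = ''">
        <xsl:call-template name="trim-right">
          <xsl:with-param name="s"
              select="substring($s, 1, string-length($s) - 1)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise><xsl:value-of select="$s"/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- trim both ends; internal whitespace is left untouched -->
  <xsl:template name="trim">
    <xsl:param name="s"/>
    <xsl:variable name="left">
      <xsl:call-template name="trim-left">
        <xsl:with-param name="s" select="$s"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:call-template name="trim-right">
      <xsl:with-param name="s" select="string($left)"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>

Called with '  Miss A  Burgrave ' as the parameter, 'trim' returns
'Miss A  Burgrave' with the double space intact. The recursion depth grows
with the amount of whitespace removed, which may matter for very long runs.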

Regards,
Arved Sandstrom

