You are viewing a plain text version of this content. The canonical link for it is here.
Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2009/08/19 20:23:09 UTC

DO NOT REPLY [Bug 47709] New: Issue with parsing the 'font' shorthand

https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

           Summary: Issue with parsing the 'font' shorthand
           Product: Fop
           Version: 1.0dev
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: fo tree
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: adelmelle@apache.org


--- Comment #0 from Andreas L. Delmelle <ad...@apache.org> 2009-08-19 11:23:08 PDT ---
Created an attachment (id=24148)
amendment to font-shorthand-test which makes the build fail

Parsing of a very simple value for the font shorthand property currently fails
in FOP Trunk.

In attach, a patch/amendment to the related FO Tree testcase which, if applied,
shows that it does not pass...

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #5 from Jonathan Levinson <le...@intersystems.com> 2009-09-21 10:19:15 PDT ---
Hi Vincent,

It was my pleasure to help out.  Thank you for your review of the problems you
think still remain in the font shorthand code.  

When you closed 47859, did you noticed I had attached a patch that fixes one of
the issues you mention below:

"Also, it doesn't
seem to handle the case where there are more than one comma (i.e., at least 3
families specified)."

This patch, which (I believe) fixes the problem you mention, was attached to
47859, which you closed as a duplicate. 

I infer from reading and studying the code that the purpose of the complicated
""while (!fontFamilyParsed)" loop is to identify the beginning of the font
family token.  The FontFamilyProperty parser assumes it has been given a string
which represents a font family and the font shorthand routine is trying to
identify the font family token (a string) which is sent to FontFamilyProperty. 
The font shorthand code is not trying to do any parsing.  I say this with all
due respect for your greater experience with Apache FOP.  

You've made other comments I'll have to investigate before replying further.  I
need to add more test cases to verify that my fix (the one in 47859) corrects
all issues with font shorthand property.

I can see your point about the desirability of a font shorthand parser. 
Currently the font short hand code tries to identify the strings which should
be sent to each of the font property parsers such as FontFamilyProperty.  So
the font shorthand code is a tokenizer rather than a parser.  Its goal is to
identify the tokens and then send the tokens to property parsers for further
processing.  It could be a tokenizer is all this really needed, but I don't
know at this point.

Best Regards,
Jonathan S. Levinson
Senior Software Developer
Object Group
InterSystems
617-621-0600

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #9 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:45:04 UTC ---
resetting P2 open bugs to P3 pending further review

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #8 from Jonathan Levinson <le...@intersystems.com> 2009-10-05 11:18:35 PDT ---
Created an attachment (id=24344)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24344)
Add more tests to font-shorthand-test

I add more tests to font-shorthand-test, testing various combinations,
including testing whether font-size/line-height can have spaces around the
slash and whether line-height -> space will be parsed.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

taffy-tyler6464@hotmail.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |taffy-tyler6464@hotmail.co.
                   |                            |uk

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #6 from Jonathan Levinson <le...@intersystems.com> 2009-09-21 13:29:20 PDT ---
Created an attachment (id=24298)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24298)
Get font short hand completely working

I tested a variety of cases such as:

          <fo:block font="italic small-caps normal 12pt/150% Arial, Helvetica,
sans-serif">
        <test:assert property="font-family" expected="[Arial, Helvetica,
sans-serif]" />
        <test:assert property="font-size" expected="12000mpt" />
        <test:assert property="font-weight" expected="400" />
        <test:assert property="font-style" expected="ITALIC" />
        <test:assert property="line-height.optimum" expected="18000mpt" />
        <test:assert property="font-variant" expected="SMALL_CAPS" />
        Test font shorthand
      </fo:block> 

     <fo:block font="xx-large/1.4 Arial, 'Times New Roman', sans-serif">
        <test:assert property="font-family" expected="[Arial, Times New Roman,
sans-serif]"/>
        <test:assert property="font-size" expected="20736mpt"/>
        <test:assert property="font-weight" expected="400" />
        <test:assert property="font-style" expected="NORMAL" />
        <test:assert property="line-height.optimum" expected="29030mpt" />
        <test:assert property="font-variant" expected="NORMAL" />
        Test font shorthand
      </fo:block>

I think FontShortHandProperty.java now works and does not need to be an LL(k)
or LR(k) parser.

The code looks complex but in each case of font property (such as font variant
or font family) it is defining a token to be parsed so it is is tokenizing
code.  It could probably be refactored to be easier to understand and read. 
The code is not parsing but tokenizing.

There is an additional remark I made about this solution (posted before).  The
solution seems to have been inadvertently ignored when the tests and patch I
made were marked as a duplicate.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #7 from Jonathan Levinson <le...@intersystems.com> 2009-10-05 11:16:00 PDT ---
Created an attachment (id=24343)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24343)
Fix parsing of font-size and line height

I use Java RegEx to parse font-size and line height.

I also (in next patch) add a patch to font shorthand regression test that tests
various combinations, including line-height -> space.

I ran unit tests on it and no new failures.

I ran check-style on it and corrected all warnings and errors.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #1 from Jonathan Levinson <le...@intersystems.com> 2009-09-16 10:52:07 PDT ---
Created an attachment (id=24277)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24277)
Allow case when no font-family is quoted and no comma-separated list of font
families

Allow case when no font-family is quoted and no comma-separated list of font
families.

The code in FontShorthandProperty is looking for the beginning of the
font-family.  If the font-family contains commas then the code works as before.
 If the font-family begins with a quote the code works as before.  There was
one case missing - when the code is looking for the beginning of the font
family but there are no commas and no quotes.

The font-family can be quite complex containing commas, single quoted strings,
double quote strings, and strings without quotes but the parsing of this
complex expression is taken care of by FontFamilyProperty.  All
FontShorthandProperty does is look for where the font-family begins.  It almost
did this correctly (before this patch) when the font-family contained commas or
consisted of a quoted string.  Now there is the case when the font-family
contains no quotes and no commas.

However, there is a fly in the ointment.  In testing this change (and in
testing the previous unchanged code), I added the following test case, which
fails in both cases.

     <fo:block font="xx-large/1.4 Arial, 'Times New Roman', sans-serif">
        <test:assert property="font-family" expected="[Arial, Times New Roman,
sans-serif]"/>
        <test:assert property="font-size" expected="20736mpt"/>
        <test:assert property="font-weight" expected="400" />
        <test:assert property="font-style" expected="NORMAL" />
        <test:assert property="line-height.optimum" expected="29030mpt" />
        <test:assert property="font-variant" expected="NORMAL" />
        Test font shorthand
      </fo:block> 

There seems to be a problem when the quote follows a comma.  I will add a patch
to font-shorthand-test.fo which demonstrates this problem, and will continue to
diagnose the code to see what further problems in the code are causing this
problem.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

Vincent Hennebert <vh...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |levinson@intersystems.com

--- Comment #2 from Vincent Hennebert <vh...@gmail.com> 2009-09-21 04:13:27 PDT ---
*** Bug 47859 has been marked as a duplicate of this bug. ***

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #3 from Vincent Hennebert <vh...@gmail.com> 2009-09-21 04:15:56 PDT ---
(In reply to comment #2)
> *** Bug 47859 has been marked as a duplicate of this bug. ***

See augmented patch to font-shorthand-test in bug report above.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 47709] Issue with parsing the 'font' shorthand

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=47709

--- Comment #4 from Vincent Hennebert <vh...@gmail.com> 2009-09-21 04:38:54 PDT ---
Hi Jonathan,

Thank you for your contribution to fixing this issue.

I think the whole parsing of this property needs to be refactored, because it's
unnecessarily complicated and doesn't work. I don't really get the purpose of
the "while (!fontFamilyParsed)" loop but it seems to me that it's doing a job
that should be left over to the FontFamilyProperty parser. Also, it doesn't
seem to handle the case where there are more than one comma (i.e., at least 3
families specified). If more than one font-style/-variant/-weight is specified,
it will try to parse the second value (say, font-variant) as a font family.
Also, it tries to apply a special treatment when system fonts (caption, icon,
menu...) are specified, whereas it is explicitly stated in the Recommendation
that they don't apply to XSL-FO.

I think the best way to handle that is to write a parser that follows the
grammar given by the Recommendation. Hopefully a simple LL parser will do [1],
but given that all of the sub-properties accept the 'inherit' keyword it will
probably not be LL(1). If an LL parser can't be used, then we will have to use
an LR one [2], which is slightly more problematic since it cannot be easily
coded by hand, and we will have to rely on an external parsing library.

I suggest you to look if an LL parser can be written to properly parse this
property, and report back here if it's not the case, and if we must consider
other solutions.

[1] http://en.wikipedia.org/wiki/LL_parser
[2] http://en.wikipedia.org/wiki/LR_parser

Thanks,
Vincent


(In reply to comment #1)
> Created an attachment (id=24277)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24277) [details]
> Allow case when no font-family is quoted and no comma-separated list of font
> families
> 
> Allow case when no font-family is quoted and no comma-separated list of font
> families.
> 
> The code in FontShorthandProperty is looking for the beginning of the
> font-family.  If the font-family contains commas then the code works as before.
>  If the font-family begins with a quote the code works as before.  There was
> one case missing - when the code is looking for the beginning of the font
> family but there are no commas and no quotes.
> 
> The font-family can be quite complex containing commas, single quoted strings,
> double quote strings, and strings without quotes but the parsing of this
> complex expression is taken care of by FontFamilyProperty.  All
> FontShorthandProperty does is look for where the font-family begins.  It almost
> did this correctly (before this patch) when the font-family contained commas or
> consisted of a quoted string.  Now there is the case when the font-family
> contains no quotes and no commas.
> 
> However, there is a fly in the ointment.  In testing this change (and in
> testing the previous unchanged code), I added the following test case, which
> fails in both cases.
> 
>      <fo:block font="xx-large/1.4 Arial, 'Times New Roman', sans-serif">
>         <test:assert property="font-family" expected="[Arial, Times New Roman,
> sans-serif]"/>
>         <test:assert property="font-size" expected="20736mpt"/>
>         <test:assert property="font-weight" expected="400" />
>         <test:assert property="font-style" expected="NORMAL" />
>         <test:assert property="line-height.optimum" expected="29030mpt" />
>         <test:assert property="font-variant" expected="NORMAL" />
>         Test font shorthand
>       </fo:block> 
> 
> There seems to be a problem when the quote follows a comma.  I will add a patch
> to font-shorthand-test.fo which demonstrates this problem, and will continue to
> diagnose the code to see what further problems in the code are causing this
> problem.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.