Posted to fop-dev@xmlgraphics.apache.org by Manuel Mall <mm...@arcus.com.au> on 2005/10/05 09:46:18 UTC
script property
While I am at it (this whole alignment stuff I mean) we may as well do
it properly. This would include support for the "script" property. The
allowed values for script are defined for example here:
http://www.unicode.org/iso15924/iso15924-codes.html.
I assume we don't bother to validate whether a correct code has been
provided, as we don't do that for the "country" and "language"
properties either (should we? If we do, we need more external config
files, or we expand fop.xconf to hold those values, as they tend to change
over time).
But what we do need is a mapping from scripts to default baselines for
these scripts. I haven't found a mapping list on the net. Has anyone come
across something like that? Otherwise we may have to make that up. That
means entries somewhere similar to: <script code="Guru"
baseline="hanging" />. Is the fop config file the right place for this
stuff? Any undefined scripts encountered in an fo file would map to
baseline="alphabetic" (maybe with a warning to the user?).
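As a rough illustration of the lookup described above, here is a minimal Java sketch. The class name ScriptBaselines and its methods are hypothetical and not part of FOP; only the "Guru"/"hanging" entry and the "alphabetic" fallback come from the message. How the entries would be loaded (config file or resource file) is left open.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not an existing FOP class.
final class ScriptBaselines {
    static final String DEFAULT_BASELINE = "alphabetic";

    // Keys are lower-cased ISO 15924 codes so "Guru" and "guru" match.
    private final Map<String, String> baselines = new HashMap<>();

    // Registers one mapping, the equivalent of one script/baseline entry.
    void register(String scriptCode, String baseline) {
        baselines.put(normalize(scriptCode), baseline);
    }

    // Returns the registered baseline, or "alphabetic" plus a warning
    // when the script has no entry.
    String baselineFor(String scriptCode) {
        String baseline = baselines.get(normalize(scriptCode));
        if (baseline == null) {
            System.err.println("Warning: no baseline defined for script '"
                    + scriptCode + "', using " + DEFAULT_BASELINE);
            return DEFAULT_BASELINE;
        }
        return baseline;
    }

    private static String normalize(String code) {
        return code.trim().toLowerCase(Locale.ROOT);
    }
}
```

With register("Guru", "hanging") applied, baselineFor("Guru") returns "hanging", while any unregistered code falls back to "alphabetic" and prints a warning.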
What we also need for proper script support is a mapping from Unicode
code point to script. The mappings are for example defined here:
http://www.unicode.org/Public/UNIDATA/Scripts.txt.
How would one best process this (has this been done in FOP before?)?
Is there other Unicode stuff FOP needs which should be considered at the
same time?
Are we better off working with the "raw" Unicode data
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)?
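One possible way to process Scripts.txt is sketched below, assuming its usual layout of one entry per line ("0041..005A ; Latin # comment" for a range, "00AA ; Latin # comment" for a single code point). The class ScriptRanges is a hypothetical name for illustration. It keeps the ranges sorted and answers lookups by binary search; this is just one option, not something the thread settled on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical parser for lines in the Scripts.txt layout.
final class ScriptRanges {
    // One inclusive code point range and the script name assigned to it.
    static final class Range {
        final int start;
        final int end;
        final String script;

        Range(int start, int end, String script) {
            this.start = start;
            this.end = end;
            this.script = script;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    // Parses the given lines; text after '#' and blank lines are ignored.
    static ScriptRanges parse(Iterable<String> lines) {
        ScriptRanges result = new ScriptRanges();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }
            String[] fields = data.split(";");
            if (fields.length < 2) {
                continue; // not a "range ; script" entry
            }
            String[] bounds = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0].trim(), 16);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1].trim(), 16) : start;
            result.ranges.add(new Range(start, end, fields[1].trim()));
        }
        result.ranges.sort(Comparator.comparingInt(r -> r.start));
        return result;
    }

    // Binary search over the sorted, non-overlapping ranges.
    // Returns null when no range covers the code point.
    String scriptOf(int codePoint) {
        int lo = 0;
        int hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Range r = ranges.get(mid);
            if (codePoint < r.start) {
                hi = mid - 1;
            } else if (codePoint > r.end) {
                lo = mid + 1;
            } else {
                return r.script;
            }
        }
        return null;
    }
}
```

Parsing once at build time and shipping the result as a resource would avoid reading the text file every time FOP starts.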
Manuel
Re: script property
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Manuel Mall wrote:
> It doesn't quite solve all issues though I think:
Correct.
> May be a wrapper around this class to provide that functionality?
Given that we must get data from Unicode files anyway, we
could as well have our own implementation for everything.
J.Pietschmann
Re: script property
Posted by Manuel Mall <mm...@arcus.com.au>.
On Fri, 7 Oct 2005 03:30 am, J.Pietschmann wrote:
> Manuel Mall wrote:
> > What we also need for proper script support is a mapping from
> > Unicode code point to script.
>
> On a second thought: isn't this what Class Character.UnicodeBlock
> does?
>
Joerg,
Thank you - I didn't even know that this class existed.
It doesn't quite solve all the issues though, I think:
a) We need a mapping from the ISO 4 letter codes to the
Character.UnicodeBlock classes.
b) We need a mapping from the Character.UnicodeBlock to script
properties (actually at this point in time the only property I am aware
of is the default baseline for the script).
Maybe a wrapper around this class could provide that functionality?
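A wrapper along the lines of (a) and (b) might look like the following sketch. Character.UnicodeBlock.of(int) and the GURMUKHI and BASIC_LATIN constants are standard JDK API; the class BlockScriptInfo, its methods, and the idea of registering blocks per script code are assumptions made for illustration. Because one script can span several blocks, the sketch maps each block to a script code rather than the other way round.

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around Character.UnicodeBlock, not an FOP class.
final class BlockScriptInfo {
    private static final String DEFAULT_BASELINE = "alphabetic";

    // (a) block -> ISO 15924 script code; several blocks may share a code.
    private final Map<UnicodeBlock, String> scriptByBlock = new HashMap<>();
    // (b) script code -> default baseline.
    private final Map<String, String> baselineByScript = new HashMap<>();

    void addBlock(UnicodeBlock block, String isoScriptCode) {
        scriptByBlock.put(block, isoScriptCode);
    }

    void setBaseline(String isoScriptCode, String baseline) {
        baselineByScript.put(isoScriptCode, baseline);
    }

    // Returns the registered script code, or null if the code point's
    // block is unassigned or has not been registered.
    String scriptOf(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        return block == null ? null : scriptByBlock.get(block);
    }

    // Falls back to "alphabetic" when nothing is known about the script.
    String baselineOf(int codePoint) {
        String script = scriptOf(codePoint);
        String baseline = script == null ? null : baselineByScript.get(script);
        return baseline != null ? baseline : DEFAULT_BASELINE;
    }
}
```

Note that blocks and scripts are not the same thing, so a block-based mapping is only an approximation. Java 7 later added Character.UnicodeScript, which maps code points to scripts directly; it was not available when this thread was written.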
> J.Pietschmann
Manuel
Re: script property
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Manuel Mall wrote:
> What we also need for proper script support is a mapping from Unicode
> code point to script.
On second thought: isn't this what the class Character.UnicodeBlock
does?
J.Pietschmann
Re: script property
Posted by "Peter B. West" <li...@pbw.id.au>.
Manuel Mall wrote:
> On Wed, 5 Oct 2005 04:17 pm, Jeremias Maerki wrote:
>
>>On 05.10.2005 09:46:18 Manuel Mall wrote:
>>
>>>While I am at it (this whole alignment stuff I mean) we may as well
>>>do it properly. This would include support for the "script"
>>>property. The allowed values for script are defined for example
>>>here:
>>>http://www.unicode.org/iso15924/iso15924-codes.html.
>>>
>>>I assume we don't bother to validate if a correct code has been
>>>provided as we don't do that for the "country" and "language"
>>>properties either (should we? If we do we need more external config
>>>files or expand fop.xconf to hold those values as they tend to
>>>change over time).
>>
>>We don't have to but we could. Since this is not something that
>>changes often I wouldn't put it into the config file, but in resource
>>files instead.
>>
>
> OK - makes sense.
>
Validation issues considered in alt-design circa 2002. See
CountryLanguageScript.java in the alt-design code for an attempt at
this. Generated from xml-lang.xml and xml-lang.xsl. No baselines.
>
Peter
--
Peter B. West <http://cv.pbw.id.au/>
Folio <http://defoe.sourceforge.net/folio/>
Re: script property
Posted by Manuel Mall <mm...@arcus.com.au>.
On Wed, 5 Oct 2005 04:17 pm, Jeremias Maerki wrote:
> On 05.10.2005 09:46:18 Manuel Mall wrote:
> > While I am at it (this whole alignment stuff I mean) we may as well
> > do it properly. This would include support for the "script"
> > property. The allowed values for script are defined for example
> > here:
> > http://www.unicode.org/iso15924/iso15924-codes.html.
> >
> > I assume we don't bother to validate if a correct code has been
> > provided as we don't do that for the "country" and "language"
> > properties either (should we? If we do we need more external config
> > files or expand fop.xconf to hold those values as they tend to
> > change over time).
>
> We don't have to but we could. Since this is not something that
> changes often I wouldn't put it into the config file, but in resource
> files instead.
>
OK - makes sense.
> > But what we do need is a mapping from scripts to default baselines
> > for these scripts. I haven't found a mapping list on the net. Any
> > one come across something like that?
>
> Nope.
>
> > Otherwise we may have to make that up. That
> > means entries somewhere similar to: <script code="Guru"
> > baseline="hanging" />. Is the fop config file the right place for
> > this stuff?
>
> Again, I'd put it in separate resource files as this is not going to
> change often and a rebuild of FOP is not the end of the world in this
> case.
My suggestion was based on the assumption that if we have to make up
the mappings from script to baseline ourselves, we may get them wrong.
Therefore, leave it up to users to add the mappings for their
language/script environment to the config file. Most users will deal
with only a very few scripts, so it's not a big deal.
>
> > Any not defined scripts encountered in an fo file would map to
> > baseline="alphabetic" (may be with a warning to the user?).
>
> Sure.
>
> > What we also need for proper script support is a mapping from
> > Unicode code point to script. The mappings are for example defined
> > here: http://www.unicode.org/Public/UNIDATA/Scripts.txt.
> > How would one best process this?
>
> <shrug/>
>
> > (has this been done in FOP before?)
>
> I don't think so.
>
See Joerg's response.
> > Is there other Unicode stuff FOP needs which should be considered
> > at the same time?
> > Are we better off working with the "raw" Unicode data
> > (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)?
>
> <shrug/>
Seems like line breaking (and hyphenation, e.g. script specific
hyphenation character) may also need Unicode stuff (not necessarily
from the raw data file though).
>
> We should simply make sure that this doesn't influence performance
> too much for the big majority of users happy to use latin scripts.
> After all, this looks like many lookups are necessary and all these
> maps have to be loaded at one point.
>
Yes, that is a valid consideration. Maybe it needs to be designed in
such a way that these lookups can be disabled and replaced by defaults
from the config file.
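One way to read this suggestion is sketched below: a switch that bypasses the Unicode tables entirely and loads them only on first use when enabled. All names here (ScriptLookup, the constructor parameters) are hypothetical, and the sketch is not thread-safe; it only shows the shape of the idea.

```java
import java.util.function.IntFunction;
import java.util.function.Supplier;

// Hypothetical switchable lookup, not an FOP class.
final class ScriptLookup {
    private final boolean lookupsEnabled;
    private final String defaultScript;
    private final Supplier<IntFunction<String>> tableLoader;
    private IntFunction<String> table; // loaded lazily on first use

    ScriptLookup(boolean lookupsEnabled, String defaultScript,
                 Supplier<IntFunction<String>> tableLoader) {
        this.lookupsEnabled = lookupsEnabled;
        this.defaultScript = defaultScript;
        this.tableLoader = tableLoader;
    }

    // Returns the configured default when lookups are disabled, so the
    // table is never loaded; otherwise consults the table and falls
    // back to the default for unknown code points.
    String scriptOf(int codePoint) {
        if (!lookupsEnabled) {
            return defaultScript;
        }
        if (table == null) {
            table = tableLoader.get();
        }
        String script = table.apply(codePoint);
        return script != null ? script : defaultScript;
    }
}
```

Users who only process Latin text would then pay neither the load time nor the per-character lookup cost.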
>
> Jeremias Maerki
Manuel
Re: script property
Posted by Manuel Mall <mm...@arcus.com.au>.
On Thu, 6 Oct 2005 04:23 am, J.Pietschmann wrote:
> Jeremias Maerki wrote:
> >> What we also need for proper script support is a mapping from
> >> Unicode code point to script.
>
> ...
>
> >> (has this been done in FOP before?)
> >
> > I don't think so.
>
> Have a look at
> http://people.apache.org/~pietsch/linebreak.tar.gz
>
> Occasionally I've thought about some sort of Jakarta commons
> Unicode file component, but the guys there weren't all that
> enthusiastic about this, and I've not enough time to get
> the ball rolling all of my own.
>
Joerg,
thanks for that.
Do I understand correctly that you use a Java code generation
approach here? That is, you generate Java source code from the Unicode
text files, which is then compiled as part of the line breaking code?
I'm not so sure I like that, but then again, if it works... To me this
type of stuff feels more like pure data, but of course we don't want to
parse these text files each time FOP loads. What about the hyphenation
pattern approach? Store it as a serialized object and treat it more
like a resource? Accessing that should be comparable in time to class
loading (I think; I have never tested that empirically).
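The serialized-object idea could be sketched roughly as follows, using standard Java serialization. The class SerializedScriptTable and the choice of a TreeMap keyed by range start are assumptions for illustration only; at build time the table would be written to a file, and at run time it would be read back from a resource stream instead of parsing the Unicode text files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.TreeMap;

// Hypothetical helper showing the store-once, load-as-resource approach.
final class SerializedScriptTable {
    // Build-time step: turn the parsed table into bytes that can be
    // written to a resource file.
    static byte[] toBytes(TreeMap<Integer, String> table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Run-time step: read the table back from any stream, for example
    // one obtained from Class.getResourceAsStream.
    @SuppressWarnings("unchecked")
    static TreeMap<Integer, String> load(InputStream in) {
        try (ObjectInputStream objects = new ObjectInputStream(in)) {
            return (TreeMap<Integer, String>) objects.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this loads faster than generated classes would still need to be measured.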
I haven't studied your code in detail, but could we / should we integrate
this into the FOP trunk to support 'Unicode compliant' line breaking?
My main goal is still to make FOP happen, so I wouldn't like to
dilute my effort / time by arguing for / establishing another
commons subproject at the moment. What if we create an
org.apache.fop.unicode package for the time being, where we keep
Unicode-specific support stuff? That can then be refactored at a later
stage into a commons subproject if the time/will/energy is there.
> J.Pietschmann
Manuel
Re: script property
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Jeremias Maerki wrote:
>> What we also need for proper script support is a mapping from Unicode
>> code point to script.
...
>> (has this been done in FOP before?)
>
> I don't think so.
Have a look at
http://people.apache.org/~pietsch/linebreak.tar.gz
Occasionally I've thought about some sort of Jakarta commons
Unicode file component, but the guys there weren't all that
enthusiastic about this, and I've not enough time to get
the ball rolling all of my own.
J.Pietschmann
Re: script property
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 05.10.2005 09:46:18 Manuel Mall wrote:
> While I am at it (this whole alignment stuff I mean) we may as well do
> it properly. This would include support for the "script" property. The
> allowed values for script are defined for example here:
> http://www.unicode.org/iso15924/iso15924-codes.html.
>
> I assume we don't bother to validate if a correct code has been
> provided as we don't do that for the "country" and "language"
> properties either (should we? If we do we need more external config
> files or expand fop.xconf to hold those values as they tend to change
> over time).
We don't have to but we could. Since this is not something that changes
often I wouldn't put it into the config file, but in resource files
instead.
> But what we do need is a mapping from scripts to default baselines for
> these scripts. I haven't found a mapping list on the net. Any one come
> across something like that?
Nope.
> Otherwise we may have to make that up. That
> means entries somewhere similar to: <script code="Guru"
> baseline="hanging" />. Is the fop config file the right place for this
> stuff?
Again, I'd put it in separate resource files as this is not going to
change often and a rebuild of FOP is not the end of the world in this
case.
> Any not defined scripts encountered in an fo file would map to
> baseline="alphabetic" (may be with a warning to the user?).
Sure.
> What we also need for proper script support is a mapping from Unicode
> code point to script. The mappings are for example defined here:
> http://www.unicode.org/Public/UNIDATA/Scripts.txt.
> How would one best process this?
<shrug/>
> (has this been done in FOP before?)
I don't think so.
> Is there other Unicode stuff FOP needs which should be considered at the
> same time?
> Are we better off working with the "raw" Unicode data
> (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)?
<shrug/>
We should simply make sure that this doesn't influence performance too
much for the big majority of users happy to use latin scripts. After all,
this looks like many lookups are necessary and all these maps have to be
loaded at one point.
Jeremias Maerki