Posted to dev@flex.apache.org by Justin Mclean <ju...@classsoftware.com> on 2013/04/22 03:05:03 UTC

What's the easiest way to tell if a character is a letter or not?

Hi,

Anyone know how to work out if a character, including unicode characters, is a letter or not?

This is not the right way of doing it:

            if ("a" <= letter && letter <= "z" ||
                "A" <= letter && letter <= "Z")

From the DateFormatter class for the curious.

Thanks,
Justin




Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

>> Would actually currently pass - although it throws away the GMT+7 and you're left with whatever timezone your computer is in. (Another bug/feature).
> 
> does it actually "cast" the datetime into user's timezone (tz)? that would be pretty cool.

Nope, in this case it just takes the date and time as written and changes the time zone to the current one. It may depend on exactly how you are converting the string to a date; I've not tried Date.parse to see what it does.

Also (sadly) the Flash Player's Date class timezoneOffset property is read only.

Thanks,
Justin
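
The difference between "casting" a time into another zone and merely relabelling it can be sketched as follows. This is an illustration in Python rather than ActionScript, and both the sample time (taken from the Thai examples in this thread) and the choice of UTC as the "local" zone are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical input: 13:48:56 at GMT+7, as in the Thai examples in this thread.
gmt7 = timezone(timedelta(hours=7))
written = datetime(2013, 4, 23, 13, 48, 56, tzinfo=gmt7)

# Assumed local zone, for illustration only.
local = timezone.utc

# "Cast": the same instant expressed in the local zone, so the clock reading changes.
converted = written.astimezone(local)

# Relabel: the clock reading is kept and only the zone is swapped,
# so the result is a different instant.
relabelled = written.replace(tzinfo=local)

print(converted.isoformat())   # 2013-04-23T06:48:56+00:00
print(relabelled.isoformat())  # 2013-04-23T13:48:56+00:00
```

The behaviour described above corresponds to the second case: the GMT+7 is dropped and the written clock time is kept.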

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
On 4/23/2013 2:54 PM, Justin Mclean wrote:
> Hi,
>
>> no, it's a different calendar underlying the system.
> Sure, I'm not trying to fix Flex to cater for any calendar system (yet), just want to get some of the obvious and annoying date issues fixed.

well i wouldn't bother much w/locales that don't use the gregorian calendar.

>> 13 นาฬิกา 48 นาที 56 วินาที เวลาอินโดจีน
>> 13 นาฬิกา 48 นาที 56 วินาที GMT+7
> Would actually currently pass - although it throws away the GMT+7 and you're left with whatever timezone your computer is in. (Another bug/feature).

does it actually "cast" the datetime into user's timezone (tz)? that would be 
pretty cool.



Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> no, it's a different calendar underlying the system.
Sure, I'm not trying to fix Flex to cater for any calendar system (yet), just want to get some of the obvious and annoying date issues fixed.

> 13 นาฬิกา 48 นาที 56 วินาที เวลาอินโดจีน
> 13 นาฬิกา 48 นาที 56 วินาที GMT+7
Would actually currently pass - although it throws away the GMT+7 and you're left with whatever timezone your computer is in. (Another bug/feature).

> 13:48:56
> 13:48
And obviously so would these.

Thanks,
Justin

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
On 4/23/2013 2:50 PM, Justin Mclean wrote:
> Believe it or not, this does exist in Flash and is used by DateTimeFormatter.

i guess i should read more ;-)

> The issue with this class however is there is no way to parse a string into a
> date (the issue here), just format a date object in a nice way. We are dealing
> with user input here and they are not all Java programmers :-)

think i'm getting confused between java/cf & AS3. let me see how smart 
Date.parse() is.

in java/cf, if you follow the java style masks, yes you can usually parse 
datetime strings back into Date. if you allow any mask the developer feels like, 
the probability shrinks.

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> maybe add another method or change the way the format methods worked. taking coldfusion as a perfect example ;-) it has lsDateFormat method that has the following signature
> 
> lsDateFormat(date,mask,locale)
> 
> where mask can be user supplied but i argue for following the java style of FULL-->SHORT to avoid crazy/un-parseable date formats.

Believe it or not, this does exist in Flash and is used by DateTimeFormatter.

The issue with this class however is there is no way to parse a string into a date (the issue here), just format a date object in a nice way. We are dealing with user input here and they are not all Java programmers :-)

Thanks,
Justin

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
On 4/23/2013 6:04 AM, Justin Mclean wrote:
> It should be able to cope with that. Bit of a moot point as there is no Thai
> locale for the SDK but I guess you could use copylocale and give it a go.

no, it's a different calendar underlying the system. currently it's the year
2556BE (buddhist era) which you can obtain by adding 543 to the current CE 
year. you can make a gregorian calendar date look like a thai buddhist calendar 
date by supplying the thai language date parts & adding 543 to the CE year but 
it would be wrong.

23/4/56 is a tuesday in the thai buddhist calendar (year 2556BE)
23/4/56 is a friday in a faked thai date using gregorian calendar (year 2556CE).

and in some locales for the buddhist calendar there's a year 0 (want to say in 
myanmar but don't remember exactly).

while there's a lot of correspondence between the buddhist & gregorian calendars 
(ie enough that you can fake it w/a wrapper to handle the year part as the 
months match up exactly) arabic/persian locales are never going to work that 
way. those calendars use completely different calculations.
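
A small sketch of the arithmetic above, in Python for illustration. It assumes only the +543 offset stated in this message; the helper name and the weekday table are made up for the example. It shows why reading the Buddhist year 2556 as if it were a Gregorian (CE) year lands on a different weekday.

```python
from datetime import date

BE_OFFSET = 543  # Buddhist Era year = CE year + 543, as stated above
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")

def to_buddhist_year(ce_year):
    """Hypothetical helper: convert a CE year to a Buddhist Era year."""
    return ce_year + BE_OFFSET

# 23/4/2556 BE is the Gregorian day 23 April 2013.
real = date(2013, 4, 23)

# The mistake: treating 2556 as a CE year on the Gregorian calendar.
faked = date(to_buddhist_year(2013), 4, 23)

print(to_buddhist_year(2013))   # 2556
print(DAYS[real.weekday()])     # Tuesday
print(DAYS[faked.weekday()])    # Friday
```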

> It probably wouldn't deal with the 6 hour (vs 12 hour) time system but I've
> no idea how Thai dates are normally written down. Do you have some examples?

no need, see above. and in any case, thai time format is 24-hour clock system, 
no AM/PM. though informally people use 1-6 plus period of day (late night, 
morning, afternoon & evening)--"bye 2" 2nd hour of the afternoon (2:00pm). but 
in case you were wondering (full-->short):

13 นาฬิกา 48 นาที 56 วินาที เวลาอินโดจีน
13 นาฬิกา 48 นาที 56 วินาที GMT+7
13:48:56
13:48

where:
นาฬิกา is roughly "o'clock" or time
นาที is minute
วินาที is second
เวลาอินโดจีน is timezone name, indochina/ICT


>> though i think it would be better to supply the CLDR vetted date parts
> Would be the ideal solution - no idea how much work that would be.

maybe add another method or change the way the format methods worked. taking 
coldfusion as a perfect example ;-) it has lsDateFormat method that has the 
following signature

lsDateFormat(date,mask,locale)

where mask can be user supplied but i argue for following the java style of 
FULL-->SHORT to avoid crazy/un-parseable date formats.

but to get this to work globally, flex would need more calendars. pretty sure 
that would take a lot of work, i wouldn't even know where to begin to swap out the 
gregorian calendar for a buddhist one (guessing easiest to port). unless it was 
already done & waiting to be donated?

> What no Klingon dates? :-)

klingon comes up for inclusion in unicode every so often but the delegates 
usually get drunk & start a riot so nothing ever comes of it ;-) but you'd need 
a klingon calendar anyway.


Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> won't matter for countries that don't use the gregorian calendar (like thailand) as the dates will be wrong anyway.
It should be able to cope with that. Bit of a moot point as there is no Thai locale for the SDK, but I guess you could use copylocale and give it a go.

It probably wouldn't deal with the 6 hour (vs 12 hour) time system, but I've no idea how Thai dates are normally written down. Do you have some examples?

>  though i think it would be better to supply the CLDR vetted date parts
Would be the ideal solution - no idea how much work that would be.

> & not allow users to shoot their feet off w/wacky, non-standard date part names 
What no Klingon dates? :-)

Justin

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> Should the checks be based on locale then?  They could be smaller checks based on one locale per... It will keep from trying to have 1 set of condition statements to rule them all.

Currently it gets the month names for the compiled locale and looks for those (well, the first 3 letters of them - I wonder if there is any language where the first 3 letters of different months are the same?). Short story: it should pick up any character there, so no issues as far as I know.

The issue with Japanese dates is that it doesn't recognise the separators between the year, month and day, and the format I posted doesn't use space, "/" or "." to separate the date parts.

Thanks,
Justin


RE: What's the easiest way to tell if a character is a letter or not?

Posted by Kessler CTR Mark J <ma...@usmc.mil>.
Should the checks be based on locale then?  They could be smaller checks based on one locale per... It will keep from trying to have 1 set of condition statements to rule them all.



Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
On 4/22/2013 9:57 PM, Alex Harui wrote:
> My first thought was: 'what about Asia?'

won't matter for countries that don't use the gregorian calendar (like thailand) 
as the dates will be wrong anyway.

for countries that "sort" of use the gregorian calendar, like china, yes you'd 
have to expand the unicode range. though i think it would be better to supply 
the CLDR vetted date parts & not allow users to shoot their feet off w/wacky, 
non-standard date part names which also might make parsing them back to Dates 
hard or impossible.


RE: What's the easiest way to tell if a character is a letter or not?

Posted by Gordon Smith <go...@adobe.com>.
A good utility would be a getUnicodeCategory() API.

- Gordon


RE: What's the easiest way to tell if a character is a letter or not?

Posted by Gordon Smith <go...@adobe.com>.
There is no easy way to tell. You need a large lookup table containing all the characters in the LC, Ll, Lm, Lo, Lt, and Lu categories.

http://www.fileformat.info/info/unicode/category/index.htm

- Gordon
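
For comparison, runtimes that ship the Unicode character database make this lookup a one-liner. The sketch below uses Python's standard `unicodedata` module (not anything available in Flash Player) to show what a category-based letter test looks like; the function name is made up for the example. LC is omitted from the set because it is a grouping of Lu, Ll and Lt and is never reported for an individual character.

```python
import unicodedata

# General categories that count as letters.
LETTER_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu"}

def is_letter(ch):
    """Return True if the single character ch is in a Unicode letter category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES

for ch in ["a", "é", "ğ", "ß", "年", "น", "3", "/", " "]:
    print(repr(ch), unicodedata.category(ch), is_letter(ch))
```

One caveat for the date-parsing case: Thai words such as นาฬิกา contain vowel marks in category Mn (non-spacing mark), so a strict letter test would still split them. A parser skipping month or time names would probably need to accept the M categories as well.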


Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
On 4/22/2013 10:23 PM, Justin Mclean wrote:
> I'm not that familiar with Chinese date formats.. Can someone provide me with some examples?

gregorian calendar, using latest icu4j lib (running full,long,medium,short formats):

zh_CN/zh_HK locales
date==================
2013年4月22日星期一
2013年4月22日
2013年4月22日
13-4-22
time==================
下午11:58:57 [印度支那時間]
下午11:58:57 [GMT+7]
下午11:58:57
下午11:58

zh_TW locale
date==================
2013年4月22日星期一
2013年4月22日
2013/4/22
2013/4/22

time==================
印度支那時間上午12時00分24秒
GMT+7上午12時00分24秒
上午12:00:24
上午12:00


i'm in bangkok, so the tz is UTC+7.

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> My first thought was: 'what about Asia?'

Both before and after my change this fails:
var date:Date = DateFormatter.parseDateString("2008年12月31日");
var df:DateFormatter = new DateFormatter("DD MMM YYYY");

This is Japanese, for those who don't know, and it fails as it doesn't recognise 年, 月 or 日 as separators.

I'm not that familiar with Chinese date formats.. Can someone provide me with some examples?

Thanks,
Justin
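
A minimal sketch of treating 年, 月 and 日 as date-part markers, written in Python for illustration. It is not the DateFormatter implementation; the pattern and function name are assumptions, and it only handles the year-month-day order shown in the example above.

```python
import re
from datetime import date

# Year, month and day, each followed by its marker character.
CJK_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")

def parse_cjk_date(text):
    """Hypothetical helper: parse 'YYYY年M月D日' at the start of text, else None."""
    m = CJK_DATE.match(text)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)

print(parse_cjk_date("2008年12月31日"))  # 2008-12-31
print(parse_cjk_date("31/12/2008"))      # None
```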

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Alex Harui <ah...@adobe.com>.


On 4/22/13 3:47 AM, "Harbs" <ha...@gmail.com> wrote:

> A quick test on this seems to work with all characters in the latin range with
> the exception of ß (probably because there's no official capital and
> lowercase…)
My first thought was: 'what about Asia?'


-- 
Alex Harui
Flex SDK Team
Adobe Systems, Inc.
http://blogs.adobe.com/aharui


Re: What's the easiest way to tell if a character is a letter or not?

Posted by Harbs <ha...@gmail.com>.
A quick test on this seems to work with all characters in the latin range with the exception of ß (probably because there's no official capital and lowercase…)



Re: What's the easiest way to tell if a character is a letter or not?

Posted by Harbs <ha...@gmail.com>.
What about this:

if(letter.toLowerCase() != letter.toUpperCase()){
	// do your stuff...
}

I have no idea how performance compares to figuring it out yourself though…
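
To see what the case-comparison test does and does not catch, here is the same idea sketched in Python (an illustration only; the function name is made up, and case-mapping tables differ between runtimes, so Flash Player may give different answers for some characters).

```python
def has_case(ch):
    """Return True if ch changes under case conversion, i.e. it is a cased letter."""
    return ch.lower() != ch.upper()

for ch in ["a", "Z", "é", "ğ", "年", "น", "1", "/"]:
    print(repr(ch), has_case(ch))
```

Accented Latin letters such as é and ğ pass, but scripts without upper and lower case (Chinese, Japanese, Thai and others) are never reported as letters. In Python, `"ß".upper()` is `"SS"`, so ß counts as cased there, which differs from the Flash result reported elsewhere in this thread.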


Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
On 4/22/2013 9:23 AM, Justin Mclean wrote:

> Would that catch characters with accents and the like? e.g. "Février", "Décembre", "Ağustos" or "Eylül"?

no, you'd need to extend the range a bit, 0x0041-0x00FF.

though it depends on the languages/locales you wanted to support, which is 
somewhat limited by the locales that default to gregorian calendar. for 
instance, going to support greek, cyrillic, etc.? then you'd need a wider range 
of codepoints.

a good resource for date part names, if you don't mind a little java, is ICU4J's 
com.ibm.icu.text.DateFormatSymbols class. give it a locale & it will dump out 
full date part names, short ones, etc. based on CLDR, which is about as official 
as you're going to get. there's also this (though no source is provided):

http://as3localedata.riaforge.org/


btw i thought someone had ported ICU4J's calendars to AS3. wondering what 
happened to that?




Re: What's the easiest way to tell if a character is a letter or not?

Posted by Nicholas Kwiatkowski <ni...@spoon.as>.
It looks like you will need to add the following :  192 - 255, exclusive of
247 & 215 (multiplication and division symbol codes).  That should cover
all the latin characters as well, but it won't do anything for non-latin
characters.

http://ascii-table.com/ansi-codes.php

You may have to add 181, 167, 159, 158, 156, 154, 142 and 138, but I'm not
sure how common those are in non-english languages, and how many of those
are truly considered "symbols".

-Nick
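
The ranges above can be sketched like this (Python for illustration; the function name is made up). It covers A-Z, a-z and 192-255 minus the multiplication and division signs, and leaves out the optional codes listed above.

```python
def is_latin1_letter(ch):
    """Letter test using only the code ranges suggested above."""
    code = ord(ch)
    if 65 <= code <= 90 or 97 <= code <= 122:
        return True  # A-Z, a-z
    # Accented Latin-1 letters, minus the multiplication (215) and division (247) signs.
    return 192 <= code <= 255 and code not in (215, 247)

for word in ["Février", "Décembre", "Eylül", "Ağustos"]:
    print(word, all(is_latin1_letter(c) for c in word))
```

"Février", "Décembre" and "Eylül" pass, but "Ağustos" does not: ğ is code point 287 (U+011F), which lies outside the 0-255 range.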



Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> If I understand you, I think that is actually the right solution.  I think
> you want to identify if a sequence of chars is a "valid format pattern".
This bit of the code is just skipping month names and is broken for locales with anything outside a-z, A-Z in them.

This is in parseDateString in DateFormatter.

Thanks,
Justin

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Alex Harui <ah...@adobe.com>.


On 4/21/13 8:43 PM, "Justin Mclean" <ju...@classsoftware.com> wrote:

> Hi,
> 
> Thanks Paul for all of that - very interesting and possibly one way to go as
> the code is under MIT licence.
> 
> I'm now thinking the quick fix solution is to skip anything that's not a space
> or format character as that's the smaller set. At this point in the code it
> doesn't really care if the month name is actually a valid one.
> 
If I understand you, I think that is actually the right solution.  I think
you want to identify if a sequence of chars is a "valid format pattern".
After all, what is the definition of a 'letter', and why aren't symbols
allowed?

-- 
Alex Harui
Flex SDK Team
Adobe Systems, Inc.
http://blogs.adobe.com/aharui


Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

Thanks Paul for all of that - very interesting and possibly one way to go as the code is under the MIT licence.

I'm now thinking the quick fix solution is to skip anything that's not a space or format character as that's the smaller set. At this point in the code it doesn't really care if the month name is actually a valid one.

Thanks,
Justin
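
A rough sketch of that quick fix, in Python for illustration. Instead of asking "is this a letter?", it treats any run of characters that are not digits and not separators as a word to skip. The separator set and the function name are assumptions; the thread does not spell out which characters count as format characters.

```python
# Assumed separator set; the thread does not list the actual format characters.
SEPARATORS = set(" \t/-.,:")

def tokenize(text):
    """Split a date string into runs of digits and runs of everything else."""
    tokens = []
    current = ""
    current_is_digit = False
    for ch in text:
        if ch in SEPARATORS:
            # A separator ends the current token and is itself dropped.
            if current:
                tokens.append(current)
            current = ""
            continue
        is_digit = ch.isdigit()
        # Switching between digits and non-digits also starts a new token.
        if current and is_digit != current_is_digit:
            tokens.append(current)
            current = ""
        current += ch
        current_is_digit = is_digit
    if current:
        tokens.append(current)
    return tokens

print(tokenize("31 Décembre 2008"))  # ['31', 'Décembre', '2008']
print(tokenize("2008年12月31日"))     # ['2008', '年', '12', '月', '31', '日']
```

Because no letter test is involved, accented and non-Latin month names come through as single tokens without any Unicode tables.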

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Paul Hastings <pa...@gmail.com>.
take that back, http://as3localedata.riaforge.org/ does have source.

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> You would probably want to use the charCodeAt() function on the string.

Would that catch characters with accents and the like? e.g. "Février", "Décembre", "Ağustos" or "Eylül"?

That's the issue.

Thanks,
Justin

Re: What's the easiest way to tell if a character is a letter or not?

Posted by Nicholas Kwiatkowski <ni...@spoon.as>.
You would probably want to use the charCodeAt() function on the string.
 Assuming you want to search for printable, latin characters, the codes
would be 65 - 90 (upper case A-Z) and 97 - 122 (lower case a-z).

if (((letter.charCodeAt(0) >= 65) && (letter.charCodeAt(0) <= 90)) ||
    ((letter.charCodeAt(0) >= 97) && (letter.charCodeAt(0) <= 122)))
  {
     // stuff
  }

If you don't care if you catch the printable characters of [\]^_`  you
could just use the range of 65 -> 122.  A full ASCII code table is at
http://ascii-table.com/ascii.php    (or in an MS-DOS book, if you have one
available)

-Nick

