Posted to dev@commons.apache.org by Steve Cohen <sc...@javactivity.org> on 2004/09/25 17:19:55 UTC

[NET] Designing a Date Format-aware FTP Entry Parser

Designing a Date Format-aware FTP Entry Parser

After having percolated on the back burner for several years as an unresolved 
issue, there is finally some momentum toward solving the problem of parsing 
FTP entries from servers which format the file timestamps in the directory 
listings in a format other than the NetComponents “standard”. 

In order to understand what must be done, it would be helpful to understand 
what we now do.  In brief, we are using a regular expression to achieve 
basically the same results as attempting to parse the date portion of the 
listing with one of two alternate java.text.SimpleDateFormats in the en_US 
locale:
1. MMM dd HH:mm for dates within one year of the current time
2. MMM dd yyyy for dates older than one year.

Additionally, these formats presume some timezone, which is either the local 
timezone of the server or GMT, I presume.
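
As a rough illustration of the two formats just described, the following sketch tries the "recent" pattern first and falls back to the "older" one. The class and method names are made up for this example, and the real code uses a regular expression rather than SimpleDateFormat, so treat it only as an approximation of the current behaviour.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Calendar;
import java.util.Locale;

// Hypothetical illustration; not part of commons-net.
class TwoFormatSketch {
    // Try the "recent" format first, then the "older" one.
    static Date parseListingDate(String text) throws ParseException {
        SimpleDateFormat recent = new SimpleDateFormat("MMM dd HH:mm", Locale.US);
        SimpleDateFormat older = new SimpleDateFormat("MMM dd yyyy", Locale.US);
        try {
            // The text carries no year, so the parsed year defaults to 1970.
            return recent.parse(text);
        } catch (ParseException e) {
            return older.parse(text);
        }
    }

    public static void main(String[] args) throws ParseException {
        Calendar cal = new GregorianCalendar();
        cal.setTime(parseListingDate("Sep 25 2003"));
        System.out.println("year: " + cal.get(Calendar.YEAR));
        cal.setTime(parseListingDate("Sep 25 17:19"));
        System.out.println("hour: " + cal.get(Calendar.HOUR_OF_DAY));
    }
}
```

Both formats use the JVM's default time zone here, which is exactly the kind of silent assumption mentioned above.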

The alternative mechanism that I am proposing would remove the parsing of the 
timestamp from the responsibilities of the regular expression and unload this 
onto some other object. 

But what object?  The obvious candidate would be java.text.DateFormat.  This 
abstract class allows a formatter object to be created on the basis of some 
formatting codes defined in DateFormat (“LONG, MEDIUM, SHORT”) and a Locale.  
But this is problematic because what is meant by MEDIUM in en_US is a string 
like “Sep 25, 2004” while in “de_DE”, you get a string like “25.09.2004”.  
This just won't do.  So we have to fall back on java.text.SimpleDateFormat, 
passing in both a specific formatting string and a Locale, which provides the 
month names, etc.  (By the way, has anyone ever noticed that SimpleDateFormat 
is actually less simple than DateFormat?) :-)
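
To see the difference in practice, here is a small sketch that formats the same date with the locale-dependent MEDIUM style and with an explicit pattern. The class and method names are invented for the example; the exact strings printed depend on the JDK's locale data, so no output is claimed here.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical illustration of DateFormat styles versus explicit patterns.
class MediumVsPattern {
    // The MEDIUM style lets the Locale decide both field order and separators.
    static String medium(Date d, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(d);
    }

    // An explicit pattern fixes the field order; the Locale only supplies the names.
    static String fixedPattern(Date d, Locale locale) {
        return new SimpleDateFormat("MMM dd yyyy", locale).format(d);
    }

    public static void main(String[] args) {
        Date d = new GregorianCalendar(2004, Calendar.SEPTEMBER, 25).getTime();
        System.out.println(medium(d, Locale.US));
        System.out.println(medium(d, Locale.GERMANY));
        System.out.println(fixedPattern(d, Locale.US));
        System.out.println(fixedPattern(d, Locale.GERMANY));
    }
}
```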

The regular expression would merely extract from the listing the entire 
timestamp portion and delegate the task of parsing it to a pair of  
SimpleDateFormat objects (one for less than 1 year old and the other for one 
year old or older), each constructed on the basis of a format string and a 
locale.  Since the Locale should be the same for both formats, we would 
require the user to provide the two format Strings, and the Locale (or 
possibly the constituent elements of the locale, the country code and 
language code).  We want an object that encapsulates all of that, say,
org.apache.commons.net.ftp.parser.FTPDateFormat.

So each parser would have a settable member of this class.  FTPDateFormat 
would be constructed from two format strings and a Locale.  Possibly a 
timezone as well.  We probably would have to provide some default 
FTPDateFormat objects for some of the common locales.
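
A minimal sketch of what such an object might look like follows. Everything in it is an assumption made for illustration: the class name, the constructor arguments, and in particular the rule for guessing the year of a "recent" entry (use the current year, and step back one year if that lands in the future).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch only; not an existing commons-net class.
class FTPDateFormatSketch {
    private final SimpleDateFormat recentFormat;  // e.g. "MMM dd HH:mm"
    private final SimpleDateFormat olderFormat;   // e.g. "MMM dd yyyy"
    private final TimeZone zone;

    FTPDateFormatSketch(String recentPattern, String olderPattern,
                        Locale locale, TimeZone zone) {
        this.recentFormat = new SimpleDateFormat(recentPattern, locale);
        this.olderFormat = new SimpleDateFormat(olderPattern, locale);
        this.recentFormat.setTimeZone(zone);
        this.olderFormat.setTimeZone(zone);
        this.zone = zone;
    }

    // Parse the timestamp portion of a listing entry, relative to "now".
    Date parse(String timestamp, Date now) throws ParseException {
        Date recent;
        try {
            recent = recentFormat.parse(timestamp);
        } catch (ParseException e) {
            return olderFormat.parse(timestamp);  // the older format carries its own year
        }
        // The recent format has no year: assume the current year, and step back
        // one year if that would put the file in the future.
        Calendar nowCal = new GregorianCalendar(zone);
        nowCal.setTime(now);
        Calendar cal = new GregorianCalendar(zone);
        cal.setTime(recent);
        cal.set(Calendar.YEAR, nowCal.get(Calendar.YEAR));
        if (cal.after(nowCal)) {
            cal.add(Calendar.YEAR, -1);
        }
        return cal.getTime();
    }
}
```

SimpleDateFormat is not thread-safe, so an object like this would have to be confined to one parser or synchronized. A Feb 29 entry would also need extra care, since the year-less parse happens in 1970.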

One consequence of this is that we would start making heavier use of the 
FTPFileEntryParserFactory objects.  We might want to start thinking about 
deemphasizing but not deprecating the use of FTPClient.listFiles() which is 
simple but makes too many assumptions.  There are already four or five 
different overloads of this method name and adding several more parameters 
into the mix will make this completely unworkable.  Instead, going through 
the factory would become the more common, more documented and recommended 
approach.  This would be the preferred method of accessing commons-net ftp 
for clients such as Ant and VFS.  Users who are happily using listFiles() in 
its current form in their custom apps built directly from commons-net could 
continue to do so.

Well, these are some preliminary thoughts.  Let's hear from the other 
developers of this project.


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>If you think that I meant for the user to pass in FTPDateFormat objects, you 
>misunderstood me. The paradigm I want to use here is passing in strings.  
>  
>
No, I got it. But maybe we could do it the way it is done for the 
FTPFileEntryParser? Try to determine if the user passed a FQCN, else treat 
it as a date/locale string.

>For the Ant client 
>task it is much easier to assemble the strings and construct the needed 
>objects ourselves, than it is to make the user do it.
>  
>
This is why I created a 
o.a.c.v.u.DelegatingFileSystemOptionsBuilder in VFS (again a long name ;-).
I didn't want to maintain two methods for each possible configuration 
setting - one with a string as parameter and another with the real class 
type.

This class is responsible for taking a configuration key name and its value 
as a string.
It looks up the targeted configuration method (by the key name), 
inspects its method parameters and tries to find a way to convert the 
string to the desired type.
This is done by looking up a
*) constructor with only one String as parameter
*) static valueOf(String) method
on the targeted object.
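
The lookup described in the two bullets can be sketched with plain reflection. This is not the VFS implementation; the class name and the order of the two attempts are assumptions made for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.math.BigDecimal;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of string-to-type conversion by reflection.
class StringConverterSketch {
    // Try a public constructor taking one String, then a static valueOf(String).
    static Object convert(String value, Class<?> target) throws Exception {
        try {
            Constructor<?> ctor = target.getConstructor(String.class);
            return ctor.newInstance(value);
        } catch (NoSuchMethodException e) {
            // No such constructor; fall through to valueOf(String).
        }
        Method valueOf = target.getMethod("valueOf", String.class);
        if (!Modifier.isStatic(valueOf.getModifiers())) {
            throw new NoSuchMethodException("valueOf(String) is not static on " + target);
        }
        return valueOf.invoke(null, value);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert("1.5", BigDecimal.class));    // via BigDecimal(String)
        System.out.println(convert("SECONDS", TimeUnit.class));  // via TimeUnit.valueOf(String)
    }
}
```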

I think this is a nice glue between configuration-by-string (as in Ant) 
and configuration-by-code where I would like to have compile time checks.
 
But for sure, this might go too far just for the date/locale setting.

---
Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Tuesday 28 September 2004 7:34 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >All this setting would go on as setters on a factory class that the user
> > would not have to use.  If they didn't setLocale, en_US would be the
> > default. If they setLocale but not either date recent or older date
> > format, then the standard US ordering would be used but the Locale month
> > names.  If they specified Locale and older date format, we could infer
> > the newer date format as well.  And if they specified everything, we
> > could handle that case too.
>
> Could you please at least implement this by passing in e.g. an
> FTPDateObject as you stated in one of your previous mails.
>
> This should have a method like
> Date FTPDateObject.parse(String datepart)
> or something similar.
>
> That way one is able to pass in a completely different sort of date
> parser - like the one I have in mind - which is able to automatically
> determine the right month without having to set any locale (as long as the
> date parts are in the correct order)
>
> ---
> Mario

If you think that I meant for the user to pass in FTPDateFormat objects, you 
misunderstood me.  The paradigm I want to use here is passing in strings.  
The FTPDateFormat object was just a way of organizing thoughts.  It is an 
object that the user would rarely if ever see.  I have learned from earlier 
ideas of "passing in a parser" that the more complex the object you are 
trying to pass in, the more difficult it is for the user.  For the Ant client 
task it is much easier to assemble the strings and construct the needed 
objects ourselves, than it is to make the user do it.



Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>All this setting would go on as setters on a factory class that the user would 
>not have to use.  If they didn't setLocale, en_US would be the default. If 
>they setLocale but not either date recent or older date format, then the 
>standard US ordering would be used but the Locale month names.  If they 
>specified Locale and older date format, we could infer the newer date format 
>as well.  And if they specified everything, we could handle that case too.
>  
>
Could you please at least implement this by passing in e.g. an 
FTPDateObject as you stated in one of your previous mails.

This should have a method like
Date FTPDateObject.parse(String datepart)
or something similar.

That way one is able to pass in a completely different sort of date 
parser - like the one I have in mind - which is able to automatically 
determine the right month without having to set any locale (as long as the 
date parts are in the correct order)
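
The parse(String) idea can be pictured as a small strategy interface. The names below (FTPDateParserSketch, PatternDateParser, EntryParserSketch) are invented for illustration and are not part of NET; the point is only that the entry parser depends on the interface, so any strategy can be plugged in.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical strategy interface, modelled on the FTPDateObject.parse idea.
interface FTPDateParserSketch {
    Date parse(String datePart) throws ParseException;
}

// One possible strategy: a fixed pattern plus a Locale.
class PatternDateParser implements FTPDateParserSketch {
    private final SimpleDateFormat format;

    PatternDateParser(String pattern, Locale locale) {
        this.format = new SimpleDateFormat(pattern, locale);
    }

    public Date parse(String datePart) throws ParseException {
        return format.parse(datePart);
    }
}

// Stand-in for a file entry parser that only knows the interface.
class EntryParserSketch {
    private FTPDateParserSketch dateParser =
            new PatternDateParser("MMM dd yyyy", Locale.US);  // assumed default

    void setDateParser(FTPDateParserSketch dateParser) {
        this.dateParser = dateParser;
    }

    Date parseTimestamp(String datePart) throws ParseException {
        return dateParser.parse(datePart);
    }
}
```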

---
Mario

Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen schrieb:

>I guess I don't have a problem with making a composite parser, which you could 
>make the default for VFS if it works, but I don't think it can be the default 
>for NetComponents itself.
>
re composite parser: You can stick this work on me as soon as the 
framework has materialized.

---
Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Thursday 30 September 2004 7:09 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >This business of constantly churning does bother me.
>
> I hope I don't leave too negative an impression on you.

I didn't mean to leave such an impression.

>
> If we find an agreement it could be possible with the way you build the
> framework - it is good enough for me.

I guess I don't have a problem with making a composite parser, which you could 
make the default for VFS if it works, but I don't think it can be the default 
for NetComponents itself.  It's too radical a step, and even if it works 
exactly as planned, it will impose some performance penalty on non-VFS users.

I think you ought to wait until the basic framework is completed, and I think 
it would be good to accommodate your use case with the proper hooks.

But again, the basic functionality should not change.



RE: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Rory Winston <rw...@eircom.net>.
I'm also in favour of the general behaviour staying exactly the same as it is now, except for the (few) edge cases that cause us problems. Using something like DateFormatSymbols.setShortMonths(), along the lines Steve mentioned earlier, sounds reasonable enough to me to catch Locale-related language issues, and the date formatting we can allow the user to specify exactly, but *only* when they know they have a problem and need to use that functionality.

-----Original Message-----
From: Mario Ivankovits [mailto:mario@ops.co.at]
Sent: 30 September 2004 13:09
To: Jakarta Commons Developers List
Subject: Re: [NET] Designing a Date Format-aware FTP Entry Parser


Steve Cohen wrote:

>This business of constantly churning does bother me. 
>
I hope I don't leave too negative an impression on you.


If we find an agreement it could be possible with the way you build the 
framework - it is good enough for me.


---
Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>This business of constantly churning does bother me. 
>
I hope I don't leave too negative an impression on you.


If we find an agreement it could be possible with the way you build the 
framework - it is good enough for me.


---
Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Thursday 30 September 2004 5:39 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
>  >First parser to successfully parse every
>  >item in the listing I suppose.  Or do you go for best score?
>
> Line-by-line - the first parser which is able to parse should be cached
> (for performance) - that way it might only be slower on the first match.
> However, the parser should be prepared to redetect the language as soon
> as it fails at a later time - maybe there
> are minor differences between languages and the first detection wasn't
> correct -
> e.g. Mar (March) might not be unique if you talk to a German ftp server
> which does not use umlauts (März => Mar)

This business of constantly churning does bother me. 


>
>  >What if none of
>  >the parsers in your composite works?  Then what?
>
> Like now - a "null" entry in the list of returned entries.
> Or we change the paradigm NET uses today and throw an exception - but
> this is worth a thread on its own ;-)
>
>  >2) Will we be opening ourselves to arguments as to which languages are
>  >"in" the composite?  Or in which order?  If you're using Italian and it
>  >has to try US English, British English, French and German first, your
>  >performance is going to be lousy.  Which brings me to
>
> Is there a difference between US and British?

The original complaint which got this started, about AIX, comes from Britain.  
http://issues.apache.org/bugzilla/show_bug.cgi?id=27437
I believe the month names will be the same, but not necessarily the order.  In 
Linux, I found that the month-day order was preserved regardless of locale 
(although my test was with French, not en_GB).  In that defect there is an 
example about AIX where there was a difference between en_US and en_GB.

>
> Performance: As I said - we could cache the last matching language -
> then only the first search might be slow.
>
> Such a composite might only fail if one has to use the Croatian and Polish
> languages at once. There the names "lis" and "lip" mean different
> months (at least from the "java short names" point of view).

So you are saying that between these two languages, that the same 
abbreviations in one language refer to different months in another?  That is 
a real problem for autodetection.  I guess you could say it's not affecting 
the most common languages.   But it doesn't make me happy.

> This is why I am not against your solution at all; the composite parser
> should only be one additional possibility - and IMHO the default parser.

I agree that the composite parser could be a possibility.  I disagree 
vehemently that it should be the default.

>
> I think this composite could be configurable by a static map (system
> wide). There I would like to configure it
> to detect "US", "DE", "FR" (in this order) and I am fine with 100% of
> all ftp servers I have to contact today.
> In the case of Ant it could be configured by e.g. lang="US,DE,FR"
> Or by a system property, .... or .... we could discuss this if we find
> a consensus at all.
>
> And we should also discuss that you don't want to take SYST into account
> - or at least the possibility to do so, but this also depends on which
> file entry parsers you would like to implement the date stuff for. Currently
> I am only aware that the unices do this language stuff.
>
>  >3) This is too much run-time trial and error for my tastes.  The average
>  >user of our library is not writing the ultimate FTP client.  He is writing
>  >a java app or Ant script to connect repeatedly to an FTP server somewhere.
>  >Once he gets the right parser, he never has need of trying others for that
>  >server.
>
> ... or using VFS. And VFS would like to be the super ftp, ssh, ....
> client. Like a filesystem works - the user doesn't want to be bothered with
> things like date styles.

OK, I understand you a little better, you are approaching this from the angle 
of VFS.  So, you could make your composite parser the default parser USED BY 
VFS.  In other usages, where our user is simply setting up a little system to 
talk to some specific ftp server about which he knows all the details, the 
composite parser is a needless performance drain.

>
> For sure, I am not fully against the solution you have in mind, I just
> would like to ensure it is possible to pass
> in a parser which uses a completely different strategy.
> And again: The user does not have to choose a file-entry-parser now - it
> is done automatically by SYST (I know you know ;-)) -
> but now we force him to select the correct date format - today if he
> changes the url (and an appropriate parser
> is available) the file parsing works without any additional attention.

No, we force him to do nothing.  My goal, expressed a few posts back, was that 
the system work by default exactly as it does now.  The additional 
functionalities would only exist to help him out of the odd cases.  Changing 
the default parser that autodetection provides could provide some real 
surprises.

>
> <vision>
> Maybe we would provide a parser with a TreeMap where all month names and
> their numbers are stored - the community could
> help to fill this map - or a properties file which could easily be changed.
> </vision>
>
>  >4) On the other hand, your idea could be the basis of a pretty cool tool
>  >based on NetComponents: point it at an FTP server somewhere, let it try all
>  >the tricks it knows, and somehow it returns its best guess as to what
>  >parser and parser date format to use for that server.
>
> That's the point - like the comfort we provide with the automatic
> detection of the needed file-entry-parser.
> Computers should work for humans and not humans for computers ;-)
>
> As I tried to say earlier: Today the parsing works pretty well - we
> only have problems with the month
> name (and unknown servers). As long as the date parts are not in
> a different order (based on the language)
> why implement such a drastic change in the comfort NET provides today?
> A black box where the user passes
> in a url and gets a file listing is what the user really wants.

I think you are proposing a Swiss Army knife.  While this could be an 
indispensable tool in a few situations, it's an inefficient answer for the 
great majority of them.  Yes, there should be a Swiss Army knife, but it 
should not be the default.




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

 >First parser to successfully parse every
 >item in the listing I suppose.  Or do you go for best score?

Line-by-line - the first parser which is able to parse should be cached 
(for performance) - that way it might only be slower on the first match.
However, the parser should be prepared to redetect the language as soon 
as it fails at a later time - maybe there
are minor differences between languages and the first detection wasn't 
correct -
e.g. Mar (March) might not be unique if you talk to a German ftp server 
which does not use umlauts (März => Mar)
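
A minimal sketch of this caching and redetection idea, assuming one SimpleDateFormat per configured Locale. The class name and the choice to return null when nothing matches are assumptions for the example. Full month names are used in the demo because the abbreviated names Java produces can differ from what a server prints.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical composite that remembers the last Locale that worked.
class CachingCompositeSketch {
    private final Locale[] locales;
    private final SimpleDateFormat[] formats;
    private int cached = -1;  // index of the last Locale that parsed, or -1

    CachingCompositeSketch(String pattern, Locale... locales) {
        this.locales = locales.clone();
        this.formats = new SimpleDateFormat[locales.length];
        for (int i = 0; i < locales.length; i++) {
            formats[i] = new SimpleDateFormat(pattern, locales[i]);
        }
    }

    // Returns null when no configured Locale can parse the text.
    Date parse(String datePart) {
        if (cached >= 0) {
            Date d = tryParse(formats[cached], datePart);
            if (d != null) {
                return d;  // fast path: the cached guess still works
            }
        }
        for (int i = 0; i < formats.length; i++) {
            if (i == cached) {
                continue;  // already tried above
            }
            Date d = tryParse(formats[i], datePart);
            if (d != null) {
                cached = i;  // redetected: remember the new Locale
                return d;
            }
        }
        return null;
    }

    Locale lastMatched() {
        return cached < 0 ? null : locales[cached];
    }

    private static Date tryParse(SimpleDateFormat format, String text) {
        try {
            return format.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        CachingCompositeSketch c =
                new CachingCompositeSketch("MMMM dd yyyy", Locale.US, Locale.GERMANY);
        c.parse("October 25 2004");
        System.out.println(c.lastMatched());
        c.parse("Oktober 25 2004");
        System.out.println(c.lastMatched());
    }
}
```

Note that this only redetects when the cached format fails outright. A name that parses in the wrong language (the Mar case above) would go unnoticed.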

 >What if none of
 >the parsers in your composite works?  Then what?

Like now - a "null" entry in the list of returned entries.
Or we change the paradigm NET uses today and throw an exception - but 
this is worth a thread on its own ;-)

 >
 >2) Will we be opening ourselves to arguments as to which languages are "in"
 >the composite?  Or in which order?  If you're using Italian and it has to try
 >US English, British English, French and German first, your performance is
 >going to be lousy.  Which brings me to

Is there a difference between US and British?

Performance: As I said - we could cache the last matching language - 
then only the first search might be slow.

Such a composite might only fail if one has to use the Croatian and Polish 
languages at once. There the names "lis" and "lip" mean different
months (at least from the "java short names" point of view).
This is why I am not against your solution at all; the composite parser 
should only be one additional possibility - and IMHO the default parser.

I think this composite could be configurable by a static map (system 
wide). There I would like to configure it
to detect "US", "DE", "FR" (in this order) and I am fine with 100% of 
all ftp servers I have to contact today.
In the case of Ant it could be configured by e.g. lang="US,DE,FR"
Or by a system property, .... or .... we could discuss this if we find 
a consensus at all.

And we should also discuss that you don't want to take SYST into account 
- or at least the possibility to do so, but this also depends on which
file entry parsers you would like to implement the date stuff for. Currently 
I am only aware that the unices do this language stuff.

 >3) This is too much run-time trial and error for my tastes.  The average user
 >of our library is not writing the ultimate FTP client.  He is writing a java
 >app or Ant script to connect repeatedly to an FTP server somewhere.  Once he
 >gets the right parser, he never has need of trying others for that server.

... or using VFS. And VFS would like to be the super ftp, ssh, .... client.
Like a filesystem works - the user doesn't want to be bothered with things 
like date styles.

For sure, I am not fully against the solution you have in mind, I just 
would like to ensure it is possible to pass
in a parser which uses a completely different strategy.
And again: The user does not have to choose a file-entry-parser now - it 
is done automatically by SYST (I know you know ;-)) -
but now we force him to select the correct date format - today if he 
changes the url (and an appropriate parser
is available) the file parsing works without any additional attention.

<vision>
Maybe we would provide a parser with a TreeMap where all month names and 
their numbers are stored - the community could
help to fill this map - or a properties file which could easily be changed.
</vision>
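
A small sketch of such a month-name table, assuming a case-insensitive TreeMap and a static addMonth(String, int) hook for names that are not in the default list. The class name and the sample entries are illustrative only.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical month-name table; the entries are a small illustrative sample.
class MonthNameTableSketch {
    // Case-insensitive map from month abbreviation to month number (1 = January).
    private static final Map<String, Integer> MONTHS =
            new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);

    static {
        String[] english = {"jan", "feb", "mar", "apr", "may", "jun",
                            "jul", "aug", "sep", "oct", "nov", "dec"};
        for (int i = 0; i < english.length; i++) {
            addMonth(english[i], i + 1);
        }
        addMonth("jui", 7);  // French listing abbreviation for July
    }

    // Register an additional month name, e.g. one contributed by a user.
    static synchronized void addMonth(String name, int number) {
        if (number < 1 || number > 12) {
            throw new IllegalArgumentException("month number out of range: " + number);
        }
        MONTHS.put(name, Integer.valueOf(number));
    }

    // Returns the month number for a name, or -1 if the name is unknown.
    static synchronized int monthNumber(String name) {
        Integer n = MONTHS.get(name);
        return n == null ? -1 : n.intValue();
    }

    public static void main(String[] args) {
        System.out.println(monthNumber("Jui"));
        System.out.println(monthNumber("xyz"));
    }
}
```

A single flat map like this cannot hold an abbreviation that means different months in two languages, which is the "lis"/"lip" problem mentioned earlier in this mail.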

 >4) On the other hand, your idea could be the basis of a pretty cool tool based
 >on NetComponents: point it at an FTP server somewhere, let it try all the
 >tricks it knows, and somehow it returns its best guess as to what parser and
 >parser date format to use for that server.

That's the point - like the comfort we provide with the automatic 
detection of the needed file-entry-parser.
Computers should work for humans and not humans for computers ;-)

As I tried to say earlier: Today the parsing works pretty well - we 
only have problems with the month
name (and unknown servers). As long as the date parts are not in 
a different order (based on the language)
why implement such a drastic change in the comfort NET provides today? 
A black box where the user passes
in a url and gets a file listing is what the user really wants.

---
Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Wednesday 29 September 2004 2:27 pm, Mario Ivankovits wrote:
> Steve Cohen wrote:

>
> Maybe you are right, but at least I think it should be possible to
> implement a "CompositeDateFormat".
> This could be a composite of n languages and it tries every (configured
> - by default eg. US, FR, DE) language (and maybe by using
> SimpleDateFormat) until a match is found.
> Same thing we did for the NTFTPFileEntryParser to automatically
> distinguish between NT and UNIX format.

Whew! I'm trying to get my mind around that one!  I see problems you would 
need to address.

1) What is the meaning of "tries"?  First parser to successfully parse every 
item in the listing I suppose.  Or do you go for best score?  What if none of 
the parsers in your composite works?  Then what?

2) Will we be opening ourselves to arguments as to which languages are "in" 
the composite?  Or in which order?  If you're using Italian and it has to try 
US English, British English, French and German first, your performance is 
going to be lousy.  Which brings me to

3) This is too much run-time trial and error for my tastes.  The average user 
of our library is not writing the ultimate FTP client.  He is writing a java 
app or Ant script to connect repeatedly to an FTP server somewhere.  Once he 
gets the right parser, he never has need of trying others for that server.

4) On the other hand, your idea could be the basis of a pretty cool tool based 
on NetComponents: point it at an FTP server somewhere, let it try all the 
tricks it knows, and somehow it returns its best guess as to what parser and 
parser date format to use for that server.
>
> Maybe you might not want an implementation of CompositeDateFormat in the
> main version of net, but it would be nice if this could be possible.
>
> ---
> Mario
>
>


Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>The advantage of 2 is that you still get a Date object after all your pains, 
>more easily that you do from rolling your own off a regex.  And it's easier 
>to use SimpleDateFormat format strings than regular expressions.  Finally, 
>there is more calendar logic in SimpleDateFormat than in our reqular 
>expressions.  Please note that using our regexes Feb 30 is a legitimate date 
>in our regex system.
>  
>
Maybe you are right, but at least I think it should be possible to 
implement a "CompositeDateFormat".
This could be a composite of n languages and it tries every (configured 
- by default eg. US, FR, DE) language (and maybe by using 
SimpleDateFormat) until a match is found.
Same thing we did for the NTFTPFileEntryParser to automatically 
distinguish between NT and UNIX format.

Maybe you might not want an implementation of CompositeDateFormat in the 
main version of net, but it would be nice if this could be possible.

---
Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Tuesday 28 September 2004 8:02 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >>>>form jui 7
> >>>>rather than
> >>>>7 jui
>
> Steven, we have a problem?

Yes indeed.  Thanks!  Ouch!
>
> I have tried to parse the date you shown in your ftp-locales test.
>
>         SimpleDateFormat sdf = new SimpleDateFormat("MMM dd", new
> Locale("fr", "FR"));
>         Date dt = sdf.parse("jui 7");
>
> "jui 7" is not parseable!!!!!!
> java.text.ParseException: Unparseable date: "jui 7"
>     at java.text.DateFormat.parse(DateFormat.java:335)
>
> while "juil."  (javas short french form) is.
>
>         SimpleDateFormat sdf = new SimpleDateFormat("MMM dd", new
> Locale("fr", "FR"));
>         Date dt = sdf.parse("juil. 7");

This is what I get for making assumptions in this field (and a strong 
cautionary note to those who might still want to try for an automated 
auto-detect system).

Noting the strong similarities between "ls" listings and listings from ftp 
servers, I assumed that what I see in a unix directory listing created with a 
specific unix "LANG" would be the same as what we would see in java, created 
with a specific "Locale".  Because American unix directory and ftp listings 
use the same month abbreviations as those returned by 
SimpleDateFormat.getDateFormatSymbols().getShortMonths(), I erroneously 
assumed that this must be the case for all LANGs and their equivalent 
Locales.  

As you so cogently point out, that's not the case!  Doh!

Which means, back to the drawing board!  Java and its SimpleDateFormats are 
not going to help us as much.  There is no simple path from a Locale to a 
directory listing date format.

So I see two possibilities.
1) Parse the date with a regular expression but make the month names a 
settable parameter.
2) Parse the date with a special SimpleDateFormat constructed on the fly:

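/* needs java.text.DateFormatSymbols and java.text.SimpleDateFormat */
private SimpleDateFormat createDateFormatter(
	String formatString,  /* e.g. "MMM dd" */
	String monthNames)    /* e.g. "jan|fév|mar|avr|mai|jun|jui|aoû|sep|oct|nov|déc" */
{
	SimpleDateFormat sdf = new SimpleDateFormat(formatString);
	/* getDateFormatSymbols() returns a copy, so change it and set it back */
	DateFormatSymbols symbols = sdf.getDateFormatSymbols();
	/* split() takes a regular expression, so the "|" must be escaped */
	symbols.setShortMonths(monthNames.split("\\|"));
	sdf.setDateFormatSymbols(symbols);
/*
yes, I know that String.split() is java 1.4 specific, this is just for 
simplicity here.  Any actual implementation could not use the split() method.
*/
	return sdf;
}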

The advantage of 2 is that you still get a Date object after all your pains, 
more easily than you do from rolling your own off a regex.  And it's easier 
to use SimpleDateFormat format strings than regular expressions.  Finally, 
there is more calendar logic in SimpleDateFormat than in our regular 
expressions.  Please note that Feb 30 is a legitimate date 
in our regex system.
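
As a quick check of option 2, here is a runnable variant of the createDateFormatter idea together with the Feb 30 case. The class name is invented, and Locale.US is passed as the base locale only to keep the demo independent of the machine's default. One caveat worth knowing: SimpleDateFormat is lenient by default and will roll Feb 30 over into March, so the calendar check only applies after setLenient(false).

```java
import java.text.DateFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class; accents are written as Unicode escapes.
class CustomMonthsSketch {
    static final String FRENCH =
            "jan|f\u00e9v|mar|avr|mai|jun|jui|ao\u00fb|sep|oct|nov|d\u00e9c";

    static SimpleDateFormat createDateFormatter(String formatString, String monthNames) {
        SimpleDateFormat sdf = new SimpleDateFormat(formatString, Locale.US);
        DateFormatSymbols symbols = sdf.getDateFormatSymbols();  // this is a copy
        symbols.setShortMonths(monthNames.split("\\|"));         // "|" escaped for the regex
        sdf.setDateFormatSymbols(symbols);
        return sdf;
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat sdf = createDateFormatter("MMM dd", FRENCH);
        Calendar cal = new GregorianCalendar();
        cal.setTime(sdf.parse("jui 7"));
        System.out.println("July? " + (cal.get(Calendar.MONTH) == Calendar.JULY));

        sdf.setLenient(false);  // reject impossible dates instead of rolling over
        try {
            sdf.parse("f\u00e9v 30");
            System.out.println("Feb 30 accepted");
        } catch (ParseException e) {
            System.out.println("Feb 30 rejected");
        }
    }
}
```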

In either case, for the sake of user convenience we might still want to tie 
some preset constant month names to locales (and possibly system types), even 
though java's implementation does not produce the same symbols natively as 
unix directory listings do.


>
> ---
> Mario




Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>>>>form jui 7
>>>>rather than
>>>>7 jui
>>>>        
>>>>
Steven, we have a problem?

I have tried to parse the date you showed in your ftp-locales test.

        SimpleDateFormat sdf = new SimpleDateFormat("MMM dd", new Locale("fr", "FR"));
        Date dt = sdf.parse("jui 7");

"jui 7" is not parseable!!!!!!
java.text.ParseException: Unparseable date: "jui 7"
    at java.text.DateFormat.parse(DateFormat.java:335)

while "juil."  (Java's short French form) is.

        SimpleDateFormat sdf = new SimpleDateFormat("MMM dd", new Locale("fr", "FR"));
        Date dt = sdf.parse("juil. 7");

---
Mario

Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Tuesday 28 September 2004 7:11 am, Rory Winston wrote:
> Steve,
>
> This sounds like it could be the way forward. This way, the user doesn't
> have to specify anything extra unless they really need to. The only
> question is, do we generate regexes on the fly, or pull out the entire date
> string? I would be inclined to go for the latter option.

Me too.


RE: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Rory Winston <rw...@eircom.net>.
Steve,

This sounds like it could be the way forward. This way, the user doesn't have to specify anything extra unless they really need to. The only question is, do we generate regexes on the fly, or pull out the entire date string? I would be inclined to go for the latter option.



Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Monday 27 September 2004 7:50 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >I created a hypothetical French user
> >named Jacques on my system, gave him "LANG" of "fr_FR", logged in as him,
> > and got French directory listings, although the dates were of the form
> > jui 7
> >rather than
> >7 jui
>
> So it is as I thought, at least for the Unix-like FTP server. The date
> format isn't truly locale-specific; only the month name is converted.
> I am not sure it is worth implementing the whole date machinery just to
> handle the month name. We could achieve the same by simply providing a
> static month-name list and a
> static addMonth(String name, int number) which one can use to add
> month names we do not maintain in our default list.

Locale + SimpleDateFormat provides an easier way to do this.  A 
SimpleDateFormat is constructed with the Locale as a parameter and then 
SimpleDateFormat.getDateFormatSymbols().getShortMonths() provides a list of 
month abbreviations for that locale.

Another option, though, is NOT to use regular expressions for the date parsing 
at all.  Let the regex pull out the entire date portion and then parse that 
with the SimpleDateFormat.
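
Roughly what I have in mind, as a sketch only. The listing-line layout in the regex, the two format strings, and the class and method names are all assumptions for illustration, not what the parsers do today:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingDateExtractor {
    // Assumed shape of a Unix-style listing line: permissions, link count,
    // owner, group, size, then a three-token date, then the file name.
    private static final Pattern ENTRY = Pattern.compile(
        "^\\S+\\s+\\d+\\s+\\S+\\s+\\S+\\s+\\d+\\s+"
        + "(\\S+\\s+\\S+\\s+\\S+)"   // group 1: the whole date portion, unparsed
        + "\\s+(.+)$");              // group 2: the file name

    public static Date parseListingDate(String line, Locale locale) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized listing line: " + line);
        }
        // Collapse runs of whitespace so "Sep  5" and "Sep 5" parse alike.
        String dateText = m.group(1).replaceAll("\\s+", " ");
        // A colon means the "recent" form (time of day), otherwise the year form.
        String pattern = dateText.indexOf(':') >= 0 ? "MMM d HH:mm" : "MMM d yyyy";
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, locale);
        sdf.setLenient(false);
        try {
            return sdf.parse(dateText);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + dateText, e);
        }
    }
}
```

Note that the recent form carries no year, so SimpleDateFormat falls back to 1970 and the caller would still have to work out the right year.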

> This is fairly easy to implement.
> But I don't know what Rory found for NT, and therefore I don't know if this
> might work there too.
>
> >It seems to me that we might need no other identifier than Locale.  I
> > would caution once again that we not get this mixed up with SYST.  I
> > would proceed for now as though there is no way to automate this.  Later
> > if we find such a way we can build for it.
>
> But you found that the French date wasn't really printed in its typical
> form; maybe another server will do so.
> So it is possible you might end up with two French locale definitions, and
> then the user has to encode this fact into the locale name, e.g. "fr_FR"
> and "fr_FR_xyzserver".
> For sure, in this case my proposed solution might not work either.
>
> The question is: are there servers where the date parts are really
> printed in a different order depending on the locale?

That is not what I meant when I said we might need no other IDENTIFIER than a 
locale.  That is, if the user supplied "fr_FR" we would construct a 
SimpleDateFormat("MMM dd", new Locale("fr", "FR")) 
and if he supplied "en_US" we would construct thus:
SimpleDateFormat("MMM dd", new Locale("en", "US")) 

That is to say, we would NOT infer date-month-year ordering from the locale, 
at least for the unix-like parsers.

But there would be a way for the user to supply the date format string as well 
as locale so as to get
SimpleDateFormat("dd MMM", new Locale("fr", "FR")) 
if it is required.

All of these settings would be setters on a factory class that the user would 
not have to use.  If they didn't call setLocale, en_US would be the default. If 
they called setLocale but set neither the recent nor the older date format, 
the standard US ordering would be used with the Locale's month names.  If they 
specified the Locale and the older date format, we could infer the recent date 
format as well.  And if they specified everything, we could handle that case too.
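
To make the defaulting rules concrete, here is a rough sketch. Everything in it is hypothetical: the class name, the setter names, and in particular the inference rule (swap "yyyy" for "HH:mm") are assumptions, not a settled design.

```java
import java.text.SimpleDateFormat;
import java.util.Locale;

public class DateFormatConfig {
    private static final String DEFAULT_RECENT = "MMM d HH:mm";
    private static final String DEFAULT_OLDER = "MMM d yyyy";

    private Locale locale = Locale.US;  // default when setLocale is never called
    private String recentPattern;       // null means "not set by the user"
    private String olderPattern;        // null means "not set by the user"

    public DateFormatConfig setLocale(Locale locale) {
        this.locale = locale;
        return this;
    }

    public DateFormatConfig setRecentPattern(String pattern) {
        this.recentPattern = pattern;
        return this;
    }

    public DateFormatConfig setOlderPattern(String pattern) {
        this.olderPattern = pattern;
        return this;
    }

    public String effectiveOlderPattern() {
        return olderPattern != null ? olderPattern : DEFAULT_OLDER;
    }

    public String effectiveRecentPattern() {
        if (recentPattern != null) {
            return recentPattern;
        }
        // Assumed inference rule: keep the ordering, swap the year for a time.
        if (olderPattern != null && olderPattern.contains("yyyy")) {
            return olderPattern.replace("yyyy", "HH:mm");
        }
        return DEFAULT_RECENT;
    }

    // The locale only contributes month names; ordering comes from the pattern.
    public SimpleDateFormat recentFormat() {
        return new SimpleDateFormat(effectiveRecentPattern(), locale);
    }

    public SimpleDateFormat olderFormat() {
        return new SimpleDateFormat(effectiveOlderPattern(), locale);
    }
}
```

With this shape, new DateFormatConfig().setLocale(Locale.FRANCE) would keep the US ordering but use French month names, and adding setOlderPattern("d MMM yyyy") would give "d MMM HH:mm" for recent dates.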

>
> What date problems have users reported so far?
> Actually, I have only read about the AIX language problem (and seen your
> French test).

We seem to see one or two of these a year.


>
> ---
> Mario


Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>I created a hypothetical French user 
>named Jacques on my system, gave him "LANG" of "fr_FR", logged in as him, and 
>got French directory listings, although the dates were of the form
>jui 7 
>rather than 
>7 jui
>  
>
So it is as I thought, at least for the Unix-like FTP server. The date 
format isn't truly locale-specific; only the month name is converted.
I am not sure it is worth implementing the whole date machinery just to 
handle the month name. We could achieve the same by simply providing a 
static month-name list and a
static addMonth(String name, int number) which one can use to add 
month names we do not maintain in our default list.
This is fairly easy to implement.
But I don't know what Rory found for NT, and therefore I don't know if this 
might work there too.
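
A minimal sketch of that idea. The class name, the 1-to-12 numbering, and the case-insensitive lookup are assumptions for illustration; only the addMonth(String name, int number) signature comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public final class MonthNameRegistry {
    // Lower-cased month name -> month number (1 = January, 12 = December).
    private static final Map<String, Integer> MONTHS = new HashMap<>();

    static {
        String[] english = {
            "jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec"
        };
        for (int i = 0; i < english.length; i++) {
            MONTHS.put(english[i], i + 1);
        }
    }

    private MonthNameRegistry() {
    }

    /** Registers an extra month name that is not in the default list. */
    public static synchronized void addMonth(String name, int number) {
        if (number < 1 || number > 12) {
            throw new IllegalArgumentException("Month number out of range: " + number);
        }
        MONTHS.put(name.toLowerCase(Locale.ROOT), number);
    }

    /** Returns the month number for a name, or -1 if the name is unknown. */
    public static synchronized int monthNumber(String name) {
        Integer number = MONTHS.get(name.toLowerCase(Locale.ROOT));
        return number == null ? -1 : number;
    }
}
```

A user facing a French server could then call MonthNameRegistry.addMonth("jui", 7) once at startup, and the parser would look names up through monthNumber.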

>It seems to me that we might need no other identifier than Locale.  I would 
>caution once again that we not get this mixed up with SYST.  I would proceed 
>for now as though there is no way to automate this.  Later if we find such a 
>way we can build for it.
>  
>
But you found that the French date wasn't really printed in its typical 
form; maybe another server will do so.
So it is possible you might end up with two French locale definitions, and 
then the user has to encode this fact into the locale name, e.g. "fr_FR" 
and "fr_FR_xyzserver".
For sure, in this case my proposed solution might not work either.

The question is: are there servers where the date parts are really 
printed in a different order depending on the locale?

What date problems have users reported so far?
Actually, I have only read about the AIX language problem (and seen your French test).

---
Mario




RE: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Rory Winston <rw...@eircom.net>.
>But this brings up the possibility that non-anonymous FTP might produce 
>different results than anonymous FTP to the same server!

>All of which argues for the user being able to specify all the relevant 
>parameters, even though we go to some length to assure that he doesn't often 
>have to.

Right - I agree. I also concur that I don't think there will be a universal way to automate this. 

Re: the Locale issue, would this mean that we would need to provide a different, say, month-parsing regex per Locale ?



Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Monday 27 September 2004 1:51 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >Where I was sort of heading was a combination of these, since I'm still
> > not sure that server locale implies a particular date format.
>
> You might be right, though I think the combination of server type and
> locale will be sufficient for a reproducible result,
> at least as long as the FTP server does not allow the user to define a
> custom date format.
> But you are right if you mean that automatic detection could become a
> complicated task. I think it is possible that some FTP servers changed
> their date format between versions.
> Maybe, if it comes to automatic detection, we will find we have to
> use the version part of the SYST reply too.
>
> >I don't think so, and I'm not sure that anything
> >forces an ftp server to format its listings in the locale-specific way.
>
> I think an FTP server might use a "posix" format or the server's locale
> format. Other strategies would not make much sense.

Well, I was just looking at some of the man pages for ftp on my Linux box.  I 
didn't find what I was looking for (some method of configuring the ftpd 
daemon in the ways we are talking about, such as the examples that Rory found 
the other day for NT).  But I found other things that gave me pause: for 
non-anonymous FTP, the daemon looks at the user's shell.  Might it perhaps in 
some cases look at the user's locale?  I created a hypothetical French user 
named Jacques on my system, gave him a "LANG" of "fr_FR", logged in as him, and 
got French directory listings, although the dates were of the form
jui 7 
rather than 
7 jui

But this brings up the possibility that non-anonymous FTP might produce 
different results than anonymous FTP to the same server!

All of which argues for the user being able to specify all the relevant 
parameters, even though we go to some length to assure that he doesn't often 
have to.

>
> But again, it might not make much sense to spend much effort on this
> now. What if we simply prepare the locale/date structure to be ready for
> such a thing, so we do not have to refactor the API later?
> For now a simple "ident" might be enough; later we could use it for the
> server type (and/or the server version from SYST).

It seems to me that we might need no other identifier than Locale.  I would 
caution once again that we not get this mixed up with SYST.  I would proceed 
for now as though there is no way to automate this.  Later if we find such a 
way we can build for it.
<snip>
>
> What if we first try to collect some directory listings?
> Maybe we will see that only the name of the month changes. Then we do not
> need to build all this, but can simply extend the parsing of the month name
> to handle multiple languages.

Possibly.  The trouble with collecting directory listings, though, is that 
the private servers which you or I don't have access to are precisely the ones 
that will be most problematic, and they are probably also the main targets of 
users who want to code applications using our API.  Any listings you collect 
will necessarily be limited to public sites and those few private ones to 
which you have access.

> The positive effect could be that we do not need to bother the user with this.
>
> ---
> Mario


RE: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Rory Winston <rw...@eircom.net>.
I think that the SimpleDateFormat approach is the way to go. I guess really what we want to achieve is:

 - No disruption to the current API semantics (i.e. a user may only have to choose an FTPDateFormat object if they really *have* to, otherwise, things work as normal);
 - A mechanism that is pluggable across multiple parser implementations.

I don't think that there is any way around the fact that we may require a user to explicitly enter a date format if they are using a "problematic" system - as Steve has mentioned before, the FTP spec is kind of vague when it comes to these sorts of specifics, and we can't rely on implementation consistency. FTP is one protocol that is pretty "autodetect-unfriendly" :)

I guess what I would like to see is a connection-specific DateFormat, something like:

FTPClient client = new FTPClient();

client.connect(server, FTPDateFormat.getFormat("dd-MM-yyyy"));

or

client.setDateFormat("dd-MM-yyyy");

- something like that. Here's a question: if required, would we only need to ask the user for the "less-than-1-year" date format? i.e. given the less-than-1-year Date format, can we reliably identify the older-than-1-year format from that information?

Inside the **Parser class, we could have:

  private FTPDateFormat format;

  private static final String REGEX =
        "((?:0[1-9])|(?:1[0-2]))-"
        + "((?:0[1-9])|(?:[1-2]\\d)|(?:3[0-1]))-"
        + getFTPDateFormat()
        + "(\\S.*)";

So the FTPDateFormat class maps regexes to date format strings. This would also mean that we would need to parameterize the following code:

  String mo = group(1);
  String da = group(2);
  String yr = group(3);
  String hr = group(4);
  String min = group(5);
  String ampm = group(6);

Perhaps we could hand this off to an FTPDateFormatParser class as well?
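
As a very rough sketch of what such a mapping might look like, here is a translator from a small subset of SimpleDateFormat pattern letters to regex fragments. The class name, the supported letters, and the fragment chosen for each are assumptions for illustration only:

```java
import java.util.regex.Pattern;

public class PatternToRegex {
    /** Builds a regex with one capturing group per date field in the pattern. */
    public static String toRegex(String datePattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < datePattern.length()) {
            char c = datePattern.charAt(i);
            int j = i;
            // Consume a run of the same character, e.g. "yyyy" or "MMM".
            while (j < datePattern.length() && datePattern.charAt(j) == c) {
                j++;
            }
            int run = j - i;
            if (c == 'd' || c == 'H' || c == 'm') {
                out.append("(\\d{1,2})");
            } else if (c == 'y') {
                out.append(run == 2 ? "(\\d{2})" : "(\\d{4})");
            } else if (c == 'M') {
                // Three or more letters means a month name, else a month number.
                out.append(run >= 3 ? "(\\p{L}+\\.?)" : "(\\d{1,2})");
            } else if (c == ' ') {
                out.append("\\s+");
            } else if (Character.isLetter(c)) {
                throw new IllegalArgumentException("Unsupported pattern letter: " + c);
            } else {
                // Separators such as '/', '-' and ':' are matched literally.
                out.append(Pattern.quote(datePattern.substring(i, j)));
            }
            i = j;
        }
        return out.toString();
    }
}
```

The group numbers then follow the field order of the format string, which is exactly the part of the group(1)..group(6) code that would need parameterizing.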

Ideally, in the worst-case scenario, all I would have to do is pass an extra parameter to the Ant task:

 <ftp ..... dateFormat="dd/MM/yyyy">

And it would process the listings accordingly.



Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
On Monday 27 September 2004 1:51 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >Where I was sort of heading was a combination of these, since I'm still
> > not sure that server locale implies a particular date format.
>
> You might be right, though I think the combination of server type and
> locale will be sufficient for a reproducible result,
> at least as long as the FTP server does not allow the user to define a
> custom date format.
> But you are right if you mean that automatic detection could become a
> complicated task. I think it is possible that some FTP servers changed
> their date format between versions.
> Maybe, if it comes to automatic detection, we will find we have to
> use the version part of the SYST reply too.
>
> >I don't think so, and I'm not sure that anything
> >forces an ftp server to format its listings in the locale-specific way.
>
> I think an FTP server might use a "posix" format or the server's locale
> format. Other strategies would not make much sense.
>
> But again, it might not make much sense to spend much effort on this
> now. What if we simply prepare the locale/date structure to be ready for
> such a thing, so we do not have to refactor the API later?
> For now a simple "ident" might be enough; later we could use it for the
> server type (and/or the server version from SYST).
>
> >public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
> >	Locale locale,
> >	SimpleDateFormat newerThanOneYear,
> >	SimpleDateFormat olderThanOneYear)
>
> public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
>
> 	*String ident,*
>
> 	Locale locale,
> 	SimpleDateFormat newerThanOneYear,
> 	SimpleDateFormat olderThanOneYear)
>
> >public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
> >	Locale locale)
>
> public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
>
> 	*String ident,*
>
> 	Locale locale)
>
>
> What if we first try to collect some directory listings?
> Maybe we will see that only the name of the month changes. Then we do not
> need to build all this, but can simply extend the parsing of the month name
> to handle multiple languages.
> The positive effect could be that we do not need to bother the user with this.
>
> ---
> Mario


Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>Where I was sort of heading was a combination of these, since I'm still not 
>sure that server locale implies a particular date format.
>
You might be right, though I think the combination of server type and locale 
will be sufficient for a reproducible result,
at least as long as the FTP server does not allow the user to define a 
custom date format.
But you are right if you mean that automatic detection could become a 
complicated task. I think it is possible that some FTP servers changed 
their date format between versions.
Maybe, if it comes to automatic detection, we will find we have to 
use the version part of the SYST reply too.

>I don't think so, and I'm not sure that anything 
>forces an ftp server to format its listings in the locale-specific way.
>  
>
I think an FTP server will use either a "posix" format or the server's 
locale format - other strategies would not make much sense.

But again - it might not make much sense to spend much effort on this 
now. What if we simply prepare the locale/date structure to be ready for 
such a thing, so we do not have to refactor the API later?
For now a simple "ident" might be enough - later we could use it for the 
server type (and/or the server version from SYST).

>public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
>	Locale locale, 
>	SimpleDateFormat newerThanOneYear, 
>	SimpleDateFormat olderThanOneYear)
>  
>
public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(

	*String ident,*

	Locale locale, 
	SimpleDateFormat newerThanOneYear, 
	SimpleDateFormat olderThanOneYear)

>public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
>	Locale locale)
>  
>
public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(

	*String ident,*

	Locale locale)


What if we first try to collect some directory listings?
Maybe we will see that only the name of the month changes - then we do not 
need to build all this, but can simply extend the parsing of the month name 
to handle multiple languages.
The positive effect would be that we do not need to bother the user with this.
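
To make the "only the month name changes" idea concrete, here is a minimal sketch of a month lookup that accepts abbreviations from several languages at once. The class name MultiLanguageMonths and its methods are hypothetical, invented for this illustration; only java.text.DateFormatSymbols is a real JDK class. The exact abbreviations (and whether they carry a trailing dot) depend on the locale data shipped with the JDK, so the sketch normalizes them before comparing.

```java
import java.text.DateFormatSymbols;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps abbreviated month names from several locales to a month index. */
final class MultiLanguageMonths {
    private final Map<String, Integer> monthByName = new HashMap<>();

    MultiLanguageMonths(Locale... locales) {
        for (Locale locale : locales) {
            // getShortMonths() returns the locale's abbreviated month names, January first.
            String[] names = DateFormatSymbols.getInstance(locale).getShortMonths();
            for (int month = 0; month < 12; month++) {
                String key = normalize(names[month]);
                if (!key.isEmpty()) {
                    // If two locales share an abbreviation, the locale listed first wins.
                    monthByName.putIfAbsent(key, month);
                }
            }
        }
    }

    /** Returns the java.util.Calendar month constant (0 = January), or -1 if unknown. */
    int monthOf(String token) {
        Integer month = monthByName.get(normalize(token));
        return month == null ? -1 : month;
    }

    // Lower-case the name and drop a trailing dot, which some locale data appends.
    private static String normalize(String name) {
        String lower = name.trim().toLowerCase(Locale.ROOT);
        return lower.endsWith(".") ? lower.substring(0, lower.length() - 1) : lower;
    }
}
```

With something like this, new MultiLanguageMonths(Locale.US, Locale.GERMANY) could be consulted by the existing parser without the user choosing a locale. The catch is ambiguity: an abbreviation that means different months in two languages can only resolve one way, which is one reason real listings would need to be collected first.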

---
Mario



Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Steve Cohen <sc...@javactivity.org>.
Where I was sort of heading was a combination of these, since I'm still not 
sure that server locale implies a particular date format.  It may define 
month abbreviations and the ordering of day, month, and year within a date, 
but does it define whether a numeric-only date format, as opposed to one 
with an abbreviated month name, is used?  I don't think so, and I'm not sure 
that anything forces an ftp server to format its listings in the 
locale-specific way.

This brings me back to the FTPDateFormat object, which I see as a place to 
resolve all this uncertainty and ambiguity, along with a factory to aid in 
its creation:

public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
	Locale locale, 
	SimpleDateFormat newerThanOneYear, 
	SimpleDateFormat olderThanOneYear)

I see default static final FTPDateFormat objects for each of the locales so 
that another, simpler factory method could create those

public static FTPDateFormat FTPDateFormatFactory.createFTPDateFormat(
	Locale locale)

This would return the static object we created as the default for that locale.
It would probably be the preferred way to access it.
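
A minimal sketch of how the proposed object and factory might fit together follows. Nothing here is existing Commons Net code: the class bodies are assumptions, the constructor takes pattern strings rather than ready-made SimpleDateFormat objects (so that a year can be appended to the recent format), the per-locale defaults are held in a cache rather than in static final fields, and the `now` parameter exists only to make the behaviour testable. The default patterns are the two en_US formats described at the start of the thread. Timestamps are interpreted in the JVM's default time zone, which is also an assumption.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposed FTPDateFormat. */
final class FTPDateFormat {
    private final SimpleDateFormat recentWithYear; // recent pattern with " yyyy" appended
    private final SimpleDateFormat older;

    FTPDateFormat(Locale locale, String recentPattern, String olderPattern) {
        this.recentWithYear = new SimpleDateFormat(recentPattern + " yyyy", locale);
        this.older = new SimpleDateFormat(olderPattern, locale);
        this.recentWithYear.setLenient(false);
        this.older.setLenient(false);
    }

    /** Parses a listing timestamp; synchronized because SimpleDateFormat is not thread-safe. */
    synchronized Date parse(String timestamp, Date now) throws ParseException {
        Calendar cal = new GregorianCalendar();
        cal.setTime(now);
        int year = cal.get(Calendar.YEAR);

        // Recent entries carry no year: try the current year first, and fall back
        // to the previous year if that fails or would lie in the future.
        Date parsed = parseFully(recentWithYear, timestamp + " " + year);
        if (parsed == null || parsed.after(now)) {
            Date lastYear = parseFully(recentWithYear, timestamp + " " + (year - 1));
            if (lastYear != null) {
                parsed = lastYear;
            }
        }
        if (parsed == null) {
            parsed = parseFully(older, timestamp);
        }
        if (parsed == null) {
            throw new ParseException("Unparseable FTP timestamp: " + timestamp, 0);
        }
        return parsed;
    }

    // Accept a result only if the whole string was consumed.
    private static Date parseFully(SimpleDateFormat format, String text) {
        ParsePosition pos = new ParsePosition(0);
        Date date = format.parse(text, pos);
        return (date != null && pos.getIndex() == text.length()) ? date : null;
    }
}

/** Hypothetical factory with one cached default per locale. */
final class FTPDateFormatFactory {
    private static final Map<Locale, FTPDateFormat> DEFAULTS = new ConcurrentHashMap<>();

    static FTPDateFormat createFTPDateFormat(Locale locale, String recent, String older) {
        return new FTPDateFormat(locale, recent, older);
    }

    static FTPDateFormat createFTPDateFormat(Locale locale) {
        return DEFAULTS.computeIfAbsent(locale,
                l -> new FTPDateFormat(l, "MMM dd HH:mm", "MMM dd yyyy"));
    }
}
```

A caller would then write FTPDateFormatFactory.createFTPDateFormat(Locale.US).parse("Sep 25 17:19", new Date()) and leave the regular expression to do nothing more than extract the timestamp text.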

I would be very conservative in trying to automate this - I would, in fact, 
not do so in the first release.  My goal in the first release would be to 
implement all this stuff but have all current implementations work as before.

We simply have no idea of the relative prevalence of arrangements in the real 
world at this time.  But we should devote some effort to defining the way of 
accessing this functionality that is as painless for the user as possible.

Then, when complaints come in, we would have something to recommend, which is 
better than having nothing to recommend.


On Sunday 26 September 2004 1:51 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
> >and delegate the task of parsing it to a pair of
> >SimpleDateFormat objects (one for less than 1 year old and the other for
> > one year old or older), each constructed on the basis of a format string
> > and a locale.
>
> Sounds good overall, just one additional question: how should the user
> pass in these date parsers?
>
> 1) Explicitly set the date parser per connection.
> But this might work against the idea behind the default file entry parser.
> The default file entry parser uses some "magic" to decide on the real
> parser and hide the pain of the different styles from the user.
> Depending on the result, the possible date formats could be known too
> (except for the locale, of course).
> If the user needs to set a real date parser implementation, he always has
> to take the result of the DefaultFileEntryParser into account.
>
> This brings me to
> 2) Only set a java.util.Locale per connection
> and pick the needed date parser - in combination with the result of SYST
> - out of a date parser pool.
>
> Of course it should be possible to do 1), but this should not be the
> preferred way.
>
> ---
> Mario

Re: [NET] Designing a Date Format-aware FTP Entry Parser

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

>and delegate the task of parsing it to a pair of  
>SimpleDateFormat objects (one for less than 1 year old and the other for one 
>year old or older), each constructed on the basis of a format string and a 
>locale.
>
Sounds good overall, just one additional question: how should the user 
pass in these date parsers?

1) Explicitly set the date parser per connection.
But this might work against the idea behind the default file entry parser.
The default file entry parser uses some "magic" to decide on the real 
parser and hide the pain of the different styles from the user.
Depending on the result, the possible date formats could be known too 
(except for the locale, of course).
If the user needs to set a real date parser implementation, he always has 
to take the result of the DefaultFileEntryParser into account.

This brings me to
2) Only set a java.util.Locale per connection
and pick the needed date parser - in combination with the result of SYST 
- out of a date parser pool.

Of course it should be possible to do 1), but this should not be the 
preferred way.
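
Option 2) could be sketched roughly as below. DateParserPool, its Entry type and its method names are hypothetical, and the registered patterns in the usage note are placeholders rather than formats observed on any real server. The lookup takes the first word of the SYST reply text (for example "UNIX" in "UNIX Type: L8") as the server type and falls back to the two formats the current parsers assume.

```java
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical pool of date patterns keyed by server type (from SYST) and locale. */
final class DateParserPool {

    /** A pair of patterns plus the locale that supplies the month names. */
    static final class Entry {
        final Locale locale;
        final String recentPattern;
        final String olderPattern;

        Entry(Locale locale, String recentPattern, String olderPattern) {
            this.locale = locale;
            this.recentPattern = recentPattern;
            this.olderPattern = olderPattern;
        }

        // SimpleDateFormat is not thread-safe, so a fresh instance is handed out each time.
        SimpleDateFormat recentFormat() {
            return new SimpleDateFormat(recentPattern, locale);
        }

        SimpleDateFormat olderFormat() {
            return new SimpleDateFormat(olderPattern, locale);
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void register(String serverType, Locale locale, String recentPattern, String olderPattern) {
        entries.put(key(serverType, locale), new Entry(locale, recentPattern, olderPattern));
    }

    /** systReply is the text of the SYST reply without the numeric code. */
    Entry lookup(String systReply, Locale locale) {
        String serverType = systReply.trim().split("\\s+")[0];
        Entry entry = entries.get(key(serverType, locale));
        // Unknown combination: use the default formats with the requested locale's month names.
        return entry != null ? entry : new Entry(locale, "MMM dd HH:mm", "MMM dd yyyy");
    }

    private static String key(String serverType, Locale locale) {
        return serverType.toUpperCase(Locale.ROOT) + "/" + locale;
    }
}
```

The user would only call something like pool.lookup(systReply, Locale.GERMANY); an explicit pool.register("UNIX", Locale.GERMANY, "dd.MM.yyyy HH:mm", "dd.MM.yyyy") remains available as the escape hatch described in 1).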

---
Mario

