You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by mark benedetto king <mb...@lowlatency.com> on 2003/12/14 19:37:57 UTC

RFC: date parser rewrite

The exiting bison-based date-parser suffers from several deficiencies,
among which are the following:

  1.) it is bison-based, which complicates the build and release processes,
      especially for Windows.
  2.) it parses an extremely large set of possible date formats, sometimes
      in surprising and counterintuitive ways.
  3.) it is complex enough to discourage maintainance and enhancement
  4.) it does not support ISO 8601 date formats.
  5.) it it not perfectly MP-safe
  6.) it is a bit of an eyesore.

If we are to replace it, we must do so before 1.0 so that its replacement
will not be forced to support a superset of its behaviour.

I propose the following implementation strategy (this is loosely based
on work from C. Scott Ananian <ca...@lesser-magoo.lcs.mit.edu>, available
at http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=6093):

The current local time and current UTC time must be passed to the date
parser.

Take the first match amongst:

1.) the following ISO 8601 extended date-time formats:

  YYYY-MM-DDThh:mm:ss.sTZI
  YYYY-MM-DDThh:mm:ss,sTZI
  YYYY-MM-DDThh:mm:ssTZI
  YYYY-MM-DDThh:mmTZI
  YYYY-MM-DDThhTZI

Where "TZI" stands for "Time Zone Indicator", and can be one of:

  Z
  +HH
  -HH
  +HH:MM
  -HH:MM

2.) the following ISO 8601 basic date-time formats:

  YYYYMMDDThhmmss.sTZI
  YYYYMMDDThhmmss,sTZI
  YYYYMMDDThhmmssTZI
  YYYYMMDDThhmmTZI
  YYYYMMDDThhTZI

Where "TZI" stands for "Time Zone Indicator", and can be one of:

  Z
  +HH
  -HH
  +HH:MM
  -HH:MM

3.) the following formats, which will be adjusted to UTC from local time.

  YYYY-MM-DD HH:MI:SS
  YYYY-MM-DD HH:MI

4.) the following formats, which are presumed to reference midnight
    local time of the specified day.

  YYYY-MM-DD
  YYYYMMDD

5.) the following formats, which are presumed to reference local time
  of the current day.

  HH:MI:SS
  HH:MI

I believe that these formats are necessary and sufficient for the 1.0
release.  The merits of more complex specifications, such as "two
fortnights ago", "four score and seven minutes hence", and "the Ides of
March in the most recent Chinese year of the dragon" can be discussed
afterwards.

Note: the existing book references several date formats that are
not supported in this proposal.  Those references would need to be
updated.


--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Jostein, is 1.0 STATUS file in 0.35.0 branch okay?

Posted by kf...@collab.net.
"Jostein Chr. Andersen" <jo...@josander.net> writes:
> > > I'll ask Jostein if he minds us starting it in the 0.35 branch,
> > > since that leads directly to the 1.0 line.
> >
> > Jostein, do you mind if I start a STATUS file on the 0.35 branch,
> > since, unlike trunk, that leads to the 1.0-stabilization line?
> 
> I don't mind :-)

Okay, thanks.  I'll save this permission for later, just in case :-).
For now, we're trying the issue tracker route.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Jostein, is 1.0 STATUS file in 0.35.0 branch okay?

Posted by "Jostein Chr. Andersen" <jo...@josander.net>.
On Monday 15 December 2003 22.12, kfogel@collab.net wrote:
> > I'll ask Jostein if he minds us starting it in the 0.35 branch,
> > since that leads directly to the 1.0 line.
>
> Jostein, do you mind if I start a STATUS file on the 0.35 branch,
> since, unlike trunk, that leads to the 1.0-stabilization line?

I don't mind :-)

-- 
http://www.josander.net/kontakt/ ||
http://www.josander.net/en/contact/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Jostein, is 1.0 STATUS file in 0.35.0 branch okay?

Posted by kf...@collab.net.
kfogel@collab.net writes:
> I think we really need to get that STATUS file under version control
> right now, so we can start to compile a complete list of the proposed
> changes.  I'm finding it hard to evaluate this one without the others
> around it.
> 
> I'll ask Jostein if he minds us starting it in the 0.35 branch, since
> that leads directly to the 1.0 line.

Jostein, do you mind if I start a STATUS file on the 0.35 branch,
since, unlike trunk, that leads to the 1.0-stabilization line?

The STATUS file is described in HACKING, see the section "Stabilizing
a release branch".  It's basically a place to record votes about
whether something will or will not get merged into 1.0.  (I've been
assuming that once developers agree on a particular change being
merged, you're okay with any full committer actually doing the merge.
"Release manager" doesn't have to mean "merge slave" :-) ).

Anyway, it would be nice to start that STATUS file now so we can start
getting an overview of all the issues out there.  There are a lot of
them; I think when we see the sheer number, that will automatically
make us a bit more conservative about choosing them.

Whatever state the STATUS file is in at the 0.35 release is fine, it's
really about 1.0, not 0.35.  I'd port r8005 over to 0.35 too, that's
the new HACKING section about the STATUS file.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by kf...@collab.net.
Greg Hudson <gh...@MIT.EDU> writes:
> Yes, thus "one place (the -r '{DATE}' argument)".  I did not mention the
> "log" command anywhere in the text you quoted.

Wow.  I don't know what I was smoking there.

> No, that's not the bug.  The bug is that getdate.y supports a massive
> and poorly-defined set of date formats (I posted several egregious
> examples earlier), and we don't want to be trapped into supporting a
> superset of those formats for all time.

According to MBK, lack of ISO 8601 support is part of the problem.
But what you describe above is also a problem, I agree.

I think we really need to get that STATUS file under version control
right now, so we can start to compile a complete list of the proposed
changes.  I'm finding it hard to evaluate this one without the others
around it.

I'll ask Jostein if he minds us starting it in the 0.35 branch, since
that leads directly to the 1.0 line.

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2003-12-15 at 15:21, kfogel@collab.net wrote:
> Greg Hudson <gh...@MIT.EDU> writes:
> > +1.  Since I think we only use this kind of date-parsing in one place
> > (the -r '{DATE}' argument), I don't think this change is very
> > destabilizing, even though it churns a lot of code.
> 
> Don't we use it anywhere the -r argument is accepted?  Maybe it's most
> commonly used with 'svn log', but it *could* be used by any command
> that accepts -r (with or without a range).

Yes, thus "one place (the -r '{DATE}' argument)".  I did not mention the
"log" command anywhere in the text you quoted.

> The current situation is certainly not ideal, but it is compatible
> with CVS dates.

I don't think that's terribly important.

> Do we really need a whole new implementation for 1.0?  If the bug is
> that we don't support ISO 8601 dates

No, that's not the bug.  The bug is that getdate.y supports a massive
and poorly-defined set of date formats (I posted several egregious
examples earlier), and we don't want to be trapped into supporting a
superset of those formats for all time.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by kf...@collab.net.
Greg Hudson <gh...@MIT.EDU> writes:
> +1.  Since I think we only use this kind of date-parsing in one place
> (the -r '{DATE}' argument), I don't think this change is very
> destabilizing, even though it churns a lot of code.

Don't we use it anywhere the -r argument is accepted?  Maybe it's most
commonly used with 'svn log', but it *could* be used by any command
that accepts -r (with or without a range).

Anyway:

The current situation is certainly not ideal, but it is compatible
with CVS dates.  Would it be a smaller change for us to just make
getdate.y handle whatever date formats we want it to handle that it
currently doesn't?

That wouldn't get rid of the ugliness of having getdate.y in our tree,
nor the burden of having getdate.cw... But I don't think that's been a
particularly noticeable burden, either, since getdate.y is quite
static except for occasional point fixes like this.

Do we really need a whole new implementation for 1.0?  If the bug is
that we don't support ISO 8601 dates, how hard would it be to fix just
that?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2003-12-14 at 14:37, mark benedetto king wrote:
> If we are to replace it, we must do so before 1.0 so that its replacement
> will not be forced to support a superset of its behaviour.

+1.  Since I think we only use this kind of date-parsing in one place
(the -r '{DATE}' argument), I don't think this change is very
destabilizing, even though it churns a lot of code.

Your chosen date formats look fine to me.

> I believe that these formats are necessary and sufficient for the 1.0
> release.  The merits of more complex specifications, such as "two
> fortnights ago", "four score and seven minutes hence", and "the Ides of
> March in the most recent Chinese year of the dragon" can be discussed
> afterwards.

These aren't date formats actually supported by getdate.y, but there are
some really weird ones.  For example:

  2 days ago yesterday 4
  2 fortnights ago 1630
  third thursday


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by John Szakmeister <jo...@szakmeister.net>.
On Sunday 14 December 2003 14:37, mark benedetto king wrote:
> [snip]
> I believe that these formats are necessary and sufficient for the 1.0
> release.  The merits of more complex specifications, such as "two
> fortnights ago", "four score and seven minutes hence", and "the Ides of
> March in the most recent Chinese year of the dragon" can be discussed
> afterwards.

+1 from me.  I don't feel we need all of the current ones, and I very much 
like the idea of anything that makes the build process easier on Windows. :-)

-John


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by "C. Michael Pilato" <cm...@collab.net>.
mark benedetto king <mb...@lowlatency.com> writes:

> If we are to replace it, we must do so before 1.0 so that its replacement
> will not be forced to support a superset of its behaviour.

+1.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Erik Huelsmann <e....@gmx.net>.
> I believe that these formats are necessary and sufficient for the 1.0
> release.  The merits of more complex specifications, such as "two
> fortnights ago", "four score and seven minutes hence", and "the Ides of
> March in the most recent Chinese year of the dragon" can be discussed
> afterwards.

+1.

Extending a clearly defined set with clearly defined extras is IMO clearly
prefferable to having to support a not very clearly defined set as a minimum.

bye,

Erik.

-- 
+++ GMX - die erste Adresse für Mail, Message, More +++
Neu: Preissenkung für MMS und FreeMMS! http://www.gmx.net



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by mark benedetto king <mb...@lowlatency.com>.
On Mon, Dec 15, 2003 at 01:57:31PM -0600, Jens B. Jorgensen wrote:
> mark benedetto king wrote:
> 
> >4.) the following formats, which are presumed to reference midnight
> >   local time of the specified day.
> >
> > YYYY-MM-DD
> > YYYYMMDD
> >
> > 
> >
> Just to clarify for myself. It sounds like you mean: YYYY-MM-DD ==> 
> YYYY-MM-DD 24:00 (you get the idea).
> 

No, I meant 00:00.   There is no such thing as 24:00, even during
leap seconds.   I should have been explicit, though.

> Does anyone think it makes more sense for YYYY-MM-DD ==> YYYY-MM-DD 00:00?
> 

Yes, that makes much more sense. :-)

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by "Jens B. Jorgensen" <je...@tallan.com>.
mark benedetto king wrote:

>4.) the following formats, which are presumed to reference midnight
>    local time of the specified day.
>
>  YYYY-MM-DD
>  YYYYMMDD
>
>  
>
Just to clarify for myself. It sounds like you mean: YYYY-MM-DD ==> 
YYYY-MM-DD 24:00 (you get the idea).

Does anyone think it makes more sense for YYYY-MM-DD ==> YYYY-MM-DD 00:00?

That one stuck out at me right away since the later would be my 
assumption. I tried to look for some precedents:

  o The "at" command (at least the linux one, I presume it doesn't 
differ in this respect from other unices) interprets a bare date to mean 
you want to run the command on that date at the _current local time_. 
Weird huh? I guess it makes sense.

  o The Oracle rdbms interprets dates without time to be 00:00:00.

  o strptime (which is defined in the POSIX standard) converts just a 
date spec to a struct tm with the tm_sec, tm_min, tm_hour all initialized.

Anyhow perhaps this is what you meant in the first place but "midnight 
of date d" just sounds to me like the time at the end of that day rather 
than the start.

-- 
Jens B. Jorgensen
jens.jorgensen@tallan.com

"With a focused commitment to our clients and our people, we deliver value through customized technology solutions."


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Friday, December 19, 2003 7:57 PM -0500 Greg Hudson <gh...@MIT.EDU> 
wrote:

> anywhere in the code base right now.  Or that APR provides no way to
> statically initialize mutexes.  (Nor do we have a library-wide
> initialization function or global context in which to store a
> runtime-initialized mutex.)

Well, that's because many mutexes don't allow for static initialization.  If 
it were possible, it'd be there.  ;-)

I've suggested before that Subversion needs an svn_initialize() function for 
this purpose to allow us to later setup 'global' context.  -- justin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2003-12-19 at 19:34, Julian Foad wrote:
> The only thing as far as I am concerned that would be intolerable is
> guaranteeing to support all of the currently accepted date formats in
> future.  As I mentioned, we can solve that by documenting that the
> nasty formats are not supported even though the code happens to accept
> them currently.

I think it would be naive of us to assume that people generally read the
documentation before using date formats.  We would be taking only a
futile swing at the problem, not solving it.

> >    1.) protect the logic with a big mutex

> That sounds like a moderately important thing to do, and a small
> enough change that porting it into 1.0 is a reasonable option.

It's not a small change when you consider that we don't use mutexes
anywhere in the code base right now.  Or that APR provides no way to
statically initialize mutexes.  (Nor do we have a library-wide
initialization function or global context in which to store a
runtime-initialized mutex.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Julian Foad <ju...@btopenworld.com>.
mark benedetto king wrote:
> On Fri, Dec 19, 2003 at 02:44:21PM -0600, kfogel@collab.net wrote:
> 
>>This thread's been dormant for a bit, so I'm afraid to disturb it :-).

Heh - yes.

For what it's worth, I think Karl is talking sense with regard to how this fits into the big picture of releasing the first "proper" version of Subversion.

The only thing as far as I am concerned that would be intolerable is guaranteeing to support all of the currently accepted date formats in future.  As I mentioned, we can solve that by documenting that the nasty formats are not supported even though the code happens to accept them currently.

> If we decide not to replace the date parser before 1.0, then I suggest
> we:
>    1.) protect the logic with a big mutex

That sounds like a moderately important thing to do, and a small enough change that porting it into 1.0 is a reasonable option.

>    2.) follow your suggestion to add support for the formats we want
>        to add by wrapping the existing logic

No need to do that for version 1.0.  Version 1.0 will still be useful without the new formats, as it has been for a year and more.  We can add new formats in version 1.1 or 1.2 or any time later.

>    3.) *Deprecate* all behavior outside the strictly limited input set
>        I proposed earlier.

Yes - that's the important point.  It can be as simple as changing this part of the book:

Subversion accepts an incredible number of date formats—just remember to use quotes around any date that contains spaces. Here are just a few of the formats that Subversion accepts:

$ svn checkout --revision {2002-02-17}
$ svn checkout --revision {2/17/02}
$ svn checkout --revision {"17 Feb"}
$ svn checkout --revision {"17 Feb 2002"}
$ svn checkout --revision {"17 Feb 2002 15:30"}
$ svn checkout --revision {"17 Feb 2002 15:30:12 GMT"}
$ svn checkout --revision {"10 days ago"} 
$ svn checkout --revision {"last week"} 
$ svn checkout --revision {"yesterday"} 
…
      
to something like:

You should specify the date as YYYY-MM-DD hh:mm:ss.  (Subversion currently accepts an incredible number of date formats but they won't be supported in future.)  Remember to use quotes around any date that contains spaces.
      

- Julian


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by mark benedetto king <mb...@lowlatency.com>.
On Fri, Dec 19, 2003 at 03:55:04PM -0600, kfogel@collab.net wrote:
> mark benedetto king <mb...@lowlatency.com> writes:
> > I expect my implementation to be less than 100 lines of code.  I think
> > Greg Hudson would be willing to review it thoroughly, and probably
> > Brane would too.  Yes, they could be fixing other bugs or reviewing
> > other changes, instead...
> 
> Would you be reluctant to put in the effort until you knew it was "in
> principle" approved for 1.0?  It would be so nice to actually have the
> patch before going farther... But I understand your time is limited.

Sure.  If things are as simple as I claim, your concerns will probably be
allayed.  If not, we'll have real code to point at and make harder
decisions about.

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by kf...@collab.net.
mark benedetto king <mb...@lowlatency.com> writes:
> I expect my implementation to be less than 100 lines of code.  I think
> Greg Hudson would be willing to review it thoroughly, and probably
> Brane would too.  Yes, they could be fixing other bugs or reviewing
> other changes, instead...

Would you be reluctant to put in the effort until you knew it was "in
principle" approved for 1.0?  It would be so nice to actually have the
patch before going farther... But I understand your time is limited.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by mark benedetto king <mb...@lowlatency.com>.
On Fri, Dec 19, 2003 at 02:44:21PM -0600, kfogel@collab.net wrote:
> This thread's been dormant for a bit, so I'm afraid to disturb it :-).
> 
> However, eventually someone's going to file it as a 1.0 candidate
> change (probably after they have a patch ready), or move issue #408 to
> be a 1.0 candidate.

That was my plan.

> So, this is an anticipatory post explaining why I hope we do *not*
> tackle this problem for 1.0.
> 
> Reason 1: As much as we dislike our current date parser, it hasn't
>           caused us many actual problems.  It's not like we had a
>           bunch of defects filed about date parsing, and then realized
>           that we couldn't release 1.0 without fixing them.  We have
>           issue #408, sure, but that's "date parser rewrite", so it
>           would be circular to view it as a justification in itself :-).

It has at least one major defect: it is not thread-safe.  It's not even
close.  This would be a problem for GUI clients and/or server-side
consumers of our APIs.  I suppose that it could be *made* thread-safe,
we could (horrors) protect it with a big mutex, for example.

It also, for example, parses "2003-12-17T17:00:00" differently from
"2003-12-17 17:00:00".  I don't know if that's a bug or not, but it's
a bit surprising.  (If you're interested, 2003-12-17T17:00:00 comes
out as "Wed Dec 17 10:00:00 UTC 2003".  I'm in EST, which doesn't
account for a 7 hour time difference, even if the 'T' were to somehow
magically stand for "this date-time is in UTC".  Hmm...perhaps it's
off by twelve hours the wrong way... hard to say).

> Reason 2: Unlike some of the simple API fixes and whatnot we've been
>           bandying about, this actually has the potential to cause
>           bugs.  It'd need a lot of review, and even then it's still
>           possible for there to be subtle misparsings that don't get
>           found until after 1.0.

I expect my implementation to be less than 100 lines of code.  I think
Greg Hudson would be willing to review it thoroughly, and probably
Brane would too.  Yes, they could be fixing other bugs or reviewing
other changes, instead...

>           maintain.  If we write a new parser, it certainly wouldn't
>           differently parse any string that is correctly parsed by the
>           old code.  So, we can just pass input to the old code first,
>           then if it fails, we try the new code.  That way we don't
>           have to "maintain" the old code, so much as preserve it in
>           stasis until the new code matches it in sophistication -- a
>           process that can take as long as we're comfortable with,
>           since there's no great penalty to keeping the old code
>           around.

The semantics of the existing parser are ill-defined at best.  It is
unlikely that any implementation that does not have, as a design goal,
to match a superset of the defacto date-language domain will wind up
compatible.  If it is axiomatic that "one day" we will remove that code
from the tree, it follows that on that day we will stop parsing some
non-empty set of strings that we used to parse.

When that happens, it is likely that some (possibly not many, granted)
scripts, programs, documentation, printed manuals, user's guides, insider's
guides, etc, will be broken.  Undocumented features are still features.

> 
> For these reasons, I think it is unnecessary (and even a bit risky) to
> futz with our date parser before 1.0.  Let's leave well enough alone.
> 

If we decide not to replace the date parser before 1.0, then I suggest
we:
   1.) protect the logic with a big mutex
   2.) follow your suggestion to add support for the formats we want
       to add by wrapping the existing logic
   3.) *Deprecate* all behavior outside the strictly limited input set
       I proposed earlier.

Consider, though: I expect the implementation of (2) to be roughly the
same, in terms of complexity, whether I try to only parse these
"new formats", or whether I try to parse the entire set of formats
covered in the proposal.  If step (2) introduces all the instability
that you are hesitant to incur, we need to omit it completely.

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Erik Huelsmann <e....@gmx.net>.
> For the record, I disagree with all of your reasons.
> 
>    1. The current situation is intolerable. We nominally support date
>       formats that don't warrant that name. Whatever we do to change
>       that, we'll be yanking suppor for at least some of those date
>       formats, and doing that after 1.0 (after we've known about this
>       problem for ages) is a bad move.
>    2. We don't support date formats that we _should_ support.
>    3. Going with a limited set of supported date formats, like MBK
>       suggests, isn't going to cause that many problems. The parsing
>       code is a simple function that can be tested to death very easily.
> 
> +1 for changing the date parser before 1.0. We'll do ourselves and our
> users a favour.

I too am very much in favor of starting with a clean (possibly small)
well-defined set of dates. Extending that is easier than having to maintain the
current - documented or not - set.

bye,

eErik.

-- 
+++ GMX - die erste Adresse für Mail, Message, More +++
Neu: Preissenkung für MMS und FreeMMS! http://www.gmx.net



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by kf...@collab.net.
Branko Čibej <br...@xbc.nu> writes:
>    1. The current situation is intolerable. We nominally support date
>       formats that don't warrant that name. Whatever we do to change
>       that, we'll be yanking suppor for at least some of those date
>       formats, and doing that after 1.0 (after we've known about this
>       problem for ages) is a bad move.

I have an existence proof that the current situation is not
intolerable: we've tolerated it for years :-).

>    2. We don't support date formats that we _should_ support.

Okay.  (Elsewhere in this thread there's a canonical list of those, I
think.)

>    3. Going with a limited set of supported date formats, like MBK
>       suggests, isn't going to cause that many problems. The parsing
>       code is a simple function that can be tested to death very easily.

The "isn't going to cause that many problems" is an assertion about
the correctness of an implementation that we don't even *have* yet,
and which when we do have it, will be far less tested than the current
date parser.

> +1 for changing the date parser before 1.0. We'll do ourselves and our
> users a favour.

I'd say it's early to +1 until there's an actual patch to evaluate! :-)

I'm not vetoing, just discussing, but so far these reasons doesn't
seem very compelling to me.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: RFC: date parser rewrite

Posted by Branko Čibej <br...@xbc.nu>.
For the record, I disagree with all of your reasons.

   1. The current situation is intolerable. We nominally support date
      formats that don't warrant that name. Whatever we do to change
      that, we'll be yanking suppor for at least some of those date
      formats, and doing that after 1.0 (after we've known about this
      problem for ages) is a bad move.
   2. We don't support date formats that we _should_ support.
   3. Going with a limited set of supported date formats, like MBK
      suggests, isn't going to cause that many problems. The parsing
      code is a simple function that can be tested to death very easily.

+1 for changing the date parser before 1.0. We'll do ourselves and our
users a favour.


kfogel@collab.net wrote:

>This thread's been dormant for a bit, so I'm afraid to disturb it :-).
>
>However, eventually someone's going to file it as a 1.0 candidate
>change (probably after they have a patch ready), or move issue #408 to
>be a 1.0 candidate.
>
>So, this is an anticipatory post explaining why I hope we do *not*
>tackle this problem for 1.0.
>
>Reason 1: As much as we dislike our current date parser, it hasn't
>          caused us many actual problems.  It's not like we had a
>          bunch of defects filed about date parsing, and then realized
>          that we couldn't release 1.0 without fixing them.  We have
>          issue #408, sure, but that's "date parser rewrite", so it
>          would be circular to view it as a justification in itself :-).
>
>Reason 2: Unlike some of the simple API fixes and whatnot we've been
>          bandying about, this actually has the potential to cause
>          bugs.  It'd need a lot of review, and even then it's still
>          possible for there to be subtle misparsings that don't get
>          found until after 1.0.
>
>Reason 2: There are post-1.0 compatibility strategies that make this
>          change less compelling.  For one thing, any new parser we
>          write is going to intersect with some of what the current
>          parser supports.  We can then look at current side of the
>          non-intersecting area and see how important some of those
>          formats really are.  It's not outside the bounds of
>          propriety to break dates used by ~20 people in the world (if
>          we can determine that with reasonable certainty).
>
>          But even if we don't want to lose *any* compatibility,
>          there's still a strategy available.  The current date parser
>          is not known to misparse anything (right?) -- it simply
>          doesn't parse everything we want it to, and is hard to
>          maintain.  If we write a new parser, it certainly wouldn't
>          differently parse any string that is correctly parsed by the
>          old code.  So, we can just pass input to the old code first,
>          then if it fails, we try the new code.  That way we don't
>          have to "maintain" the old code, so much as preserve it in
>          stasis until the new code matches it in sophistication -- a
>          process that can take as long as we're comfortable with,
>          since there's no great penalty to keeping the old code
>          around.
>
>For these reasons, I think it is unnecessary (and even a bit risky) to
>futz with our date parser before 1.0.  Let's leave well enough alone.
>
>-Karl
>  
>


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by mark benedetto king <mb...@lowlatency.com>.
On Fri, Dec 19, 2003 at 06:06:18PM -0500, Greg Hudson wrote:
> 
> (I think the non-thread-safety of the current date parser is a pretty
> strong argument too.  I hadn't thought of that before.)
> 

I guess we should lobby to have "That's not thread-safe." added to
the list at http://skirsch.com/humor/techarg.htm

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2003-12-19 at 15:44, kfogel@collab.net wrote:
> Reason 2: There are post-1.0 compatibility strategies that make this
>           change less compelling.  For one thing, any new parser we
>           write is going to intersect with some of what the current
>           parser supports.  We can then look at current side of the
>           non-intersecting area and see how important some of those
>           formats really are.

I don't think that's a feasible strategy.  We have no way of collecting
data on who is using what date formats; the only thing we can do is
break them and see who complains bitterly about their scripts failing
when they upgraded.

(I think the non-thread-safety of the current date parser is a pretty
strong argument too.  I hadn't thought of that before.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by kf...@collab.net.
This thread's been dormant for a bit, so I'm afraid to disturb it :-).

However, eventually someone's going to file it as a 1.0 candidate
change (probably after they have a patch ready), or move issue #408 to
be a 1.0 candidate.

So, this is an anticipatory post explaining why I hope we do *not*
tackle this problem for 1.0.

Reason 1: As much as we dislike our current date parser, it hasn't
          caused us many actual problems.  It's not like we had a
          bunch of defects filed about date parsing, and then realized
          that we couldn't release 1.0 without fixing them.  We have
          issue #408, sure, but that's "date parser rewrite", so it
          would be circular to view it as a justification in itself :-).

Reason 2: Unlike some of the simple API fixes and whatnot we've been
          bandying about, this actually has the potential to cause
          bugs.  It'd need a lot of review, and even then it's still
          possible for there to be subtle misparsings that don't get
          found until after 1.0.

Reason 2: There are post-1.0 compatibility strategies that make this
          change less compelling.  For one thing, any new parser we
          write is going to intersect with some of what the current
          parser supports.  We can then look at current side of the
          non-intersecting area and see how important some of those
          formats really are.  It's not outside the bounds of
          propriety to break dates used by ~20 people in the world (if
          we can determine that with reasonable certainty).

          But even if we don't want to lose *any* compatibility,
          there's still a strategy available.  The current date parser
          is not known to misparse anything (right?) -- it simply
          doesn't parse everything we want it to, and is hard to
          maintain.  If we write a new parser, it certainly wouldn't
          differently parse any string that is correctly parsed by the
          old code.  So, we can just pass input to the old code first,
          then if it fails, we try the new code.  That way we don't
          have to "maintain" the old code, so much as preserve it in
          stasis until the new code matches it in sophistication -- a
          process that can take as long as we're comfortable with,
          since there's no great penalty to keeping the old code
          around.

For these reasons, I think it is unnecessary (and even a bit risky) to
futz with our date parser before 1.0.  Let's leave well enough alone.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Julian Foad <ju...@btopenworld.com>.
Greg Hudson wrote:
> On Tue, 2003-12-16 at 10:54, Julian Foad wrote:
> 
>>Simpler, but not much so, but I won't force the issue.  In that case I
>>think we should require that the specified time be between 00:00 and
>>"now", so that both semantics (other times meaning "later today" and
>>other times meaning "yesterday") are reserved for a future extension.
> 
> You seem to have a misconception here.  In the current architecture, we
> take a date string and convert it to an apr_time_t.  We do not convert
> it to a range of times.  We're not talking about changing that
> architecture.

I don't have that particular misconception, but maybe I failed to explain my meaning clearly.  Let me try again.

When parsing a string of the form "hh:mm[:ss]", MBK was suggesting that we interpret it as the specified time on the current date, but he agreed with me that it would be useful to interpret it instead as the specified time within the last 24 hours or so.

These two interpretations conflict for times that are greater than the time of day "now".  In the former proposal, such a time means "later today" and is only useful for overcoming clock skew, and times up to a few minutes greater than "now" might be encountered.  In my latter proposal, a time much greater than "now" is interpreted as the specified time yesterday; a time just a little bit greater than "now" is interpreted as "later" to allow for clock skew.

I was then trying to say that, in order to allow future choice of interpretation, we should for now only accept a time in this format if it would be interpreted the same way in both schemes.  I said that the set of times for which this is true is the times between 00:00 and "now", but in fact that is not restrictive enough.

I would now say that enforcing such a restriction is unnecessary, because people will not get into the habit of using times of "later today", because Subversion does not deal with times in the future.

- Julian


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2003-12-16 at 10:54, Julian Foad wrote:
> Simpler, but not much so, but I won't force the issue.  In that case I
> think we should require that the specified time be between 00:00 and
> "now", so that both semantics (other times meaning "later today" and
> other times meaning "yesterday") are reserved for a future extension.

You seem to have a misconception here.  In the current architecture, we
take a date string and convert it to an apr_time_t.  We do not convert
it to a range of times.  We're not talking about changing that
architecture.

(The user can specify a range of dates using -r '{DATE1}:{DATE2}', but
that's exactly like saying -r '{last rev as of DATE1}:{last rev as of
DATE2}'.  There's some text in the book about how that can sometimes be
confusing.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Julian Foad <ju...@btopenworld.com>.
mark benedetto king wrote:
> On Mon, Dec 15, 2003 at 11:39:33PM +0000, Julian Foad wrote:
> 
>>Actually, if we can find a sufficient common subset of formats supported 
>>now and formats that we would like to support, then we could leave the 
>>implementation in place but _document_ that only specific formats are 
>>supported by Subversion.  For example, the set of formats "YYYY-MM-DD[ 
>>HH:MM[:SS]]" is supported by both the current and the proposed schemes, 
>>although I'm not sure if it means the same in terms of UTC or local time.  
>>Note that it would be OK to find a very restrictive subset for the time 
>>being; we can expand the range of formats in a future release.
> 
> Undocumented features have a habit of making their way into common
> practice and "insider's guides" (c.f. Google Hacks).

Indeed.  As I said (below), it's not a terribly great solution.  Nevertheless I thought it was worth considering as it would avoid a bunch of coding work just as we are trying to stabilise.

>>I admit that just documenting that only a few of the accepted formats are 
>>supported is not ideal, but nor is it terribly bad.  (If it is terribly 
>>bad, then we should review similar cases such as "svn log" with multiple WC 
>>targets.)

I'll mention the "svn log" case in a separate thread.

> Even documenting the existing implementation is difficult.  
> 
[long example snipped]

That's irrelevant.  I was suggesting that we just document that the only supported format is "YYYY-MM-DD hh:mm:ss" and it means local time (and/or a very small number of other simple formats).


>>>1.) the following ISO 8601 extended date-time formats:
>>>
>>> YYYY-MM-DDThh:mm:ss.sTZI
>>> YYYY-MM-DDThh:mm:ss,sTZI
>>> YYYY-MM-DDThh:mm:ssTZI
>>> YYYY-MM-DDThh:mmTZI
>>> YYYY-MM-DDThhTZI
>>
>>... where the final "s" represents an arbitrary number of decimal digits.
> 
> Right, as long as arbitrary means "less than 7": we only store microseconds
> in the repository, we shouldn't fake finer granularity than we can provide.

OK.

>>>Where "TZI" stands for "Time Zone Indicator", and can be one of:
>>>
>>> Z
>>> +HH
>>> -HH
>>> +HH:MM
>>> -HH:MM
>>
>>... else must be the empty string, meaning local time.
> 
> This makes the special cases below in (4) irrelevant, as those formats can
> be merged into (1) and (2).  Fine by me.

i.e. add "YYYY-MM-DDTZI" to the list in (1) and "YYYYMMDDTZI" to the list in (2).  That would add date-only formats with a non-empty TZI to the current proposal, but I think that makes sense and is a good thing for consistency.  e.g. a GUI might allow the user to enter a date and/or time, and the program could then apend the TZI itself without knowing which format the user entered.

>>>4.) the following formats, which are presumed to reference midnight
>>>   local time of the specified day.
>>>
>>> YYYY-MM-DD
>>> YYYYMMDD

>>>5.) the following formats, which are presumed to reference local time
>>> of the current day.
>>>
>>> HH:MI:SS
>>> HH:MI
>>
>>... where "current day" means (now +- 12 hours).
>>
>>If "current day" were to mean 00:00 to 24:00 on the current date, then this 
>>would be unfriendly to people who (like me) work after midnight.  Therefore 
> 
> What about the people who (like me) work >12 hour days? :-)

You smile, but the question is worth answering.

Well, we could make it (now - 23 hours) to (now + 1 hour).  The "+" side only needs a few minutes to overcome clock skew, so "+ 1 hour" is plenty.  However, we could also say that this format is only appropriate for recent times - much less than a day - and if the event was anywhere approaching a day ago, i.e. more than half a day ago, then you need to use a different format.

>>"current day" should mean the period of <= 24 hours that ends at least few 
>>minutes after "now" (to allow for clock skew); e.g. "(now - 12 hours) to 
>>(now + 6 hours)" or "(now +/- 12 hours)".  There is a precedent in Unix 
>>"ls" and Subversion "svn list": when printing a file date without the year 
>>we require that the date be within (now +/- 6 months), not that it be 
>>within the current calendar year.
> 
> While this would certainly be useful, I think it would be much simpler
> to implement and document the "current calendar day" interpretation.

Simpler, but not much so, but I won't force the issue.  In that case I think we should require that the specified time be between 00:00 and "now", so that both semantics (other times meaning "later today" and other times meaning "yesterday") are reserved for a future extension.

- Julian


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by mark benedetto king <mb...@lowlatency.com>.
On Mon, Dec 15, 2003 at 11:39:33PM +0000, Julian Foad wrote:
> Actually, if we can find a sufficient common subset of formats supported 
> now and formats that we would like to support, then we could leave the 
> implementation in place but _document_ that only specific formats are 
> supported by Subversion.  For example, the set of formats "YYYY-MM-DD[ 
> HH:MM[:SS]]" is supported by both the current and the proposed schemes, 
> although I'm not sure if it means the same in terms of UTC or local time.  
> Note that it would be OK to find a very restrictive subset for the time 
> being; we can expand the range of formats in a future release.

Undocumented features have a habit of making their way into common
practice and "insider's guides" (c.f. Google Hacks).

> I admit that just documenting that only a few of the accepted formats are 
> supported is not ideal, but nor is it terribly bad.  (If it is terribly 
> bad, then we should review similar cases such as "svn log" with multiple WC 
> targets.)

Even documenting the existing implementation is difficult.  

For example: let's assume we have documented how the parser extracts
the year from the provided string, if present.  In order to determine
the actual year, the following logic is applied:

  Take the absolute value of the parsed year.  Just in case.
  If it's less than 69 add 2000 to it (assume that it's a two digit
  reference to a 21st century year), otherwise, if it's less than
  100, assume that it's a two digit reference to a 20th century year
  and add 1900.  If the new value is before 1970, this must be an
  invalid date.  Similarly, if it's after 2038, it must also be
  an invalid date.

And that's just to handle how to interpret the year, presuming that
we've carefully and accurately documented how the year might be
extracted!  Like how, for example, "(bork)10101(bork)" is parsed
as the date "2001-01-01" (in local time, of course).

> >1.) the following ISO 8601 extended date-time formats:
> >
> >  YYYY-MM-DDThh:mm:ss.sTZI
> >  YYYY-MM-DDThh:mm:ss,sTZI
> >  YYYY-MM-DDThh:mm:ssTZI
> >  YYYY-MM-DDThh:mmTZI
> >  YYYY-MM-DDThhTZI
> 
> ... where the final "s" represents an arbitrary number of decimal digits.
> 

Right, as long as arbitrary means "less than 7": we only store microseconds
in the repository, we shouldn't fake finer granularity than we can provide.

> >Where "TZI" stands for "Time Zone Indicator", and can be one of:
> >
> >  Z
> >  +HH
> >  -HH
> >  +HH:MM
> >  -HH:MM
> 
> ... else must be the empty string, meaning local time.

This makes the special cases below in (4) irrelevant, as those formats can
be merged into (1) and (2).  Fine by me.

> >
> >4.) the following formats, which are presumed to reference midnight
> >    local time of the specified day.
> >
> >  YYYY-MM-DD
> >  YYYYMMDD
> 
> ... where "midnight" means 00:00 hours.  (Missing hours and/or minutes 
> and/or seconds fields mean 0 in all of these formats.)

Right.  I should have been more specific.

> 
> Sometimes it would be nice to have it mean 00:00 when used as the beginning 
> of a revision range and 24:00 when used as the end of a range, but I don't 
> think now is the time to consider that.
> 

Yes, but for that people can just use the next day.

> >5.) the following formats, which are presumed to reference local time
> >  of the current day.
> >
> >  HH:MI:SS
> >  HH:MI
> 
> ... where "current day" means (now +- 12 hours).
> 
> If "current day" were to mean 00:00 to 24:00 on the current date, then this 
> would be unfriendly to people who (like me) work after midnight.  Therefore 

What about the people who (like me) work >12 hour days? :-)

> "current day" should mean the period of <= 24 hours that ends at least few 
> minutes after "now" (to allow for clock skew); e.g. "(now - 12 hours) to 
> (now + 6 hours)" or "(now +/- 12 hours)".  There is a precedent in Unix 
> "ls" and Subversion "svn list": when printing a file date without the year 
> we require that the date be within (now +/- 6 months), not that it be 
> within the current calendar year.

While this would certainly be useful, I think it would be much simpler
to implement and document the "current calendar day" interpretation.

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: date parser rewrite

Posted by Julian Foad <ju...@btopenworld.com>.
mark benedetto king wrote:
> The exiting bison-based date-parser suffers from several deficiencies,
> among which are the following:

Basically I agree with your proposal for change, but I offer some clarifications and suggest that it is not essential to get a code change into the 1.0 release.

>   1.) it is bison-based, which complicates the build and release processes,
>       especially for Windows.
>   2.) it parses an extremely large set of possible date formats, sometimes
>       in surprising and counterintuitive ways.
>   3.) it is complex enough to discourage maintainance and enhancement
>   4.) it does not support ISO 8601 date formats.

Well, it supports some of the ISO 8601 date formats, but not enough of them.

>   5.) it it not perfectly MP-safe
>   6.) it is a bit of an eyesore.
> 
> If we are to replace it, we must do so before 1.0 so that its replacement
> will not be forced to support a superset of its behaviour.

Actually, if we can find a sufficient common subset of formats supported now and formats that we would like to support, then we could leave the implementation in place but _document_ that only specific formats are supported by Subversion.  For example, the set of formats "YYYY-MM-DD[ HH:MM[:SS]]" is supported by both the current and the proposed schemes, although I'm not sure if it means the same in terms of UTC or local time.  Note that it would be OK to find a very restrictive subset for the time being; we can expand the range of formats in a future release.

I admit that just documenting that only a few of the accepted formats are supported is not ideal, but nor is it terribly bad.  (If it is terribly bad, then we should review similar cases such as "svn log" with multiple WC targets.)

> I propose the following implementation strategy (this is loosely based
> on work from C. Scott Ananian <ca...@lesser-magoo.lcs.mit.edu>, available
> at http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=6093):
> 
> The current local time and current UTC time must be passed to the date
> parser.
> 
> Take the first match amongst:
> 
> 1.) the following ISO 8601 extended date-time formats:
> 
>   YYYY-MM-DDThh:mm:ss.sTZI
>   YYYY-MM-DDThh:mm:ss,sTZI
>   YYYY-MM-DDThh:mm:ssTZI
>   YYYY-MM-DDThh:mmTZI
>   YYYY-MM-DDThhTZI

... where the final "s" represents an arbitrary number of decimal digits.

[...]
> 2.) the following ISO 8601 basic date-time formats:
> 
>   YYYYMMDDThhmmss.sTZI
>   YYYYMMDDThhmmss,sTZI
>   YYYYMMDDThhmmssTZI
>   YYYYMMDDThhmmTZI
>   YYYYMMDDThhTZI

(For categories 1 and 2:)

> Where "TZI" stands for "Time Zone Indicator", and can be one of:
> 
>   Z
>   +HH
>   -HH
>   +HH:MM
>   -HH:MM

... else must be the empty string, meaning local time.

> 3.) the following formats, which will be adjusted to UTC from local time.
> 
>   YYYY-MM-DD HH:MI:SS
>   YYYY-MM-DD HH:MI
> 
> 4.) the following formats, which are presumed to reference midnight
>     local time of the specified day.
> 
>   YYYY-MM-DD
>   YYYYMMDD

... where "midnight" means 00:00 hours.  (Missing hours and/or minutes and/or seconds fields mean 0 in all of these formats.)

Sometimes it would be nice to have it mean 00:00 when used as the beginning of a revision range and 24:00 when used as the end of a range, but I don't think now is the time to consider that.

> 5.) the following formats, which are presumed to reference local time
>   of the current day.
> 
>   HH:MI:SS
>   HH:MI

... where "current day" means (now +- 12 hours).

If "current day" were to mean 00:00 to 24:00 on the current date, then this would be unfriendly to people who (like me) work after midnight.  Therefore "current day" should mean the period of <= 24 hours that ends at least few minutes after "now" (to allow for clock skew); e.g. "(now - 12 hours) to (now + 6 hours)" or "(now +/- 12 hours)".  There is a precedent in Unix "ls" and Subversion "svn list": when printing a file date without the year we require that the date be within (now +/- 6 months), not that it be within the current calendar year.

> I believe that these formats are necessary and sufficient for the 1.0

Sufficient, yes.  Necessary?  No: not all of them.  If you mean "desirable", then I agree that they are all reasonable although I personally don't want the formats that have hour but not minute.


- Julian


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org