Posted to ivy-user@ant.apache.org by Mitch Gitman <mg...@gmail.com> on 2009/08/31 05:16:16 UTC

caching strategies--is there room for a new one?

Carlton's question came at a time when I was pondering the caching
strategies Ivy implicitly offers. I can identify four:

   1. Always trust the cache.
   2. For a given resolver, never trust the cache, i.e. lastmodified="true" by
   itself. This makes sense if you can split your repositories into integration
   and release repositories. Trust cache for release; distrust cache for
   integration.
   3. For a given resolver, only distrust the cache when the revision value
   meets a specified changing pattern. This makes sense if you don't have a
   separate integration repository but you don't mind going to the trouble of
   giving all your integration revisions the same naming pattern, like a
   -SNAPSHOT suffix.
   4. Check the repository on a per-dependency basis. This only makes sense
   if you don't want to use a separate integration repository or a special
   revision naming convention like -SNAPSHOT, but you can deal with the
   overhead of (A) remembering to manually specify changing="true" and then
   (B) remembering to turn it off when you're ready to do a release publish.
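
A minimal sketch of how approaches 1 through 3 might look side by side in an
ivysettings.xml. The resolver names and directory layout are hypothetical, and
a real setup would pick one of the distrusting variants rather than all of
them. It uses checkmodified (the attribute name corrected later in this
thread) and, for approach 3, combines it with changingPattern as the thread
describes.

```xml
<!-- ivysettings.xml: illustrative only; resolver names and paths are made up -->
<ivysettings>
  <settings defaultResolver="all"/>
  <resolvers>
    <chain name="all">
      <!-- Approach 1: no checkmodified, so cached module data is trusted -->
      <filesystem name="release">
        <ivy pattern="${ivy.settings.dir}/repo/release/[organisation]/[module]/[revision]/ivy.xml"/>
        <artifact pattern="${ivy.settings.dir}/repo/release/[organisation]/[module]/[revision]/[artifact].[ext]"/>
      </filesystem>

      <!-- Approach 2: separate integration repository, always checked -->
      <filesystem name="integration" checkmodified="true">
        <ivy pattern="${ivy.settings.dir}/repo/integration/[organisation]/[module]/[revision]/ivy.xml"/>
        <artifact pattern="${ivy.settings.dir}/repo/integration/[organisation]/[module]/[revision]/[artifact].[ext]"/>
      </filesystem>

      <!-- Approach 3: one shared repository, only revisions ending in
           -SNAPSHOT are treated as changing -->
      <filesystem name="shared" checkmodified="true"
                  changingPattern=".*-SNAPSHOT" changingMatcher="regexp">
        <ivy pattern="${ivy.settings.dir}/repo/shared/[organisation]/[module]/[revision]/ivy.xml"/>
        <artifact pattern="${ivy.settings.dir}/repo/shared/[organisation]/[module]/[revision]/[artifact].[ext]"/>
      </filesystem>
    </chain>
  </resolvers>
</ivysettings>
```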

Approach 4 is really high-precision, but it's just so high-maintenance and
error-prone. I wish there were a way with ivy:deliver or ivy:publish to
automatically turn off the changing mode, such as if you're publishing to
release status. I mean, it doesn't make sense for an Ivy module in release
status to be depending on other Ivy modules that could be changing out from
under it.
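
For reference, approach 4 lives in the consuming module's ivy.xml rather than
in the settings. A small sketch, with a hypothetical organisation and module
names, showing the flag that has to be added by hand and later removed:

```xml
<!-- ivy.xml of the consuming module; com.example/app/core are placeholders -->
<ivy-module version="2.0">
  <info organisation="com.example" module="app" status="integration"/>
  <dependencies>
    <!-- changing="true" asks Ivy to re-check this one dependency instead of
         trusting the cache; it must be removed by hand before a release -->
    <dependency org="com.example" name="core" rev="1.0" changing="true"/>

    <!-- ordinary dependency: the cached copy is trusted -->
    <dependency org="commons-lang" name="commons-lang" rev="2.4"/>
  </dependencies>
</ivy-module>
```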

Anyway, I can understand why approach 2, in combination with an integration
repository, is recommended.

But I wonder if there's potentially a simpler approach where you don't even
need a separate integration repository. (I acknowledge having a separate
integration repository has value in its own right apart from caching.)
Consider these observations:

   - Under normal caching situations, there's a high correlation between (A)
   your willingness to trust the cache for a given module and (B) that module's
   publication status. Milestone or release=trust cache. Integration=distrust
   cache.
   - There's a certain information redundancy in having an integration
   repository. All the modules published there have status="integration",
   but just the existence of the repository is saying the same.
   - Under normal circumstances, a module graduates from integration to
   milestone to release (using the default statuses). A module never goes
   backwards in status. That makes about as much sense as flunking from sixth
   grade down to fifth grade.

So what if you could specify something like the following on a resolver?
checkmodified="true" maxcheckstatus="integration"

What would this combination mean? For any given cached module, Ivy will go
out to the Ivy repository and compare the last modified timestamp only if
the cached ivy.xml has a status of integration or lower. Once Ivy downloads
a copy of that module that has a higher status, it subsequently stops
checking. But that's what you want.
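
To make the proposal concrete, here is how it might sit on a single shared
resolver. This is purely a sketch of the idea: maxcheckstatus is the
hypothetical attribute proposed in this post, not something stock Ivy
defines, and the resolver name and paths are placeholders.

```xml
<!-- ivysettings.xml: PROPOSAL ONLY. maxcheckstatus is not an existing Ivy
     attribute; it is the hypothetical filter suggested in this post. -->
<ivysettings>
  <settings defaultResolver="shared"/>
  <resolvers>
    <!-- Intended meaning: compare timestamps against the repository only
         while the cached ivy.xml has status "integration" or lower; once a
         milestone or release copy is cached, stop checking. -->
    <filesystem name="shared" checkmodified="true" maxcheckstatus="integration">
      <ivy pattern="${ivy.settings.dir}/repo/[organisation]/[module]/[revision]/ivy.xml"/>
      <artifact pattern="${ivy.settings.dir}/repo/[organisation]/[module]/[revision]/[artifact].[ext]"/>
    </filesystem>
  </resolvers>
</ivysettings>
```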

I can understand why folks wouldn't have use for such a feature if they're
already using an integration repository. But if Ivy's already supporting
changingPattern and changing="true", you have to admit this is worlds simpler.

Or is it? So what am I missing here?

Re: caching strategies--is there room for a new one?

Posted by Stephen Nesbitt <st...@alumni.cmc.edu>.

On Tuesday 08 September 2009 04:43:30 pm Mitch Gitman wrote:
> Steve, thanks for getting back.
> 
> I really don't want to go down that rabbit hole of debating
> prolific/promiscuous versioning vs. snapshot versioning.
<snip>

Unfortunately I think the rabbit hole is a core part of the issue :-( 

I am going to posit that the loss of version information that is a fundamental 
attribute of SNAPSHOT versioning precludes any elegant solution to caching 
short of a dedicated repository. Any other solution is going to be complicated 
and probably not very robust.

I understand that SNAPSHOT versioning is in use and will be used so it needs
to be addressed. My question is where it should be addressed. When I consider
SNAPSHOT versioning I find two reasons why it is used: 1) to deal with the
proliferation of obsolete versions and 2) to make it easy to find the
"latest and greatest". If that is the case - and I'm not missing some other
reason - then the core purpose of SNAPSHOT versioning is repository management
and the proper tool for implementing it will be something that does repository
management. And that's not really Ivy's forte.

I can imagine a scenario where Ivy always does prolific versioning with the 
repository responsible for providing SNAPSHOT capabilities.

-steve

Stephen Nesbitt
Absaroka Tech
Build & Configuration Management Consulting
steve.nesbitt@absaroka-tech.com

Re: caching strategies--is there room for a new one?

Posted by Mitch Gitman <mg...@gmail.com>.

Steve, thanks for getting back.

I really don't want to go down that rabbit hole of debating
prolific/promiscuous versioning vs. snapshot versioning. Let's assume I've
made the choice to use so-called snapshot versioning, where the CI server
keeps overwriting integration versions when it does an ivy:publish. I can
see three ways to implement that approach which affect caching:
* Have a special integration repository and put checkmodified="true" on the
resolver for that repository. (Forgive me for mistakenly writing
lastmodified in previous posts.)
* Use a naming convention like "-SNAPSHOT" for your integration versions and
on the resolver specify a combination of checkmodified="true" and a
changingPattern.
* Override caching on a per-module basis using changing="true".
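
For context, the overwrite-on-publish step that all three options have to cope
with might look roughly like this in the CI build. It is a sketch under
assumptions: the target name, the "integration" resolver name, the revision,
and the artifact path are all placeholders.

```xml
<!-- build.xml fragment for the CI server; names and paths are illustrative -->
<project name="ci-publish" default="publish-integration"
         xmlns:ivy="antlib:org.apache.ivy.ant">
  <target name="publish-integration">
    <ivy:resolve/>
    <!-- Republish the same revision on every build: overwrite="true" replaces
         what is already in the repository, which is why consumers cannot
         blindly trust their cached copy -->
    <ivy:publish resolver="integration"
                 pubrevision="1.0"
                 status="integration"
                 overwrite="true">
      <artifacts pattern="build/artifacts/[artifact].[ext]"/>
    </ivy:publish>
  </target>
</project>
```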

While I would say that the support for changing="true" could be made easier,
even then I wouldn't want to go with it. Just too high-maintenance. Same for
a naming convention like "-SNAPSHOT". That leaves only the first approach:
integration repository + checkmodified="true".

While I think having an integration repository is a good idea in its own
right, I hate for it to be the only way to do snapshot versioning that
doesn't have hidden landmines. One of the things I like about Ivy is that it
offers you different ways to accomplish something. One thing I don't like is
that--fortunately, only on rare occasion--when you get right down to it,
there's really only one way that works and isn't a pain in the butt, and all
the other approaches, while available, are really deprecated.

Now you could try to make the hard options--changing="true" and
changingPattern--a little easier. But they'll still be hard options. What
I'd like is just to have another easy option in the mix. My proposal is to put
another filter on checkmodified="true"--where it only checks for a given
status or lower. This way you don't have to set up a special integration
repository from the outset just to avoid running into insidious, mysterious
bugs that eventually trace their way back to over-caching.

On Tue, Sep 8, 2009 at 11:07 AM, Stephen Nesbitt <
stephen_nesbitt@alumni.cmc.edu> wrote:

> Mitch:
>
> I guess I'm not completely clear on what problem you are trying to solve
> here.
> Is it to make using the "changing" attribute easier?
>
> From a configuration management perspective, I consider any need for a
> "changing" flag to be poor (or at least not ideal) practice - it's
> essentially an admission that I don't know what I've got and that I have
> failed to uniquely identify my artifacts.
>
> I suspect this has something to do with the convention of tagging artifacts
> with -SNAPSHOT or something and avoidance of promiscuous versioning. If so,
> what I would really like to know is what is the issue with promiscuous
> versioning and what problem does tagging everything with -SNAPSHOT solve?
>
> I have a sneaking suspicion that the "changing" attribute is really a hack
> to cover a problem in repository management.
>
> -steve
>
> Stephen Nesbitt
> Absaroka Tech
> Build and Configuration Management Consulting
> steve.nesbitt@absaroka-tech.com

Re: caching strategies--is there room for a new one?

Posted by Stephen Nesbitt <st...@alumni.cmc.edu>.

Mitch:

I guess I'm not completely clear on what problem you are trying to solve here. 
Is it to make using the "changing" attribute easier?

From a configuration management perspective, I consider any need for a 
"changing" flag to be poor (or at least not ideal) practice - it's essentially 
an admission that I don't know what I've got and that I have failed to 
uniquely identify my artifacts.

I suspect this has something to do with the convention of tagging artifacts 
with -SNAPSHOT or something and avoidance of promiscuous versioning. If so, 
what I would really like to know is what is the issue with promiscuous 
versioning and what problem does tagging everything with -SNAPSHOT solve?

I have a sneaking suspicion that the "changing" attribute is really a hack to 
cover a problem in repository management.

-steve

Stephen Nesbitt
Absaroka Tech
Build and Configuration Management Consulting
steve.nesbitt@absaroka-tech.com
