Posted to dev@ant.apache.org by Hans Lund <hl...@multi-support.com> on 2008/07/04 13:22:35 UTC

Ivy: resolve latest.status (IVY-854)

I'm moving this from ivy-users:

...
running:
> >
> > <ivy:resolve conf="runtime" />
> > But: After a short while running on a CI (hudson) server, where the ivy
> > repository also is placed, resolving stops working due to connection
> > problems in httpClient.
> > ---
> >
> > [ivy:resolve] 01-07-2008 13:16:24
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > [ivy:resolve] INFO: I/O exception (java.net.BindException) caught when
........

> >
> > which in the end results in unresolved dependencies.

It looks like ivy:resolve is overly aggressive towards the HTTP repository.


Basically, I think the Ivy url resolver should follow the same practice
as a web crawler, that is, limit the load on the HTTP server to a
configurable maximum (max concurrent connections).
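
A minimal Java sketch of such a cap, for illustration only. It is not
Ivy or httpclient code: ConnectionLimiter and its methods are
hypothetical names, and the request is modelled as a plain Supplier
rather than a real HTTP call.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: caps how many repository requests run at once. */
final class ConnectionLimiter {
    private final Semaphore permits;

    ConnectionLimiter(int maxConcurrentConnections) {
        // Fair semaphore: waiting callers are served in arrival order.
        this.permits = new Semaphore(maxConcurrentConnections, true);
    }

    /** Waits for a free permit, runs the request, and always returns the permit. */
    <T> T execute(Supplier<T> request) {
        permits.acquireUninterruptibly();
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }

    /** Number of permits not currently held by a running request. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With new ConnectionLimiter(2), a third caller blocks in execute() until
one of the two running requests has finished, whether it succeeded or
threw.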

Also, a reasonable error handler for executeWithRetry should be in
place; right now it tries 5 times in a row. The exception here (address
in use) on HTTP always comes when the TCP stack on the server is
exhausted, so trying again right away might not be the best error
handling :-). Improving the error handling, though, will have no effect
if the Ivy behavior is not changed.
(I think this should be considered an httpClient feature?)

Regarding the latest.status resolving strategy:
As far as I can see, latest.status resolving fetches all ivy files and
checksum files, which for n revisions is 2(n-1) too many HTTP calls
(and maybe also other calculations). Downloading checksum files for any
revision other than the one that resolves to latest.status should be
left to error handling, needed only if the checksums don't match the
ivy file that resolves to latest.status.
This should also speed up resolving a tiny bit :-)


Now someone with inside knowledge of ivy, please comment. Right now I
have no idea about the size of such changes, or even if this is the
right strategy (e.g. an alternative for ivy repositories could be to have
an additional archive format, where all meta-data is stored in one file,
and let ivy:publish update the archive along with normal publishing
tasks)

Regards
Hans Lund

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org

Re: Ivy: resolve latest.status (IVY-854)

Posted by Hans Lund <ha...@gmail.com>.

On Fri, 2008-07-04 at 17:00 +0200, Xavier Hanin wrote:
> On Fri, Jul 4, 2008 at 1:22 PM, Hans Lund <hl...@multi-support.com> wrote:
> 
> > I'm moving this from ivy-users:
> >
> > ...
> > running:
> > > >
> > > > <ivy:resolve conf="runtime" />
> > > > But: After a short while running on a CI (hudson) server, where the ivy
> > > > repository also is placed, resolving stops working due to connection
> > > > problems in httpClient.
> > > > ---
> > > >
> > > > [ivy:resolve] 01-07-2008 13:16:24
> > > > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > > > [ivy:resolve] INFO: I/O exception (java.net.BindException) caught when
> > ........
> >
> > > >
> > > > which in the end results in unresolved dependencies.
> >
> > It looks like ivy:resolve is overly aggressive towards the HTTP repository.
> >
> >
> > Basically I think that Ivy url resolver should operate with the same
> > practice as a web crawler, that is limit the load on the http-server to
> > a configurable load (max concurrent connections).
> 
> resolve in Ivy is not multi threaded, so if connections are cleanly closed
> we shouldn't have more than one connection at once. But maybe we have a
> problem of connection closing in some cases.

I've just checked out the trunk; I hope a debugging session can give
some more useful information. The 'tests' I've performed so far have
been very crude - just netstat | grep --count '*server*ESTABLISHED*' to
see the effect on the HTTP server. It might be that this is simply a
side effect of the TCP stack implementation.

> 
> 
> >
> >
> > Also, A reasonable error handler for the executeWithRetry, should be in
> > place, now tries 5 times in a row. The exception here (address in use)
> > on http always comes when the tcp stack on the server is exhausted, so
> > trying again right away, might not be the best error handler :-).
> > Improving error handling though, will have no effect if the ivy behavior
> > is not changed.
> > (This I think, should be considered httpClient features???)
> 
> Indeed, I think we use default behavior of httpclient, but it can certainly
> be adjusted. Moreover, did you try to run Ivy without httpclient to see if
> the behavior is different? It would help narrow down the problem.

No, and sorry for my ignorance, but can I do that with a url resolver,
or do you mean using another network resolver like scp?
Regarding overriding httpclient error handling, which I think is a
standard feature of httpclient: I still feel that this case of
'Address Already in Use', which when connecting to an HTTP server is
always a sign of the HTTP server being exhausted, should be fixed in
httpclient, using a growing interval for retries?
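
A growing retry interval of this kind could look roughly like the
following Java sketch. It is not httpclient's retry handler API:
BackoffRetry, IoCall, delayMillis and all parameter values are
hypothetical names chosen for illustration, and the doubling schedule
is just one possible choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper: retries an I/O call with a doubling, capped delay. */
final class BackoffRetry {
    /** A call that may fail with an IOException (e.g. java.net.BindException). */
    interface IoCall<T> {
        T run() throws IOException;
    }

    /** Delay after failed attempt number `attempt` (1-based): base, 2*base, 4*base, ... up to the cap. */
    static long delayMillis(int attempt, long baseMillis, long maxDelayMillis) {
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMillis);
    }

    /** Runs the call up to maxAttempts times, sleeping longer after each failure. */
    static <T> T call(IoCall<T> request, int maxAttempts, long baseMillis, long maxDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.run();
            } catch (IOException e) {
                last = e; // remember the failure and fall through to the wait
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxDelayMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw new UncheckedIOException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

With a base of 100 ms and a cap of 1000 ms, the waits after successive
failures are 100, 200, 400, 800 and then 1000 ms, instead of five
immediate retries.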

> 
> >
> >
> > Regarding latest.status resolving strategy:
> > As far as I can see the latest.status resolving fetches all ivy files
> > and checksum files, which for n revisions is 2(n-1) too many http calls
> > (and maybe also other calculations). Downloading checksum files for any
> > other revision than latest.status, should be an error handle if the
> > checksums don't match the ivy-file that resolves to latest.status.
> > This should also speed up the resolving a tiny bit :-)
> 
> To be precise, Ivy doesn't download ivy files for all revisions: it
> downloads the ivy file for the latest revision, checks if it has the
> expected status, and if not, goes to the revision before, and so on
> until it finds the latest revision which has the expected status. The
> problem is that Ivy can't know the status of a revision without
> downloading its module descriptor. That's why I think we warn users
> that the latest.status version constraint is not very efficient.
> 
> What could be improved is to avoid downloading checksums in this particular
> case, since checking ivy files consistency is not the highest priority when
> looking for a latest.status revision.

Oh thanks, that made things much clearer. In our case, we decided to
change our revision pattern and use the status along with the revision
(making all the old publications lowest in the status chain, but in
fact, due to the new revision pattern, they all had newer revisions
than new builds). This explains why, for some modules, hundreds of old
revisions were fetched - I didn't realize this was an effect of
changing patterns too.

Regarding checksums, the problem is now so much smaller that it is
probably not worth optimizing, but what I did see was that checksums
for revisions not having the proper status in the descriptor were
indeed loaded. Those fetches cannot have any significance, as it can be
determined from the ivy file alone that the revision should not be
used.

Personally I think that checksums should be verified for candidates but
not for non-candidates. 
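
As a rough Java sketch of that idea: walk the revisions from newest to
oldest, read the status from each module descriptor, and fetch the
checksum only for a revision whose status matches. This is not Ivy's
resolver code. LatestStatusWalk and its Repository interface are
hypothetical, status matching is simplified to plain equality (Ivy
itself accepts any status at least as mature as the requested one), and
skipping a candidate with a bad checksum is an assumption.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a latest.status walk with candidate-only checksum checks. */
final class LatestStatusWalk {
    /** Hypothetical repository view; each method stands for one HTTP fetch. */
    interface Repository {
        /** Downloads the module descriptor of the revision and returns its status. */
        String fetchStatus(String revision);

        /** Downloads the checksum file of the descriptor and compares it. */
        boolean checksumMatches(String revision);
    }

    /** Returns the newest revision with the wanted status and a valid checksum, if any. */
    static Optional<String> resolve(List<String> newestFirst, String wantedStatus, Repository repo) {
        for (String revision : newestFirst) {
            String status = repo.fetchStatus(revision);
            if (!wantedStatus.equals(status)) {
                // Non-candidate: the descriptor alone rules it out, so no checksum fetch.
                continue;
            }
            if (repo.checksumMatches(revision)) {
                return Optional.of(revision);
            }
            // Assumption: a candidate with a bad checksum is skipped and the walk continues.
        }
        return Optional.empty();
    }
}
```

If the matching revision is the k-th newest, this costs k descriptor
fetches plus one checksum fetch, rather than a checksum fetch for every
descriptor that was looked at.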

> 
> 
> 
> >
> >
> >
> > Now someone with inside knowledge of ivy, please comment. Right now I
> > have no idea about the size of such changes, or even if this is the
> > right strategy (e.g. an alternative for ivy repositories could be to have
> > an additional archive format, where all meta-data is stored in one file,
> > and let ivy:publish update the archive along with normal publishing
> > tasks)
> 
> You can implement this if you want with a custom resolver. If you really
> need to use a lot of latest.status, it can be a good way to go. Or maybe
> store one metadata file per module (like maven-metadata.xml) storing the
> latest version for each status, updated only when you publish a new version.
> What I don't like with that is that it's more subject to concurrency issues,
> and since Ivy is a client side only tool, it's difficult to avoid
> concurrency issues when you update the repository. So maybe the best option
> would be to implement a small server knowing for each module what is the
> latest revision by status, handling publications, and answering a custom
> resolver. In other words a repository manager...
> 
> 

The custom resolver had crossed my mind - my first idea was to make
status a part of the url pattern -> publish to
[organization]/[module]/[status].

Resolving latest.status would then simply be a matter of limiting the
possible urls when searching, together with a verification.

This is because I think the concept of latest.status is so powerful: it
makes it very simple to branch out a feature for the modules needed
but leave other modules in a more mature branch, not changing the module
descriptor but just depending on modules and picking them up from more
stable branches.

But then, having an Ivy repository server/manager is indeed an appealing
and simple solution, which could be as simple as indexing the module
descriptors when published.

Hans 

> Xavier
> 
> 
> >
> >
> > Regards
> > Hans Lund
> >
> >
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> > For additional commands, e-mail: dev-help@ant.apache.org
> >
> >
> 
> 


Re: Ivy: resolve latest.status (IVY-854)

Posted by Xavier Hanin <xa...@gmail.com>.

On Fri, Jul 4, 2008 at 1:22 PM, Hans Lund <hl...@multi-support.com> wrote:

> I'm moving this from ivy-users:
>
> ...
> running:
> > >
> > > <ivy:resolve conf="runtime" />
> > > But: After a short while running on a CI (hudson) server, where the ivy
> > > repository also is placed, resolving stops working due to connection
> > > problems in httpClient.
> > > ---
> > >
> > > [ivy:resolve] 01-07-2008 13:16:24
> > > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > > [ivy:resolve] INFO: I/O exception (java.net.BindException) caught when
> ........
>
> > >
> > > which in the end results in unresolved dependencies.
>
> It looks like ivy:resolve is overly aggressive towards the HTTP repository.
>
>
> Basically I think that Ivy url resolver should operate with the same
> practice as a web crawler, that is limit the load on the http-server to
> a configurable load (max concurrent connections).

Resolve in Ivy is not multithreaded, so if connections are cleanly closed
we shouldn't have more than one connection at once. But maybe we have a
problem with connections not being closed in some cases.

>
>
> Also, A reasonable error handler for the executeWithRetry, should be in
> place, now tries 5 times in a row. The exception here (address in use)
> on http always comes when the tcp stack on the server is exhausted, so
> trying again right away, might not be the best error handler :-).
> Improving error handling though, will have no effect if the ivy behavior
> is not changed.
> (This I think, should be considered httpClient features???)

Indeed, I think we use the default behavior of httpclient, but it can
certainly be adjusted. Moreover, did you try to run Ivy without httpclient
to see if the behavior is different? It would help narrow down the problem.

>
>
> Regarding latest.status resolving strategy:
> As far as I can see the latest.status resolving fetches all ivy files
> and checksum files, which for n revisions is 2(n-1) too many http calls
> (and maybe also other calculations). Downloading checksum files for any
> other revision than latest.status, should be an error handle if the
> checksums don't match the ivy-file that resolves to latest.status.
> This should also speed up the resolving a tiny bit :-)

To be precise, Ivy doesn't download ivy files for all revisions: it
downloads the ivy file for the latest revision, checks if it has the
expected status, and if not, goes to the revision before, and so on
until it finds the latest revision which has the expected status. The
problem is that Ivy can't know the status of a revision without
downloading its module descriptor. That's why I think we warn users
that the latest.status version constraint is not very efficient.

What could be improved is to avoid downloading checksums in this particular
case, since checking ivy files consistency is not the highest priority when
looking for a latest.status revision.

>
>
>
> Now someone with inside knowledge of ivy, please comment. Right now I
> have no idea about the size of such changes, or even if this is the
> right strategy (e.g. an alternative for ivy repositories could be to have
> an additional archive format, where all meta-data is stored in one file,
> and let ivy:publish update the archive along with normal publishing
> tasks)

You can implement this if you want with a custom resolver. If you really
need to use a lot of latest.status, it can be a good way to go. Or maybe
store one metadata file per module (like maven-metadata.xml) storing the
latest version for each status, updated only when you publish a new version.
What I don't like with that is that it's more subject to concurrency issues,
and since Ivy is a client side only tool, it's difficult to avoid
concurrency issues when you update the repository. So maybe the best option
would be to implement a small server knowing for each module what is the
latest revision by status, handling publications, and answering a custom
resolver. In other words a repository manager...
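
The core of such a server could be as small as an index from module
and status to the latest published revision. The Java sketch below is
only an illustration under stated assumptions: LatestByStatusIndex is a
hypothetical class, nothing here is part of Ivy, and it assumes
that the most recently published revision for a status is also the
latest one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory index: module -> status -> latest published revision. */
final class LatestByStatusIndex {
    private final Map<String, Map<String, String>> index = new HashMap<>();

    /**
     * Records a publication. Synchronized so that concurrent publishers
     * cannot interleave their updates inside the server.
     */
    synchronized void publish(String module, String revision, String status) {
        index.computeIfAbsent(module, m -> new HashMap<>()).put(status, revision);
    }

    /** Answers a resolver asking for the latest revision of a module with a given status. */
    synchronized Optional<String> latest(String module, String status) {
        Map<String, String> byStatus = index.get(module);
        if (byStatus == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byStatus.get(status));
    }
}
```

Because all updates go through one process, the concurrency problem of
several clients rewriting a shared metadata file does not arise, and a
custom resolver needs a single lookup instead of walking the revisions.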

Xavier

>
>
> Regards
> Hans Lund
>
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
>
>

-- 
Xavier Hanin - Independent Java Consultant
http://xhab.blogspot.com/
http://ant.apache.org/ivy/
http://www.xoocode.org/