Posted to dev@ant.apache.org by Jon Schneider <jk...@gmail.com> on 2009/11/11 16:21:38 UTC

Ivy Indexer

I've been thinking about IVYDE-134 (Quick Search feature for dependencies in
repositories) and related IVY-866. If we add support for the Nexus Indexer
(which would be nice in its own right), we would still be lacking this
feature for Ivy repositories. Also, what about ivysettings whose default
resolver is a chain resolver of a Maven repository and an Ivy repository? In
this case, without some all-encompassing index, the quick search feature
would find Java types in only the Maven repository within the chain
resolver, which I think would be counterintuitive to a user.

My first thought was to build an extension to Nexus or Archiva for Ivy, but
somehow I really dislike the idea of making an otherwise stateless
repository stateful (or rather, having a manager, however thin,
continuously running to proxy modifications to the repository). Also, these
two products are so Maven-centric that any Ivy extension would amount to an
abuse of their intended use.

So my compromise proposal is centered around a Lucene index that should be
modified (1) whenever a deliver/publish/install task is run. Also, since
nothing stops a repository administrator from manually
deleting/adding/updating files in the repository, we should provide (2) a
new <ivy:index> task.

(1) is accomplished through a new resolver type extending from ChainResolver
that proxies publishing to its delegate resolvers, indexing the published
artifacts in the process. As an example, adding this proxy would look like
this in ivysettings.xml:

<resolvers>
  <indexed name="indexable" index="${ivy.settings.dir}/index">
    <filesystem name="1">
      <ivy pattern="${ivy.settings.dir}/[organisation]/[module]/ivy-[revision].xml"/>
      <artifact pattern="${ivy.settings.dir}/[organisation]/[module]/[type]/[artifact]-[revision].[ext]"/>
    </filesystem>
    <!-- other resolvers here... -->
  </indexed>
</resolvers>

(2) allows a repository administrator to force clean the index via an Ant
task when it is known to be stale. It also provides an alternative to using
the proxy mechanism described in (1); the index task could be run
periodically (e.g. nightly) as a task on a continuous integration tool.

The index task itself explores the repository, opening jars, listing the
fully qualified types found in each jar, and associating those types with a
particular ModuleRevisionId in the index. With the code I have written so
far, I have been able to index up to 10,000 jars in less than 10 seconds
when the index task runs against a repository on the same machine (indexing
a repository through a network path is considerably slower).
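The exploration step amounts to walking the repository, reading each jar's entries, and turning `.class` entry paths into fully qualified type names. A minimal sketch of that core in plain Java (the class name is hypothetical; the real task would also write the types into the Lucene index and record the ModuleRevisionId, which is omitted here):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class TypeLister {
    /** Lists the fully qualified type names found in one jar file. */
    public static List<String> listTypes(String jarPath) throws IOException {
        List<String> types = new ArrayList<String>();
        JarFile jar = new JarFile(jarPath);
        try {
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
                String name = e.nextElement().getName();
                if (name.endsWith(".class")) {
                    // a/b/C.class -> a.b.C
                    types.add(name.substring(0, name.length() - ".class".length())
                                  .replace('/', '.'));
                }
            }
        } finally {
            jar.close();
        }
        return types;
    }
}
```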

IvyDE can then search for types against the optimized Lucene index, making
it very fast.

Thoughts on this approach?
Jon

Re: Ivy Indexer

Posted by Gilles Scokart <gs...@gmail.com>.
2009/11/14 Jon Schneider <jk...@gmail.com>

> That is exactly right.  It is also the first step in two of my other future
> goals:
>
> 1.  Unused dependency analysis.  I realize this won't ever be perfect since
> there is no limit to the way dependencies can be used (sometimes
> exclusively) in specific metadata like Spring context files, but
> nevertheless, it would be a good watermark for unused dependencies.
>

I already have some code doing that.  If you are interested, you can find
it at http://sourceforge.net/projects/deco-project/.

For the moment it can only check compile-time dependencies.  I would like
to play further with it in order to also check runtime dependencies.  The
idea I have is to write a kind of bytecode interpreter that either
simulates the execution of simple Java code, or falls back to real
execution of some pieces of code when it is too dynamic (with the help of
some external configuration).

I'm just waiting for someone to show interest in order to remotivate me to
continue...


Gilles Scokart

Re: Ivy Indexer

Posted by Jon Schneider <js...@apache.org>.
IVY-1143 created for this thread and linked to IVYDE-134.  A rough patch is
attached.

Jon


Re: Ivy Indexer

Posted by Jon Schneider <js...@apache.org>.
> ---------------------------------------------
> So Solr might be the easiest way of achieving an Ivy indexer.

> Probably. As a side note, while thinking of installing a server side
> component to provide search, I started to wonder why not use a
> repository manager in that case. During devoxx I discussed with people
> from artifactory, and their latest version is now supporting Ivy (may
> still be limited, but they are working on improving that). They also
> provide a REST api for their search feature, so maybe it would be
> interesting and easy to use their software. But if we don't want to be
> dependent on their API, maybe we can try to define some sort of
> "standard" REST api to access a repository search feature. This is
> something they are ok to discuss. Then any repository manager
> implementing this api could be used.
>
> Note that compared to using artifactory, using solr still has the
> advantage of being probably usable with any kind of Ivy repo, not just
> artifactory, which has probably some limitations (because it has not
> been designed as an Ivy repo manager, I suppose it has some proxying
>  and layout limitations).
>
---------------------------------------------

I think the Solr implementation could serve as a sort of "Reference
Implementation" that we provide.  If Artifactory then also provides an
interface, all the better!  Undoubtedly, it would help the folks at
Artifactory if they had a reference implementation to base their product on.

---------------------------------------------

> >
> > I have to admit I am not a big fan of having to deploy a webapp next to a
> dumb simple repo. On the other hand managing an index on the client side
> depends enormously of the kind of repository (at work we have an ivy repo in
> svn accessible form both http and checkouted), it would consume more
> bandwidth, some publication locking would probably be in place, etc...
>
> I agree that having to deploy a webapp is an additional burden in the
> build ecosystem setup. But now people are used to install a CI server,
> a SCM server, and so on. So I don't think it should stop us, because I
> think dealing with that from the client side only will have some
> serious limitations.
>
---------------------------------------------

I think it is important to lure people in with the prospect of no additional
webapps by allowing them to read an index on a remote filesystem directly,
then allowing them to gradually come to terms with the fact that it is more
performant to let a server serve up the index, adding complexity to their
build ecosystem on their own timeline.

---------------------------------------------
> Alternatively we could define a java interface to access a search
> service (there's already one, but it is very limited), and have
> different implementations: based on a local index as initially
> suggested, using solr, artifactory, or any other. Then we are open to
> the future.
---------------------------------------------

I must be missing something... what java interface accesses a search feature
currently?

---------------------------------------------
> I think the transaction would be supported at the Lucene index level.
---------------------------------------------

More specifically, the transaction is already at the publish level for the
general use case since an index writer is opened for each publish and closed
at the end of the publish, effectively committing the transaction.

On a somewhat related point, are we still considering Solr or other repo
managers for the additional duty of handling the indexing?  Is there a use
case where guaranteeing indexing concurrency apart from a simple lock
strategy (block until the index lock is closed or a timeout is reached) is
important?
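The simple lock strategy described above, block until the index lock is released or a timeout is reached, can be sketched like this (illustrative only; Lucene ships its own Lock/LockFactory machinery that real code would presumably build on):

```java
import java.io.File;
import java.io.IOException;

public class IndexLock {
    /**
     * Try to take the index write lock by atomically creating a lock file,
     * polling until the current holder releases it or the timeout expires.
     */
    public static boolean acquire(File lockFile, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!lockFile.createNewFile()) {       // atomic create-if-absent
            if (System.currentTimeMillis() >= deadline) {
                return false;                     // timed out
            }
            Thread.sleep(100);                    // poll until the writer releases
        }
        return true;
    }

    /** Release the lock by deleting the lock file. */
    public static void release(File lockFile) {
        lockFile.delete();
    }
}
```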

Jon

Re: Ivy Indexer

Posted by Xavier Hanin <xa...@gmail.com>.
2009/11/21 Nicolas Lalevée <ni...@hibnet.org>:
>
> Le 19 nov. 2009 à 12:06, Xavier Hanin a écrit :
>
>> I really like the idea to use a solr instance colocated with the repository.
>> I've seen a presentation on solr yesterday at devoxx, and it sounds like so
>> close to what we need. The only problem I see with it is that it requires to
>> install a server side component, getting closer to what repository managers
>> do. I'm not sure why, if we install a solr instance, we wouldn't use it
>> to update the index too. Solr takes care of problems like transactions,
>> concurrency, so I think it's a perfect fit...
>
> I think the transaction would be supported at the Lucene index level. I don't think there is any mechanism to make solr manage an extra "data storage". As far as I remember Solr is just able to read the external "data storage" to index it.
> But what would work is a Solr deployed just next to an Ivy repository, let Ivy publish artifacts like it already does, but also make Ivy request Solr to index the newly published artifact.
Yes, this is exactly what I was thinking about.

>
> And spotted by a friend, Solr 1.4 [1] supports replication in Java [2], à la rsync!
I'm not sure this is even necessary to use, except for very large
implementations of Ivy with huge repositories. Most of the time only
one solr instance should be enough.

>
> So Solr might be the easiest way of achieving an Ivy indexer.
Probably. As a side note, while thinking of installing a server-side
component to provide search, I started to wonder why not use a repository
manager in that case. During Devoxx I discussed with people from
Artifactory, and their latest version now supports Ivy (support may still
be limited, but they are working on improving that). They also provide a
REST API for their search feature, so maybe it would be interesting and
easy to use their software. But if we don't want to be dependent on their
API, maybe we can try to define some sort of "standard" REST API to access
a repository search feature. This is something they are open to discussing.
Then any repository manager implementing this API could be used.

Alternatively we could define a java interface to access a search service
(there's already one, but it is very limited), and have different
implementations: based on a local index as initially suggested, using Solr,
Artifactory, or any other. Then we are open to the future.

Note that compared to using Artifactory, using Solr still has the advantage
of probably being usable with any kind of Ivy repo, not just Artifactory,
which probably has some limitations (because it has not been designed as an
Ivy repo manager, I suppose it has some proxying and layout limitations).

>
> I have to admit I am not a big fan of having to deploy a webapp next to a dumb simple repo. On the other hand managing an index on the client side depends enormously on the kind of repository (at work we have an ivy repo in svn accessible from both http and a checkout), it would consume more bandwidth, some publication locking would probably be in place, etc...

I agree that having to deploy a webapp is an additional burden in the
build ecosystem setup. But people are now used to installing a CI server,
an SCM server, and so on. So I don't think it should stop us, because I
think dealing with this from the client side only will have some serious
limitations.

Xavier

>
> Nicolas
>
> [1] http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/CHANGES.txt
> [2] https://issues.apache.org/jira/browse/SOLR-561
>
>>
>> My 2 c.
>>
>> Xavier
>>
>> 2009/11/18 Jon Schneider <jk...@gmail.com>
>>
>>> While I digest Nicolas' novel :) (thanks for the additional insight on
>>> Lucene by the way), I will suggest one other idea.
>>>
>>> We could allow for the option of a Solr instance collocated with the
>>> repository on one machine to serve up the index stored on the repository.
>>> IvyDE could be configured by the user to either read the index directly
>>> from the remote filesystem or send its requests via HTTP to a Solr server.
>>> The Solr server would not be responsible for maintaining the index in the
>>> same way that Archiva/Nexus/Artifactory do, but would simply be a querying
>>> tool.  In the case where Solr is serving the index, the index would still
>>> be
>>> maintained through some combination of the index ant task and the publish
>>> proxy.
>>>
>>> This way we don't get into the complexity of pushing out index updates to
>>> clients.
>>>
>>> The rsync strategy is a very intriguing idea though, especially in light of
>>> how Lucene segments its index in multiple files.  What happens when
>>> optimize
>>> is called on the index and the segments are combined into one file?  In
>>> this
>>> case, any search slaves would essentially have to download the whole index
>>> right?  How much segmentation is considered too much segmentation before we
>>> optimize the index to cater to search speed over index publishing speed?
>>>
>>> I'll be trying to wrap this up enough (at least with the remote filesystem
>>> index read strategy) to make a patch so others can see it in action.  We
>>> are
>>> a little busy at work, but I will be coming back to it in the coming days.
>>>
>>> Thanks for all the feedback so far,
>>> Jon
>>>
>>
>>
>>
>> --
>> Xavier Hanin - 4SH France - http://www.4sh.fr/
>> BordeauxJUG creator & leader - http://www.bordeauxjug.org/
>> Apache Ivy Creator - http://ant.apache.org/ivy/
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
>
>



-- 
Xavier Hanin - 4SH France - http://www.4sh.fr/
BordeauxJUG creator & leader - http://www.bordeauxjug.org/
Apache Ivy Creator - http://ant.apache.org/ivy/



Re: Ivy Indexer

Posted by Nicolas Lalevée <ni...@hibnet.org>.
Le 19 nov. 2009 à 12:06, Xavier Hanin a écrit :

> I really like the idea to use a solr instance colocated with the repository.
> I've seen a presentation on solr yesterday at devoxx, and it sounds like so
> close to what we need. The only problem I see with it is that it requires to
> install a server side component, getting closer to what repository managers
> do. I'm not sure why, if we install a solr instance, we wouldn't use it
> to update the index too. Solr takes care of problems like transactions,
> concurrency, so I think it's a perfect fit...

I think the transaction would be supported at the Lucene index level. I don't think there is any mechanism to make Solr manage an extra "data storage". As far as I remember, Solr is just able to read the external "data storage" to index it.
But what would work is a Solr instance deployed right next to an Ivy repository: let Ivy publish artifacts like it already does, but also make Ivy request Solr to index the newly published artifact.

And spotted by a friend, Solr 1.4 [1] supports replication in Java [2], à la rsync!

So Solr might be the easiest way of achieving an Ivy indexer.

I have to admit I am not a big fan of having to deploy a webapp next to a dumb simple repo. On the other hand managing an index on the client side depends enormously on the kind of repository (at work we have an ivy repo in svn accessible from both http and a checkout), it would consume more bandwidth, some publication locking would probably be in place, etc...

Nicolas

[1] http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/CHANGES.txt
[2] https://issues.apache.org/jira/browse/SOLR-561

> 
> My 2 c.
> 
> Xavier
> 
> 2009/11/18 Jon Schneider <jk...@gmail.com>
> 
>> While I digest Nicolas' novel :) (thanks for the additional insight on
>> Lucene by the way), I will suggest one other idea.
>> 
>> We could allow for the option of a Solr instance collocated with the
>> repository on one machine to serve up the index stored on the repository.
>> IvyDE could be configured by the user to either read the index directly
>> from the remote filesystem or send its requests via HTTP to a Solr server.
>> The Solr server would not be responsible for maintaining the index in the
>> same way that Archiva/Nexus/Artifactory do, but would simply be a querying
>> tool.  In the case where Solr is serving the index, the index would still
>> be
>> maintained through some combination of the index ant task and the publish
>> proxy.
>> 
>> This way we don't get into the complexity of pushing out index updates to
>> clients.
>> 
>> The rsync strategy is a very intriguing idea though, especially in light of
>> how Lucene segments its index in multiple files.  What happens when
>> optimize
>> is called on the index and the segments are combined into one file?  In
>> this
>> case, any search slaves would essentially have to download the whole index
>> right?  How much segmentation is considered too much segmentation before we
>> optimize the index to cater to search speed over index publishing speed?
>> 
>> I'll be trying to wrap this up enough (at least with the remote filesystem
>> index read strategy) to make a patch so others can see it in action.  We
>> are
>> a little busy at work, but I will be coming back to it in the coming days.
>> 
>> Thanks for all the feedback so far,
>> Jon
>> 
> 
> 
> 
> -- 
> Xavier Hanin - 4SH France - http://www.4sh.fr/
> BordeauxJUG creator & leader - http://www.bordeauxjug.org/
> Apache Ivy Creator - http://ant.apache.org/ivy/




Re: Ivy Indexer

Posted by Xavier Hanin <xa...@gmail.com>.
I really like the idea of using a Solr instance colocated with the
repository. I've seen a presentation on Solr yesterday at Devoxx, and it
sounds so close to what we need. The only problem I see with it is that it
requires installing a server-side component, getting closer to what
repository managers do. I'm not sure why, if we install a Solr instance, we
wouldn't use it to update the index too. Solr takes care of problems like
transactions and concurrency, so I think it's a perfect fit...

My 2 c.

Xavier

2009/11/18 Jon Schneider <jk...@gmail.com>

> While I digest Nicolas' novel :) (thanks for the additional insight on
> Lucene by the way), I will suggest one other idea.
>
> We could allow for the option of a Solr instance collocated with the
> repository on one machine to serve up the index stored on the repository.
>  IvyDE could be configured by the user to either read the index directly
> from the remote filesystem or send its requests via HTTP to a Solr server.
>  The Solr server would not be responsible for maintaining the index in the
> same way that Archiva/Nexus/Artifactory do, but would simply be a querying
> tool.  In the case where Solr is serving the index, the index would still
> be
> maintained through some combination of the index ant task and the publish
> proxy.
>
> This way we don't get into the complexity of pushing out index updates to
> clients.
>
> The rsync strategy is a very intriguing idea though, especially in light of
> how Lucene segments its index in multiple files.  What happens when
> optimize
> is called on the index and the segments are combined into one file?  In
> this
> case, any search slaves would essentially have to download the whole index
> right?  How much segmentation is considered too much segmentation before we
> optimize the index to cater to search speed over index publishing speed?
>
> I'll be trying to wrap this up enough (at least with the remote filesystem
> index read strategy) to make a patch so others can see it in action.  We
> are
> a little busy at work, but I will be coming back to it in the coming days.
>
> Thanks for all the feedback so far,
> Jon
>



-- 
Xavier Hanin - 4SH France - http://www.4sh.fr/
BordeauxJUG creator & leader - http://www.bordeauxjug.org/
Apache Ivy Creator - http://ant.apache.org/ivy/

Re: Ivy Indexer

Posted by Nicolas Lalevée <ni...@hibnet.org>.
Le 18 nov. 2009 à 20:17, Jon Schneider a écrit :

> While I digest Nicolas' novel :) (thanks for the additional insight on
> Lucene by the way), I will suggest one other idea.
> 
> We could allow for the option of a Solr instance collocated with the
> repository on one machine to serve up the index stored on the repository.
> IvyDE could be configured by the user to either read the index directly
> from the remote filesystem or send its requests via HTTP to a Solr server.
> The Solr server would not be responsible for maintaining the index in the
> same way that Archiva/Nexus/Artifactory do, but would simply be a querying
> tool.  In the case where Solr is serving the index, the index would still be
> maintained through some combination of the index ant task and the publish
> proxy.
> 
> This way we don't get into the complexity of pushing out index updates to
> clients.
> 
> The rsync strategy is a very intriguing idea though, especially in light of
> how Lucene segments its index in multiple files.  What happens when optimize
> is called on the index and the segments are combined into one file?

Yep, it merges them all into one.

>  In this
> case, any search slaves would essentially have to download the whole index
> right?

Exactly. The publisher of the indexes shouldn't do any optimization on them, and neither should the receiver, so that it keeps the same files as the publisher. On the client side it may be interesting to optimize the index for performance: the client would just maintain two indexes, one for synchronizing with the publisher, and a clone which is then optimized.

>  How much segmentation is considered too much segmentation before we
> optimize the index to cater to search speed over index publishing speed?

It is done automatically by Lucene itself with the merge policy:
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/index/MergePolicy.html

Nicolas




Re: Ivy Indexer

Posted by Jon Schneider <jk...@gmail.com>.
While I digest Nicolas' novel :) (thanks for the additional insight on
Lucene by the way), I will suggest one other idea.

We could allow for the option of a Solr instance collocated with the
repository on one machine to serve up the index stored on the repository.
 IvyDE could be configured by the user to either read the index directly
from the remote filesystem or send its requests via HTTP to a Solr server.
 The Solr server would not be responsible for maintaining the index in the
same way that Archiva/Nexus/Artifactory do, but would simply be a querying
tool.  In the case where Solr is serving the index, the index would still be
maintained through some combination of the index ant task and the publish
proxy.

This way we don't get into the complexity of pushing out index updates to
clients.

The rsync strategy is a very intriguing idea though, especially in light of
how Lucene segments its index in multiple files.  What happens when optimize
is called on the index and the segments are combined into one file?  In this
case, any search slaves would essentially have to download the whole index
right?  How much segmentation is considered too much segmentation before we
optimize the index to cater to search speed over index publishing speed?

I'll be trying to wrap this up enough (at least with the remote filesystem
index read strategy) to make a patch so others can see it in action.  We are
a little busy at work, but I will be coming back to it in the coming days.

Thanks for all the feedback so far,
Jon

Re: Ivy Indexer

Posted by Nicolas Lalevée <ni...@hibnet.org>.
On Tuesday 17 November 2009 16:55:21 Jon Schneider wrote:
> > When you say "anywhere you choose", is it limited to a location on the
> > filesystem? Or do you intend to make use of ivy repositories access/publish
> > mechanism to store the index remotely? With filesystem only the usage
> > sounds rather limited. With ivy repository mechanism you can store your
> > index on the same kind of store as where you put your modules, but you will
> > need a more advanced syntax to configure it, and a more advanced
> > implementation.
>
> Right now it is limited to locations on the filesystem.  I agree, the
> repository mechanism would be more flexible, but I do need to evaluate the
> performance of storing/reading the index across different storage mediums.
>
> > So you will have to deal with index locking during updates, which may
> > become a contention point, and be difficult to implement if you want to
> > allow using any repo to store the index.
>
> Thanks for bringing this point up.  Lucene offers a write-lock contention
> mechanism, but I do need to tread carefully here.
>
> > If the index grows, accessing the index from a remote box may become
> > long. If you think big, you will have to find a way to transfer index
> > updates to the clients which optimizes the network, such as
> > transferring diffs or something similar. But this becomes difficult to
> > implement, unless you want to rely on existing technology for that
> > (such as an SCM).
>
> I am having trouble trying to manufacture a scalability problem here (with
> my unscientific approach).  I am up to 1,149 jars containing class types
> with over 28,700 types in my test repository and the index is at 39 MB.

Did you try to compress it? I would expect the index to be
transferred compressed over the network.
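Compressing for transfer is cheap to try with plain java.util.zip, and index files full of repeated terms should compress well. A minimal sketch (the class name is arbitrary):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class IndexTransfer {
    /** Gzip one file, returning the compressed size in bytes. */
    public static long gzip(File src, File dst) throws IOException {
        InputStream in = new FileInputStream(src);
        OutputStream out = new GZIPOutputStream(new FileOutputStream(dst));
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);   // stream the file through the gzip filter
            }
        } finally {
            in.close();
            out.close();                // close() finishes the gzip stream
        }
        return dst.length();
    }
}
```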

>  I've pushed the index out on a remote filesystem, and the quick search
> opens the index in 219 ms.  After the index reader is opened, subsequent
> searches return in the microsecond range until the reader becomes stale
> from a commit and is reopened.
>
> Interesting point about the growth of the index based on the topology of
> the repository:  modules with hundreds or thousands of revisions (e.g.
> nightly builds) do not add much bulk to the index because there is so much
> overlap in type names across the builds.  The duplicate type names get
> optimized down.
>
>
> > The last time I worked with Lucene we implemented such a diff-and-publish
> > mechanism for Lucene indexes, and it was working quite well. Solr does
> > have a mechanism for such things too, but the last time I checked it was
> > just relying on rsync. If somebody is interested I can take some time to
> > explain it here.
>
> Not totally convinced that a scalability problem is out of the question,
> I'm interested in what you have to offer on this point, Nicolas.

We used a feature of Lucene which allows merging two indexes, adding every
Lucene document from one index into another [1].
The issue here is that Lucene has no notion of replacing a document. So a
Lucene index update was both a Lucene index containing the newly indexed data
and a list of the ids of the documents to delete or update. Applying an index
update to a "full" index then means deleting the specified list of documents
and merging the index update into the full index.

Note that it only works if a Lucene document can be uniquely identified. For
the Ivy use case I think this fits, as the unique id would be the
org#module;revision.
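The delete-then-merge scheme can be illustrated with a toy model where plain maps keyed by the org#module;revision id stand in for the Lucene indexes (real code would delete documents by a term on the unique id field and then merge in the update index):

```java
import java.util.List;
import java.util.Map;

/**
 * Toy model of the index-update scheme described above: an update is a set
 * of new documents plus a list of ids to delete. Replacing a document is a
 * delete of its id followed by a merge of its new version.
 */
public class IndexUpdate {
    /** Apply an update (new docs + ids to delete) to the full index. */
    public static void apply(Map<String, String> fullIndex,
                             Map<String, String> newDocs,
                             List<String> deletedIds) {
        for (String id : deletedIds) {
            fullIndex.remove(id);        // drop stale or replaced documents
        }
        fullIndex.putAll(newDocs);       // merge the update's documents in
    }
}
```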

To track the version of the index, Lucene itself provides a version number
[2]. I don't remember well whether we can rely on it safely. I think we did,
but it might work only if the exact same version of Lucene is used everywhere,
as the segment merging algorithm would then be the same. At least the Lucene
API doesn't guarantee that a merge of two indexes produces the same version;
the API just guarantees that it will be higher on each "commit".

In our use case we had one indexer and several search slaves. The indexer was 
responsible for publishing a full index and a set of index updates, so that a 
search slave starting empty would just fetch the full index. Over time, a 
slave asks the indexer only for the updates corresponding to its version. A 
slave may be a little late and get several updates to apply, and sometimes so 
late that it gets a full index again, since the indexer only maintains a 
finite set of index updates.
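
The catch-up protocol described above can be sketched as follows (a toy, stdlib-only model; the names and retention policy are made up for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the indexer/slave catch-up protocol: the indexer keeps a finite
 *  window of incremental updates; slaves too far behind fall back to the full index. */
public class CatchUpDemo {
    static final int RETAINED_UPDATES = 3;

    long currentVersion = 0;
    final Map<Long, String> updates = new LinkedHashMap<>(); // version -> update payload

    /** Called on each publish: record an incremental update, drop the oldest beyond the window. */
    void publish(String update) {
        updates.put(++currentVersion, update);
        while (updates.size() > RETAINED_UPDATES) {
            updates.remove(updates.keySet().iterator().next());
        }
    }

    /** What a slave at slaveVersion receives: the missing updates,
     *  or null meaning "too far behind, take the full index". */
    List<String> updatesSince(long slaveVersion) {
        if (slaveVersion < currentVersion - RETAINED_UPDATES) {
            return null; // the needed updates were already purged
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : updates.entrySet()) {
            if (e.getKey() > slaveVersion) result.add(e.getValue());
        }
        return result;
    }

    /** Convenience: an indexer after a sequence of publishes. */
    public static CatchUpDemo after(String... pubs) {
        CatchUpDemo d = new CatchUpDemo();
        for (String p : pubs) d.publish(p);
        return d;
    }

    public static void main(String[] args) {
        CatchUpDemo indexer = after("u1", "u2", "u3", "u4", "u5");
        System.out.println(indexer.updatesSince(4)); // one version behind: [u5]
        System.out.println(indexer.updatesSince(0)); // too far behind: null -> full index
    }
}
```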

That scenario corresponds quite well to the case described where an Ivy 
repository is managed on the server side.

Managing it from the client side may be more complex, as Lucene supports only 
one writer at a time. But we can imagine that each time there is a publish, 
there would also be a "publication" of an "index update". The complexity then 
shifts to the periodic purge of old updates and the build of a full index: 
which client would be "elected" to do it? And how do we deal with 
simultaneous publications?

A few words on Solr's [3] index replication mechanism. As mentioned above, 
the transport of the files is done with rsync. This is actually quite smart 
given how the Lucene indexer works with files.
First, Lucene never modifies a file or appends data to one: once written, a 
file doesn't change (see the API it relies on [4]). So a diff between two 
versions of an index is just some deleted files and some new files.
Secondly, when we index new data into an already populated index, since Lucene 
doesn't modify any file, it creates an internal "segment" containing the 
newly indexed data. Opening a new IndexReader on the new version of the index 
then takes the added segment into consideration. As we index data, there are 
more and more segments. To avoid having too many files, Lucene sometimes 
decides to merge several small segments into a bigger one [5]. So we can say 
that a Lucene index is composed of big old files and small new ones.
This is nearly perfect for rsync: the more often you rsync, the less you have 
to transfer on each run.
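
Because index files are write-once, the diff between two index versions reduces to set operations on file names; a rough sketch (the segment file names are invented for illustration):

```java
import java.util.Set;
import java.util.TreeSet;

/** Because Lucene index files are write-once, a diff between two index versions
 *  is just the set of removed files plus the set of added files: no binary diff needed. */
public class SegmentDiffDemo {

    public static Set<String> toFetch(Set<String> oldFiles, Set<String> newFiles) {
        Set<String> fetch = new TreeSet<>(newFiles);
        fetch.removeAll(oldFiles); // a file with the same name is guaranteed identical
        return fetch;
    }

    public static Set<String> toDelete(Set<String> oldFiles, Set<String> newFiles) {
        Set<String> delete = new TreeSet<>(oldFiles);
        delete.removeAll(newFiles);
        return delete;
    }

    public static void main(String[] args) {
        // Old version: one big merged segment plus two small recent ones.
        Set<String> v1 = Set.of("_0.cfs", "_1.cfs", "_2.cfs", "segments_3");
        // New version: the small segments were merged into _3, new segment _4 added.
        Set<String> v2 = Set.of("_0.cfs", "_3.cfs", "_4.cfs", "segments_5");
        System.out.println(toFetch(v1, v2));  // [_3.cfs, _4.cfs, segments_5]
        System.out.println(toDelete(v1, v2)); // [_1.cfs, _2.cfs, segments_3]
    }
}
```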

We didn't like relying on a platform-dependent tool, and we liked the idea 
that an index update is just a zip of files (we actually had some other data 
to update, so one file for everything). What we implemented is actually quite 
similar to how Lucene works with its internal "segments": each update contains 
just the newly indexed data, the oldest update being the full index itself.

I don't think Ivy should rely on rsync either, but we could probably use the 
same kind of mechanism rsync uses. It would be quite easy to implement, as 
there would be no binary diff. It doesn't solve the critical case of 
simultaneous publications, though.

I am starting to have ideas, but I think that this mail is already too long, 
let's take a breath :)

Nicolas

[1] 
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/index/IndexWriter.html#addIndexes%28org.apache.lucene.index.IndexReader[]%29
[2] 
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/index/IndexReader.html#getVersion%28%29
[3] http://lucene.apache.org/solr/
[4] 
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/store/Directory.html
[5] 
http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/index/MergePolicy.html

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org


Re: Ivy Indexer

Posted by Xavier Hanin <xa...@gmail.com>.
2009/11/17 Jon Schneider <jk...@gmail.com>

>
> > If the index grows, accessing the index from a remote box may become
> long.
> > If you think big, you will have to find a way to transfer index updates
> to
> > the clients which is optimizing the network, such as transferring diffs
> or
> > something similar. But this becomes difficult to implement, unless you
> want
> > to rely on existing technology for that (such as a SCM).
>
> I am having trouble trying to manufacture a scalability problem here (with
> my unscientific approach).  I am up to 1,149 jars containing class types
> with over 28,700 types in my test repository and the index is at 39 mb.
>  I've pushed the index out on a remote filesystem, and the quick search
> opens the index in 219 ms.  After the index reader is opened, subsequent
> searches return in the microsecond range until the reader becomes stale
> from
> a commit and is reopened.
>
Yes, I think the main point is to "get" the index locally. At 39 MB, depending
on your bandwidth to the remote server, it can take time... and having to
download the full index each time it is modified sounds scary. But maybe
Nicolas has good things to share about that.

Xavier

-- 
Xavier Hanin - 4SH France - http://www.4sh.fr/
BordeauxJUG creator & leader - http://www.bordeauxjug.org/
Apache Ivy Creator - http://ant.apache.org/ivy/

Re: Ivy Indexer

Posted by Jon Schneider <jk...@gmail.com>.
> When you say "anywhere you choose", is it limited to a location on the
> filesystem? Or do you intend to make use of ivy repositories
access/publish
> mechanism to store the index remotely? With filesystem only the usage
> sounds rather limited. With ivy repository mechanism you can store your
> index on the same kind of store as where you put your modules, but you
will
> need a more advanced syntax to configure it, and more advanced
 > implementation.

Right now it is limited to locations on the filesystem.  I agree, the
repository mechanism would be more flexible, but I do need to evaluate the
performance of storing/reading the index across different storage media.

> So you will have to deal with index locking during updates, which may
> become a contention point, and be difficult to implement if you want to
 > allow using any repo to store the index.

Thanks for bringing this point up.  Lucene offers a write-lock mechanism to
handle contention, but I do need to tread carefully here.
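
As a sketch of the kind of care needed, here is one way a client-side writer could guard index updates with an exclusive lock file, loosely in the spirit of Lucene's write.lock (the names here are invented for illustration, and this is not Lucene's actual implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: serialize index updates across processes with an exclusive lock file. */
public class IndexLockDemo {

    /** Runs the update while holding the lock; returns false if another writer holds it. */
    public static boolean withWriteLock(Path lockFile, Runnable update) {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // tryLock returns null when another process already holds the lock
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return false; // contention: back off or retry later
            }
            try {
                update.run();
            } finally {
                lock.release();
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path lock = Files.createTempFile("ivy-index", ".lock");
        boolean ran = withWriteLock(lock, () -> System.out.println("indexing..."));
        System.out.println(ran); // true
    }
}
```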

> If the index grows, accessing the index from a remote box may become long.
> If you think big, you will have to find a way to transfer index updates to
> the clients which is optimizing the network, such as transferring diffs or
> something similar. But this becomes difficult to implement, unless you
want
> to rely on existing technology for that (such as a SCM).

I am having trouble manufacturing a scalability problem here (with my
unscientific approach).  I am up to 1,149 jars containing class types, with
over 28,700 types in my test repository, and the index is at 39 MB.
 I've pushed the index out to a remote filesystem, and the quick search
opens the index in 219 ms.  After the index reader is opened, subsequent
searches return in the microsecond range until the reader becomes stale from
a commit and is reopened.

Interesting point about the growth of the index based on the topology of the
repository:  modules with hundreds or thousands of revisions (e.g. nightly
builds) do not add much bulk to the index because there is so much overlap
in type names across the builds.  The duplicate type names get optimized
down.


The last time I worked with Lucene we implemented a such diff and publish
> mecanism for Lucene indexes, and it was working quite well. Solr does have
> a
> mecanism for such things too, but the last time I checked it was just
> relying
>  on rsync. If somebody is interested I can take some time to explain it
> here.
>

Not totally convinced that a scalability problem is out of the question, I'm
interested in what you have to offer on this point, Nicolas.

Re: Ivy Indexer

Posted by Nicolas Lalevée <ni...@hibnet.org>.
On Tuesday 17 November 2009 15:18:13 Xavier Hanin wrote:
> 2009/11/16 Jon Schneider <jk...@gmail.com>
>
> > On Sat, Nov 14, 2009 at 9:42 AM, Xavier Hanin <xavier.hanin@gmail.com
> >
> > >wrote:
> > >
> > > One thing I'm not sure to fully understand: it seems that you plan to
> >
> > store
> >
> > > the index on the client (say the developer's box), according to your
> > > example
> > > with dir="${ivy.settings.dir}/index". But it also seems like every
> > > client will have the responsibility to maintain the index, is that
> > > right?
> >
> > Sound's
> >
> > > strange, there's probably something I miss?
> >
> > As a best practice recommendation, I would suggest that the index be
> > stored on the same box as the repository (of course, I think the index
> > task and the
> > proxying resolver should allow for the index to be stored anywhere you
> > choose).
>
> When you say "anywhere you choose", is it limited to a location on the
> filesystem? Or do you intend to make use of ivy repositories access/publish
> mechanism to store the index remotely? With filesystem only the usage
> sounds rather limited. With ivy repository mechanism you can store your
> index on the same kind of store as where you put your modules, but you will
> need a more advanced syntax to configure it, and more advanced
> implementation.
>
> >  I do not see any reason for there to be more than one index per
> > repository.
> >
> > This single index should be as close to representative of the real-time
> > state of the repository as possible.  In the case where repository
> > artifacts
> > are added mainly through publish/deliver/install and these tasks are
> > routed through the indexing proxy, the index will always match the
> > real-time state of the repository.  In the case where repository
> > artifacts are added manually by a repository administrator, the index
> > will lack the types defined in these artifacts until the administrator
> > runs the index task.
> >
> > Thus, the responsibility for maintaining the index belongs with the some
> > combination of the proxying resolver and the index task, their precise
> > relationship being defined by the use case.
>
> So you will have to deal with index locking during updates, which may
> become a contention point, and be difficult to implement if you want to
> allow using any repo to store the index.
>
> > All clients would then read from the same index.  The Quick Search
> > feature only reads the index, it does not modify it.
>
> If the index grows, accessing the index from a remote box may become long.
> If you think big, you will have to find a way to transfer index updates to
> the clients which is optimizing the network, such as transferring diffs or
> something similar. But this becomes difficult to implement, unless you want
> to rely on existing technology for that (such as a SCM).

The last time I worked with Lucene we implemented a such diff and publish 
mecanism for Lucene indexes, and it was working quite well. Solr does have a 
mecanism for such things too, but the last time I checked it was just relying 
on rsync. If somebody is interested I can take some time to explain it here.

Nicolas


Re: Ivy Indexer

Posted by Xavier Hanin <xa...@gmail.com>.
2009/11/16 Jon Schneider <jk...@gmail.com>

> On Sat, Nov 14, 2009 at 9:42 AM, Xavier Hanin <xavier.hanin@gmail.com
> >wrote:
>
> > One thing I'm not sure to fully understand: it seems that you plan to
> store
> > the index on the client (say the developer's box), according to your
> > example
> > with dir="${ivy.settings.dir}/index". But it also seems like every client
> > will have the responsibility to maintain the index, is that right?
> Sound's
> > strange, there's probably something I miss?
> >
>
> As a best practice recommendation, I would suggest that the index be stored
> on the same box as the repository (of course, I think the index task and
> the
> proxying resolver should allow for the index to be stored anywhere you
> choose).

When you say "anywhere you choose", is it limited to a location on the
filesystem? Or do you intend to make use of ivy repositories access/publish
mechanism to store the index remotely? With filesystem only the usage sounds
rather limited. With ivy repository mechanism you can store your index on
the same kind of store as where you put your modules, but you will need a
more advanced syntax to configure it, and more advanced implementation.


>  I do not see any reason for there to be more than one index per
> repository.
>
> This single index should be as close to representative of the real-time
> state of the repository as possible.  In the case where repository
> artifacts
> are added mainly through publish/deliver/install and these tasks are routed
> through the indexing proxy, the index will always match the real-time state
> of the repository.  In the case where repository artifacts are added
> manually by a repository administrator, the index will lack the types
> defined in these artifacts until the administrator runs the index task.
>
> Thus, the responsibility for maintaining the index belongs with the some
> combination of the proxying resolver and the index task, their precise
> relationship being defined by the use case.
>
So you will have to deal with index locking during updates, which may become
a contention point, and be difficult to implement if you want to allow using
any repo to store the index.

>
> All clients would then read from the same index.  The Quick Search feature
> only reads the index, it does not modify it.
>
If the index grows, accessing it from a remote box may become slow. If you
think big, you will have to find a way to transfer index updates to the
clients that optimizes network usage, such as transferring diffs or something
similar. But this becomes difficult to implement, unless you want to rely on
existing technology for that (such as an SCM).

So IMO, for this to be useful in most situations, it needs to be designed
carefully.

Xavier
-- 
Xavier Hanin - 4SH France - http://www.4sh.fr/
BordeauxJUG creator & leader - http://www.bordeauxjug.org/
Apache Ivy Creator - http://ant.apache.org/ivy/

Re: Ivy Indexer

Posted by Jon Schneider <jk...@gmail.com>.
On Sat, Nov 14, 2009 at 9:42 AM, Xavier Hanin <xa...@gmail.com>wrote:

> One thing I'm not sure to fully understand: it seems that you plan to store
> the index on the client (say the developer's box), according to your
> example
> with dir="${ivy.settings.dir}/index". But it also seems like every client
> will have the responsibility to maintain the index, is that right? Sound's
> strange, there's probably something I miss?
>

As a best practice recommendation, I would suggest that the index be stored
on the same box as the repository (of course, I think the index task and the
proxying resolver should allow for the index to be stored anywhere you
choose).  I do not see any reason for there to be more than one index per
repository.

This single index should be as close to representative of the real-time
state of the repository as possible.  In the case where repository artifacts
are added mainly through publish/deliver/install and these tasks are routed
through the indexing proxy, the index will always match the real-time state
of the repository.  In the case where repository artifacts are added
manually by a repository administrator, the index will lack the types
defined in these artifacts until the administrator runs the index task.

Thus, the responsibility for maintaining the index belongs with some
combination of the proxying resolver and the index task, their precise
relationship being defined by the use case.

All clients would then read from the same index.  The Quick Search feature
only reads the index, it does not modify it.

On Sun, Nov 15, 2009 at 6:04 AM, Gilles Scokart <gs...@gmail.com> wrote:

> I have already some code doing that.  If you are interested, you can find
> it at http://sourceforge.net/projects/deco-project/.
>
> For the moment it can just check the compile-time dependencies.  I would
> like to play with it further in order to also check runtime deps.
>  The
> idea I have is to write a kind of byte code interpreter that either
> simulates the execution of simple Java code, or falls back to real
> execution of some pieces of code when it is too dynamic (with the help of
> some external configuration).


The compile-time dependencies (Java types, not artifacts) are already
computed by Eclipse and made available via the "search engine" API.  Again,
I haven't found a public-facing interface to this yet, but I haven't looked
all that intensively.  There may not be one.  Runtime dependencies are very
tricky.  I'll be interested in your thoughts on the byte code interpreter.
 The only idea that has occurred to me is using the reports generated by
Clover.

Jon

Re: Ivy Indexer

Posted by Xavier Hanin <xa...@gmail.com>.
One thing I'm not sure I fully understand: it seems that you plan to store
the index on the client (say, the developer's box), according to your example
with dir="${ivy.settings.dir}/index". But it also seems like every client
will have the responsibility to maintain the index, is that right? Sounds
strange; there's probably something I'm missing?

Having already thought about this, I'm not sure using an index on the client
is the best way to go. It requires either maintaining the index on the
client, which can take a very long time, or downloading the index to the
client (the Maven indexes approach, IIUC), which does not optimize network
transport; such indexes can become very large if you want a lot of useful
information. So IMHO this should be part of a repository manager providing a
remote API (I thought of a REST API) queried by the clients (Ivy, ...). The
problem with that approach is that it requires installing a new server-side
tool alongside your repository, which is not as easy as using a basic shared
filesystem. But I think nowadays most people already use dedicated resources
to host their own module repository (http server, svn, ...), so having to
deploy a war somewhere to get quick search and advanced features like the
"Organize Dependencies" you suggest is a reasonable tradeoff IMO.

What do you think?

Xavier

On Sat, Nov 14, 2009 at 16:08, Jon Schneider <jk...@gmail.com> wrote:

> That is exactly right.  It is also the first step in two of my other future
> goals:
>
> 1.  Unused dependency analysis.  I realize this won't ever be perfect since
> there is no limit to the way dependencies can be used (sometimes
> exclusively) in specific metadata like Spring context files, but
> nevertheless, it would be a good watermark for unused dependencies.
>
> 2.  "Organize Dependencies", hooking the Eclipse "search engine" API, if I
> can find a public facing API interface to it.  It would be awesome to proxy
> the Organize Imports functionality in Eclipse with this add dependency (and
> remove unnecessary dependency) functionality.
>
> Jon
>
> 2009/11/13 Nicolas Lalevée <ni...@hibnet.org>
>
> > On Friday 13 November 2009 10:24:56 Gilles Scokart wrote:
> > > Seems nice.  But I'm not sure I understand what it will be used for.
> > > What would be the user interface to read the index ?
> >
> > The use case is pretty simple: I work on a project with no dependency.
> Then
> > I
> > know that there some cool stuff in commons-io, I want to use FileUtils
> for
> > instance. More than trying to find the exact organisation+module name
> > (commons-io/commons-io or apache/commons-io or
> > org.apache.commons/commons-io
> > or org.apache.commons/io, etc....), I would open a search windows where I
> > put "FilesUtils" in a search field, and it would find the proper
> > organisation
> > and module names. Then there would be a "add dependency" button which
> will
> > add it to the ivy.xml of the project.
> >
> > Nicolas
> >
> >
> > >
> > >
> > > Gilles Scokart
> > >
> > >
> > > 2009/11/11 Jon Schneider <jk...@gmail.com>
> > >
> > > > I've been thinking about IVYDE-134 (Quick Search feature for
> > dependencies
> > > > in
> > > > repositories) and related IVY-866. If we add support for the Nexus
> > > > Indexer (which would be nice in its own right), we would still be
> > lacking
> > > > this feature for Ivy repositories. Also, what about ivysettings whose
> > > > default resolver is a chain resolver of a Maven repository and an Ivy
> > > > repository? In
> > > > this case, without some all-encompassing index, the quick search
> > feature
> > > > would find Java types in only the Maven repository within the chain
> > > > resolver, which I think would be counterintuitive to a user.
> > > >
> > > > My first thought was to build an extension to Nexus or Archiva for
> Ivy,
> > > > but somehow I just really dislike the idea of making an otherwise
> > > > stateless repository stateful (or should I say, having a manager,
> > however
> > > > thin, continuously running to proxy modifications to the repository).
> > > > Also, these two products are so Maven-centric (due to their intended
> > use)
> > > > that any extension would amount to an abuse of their intended use.
> > > >
> > > > So my compromising proposal is centered around a Lucene index that
> > should
> > > > be
> > > > modified (1) whenever a deliver/publish/install task is ran. Also,
> > since
> > > > nothing stops a repository administrator from manually
> > > > deleting/adding/updating files in the repository, we should provide
> (2)
> > a
> > > > new <ivy:index> task.
> > > >
> > > > (1) is accomplished through a new resolver type extending from
> > > > ChainResolver
> > > > that proxies publishing to its delegate resolvers, indexing the
> > published
> > > > artifacts in the process. As an example, adding this proxy would look
> > > > like this in ivysettings.xml:
> > > >
> > > > <resolvers>
> > > > <indexed name="indexable" index="${ivy.settings.dir}/index">
> > > > <filesystem name="1">
> > > > <ivy
> > > >
> > pattern="${ivy.settings.dir}/[organisation]/[module]/ivy-[revision].xml"/
> > > >> <artifact
> > > >
> > > >
> > pattern="${ivy.settings.dir}/[organisation]/[module]/[type]/[artifact]-[r
> > > >evision].[ext]"/> </filesystem>
> > > > <!-- other resolvers here... -->
> > > > </indexed>
> > > > </resolvers>
> > > >
> > > > (2) allows a repository administrator to force clean the index via an
> > Ant
> > > > task when it is known to be stale. It also provides an alternative to
> > > > using the proxy mechanism described in (1); the index task could be
> run
> > > > periodically (e.g. nightly) as a task on a continuous integration
> tool.
> > > >
> > > > The index task itself explores the repository, opening jars and
> listing
> > > > the fully qualified types found in each jar in the index and
> > associating
> > > > these types with a particular ModuleRevisionId. With the code I have
> > > > written so far, I have been able to index up to 10,000 jars in less
> > than
> > > > 10 seconds when the index task is running against a repository on the
> > > > same machine (indexing a repository through a network path slows down
> > > > considerably).
> > > >
> > > > IvyDE can then search for types against the optimized Lucene index,
> > > > making it very fast.
> > > >
> > > > Thoughts on this approach?
> > > > Jon
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> > For additional commands, e-mail: dev-help@ant.apache.org
> >
> >
>



-- 
Xavier Hanin - 4SH France - http://www.4sh.fr/
BordeauxJUG creator & leader - http://www.bordeauxjug.org/
Apache Ivy Creator - http://ant.apache.org/ivy/

Re: Ivy Indexer

Posted by Jon Schneider <jk...@gmail.com>.
That is exactly right.  It is also the first step in two of my other future
goals:

1.  Unused dependency analysis.  I realize this won't ever be perfect since
there is no limit to the way dependencies can be used (sometimes
exclusively) in specific metadata like Spring context files, but
nevertheless, it would be a good watermark for unused dependencies.

2.  "Organize Dependencies", hooking the Eclipse "search engine" API, if I
can find a public-facing interface to it.  It would be awesome to proxy
the Organize Imports functionality in Eclipse with this add dependency (and
remove unnecessary dependency) functionality.

Jon

2009/11/13 Nicolas Lalevée <ni...@hibnet.org>

> On Friday 13 November 2009 10:24:56 Gilles Scokart wrote:
> > Seems nice.  But I'm not sure I understand what it will be used for.
> > What would be the user interface to read the index ?
>
> The use case is pretty simple: I work on a project with no dependency. Then
> I
> know that there some cool stuff in commons-io, I want to use FileUtils for
> instance. More than trying to find the exact organisation+module name
> (commons-io/commons-io or apache/commons-io or
> org.apache.commons/commons-io
> or org.apache.commons/io, etc....), I would open a search windows where I
> put "FilesUtils" in a search field, and it would find the proper
> organisation
> and module names. Then there would be a "add dependency" button which will
> add it to the ivy.xml of the project.
>
> Nicolas
>
>
> >
> >
> > Gilles Scokart
> >
> >
> > 2009/11/11 Jon Schneider <jk...@gmail.com>
> >
> > > I've been thinking about IVYDE-134 (Quick Search feature for
> dependencies
> > > in
> > > repositories) and related IVY-866. If we add support for the Nexus
> > > Indexer (which would be nice in its own right), we would still be
> lacking
> > > this feature for Ivy repositories. Also, what about ivysettings whose
> > > default resolver is a chain resolver of a Maven repository and an Ivy
> > > repository? In
> > > this case, without some all-encompassing index, the quick search
> feature
> > > would find Java types in only the Maven repository within the chain
> > > resolver, which I think would be counterintuitive to a user.
> > >
> > > My first thought was to build an extension to Nexus or Archiva for Ivy,
> > > but somehow I just really dislike the idea of making an otherwise
> > > stateless repository stateful (or should I say, having a manager,
> however
> > > thin, continuously running to proxy modifications to the repository).
> > > Also, these two products are so Maven-centric (due to their intended
> use)
> > > that any extension would amount to an abuse of their intended use.
> > >
> > > So my compromising proposal is centered around a Lucene index that
> should
> > > be
> > > modified (1) whenever a deliver/publish/install task is ran. Also,
> since
> > > nothing stops a repository administrator from manually
> > > deleting/adding/updating files in the repository, we should provide (2)
> a
> > > new <ivy:index> task.
> > >
> > > (1) is accomplished through a new resolver type extending from
> > > ChainResolver
> > > that proxies publishing to its delegate resolvers, indexing the
> published
> > > artifacts in the process. As an example, adding this proxy would look
> > > like this in ivysettings.xml:
> > >
> > > <resolvers>
> > > <indexed name="indexable" index="${ivy.settings.dir}/index">
> > > <filesystem name="1">
> > > <ivy
> > >
> pattern="${ivy.settings.dir}/[organisation]/[module]/ivy-[revision].xml"/
> > >> <artifact
> > >
> > >
> pattern="${ivy.settings.dir}/[organisation]/[module]/[type]/[artifact]-[r
> > >evision].[ext]"/> </filesystem>
> > > <!-- other resolvers here... -->
> > > </indexed>
> > > </resolvers>
> > >
> > > (2) allows a repository administrator to force clean the index via an
> Ant
> > > task when it is known to be stale. It also provides an alternative to
> > > using the proxy mechanism described in (1); the index task could be run
> > > periodically (e.g. nightly) as a task on a continuous integration tool.
> > >
> > > The index task itself explores the repository, opening jars and listing
> > > the fully qualified types found in each jar in the index and
> associating
> > > these types with a particular ModuleRevisionId. With the code I have
> > > written so far, I have been able to index up to 10,000 jars in less
> than
> > > 10 seconds when the index task is running against a repository on the
> > > same machine (indexing a repository through a network path slows down
> > > considerably).
> > >
> > > IvyDE can then search for types against the optimized Lucene index,
> > > making it very fast.
> > >
> > > Thoughts on this approach?
> > > Jon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
> For additional commands, e-mail: dev-help@ant.apache.org
>
>

Re: Ivy Indexer

Posted by Nicolas Lalevée <ni...@hibnet.org>.
On Friday 13 November 2009 10:24:56 Gilles Scokart wrote:
> Seems nice.  But I'm not sure I understand what it will be used for.
> What would be the user interface to read the index ?

The use case is pretty simple: I work on a project with no dependencies. Then 
I learn that there is some cool stuff in commons-io; I want to use FileUtils, 
for instance. Rather than trying to find the exact organisation+module name 
(commons-io/commons-io or apache/commons-io or org.apache.commons/commons-io 
or org.apache.commons/io, etc.), I would open a search window, type 
"FileUtils" in a search field, and it would find the proper organisation and 
module names. Then there would be an "add dependency" button which would add 
it to the ivy.xml of the project.
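
For instance, picking the FileUtils hit in such a search window could end up adding something like this to the project's ivy.xml (the organisation/module resolution is the whole point; the revision here is just an example):

```xml
<dependencies>
    <!-- added by the hypothetical "add dependency" button -->
    <dependency org="commons-io" name="commons-io" rev="1.4"/>
</dependencies>
```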

Nicolas


>
>
> Gilles Scokart
>
>
> 2009/11/11 Jon Schneider <jk...@gmail.com>
>
> > I've been thinking about IVYDE-134 (Quick Search feature for dependencies
> > in
> > repositories) and related IVY-866. If we add support for the Nexus
> > Indexer (which would be nice in its own right), we would still be lacking
> > this feature for Ivy repositories. Also, what about ivysettings whose
> > default resolver is a chain resolver of a Maven repository and an Ivy
> > repository? In
> > this case, without some all-encompassing index, the quick search feature
> > would find Java types in only the Maven repository within the chain
> > resolver, which I think would be counterintuitive to a user.
> >
> > My first thought was to build an extension to Nexus or Archiva for Ivy,
> > but somehow I just really dislike the idea of making an otherwise
> > stateless repository stateful (or should I say, having a manager, however
> > thin, continuously running to proxy modifications to the repository).
> > Also, these two products are so Maven-centric (due to their intended use)
> > that any extension would amount to an abuse of their intended use.
> >
> > So my compromising proposal is centered around a Lucene index that should
> > be
> > modified (1) whenever a deliver/publish/install task is ran. Also, since
> > nothing stops a repository administrator from manually
> > deleting/adding/updating files in the repository, we should provide (2) a
> > new <ivy:index> task.
> >
> > (1) is accomplished through a new resolver type extending from
> > ChainResolver
> > that proxies publishing to its delegate resolvers, indexing the published
> > artifacts in the process. As an example, adding this proxy would look
> > like this in ivysettings.xml:
> >
> > <resolvers>
> > <indexed name="indexable" index="${ivy.settings.dir}/index">
> > <filesystem name="1">
> > <ivy
> > pattern="${ivy.settings.dir}/[organisation]/[module]/ivy-[revision].xml"/
> >> <artifact
> >
> > pattern="${ivy.settings.dir}/[organisation]/[module]/[type]/[artifact]-[r
> >evision].[ext]"/> </filesystem>
> > <!-- other resolvers here... -->
> > </indexed>
> > </resolvers>
> >
> > (2) allows a repository administrator to force clean the index via an Ant
> > task when it is known to be stale. It also provides an alternative to
> > using the proxy mechanism described in (1); the index task could be run
> > periodically (e.g. nightly) as a task on a continuous integration tool.
> >
> > The index task itself walks the repository, opening each jar, recording
> > in the index the fully qualified types it contains, and associating those
> > types with a particular ModuleRevisionId. With the code I have written so
> > far, I have been able to index up to 10,000 jars in less than 10 seconds
> > when the index task runs against a repository on the same machine
> > (indexing a repository over a network path is considerably slower).
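The jar-scanning step can be done with the JDK's jar support alone. A rough sketch (class and method names here are illustrative, not the actual code):

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Illustrative sketch of the jar-scanning step of the proposed index
// task: open a jar and list the fully qualified types it contains,
// ready to be stored against the jar's ModuleRevisionId.
public class JarTypeLister {

    // Convert a .class entry name to a fully qualified type name,
    // e.g. "org/apache/ivy/Ivy.class" -> "org.apache.ivy.Ivy".
    public static String toTypeName(String entryName) {
        return entryName
                .substring(0, entryName.length() - ".class".length())
                .replace('/', '.');
    }

    // List every type found in the given jar. Kept deliberately simple:
    // a real indexer might skip anonymous or module-info classes.
    public static List<String> listTypes(JarFile jar) {
        List<String> types = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            if (entry.getName().endsWith(".class")) {
                types.add(toTypeName(entry.getName()));
            }
        }
        return types;
    }
}
```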
> >
> > IvyDE can then search for types against the optimized Lucene index,
> > making it very fast.
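The lookup side amounts to an inverted index from type name to the modules containing it. As a rough illustration, with a plain in-memory map standing in for the Lucene index:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in for the Lucene-backed type search: an inverted
// index from simple class name to the ModuleRevisionIds containing it.
public class TypeSearch {

    private final Map<String, Set<String>> bySimpleName = new HashMap<>();

    // Record that a fully qualified type lives in the given module revision.
    public void add(String fullyQualifiedType, String moduleRevisionId) {
        int dot = fullyQualifiedType.lastIndexOf('.');
        String simpleName = fullyQualifiedType.substring(dot + 1);
        bySimpleName.computeIfAbsent(simpleName, k -> new TreeSet<>())
                    .add(moduleRevisionId);
    }

    // Look up the modules containing a type with the given simple name.
    public Set<String> search(String simpleName) {
        return bySimpleName.getOrDefault(simpleName, Collections.emptySet());
    }
}
```

In practice Lucene would also give prefix and wildcard matching on the type name, which is what a quick-search UI needs.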
> >
> > Thoughts on this approach?
> > Jon

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@ant.apache.org
For additional commands, e-mail: dev-help@ant.apache.org


Re: Ivy Indexer

Posted by Gilles Scokart <gs...@gmail.com>.
Seems nice, but I'm not sure I understand what it will be used for.
What would be the user interface for reading the index?


Gilles Scokart


2009/11/11 Jon Schneider <jk...@gmail.com>

> [...]

Re: Ivy Indexer

Posted by Nicolas Lalevée <ni...@hibnet.org>.
On 11 Nov 2009, at 16:21, Jon Schneider wrote:

> [...]
> Thoughts on this approach?

Nothing special to say, apart from that it sounds good :)

I am just not sure about how to declare it in the ivysettings. Maybe we could declare them just like the caches:
<resolvers>
  <filesystem name="1" index="index1">
    <ivy pattern="${ivy.settings.dir}/[organisation]/[module]/ivy-[revision].xml"/>
    <artifact pattern="${ivy.settings.dir}/[organisation]/[module]/[type]/[artifact]-[revision].[ext]"/>
  </filesystem>
</resolvers>
<indexes>
  <index name="index1" dir="${ivy.settings.dir}/index"/>
</indexes>

Nicolas

