Posted to dev@maven.apache.org by Laurent Pellegrino <la...@gmail.com> on 2012/04/17 09:11:54 UTC

Maven Indexer, filter by period of time and classNames field

Hi all,

I am trying to use the Apache Maven Indexer to retrieve artifacts whose
lastModified field falls within a given period, for example between
January and February of this year (1). Then, for each artifact
retrieved, I would like to get the value of the classNames field (2).

To achieve this I tried both the API provided with Maven Indexer and
the Lucene API, but with either approach it seems impossible to fulfill
requirements (1) and (2) at the same time.

Using the Maven Indexer API (cf. [1]), I retrieve the artifacts for the
desired period of time, but when I access the classNames field I get
null instead of the expected value for artifacts with JAR packaging,
even though I have specified a JarFileContentsIndexCreator among the
index creators. Is this a bug in the reconstruction of the artifact
info, is it the expected behavior, or am I missing something?

My second idea was to use Lucene directly to retrieve what I need, but
according to the implementation, MinimalArtifactInfoIndexCreator
declares the lastModified field (FLD_LAST_MODIFIED) as not indexed.
Thus, it is impossible to search with the efficient NumericRangeFilter.
In terms of execution time, that approach would be better than the
first solution, which uses an ArtifactFilter applied iteratively to all
the documents. Is it possible to index this field?
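
The difference between the two approaches can be illustrated with a
minimal, library-free sketch. It does not use the Maven Indexer or
Lucene APIs: the Artifact class, the sample coordinates, and the sorted
map are all hypothetical stand-ins, and the period is assumed to run
from 1 January (inclusive) to 1 March (exclusive). The scan method
mimics a filter applied to every document, while rangeLookup mimics
what an indexed numeric field would allow.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class TimeRangeSketch {

    // Hypothetical stand-in for an indexed artifact; not the real ArtifactInfo.
    static final class Artifact {
        final String coordinates;
        final long lastModified; // epoch milliseconds

        Artifact(String coordinates, long lastModified) {
            this.coordinates = coordinates;
            this.lastModified = lastModified;
        }
    }

    // Linear approach: test every artifact, as a per-document filter would.
    static List<Artifact> scan(List<Artifact> all, long from, long to) {
        List<Artifact> hits = new ArrayList<>();
        for (Artifact a : all) {
            if (a.lastModified >= from && a.lastModified < to) {
                hits.add(a);
            }
        }
        return hits;
    }

    // Indexed approach, step 1: build a structure sorted by timestamp.
    static NavigableMap<Long, List<Artifact>> index(List<Artifact> all) {
        NavigableMap<Long, List<Artifact>> byTime = new TreeMap<>();
        for (Artifact a : all) {
            byTime.computeIfAbsent(a.lastModified, k -> new ArrayList<>()).add(a);
        }
        return byTime;
    }

    // Indexed approach, step 2: visit only the entries inside [from, to).
    static List<Artifact> rangeLookup(NavigableMap<Long, List<Artifact>> byTime,
                                      long from, long to) {
        List<Artifact> hits = new ArrayList<>();
        for (List<Artifact> bucket : byTime.subMap(from, true, to, false).values()) {
            hits.addAll(bucket);
        }
        return hits;
    }

    // Start of the given day in UTC, as epoch milliseconds.
    static long millis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        List<Artifact> all = List.of(
                new Artifact("org.example:a:1.0", millis(2011, 12, 15)),
                new Artifact("org.example:b:1.0", millis(2012, 1, 20)),
                new Artifact("org.example:c:1.0", millis(2012, 2, 10)),
                new Artifact("org.example:d:1.0", millis(2012, 3, 5)));
        long from = millis(2012, 1, 1);
        long to = millis(2012, 3, 1);
        for (Artifact a : rangeLookup(index(all), from, to)) {
            System.out.println(a.coordinates);
        }
    }
}
```

Both methods return the same artifacts; the range lookup only avoids
touching entries outside the period, which is the benefit an indexed
lastModified field would bring.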

More generally, does anyone have a method to achieve requirements (1) and (2)?

[1] http://pastebin.com/raw.php?i=qaNXjWT5

Thanks.

Kind Regards,

Laurent

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@maven.apache.org
For additional commands, e-mail: dev-help@maven.apache.org


Re: Maven Indexer, filter by period of time and classNames field

Posted by Laurent Pellegrino <la...@gmail.com>.
Hi Tamás,

Thanks for your answer. However, could you give me an example of how to
use it with a Nexus repository (I mean how to create the indexing
context and which kind of URL to pass)?

Is there a Nexus repository available that would let me do this with
Maven Central?

Kind Regards,

Laurent

On Tue, Apr 17, 2012 at 11:46 AM, Tamás Cservenák <ta...@cservenak.net> wrote:
> Your example accesses the Central Repository index
> (http://repo1.maven.org/maven2), and due to bandwidth considerations,
> Central does NOT index class names, hence the index chunks you
> download do not have classNames in them.
>
> Otherwise, your code looks good. Try the same against a Nexus
> repository with some JARs deployed; it will work.
>
> In short, the Central repository uses the "min" and "maven-plugin"
> set of creators, while Nexus uses the "full" set of index creators.
>
> You can use Luke (http://www.getopt.org/luke/) to inspect what the
> downloaded index actually contains. "min" is the bare minimum and is
> always present, while the others are "optional" creators.
>
> Hope this helps,
> ~t~


Re: Maven Indexer, filter by period of time and classNames field

Posted by Tamás Cservenák <ta...@cservenak.net>.
Your example accesses the Central Repository index
(http://repo1.maven.org/maven2), and due to bandwidth considerations,
Central does NOT index class names, hence the index chunks you download
do not have classNames in them.

Otherwise, your code looks good. Try the same against a Nexus
repository with some JARs deployed; it will work.

In short, the Central repository uses the "min" and "maven-plugin" set
of creators, while Nexus uses the "full" set of index creators.
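
Since an index built without the class-name creator returns null for
classNames, client code may want to tolerate the missing value. Below
is a minimal sketch that does not depend on the indexer library. The
parseClassNames helper is hypothetical, and the input format
(newline-separated entries such as "/org/example/Foo") is an
assumption that should be checked against a real index, for instance
with Luke.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClassNamesSketch {

    // Hypothetical helper: turns a raw classNames value into a list of
    // dotted class names, returning an empty list when the field is absent.
    static List<String> parseClassNames(String raw) {
        if (raw == null || raw.isEmpty()) {
            // The index was built without class names (e.g. "min" only).
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (String line : raw.split("\n")) {
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue;
            }
            // Assumed format: "/org/example/Foo" becomes "org.example.Foo".
            if (entry.startsWith("/")) {
                entry = entry.substring(1);
            }
            names.add(entry.replace('/', '.'));
        }
        return names;
    }

    public static void main(String[] args) {
        // Value as it might come from an index with class names.
        System.out.println(parseClassNames("/org/example/Foo\n/org/example/Bar"));
        // Value as it comes from an index without them.
        System.out.println(parseClassNames(null));
    }
}
```

An empty result then signals "no class names in this index" rather
than "this JAR contains no classes", so callers should not treat the
two cases as equivalent.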

You can use Luke (http://www.getopt.org/luke/) to inspect what the
downloaded index actually contains. "min" is the bare minimum and is
always present, while the others are "optional" creators.

Hope this helps,
~t~
