Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2003/09/04 13:18:23 UTC
Lucene app to index Java code
Hello,
Has anyone written an application that uses Lucene to index Java code,
either from the source .java files, or compiled .class files?
I need to create a searchable index for Java code, so that I can use
that index to check if classes or methods with certain functionality
have already been written. This is an effort to remove code
duplication and do more code re-use. If this application can also
index Javadocs, even better!
I think I heard of somebody doing this already. Kevin Burton?
This is something that would fit nicely in Erik's Ant IndexTask (in the
Lucene Sandbox), I think.
Thank you,
Otis
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
Re: Lucene app to index Java code
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Otis Gospodnetic wrote:
>Hello,
>
>Has anyone written an application that uses Lucene to index Java code,
>either from the source .java files, or compiled .class files?
>
>I need to create a searchable index for Java code, so that I can use
>that index to check if classes or methods with certain functionality
>have already been written. This is an effort to remove code
>duplication and do more code re-use. If this application can also
>index Javadocs, even better!
>
>I think I heard of somebody doing this already. Kevin Burton?
>
>
I was playing with it... blogged about it here...
http://www.peerfear.org/rss/permalink/2003/07/23/LuceneForSourceManagement/
>This is something that would fit nicely in Erik's Ant IndexTask (in the
>Lucene Sandbox), I think.
>
>
Yes... I was thinking about making an ant task for it or using someone
else's. One of the cool things would be direct integration within the IDE.
Also parsing the .java file into a token stream and then indexing the
tokens would make a blazingly fast doc completion facility....
Kevin
--
Help Support NewsMonster Development! Purchase NewsMonster PRO!
http://www.newsmonster.org/download-pro.html
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM - sfburtonator, Web - http://www.peerfear.org/
GPG fingerprint: 4D20 40A0 C734 307E C7B4 DCAA 0303 3AC5 BD9D 7C4D
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
Re: Lucene app to index Java code
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Thursday, September 4, 2003, at 01:30 PM, Kevin A. Burton wrote:
>> - XDoclet could be used to sweep through Java code and build a
>> text/XML file as richly as you'd like from the information there
>> (complete with JavaDoc tags, which Zapata will miss :)), and then run
>> Lucene on the generated files. On a related note, the XDoclet2
>> architecture would streamline this even further by eliminating the
>> middle textual representation (QDox/XJavadoc reads Java as a "meta
>> data provider" and then a Lucene "plugin" indexes things). It could
>> be done without the intermediate text representation even in XDoclet
>> 1.2, but it would require coding a custom subtask and be slightly out
>> of the norm for XDoclet subtasks (but would work just fine).
>
> It would be faster to write a native doclet as this would remove the
> XML parse overhead... The whole point of this thing is that it needs
> to be fast!
Do you mean the Ant build file parsing? That would be the only XML
parsing in the equation I'm proposing, unless you did it the clunkiest
XDoclet 1.2 way of having an intermediate XML file.
As for speed... QDox, I've heard, is the fastest option. javadoc is
the slowest parser of the three I know of (javadoc, xjavadoc, qdox).
Erik
Re: Lucene app to index Java code
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Erik Hatcher wrote:
> A couple of thoughts on this:
>
> - Eclipse uses Lucene for its code indexing/searching (I learned this
> at the OSCON Keynote by Eclipse folks). Perhaps looking at how
> Eclipse does its thing would be useful even if not the solution.
>
> - XDoclet could be used to sweep through Java code and build a
> text/XML file as richly as you'd like from the information there
> (complete with JavaDoc tags, which Zapata will miss :)), and then run
> Lucene on the generated files. On a related note, the XDoclet2
> architecture would streamline this even further by eliminating the
> middle textual representation (QDox/XJavadoc reads Java as a "meta
> data provider" and then a Lucene "plugin" indexes things). It could
> be done without the intermediate text representation even in XDoclet
> 1.2, but it would require coding a custom subtask and be slightly out
> of the norm for XDoclet subtasks (but would work just fine).
It would be faster to write a native doclet as this would remove the XML
parse overhead... The whole point of this thing is that it needs to be
fast!
Kevin
Re: StandardTokenizer problem
Posted by petite_abeille <pe...@mac.com>.
On Thursday, Sep 4, 2003, at 16:07 Europe/Zurich, Nicolas Maisonneuve
wrote:
> "I.B.M" can be a host or an acronym, so there is a problem, no?
Perhaps as far as this parser goes... but... in practice... '.M' is not
a valid TLD.
PA.
StandardTokenizer problem
Posted by Nicolas Maisonneuve <n....@HotPOP.com>.
Hi,
When I use StandardTokenizer to parse, for example, "I.B.M",
the type of the token is HOST and not ACRONYM.
Why?
In StandardTokenizer.jj:
// acronyms: U.S.A., I.B.M., etc.
// use a post-filter to remove dots
| <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >
// hostname
| <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >
"I.B.M" can be a host or an acronym, so there is a problem, no?
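The behaviour described above follows from the two rules as quoted: ACRONYM requires a dot after every letter group, including the last one, while HOST requires the text to end in an alphanumeric run. A minimal sketch, assuming <ALPHA> can be approximated by ASCII letters and <ALPHANUM> by ASCII letters and digits (the class name TokenTypeSketch and the regular expressions are illustrative and not part of Lucene), models both rules with java.util.regex:

```java
import java.util.regex.Pattern;

class TokenTypeSketch {
    // Assumption: <ALPHA> is approximated as one or more ASCII letters.
    // Mirrors: <ACRONYM: <ALPHA> "." (<ALPHA> ".")+ >
    static final Pattern ACRONYM = Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");

    // Assumption: <ALPHANUM> is approximated as one or more ASCII letters or digits.
    // Mirrors: <HOST: <ALPHANUM> ("." <ALPHANUM>)+ >
    static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    // Classifies a whole string against the two modelled rules.
    static String classify(String text) {
        if (ACRONYM.matcher(text).matches()) {
            return "ACRONYM";
        }
        if (HOST.matcher(text).matches()) {
            return "HOST";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        String[] samples = {"I.B.M", "I.B.M.", "U.S.A.", "lucene.apache.org"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Under these assumptions "I.B.M" can only match HOST, because ACRONYM needs the trailing dot; "I.B.M." is an ACRONYM. The real tokenizer picks the longest match at each input position rather than testing whole strings, so this is only a model of the two rules, not of the tokenizer itself.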
----- Original Message -----
From: "petite_abeille" <pe...@mac.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, September 04, 2003 3:19 PM
Subject: Re: Lucene app to index Java code
> Hi Erik,
>
> On Thursday, Sep 4, 2003, at 15:03 Europe/Zurich, Erik Hatcher wrote:
>
> > - XDoclet could be used to sweep through Java code and build a
> > text/XML file as richly as you'd like from the information there
> > (complete with JavaDoc tags, which Zapata will miss :)),
>
> Correct. This happens to be on purpose :) Does XDoclet build an
> "intertwingled" object graph of your code along the way? Performing a
> plain search on a code base is pretty trivial... what seems to be more
> interesting would be to put that in context.
>
> Zapata does something along the lines of what MagicHat does for
> Objective-C:
>
> http://homepage.mac.com/petite_abeille/MagicHat/
>
> But from the sound of what Otis is saying this is not what you guys are
> looking for... back to the pampa then...
>
> Cheers,
>
> PA.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
Re: Lucene app to index Java code
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Thursday, September 4, 2003, at 09:19 AM, petite_abeille wrote:
>> - XDoclet could be used to sweep through Java code and build a
>> text/XML file as richly as you'd like from the information there
>> (complete with JavaDoc tags, which Zapata will miss :)),
>
> Correct. This happens to be on purpose :) Does XDoclet build an
> "intertwingled" object graph of your code along the way? Performing a
> plain search on a code base is pretty trivial... what seems to be more
> interesting would be to put that in context.
Yes, XDoclet builds a complete object graph of all the source files you
hand it (as an Ant <fileset>). It actually even does binary class
interpretation for the information it needs to construct a full
object-graph if some dependencies are in the classpath of the taskdef
as well.
> Zapata does something along the lines of what MagicHat does for
> Objective-C:
>
> http://homepage.mac.com/petite_abeille/MagicHat/
Very cool. You rock!
Erik
Re: Lucene app to index Java code
Posted by petite_abeille <pe...@mac.com>.
Hi Erik,
On Thursday, Sep 4, 2003, at 15:03 Europe/Zurich, Erik Hatcher wrote:
> - XDoclet could be used to sweep through Java code and build a
> text/XML file as richly as you'd like from the information there
> (complete with JavaDoc tags, which Zapata will miss :)),
Correct. This happens to be on purpose :) Does XDoclet build an
"intertwingled" object graph of your code along the way? Performing a
plain search on a code base is pretty trivial... what seems to be more
interesting would be to put that in context.
Zapata does something along the lines of what MagicHat does for
Objective-C:
http://homepage.mac.com/petite_abeille/MagicHat/
But from the sound of what Otis is saying this is not what you guys are
looking for... back to the pampa then...
Cheers,
PA.
Re: Lucene app to index Java code
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
A couple of thoughts on this:
- Eclipse uses Lucene for its code indexing/searching (I learned this
at the OSCON Keynote by Eclipse folks). Perhaps looking at how Eclipse
does its thing would be useful even if not the solution.
- XDoclet could be used to sweep through Java code and build a text/XML
file as richly as you'd like from the information there (complete with
JavaDoc tags, which Zapata will miss :)), and then run Lucene on the
generated files. On a related note, the XDoclet2 architecture would
streamline this even further by eliminating the middle textual
representation (QDox/XJavadoc reads Java as a "meta data provider" and
then a Lucene "plugin" indexes things). It could be done without the
intermediate text representation even in XDoclet 1.2, but it would
require coding a custom subtask and be slightly out of the norm for
XDoclet subtasks (but would work just fine).
- My <index> task could be used, but it would be better to use
something that built a complete object-graph of all the source code you
want indexed, so that it can deal with base classes, inherited javadoc
tags, and other such interactions between classes you might want to
capture.
Erik
On Thursday, September 4, 2003, at 07:18 AM, Otis Gospodnetic wrote:
> Hello,
>
> Has anyone written an application that uses Lucene to index Java code,
> either from the source .java files, or compiled .class files?
>
> I need to create a searchable index for Java code, so that I can use
> that index to check if classes or methods with certain functionality
> have already been written. This is an effort to remove code
> duplication and do more code re-use. If this application can also
> index Javadocs, even better!
>
> I think I heard of somebody doing this already. Kevin Burton?
> This is something that would fit nicely in Erik's Ant IndexTask (in the
> Lucene Sandbox), I think.
>
> Thank you,
> Otis
Re: Lucene app to index Java code
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Otis Gospodnetic wrote:
>What you describe sounds interesting, but I was thinking more along the
>lines of this:
>
>http://www.peerfear.org/rss/permalink/2003/07/23/LuceneForSourceManagement/
>
>An application that I could use to find out whether I already have a
>'getStudents' or 'getStudents*' method somewhere in the source code,
>for instance, before I start writing it. As the code base grows
>larger, and as the team that works with it becomes bigger, this tool
>becomes more and more valuable.
>If this application could also index Javadocs, so that I can search for
>methods or classes that mention +student* +(database OR db) +update,
>that would be even better.
>
>Has anyone done this?
>Kevin Burton mentioned something similar to what I described above, at
>that URL, but it looks like he didn't make his application available.
>
>
It's just two source files plus Lucene, and I didn't do all the work to
make it into an OSS package. 99% of OSS work isn't technical but
political, maintenance, etc.
If someone wants to start an OSS project for this and do all the grunt
work, I will do the coding :) I don't know what parser I want to use to
tokenize the source, but a Doclet would be perfect for this... The only
problem is that this wouldn't allow full differential builds and would
slow down the generation...
Also it just dawned on me that the Emacs compile-internal function
parses stdout in the form of file:line# so this would make a great way
to integrate for us Emacs geeks.
Kevin
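The file:line idea can be sketched without any search library at all. The following is a minimal illustration, assuming hits are reported in the grep -n style "file:line: text" form; the class name GrepStyleHits, the method name and the sample file name are hypothetical, and a plain substring check stands in for a real index lookup:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GrepStyleHits {
    // Returns one "file:line: text" entry per line that contains the needle.
    // Line numbers are 1-based, as editors expect.
    static List<String> format(String fileName, List<String> lines, String needle) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.contains(needle)) {
                hits.add(fileName + ":" + (i + 1) + ": " + line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "class Roster {",
            "    List getStudents() { return students; }",
            "}");
        for (String hit : format("Roster.java", source, "getStudents")) {
            System.out.println(hit);
        }
    }
}
```

Printing one hit per line in this form to stdout is what lets an editor jump from a search result to the matching source line.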
Re: Lucene app to index Java code
Posted by Otis Gospodnetic <ot...@yahoo.com>.
What you describe sounds interesting, but I was thinking more along the
lines of this:
http://www.peerfear.org/rss/permalink/2003/07/23/LuceneForSourceManagement/
An application that I could use to find out whether I already have a
'getStudents' or 'getStudents*' method somewhere in the source code,
for instance, before I start writing it. As the code base grows
larger, and as the team that works with it becomes bigger, this tool
becomes more and more valuable.
If this application could also index Javadocs, so that I can search for
methods or classes that mention +student* +(database OR db) +update,
that would be even better.
Has anyone done this?
Kevin Burton mentioned something similar to what I described above, at
that URL, but it looks like he didn't make his application available.
Thanks,
Otis
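The 'getStudents*' lookup described above can be illustrated with a small in-memory stand-in for an index. This is only a sketch under stated assumptions: it does not use Lucene, the class name MethodNameIndexSketch and its methods are hypothetical, and the regular expression is a deliberately naive way to spot method declarations (it only sees declarations that start with an access modifier). A real tool would use a proper Java parser and a Lucene index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MethodNameIndexSketch {
    // Naive declaration matcher: access modifier, optional other modifiers,
    // a return type, then the method name followed by "(".
    private static final Pattern METHOD_DECL = Pattern.compile(
        "^\\s*(?:public|protected|private)\\s+"
        + "(?:(?:static|final|abstract|synchronized)\\s+)*"
        + "[\\w<>\\[\\],.? ]+?\\s+(\\w+)\\s*\\(");

    // Sorted map from method name to "file:line" locations.
    private final TreeMap<String, List<String>> index = new TreeMap<>();

    // Scans the source line by line and records every declaration found.
    void addSource(String fileName, String source) {
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = METHOD_DECL.matcher(lines[i]);
            if (m.find()) {
                index.computeIfAbsent(m.group(1), k -> new ArrayList<>())
                     .add(fileName + ":" + (i + 1));
            }
        }
    }

    // Prefix query: every indexed name that starts with the given prefix.
    SortedMap<String, List<String>> withPrefix(String prefix) {
        return index.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        String src = "public class Roster {\n"
            + "    public List<Student> getStudents() { return students; }\n"
            + "    public Student getStudentsLeader() { return leader; }\n"
            + "    private void updateStudent(Student s) { }\n"
            + "}\n";
        MethodNameIndexSketch idx = new MethodNameIndexSketch();
        idx.addSource("Roster.java", src);
        System.out.println(idx.withPrefix("getStudents"));
    }
}
```

Keeping the names in a sorted map makes a prefix query a simple range scan, which is the same basic idea a sorted term dictionary relies on for wildcard-suffix searches.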
--- petite_abeille <pe...@mac.com> wrote:
> Hi Otis,
>
> On Thursday, Sep 4, 2003, Otis Gospodnetic wrote:
>
> > Has anyone written an application that uses Lucene to index Java code,
> > either from the source .java files, or compiled .class files?
>
> If you are talking about my ultra secret project "Zapata: Coding
> Mexican Style", then yes ;)
>
> But... it uses runtime information to reach its devious ends and is
> more like a documentation tool than anything else...
>
> Anyway, this is how it goes:
>
> Given a set of binary jar files it builds an object graph of the
> bytecode: packages, classes, methods and so on. Complete with
> interdependencies and other handy information. The bytecode is also
> run through a decompiler and pretty printed to normalize the source.
> Code segments are attached and indexed alongside their owners (class or
> method). All this fully indexed, searchable and cross referenced.
>
> This is built upon the same engine used by ZOE, so the end result is
> very much along the lines of what ZOE does for email, but for code
> instead... fun, fun, fun ;)
>
> Cheers,
>
> PA.
Re: Lucene app to index Java code
Posted by petite_abeille <pe...@mac.com>.
Hi Otis,
On Thursday, Sep 4, 2003, Otis Gospodnetic wrote:
> Has anyone written an application that uses Lucene to index Java code,
> either from the source .java files, or compiled .class files?
If you are talking about my ultra secret project "Zapata: Coding
Mexican Style", then yes ;)
But... it uses runtime information to reach its devious ends and is
more like a documentation tool than anything else...
Anyway, this is how it goes:
Given a set of binary jar files it builds an object graph of the
bytecode: packages, classes, methods and so on. Complete with
interdependencies and other handy information. The bytecode is also
run through a decompiler and pretty printed to normalize the source.
Code segments are attached and indexed alongside their owners (class or
method). All this fully indexed, searchable and cross referenced.
This is built upon the same engine used by ZOE, so the end result is
very much along the lines of what ZOE does for email, but for code
instead... fun, fun, fun ;)
Cheers,
PA.
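The idea of deriving searchable entries from compiled classes rather than from source can be sketched with plain Java reflection. This is not how Zapata works internally (the message above says it reads bytecode from jar files and runs a decompiler); it is a minimal stand-in, assuming the class is already loadable, and the class name RuntimeInfoSketch and the "kind:value" entry format are hypothetical:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class RuntimeInfoSketch {
    // Builds flat "kind:value" entries for a class: its name, its superclass
    // (one simple interdependency) and each method it declares.
    static List<String> describe(Class<?> type) {
        List<String> entries = new ArrayList<>();
        entries.add("class:" + type.getName());
        if (type.getSuperclass() != null) {
            entries.add("extends:" + type.getSuperclass().getName());
        }
        for (Method m : type.getDeclaredMethods()) {
            entries.add("method:" + type.getName() + "#" + m.getName());
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String entry : describe(java.util.ArrayList.class)) {
            System.out.println(entry);
        }
    }
}
```

Each entry could then become a field of an indexed document, with the class or method as the owner, which is the shape of the cross-referenced graph the message describes.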