
Posted to dev@lucene.apache.org by "Spencer, Dave" <da...@lumos.com> on 2002/03/13 03:28:43 UTC

idea: lucene doclet for indexing javadoc better

One hassle is that when a search engine (say, Lucene) indexes javadoc
(the HTML generated from *.java files), it has to wade through all kinds
of navigation junk to get at what's interesting. And if you try to
summarize a document by taking the first "n" words (after ignoring
tags), you get something like
"Overview Package Class Use Deprecated Index PREV CLASS NEXT CLASS
FRAMES NO FRAMES SUMMARY: INNER | FIELD | CONSTR | METHOD DETAIL: FIELD
| CONSTR".
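
To make the problem concrete, here is a minimal sketch of that naive
summarizer. The class name NaiveSummary, the method firstWords, and the
sample page are made up for illustration; the tag stripping is a crude
regex, not a real HTML parser.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class NaiveSummary {

    // Replace anything that looks like a tag with a space, then keep
    // the first n whitespace-separated words of what is left.
    static String firstWords(String html, int n) {
        String text = html.replaceAll("<[^>]*>", " ").trim();
        if (text.isEmpty()) {
            return "";
        }
        return Arrays.stream(text.split("\\s+"))
                .limit(n)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical, heavily shortened javadoc page: the navigation
        // bar comes first, the class description only afterwards.
        String page = "<html><body>"
                + "<a href=\"overview-summary.html\">Overview</a> "
                + "<a href=\"package-summary.html\">Package</a> "
                + "<b>Class</b> <a>Use</a> <a>Deprecated</a> <a>Index</a>"
                + "<h2>Class IndexWriter</h2>"
                + "<p>An IndexWriter creates and maintains an index.</p>"
                + "</body></html>";

        // The summary is made up entirely of navigation-bar words.
        System.out.println(firstWords(page, 6));
    }
}
```

With the sample page above, the six-word summary never reaches the
class description.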

I've done a proof of concept that uses the javadoc doclet API to drive
an indexer, so the javadoc index is built directly from the source
instead of by spidering the generated output.
It's very preliminary.
I was just wondering whether this has been done or discussed before.
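
A rough sketch of the doclet idea, not the proof of concept itself. It
assumes the jdk.javadoc.doclet API that ships with JDK 9 and later,
which is not the doclet API that existed when this was written. The
class name IndexingDoclet, the helper toFields, and the field names
"name", "summary" and "contents" are all hypothetical. The sketch only
prints the fields; a real indexer would hand them to Lucene at the
marked spot.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import jdk.javadoc.doclet.Doclet;
import jdk.javadoc.doclet.DocletEnvironment;
import jdk.javadoc.doclet.Reporter;

public class IndexingDoclet implements Doclet {

    @Override
    public void init(Locale locale, Reporter reporter) { }

    @Override
    public String getName() {
        return "IndexingDoclet";
    }

    @Override
    public Set<? extends Option> getSupportedOptions() {
        return Collections.emptySet();
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latest();
    }

    // Called by the javadoc tool once the sources have been parsed.
    @Override
    public boolean run(DocletEnvironment env) {
        for (TypeElement type : ElementFilter.typesIn(env.getIncludedElements())) {
            String comment = env.getElementUtils().getDocComment(type);
            Map<String, String> fields =
                    toFields(type.getQualifiedName().toString(), comment);
            // Assumption: this is where the fields would be added to a
            // Lucene document and written to the index.
            System.out.println(fields);
        }
        return true;
    }

    // Turn one class and its raw doc comment into index fields.
    // No navigation bar ever enters the picture, because the text
    // comes straight from the source comment.
    static Map<String, String> toFields(String qualifiedName, String docComment) {
        String text = docComment == null
                ? ""
                : docComment.replaceAll("\\s+", " ").trim();
        // Crude first-sentence summary: cut at the first ". ".
        int end = text.indexOf(". ");
        String summary = end >= 0 ? text.substring(0, end + 1) : text;

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", qualifiedName);
        fields.put("summary", summary);
        fields.put("contents", text);
        return fields;
    }
}
```

It would be run through the javadoc tool rather than directly, along
the lines of "javadoc -doclet IndexingDoclet -docletpath classes
-sourcepath src org.example", where the paths and the package name are
placeholders.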

I guess the general principle is that it's always better to index the
original source of the information rather than the generated HTML. This
is why Lucene is much nicer than other engines (say, htdig), which seem
to only be able to spider.
