You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Axel <ax...@gmail.com> on 2005/08/31 21:20:39 UTC

Indexing source files

Hi

I'm quite new to lucene, and I'm looking for information, how I can
start implementing a search engine for code completion, phpdoc
hovering etc.
Currently I'm using an approach similar to ctags: http://ctags.sourceforge.net/,
which isn't very fast in the startup time for large projects.
For the PHP scripting language I've already created a parser which
extracts something like
function-, class-, methodnames, phpdoc sections... for a given file.
I've also created a reduced binary AST of the php file, which I would
like to store with the lucene document.

What's the best way to store all these informations in an lucene search index?
Something like:
Field.Text("content", <string of all parsed identifiers>);

or is it better to write something like this for every identifier::
Field.Keyword("function","myfunction");
Field.Keyword("class","myclass");
...

Is it possible to store the complete AST (in an serializable format)
in the document or should the extra information be stored for every
(function-)identifier (like for example number of arguments, source
start, source end, javadoc start, javadoc end,...)

Did someone already implemented something similar?
Is Lucene the right tool for such a task?

-- 
Axel Kramer
http://www.phpeclipse.de - PHP Eclipse Plugin

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org