Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@red-bean.com> on 2005/10/30 21:58:07 UTC

Repository Indexer demo (was Re: full text search source code, and change sets)

On Thu, 27 Oct 2005, Marcus Rueckert wrote:

> > sussman mentioned on irc, and i think on the mailing list too, that they did
> > an integration of svn and pylucene via a post commit hook script.

OK, I cleaned it up, and the lucene-libsvn_fs demo works.  You can
check out the project here:

  http://svn.red-bean.com/repos/sussman/software/subversion/ReposIndexer

Here's the README file:

----------

This is a proof-of-concept: it demonstrates how one can hook up a
text-indexing engine to a Subversion repository.

Specifically, it connects the 'lupy' module
(http://divmod.org/projects/lupy), a Python port of the famous
Lucene indexer that is included in this package, with calls to the
libsvn_fs Python bindings.

---------------------------------------------------------------------
  DISCLAIMER: 'lupy' is now retired software.  You should be using
  'PyLucene' instead, located at http://pylucene.osafoundation.org/.
  Rumor is that it's very easy to convert a lupy application into a
  PyLucene one via simple search and replace.
---------------------------------------------------------------------

To try this demo:

1.  Make sure you have the subversion swig/python bindings installed.

    To verify this, enter the python interpreter and check that you
    can successfully run the command 'import svn.fs'.


2.  Create an index of a single revision (say, revision 1) by running
    the 'svn_index.py' script against some repository:

       $ ./svn_index.py /path/to/repos 1 myindex
       Indexing changed file: (1, /libsvn_delta/xml_parse.c)
        ...done.
       Indexing changed file: (1, /libsvn_delta/delta.h)
        ...done.
       Indexing changed file: (1, /libsvn_delta/path_driver.c)
        ...done.
       [...]

    This creates a directory 'myindex' containing indexed data for all
    the *.c and *.h paths changed in revision 1.  Ideally, we would
    want to index files that match other patterns too, and we'd
    probably want to index more than a single revision!


3.  Search the index for a term:

       $ ./svn_search.py "txdelta" myindex
       Found in (1, /libsvn_delta/compose_delta.c).

    Notice that each hit comes back as a (revision, path) pair.
    That's because the indexing script has declared each "key" to be
    of that form.
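
To make the (revision, path) key idea concrete, here is a minimal
sketch in plain Python.  It is not the code in svn_index.py and it
uses neither lupy nor libsvn_fs: the names PATTERNS, wanted(),
index_revision() and search() are all made up for illustration, and
the "changed files" are passed in as an ordinary dict instead of
being read from a repository.  It only shows how filtering changed
paths by pattern and keying each hit by (revision, path) fit together.

```python
import fnmatch
import re
from collections import defaultdict

# Hypothetical pattern list; the demo described above handles *.c and *.h.
PATTERNS = ("*.c", "*.h")


def wanted(path, patterns=PATTERNS):
    """Return True if the last component of path matches any pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def index_revision(index, revision, changed):
    """Index the matching changed files of one revision.

    `changed` maps each changed path to its text.  (Assumption: a real
    indexer would fetch this from the repository.)  Every term is
    stored against a (revision, path) key.
    """
    for path, text in changed.items():
        if not wanted(path):
            continue
        key = (revision, path)
        for term in set(re.findall(r"\w+", text.lower())):
            index[term].add(key)


def search(index, term):
    """Return the sorted (revision, path) keys whose text contains term."""
    return sorted(index.get(term.lower(), ()))


index = defaultdict(set)
index_revision(index, 1, {
    "/libsvn_delta/compose_delta.c": "apply one txdelta window",
    "/README": "notes about txdelta",   # skipped: matches no pattern
})
index_revision(index, 2, {
    "/libsvn_delta/delta.h": "txdelta declarations",
})
print(search(index, "txdelta"))
```

Calling index_revision() once per revision against the same index is
one way to cover more than a single revision, and widening PATTERNS
covers more file types.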


Presumably, one could develop this demo into a full-fledged
post-commit hook which indexes the changed paths of each newly created
revision, augmenting an ever-growing server-side index.  One could
also then write a nice CGI script to search the index.
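
As a rough sketch of what such a hook might look like: Subversion
runs the post-commit hook with the repository path and the new
revision number as its first two arguments, so the hook only needs to
pass them on to the indexer.  The paths INDEXER and INDEX_DIR and the
helper build_command() below are hypothetical.  The sketch also
assumes that svn_index.py can add a revision to an existing index,
which the demo as described does not promise.

```python
import subprocess
import sys

# Hypothetical locations; adjust for a real installation.
INDEXER = "/usr/local/bin/svn_index.py"
INDEX_DIR = "/var/lib/svn-index/myindex"


def build_command(repos, rev, indexer=INDEXER, index_dir=INDEX_DIR):
    """Build the indexer command line in the order used in step 2:
    repository path, revision, index directory."""
    # int() rejects anything that is not a plain revision number.
    return [indexer, repos, str(int(rev)), index_dir]


def main(argv):
    """Entry point: argv[1] is the repository path, argv[2] the revision."""
    if len(argv) < 3:
        sys.stderr.write("usage: post-commit REPOS REV\n")
        return 2
    return subprocess.call(build_command(argv[1], argv[2]))
```

Saved as hooks/post-commit in the repository and made executable, the
script would end with a call such as sys.exit(main(sys.argv)).
Because the commit has already completed by the time the hook runs, a
failing indexer would not block it.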
