Posted to general@lucene.apache.org by "Alan Williamson (aw2.0 cloud experts)" <al...@aw20.co.uk> on 2011/09/07 15:03:38 UTC

Analyzer for code?

Good Afternoon.

We find ourselves indexing blocks of literal Java / CFML code. We are
using the StandardAnalyzer, but it seems a little too keen on removing
stop words. Has anyone come across an Analyzer designed for code?

thanks

a
http://alan.blog-city.com/

Re: Analyzer for code?

Posted by Erik Hatcher <er...@gmail.com>.
You can customize the stop word list on the analyzer (see the various constructors).
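To show why the default list hurts code, here is a plain-Java sketch. It is not Lucene code: it only mimics what a stop filter does. The stop set is a subset of Lucene's classic English stop-word list, and several of its entries (for, if, this) are also Java keywords. The class name CodeStopWords and its methods are hypothetical. In Lucene 3.x itself, the corresponding fix is to pass your own set, or an empty one, to a constructor such as StandardAnalyzer(Version, Set<?>).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CodeStopWords {

    // A subset of Lucene's classic English stop-word list. Several entries
    // ("for", "if", "this", "then", "not", "in", "is") are Java keywords or common in code.
    static final Set<String> ENGLISH_SUBSET = new HashSet<>(Arrays.asList(
            "a", "and", "for", "if", "in", "is", "not", "or", "the", "then", "this", "to", "with"));

    /** Splits on characters that cannot appear in a Java identifier, lowercases, drops stop words. */
    static List<String> tokenize(String code, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : code.split("[^A-Za-z0-9_$]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields a leading "" when the input starts with a separator
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String code = "for (int i = 0; i < n; i++) { if (this.ready) return; }";
        // With the English subset, the keywords "for", "if" and "this" disappear.
        System.out.println(tokenize(code, ENGLISH_SUBSET));
        // With an empty stop set, every identifier and keyword is kept.
        System.out.println(tokenize(code, new HashSet<>()));
    }
}
```

With the English subset, a query for "if" or "this" could never match this snippet; with an empty set it can.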

However, with code one generally wants it to be "parsed" and split into separate fields, not just tokenized/analyzed into a single field. For example, you might want the class names to be searchable separately from the inner code.
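A rough sketch of the "separate fields" idea, assuming a crude regex is good enough to find type declarations. It ignores comments and string literals, so it is an illustration rather than a parser. The field names "typeName" and "body" and the class CodeFields are hypothetical; in Lucene, each map entry would become its own field on the indexed Document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeFields {

    // Matches "class Foo", "interface Foo" and "enum Foo". The \b stops "subclass" from matching.
    private static final Pattern TYPE_DECL =
            Pattern.compile("\\b(?:class|interface|enum)\\s+([A-Za-z_$][A-Za-z0-9_$]*)");

    /** Splits one source file into hypothetical "typeName" and "body" field values. */
    static Map<String, List<String>> toFields(String source) {
        List<String> typeNames = new ArrayList<>();
        Matcher m = TYPE_DECL.matcher(source);
        while (m.find()) {
            typeNames.add(m.group(1));
        }
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("typeName", typeNames);      // searchable on its own, e.g. typeName:Listener
        fields.put("body", List.of(source));   // the full text stays searchable as before
        return fields;
    }

    public static void main(String[] args) {
        String src = "public class OrderService { interface Listener {} void run() {} }";
        System.out.println(toFields(src).get("typeName"));
    }
}
```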

In the past I've written a Java doclet hook to do this sort of thing, letting the javadoc engine do the heavy lifting.
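The same "let the JDK do the parsing" idea can be sketched without writing a doclet. This swaps in a different technique, the javac Compiler Tree API (com.sun.source), which parses a source string in-process; it is not the doclet described above. It assumes a full JDK at runtime, since ToolProvider.getSystemJavaCompiler() returns null without one. DeclarationExtractor and the "class:"/"method:" labels are hypothetical.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class DeclarationExtractor {

    /** Wraps an in-memory source string so javac can read it. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses (without compiling) the source and lists its class and method declarations. */
    static List<String> declarations(String name, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(name, source)));
        List<String> out = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                unit.accept(new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        out.add("class:" + node.getSimpleName());
                        return super.visitClass(node, p); // keep scanning nested members
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        if (!node.getName().contentEquals("<init>")) { // skip constructors
                            out.add("method:" + node.getName());
                        }
                        return super.visitMethod(node, p);
                    }
                }, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "public class Greeter { String greet(String who) { return \"hi \" + who; } }";
        System.out.println(declarations("Greeter", src));
    }
}
```

Each extracted name could then be indexed into its own field, while the raw text goes into a general body field.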

	Erik




Re: Analyzer for code?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Alan, if you have Lucene in Action 2, there is a case study that describes how Krugle did this.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

