Posted to solr-user@lucene.apache.org by Scott Smith <ss...@mainstreamdata.com> on 2012/11/13 23:41:18 UTC

Custom Solr indexer/searcher

Suppose I have a special data search type (something other than a string or numeric value) that I want to integrate into the Solr server.  For example, suppose I wanted to implement a KD-tree as a filter that would integrate with standard Solr filters and queries.  I might want to say "find all of the documents in the index with the word 'tree' in them that are within a certain distance of a particular document in the KD-tree".  Let me add that I'm not really looking for a KD-tree implementation for Solr; I just assume that a fair number of people know what a KD-tree is and so will understand that I'm talking about adding a new data type (different from string, long, etc.) that Solr will need to be able to index and search.  It's important that the new data type integrate with the existing standard Solr data types for searching purposes.

First, is there a way to build and specify a plugin that provides Solr with both the indexing and search interfaces, and therefore hides the internal details of the search from Solr so that it just thinks it's another search type?  Or would I have to hack Solr in a lot of places to add my custom data type?

Second, if the interface(s) exist to add a new data type, is there documentation (tutorial, examples, etc.) anywhere on how to do this?  Or is my only option to dig into the Solr code?

Mostly, I'm looking for some links or suggestions on where to start looking.  I doubt this subject is simple enough to fit into an email post (though I'd be happy to be surprised :) ).  You can assume Solr 4.0 if that makes things easier.  You can also assume that I have some familiarity with Lucene (though I haven't hacked that code either).

Hopefully, I've explained this well enough so that people know what I'm looking for.

Cheers

Scott


Re: Custom Solr indexer/searcher

Posted by "Smiley, David W." <ds...@mitre.org>.
FWIW I helped someone a few days ago with a similar problem and similarly advised modifying SpatialPrefixTree:
http://lucene.472066.n3.nabble.com/PointType-multivalued-query-tt4020445.html

IMO GeoHashField should be deprecated because it adds no value.

~ David

On Nov 16, 2012, at 1:49 PM, Scott Smith wrote:

> Thanks for the suggestions.  I'll take a look at these things.
> 


RE: Custom Solr indexer/searcher

Posted by Scott Smith <ss...@mainstreamdata.com>.
Thanks for the suggestions.  I'll take a look at these things.

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhludnev@griddynamics.com] 
Sent: Thursday, November 15, 2012 11:54 PM
To: solr-user@lucene.apache.org
Subject: Re: Custom Solr indexer/searcher

Scott,
It sounds like you need to look into a few samples of similar things in Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms in the FST that are similar to the given one, for query expansion. Generic query expansion is done via MultiTermQuery. Index-time term expansion is shown in TrieField and, by the way, NumericRangeQuery (it should match your goal closely). All of these are single-dimension samples, but AFAIK a KD-tree is multidimensional, so look into GeoHashField, which puts two-dimensional points into single terms with the ability to build ranges on them; see GeoHashField.createSpatialQuery().

Happy hacking!



Re: Custom Solr indexer/searcher

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Scott,
It sounds like you need to look into a few samples of similar things in
Lucene. Off the top of my head: FuzzyQuery from 4.0, which finds terms in the
FST that are similar to the given one, for query expansion. Generic query
expansion is done via MultiTermQuery. Index-time term expansion is shown in
TrieField and, by the way, NumericRangeQuery (it should match your goal
closely). All of these are single-dimension samples, but AFAIK a KD-tree is
multidimensional, so look into GeoHashField, which puts two-dimensional
points into single terms with the ability to build ranges on them; see
GeoHashField.createSpatialQuery().
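
To make the "points as single terms" idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not GeoHashField's actual encoding (geohashes use a base-32 alphabet over latitude/longitude); it only illustrates the general prefix-cell technique, assuming points normalized to the unit square. The names PrefixCellSketch, encode, and indexTerms are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixCellSketch {

    // Encode a point in the unit square [0,1) x [0,1) as a string of
    // quadrant digits ('0'..'3'). Each extra digit halves the cell size,
    // so nearby points share a long common prefix.
    static String encode(double x, double y, int levels) {
        StringBuilder term = new StringBuilder();
        double minX = 0, minY = 0, size = 1.0;
        for (int i = 0; i < levels; i++) {
            size /= 2;
            int quadrant = 0;
            if (x >= minX + size) { quadrant |= 1; minX += size; }
            if (y >= minY + size) { quadrant |= 2; minY += size; }
            term.append((char) ('0' + quadrant));
        }
        return term.toString();
    }

    // Index-time expansion: emit every prefix of the full-precision term,
    // so a coarse query cell can be matched with one exact-term lookup.
    static List<String> indexTerms(double x, double y, int levels) {
        String full = encode(x, y, levels);
        List<String> terms = new ArrayList<>();
        for (int len = 1; len <= full.length(); len++) {
            terms.add(full.substring(0, len));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(indexTerms(0.30, 0.70, 4)); // [2, 21, 212, 2122]
        System.out.println(indexTerms(0.31, 0.72, 4)); // same cell: [2, 21, 212, 2122]
        System.out.println(encode(0.90, 0.10, 4));     // far away: starts with '1'
    }
}
```

A "within this cell" filter then becomes an ordinary term (or prefix) match, which is why it composes with keyword clauses such as the word 'tree'. A real distance filter would still need to cover the query shape with several cells and post-check the borders.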

Happy hacking!


On Fri, Nov 16, 2012 at 10:34 AM, John Whelan <wh...@gmail.com> wrote:

> Scott,
>
> I probably have no idea as to what I'm saying, but if you're looking to
> find results in an N-dimensional space, you might look at creating a
> field of type 'point'. Point-type fields have a dimension attribute; I
> believe that it can be set to a large integer value.
>
> Barring that, there is also a 'dist()' function that can be used to work
> with multiple numeric fields in order to sort results based on closeness to
> a desired coordinate. The 'dist()' function takes a parameter to specify the
> means of calculating the distance. (For example, 2 -> 'Euclidean distance'.
> I don't know the other options.)
>
> In the worst case, my response is worthless, but pops your question back up
> in the e-mails...
>
> Regards,
> John
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Custom Solr indexer/searcher

Posted by John Whelan <wh...@gmail.com>.
Scott,

I probably have no idea as to what I'm saying, but if you're looking to
find results in an N-dimensional space, you might look at creating a
field of type 'point'. Point-type fields have a dimension attribute; I
believe that it can be set to a large integer value.

Barring that, there is also a 'dist()' function that can be used to work
with multiple numeric fields in order to sort results based on closeness to
a desired coordinate. The 'dist()' function takes a parameter to specify the
means of calculating the distance. (For example, 2 -> 'Euclidean distance'.
I don't know the other options.)
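
For reference, the kind of calculation a power-parameterized distance performs can be sketched in plain Java. This is a generic p-norm, not Solr's implementation; the class and method names are hypothetical, and which power values Solr itself accepts beyond 2 should be checked against the function query documentation. Under the usual definition, power 2 gives Euclidean distance and power 1 gives Manhattan distance.

```java
public class DistSketch {

    // p-norm distance between two points of the same dimension.
    // power = 1 -> Manhattan distance, power = 2 -> Euclidean distance.
    static double dist(double power, double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        if (power < 1) {
            throw new IllegalArgumentException("this sketch only handles power >= 1");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), power);
        }
        return Math.pow(sum, 1.0 / power);
    }

    public static void main(String[] args) {
        double[] doc = {3.0, 4.0};     // values of two numeric fields on a document
        double[] target = {0.0, 0.0};  // the desired coordinate
        System.out.println(dist(2, doc, target)); // Euclidean: 5.0
        System.out.println(dist(1, doc, target)); // Manhattan: 7.0
    }
}
```

Sorting by closeness then amounts to ordering documents by this value in ascending order.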

In the worst case, my response is worthless, but pops your question back up
in the e-mails...

Regards,
John