Posted to solr-user@lucene.apache.org by Lee Goddard <le...@gmail.com> on 2011/01/19 09:22:51 UTC

Solr with Unknown Lucene Index?

I have to use some Lucene indexes, and Solr looks like the perfect 
solution.

However, all I know about the Lucene indexes is what Luke tells me, and 
simply setting the schema to represent all fields as text does not seem 
to be working -- though as this is my first Solr, I am not sure if that 
is due to some other issue.

Is there some way to ascertain how the Solr schema should describe the 
Lucene fields?

Many thanks in anticipation
Lee

Re: Solr with Unknown Lucene Index?

Posted by Chris Hostetter <ho...@fucit.org>.
: Having found some code that searches a Lucene index, the only analyzers
: referenced are Lucene.Net.Analysis.Standard.StandardAnalyzer.
: 
: How can I map this in Solr? The example schema doesn't seem to mention this,
: and specifying 'text' or 'string' for every field doesn't seem to help.

1) that analyzer seems to be a Lucene.Net analyzer, so the Java equivalent 
would be org.apache.lucene.analysis.standard.StandardAnalyzer

2) the example schema.xml demonstrates how to use an existing Analyzer 
implementation...

    <!-- One can also specify an existing Analyzer class that has a
         default constructor via the class attribute on the analyzer element
    <fieldType name="text_greek" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
    </fieldType>
    -->
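Applied to the analyzer in question, a minimal sketch might look like the fragment below. It assumes the index was written by a Lucene.Net version whose StandardAnalyzer tokenizes the same way as the Java StandardAnalyzer bundled with your Solr. The type name `text_std` and the field name `title` are hypothetical placeholders; the `<fieldType>` belongs inside the schema's `<types>` section and the `<field>` inside `<fields>`. Whether Solr can construct the analyzer from the class attribute alone depends on the Solr version, so treat this as something to try rather than a guaranteed mapping.

```xml
<!-- schema.xml fragment: delegate all analysis to Lucene's StandardAnalyzer -->
<fieldType name="text_std" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</fieldType>

<!-- hypothetical field; use the field names Luke reports for the index -->
<field name="title" type="text_std" indexed="true" stored="true"/>
```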

3) I'm getting the sense from your comments that you aren't very familiar 
with Lucene/Solr in general.  An important thing to understand is that 
just because the code that created the index only ever uses 
"StandardAnalyzer" doesn't mean it will make sense to use that analyzer on 
every field when attempting to search that field from Solr -- some fields 
may have been indexed w/o using any analysis, some may be numeric fields 
with special encoding, some may be compressed, etc...
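To illustrate, a schema for such an index may need several field types rather than one type for everything. In the sketch below the field names are hypothetical, `text_std` stands for a StandardAnalyzer-based field type assumed to be declared elsewhere in the schema, and `string` is the `solr.StrField` type from the example schema. Numeric fields are deliberately omitted, since their encoding cannot be guessed safely.

```xml
<!-- schema.xml fragment: field names are hypothetical.
     "text_std" is assumed to be a StandardAnalyzer-based type defined in <types>;
     "string" is solr.StrField, as in the example schema. -->
<fields>
  <!-- analyzed text: tokenized, lowercased, stop words removed at index time -->
  <field name="body" type="text_std" indexed="true" stored="true"/>

  <!-- indexed without analysis (e.g. Field.Index.NOT_ANALYZED in Lucene):
       one term per value, so a query must match the whole value exactly -->
  <field name="id" type="string" indexed="true" stored="true"/>

  <!-- numeric fields are left out until you know how the original code
       encoded them; a type with a different encoding will not match -->
</fields>
```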

Trying to reverse engineer what the schema should look like to open any 
arbitrary index requires a lot of understanding about how that index was 
built -- it's easy to just "dump the terms" found in an index w/o knowing 
anything about where those terms came from (that's what Luke does), but that 
doesn't help you recognize things like "this list of X words was treated 
as stop words, and they don't appear in the index, so my query analyzer needs 
to be configured with those same X words."
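In schema.xml terms, that means the query-side analyzer has to carry the same stop word list as the index side. The sketch below spells out, filter by filter, the chain that StandardAnalyzer applies in Lucene 2.x/3.x (StandardTokenizer, StandardFilter, LowerCaseFilter, StopFilter). The type name and `stopwords.txt` are placeholders: the file would have to list exactly the stop words that were removed when the index was built, which is precisely the information Luke cannot recover.

```xml
<!-- schema.xml fragment: an explicit approximation of StandardAnalyzer -->
<fieldType name="text_std_explicit" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- must contain the stop words that were dropped when the index was built -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- same chain, so query terms are normalized the way indexed terms were -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

A single `<analyzer>` element without a `type` attribute would apply one chain to both indexing and querying; the two are written out separately here only to make the symmetry visible.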

In short: you can easily make Solr *read* the index (just like Luke), but 
that won't necessarily help you *use* the index in a meaningful way.
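For the "read" part, a minimal solrconfig.xml sketch, assuming Solr 1.4-era conventions and assuming the Lucene.Net code wrote an index format that the Java Lucene inside Solr can open. Solr looks for the index in the `index` subdirectory of its data directory, so the existing index files would be copied or linked there; the path is a placeholder. The Luke request handler then reports per-field flags and top terms, roughly what Luke itself shows.

```xml
<!-- solrconfig.xml fragment; the path is a placeholder.
     The existing Lucene index files go in /path/to/solr/data/index -->
<dataDir>/path/to/solr/data</dataDir>

<!-- Luke-style view of the index: field flags, term counts, top terms.
     If AdminHandlers is already registered under /admin/, this is likely
     available already and the explicit registration can be skipped. -->
<requestHandler name="/admin/luke"
                class="org.apache.solr.handler.admin.LukeRequestHandler"/>
```

With the example Jetty setup, a request such as `http://localhost:8983/solr/admin/luke?fl=title&numTerms=20` (where `title` is a hypothetical field) should list that field's most frequent terms. That tells you what is in the index, not how it got there.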

-Hoss

Re: Solr with Unknown Lucene Index?

Posted by Lee Goddard <le...@gmail.com>.
Having found some code that searches a Lucene index, the only analyzers 
referenced are Lucene.Net.Analysis.Standard.StandardAnalyzer.

How can I map this in Solr? The example schema doesn't seem to mention 
this, and specifying 'text' or 'string' for every field doesn't seem to 
help.

Thanks
Lee

On 22/01/2011 21:50, Erick Erickson wrote:
> Sorry, I was out of town for a while. Luke just reads stuff; it
> doesn't try to interpret any schema. Solr makes certain assumptions
> about what *should* be in the index based on the schema. So getting
> Solr to just use a Lucene index really involves knowing that Lucene
> used, say, a StandardAnalyzer followed by a LowerCaseFilter followed
> by ... for some field. And there's no way I know of to find that
> information out from a raw Lucene index.
>
> If you don't get things to match, your results will...er...vary. But
> perhaps you can guess well enough to make it work, although upgrading
> will be a problem.
>
> I really think your effort would be best spent finding the original
> indexing or querying code if at all possible, seeing how that code
> defined the analysis chain (in the code) for each field, and using
> that as a basis for creating a "close enough" schema.
>
>
> Best
> Erick
>
> On Thu, Jan 20, 2011 at 3:59 AM, Lee Goddard <leegee@gmail.com> wrote:
>
>     Thanks, Erick. I think my question comes down to, 'how does Luke
>     know how to read the indexes?' I will try the Luke mailing list.
>
>     Cheers
>     Lee
>
>
>     On 19/01/2011 17:49, Erick Erickson wrote:
>>     I don't really think this is possible/reasonable. There's nothing
>>     fixed about a Lucene index; you could index a field in different
>>     documents with any number of analysis chains. The tricky part here
>>     will, as you've discovered, be finding a way to match the Solr
>>     schema "closely enough" to get your desired results.
>>
>>     Are you sure there's no way to re-index the data? Or find the
>>     original code
>>     that indexed it?
>>
>>     Best
>>     Erick
>>
>>     On Wed, Jan 19, 2011 at 3:22 AM, Lee Goddard <leegee@gmail.com> wrote:
>>
>>         I have to use some Lucene indexes, and Solr looks like the
>>         perfect solution.
>>
>>         However, all I know about the Lucene indexes is what Luke
>>         tells me, and simply setting the schema to represent all
>>         fields as text does not seem to be working -- though as this
>>         is my first Solr, I am not sure if that is due to some other
>>         issue.
>>
>>         Is there some way to ascertain how the Solr schema should
>>         describe the Lucene fields?
>>
>>         Many thanks in anticipation
>>         Lee
>>
>>
>
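On Erick's point that upgrading will be a problem: Solr releases from 3.1 onward (later than this thread) add a `luceneMatchVersion` setting to solrconfig.xml, intended to make version-sensitive analysis components such as StandardTokenizer keep an older Lucene version's behaviour. A sketch, assuming the index was built with analysis equivalent to Lucene 2.9; the value is an assumption, so check which Lucene version the original Lucene.Net code actually corresponded to before relying on it.

```xml
<!-- solrconfig.xml fragment (Solr 3.1+): pin version-dependent analysis behaviour.
     LUCENE_29 is an assumption about how the original index was built. -->
<luceneMatchVersion>LUCENE_29</luceneMatchVersion>
```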

Re: Solr with Unknown Lucene Index?

Posted by Erick Erickson <er...@gmail.com>.
I don't really think this is possible/reasonable. There's nothing fixed
about a Lucene index; you could index a field in different documents with
any number of analysis chains. The tricky part here will, as you've
discovered, be finding a way to match the Solr schema "closely enough" to
get your desired results.

Are you sure there's no way to re-index the data? Or find the original code
that indexed it?

Best
Erick
