Posted to solr-user@lucene.apache.org by Yonik Seeley <ys...@gmail.com> on 2006/03/03 19:07:55 UTC

Re: XML Schema for schema.xml

Grant, I just today got a chance to page through your ApacheCon Lucene
presentation.
I did a double-take when I paged across your "sample configuration" slide.
Wild how similar some of it looks to Solr's schema!

So since it seems like your stuff has its own schema too, do you see
any features needed for Solr's schema?

-Yonik

=======From Grant's Presentation========
Declare a Tokenizer:
    <tokenizer name="standardTokenizer"
               class="StandardTokenizerWrapper"/>
Declare a Token Filter:
    <filter name="stop" class="StopFilterWrapper" ignoreCase="true"
            stopFile="stopwords.dat"/>
Declare an Analyzer:
    <analyzer class="ConfigurableAnalyzer">
      <name>test</name>
      <tokenizer>standardTokenizer</tokenizer>
      <filter>stop</filter>
    </analyzer>
Can also use existing Lucene Analyzers
==================================

Re: XML Schema for schema.xml

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/3/06, Grant Ingersoll <gr...@yahoo.com> wrote:
> We use Term Vectors quite a bit, in fact, I was thinking of having a go at a patch (so if you want to point me at where to begin)...

The schema should already parse and accept the following attributes on
either a fieldtype or field definition: "termVectors",
"termPositions", "termOffsets"  (these names are in FieldProperties).

SchemaField represents the <field> definitions in the schema.
FieldType represents the <fieldtype> definitions in the schema.

DocumentBuilder is used to build Lucene Documents, using
SchemaField.createField() to create the Field, which delegates to
FieldType.createField().

FieldType:
  public Field createField(SchemaField field, String externalVal, float boost) {
    String val = toInternal(externalVal);
    if (val==null) return null;
    Field f = new Field(field.getName(), val, field.stored(),
                        field.indexed(), isTokenized());
    f.setOmitNorms(field.omitNorms());
    f.setBoost(boost);
    return f;
  }

SchemaField already has:
public boolean storeTermVector()    { return (properties & STORE_TERMVECTORS)!=0; }
public boolean storeTermPositions() { return (properties & STORE_TERMPOSITIONS)!=0; }
public boolean storeTermOffsets()   { return (properties & STORE_TERMOFFSETS)!=0; }

So it's just a matter of setting the right properties on the Lucene
Field in FieldType.createField().
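
Something like the following mapping inside createField() would probably do it.
This is just an untested sketch, and it assumes the Lucene version in use has
the Field.TermVector enum (and a Field constructor that accepts one) rather
than the boolean-based constructor shown above:

  // map the schema flags onto a Lucene Field.TermVector value
  Field.TermVector tv = Field.TermVector.NO;
  if (field.storeTermVector()) {
    if (field.storeTermPositions() && field.storeTermOffsets()) {
      tv = Field.TermVector.WITH_POSITIONS_OFFSETS;
    } else if (field.storeTermPositions()) {
      tv = Field.TermVector.WITH_POSITIONS;
    } else if (field.storeTermOffsets()) {
      tv = Field.TermVector.WITH_OFFSETS;
    } else {
      tv = Field.TermVector.YES;
    }
  }
  // ...then pass tv to the Field constructor along with the store/index flags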

The harder part is figuring out what to do with TermVectors once they
are stored, however... Right now they won't be returned in the XML
response, so one would need to create a custom query handler to use
them.
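
The Lucene side of that is at least straightforward: a custom handler could
pull them from the IndexReader along these lines (rough sketch only; the
request-handler plumbing is omitted and the doc id and field name are
placeholders):

  // org.apache.lucene.index.TermFreqVector, retrieved per document/field
  TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
  if (tfv != null) {
    String[] terms = tfv.getTerms();
    int[] freqs = tfv.getTermFrequencies();
    for (int i = 0; i < terms.length; i++) {
      // write terms[i] / freqs[i] into whatever response format the handler uses
    }
  }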

> Other than that, I haven't delved into it as deeply as I would like yet, but that is coming soon.

Super!

-Yonik

Re: XML Schema for schema.xml

Posted by Grant Ingersoll <gr...@yahoo.com>.
Great minds think alike :-)

Yeah, it is a bit eerie how similar they are, but I think they both set out to solve a similar problem (mine started out with the desire to have only one Analyzer that I could configure with different filters, believe it or not, and grew from there). The biggest difference I see is that we are search-engine agnostic (Lucene is but one implementation you could use), whereas there is no need for Solr to be.

We use Term Vectors quite a bit, in fact, I was thinking of having a go at a patch (so if you want to point me at where to begin)...  Other than that, I haven't delved into it as deeply as I would like yet, but that is coming soon.


----------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com