Posted to solr-user@lucene.apache.org by Grant Ingersoll <gr...@yahoo.com> on 2006/02/23 04:02:16 UTC

Kinda running

Seems like there are some familiar names on this project, so I thought I would give things a poke around.  

So I can kinda fake it to get the admin app running, but I don't see what to do with the example.

Here's what I have done so far:
1. Installed JDK 1.5 (made it the default JVM on my Mac, otherwise you get XML exceptions when starting up)
2. Installed Tomcat 5.5.15
3. In solr-nightly, ran: ant dist-war
4. Copied the resulting WAR to the webapps directory of Tomcat
5. Restarted Tomcat
6. ClassNotFoundException thrown.  Can't find sol(a)rconfig.xml.
7. Copied solr-nightly/example/conf to <tomcat>/webapps/solr-1.0/WEB-INF/classes so that it is on my classpath
8. Restarted Tomcat.  Can now browse to the Admin app.  Woo-hoo.  It won't let me enable anything.

So, I presume I need to put the Example somewhere, but it is not obvious where this should be.  Or is the Example meant to be a standalone thing?

What can I do to help?

Thanks,
Grant
 

----------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com
		

Re: Kinda running

Posted by Yonik Seeley <ys...@gmail.com>.

Another cool thing you can do without any data in the index is go to
the admin pages and click on the analysis link.  Type in "text" for
the field name, and then type some text in the box to see the results
after each TokenFilter.


-Yonik

Re: XML Schema for schema.xml

Posted by Yonik Seeley <ys...@gmail.com>.

On 3/3/06, Grant Ingersoll <gr...@yahoo.com> wrote:
> We use Term Vectors quite a bit, in fact, I was thinking of having a go at a patch (so if you want to point me at where to begin)...

The schema should already parse and accept the following attributes on
either a fieldtype or field definition: "termVectors",
"termPositions", "termOffsets"  (these names are in FieldProperties).

SchemaField represents the <field> definitions in the schema.
FieldType represents the <fieldtype> definitions in the schema.

DocumentBuilder is used to build Lucene Documents, using
SchemaField.createField() to create the Field, which delegates to
FieldType.createField().

FieldType:
  public Field createField(SchemaField field, String externalVal, float boost) {
    String val = toInternal(externalVal);
    if (val==null) return null;
    Field f = new Field(field.getName(), val, field.stored(), field.indexed(), isTokenized());
    f.setOmitNorms(field.omitNorms());
    f.setBoost(boost);
    return f;
  }

SchemaField already has:
  public boolean storeTermVector() { return (properties & STORE_TERMVECTORS)!=0; }
  public boolean storeTermPositions() { return (properties & STORE_TERMPOSITIONS)!=0; }
  public boolean storeTermOffsets() { return (properties & STORE_TERMOFFSETS)!=0; }

So it's just a matter of setting the right properties on the Lucene
Field in FieldType.createField().
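
A minimal Java sketch of that mapping, for illustration only. It assumes the three flags combine the way the constant names of Lucene's Field.TermVector suggest. TermVectorOption is a local stand-in enum (not the Lucene class), choose() is a hypothetical helper, and the rule that positions and offsets are ignored when termVectors is off is also an assumption.

```java
// Sketch only: TermVectorOption mirrors the constant names of Lucene's
// Field.TermVector but is a local stand-in, not the Lucene class itself.
class TermVectorChoice {
    enum TermVectorOption { NO, YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS }

    // Hypothetical helper: turn the three schema flags into one option.
    // Assumption: positions/offsets only matter when term vectors are stored at all.
    static TermVectorOption choose(boolean termVectors, boolean termPositions, boolean termOffsets) {
        if (!termVectors) return TermVectorOption.NO;
        if (termPositions && termOffsets) return TermVectorOption.WITH_POSITIONS_OFFSETS;
        if (termPositions) return TermVectorOption.WITH_POSITIONS;
        if (termOffsets) return TermVectorOption.WITH_OFFSETS;
        return TermVectorOption.YES;
    }

    public static void main(String[] args) {
        // The arguments would come from storeTermVector(), storeTermPositions()
        // and storeTermOffsets() on the SchemaField.
        System.out.println(choose(true, true, false));
    }
}
```

In createField() the chosen option would then be handed to the Lucene Field when it is constructed; how exactly depends on the Field constructors available in the Lucene version in use.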

The harder part is figuring out what to do with TermVectors once they
are stored, however... Right now, they won't be returned in the XML
response, so one would need to create a custom query handler to use
them.

> Other than that, I haven't delved into it as deeply as I would like to at this point, but that is coming soon.

Super!

-Yonik

Re: XML Schema for schema.xml

Posted by Grant Ingersoll <gr...@yahoo.com>.

Great minds think alike :-)

Yeah, it is a bit eerie how similar they are, but I think they both go to solve a similar issue (mine started out with the desire to have only one Analyzer that I could configure with different filters, believe it or not, and grew from there).  The biggest difference that I see is that we are search engine agnostic (Lucene is but one implementation you could use), but there is no need for Solr to be that.

We use Term Vectors quite a bit; in fact, I was thinking of having a go at a patch (so if you want to point me at where to begin)...  Other than that, I haven't delved into it as deeply as I would like to at this point, but that is coming soon.

Yonik Seeley <ys...@gmail.com> wrote:

Grant, I just today got a chance to page through your ApacheCon Lucene
presentation.
I did a double-take when I paged across your "sample configuration" slide.
Wild how similar some of it looks to Solr's schema!

So since it seems like your stuff has its own schema too, do you see
any features needed for Solr's schema?

-Yonik

=======From Grant's Presentation========
Declare a Tokenizer:
    <tokenizer name="standardTokenizer" class="StandardTokenizerWrapper"/>
Declare a Token Filter:
    <filter name="stop" class="StopFilterWrapper" ignoreCase="true" stopFile="stopwords.dat"/>
Declare an Analyzer:
    <analyzer class="ConfigurableAnalyzer">
        <name>test</name>
        <tokenizer>standardTokenizer</tokenizer>
        <filter>stop</filter>
    </analyzer>
Can also use existing Lucene Analyzers
==================================




Re: XML Schema for schema.xml

Posted by Yonik Seeley <ys...@gmail.com>.

Grant, I just today got a chance to page through your ApacheCon Lucene
presentation.
I did a double-take when I paged across your "sample configuration" slide.
Wild how similar some of it looks to Solr's schema!

So since it seems like your stuff has its own schema too, do you see
any features needed for Solr's schema?

-Yonik

=======From Grant's Presentation========
Declare a Tokenizer:
    <tokenizer name="standardTokenizer" class="StandardTokenizerWrapper"/>
Declare a Token Filter:
    <filter name="stop" class="StopFilterWrapper" ignoreCase="true" stopFile="stopwords.dat"/>
Declare an Analyzer:
    <analyzer class="ConfigurableAnalyzer">
        <name>test</name>
        <tokenizer>standardTokenizer</tokenizer>
        <filter>stop</filter>
    </analyzer>
Can also use existing Lucene Analyzers
==================================

Re: XML Schema for schema.xml

Posted by Yonik Seeley <ys...@gmail.com>.

On 2/26/06, Grant Ingersoll <gr...@yahoo.com> wrote:
> Is there an XML schema available for schema.xml?  If not, what Java files are best to look at for understanding what options can be set?

There is currently no schema.
Everything useful should be in the example schema though.
For fields:
   <!-- Valid attributes for fields:
       name: mandatory - the name for the field
       type: mandatory - the name of a previously defined type from the <types> section
       indexed: true if this field should be indexed (searchable)
       stored: true if this field should be retrievable
       multiValued: true if this field may contain multiple values per document
       omitNorms: (expert) set to true to omit the norms associated with this field
                  (this disables length normalization and index-time boosting for the field)
   -->

One little detail I left off is that you can set field attributes in
fieldtype definitions, and they will act as defaults for fields of
that type.
  <types>
     <fieldtype name="mytype" class="solr.StrField" indexed="true" stored="false" omitNorms="true"/>
  </types>
  <fields>
    <field name="myfield" type="mytype"/>
    <field name="myfield2" type="mytype" stored="true"/>  <!-- override the default -->
  </fields>
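
The defaulting rule can be sketched in Java as a simple map overlay. This is only an illustration of the behaviour described here, not how Solr stores the properties internally; FieldAttributeDefaults and effective() are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: models "fieldtype attributes act as defaults for fields of that type".
class FieldAttributeDefaults {
    // Start from the fieldtype's attributes, then let the field's own attributes win.
    static Map<String, String> effective(Map<String, String> typeAttrs, Map<String, String> fieldAttrs) {
        Map<String, String> merged = new LinkedHashMap<>(typeAttrs);
        merged.putAll(fieldAttrs);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> mytype = Map.of("indexed", "true", "stored", "false", "omitNorms", "true");
        // "myfield" declares nothing itself, so it inherits stored=false.
        System.out.println(effective(mytype, Map.of()).get("stored"));
        // "myfield2" overrides the default with stored=true.
        System.out.println(effective(mytype, Map.of("stored", "true")).get("stored"));
    }
}
```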


 "termVectors", "termPositions", "termOffsets" are also currently
accepted attributes, but they don't do anything yet.


For fieldtype definitions, if you create your own FieldType java class
for some reason, unused attributes will get passed through to your
init(), so it's currently kinda wide open.
See FieldType.setArgs()

Options on TokenizerFactory or TokenFilterFactory implementations are
all custom (see the Java class file, or the docs here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters )

-Yonik

XML Schema for schema.xml

Posted by Grant Ingersoll <gr...@yahoo.com>.

Hi,

Is there an XML schema available for schema.xml?  If not, what Java files are best to look at for understanding what options can be set?

Thanks,
Grant


Re: Kinda running

Posted by Yonik Seeley <ys...@gmail.com>.

> For analyzers, they can be reused, so no factory is necessary.

To be more specific, I actually added a new comment in the example
schema.xml that shows how:

    <!-- One could also specify an existing Analyzer implementation in Java
         via the class attribute on the analyzer element:
    <fieldtype name="text_lu" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.snowball.SnowballAnalyzer"/>
    </fieldtype>
    -->

In general, my preference would be to have flexible
TokenFilterFactories that people can combine to create their custom
analyzers, rather than specifying a single java Analyzer class.  But
one size doesn't fit all, hence the ability still exists.

Re: Kinda running

Posted by Yonik Seeley <ys...@gmail.com>.

On 2/23/06, Grant Ingersoll <gr...@yahoo.com> wrote:

> I think the obvious things are needed at this point, namely the tutorial, which takes you through installation and your first index/setup and search using an XML request and not the Admin GUI.
>

Agree.  That's why there hasn't been a public announcement to
java-user@lucene yet.
A little tutorial, and a feature list.

> The schema.xml is pretty well documented, although I am assuming it doesn't cover all possibilities.

Right.  See
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for a more exhaustive list of options on current TokenFilterFactories.

> Is there documentation on how to submit a search request (not using the Admin)?  I did this: curl 'http://localhost:9080/solr-1.0/select?q=Foo&start=0&rows=10&version=2.0'
>  to submit a "select" and got results back.  Woo hoo!  What other options are there?

Click on [FULL INTERFACE] above the search box on the admin page and
you will get more options.  Then try the options and see the
translation in the address bar.

You can also specify sort options in the query string.  Example:
text:foo; price desc, score asc
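
Since a query like that contains spaces, a colon, a semicolon and a comma, it has to be URL-encoded when it is sent outside the admin form. Here is a small Java sketch that builds such a request URL; the base URL and the q/start/rows/version parameters are the ones from the curl command in this thread, while SelectUrl and build() are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds a select URL with the query (including sort options) encoded.
class SelectUrl {
    static String build(String selectUrl, String query, int start, int rows) {
        // URLEncoder produces form encoding: spaces become '+', ':' ';' ',' are percent-encoded.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return selectUrl + "?q=" + q + "&start=" + start + "&rows=" + rows + "&version=2.0";
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:9080/solr-1.0/select",
                "text:foo; price desc, score asc", 0, 10));
    }
}
```

When the resulting URL is passed to curl on a command line, it should be quoted so the shell does not interpret the "&" characters.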

> If I am understanding this correctly, I would do the following for my own app:
> 1. Create a solrconfig.xml and schema.xml.  The schema contains a description of my fields and how I want them to be indexed.  Are all the Lucene Analyzers/filters available?

Sort of.  To avoid creation by reflection for each instance of a
TokenFilter, factories are used instead (that also allows complex
configuration beforehand, like reading external files).  So if you
want to use a filter that we don't have support for yet, you need to
make a factory for it.
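
To illustrate the idea (not the actual TokenFilterFactory API), here is a simplified Java sketch: the factory does the expensive setup once and then hands out filters with a plain method call, with no reflection per instance. StopFilterFactorySketch is a hypothetical class, tokens are modelled as plain strings rather than a Lucene TokenStream, and the stop-word list passed to the constructor stands in for reading an external file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch only: configure once, then create many cheap filter instances.
class StopFilterFactorySketch {
    private final Set<String> stopWords = new HashSet<>();
    private final boolean ignoreCase;

    // One-time configuration (a real factory might read a stop-word file here).
    StopFilterFactorySketch(List<String> words, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (String w : words) {
            stopWords.add(ignoreCase ? w.toLowerCase(Locale.ROOT) : w);
        }
    }

    // Called for each new stream of tokens; reuses the prepared stop-word set.
    UnaryOperator<List<String>> create() {
        return tokens -> {
            List<String> kept = new ArrayList<>();
            for (String t : tokens) {
                String key = ignoreCase ? t.toLowerCase(Locale.ROOT) : t;
                if (!stopWords.contains(key)) kept.add(t);
            }
            return kept;
        };
    }

    public static void main(String[] args) {
        StopFilterFactorySketch factory = new StopFilterFactorySketch(List.of("the", "a"), true);
        System.out.println(factory.create().apply(List.of("The", "quick", "fox")));
    }
}
```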

For analyzers, they can be reused, so no factory is necessary.

>  I presume I can plug in my own, right?  The class fields seem to omit the org.apache part, I presume it is implied.  Does that mean I can't define my own?

You may use your own.  Just specify the full package name.

> 2. Package them up in the WAR and deploy it.
> 3. Create my index as in the Collection Building URL, which isn't totally clear to me yet, but I haven't put much time into it
> 4. Run my commands, searching, deleting, etc.
>
> Replication/caching is another day...

Small caches are already configured by default... see solrconfig.xml
and the statistics off the admin page.

Replication is indeed for another day... the scripts need some
tweaking to work outside of the CNET environment.  It's not top
priority now though, since I assume no one is going to production in
the next week with Solr ;-)

-Yonik

Re: Kinda running

Posted by Yonik Seeley <ys...@gmail.com>.

The current (>=24th) nightly build should have some example docs and a
slightly changed schema for some random electronic products.

If you download the nightly, you can run it directly from the example
directory since it contains a bundled appserver.  Just do
  java -jar start.jar
Then cd to exampledocs and "post.sh *.xml" to add the docs.

Check out the statistics admin page to see numDocs, maxDocs, etc.

The bundled appserver is currently Jetty, but it seems support is
split between Tomcat and Jetty.

-Yonik

Re: Kinda running

Posted by Grant Ingersoll <gr...@yahoo.com>.

Success!

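Chris Hostetter <ho...@fucit.org> wrote:
: Using the example schema, something like this should work...
:
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">1</field><field name="text">Foo Bar</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">2</field><field name="text">Yabba Dabba</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">3</field><field name="text">Yacko Bar</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">4</field><field name="text">Foo Dabba</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<commit/>'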

whoops ... i didn't notice the example schema has "text" populated using
copyField.  Adding some sample text to "title" or "body" will work better.


-Hoss

Re: Kinda running

Posted by Chris Hostetter <ho...@fucit.org>.

: Using the example schema, something like this should work...
:
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">1</field><field name="text">Foo Bar</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">2</field><field name="text">Yabba Dabba</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">3</field><field name="text">Yacko Bar</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">4</field><field name="text">Foo Dabba</field></doc></add>'
: curl http://<hostname>:<port>/update --data-binary '<commit/>'

whoops ... i didn't notice the example schema has "text" populated using
copyField.  Adding some sample text to "title" or "body" will work better.


-Hoss

Re: Kinda running

Posted by Grant Ingersoll <gr...@yahoo.com>.

Chris Hostetter <ho...@fucit.org> wrote:

: What can I do to help?

Yonik is doing most of the work at the moment, so he's the best person
to answer that question, but I'm thinking the biggest way you can help is
being a guinea pig: as someone with prior lucene knowledge but no prior
knowledge of Solr, what are your impressions with the website/FAQ/Wiki?
what would you like to see in the tutorial/example index?

I think the obvious things are needed at this point, namely the tutorial, which takes you through installation and your first index/setup and search using an XML request and not the Admin GUI.

The schema.xml is pretty well documented, although I am assuming it doesn't cover all possibilities.

Is there documentation on how to submit a search request (not using the Admin)? I did this: curl 'http://localhost:9080/solr-1.0/select?q=Foo&start=0&rows=10&version=2.0'
to submit a "select" and got results back. Woo hoo! What other options are there?

If I am understanding this correctly, I would do the following for my own app:
1. Create a solrconfig.xml and schema.xml. The schema contains a description of my fields and how I want them to be indexed. Are all the Lucene Analyzers/filters available? I presume I can plug in my own, right? The class fields seem to omit the org.apache part, I presume it is implied. Does that mean I can't define my own?
2. Package them up in the WAR and deploy it.
3. Create my index as in the Collection Building URL, which isn't totally clear to me yet, but I haven't put much time into it
4. Run my commands, searching, deleting, etc.

Replication/caching is another day...

-Grant


Re: Kinda running

Posted by Chris Hostetter <ho...@fucit.org>.

: Seems like there are some familiar names on this project, so I thought I
: would give things a poke around.

Ack! .. we've been discovered! :)

: So I can kinda fake it to get the admin app running, but I don't see
: what to do with the example.

The example isn't really "populated" yet, but if you've got the admin
screen up and running, then you're 90% of the way there.

Take a look at this Wiki page, and see if you can get some adds and a
commit to work using "curl", then try doing a search from the admin
screen.

Using the example schema, something like this should work...

curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">1</field><field name="text">Foo Bar</field></doc></add>'
curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">2</field><field name="text">Yabba Dabba</field></doc></add>'
curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">3</field><field name="text">Yacko Bar</field></doc></add>'
curl http://<hostname>:<port>/update --data-binary '<add><doc><field name="id">4</field><field name="text">Foo Dabba</field></doc></add>'
curl http://<hostname>:<port>/update --data-binary '<commit/>'
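
If the documents come from a program rather than being typed by hand, the same <add> messages can be generated. Here is a minimal Java sketch that builds one such body and escapes XML special characters in the values; AddDocXml, escape() and addDoc() are hypothetical names, and sending the result to /update is left out.

```java
// Sketch only: builds the XML body used with --data-binary in the curl commands.
class AddDocXml {
    // Escape the characters that would otherwise break the XML.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One document with an "id" field plus one further named field.
    static String addDoc(String id, String fieldName, String value) {
        return "<add><doc><field name=\"id\">" + escape(id) + "</field>"
                + "<field name=\"" + escape(fieldName) + "\">" + escape(value) + "</field></doc></add>";
    }

    public static void main(String[] args) {
        System.out.println(addDoc("1", "text", "Foo Bar"));
    }
}
```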

: What can I do to help?

Yonik is doing most of the work at the moment, so he's the best person
to answer that question, but I'm thinking the biggest way you can help is
being a guinea pig: as someone with prior lucene knowledge but no prior
knowledge of Solr, what are your impressions with the website/FAQ/Wiki?
what would you like to see in the tutorial/example index?


-Hoss