Posted to solr-user@lucene.apache.org by Cory Ondrejka <co...@gmail.com> on 2009/11/14 18:57:56 UTC

converting over from sphinx

I've been using Sphinx for full text search, but since I want to move my
project over to Heroku, I need to switch to Solr. Everything's up and running
using the acts_as_solr plugin, but I'm curious if I'm using Solr the right
way.  In particular, I'm doing phrase searching into a corpus of
descriptions, such as "I need help with a foo" where I have a bunch of "foo:
a foo is a subset of a bar often used to create briznatzes", etc.

With Sphinx, I could convert "I need help with a foo" into "*need* *help*
*with* *foo*" and get pretty nice matches. With Solr, my understanding is
that you can only do wildcard matches on the suffix. In addition, stemming
only happens on non-wildcard terms. So, my first thought would be to convert
"I need help with a foo" into "need need* help help* with with* foo foo*".
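
For concreteness, the rewrite described above could be sketched roughly as
follows. This is Python rather than the Ruby an acts_as_solr app would use,
and the function name `to_prefix_query` and the tiny stop list are
hypothetical, chosen only so the example reproduces the string above:

```python
import re

# Hypothetical stop list: just enough to drop "I" and "a" from the example.
STOP_WORDS = {"i", "a"}

def to_prefix_query(text):
    """Emit each kept word twice: once plain (so it can still be stemmed)
    and once with a trailing * (a prefix wildcard)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return " ".join(f"{w} {w}*" for w in kept)

print(to_prefix_query("I need help with a foo"))
# need need* help help* with with* foo foo*
```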

Thanks in advance for any help.

-- 
Cory Ondrejka
cory.ondrejka@gmail.com
http://ondrejka.net/

Re: converting over from sphinx

Posted by Chris Hostetter <ho...@fucit.org>.
: way.  In particular, I'm doing phrase searching into a corpus of
: descriptions, such as "I need help with a foo" where I have a bunch of "foo:
: a foo is a subset of a bar often used to create briznatzes", etc.
: 
: With Sphinx, I could convert "I need help with a foo" into "*need* *help*
: *with* *foo*" and get pretty nice matches. With Solr, my understanding is
: that you can only do wildcard matches on the suffix. In addition, stemming
: only happens on non-wildcard terms. So, my first thought would be to convert
: "I need help with a foo" into "need need* help help* with with* foo foo*".

First off, we need to make sure we have all our terminology in sync -- I'm 
not very familiar with Sphinx, so I'm not sure what vernacular is used 
there to describe various things, but in Solr/Lucene you have options 
regarding how you want text to be "analyzed" when it's indexed -- this 
analysis is what converts an arbitrary stream of characters into "Terms" 
that get indexed.  At query time, it's very easy to match on terms, 
boolean combinations of terms, and sequential phrases of terms -- you only 
need wildcard-type functionality if you want to provide a wildcard 
expression that could match more than one individual term.
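
To make the idea of analysis concrete, here is a toy sketch in Python. It 
is not Solr or Lucene code: the `analyze`, `build_index` and `search_all` 
names are hypothetical, the analyzer only lowercases and splits on 
non-alphanumeric characters, and the second document is invented for 
illustration.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Boolean AND: ids of documents containing every term of the query."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "foo: a foo is a subset of a bar often used to create briznatzes",
    2: "bar: a bar is a container for foos",  # hypothetical second document
}
index = build_index(docs)

print(sorted(search_all(index, "bar")))      # [1, 2]
print(sorted(search_all(index, "foo bar")))  # [1]
print(sorted(search_all(index, "foos")))     # [2]
```

Note that "foo" and "foos" are distinct terms here. Bridging that kind of 
gap is a job for the analysis chain (stemming, for instance) or for a 
wildcard expression, depending on what you want to match.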

In your specific example, if you just configured a basic whitespace 
tokenizer when you indexed your documents (ie: "foo: a foo is a subset of 
a bar often used to create briznatzes") then at query time any of the 
individual words ("foo", "bar", etc...) would match that document.  
Likewise, a phrase query like "need help with foo" would match that text if 
you defined some stop words (like "need" and "with") and specified a small 
amount of slop on your phrase queries.
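
As a rough illustration of stop words plus slop, here is a toy matcher in 
Python. It is only an approximation of the idea, not how Lucene actually 
evaluates sloppy phrases; the function names, the stop list (beyond "need" 
and "with") and the sample texts are assumptions made for the example. 
Stop words are dropped but the remaining terms keep their original 
positions, and a phrase matches when the relative positions of its terms 
in the document drift from those in the query by at most `slop`.

```python
import re
from itertools import product

# "need" and "with" come from the message above; "i" and "a" are assumed.
STOP_WORDS = {"i", "a", "need", "with"}

def analyze(text):
    """Return (term, position) pairs, dropping stop words but keeping
    each surviving term's original position."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(w, pos) for pos, w in enumerate(words) if w not in STOP_WORDS]

def sloppy_phrase_match(doc_text, phrase, slop):
    """True if the phrase's terms occur in the document with relative
    positions that differ from the query's by at most `slop`."""
    doc_positions = {}
    for term, pos in analyze(doc_text):
        doc_positions.setdefault(term, []).append(pos)
    query = analyze(phrase)
    if not query or any(term not in doc_positions for term, _ in query):
        return False
    # Try every way of picking one document position per query term.
    for combo in product(*(doc_positions[term] for term, _ in query)):
        offsets = [d - q for d, (_, q) in zip(combo, query)]
        if max(offsets) - min(offsets) <= slop:
            return True
    return False

text = "I need help with a foo"
print(sloppy_phrase_match(text, "need help with foo", slop=0))  # False
print(sloppy_phrase_match(text, "need help with foo", slop=1))  # True
```

With no slop the match fails because the extra "a" in the text pushes 
"foo" one position further from "help" than it is in the query; a slop of 
1 absorbs that difference.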


The point is: there are a lot of different ways to use Solr, and the 
terminology you are used to with Sphinx may not map exactly to some of the 
terminology you'll see in the Solr docs/configs -- so please feel free to 
ask.

-Hoss


Re: converting over from sphinx

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Something doesn't sound right here.  Why do you need wildcards for queries in the first place?
Are you finding that with stopword removal and stemming you are not matching some docs that you think should be matched?  If so, we may be able to help if you provide a few examples.

Otis