Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2006/08/29 04:25:31 UTC

acts_as_solr

I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
index, delete, and search database-backed models in Solr.
The results so far:

$ script/console
 >> Book.new(:title => "Solr in Action", :author => "Yonik & Hoss").save
=> true
 >> Book.new(:title => "Lucene in Action", :author => "Otis & Erik").save
=> true
 >> action_books = Book.find_by_solr("action")
=> [#<Book:0x2406db0 @attributes={"title"=>"Solr in Action",
"author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x2406d74 @attributes=
{"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
 >> action_books = Book.find_by_solr("actions")  # to show stemming
=> [#<Book:0x279ebbc @attributes={"title"=>"Solr in Action",
"author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x279eb80 @attributes=
{"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
 >> Book.find_by_solr("yonik OR otis") # to show QueryParser boolean expressions
=> [#<Book:0x2793adc @attributes={"title"=>"Solr in Action",
"author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x2793aa0 @attributes=
{"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]

My model looks like this:

   class Book < ActiveRecord::Base
     acts_as_solr
   end

(ain't ActiveRecord slick?!)

acts_as_solr adds save and destroy hooks.  All model attributes are  
sent to Solr like this:

 >> action_books[0].to_solr_doc.to_s
=> "<doc><field name='id'>Book:21</field><field name='type'>Book</field><field name='pk'>21</field><field name='title_t'>Solr in Action</field><field name='author_t'>Yonik &amp; Hoss</field></doc>"

The Solr id is formatted as <model_name>:<primary_key>.  The type field
holds the model name and is AND'd to queries to narrow them to the
requesting model, the pk field is the primary key of the database table,
and the rest of the attributes are named with an _t suffix to leverage
Solr's dynamic field capability.  All _t fields are copied into the
default search field, "text".
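The field naming above can be sketched without Rails or Solr running.
This is only an illustration: solr_field_map is a hypothetical helper,
not part of the plugin, and it assumes the attributes arrive as a plain
hash with an 'id' key.

```ruby
# Hypothetical helper (not part of the plugin): maps a model name and
# its attribute hash to the Solr field names described above.
def solr_field_map(model_name, attributes)
  pk = attributes.fetch('id').to_s
  fields = {
    'id'   => "#{model_name}:#{pk}",  # unique across all models
    'type' => model_name,             # AND'd into queries
    'pk'   => pk                      # database primary key
  }
  attributes.each do |key, value|
    next if key.to_s == 'id'          # id is already covered by pk
    fields["#{key}_t"] = value.to_s   # _t matches the dynamic text field
  end
  fields
end

book = { 'id' => 21, 'title' => 'Solr in Action', 'author' => 'Yonik & Hoss' }
p solr_field_map('Book', book)
```

Prefixing the Solr id with the model name is what lets several models
share one index without their primary keys colliding.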

At this point it is extremely basic, with no configurability, and there
are lots of issues to address before this becomes something robust and
general purpose.  But as a proof-of-concept I'm pleased at how easy
it was to write this hook.

I'd like to commit this to the Solr repository.  Any objections?   
Once committed, folks will be able to use "script/plugin install ..."  
to install the Ruby side of things, and using a binary distribution  
of Solr's example application and a custom solr/conf directory (just  
for schema.xml) they'd be up and running quite quickly.  If ok to  
commit, what directory should I put things under?  How about just  
"ruby"?

I currently do not foresee having a lot of time to spend on this, but  
I do feel quite strongly that having an "acts_as_solr" hook into  
ActiveRecord will really lure in a lot of Rails developers.  I'm sure  
there will be plenty that will not want a hybrid Ruby/Java  
environment, and for them there is the ever improving Ferret  
project.  Ferret, however, would still need layers added on top of it  
to achieve all that Solr provides, so Solr is where I'm at now.   
Despite my time constraints, I'm volunteering to bring this prototype  
to a documented and easily usable state, and manage patches submitted  
by savvy users to make it robust.

Thoughts?

	Erik

p.s. And for the really die-hard bleeding edgers, the complete
acts_as_solr code is pasted below.  You can put it into a Rails
project in vendor/plugins/acts_as_solr.rb, along with a simple
one-line init.rb in vendor/plugins containing require 'acts_as_solr'.
Sheepishly, here's the hackery....

--------
require 'active_record'
require 'rexml/document'
require 'net/http'
require 'erb'  # provides ERB::Util.url_encode, used by find_by_solr


def post_to_solr(body, mode = :search)
   url = URI.parse("http://localhost:8983")
   post = Net::HTTP::Post.new(mode == :search ? "/solr/select" : "/solr/update")
   post.body = body
   post.content_type = 'application/x-www-form-urlencoded'
   response = Net::HTTP.start(url.host, url.port) do |http|
     http.request(post)
   end
   return response.body
end

module SolrMixin
   module Acts #:nodoc:
     module ARSolr #:nodoc:

       def self.included(base)
         base.extend(ClassMethods)
       end

       module ClassMethods

         def acts_as_solr(options={}, solr_options={})
#          configuration = {}
#          solr_configuration = {}
#          configuration.update(options) if options.is_a?(Hash)

#          solr_configuration.update(solr_options) if solr_options.is_a?(Hash)

           after_save :solr_save
           after_destroy :solr_destroy
           include SolrMixin::Acts::ARSolr::InstanceMethods
         end


         def find_by_solr(q, options = {}, find_options = {})
           q = "(#{q}) AND type:#{self.name}"
           response = post_to_solr("q=#{ERB::Util::url_encode(q)}&wt=ruby&fl=pk")
           data = eval(response)
           docs = data['response']['docs']
           return [] if docs.size == 0

           ids = docs.collect {|doc| doc['pk']}
           conditions = [ "#{self.table_name}.id in (?)", ids ]
           result = self.find(:all,
                              :conditions => conditions)
         end
       end

       module InstanceMethods
         def solr_id
           "#{self.class.name}:#{self.id}"
         end

         def solr_save
           logger.debug "solr_save: #{self.class.name} : #{self.id}"

           xml = REXML::Element.new('add')
           xml.add_element to_solr_doc
           response = post_to_solr(xml.to_s, :update)
           solr_commit
           true
         end

         # remove from index
         def solr_destroy
           logger.debug "solr_destroy: #{self.class.name} : #{self.id}"
           post_to_solr("<delete><id>#{solr_id}</id></delete>", :update)
           solr_commit
           true
         end

         def solr_commit
           post_to_solr('<optimize waitFlush="false" waitSearcher="false"/>', :update)
         end

         # convert instance to Solr document
         def to_solr_doc
           logger.debug "to_doc: creating doc for class: #{self.class.name}, id: #{self.id}"
           doc = REXML::Element.new('doc')

           # Solr id is <classname>:<id> to be unique across all models
           doc.add_element field("id", solr_id)
           doc.add_element field("type", self.class.name)
           doc.add_element field("pk", self.id.to_s)

           # iterate through the fields and add them to the document
           self.attributes.each_pair do |key,value|
             # _t is appended as a dynamic "text" field for Solr
             doc.add_element field("#{key}_t", value.to_s) unless key.to_s == "id"
           end
           return doc
         end

         def field(name, value)
           field = REXML::Element.new("field")
           field.add_attribute("name", name)
           field.add_text(value)

           field
         end

       end
     end
   end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
   include SolrMixin::Acts::ARSolr
end
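
For anyone who wants to poke at the search side without a running Solr,
here is a minimal sketch of what find_by_solr sends and what it pulls
out of the reply.  solr_select_body and primary_keys are hypothetical
helpers written for this illustration, and the sample hash only
imitates the shape of the wt=ruby response used above.

```ruby
require 'erb'

# Hypothetical helper: builds the form-encoded body that find_by_solr
# posts to /solr/select, scoping the query to a single model.
def solr_select_body(query, model_name)
  scoped = "(#{query}) AND type:#{model_name}"
  "q=#{ERB::Util.url_encode(scoped)}&wt=ruby&fl=pk"
end

# Hypothetical helper: extracts the primary keys from an already parsed
# response hash shaped like Solr's wt=ruby output.
def primary_keys(data)
  data['response']['docs'].collect { |doc| doc['pk'] }
end

puts solr_select_body('yonik OR otis', 'Book')

sample = { 'response' => { 'docs' => [{ 'pk' => '21' }, { 'pk' => '22' }] } }
p primary_keys(sample)
```

Wrapping the user's query in parentheses before appending the type
clause keeps an expression such as "yonik OR otis" from escaping the
model restriction.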




RE: acts_as_solr

Posted by Brian Lucas <bl...@gmail.com>.
Either Chris or Erik has been in contact with the author of this project,
IIRC.

-----Original Message-----
From: Kevin Lewandowski [mailto:kevinsl@gmail.com] 
Sent: Wednesday, August 30, 2006 1:42 PM
To: solr-user@lucene.apache.org
Subject: Re: acts_as_solr

You might want to look at acts_as_searchable for Ruby:
http://rubyforge.org/projects/ar-searchable

That's a similar plugin for the Hyperestraier search engine using its
REST interface.

On 8/28/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
> index, delete, and search models fronted by a database into Solr.


Re: acts_as_solr

Posted by Kevin Lewandowski <ke...@gmail.com>.
You might want to look at acts_as_searchable for Ruby:
http://rubyforge.org/projects/ar-searchable

That's a similar plugin for the Hyperestraier search engine using its
REST interface.

On 8/28/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
> index, delete, and search models fronted by a database into Solr.

Re: acts_as_solr

Posted by Bill Au <bi...@gmail.com>.
Currently src is all server specific and I would rather see it kept that
way.
I am OK with either /client or /contrib.

Bill

On 8/29/06, Brian Lucas <bl...@gmail.com> wrote:
>
> Let's create it as a top-level directory solely because it might give
> people a small head-start in SOLR evaluation and getting things off the
> ground (less navigation around the tree to get started).  If there are
> any problems, we can always revert back to /contrib/clients.
>
> B
>
> -----Original Message-----
> From: Mike Klaas [mailto:mike.klaas@gmail.com]
> Sent: Tuesday, August 29, 2006 3:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: acts_as_solr
>
> On 8/29/06, Chris Hostetter <ho...@fucit.org> wrote:
>
> > Maybe ... "contrib" in the "Java Lucene" project sense however is all
> > java code, i would imagine that if someone wrote a perl utility to
> > deal with index files it would not make sense to put in the Lucene
> > "contrib" directory for that reason ... there may be Java code
> > submitted down the road that we think is useful enough to make
> > available in releases, but niche enough that we don't want to put it
> > in the main solr.war, which might be more along the lines of the
> > "contrib" notion -- hence my suggestion of "clients" ...
> >
> > ...but i'm just thinking out loud at this point, i don't have a strong
> > opinion either way.
>
> Your point definitely resonates.  clients are also a rather important
> type of third-party contribution for a webapp thus a top-level
> directory makes sense.
>
> -Mike
>
>

RE: acts_as_solr

Posted by Brian Lucas <bl...@gmail.com>.
Let's create it as a top-level directory solely because it might give people
a small head-start in SOLR evaluation and getting things off the ground
(less navigation around the tree to get started).  If there are any
problems, we can always revert back to /contrib/clients.

B

-----Original Message-----
From: Mike Klaas [mailto:mike.klaas@gmail.com] 
Sent: Tuesday, August 29, 2006 3:26 PM
To: solr-user@lucene.apache.org
Subject: Re: acts_as_solr

On 8/29/06, Chris Hostetter <ho...@fucit.org> wrote:

> Maybe ... "contrib" in the "Java Lucene" project sense however is all java
> code, i would imagine that if someone wrote a perl utility to deal with
> index files it would not make sense to put in the Lucene "contrib"
> directory for that reason ... there may be Java code submitted down the
> road that we think is useful enough to make available in releases, but
> niche enough that we don't want to put it in the main solr.war, which might
> be more along the lines of the "contrib" notion -- hence my suggestion of
> "clients" ...
>
> ...but i'm just thinking out loud at this point, i don't have a strong
> opinion either way.

Your point definitely resonates.  clients are also a rather important
type of third-party contribution for a webapp thus a top-level
directory makes sense.

-Mike


Re: acts_as_solr

Posted by Mike Klaas <mi...@gmail.com>.
On 8/29/06, Chris Hostetter <ho...@fucit.org> wrote:

> Maybe ... "contrib" in the "Java Lucene" project sense however is all java
> code, i would imagine that if someone wrote a perl utility to deal with
> index files it would not make sense to put in the Lucene "contrib"
> directory for that reason ... there may be Java code submitted down the
> road that we think is useful enough to make available in releases, but
> niche enough that we don't want to put it in the main solr.war, which might
> be more along the lines of the "contrib" notion -- hence my suggestion of
> "clients" ...
>
> ...but i'm just thinking out loud at this point, i don't have a strong
> opinion either way.

Your point definitely resonates.  clients are also a rather important
type of third-party contribution for a webapp thus a top-level
directory makes sense.

-Mike

Re: acts_as_solr

Posted by Chris Hostetter <ho...@fucit.org>.
: > > perhaps a top level "clients" directory with this going in clients/
: > > ruby ?

: > Pardon me for chiming in, but this is a very good idea.  I would also
: > suggest that Java clients should also go in here.

: Might this fit better under a contrib/ umbrella?  This would more
: closely model lucene's layout.

Maybe ... "contrib" in the "Java Lucene" project sense however is all java
code, i would imagine that if someone wrote a perl utility to deal with
index files it would not make sense to put in the Lucene "contrib"
directory for that reason ... there may be Java code submitted down the
road that we think is useful enough to make available in releases, but
niche enough that we don't want to put it in the main solr.war, which might
be more along the lines of the "contrib" notion -- hence my suggestion of
"clients" ...

...but i'm just thinking out loud at this point, i don't have a strong
opinion either way.



-Hoss


Re: acts_as_solr

Posted by Mike Klaas <mi...@gmail.com>.
On 8/29/06, WHIRLYCOTT <ph...@whirlycott.com> wrote:
> On Aug 29, 2006, at 4:12 PM, Chris Hostetter wrote:
>
> > perhaps a top level "clients" directory with this going in clients/
> > ruby ?
>
> Pardon me for chiming in, but this is a very good idea.  I would also
> suggest that Java clients should also go in here.

Might this fit better under a contrib/ umbrella?  This would more
closely model lucene's layout.

-Mike

Re: acts_as_solr

Posted by WHIRLYCOTT <ph...@whirlycott.com>.
On Aug 29, 2006, at 4:12 PM, Chris Hostetter wrote:

> perhaps a top level "clients" directory with this going in clients/ 
> ruby ?

Pardon me for chiming in, but this is a very good idea.  I would also  
suggest that Java clients should also go in here.

phil.

--
                                    Whirlycott
                                    Philip Jacob
                                    phil@whirlycott.com
                                    http://www.whirlycott.com/phil/



Re: acts_as_solr

Posted by Chris Hostetter <ho...@fucit.org>.
: I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
: index, delete, and search models fronted by a database into Solr.

I don't know crap about Ruby, but that looks pretty cool.

: I'd like to commit this to the Solr repository.  Any objections?

: commit, what directory should I put things under?  How about just
: "ruby"?

no objections .. as for where, my gut says somewhere under src/ (ie:
src/ruby) but the current src/ tree is very focused on the server itself
-- src/java, src/scripts, and src/webapp all being completely server
specific, src/apps and src/test being server specific in nature since they
focus on src/java.

perhaps a top level "clients" directory with this going in clients/ruby ?



-Hoss


Re: acts_as_solr

Posted by solruser <so...@gmail.com>.
Hi,

Does acts_as_solr now support fancier results such as highlighting?
I see options for using facets, but I have not yet explored them with
the plugin.

TIA
-amit

Erik Hatcher wrote:
> 
> 
> On Aug 28, 2006, at 10:25 PM, Erik Hatcher wrote:
>> I'd like to commit this to the Solr repository.  Any objections?   
>> Once committed, folks will be able to use "script/plugin  
>> install ..." to install the Ruby side of things, and using a binary  
>> distribution of Solr's example application and a custom solr/conf  
>> directory (just for schema.xml) they'd be up and running quite  
>> quickly.  If ok to commit, what directory should I put things  
>> under?  How about just "ruby"?
> 
> Ok, /client/ruby it is.  I'll get this committed in the next day or so.
> 
> I have to admit that the stuff Seth did with Searchable (linked to  
> from <http://wiki.apache.org/solr/SolRuby>) is very well done so  
> hopefully he can work with us to perhaps integrate that work into  
> what lives in Solr's repository.  Having the Searchable abstraction  
> is interesting, but it might be a bit limiting in terms of leveraging  
> fancier return values from Solr, like the facets and highlighting -  
> or maybe it's just an unnecessary abstraction for those always  
> working with Solr.  I like it though, and will certainly borrow ideas  
> from it on how to do slick stuff with Ruby.
> 
> While I'm at it, I'd be happy to commit the Java client into /client/ 
> java.  I'll check the status of that contribution when I can.
> 
> 	Erik
> 
> 
> 



Re: acts_as_solr

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 28, 2006, at 10:25 PM, Erik Hatcher wrote:
> I'd like to commit this to the Solr repository.  Any objections?   
> Once committed, folks will be able to use "script/plugin  
> install ..." to install the Ruby side of things, and using a binary  
> distribution of Solr's example application and a custom solr/conf  
> directory (just for schema.xml) they'd be up and running quite  
> quickly.  If ok to commit, what directory should I put things  
> under?  How about just "ruby"?

Ok, /client/ruby it is.  I'll get this committed in the next day or so.

I have to admit that the stuff Seth did with Searchable (linked to  
from <http://wiki.apache.org/solr/SolRuby>) is very well done so  
hopefully he can work with us to perhaps integrate that work into  
what lives in Solr's repository.  Having the Searchable abstraction  
is interesting, but it might be a bit limiting in terms of leveraging  
fancier return values from Solr, like the facets and highlighting -  
or maybe it's just an unnecessary abstraction for those always  
working with Solr.  I like it though, and will certainly borrow ideas  
from it on how to do slick stuff with Ruby.

While I'm at it, I'd be happy to commit the Java client into /client/ 
java.  I'll check the status of that contribution when I can.

	Erik


Re: acts_as_solr

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 30, 2006, at 1:36 PM, Yonik Seeley wrote:
> I think Ruby is very fertile ground for Solr to pick up
> users/developers right now.

I fully agree.  Ferret is coming along very nicely as well, which is  
wonderful for pure Rubyists that don't need the additional  
dependency, skill set to manage, and different environment that Solr  
would require.  But Solr really shines for all its caching and index  
management, so I'm sure there will be many RoR folks that will  
embrace Solr.

> Getting into some little details, it looks like a commit (which
> actually does an optimize) is done on every .save, right?

That's true.  I'm not sure how one would avoid doing a commit for  
a .save.  There isn't, as far as I know, broader granularity for  
database operations.  An optimize wouldn't be necessary, but  
certainly swapping over the searcher would be desired after a save.
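
To make the commit versus optimize distinction concrete, here is a
minimal sketch of the two update messages.  It assumes the same
waitFlush and waitSearcher attributes used in the pasted solr_commit;
solr_update_command is a hypothetical helper, not part of the plugin.

```ruby
# Hypothetical helper: builds a Solr XML update command such as
# <commit/> or <optimize/> with the wait attributes used in solr_commit.
def solr_update_command(name, wait_flush: false, wait_searcher: false)
  %(<#{name} waitFlush="#{wait_flush}" waitSearcher="#{wait_searcher}"/>)
end

puts solr_update_command('commit')    # makes changes visible to searches
puts solr_update_command('optimize')  # what solr_commit currently sends
```

Swapping 'optimize' for 'commit' in solr_commit would still open a new
searcher after each save while skipping the optimize step.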

> I also notice that the commit is asynchronous... so one could do a
> save, then do an immediate search and not see the changes yet, right?

That is true.  But holding up a save for a new IndexSearcher would be  
a big hit, at least in my application that currently takes 30+  
seconds of warming up before a new searcher is ready.

> I don't know anything about RoR and ActiveRecord, but hopefully there
> is some way to avoid a commit on every operation.

It could certainly be made more manual such that a developer would
need to code in when a commit happens.  I'm not currently sure what
other options there would be for it to be automatic but not done for
every .save.  Within a RoR application, one could code a <commit/>
into a controller after_filter such that it would occur at the end of
an HTTP request by the browser.  Any RoR savvy folks have suggestions
on this?
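
A rough sketch of the after_filter idea, kept free of Rails so it can
be run on its own.  DeferredCommit is a hypothetical class, and the
block passed to it stands in for whatever actually posts XML to Solr
(post_to_solr in the plugin); none of this exists in the code as
committed.

```ruby
# Hypothetical sketch: remember that the index changed and send a single
# <commit/> at the end of the request instead of one per save.
class DeferredCommit
  def initialize(&poster)
    @poster = poster  # block that actually posts XML to Solr
    @dirty = false
  end

  # Would be called from solr_save / solr_destroy instead of committing.
  def mark_dirty
    @dirty = true
  end

  # Would be called once per request, e.g. from a controller after_filter.
  # Returns true if a commit was sent, false if nothing had changed.
  def flush
    return false unless @dirty
    @poster.call('<commit/>')
    @dirty = false
    true
  end
end

sent = []
commits = DeferredCommit.new { |xml| sent << xml }
3.times { commits.mark_dirty }  # three saves in one request
commits.flush
commits.flush                   # second flush is a no-op
p sent
```

The trade-off is the same one raised above: a search issued before the
flush will not see the pending changes.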

	Erik



Re: acts_as_solr

Posted by Yonik Seeley <yo...@apache.org>.
Cool stuff Erik!
I think Ruby is very fertile ground for Solr to pick up
users/developers right now.

Getting into some little details, it looks like a commit (which
actually does an optimize) is done on every .save, right?

I also notice that the commit is asynchronous... so one could do a
save, then do an immediate search and not see the changes yet, right?

I don't know anything about RoR and ActiveRecord, but hopefully there
is some way to avoid a commit on every operation.

> I'd like to commit this to the Solr repository.
+1

Let's go with clients/ruby

-Yonik