Posted to ruby-dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2007/04/20 03:33:19 UTC
Re: [acts_as_solr] Few question on usage
Sorry, I missed the original mail. Hoss has got it right.
Personally I'd love to see acts_as_solr definitively come into the
solr-ruby fold.
Regarding your questions:
> : 1. What are other alternatives are available for ruby integration
> with solr
> : other than acts-as_solr plugin.
acts_as_solr is purely for ActiveRecord (database O/R mapping)
integration with Solr, such that when you create/update/delete
records they get taken care of in Solr also.
For pure Ruby access to Solr without a database, use solr-ruby. The
0.01 gem is available as "gem install solr-ruby", but if you can I'd
recommend you tinker with the trunk codebase too.
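A minimal sketch of what "taken care of in Solr also" can mean in practice, written in plain Ruby so it runs without Rails or Solr. It does not use the real acts_as_solr or ActiveRecord APIs: FakeIndex, SearchSynced, Book, and the save/destroy hooks are all hypothetical stand-ins for an index and for model callbacks.

```ruby
# Hypothetical stand-in for a Solr index: just a Hash keyed by document id.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def add(doc)
    @docs[doc[:id]] = doc
  end

  def delete(id)
    @docs.delete(id)
  end
end

# Hypothetical mixin: mirrors save/destroy into the index, the way a
# model callback might after a database write.
module SearchSynced
  def save
    self.class.index.add(to_search_doc)
    true
  end

  def destroy
    self.class.index.delete(id)
    true
  end
end

class Book
  include SearchSynced

  class << self
    attr_accessor :index
  end

  attr_accessor :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def to_search_doc
    { :id => id, :title => title }
  end
end

Book.index = FakeIndex.new
book = Book.new(1, "First title")
book.save                   # document added to the index
book.title = "Second title"
book.save                   # document replaced in the index
book.destroy                # document removed from the index
```

The point of the design is that application code only ever calls save and destroy; keeping the index in step is a side effect of the model, not something each caller has to remember.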
> : 2. acts_as_solr plugin - does it support highlighting feature
This depends on which acts_as_solr you've grabbed. As Hoss
mentioned, there are various flavors of it floating around. I've
promised to speak about acts_as_solr at RailsConf next month, so I'll
be working to get that under control even if that means resurrecting
my initial hack and making it part of solr-ruby and hoping that the
other implementations floating out there would like to collaborate on
a definitive version built into the Solr codebase.
> : 3. performance benchmark for acts_as_solr plugin available if any
What kind of numbers are you after? acts_as_solr searches Solr, and
then will fetch the records from the database to bring back model
objects, so you have to account for the database access in the
picture as well as Solr.
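The two-step lookup described here (search first, then load model objects from the database) can be sketched roughly as follows. SOLR_HITS and DATABASE are made-up in-memory stand-ins, and search_ids and find_by_search are hypothetical names, not acts_as_solr methods.

```ruby
# Hypothetical stand-ins: a "Solr" that returns matching ids in relevance
# order, and a "database" keyed by primary key.
SOLR_HITS = { "ruby" => [3, 1] }
DATABASE  = {
  1 => { :id => 1, :title => "Record one" },
  2 => { :id => 2, :title => "Record two" },
  3 => { :id => 3, :title => "Record three" }
}

# Step 1: ask the search engine for ids only.
def search_ids(query)
  SOLR_HITS.fetch(query, [])
end

# Step 2: load the matching rows, keeping the relevance order and
# skipping ids that no longer exist in the database.
def find_by_search(query)
  search_ids(query).map { |id| DATABASE[id] }.compact
end

results = find_by_search("ruby")
puts results.map { |r| r[:title] }.inspect
```

Any benchmark of this pattern measures both steps together, which is why the database access has to be counted alongside the search itself.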
Erik
On Apr 19, 2007, at 5:30 PM, Chris Hostetter wrote:
>
> I don't really know a lot about Ruby, but as i understand it there
> are more than a few versions of something called "acts_as_solr"
> floating around ... the first written by Erik as a proof of concept,
> and then picked up and polished a bit by someone else (whose name
> escapes me)
>
> all of the "serious" ruby/solr development i know about is happening
> as part of the "Flare" sub-sub project...
>
> http://wiki.apache.org/solr/Flare
> http://wiki.apache.org/solr/SolRuby
>
> ...most of the people working on it seem to hang out on the
> ruby-dev@lucene mailing list. as i understand it the "solr-ruby"
> package is a low level ruby<->solr API, with Flare being a higher
> level reusable Rails app type thingamabob. (can you tell i don't know
> a lot about Ruby or rails? ... i'm winging it)
>
>
> : Date: Tue, 17 Apr 2007 10:52:00 -0700
> : From: amit rohatgi <so...@gmail.com>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: [acts_as_solr] Few question on usage
> :
> : Hi
> :
> : Here are few question for solr integrating with ruby
> :
> : 1. What are other alternatives are available for ruby integration
> with solr
> : other than acts-as_solr plugin.
> : 2. acts_as_solr plugin - does it support highlighting feature
> : 3. performance benchmark for acts_as_solr plugin available if any
> :
> :
> : -thanks
> : dev
> :
>
>
>
> -Hoss
Re: [acts_as_solr] Few question on usage
Posted by solruser <so...@gmail.com>.
Hi Erik
Thanks for the detailed information. From it I understand that
acts_as_solr is presently the best available solution for connecting a
database-backed Rails application to Solr, and that you look forward to
bringing it under solr-ruby development going forward, which I assume
will happen in the next month or so.
That being the case, the acts_as_solr plugin from RubyForge is the most
suitable place to start, and it can soon be expected to work under the
Apache solr-ruby project, where we could look forward to centralized code
additions, updates and contributions. Please correct me if this
understanding or these expectations are wrong.
Thanks
Erik Hatcher wrote:
>
>
> On Apr 21, 2007, at 9:42 PM, Erik Hatcher wrote:
>> source = DataSource.new
>>
>> mapping = {
>> :id => :isbn,
>> :name => :author,
>> :source => "BOOKS",
>> :year => Proc.new {|record| record.date[0,4] },
>> }
>>
>> Solr::Indexer.index(source, mapper) do |orig_data, solr_document|
>> solr_document[:timestamp] = Time.now
>> end
>
> Sorry, my bad, that's what I get for contriving code without testing
> it, and then changing the implementation to suit how I wanted to
> describe it.
>
> It should be Solr::Indexer.index(source, mapping) .... # mappING
>
> I just changed the implementation to allow a Hash as well as a
> Solr::Importer::Mapper object (well, really anything with a #map
> method).
>
> Erik
>
>
>
>
>
--
View this message in context: http://www.nabble.com/-acts_as_solr--Few-question-on-usage-tf3595245.html#a10152267
Sent from the Solr - User mailing list archive at Nabble.com.
Re: [acts_as_solr] Few question on usage
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 21, 2007, at 9:42 PM, Erik Hatcher wrote:
> source = DataSource.new
>
> mapping = {
> :id => :isbn,
> :name => :author,
> :source => "BOOKS",
> :year => Proc.new {|record| record.date[0,4] },
> }
>
> Solr::Indexer.index(source, mapper) do |orig_data, solr_document|
> solr_document[:timestamp] = Time.now
> end
Sorry, my bad, that's what I get for contriving code without testing
it, and then changing the implementation to suit how I wanted to
describe it.
It should be Solr::Indexer.index(source, mapping) .... # mappING
I just changed the implementation to allow a Hash as well as a
Solr::Importer::Mapper object (well, really anything with a #map
method).
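A rough sketch of how an indexer could accept either a Hash or "anything with a #map method". TinyMapper and to_mapper are hypothetical names for illustration; the real Solr::Importer::Mapper and indexer internals may look quite different.

```ruby
# Hypothetical mapper class; the real Solr::Importer::Mapper may differ.
class TinyMapper
  def initialize(mapping)
    @mapping = mapping
  end

  # Build a document: Symbol rules are looked up on the record,
  # anything else is used as a literal value.
  def map(record)
    doc = {}
    @mapping.each do |field, rule|
      doc[field] = rule.is_a?(Symbol) ? record[rule] : rule
    end
    doc
  end
end

# Accept either a plain Hash or any object that responds to #map.
# Hash is checked first because Hash itself responds to #map (Enumerable).
def to_mapper(mapper_or_hash)
  return TinyMapper.new(mapper_or_hash) if mapper_or_hash.is_a?(Hash)
  unless mapper_or_hash.respond_to?(:map)
    raise ArgumentError, "mapper must be a Hash or respond to map"
  end
  mapper_or_hash
end

record = { :isbn => "0000000000" }
puts to_mapper({ :id => :isbn }).map(record).inspect
puts to_mapper(TinyMapper.new({ :id => :isbn })).map(record).inspect
```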
Erik
Re: [acts_as_solr] Few question on usage
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 20, 2007, at 2:30 PM, solruser wrote:
> For pure Ruby access to Solr without a database, use solr-ruby. The
> 0.01 gem is available as "gem install solr-ruby", but if you can I'd
> recommend you tinker with the trunk codebase too.
>
>>>>
> Well, I mean considering the use of Solr with a Rails application.
> What's the ideal approach?
"rails application" is a pretty broad category of applications at
this point. If we're talking about a database-backed application
being searchable by Solr, I'd go for the RubyForge acts_as_solr
first. However, I suspect that it needs work in terms of
facilitating access to facets, highlighting, and other types of
custom query handlers.
If your application is backed by other datastores (in my case a
bunch of MARC records in binary format, a flat delimited file, a
ZIP file full of RDF/XML files, or even more interestingly another
Solr instance that we wanted to repurpose in another Solr-based
application), then go with solr-ruby.
It's my intention to bridge this gap in the near future somehow; I
just haven't formulated an exact plan. acts_as_solr fits nicely and
very, very easily on top of solr-ruby. I envision acts_as_solr simply
being part of solr-ruby, and it'd only hook in if you have
ActiveRecord installed; otherwise it'd be transparent, only taking up
a few tens of lines of code in an un-required .rb file.
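One way the "only hook in if ActiveRecord is installed" idea could work is sketched below. This is an assumption about the approach, not the planned implementation: ActsAsSearchable, hook_into, and FakeRecordBase are hypothetical names, with FakeRecordBase standing in for ActiveRecord::Base so the sketch runs without Rails.

```ruby
# Hypothetical class-level macro that a model could call to opt in.
module ActsAsSearchable
  def acts_as_searchable
    @searchable = true
  end

  def searchable?
    @searchable == true
  end
end

# Install the macro only if the named base class has been loaded.
# Returns true if the hook was installed, false if the constant is missing.
def hook_into(const_name)
  return false unless Object.const_defined?(const_name)
  Object.const_get(const_name).extend(ActsAsSearchable)
  true
end

# Stand-in for ActiveRecord::Base so the sketch runs without Rails.
class FakeRecordBase; end

hook_into(:FakeRecordBase)    # installed: subclasses can call acts_as_searchable
hook_into(:NotLoadedLibrary)  # skipped quietly: nothing to hook into

class Post < FakeRecordBase
  acts_as_searchable
end

puts Post.searchable?
```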
The first step could be to patch the RubyForge acts_as_solr to use
solr-ruby to kick start collaboration. As for where my effort fits
into a calendar, within the next few weeks I'll be delving into it
deeply and can speak more definitively.
>>>>
> Since there are many flavors floating around, which is the most
> sought after and supported? I agree that a definitive version will
> help the RoR community accept Solr with a much greater level of
> confidence. And since RoR applications are addressing Web 2.0, the
> need to search and collaborate on information is much higher. So I
> personally believe addressing this will definitely go a long way.
That's the plan! No question about it. I personally am running on
all cylinders, and will make progress on these technologies as my
real-world needs require them, which is increasing all the time. All
savvy SolRubyists are invited to jump in!
I've not documented this stuff on the wiki to the standards set by
the Solr engine itself, but there is some pretty amazing power going
on with solr-ruby right now. For example, the data mapping / indexer
framework makes this easy to import a dataset into Solr using Ruby:
    source = DataSource.new

    mapping = {
      :id => :isbn,
      :name => :author,
      :source => "BOOKS",
      :year => Proc.new {|record| record.date[0,4] },
    }

    Solr::Indexer.index(source, mapping) do |orig_data, solr_document|
      solr_document[:timestamp] = Time.now
    end
This showcases the simplistic data source facility (*quack* -
anything that has a #each method) [with a contrived DataSource bogus
class], and the mapping capabilities. The mapping is a hash of Solr
field names to value mappings. A value mapping can be a String
("BOOKS"), which is used literally, or a Symbol (:isbn, :author), which
looks up that field from (uh, #)each of the objects yielded to the each
block. This lookup simply means, again *quack*, that the data object
needs a [] method defined. The Proc example is a bit more advanced Ruby
voodoo for embedding a bit of code into the mapping to be executed later
with the actual record passed into it; in the example it extracts the
first four characters of the record's date property. And one more bit
of Ruby coolness is the do ... end block for the indexer method. The
indexer takes a data source and a mapper, melding them together as
described, and allows you one final chance to affect the
solr_document before it gets indexed; the original data object is
also provided to the block.
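The value-mapping rules described above can be sketched in a few lines of plain Ruby. This is a hypothetical re-implementation for illustration, not the solr-ruby source: map_record is an invented name, the record is an ordinary Hash with made-up values, and the Proc therefore uses record[:date] where the example above calls record.date.

```ruby
# Hypothetical re-implementation of the String / Symbol / Proc rules.
def map_record(mapping, record)
  doc = {}
  mapping.each do |field, rule|
    doc[field] =
      case rule
      when Proc   then rule.call(record)  # run embedded code against the record
      when Symbol then record[rule]       # look the field up via #[]
      else             rule               # literal value such as "BOOKS"
      end
  end
  doc
end

mapping = {
  :id     => :isbn,
  :name   => :author,
  :source => "BOOKS",
  :year   => Proc.new { |record| record[:date][0, 4] }
}

# Any object with #each works as a data source; an Array of Hashes will do.
# The values below are placeholder sample data.
source = [
  { :isbn => "0000000000", :author => "Example Author", :date => "2007-04-20" }
]

source.each do |record|
  puts map_record(mapping, record).inspect
end
```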
We now already have a simple mapper, an XPath mapper, and an Hpricot
mapper available. We also have some handy data sources including a
tab-delimited file source (obsoleted in my play book by the CSV
importer now built in). I'm also using a simple custom MARC binary
data source and mapper specific to ruby-marc objects, and I just put
together a SolrSource that takes a query (and filters) for one Solr
instance in a configurable paging way, that feeds documents returned
from that query successively out. Apply a mapper to that data source
and you can pipe data from one Solr to another like this:
    solr_source = Solr::Importer::SolrSource.new("http://localhost:8420/solr",
      "*:*", ["year:[1776 TO 1918]", 'author:smith'])
    count = 0
    Solr::Indexer.index(solr_source, mapper, {:debug => false, :timeout => 120,
      :solr_url => "http://localhost:8983/solr"}) do |orig_data, solr_document|
      count = count + 1
      if count % 100 == 0
        puts "#{count}"
      end
    end
The count junk is just to see console progress on how many records
have been indexed.
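The "configurable paging" idea can be tried without a live Solr. The sketch below is only an illustration of the pattern: PagedSource is a hypothetical class, not the real Solr::Importer::SolrSource, and the fetcher block stands in for a paged query against a remote instance.

```ruby
# Hypothetical paging source: yields documents one at a time while fetching
# them one page at a time. The fetcher is any block taking (start, rows)
# and returning an Array of documents (empty when there are no more).
class PagedSource
  include Enumerable

  def initialize(page_size, &fetcher)
    @page_size = page_size
    @fetcher = fetcher
  end

  def each
    start = 0
    loop do
      page = @fetcher.call(start, @page_size)
      break if page.empty?
      page.each { |doc| yield doc }
      start += page.size
    end
  end
end

# Stand-in for a remote Solr: 250 fake documents served in slices.
ALL_DOCS = (1..250).map { |i| { :id => i } }
source = PagedSource.new(100) { |start, rows| ALL_DOCS[start, rows] || [] }

count = 0
source.each do |doc|
  count += 1
  puts count if count % 100 == 0   # console progress, as in the example above
end
```

Because the source only needs #each, the consumer never sees the page boundaries; the page size can be tuned without touching the indexing code.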
So I'm working the Ruby/Solr thing as much as possible right now.
There is something to what we've got there, but it's not packaged as
nicely as needed for a community to flourish, and for that I
apologize. But there is also enough goodness there now to lure folks
in to want to get involved.
Right now in RoR with the Flare plugin installed, you can have a
controller that looks like this:
class SearchController < ApplicationController
flare
end
And with some copy/pasting of templates (that we can build in as
defaults somehow I'm sure) you have a faceted browsing Ajax tricked
out (well, inplace editor and Ajax suggest) experience with how many
lines of code? (The devil is in the details though, and that is why
I don't yet recommend Flare to folks that just want it to work
and also be configurable.) Flare cuts a lot of corners by hard-coding
some things that need to be made configurable, etc. Typical
prototyping approach: tinker, tinker, tinker, distill. I'm still in
the first tinker phase with Flare right now. But folks interested in
rolling up their sleeves who don't mind getting a little grubby with
code are more than invited to delve into Flare now, with the
forewarning that the Flare you see today will not be at all near the
Flare that spawns from the ashes. Pioneering spirit required.
>> : 3. performance benchmark for acts_as_solr plugin available if any
>
> What kind of numbers are you after? acts_as_solr searches Solr, and
> then will fetch the records from the database to bring back model
> objects, so you have to account for the database access in the
> picture as well as Solr.
>
>>>>
> Well, to be specific, I am keen to know about creation and update of
> indexes when you run into a large number of documents. Since the
> database is used to populate the models, it will definitely be the
> cumulative effect of retrieving documents from Solr with Lucene,
> network issues (since it's a web service), and local database access
> (depending on configuration).
Again we need to be clear about "large". I've got near 4M indexes
under my belt now, but many others have gone to 10M+. Lucene and
Solr both scale very well in the 10's of millions and even further up
into the hundreds of millions I've heard.
Certainly those other latencies you mention are valid questions, but
in my experience they've not been show-stopping concerns. Performance
with Solr + Ruby has been more than acceptable... it's been just
fine, even with several spots for improvement in all those areas in
my applications. First rule of optimization: Don't. Second rule of
optimization: Don't optimize yet.
Erik
Re: [acts_as_solr] Few question on usage
Posted by solruser <so...@gmail.com>.
Hi Erik,
Please find my comments under ">>>" to your queries.
> : 1. What are other alternatives are available for ruby integration
> with solr
> : other than acts-as_solr plugin.
acts_as_solr is purely for ActiveRecord (database O/R mapping)
integration with Solr, such that when you create/update/delete
records they get taken care of in Solr also.
For pure Ruby access to Solr without a database, use solr-ruby. The
0.01 gem is available as "gem install solr-ruby", but if you can I'd
recommend you tinker with the trunk codebase too.
>>>
Well, I mean considering the use of Solr with a Rails application.
What's the ideal approach?
> : 2. acts_as_solr plugin - does it support highlighting feature
This depends on which acts_as_solr you've grabbed. As Hoss
mentioned, there are various flavors of it floating around. I've
promised to speak about acts_as_solr at RailsConf next month, so I'll
be working to get that under control even if that means resurrecting
my initial hack and making it part of solr-ruby and hoping that the
other implementations floating out there would like to collaborate on
a definitive version built into the Solr codebase.
>>>
Since there are many flavors floating around, which is the most sought after
and supported? I agree that a definitive version will help the RoR community
accept Solr with a much greater level of confidence.
And since RoR applications are addressing Web 2.0, the need to search and
collaborate on information is much higher. So I personally believe
addressing this will definitely go a long way.
> : 3. performance benchmark for acts_as_solr plugin available if any
What kind of numbers are you after? acts_as_solr searches Solr, and
then will fetch the records from the database to bring back model
objects, so you have to account for the database access in the
picture as well as Solr.
>>>
Well, to be specific, I am keen to know about creation and update of indexes
when you run into a large number of documents. Since the database is used to
populate the models, it will definitely be the cumulative effect of
retrieving documents from Solr with Lucene, network issues (since it's a web
service), and local database access (depending on configuration).
-TIA
Erik Hatcher wrote:
>
> Sorry, I missed the original mail. Hoss has got it right.
>
> Personally I'd love to see acts_as_solr definitively come into the
> solr-ruby fold.
>
> Regarding your questions:
>
>> : 1. What are other alternatives are available for ruby integration
>> with solr
>> : other than acts-as_solr plugin.
>
> acts_as_solr is purely for ActiveRecord (database O/R mapping)
> integration with Solr, such that when you create/update/delete
> records they get taken care of in Solr also.
>
> For pure Ruby access to Solr without a database, use solr-ruby. The
> 0.01 gem is available as "gem install solr-ruby", but if you can I'd
> recommend you tinker with the trunk codebase too.
>
>> : 2. acts_as_solr plugin - does it support highlighting feature
>
> This depends on which acts_as_solr you've grabbed. As Hoss
> mentioned, there are various flavors of it floating around. I've
> promised to speak about acts_as_solr at RailsConf next month, so I'll
> be working to get that under control even if that means resurrecting
> my initial hack and making it part of solr-ruby and hoping that the
> other implementations floating out there would like to collaborate on
> a definitive version built into the Solr codebase.
>
>> : 3. performance benchmark for acts_as_solr plugin available if any
>
> What kind of numbers are you after? acts_as_solr searches Solr, and
> then will fetch the records from the database to bring back model
> objects, so you have to account for the database access in the
> picture as well as Solr.
>
> Erik
>
>
>
> On Apr 19, 2007, at 5:30 PM, Chris Hostetter wrote:
>
>>
>> I don't really know a lot about Ruby, but as i understand it there
>> are more than a few versions of something called "acts_as_solr"
>> floating around ... the first written by Erik as a proof of concept,
>> and then picked up and polished a bit by someone else (whose name
>> escapes me)
>>
>> all of the "serious" ruby/solr development i know about is happening
>> as part of the "Flare" sub-sub project...
>>
>> http://wiki.apache.org/solr/Flare
>> http://wiki.apache.org/solr/SolRuby
>>
>> ...most of the people working on it seem to hang out on the
>> ruby-dev@lucene mailing list. as i understand it the "solr-ruby"
>> package is a low level ruby<->solr API, with Flare being a higher
>> level reusable Rails app type thingamabob. (can you tell i don't know
>> a lot about Ruby or rails? ... i'm winging it)
>>
>>
>> : Date: Tue, 17 Apr 2007 10:52:00 -0700
>> : From: amit rohatgi <so...@gmail.com>
>> : Reply-To: solr-user@lucene.apache.org
>> : To: solr-user@lucene.apache.org
>> : Subject: [acts_as_solr] Few question on usage
>> :
>> : Hi
>> :
>> : Here are few question for solr integrating with ruby
>> :
>> : 1. What are other alternatives are available for ruby integration
>> with solr
>> : other than acts-as_solr plugin.
>> : 2. acts_as_solr plugin - does it support highlighting feature
>> : 3. performance benchmark for acts_as_solr plugin available if any
>> :
>> :
>> : -thanks
>> : dev
>> :
>>
>>
>>
>> -Hoss
>
>
>
--
View this message in context: http://www.nabble.com/-acts_as_solr--Few-question-on-usage-tf3595245.html#a10107800
Sent from the Solr - User mailing list archive at Nabble.com.