Posted to solr-user@lucene.apache.org by Lukáš Vlček <lu...@gmail.com> on 2009/11/13 10:52:00 UTC

Arguments for Solr implementation at public web site

Hi,

I am looking for good arguments to justify implementing site search for
sites that are available on the public internet. Many sites in the "powered
by Solr" section are indexed by Google and other search engines, yet they
still decided to invest resources into building and maintaining their own
search functionality instead of going with a [user_query site:my_site.com]
Google search. Why?

By no means am I saying it makes no sense to implement Solr! But I want to
put together a list of reasons, possibly with examples. Your help would be
much appreciated!

Let's narrow the scope of this discussion to the following:
- the search should cover several community sites running open source CMSs,
JIRAs, Bugzillas ... and the like
- all documents use open formats (no need to parse Word or Excel)
(maybe something close to what LucidImagination does for mailing lists of
Lucene and Solr)

My initial kick-off list would be:

pros:
- since we understand the content (we understand the domain scope), we
can fine-tune the search engine to provide more accurate results
- Solr can give us facets
- we have user search logs (valuable for analysis)
- implementing Solr is fun

cons:
- requires resources (but the cost is relatively low depending on the query
traffic, index size and frequency of updates)
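
To make the facets item concrete, here is a minimal sketch of how a faceted
request could be assembled. The parameters q, rows, facet, facet.field and
facet.mincount are standard Solr request parameters; the base URL and the
field names "project" and "doc_type" are assumptions made up for
illustration and would have to match the real schema.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, user_query, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts.

    base_url and the facet field names are hypothetical; only the
    parameter names are standard Solr request parameters.
    """
    params = [
        ("q", user_query),
        ("rows", rows),
        ("facet", "true"),
        ("facet.mincount", 1),  # hide facet values with zero hits
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(facet_query_url("http://localhost:8983/solr",
                          "index corruption",
                          ["project", "doc_type"]))
```

The response would then carry per-value counts for each facet field, which a
[user_query site:my_site.com] search cannot provide.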

Regards,
Lukas

http://blog.lukas-vlcek.com/

Re: Arguments for Solr implementation at public web site

Posted by Chantal Ackermann <ch...@btelligent.de>.

Jan-Eirik B. Nævdal wrote:
> Some extra for the pros list:
> 
> - Full control over which content is searchable and which is not.
> - Possibility to make pages searchable almost instantly after publication
> - Control over when the site is indexed

+1, especially the last point.
You can also add a robots.txt and prohibit spidering of the site to
reduce traffic. Google won't index any highly dynamic content then.
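
As a rough illustration of the robots.txt idea, the sketch below checks which
URLs a well-behaved crawler would still fetch. The rules and the paths
(/search, /issues/) and the example.org host are assumptions picked for the
example, not anything from a real site; the check uses Python's standard
urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers away from the dynamic parts of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /issues/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/about", "http://example.org/issues/123"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

Anything excluded this way is only findable through the site's own search,
so the in-house index has to cover it.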


Re: Arguments for Solr implementation at public web site

Posted by "Jan-Eirik B. Nævdal" <ja...@iterate.no>.
Some extra for the pros list:

- Full control over which content is searchable and which is not.
- Possibility to make pages searchable almost instantly after publication
- Control over when the site is indexed


Friendly

Jan-Eirik




-- 
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy

Re: Arguments for Solr implementation at public web site

Posted by Andrew Clegg <an...@gmail.com>.

Lukáš Vlček wrote:
> 
> When you need to search for something Lucene or Solr related, which one do
> you use:
> - generic Google
> - go to a particular mailing list web site and search from there (if
> there is any search form at all)
> 

Both of these (Nabble in the second case) in case any recent posts have
appeared which Google hasn't picked up.

Andrew.

-- 
View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334980.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arguments for Solr implementation at public web site

Posted by Jon Baer <jo...@gmail.com>.
For this list I usually end up at http://solr.markmail.org (which I believe also uses Lucene under the hood)

Google is such a black box ... 

Pros:
+ 1 Open Source (enough said :-)

There also always seems to be the notion that "crawling" lends itself to producing the best results, but that is rarely the case. And unless you are a "special" type of site, Google will not overlay your results with some type of context in the search (e.g. news or sports).

What I think really needs to happen in Solr (and is a bit missing at the moment) is a common interface for "reindexing" another index (if that makes sense) ... something akin to OpenSearch (http://www.opensearch.org/Community/OpenSearch_software)

For example, what I would like to do is have my site, have my search index, and connect Google to index just my search index (and not crawl the site) ... the only current option for something like that is sitemaps, which I think Solr (templates) should have a contrib project for (but you would have to generate these offline for sure).
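
A minimal sketch of the offline sitemap generation, assuming the documents have already been pulled out of the search index as dictionaries. The keys "url" and "last_modified" are hypothetical stored-field names; the XML namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(docs):
    """Render a sitemap <urlset> from documents fetched from the index.

    Each doc is a dict with a "url" key and an optional "last_modified"
    key (both hypothetical field names for this example).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for doc in docs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = doc["url"]
        if doc.get("last_modified"):
            ET.SubElement(url, "lastmod").text = doc["last_modified"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"url": "http://example.org/a", "last_modified": "2009-11-13"},
        {"url": "http://example.org/b"},
    ]))
```

This is not an existing Solr feature, just one way the offline step could look.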

- Jon  



Re: Arguments for Solr implementation at public web site

Posted by Lukáš Vlček <lu...@gmail.com>.
Hi,

Thanks for the input so far. However, let's put it this way:

When you need to search for something Lucene or Solr related, which one do
you use:
- generic Google
- go to a particular mailing list web site and search from there (if there
is any search form at all)
- go to LucidImagination.com and use its search capability

Regards,
Lukas



Re: Arguments for Solr implementation at public web site

Posted by Andrew Clegg <an...@gmail.com>.

Lukáš Vlček wrote:
> 
> I am looking for good arguments to justify implementation a search for
> sites
> which are available on the public internet. There are many sites in
> "powered
> by Solr" section which are indexed by Google and other search engines but
> still they decided to invest resources into building and maintenance of
> their own search functionality and not to go with [user_query site:
> my_site.com] google search. Why?
> 

You're assuming that Solr is just used in these cases to index discrete web
pages which Google etc. would be able to access via following navigational
links.

I would imagine that in a lot of cases, Solr is used to index database
entities which are used to build [parts of] pages dynamically, and which
might be viewable in different forms in various different pages.

Plus, with stored fields, you have the option of actually driving a website
off Solr instead of directly off a database, which might make sense from a
speed perspective in some cases.

And further, going back to page-only indexing -- you have no guarantee when
Google will decide to recrawl your site, so there may be a delay before
changes show up in their index. With an in-house search engine you can
reindex as often as you like.
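
To illustrate the database-entity case, here is a small sketch that turns
database rows into a Solr XML update message (the <add><doc><field> format).
The row keys "id", "title" and "body" are made-up field names; posting the
message to the update handler and committing are left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(rows):
    """Turn database rows (dicts) into a Solr <add> update message.

    Field names come straight from the dict keys, so they must match
    the schema; the names used in the example below are hypothetical.
    """
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            if value is None:
                continue  # skip NULL columns instead of indexing "None"
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message([{"id": 1, "title": "Bug 42", "body": None}]))
```

Because the message is built from the entities themselves, it can be sent
whenever a row changes rather than waiting for a crawler to come back.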

Andrew.

-- 
View this message in context: http://old.nabble.com/Arguments-for-Solr-implementation-at-public-web-site-tp26333987p26334734.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Arguments for Solr implementation at public web site

Posted by "Markus Jelsma - Buyways B.V." <ma...@buyways.nl>.
Besides the faceting engine, there is also:
- MoreLikeThis
- Highlighting
- Spellchecker

There is also more flexible querying using the DisMax handler, which is
clearly superior. Solr can also be used to store data, which can then be
retrieved very quickly. We have used this technique on a site, and it is
much faster than running multiple large and complex SQL statements.
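
A minimal sketch of what such a request could look like, combining DisMax
with highlighting and spellchecking. defType, qf, hl, hl.fl and spellcheck
are standard Solr request parameters; the base URL, the field names and the
boost values are assumptions for illustration, and the spellcheck component
still has to be configured on the request handler.

```python
from urllib.parse import urlencode


def dismax_query_url(base_url, user_query, boosts, highlight_fields):
    """Build a DisMax select URL with highlighting and spellcheck enabled.

    boosts maps (hypothetical) field names to boost factors and becomes
    the qf parameter, e.g. {"title": 3, "body": 1} -> "title^3 body^1".
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": qf,
        "hl": "true",
        "hl.fl": ",".join(highlight_fields),
        "spellcheck": "true",
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(dismax_query_url("http://localhost:8983/solr",
                           "facet performance",
                           {"title": 3, "body": 1},
                           ["title", "body"]))
```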

