You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Fuad Efendi <fu...@efendi.ca> on 2009/09/01 22:29:27 UTC

SOLR vs SQL

RE: http://www.mysecondhome.eu

 

 

I am browsing this website again (I have similar challenge at
http://www.casaGURU.com but still prefer database-SQL to search Professional
by service type)

 

I don't think SOLR is applicable in this specific case. I think standard DB
queries with predefined dropdown/radio values perform extremely faster than
SOLR faceting (you currently have only 90000 records) - database queries
have consistent response time without dependency on dataset size (especially
MySQL MyISAM "SELECT COUNT(*)"); SOLR depends on dataset size.

 

SOLR is applicable if we are using at least full-text search (for instance,
search for "Jack London" may return house owned by Jack London in Australia,
and house at Jack Square in London, and etc.); if we are interested in
non-tokenized attributes only (putting heavy constraints on possible query
types,without _any_ full-text): database. 

 

 


RE: SOLR vs SQL

Posted by Fuad Efendi <fu...@efendi.ca>.
Hi gwk,

Thanks for reply!

Yes, SOLR gives out-of-the-box
- indexes
- implicit data normalization
- fault-tolerance, replication, scalability
- performance

(so that we can save _huge_ money & time)

But from just an engineering viewpoint, forgetting cost&time, SELECT
COUNT(*) ... WHERE ... seems to be faster than SOLR faceting on generic
queries like id:[* TO *] (I believe SOLR still counts terms / intersects
DocSets 'from scratch' even for queries returning _all_ documents in index,
and even for filtered by specific term value...)

For queries returning unpredictable subset of documents ('full-text', where
regular database index can't be used) SOLR faceting is the only solution
(database can't do that...)

Of course, development time and reduced cost is very important...

And, agree, if we have normalized schema with 20 parent 'attributes' and
single child 'entity' writing dynamic filter in SQL on 0 to 20 attribute
values (and maintaining 40 indexes) may take some costly effort and
skills...



> Hi Fuad,
> 
> The search box is only used for geographical search, i.e.
> country/region/city searches. The watermark on the homepage indicates
> this but the "search again" box on the search results page does not,
> I'll see if we can fix that.
> 
> We use Solr not so much for the searchbox, which to be honest was an
> afterthought. But we do use Solr for faceting. Honestly, the thought of
> writing an SQL query which calculates all these facet counts every time
> a search parameter is changes gives me a headache, I don't think it's
> possible to do it in one query (although maybe, but I don't think
> anybody would want to maintain it). As for performance, every nontrivial
> database/search engine is affected by dataset for all but the simplest
> queries, and in my tests Solr trumps Mysql by a huge margin for our use
> case. We use a database to store our data in a somewhat normalized way,
> which is good for data consistency, but not so good for retrieval
> speeds. This is what makes Solr so useful for us, we can index all data
> in denormalized form with all data for a property in one record. While
> the (sql) database remains authoritative
> 
> Full-text search is only one part of Solr, while an important part it
> isn't the only reason for using Solr. In our case, since we provide
> support for multiple language we try not to store textual descriptions
> but every facet a property can have. This gives us exactly the data
> needed to perform faceting but not so much on the full text search
> (which is used mind you, to find suggestions when you use the search box).
> 
> Regards,
> 
> gwk



RE: SOLR vs SQL

Posted by Fuad Efendi <fu...@efendi.ca>.
> Just execute 20 SQL queries with filters........

> Same with SOLR vs. Lucene, standard Lucene queries "filter1:value1 AND
> facet2:value2" ... "filter1:value1 AND facet2:value99" are functionally
the
> same as SOLR faceting (99 docset intersections in RAM) and (sooner or
later)
> implementation and performance will be similar very close (Lucene vs.
> SOLR)...



And here I am totally mistaken, of course SOLR gives you it out-of-the-box
(you don't have to write 99 Lucene queries if your simple attribute has 99
different values); but for SQL, you will have to... it seems possible only
in theory; you really need to be master in SQL to write fast-performing
queries with 'GROUP BY' clauses, and analyze execution plan, etc... :(((


SOLR gives it out-of-the-box.

Thanks!



RE: SOLR vs SQL

Posted by Fuad Efendi <fu...@efendi.ca>.
> This article explains in-depth why calculating facets is not practical in
> pure SQL: http://www.kimbly.com/blog/000239.html
-> "The problem is that SQL isn't really capable of expressing set
intersections."


But this article is not applicable to described use case: we are
_faceting_on_filtered_query_results_only_, and filters are defined by facets
(same as).

- no need to count docset intersections in this specific use case. Just
execute 20 SQL queries with filters (if you have 20 'foreign keys' for an
entity). Performance is like [log(n)] where n is index size... with SOLR, it
is linear, it is not log(n)...

Same with SOLR vs. Lucene, standard Lucene queries "filter1:value1 AND
facet2:value2" ... "filter1:value1 AND facet2:value99" are functionally the
same as SOLR faceting (99 docset intersections in RAM) and (sooner or later)
implementation and performance will be similar very close (Lucene vs.
SOLR)...




Re: SOLR vs SQL

Posted by Mauricio Scheffer <ma...@gmail.com>.
This article explains in-depth why calculating facets is not practical in
pure SQL: http://www.kimbly.com/blog/000239.html

Cheers,
Mauricio

On Wed, Sep 2, 2009 at 5:30 AM, gwk <gi...@eyefi.nl> wrote:

> Fuad Efendi wrote:
>
>> "No results found for 'surface area 377', displaying all properties."
>> - why do we need SOLR then...
>>
>>
>>
>>
>>
> Hi Fuad,
>
> The search box is only used for geographical search, i.e.
> country/region/city searches. The watermark on the homepage indicates this
> but the "search again" box on the search results page does not, I'll see if
> we can fix that.
>
> We use Solr not so much for the searchbox, which to be honest was an
> afterthought. But we do use Solr for faceting. Honestly, the thought of
> writing an SQL query which calculates all these facet counts every time a
> search parameter is changes gives me a headache, I don't think it's possible
> to do it in one query (although maybe, but I don't think anybody would want
> to maintain it). As for performance, every nontrivial database/search engine
> is affected by dataset for all but the simplest queries, and in my tests
> Solr trumps Mysql by a huge margin for our use case. We use a database to
> store our data in a somewhat normalized way, which is good for data
> consistency, but not so good for retrieval speeds. This is what makes Solr
> so useful for us, we can index all data in denormalized form with all data
> for a property in one record. While the (sql) database remains authoritative
>
> Full-text search is only one part of Solr, while an important part it isn't
> the only reason for using Solr. In our case, since we provide support for
> multiple language we try not to store textual descriptions but every facet a
> property can have. This gives us exactly the data needed to perform faceting
> but not so much on the full text search (which is used mind you, to find
> suggestions when you use the search box).
>
> Regards,
>
> gwk
>

Re: SOLR vs SQL

Posted by gwk <gi...@eyefi.nl>.
Fuad Efendi wrote:
> "No results found for 'surface area 377', displaying all properties."
> - why do we need SOLR then...
>
>
>
>   
Hi Fuad,

The search box is only used for geographical search, i.e. 
country/region/city searches. The watermark on the homepage indicates 
this but the "search again" box on the search results page does not, 
I'll see if we can fix that.

We use Solr not so much for the searchbox, which to be honest was an 
afterthought. But we do use Solr for faceting. Honestly, the thought of 
writing an SQL query which calculates all these facet counts every time 
a search parameter is changes gives me a headache, I don't think it's 
possible to do it in one query (although maybe, but I don't think 
anybody would want to maintain it). As for performance, every nontrivial 
database/search engine is affected by dataset for all but the simplest 
queries, and in my tests Solr trumps Mysql by a huge margin for our use 
case. We use a database to store our data in a somewhat normalized way, 
which is good for data consistency, but not so good for retrieval 
speeds. This is what makes Solr so useful for us, we can index all data 
in denormalized form with all data for a property in one record. While 
the (sql) database remains authoritative

Full-text search is only one part of Solr, while an important part it 
isn't the only reason for using Solr. In our case, since we provide 
support for multiple language we try not to store textual descriptions 
but every facet a property can have. This gives us exactly the data 
needed to perform faceting but not so much on the full text search 
(which is used mind you, to find suggestions when you use the search box).

Regards,

gwk

RE: SOLR vs SQL

Posted by Fuad Efendi <fu...@efendi.ca>.
"No results found for 'surface area 377', displaying all properties."
- why do we need SOLR then...