Posted to users@solr.apache.org by "Bayer, Samuel" <sa...@mitre.org> on 2022/03/04 14:32:51 UTC

Looking for expertise on comparing Solr search to Postgres full-text search

Hi all -

In the interest of reducing my technology stack, I'm exploring whether using Postgres full-text search instead of Solr might be an option when I need both complex querying and full-text search. In my experience, so far, Postgres can't compare to Solr, but I'm trying to understand why, in order to have more of an ability to evaluate the functionality/complexity tradeoffs. I know something about search technologies, but I'm not an expert by any stretch of the imagination, and I've been looking for sources that talk about the comparison in an informed way - people, blogs, articles. So far, everything I've found is extremely basic. Does anyone have any pointers for me?

Thanks in advance -
Sam Bayer
The MITRE Corporation
sam@mitre.org

Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Eric Pugh <ep...@opensourceconnections.com>.
What I’ve done to compare other search engines with RRE and Quepid is to put a proxy in the middle that converts your query into what looks like a Solr request/response ;-).  This works great for custom search APIs, and I *guess* you could do it with database-backed search?

Now we are probably getting beyond what Sam was hoping to do!  
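
For readers who have not used tools such as RRE or Quepid: they score a ranked result list against human relevance judgments, so the same judgments can be applied to two engines. Below is a minimal Python sketch of one such metric, precision at k. The document ids and judgments are invented for illustration, and nothing here reflects the actual API of either tool.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical judgments for one query, plus the results each engine returned.
relevant = {"d1", "d4", "d7"}
solr_results = ["d1", "d4", "d9", "d7", "d2"]
postgres_results = ["d9", "d2", "d1", "d5", "d4"]

solr_p3 = precision_at_k(solr_results, relevant, 3)          # 2 of the top 3 are relevant
postgres_p3 = precision_at_k(postgres_results, relevant, 3)  # 1 of the top 3 is relevant
```

Averaging a metric like this over a representative set of queries gives a number that can be compared across engines or configurations.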




> On Mar 17, 2022, at 11:56 AM, Alessandro Benedetti <a....@sease.io> wrote:
> 
> This is an interesting question.
> I second both comments so far (from Eric and David), but I am afraid at the
> moment the open-source tools for search quality evaluation can't really
> compare Postgres to Solr.
> As far as I know, both Quepid (Eric, correct me if I am wrong) and RRE (
> https://github.com/SeaseLtd/rated-ranking-evaluator and also the Enterprise
> version) are able to compare only Apache Solr and Elasticsearch backed
> systems (against each other, or against different configurations).
> 
> In general, I would recommend following David's suggestions:
> - collect your requirements (both functional and performance-wise)
> - compare
> 
> Many times in the past I have seen databases used as terrible search
> engines and search engines used as terrible databases.
> I have also often seen queries perform poorly on a search engine because
> they were designed as if they were database queries.
> 
> Cheers
> 
> --------------------------
> Alessandro Benedetti
> Apache Lucene/Solr PMC member and Committer
> Director, R&D Software Engineer, Search Consultant
> 
> www.sease.io
> 
> 
> On Sat, 5 Mar 2022 at 05:04, David Smiley <ds...@apache.org> wrote:
> 
>> Hello Sam,
>> 
>> You are a familiar name from my MITRE days :-)
>> 
>> Check out Solr's feature list and see how it compares to that of Postgres.
>> If you are only doing the most basic default relevancy ranked top-N search
>> with default text analysis, then the tech/maintenance overhead might not be
>> worth it.  I'm looking at this as such an example:
>> https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=solr
>> 
>> On the other hand, if you want to ensure that you're able to make search
>> the best it can be for your users, then keeping Solr and using it more will
>> get you there; a database won't.  To a database, full-text-search is just
>> one checkbox of many concerns.  The capabilities there are usually very
>> simple.  It's fine for a demo/POC -- getting started.
>> 
>> One feature in particular I want to call out is faceting.  To some apps,
>> it's a game changer that can pivot the UX from merely having a basic search
>> box to having navigation filters and everything else, at which point Solr
>> is the foundation of what's driving the UX.  I've seen people/apps miss
>> this -- the user experience is so clumsy without it for rich/structured
>> data in particular.  If you've ever used a Maven repository manager like
>> Nexus or its competitors (last I checked), they are still stuck in the
>> stone-age -- it's painful when you've been exposed to so much better.  On
>> the backend, if all you know is a database, you may not see how to make a
>> faceting UI work because it's rather unnatural for SQL.
>> 
>> Eric's response was great too.
>> 
>> ~ David Smiley
>> Apache Lucene/Solr Search Developer
>> http://www.linkedin.com/in/davidwsmiley
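
To illustrate David's point that faceting is rather unnatural for SQL, here is a minimal Python sketch that computes facet counts with one GROUP BY query per facet field. Assumptions: SQLite stands in for Postgres so the example is self-contained, a LIKE filter stands in for a real full-text match, and the `artifacts` table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (name TEXT, license TEXT, packaging TEXT)")
conn.executemany(
    "INSERT INTO artifacts VALUES (?, ?, ?)",
    [
        ("solr-core", "Apache-2.0", "jar"),
        ("solr-solrj", "Apache-2.0", "jar"),
        ("solr-webapp", "Apache-2.0", "war"),
        ("solr-gpl-plugin", "GPL-3.0", "jar"),
        ("lucene-core", "Apache-2.0", "jar"),
    ],
)


def facet_counts(conn, field, term):
    """Count matching rows per value of `field`; one query per facet field."""
    # Column names cannot be bound as parameters, so check against an allow-list.
    if field not in {"license", "packaging"}:
        raise ValueError(f"unsupported facet field: {field}")
    sql = (
        f"SELECT {field}, COUNT(*) FROM artifacts "
        f"WHERE name LIKE ? GROUP BY {field} ORDER BY COUNT(*) DESC, {field}"
    )
    return conn.execute(sql, (f"%{term}%",)).fetchall()


# Each facet shown in the UI costs an extra aggregate query over the same filter.
facets = {f: facet_counts(conn, f, "solr") for f in ("license", "packaging")}
```

In Solr, by comparison, adding `facet=true&facet.field=license&facet.field=packaging` to a search request returns the counts together with the results in a single response.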

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	


Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Charlie Hull <ch...@opensourceconnections.com>.
Hi,

Sort of. You can make Quepid talk to other search engines with a 'shim' 
layer to make the response look like a Solr or Elasticsearch response. 
Quepid can send pretty much anything to an HTTP API as the query. There's 
a project called iSpy that is a prototype for this but I think it's 
currently in a private repo - perhaps Eric will be along in a sec to let 
us know if it's available.

So in theory you could do this comparison with Quepid.
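
As a rough idea of what such a shim has to produce, here is a minimal Python sketch that wraps database rows in the general shape of Solr's default JSON select response. The function name, the sample rows, and the `id`/`title` fields are hypothetical, and whether Quepid needs further fields in the response is not confirmed here.

```python
import json


def to_solr_response(rows, query, start=0):
    """Wrap database rows (a list of dicts) in a Solr-style select response.

    Assumes `rows` holds every hit, so numFound is simply its length.
    """
    return {
        "responseHeader": {"status": 0, "QTime": 0, "params": {"q": query}},
        "response": {"numFound": len(rows), "start": start, "docs": rows},
    }


# Rows as they might come back from a Postgres full-text query.
rows = [
    {"id": "1", "title": "Leaves of Grass"},
    {"id": "2", "title": "Fallen Leaves"},
]
body = json.dumps(to_solr_response(rows, "leaves"))
```

A small HTTP service returning `body` for each incoming query would be the proxy layer; the translation of the incoming query into SQL is the part that needs real design work.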

Cheers

Charlie

-- 
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network <http://www.thesearchnetwork.com> 
and co-author of Searching the Enterprise 
<https://opensourceconnections.com/wp-content/uploads/2020/08/ES_book_final_journal_version.pdf>
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II


Re: [EXTERNAL] Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-17 3:41 PM, Dave wrote:
> I’m a big believer in the right tool for the job.  As was said before, if you’re doing just a field:value query or four and no complications, sure, use a standard RDBMS. But if you inform the client that something like
> Leaves And whitm* title^3 with bf:title^3 author ^2
> is possible, the conversation changes with the right questions.
> 

Scale also matters. A "one huge text field" in postgres will work well 
enough given infinite RAM and CPU cycles. We actually tried that once, 
and ended up dumping that into redis instead: even for our fairly small 
database the wall clock time from button click to search results page 
wasn't great.

Dima

Re: [EXTERNAL] Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Dave <ha...@gmail.com>.
I’m a big believer in the right tool for the job.  As was said before, if you’re doing just a field:value query or four and no complications, sure, use a standard RDBMS. But if you inform the client that something like 
Leaves And whitm* title^3 with bf:title^3 author ^2 
is possible, the conversation changes with the right questions.   
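
One possible reading of the query above as Solr eDisMax request parameters is sketched below in Python. This is an interpretation, not Dave's exact syntax: the host, the `books` collection, and the choice of `qf` for the field weights are assumptions, while `defType`, `q`, and `qf` are standard Solr parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host and collection name.
base_url = "http://localhost:8983/solr/books/select"
params = {
    "defType": "edismax",      # extended DisMax query parser
    "q": "leaves AND whitm*",  # boolean operator plus a prefix wildcard
    "qf": "title^3 author^2",  # search both fields, weighting title matches higher
}
url = f"{base_url}?{urlencode(params)}"

# Round-trip the query string to confirm the encoding preserved the parameters.
decoded = parse_qs(urlsplit(url).query)
```

The point is that term matching, wildcards, and per-field weighting are all expressed in one short request, with no ranking logic written by hand.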

> On Mar 17, 2022, at 3:17 PM, Davis, Daniel (NIH/NLM) [C] <da...@nih.gov.invalid> wrote:
> 
> This is really a question of how big the haystack is and what sort of search task users are trying to accomplish.
> 
> If there is no IDF (a mistake I did *not* make at https://www.indexengines.com/, despite using home-grown search, BTW), then ranking implicitly assumes that documents are similar in size and also makes assumptions about the linguistics of the corpus.
> 
> In any case, if users are basically doing "Known Item Search", e.g. entering keywords from a title, then PostgreSQL should do OK.
> 
> On 3/17/22, 1:34 PM, "Alessandro Benedetti" <a....@sease.io> wrote:
> 
> 
>    Ok Charlie, Eric,
>    we are on the same page.
>    I agree it's definitely possible with some custom proxy work on both Quepid
>    and RRE; I meant it's not possible to directly point to the DB (for example
>    via JDBC).
>    Thanks!
> 
>    Cheers
>    --------------------------
>    Alessandro Benedetti
>    Apache Lucene/Solr PMC member and Committer
>    Director, R&D Software Engineer, Search Consultant
> 
>    www.sease.io
> 
> 
>>    On Thu, 17 Mar 2022 at 17:03, Bayer, Samuel <sa...@mitre.org> wrote:
>> 
>> You are, indeed :-).
>> 
>> What appears to be the problem - and I'm not sure yet, but it sure seems
>> like a good culprit - is that Postgres search, for reasons that mystify me,
>> was implemented with TF but no notion of IDF. There are various extensions
>> that add IDF-like properties to Postgres search. Why it didn't start out
>> that way is a mystery to me, and I don't know how stable any of the
>> extensions that do this actually are.
>> 
>> At the moment, that's my diagnosis of the discrepancy. I'll probably
>> follow up with the Postgres folks to see if they have any more insight into
>> those extensions.
>> 
>> Thanks to all who responded.
>> 
>> Cordially,
>> Sam Bayer
>> The MITRE Corporation

Re: [EXTERNAL] Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov.INVALID>.
This is really a question of how big the haystack is and what sort of search task users are trying to accomplish.

If there is no IDF (a mistake I did *not* make at https://www.indexengines.com/, despite using home-grown search, BTW), then ranking implicitly assumes that documents are similar in size and also makes assumptions about the linguistics of the corpus.

In any case, if users are basically doing "Known Item Search", e.g. entering keywords from a title, then PostgreSQL should do OK.
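
To make the effect of a missing IDF concrete, here is a toy Python comparison of ranking by raw term frequency against ranking by TF-IDF. It is only a sketch: the three documents and the query are invented, the formula is the textbook `tf * log(N / df)`, and it is neither Postgres's `ts_rank` nor the scoring Solr uses.

```python
import math
from collections import Counter

# Tiny invented corpus; "search" occurs in every document, "tsvector" in one.
docs = {
    "a": "solr solr search search search engine".split(),
    "b": "postgres search tsvector".split(),
    "c": "search search search search database".split(),
}
query = ["search", "tsvector"]


def tf_score(tokens):
    """Sum of raw query-term counts in one document."""
    counts = Counter(tokens)
    return sum(counts[t] for t in query)


def tf_idf_score(tokens):
    """Term counts weighted by log(N / df), so rare terms count for more."""
    counts = Counter(tokens)
    n_docs = len(docs)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs.values() if term in d)
        if df:
            score += counts[term] * math.log(n_docs / df)
    return score


tf_ranking = sorted(docs, key=lambda d: tf_score(docs[d]), reverse=True)
tf_idf_ranking = sorted(docs, key=lambda d: tf_idf_score(docs[d]), reverse=True)
```

With raw counts, document "b", the only one containing the rare term, ranks last because the other two repeat the common term more often; with IDF it ranks first. In a known-item search the query terms tend to be distinctive already, which is consistent with the observation that PostgreSQL does OK there.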


Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Alessandro Benedetti <a....@sease.io>.
Ok Charlie, Eric,
we are on the same page.
I agree it's definitely possible with some custom proxy work on both Quepid
and RRE; I meant it's not possible to directly point to the DB (for example
via JDBC).
Thanks!

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr PMC member and Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io



Re: [EXT] Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by "Bayer, Samuel" <sa...@mitre.org>.
You are, indeed :-).

What appears to be the problem - and I'm not sure yet, but it sure seems like a good culprit - is that Postgres search was implemented with TF but no notion of IDF. There are various extensions that add IDF-like properties to Postgres search. Why it didn't start out that way is a mystery to me, and I don't know how stable any of those extensions actually are.

At the moment, that's my diagnosis of the discrepancy. I'll probably follow up with the Postgres folks to see if they have any more insight into those extensions.
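
To make the TF-versus-IDF point concrete, here is a minimal Python sketch on a made-up three-document corpus. It uses textbook formulas (raw term counts, and idf = log(N/df)); it is not the ranking function of Postgres or of Solr/Lucene, and the corpus, document ids and function names are assumptions chosen only for illustration. With TF alone, a document that repeats a common word wins; once IDF is applied, the document containing the rare query term moves to the top.

```python
import math
from collections import Counter

# Hypothetical toy corpus: "search" appears in every document, "faceting" in one.
DOCS = {
    "d1": "search search search search engine",
    "d2": "search faceting",
    "d3": "search engine search tuning",
}


def tokenize(text):
    """Lowercase whitespace tokenizer; real analyzers do far more."""
    return text.lower().split()


def idf(term, docs):
    """Textbook inverse document frequency: log(N / df), 0.0 if the term is unseen."""
    df = sum(1 for text in docs.values() if term in tokenize(text))
    return math.log(len(docs) / df) if df else 0.0


def score(query, text, docs, use_idf):
    """Sum of term counts, optionally weighted by each term's IDF."""
    counts = Counter(tokenize(text))
    return sum(
        counts[term] * (idf(term, docs) if use_idf else 1.0)
        for term in tokenize(query)
    )


def rank(query, docs, use_idf):
    """Return document ids ordered by descending score (ties broken by id)."""
    scored = {doc_id: score(query, text, docs, use_idf) for doc_id, text in docs.items()}
    return sorted(scored, key=lambda doc_id: (-scored[doc_id], doc_id))


if __name__ == "__main__":
    print("TF only:", rank("search faceting", DOCS, use_idf=False))
    print("TF-IDF :", rank("search faceting", DOCS, use_idf=True))
```

In this toy setup the word that occurs in every document contributes nothing once IDF is applied, which is the kind of difference that could explain a relevance gap between two engines on the same data.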

Thanks to all who responded.

Cordially,
Sam Bayer
The MITRE Corporation

On 3/17/22 12:42 PM, Eric Pugh wrote:
> What I’ve done to compare other search engines with RRE and Quepid is to put a proxy in the middle that converts your query into what looks like a Solr request/response ;-).  This works great for custom Search API’s, and I *guess* you could do it with database backed search?
> 
> Now we are probably getting beyond what Sam was hoping to do!

Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Alessandro Benedetti <a....@sease.io>.
This is an interesting question.
I second both comments so far (from Eric and David), but I am afraid at the
moment the open-source tools for search quality evaluation can't really
compare Postgres to Solr.
As far as I know, both Quepid (Eric, correct me if I am wrong) and RRE
(https://github.com/SeaseLtd/rated-ranking-evaluator, and also the Enterprise
version) are able to compare only Apache Solr and Elasticsearch backed
systems (against each other, or against different configurations).

In general, I would recommend following David's suggestions:
- collect your requirements (both functional and performance-wise)
- compare both systems against them

Many times in the past I have seen databases used as terrible search engines
and search engines used as terrible databases.
I have also often seen queries on a search engine perform poorly because
they were designed as if they were DB queries.

Cheers

--------------------------
Alessandro Benedetti
Apache Lucene/Solr PMC member and Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io



Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by David Smiley <ds...@apache.org>.
Hello Sam,

You are a familiar name from my MITRE days :-)

Check out Solr's feature list and see how it compares to that of Postgres.
If you are only doing the most basic default relevancy ranked top-N search
with default text analysis, then the tech/maintenance overhead might not be
worth it.  I'm looking at this as such an example:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=solr

On the other hand, if you want to ensure that you're able to make search
the best it can be for your users, then keeping Solr and using it more will
get you there; a database won't.  To a database, full-text-search is just
one checkbox of many concerns.  The capabilities there are usually very
simple.  It's fine for a demo/POC -- getting started.

One feature in particular I want to call out is faceting.  To some apps,
it's a game changer that can pivot the UX from merely having a basic search
box to having navigation filters and everything else, at which point Solr
is the foundation of what's driving the UX.  I've seen people/apps miss
this -- the user experience is so clumsy without it for rich/structured
data in particular.  If you've ever used a Maven repository manager like
Nexus or its competitors (last I checked), they are still stuck in the
stone-age -- it's painful when you've been exposed to so much better.  On
the backend, if all you know is a database, you may not see how to make a
faceting UI work because it's rather unnatural for SQL.
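
As a rough illustration of what faceting computes, here is a small Python sketch over in-memory records. In Solr this is requested with parameters such as facet=true and facet.field; in SQL it typically takes one GROUP BY query per facet field, run against the same filtered result set. The records, field names and helper functions below are hypothetical and only stand in for a real index.

```python
from collections import Counter

# Hypothetical artifact records, loosely modeled on a Maven repository listing.
ARTIFACTS = [
    {"name": "solr-core", "group": "org.apache.solr", "packaging": "jar"},
    {"name": "solr-solrj", "group": "org.apache.solr", "packaging": "jar"},
    {"name": "lucene-core", "group": "org.apache.lucene", "packaging": "jar"},
    {"name": "solr-parent", "group": "org.apache.solr", "packaging": "pom"},
]


def search(records, keyword):
    """Naive keyword match on the name field (stand-in for a real text query)."""
    return [r for r in records if keyword in r["name"]]


def facet_counts(matches, fields):
    """For each facet field, count how many matching records carry each value."""
    return {field: Counter(r[field] for r in matches) for field in fields}


def apply_filter(matches, field, value):
    """Narrow the result set when the user clicks a facet value."""
    return [r for r in matches if r[field] == value]


if __name__ == "__main__":
    hits = search(ARTIFACTS, "solr")
    for field, counts in facet_counts(hits, ["group", "packaging"]).items():
        print(field, dict(counts))
    print([r["name"] for r in apply_filter(hits, "packaging", "pom")])
```

The counts returned alongside the hits are what drive the navigation filters in the UI: each value becomes a clickable link showing how many results remain if it is selected.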

Eric's response was great too.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Eric Pugh <ep...@opensourceconnections.com>.
In general, when evaluating a new option for my search engine, I like to take a set of queries that represent my users, with some head and some tail queries. Rate the results that you are getting, to get concrete numbers on the quality of search. Then take that rated “judgement set”, point the same queries at your Postgres based search, and compare the differences that you are getting!

There are a number of tools out there, though you can script it yourself too. https://quepid.com/ is one that I’m involved with ;-).
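
For the script-it-yourself route, a minimal Python sketch might look like the following. It scores two engines against the same judgement set using nDCG@k, which is one common metric among several. The queries, document ids, grades and the two result lists are invented placeholders (not measurements of Solr or Postgres), and in practice the result lists would come from running each query against each engine.

```python
import math

# Hypothetical judgement set: query -> {doc_id: grade}, with grades from 0 (bad) to 3 (perfect).
JUDGMENTS = {
    "head query": {"a": 3, "b": 2, "c": 0},
    "tail query": {"d": 3, "e": 1},
}

# Hypothetical ranked results returned by two engines for the same queries.
ENGINE_A_RESULTS = {"head query": ["a", "b", "c"], "tail query": ["d", "e"]}
ENGINE_B_RESULTS = {"head query": ["c", "b", "a"], "tail query": ["e", "d"]}


def dcg(grades):
    """Discounted cumulative gain: grade at rank i (0-based) divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg_at_k(ranked_ids, grades_by_doc, k):
    """DCG of the returned top-k divided by the DCG of the ideal top-k ordering."""
    gains = [grades_by_doc.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(grades_by_doc.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(results, judgments, k=3):
    """Average nDCG@k over every judged query; unanswered queries score 0."""
    scores = [ndcg_at_k(results.get(q, []), grades, k) for q, grades in judgments.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print("engine A:", round(mean_ndcg(ENGINE_A_RESULTS, JUDGMENTS), 3))
    print("engine B:", round(mean_ndcg(ENGINE_B_RESULTS, JUDGMENTS), 3))
```

Keeping the judgement set fixed and swapping only the result lists is what makes the two numbers comparable.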





_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: Looking for expertise on comparing Solr search to Postgres full-text search

Posted by Brad Burke <bb...@gmail.com>.
I hate to spam this entire list. I have tried every automated way I know
to get off this list.  Can someone please unsubscribe me from the
users@solr.apache.org email list?

Again, I am very sorry to disturb your Saturday (or Sunday) with this mass
email.


