Posted to solr-user@lucene.apache.org by Jamie Keenan <j....@mcpartners.co.uk> on 2018/02/06 17:41:24 UTC

Fusion or DIY w/Solr?

Hi All,

I was recommended to reach out by Ramkumar Aiyengar. I have known him for quite a while and wanted to get some advice on Solr / Lucene.

To give you a bit of context: I am the founder of a boutique search firm with a big vision for augmenting what we do with technology, as nothing works well in our market (or as well as we would like). We have Solr at present and were considering Fusion, so I wanted to gain some insights from the community about what we can achieve with Solr / Lucene.

We are trying to build an industry-leading search engine for talent! At present, we have Solr installed on our internal servers and we want to really unlock its power. We currently use Sovren to parse our data into XML format, and we add relevant metadata as we engage with talent. We also have a pretty strong data model and a good idea of what we want a search engine to do for us. The bigger vision is to work on a product roadmap to create an innovative and intuitive search system that makes sense of much larger sets of data, such as online profiles, and incorporates machine learning into its approach. We are just at the beginning of this journey, but we know there is nothing out there that works like this at present; even LinkedIn is flawed (even as a multi-billion dollar company).

I would love to share our ideas, explain what we have at the moment along with our vision for the product to see what you think we could achieve and of course the best way to do it! To do this, we really need to think outside of the box and the conventional tools out there – hence the big vision! There may even be room to partner with us as an advisor if you find the problem interesting and exciting enough (and of course you have the time) 😊

Would anyone be interested in exploring the beginnings of a new tech product with us?

Kind Regards

Jamie Keenan

Founder

Technology






+44 (0)20 7014 1026
+44 (0)7891 646094
www.mcpartners.co.uk






Follow us on: Twitter<https://twitter.com/MCPartnersUK> / LinkedIn<https://www.linkedin.com/company/mc-partners>
Add my details to your address book<http://mcwp.pdwd.net/wp-content/uploads/2016/10/Jamie-Keenan.vcf>












Confidentiality notice: The information in this e-mail (which includes any files transmitted with it) is confidential. It is intended for the addressee only. If this e-mail has been received by you and you are not the addressee please inform the sender and erase the e-mail from your system immediately. Any use, re-transmission, forwarding or copying of this e-mail by anyone other than the addressee is prohibited. Copyright in this e-mail and any document created by us will be and remains our property and this e-mail does not transfer any rights to you. This e-mail and its contents may not be relied on for any purpose whatsoever by anyone other than the addressee and no liability will be accepted for any loss however arising caused by misuse of or unauthorised reliance on this e-mail.






RE: Fusion or DIY w/Solr?

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Norconex filesystem collector should be able to handle XML output by Sovren very flexibly. I am a big fan. You can use a DOMSplitter to split a single large XML document into multiple smaller ones.
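
To make the splitting idea concrete, here is a minimal sketch in plain Python. It is not Norconex's DOMSplitter and does not use its configuration; it only illustrates the same idea of turning one large XML file into one small document per record. The element names (Candidates, Candidate, Name) and the split_xml helper are made up for the example and are not Sovren's actual schema.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text, record_tag):
    """Return one standalone XML string per <record_tag> element.

    Hypothetical helper: the tag name is whatever element marks
    a single record in your own XML.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nested records are found too.
    return [ET.tostring(el, encoding="unicode") for el in root.iter(record_tag)]


# Made-up sample data, standing in for one large parser output file.
sample = (
    "<Candidates>"
    "<Candidate id='1'><Name>A</Name></Candidate>"
    "<Candidate id='2'><Name>B</Name></Candidate>"
    "</Candidates>"
)

parts = split_xml(sample, "Candidate")
for part in parts:
    print(part)
```

Each string in parts can then be enriched and indexed on its own, which is roughly what a splitter stage in a crawler pipeline gives you.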

I started with Norconex because I found Heritrix a bit of a pain to configure, as it is more complicated (it does bigger parse jobs, crawls in distributed fashion, etc.), and is poorly documented.

I also liked that Norconex wanted to work much like Watson Explorer Engine, which is our internal prior search engine.

If you use Norconex for crawling and enriching your XML data, the benefit is that you are not tied to Solr. You will still want to work on your schema, and any custom ResultHandler etc., the same way you currently do with Solr, but you can do a lot of the processing with Norconex.
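
As a rough illustration of keeping the processing step separate from the search engine, the sketch below maps one split XML record to a plain dictionary and serializes a list of such dictionaries as JSON. A JSON array of documents is a shape Solr's JSON update handler accepts, but nothing in the mapping itself is Solr-specific. The field names (id, name, skills), the XML element names, and the to_search_doc helper are all assumptions for the example, not Sovren's or Norconex's actual output.

```python
import json
import xml.etree.ElementTree as ET


def to_search_doc(record_xml):
    """Map one XML record to a flat, engine-neutral dictionary.

    Hypothetical mapping: adjust the element and field names to
    your own data model and schema.
    """
    el = ET.fromstring(record_xml)
    return {
        "id": el.get("id"),
        "name": el.findtext("Name", default=""),
        # Multi-valued field: one entry per <Skill> element with text.
        "skills": [s.text for s in el.findall("Skills/Skill") if s.text],
    }


# Made-up record, standing in for one split document.
record = (
    "<Candidate id='42'><Name>Ada</Name>"
    "<Skills><Skill>Java</Skill><Skill>Solr</Skill></Skills>"
    "</Candidate>"
)

doc = to_search_doc(record)
payload = json.dumps([doc])  # JSON array of documents
print(payload)
```

The schema work mentioned above still applies: whatever fields the mapping emits need matching field definitions on the search side.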

Since I've now done several pull requests, you can take my enthusiasm with a grain of salt ;)

-----Original Message-----
From: Doug Turnbull [mailto:dturnbull@opensourceconnections.com] 
Sent: Tuesday, February 6, 2018 3:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Fusion or DIY w/Solr?

I'll also chime in on other options I’m familiar with:

- Hosted Solr (searchstax/websolr) to manage the ops of a pure open source product. I’m a big fan of both companies for removing ops headaches and focusing more on value add for open source

- Voyager Search for spatial focused enterprise search using Solr

On Tue, Feb 6, 2018 at 2:30 PM Doug Turnbull <dturnbull@opensourceconnections.com> wrote:

> Used Fusion on a couple projects,
>
> *Pros:*
> *-* Wraps Solr, so you should be able to do anything you can do in 
> Solr in Fusion
> - 'Opinionated' Solr - Annoying problems I see on every team, solved 
> with common tooling (experiments, signal collection, etc). Can save tons of work.
> - Relevance focused - good defaults for relevance work, built by 
> people like Trey/Grant who are smart about this stuff
> - It's really a very good 'relevance IDE' - Still allows all the 
> customization of Solr, plus a dose of extra customization you can do 
> with Fusion based on open source tools
>
> *Cons*
> - It's in part a closed source search product, so if it gets acquired, 
> you can get stuck depending on a closed source product that gets 
> abandoned. See Fast, GSA, etc
> - You'll need to fit into the opinions, and be OK with the Fusion way 
> of thinking
> - Can be hard to tell where Solr ends and Fusion begins, if you want 
> to limit your dependency on Fusion
>
> Sure I'm missing stuff. It's one of the best search products out
> there, in my opinion. But a search product isn't for everyone. (see
> http://opensourceconnections.com/blog/2017/04/29/choosing-proprietary-vs-open-source-search/
> )
>
> I'm always happy to talk 1-1 about any search product :)
>
> Hope that helps
> -Doug
>
>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>
--
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: Fusion or DIY w/Solr?

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
I'll also chime in on other options I’m familiar with:

- Hosted Solr (searchstax/websolr) to manage the ops of a pure open source
product. I’m a big fan of both companies for removing ops headaches and
focusing more on value add for open source

- Voyager Search for spatial focused enterprise search using Solr

On Tue, Feb 6, 2018 at 2:30 PM Doug Turnbull <dturnbull@opensourceconnections.com> wrote:

> Used Fusion on a couple projects,
>
> *Pros:*
> *-* Wraps Solr, so you should be able to do anything you can do in Solr
> in Fusion
> - 'Opinionated' Solr - Annoying problems I see on every team, solved with
> common tooling (experiments, signal collection, etc). Can save tons of work.
> - Relevance focused - good defaults for relevance work, built by people
> like Trey/Grant who are smart about this stuff
> - It's really a very good 'relevance IDE' - Still allows all the
> customization of Solr, plus a dose of extra customization you can do with
> Fusion based on open source tools
>
> *Cons*
> - It's in part a closed source search product, so if it gets acquired, you
> can get stuck depending on a closed source product that gets abandoned. See
> Fast, GSA, etc
> - You'll need to fit into the opinions, and be OK with the Fusion way of
> thinking
> - Can be hard to tell where Solr ends and Fusion begins, if you want to
> limit your dependency on Fusion
>
> Sure I'm missing stuff. It's one of the best search products out there, in
> my opinion. But a search product isn't for everyone. (see
> http://opensourceconnections.com/blog/2017/04/29/choosing-proprietary-vs-open-source-search/
> )
>
> I'm always happy to talk 1-1 about any search product :)
>
> Hope that helps
> -Doug
>
>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: Fusion or DIY w/Solr?

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
Used Fusion on a couple projects,

*Pros:*
*-* Wraps Solr, so you should be able to do anything you can do in Solr in
Fusion
- 'Opinionated' Solr - Annoying problems I see on every team, solved with
common tooling (experiments, signal collection, etc). Can save tons of work.
- Relevance focused - good defaults for relevance work, built by people
like Trey/Grant who are smart about this stuff
- It's really a very good 'relevance IDE' - Still allows all the
customization of Solr, plus a dose of extra customization you can do with
Fusion based on open source tools

*Cons*
- It's in part a closed source search product, so if it gets acquired, you
can get stuck depending on a closed source product that gets abandoned. See
Fast, GSA, etc
- You'll need to fit into the opinions, and be OK with the Fusion way of
thinking
- Can be hard to tell where Solr ends and Fusion begins, if you want to
limit your dependency on Fusion

Sure I'm missing stuff. It's one of the best search products out there, in
my opinion. But a search product isn't for everyone. (see
http://opensourceconnections.com/blog/2017/04/29/choosing-proprietary-vs-open-source-search/
)

I'm always happy to talk 1-1 about any search product :)

Hope that helps
-Doug


-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug