Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/04/14 22:59:20 UTC
Listing Priority
I have crawled some web pages and indexed them in Solr.
When I list results from Solr, I want pages whose URL ends with .edu,
.edu.az or .co.uk to be given higher priority (my schema includes a field
for the URL).
What is the most efficient way to do this in Solr?
RE: Listing Priority
Posted by Markus Jelsma <ma...@openindex.io>.
You can use boost queries to boost documents that match some query, e.g. suffix:co.uk, but you'll need to have the URL suffixes indexed. Nutch knows about URL suffixes but does not index them, so you would need to add a custom indexing filter, or patch an existing filter, to add a suffix field. URLUtil has methods that return the URL suffix for a given URL.
http://wiki.apache.org/solr/FunctionQuery#query
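A minimal sketch of the boost-query idea above, in Python for brevity. It assumes a string field named suffix (following the suffix:co.uk example), the three suffixes from the original question, and the edismax query parser, which accepts bq parameters. The helper names url_suffix and boosted_params and the boost value of 5 are purely illustrative. In a Nutch setup the suffix would come from an indexing filter rather than from code like this; the sketch only shows the logic.

```python
from urllib.parse import urlparse, urlencode

# Suffixes taken from the original question.
PREFERRED_SUFFIXES = ("edu.az", "co.uk", "edu")


def url_suffix(url):
    """Return the preferred suffix that the URL's host ends with, or None."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in PREFERRED_SUFFIXES:
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return None


def boosted_params(user_query, boost=5.0):
    """Build request parameters with one boost query (bq) per preferred suffix.

    Assumes the index has a 'suffix' field populated at index time.
    """
    params = [("q", user_query), ("defType", "edismax")]
    for suffix in PREFERRED_SUFFIXES:
        params.append(("bq", "suffix:%s^%s" % (suffix, boost)))
    return params


if __name__ == "__main__":
    print(url_suffix("http://www.example.co.uk/page"))
    print(urlencode(boosted_params("solr")))
```

The value returned by url_suffix is what would be stored in the suffix field at index time; the encoded parameters would be appended to the select request. The boost values need tuning against real result lists.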
Re: Solr consultant recommendation
Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Christian,
We (Sematext) do a ton of performance/tuning and scaling/architecture
work around Solr and ElasticSearch and run a performance monitoring
SaaS called SPM that you may be interested in, as it captures a bunch
of Solr and server metrics. Nobody in Denmark, but we have several
folks in Europe.
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Wed, Apr 24, 2013 at 6:58 AM, Christian von Wendt-Jensen
<Ch...@infopaq.com> wrote:
> Hi
>
> We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark.
>
> Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark?
>
> We have issues regarding hardware setup (storage, RAM, cores per instance, instances per machine), Solr Cloud vs Classic Master/Slave, shard size, to store or not to store, automated deployment of (more) shards, cache optimization, garbage collection issues, field collapsing, PERFORMANCE. You name it and we probably have it as an issue to discuss.
>
> We are currently running a setup of ~450 million documents, receiving 1 million+ per day. An interesting challenge, if you ask me…
>
> If YOU are the one, then please get in contact.
Re: Solr consultant recommendation
Posted by Christian von Wendt-Jensen <Ch...@infopaq.com>.
Actually no, I didn't. But I can see that I should have. Thanks!
Med venlig hilsen / Best Regards
Christian von Wendt-Jensen
IT Team Lead, Customer Solutions
Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K
Phone +45 36 99 00 00
Mobile +45 31 17 10 07
Email christian.sonne.jensen@infopaq.com
Web www.infopaq.com
DISCLAIMER:
This e-mail and accompanying documents contain privileged confidential information. The information is intended only for the recipient(s) named. Any unauthorised disclosure, copying, distribution, exploitation or the taking of any action in reliance of the content of this e-mail is strictly prohibited. If you have received this e-mail in error we would be obliged if you would delete the e-mail and attachments and notify the dispatcher by return e-mail or at +45 36 99 00 00
Please consider the environment before printing this mail note.
On Wed, 24 Apr 2013 13:02:03 +0200, Gora Mohanty <go...@mimirtech.com> wrote:
> On 24 April 2013 16:28, Christian von Wendt-Jensen
> <Ch...@infopaq.com> wrote:
> > Hi
> > We have some detailed Solr setup issues we would like to discuss with a
> > Solr Expert (certified or self-declared), but we are having some
> > difficulties getting in contact with anyone here in Copenhagen, Denmark.
> > Therefore I would like to hear if anybody out there can drop me some names
> > of Solr Experts to contact, available in Denmark?
> [...]
> Have you looked at http://wiki.apache.org/solr/Support ?
> Regards,
> Gora
Re: Solr consultant recommendation
Posted by Gora Mohanty <go...@mimirtech.com>.
On 24 April 2013 16:28, Christian von Wendt-Jensen
<Ch...@infopaq.com> wrote:
>
> Hi
>
> We have some detailed Solr setup issues we would like to discuss with a
> Solr Expert (certified or self-declared), but we are having some
> difficulties getting in contact with anyone here in Copenhagen, Denmark.
>
> Therefore I would like to hear if anybody out there can drop me some names
> of Solr Experts to contact, available in Denmark?
[...]
Have you looked at http://wiki.apache.org/solr/Support ?
Regards,
Gora
Re: Solr consultant recommendation
Posted by Charlie Hull <ch...@flax.co.uk>.
Hi Christian,
We are based in the UK but have worked for a client in Copenhagen with a
large Solr index - in fact I was there last week visiting another
potential client. You can find out more about us from www.flax.co.uk -
generally we work remotely but the flight from our local airport is only
1hr20m. Do get in touch if I can tell you more.
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
Re: Solr consultant recommendation
Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Solr consultant recommendation
: In-Reply-To: <E8...@cominvent.com>
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.
-Hoss
Solr consultant recommendation
Posted by Christian von Wendt-Jensen <Ch...@infopaq.com>.
Hi
We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark.
Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark?
We have issues regarding hardware setup (storage, RAM, cores per instance, instances per machine), Solr Cloud vs Classic Master/Slave, shard size, to store or not to store, automated deployment of (more) shards, cache optimization, garbage collection issues, field collapsing, PERFORMANCE. You name it and we probably have it as an issue to discuss.
We are currently running a setup of ~450 million documents, receiving 1 million+ per day. An interesting challenge, if you ask me…
If YOU are the one, then please get in contact.
Re: Listing Priority
Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,
Check out the new RegexpBoostProcessor, which does exactly this based on a config file:
https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/update/processor/RegexpBoostProcessor.html
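The general idea of a regex-driven boost can be sketched as follows. This is not the RegexpBoostProcessor's code or its config-file format; the rule list, the boost values and the function name boost_for are made up for illustration, and Python is used only to keep the sketch short.

```python
import re

# Hypothetical rules: (compiled pattern, boost). Values are illustrative only.
BOOST_RULES = [
    # Hosts ending in .edu or .edu.az, with an optional port.
    (re.compile(r"^https?://[^/]+\.edu(\.az)?(:\d+)?(/|$)"), 2.0),
    # Hosts ending in .co.uk, with an optional port.
    (re.compile(r"^https?://[^/]+\.co\.uk(:\d+)?(/|$)"), 1.5),
]


def boost_for(url, default=1.0):
    """Return the boost of the first rule whose pattern matches the URL."""
    lowered = url.lower()
    for pattern, boost in BOOST_RULES:
        if pattern.search(lowered):
            return boost
    return default


if __name__ == "__main__":
    for url in ("http://www.mit.edu/about", "https://www.bbc.co.uk/news",
                "http://example.com/edu"):
        print(url, boost_for(url))
```

Anchoring each pattern to the host part keeps a path such as /edu from triggering a boost by accident.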
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
24. apr. 2013 kl. 00:22 skrev Furkan KAMACI <fu...@gmail.com>:
> Let's assume that I have written an update processor that extracts the
> domain and checks it against my predefined list. What should I then do at
> index time and at query (select) time?
Re: Listing Priority
Posted by Furkan KAMACI <fu...@gmail.com>.
Let's assume that I have written an update processor that extracts the
domain and checks it against my predefined list. What should I then do at
index time and at query (select) time?
Re: Listing Priority
Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You may find the work and code contributions by Jan Høydahl quite
relevant. See the presentation from 2 years ago:
http://www.slideshare.net/lucenerevolution/jan-hoydahl-improving-solrs-update-chain-eurocon2011
One of the things he/they contributed is the URLClassify Update Processor,
which might be quite relevant.
https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
book)