Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/04/14 22:59:20 UTC

Listing Priority

I have crawled some internet pages and indexed them in Solr.

When I list results from Solr, I want pages whose URL (my schema includes
a URL field) ends with .edu, .edu.az or .co.uk to be given higher
priority.

What is an efficient way to do this in Solr?

RE: Listing Priority

Posted by Markus Jelsma <ma...@openindex.io>.
You can use boost queries to boost documents that match some query, e.g. suffix:co.uk, but you'll need to have URL suffixes indexed. Nutch knows about URL suffixes but does not index them. You would need to add a custom indexing filter, or patch an existing filter, to add a suffix field. URLUtil has methods that return the URL suffix for a given URL.

http://wiki.apache.org/solr/FunctionQuery#query
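
As a rough illustration of the two steps described above (index a suffix field, then boost on it), here is a minimal Python sketch. It is not Nutch's URLUtil or an indexing filter. The field name `suffix` comes from the example above; the function names `url_suffix` and `boost_params`, the boost value of 5, and the choice of the edismax parser with its `bq` parameter are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Suffixes to favour, taken from the original question.
PREFERRED_SUFFIXES = ("edu.az", "co.uk", "edu")


def url_suffix(url):
    """Return the preferred suffix that the URL's host ends with, or None."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in PREFERRED_SUFFIXES:
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return None


def boost_params(user_query, boost=5):
    """Build request parameters that boost documents whose (hypothetical)
    'suffix' field holds one of the preferred suffixes."""
    clauses = " OR ".join('"%s"' % s for s in PREFERRED_SUFFIXES)
    return {
        "q": user_query,
        "defType": "edismax",
        "bq": "suffix:(%s)^%d" % (clauses, boost),
    }
```

At index time the value returned by `url_suffix` would be stored in the suffix field; at query time the dictionary from `boost_params` would be sent as request parameters to the select handler.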

 
 
-----Original message-----
> From:Furkan KAMACI <fu...@gmail.com>
> Sent: Sun 14-Apr-2013 22:59
> To: solr-user@lucene.apache.org
> Subject: Listing Priority
> 
> I have crawled some internet pages and indexed them in Solr.
> 
> When I list results from Solr, I want pages whose URL (my schema includes
> a URL field) ends with .edu, .edu.az or .co.uk to be given higher
> priority.
> 
> What is an efficient way to do this in Solr?
> 

Re: Solr consultant recommendation

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Christian,

We (Sematext) do a ton of performance/tuning and scaling/architecture
work around Solr and ElasticSearch and run a performance monitoring
SaaS called SPM that you may be interested in, as it captures a bunch
of Solr and server metrics.  Nobody in Denmark, but we have several
folks in Europe.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Apr 24, 2013 at 6:58 AM, Christian von Wendt-Jensen
<Ch...@infopaq.com> wrote:
> Hi
>
> We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark.
>
> Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark?
>
> We have issues regarding hardware setup (storage, RAM, cores per instance, instances per machine), Solr Cloud vs Classic Master/Slave, shard size, to store or not to store, automated deployment of (more) shards, cache optimization, garbage collection issues, field collapsing, PERFORMANCE. You name it and we probably have it as an issue to discuss.
>
> We are currently running a setup of ~450 million documents, receiving more than 1 million per day. Interesting challenge, if you ask me…
>
> If YOU are the one, then please get in contact.
>
>
>
> Med venlig hilsen / Best Regards
>
> Christian von Wendt-Jensen
> IT Team Lead, Customer Solutions
>
> Infopaq International A/S
> Kgs. Nytorv 22
> DK-1050 København K
>
> Phone             +45 36 99 00 00
> Mobile             +45 31 17 10 07
> Email              christian.sonne.jensen@infopaq.com
> Web                www.infopaq.com

Re: Solr consultant recommendation

Posted by Christian von Wendt-Jensen <Ch...@infopaq.com>.
Actually no, I didn't. But I can see that I should have. Thanks!




Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K

Phone             +45 36 99 00 00
Mobile             +45 31 17 10 07
Email              christian.sonne.jensen@infopaq.com
Web                www.infopaq.com








DISCLAIMER:
This e-mail and accompanying documents contain privileged confidential information. The information is intended only for the recipient(s) named. Any unauthorised disclosure, copying, distribution, exploitation or the taking of any action in reliance of the content of this e-mail is strictly prohibited. If you have received this e-mail in error we would be obliged if you would delete the e-mail and attachments and notify the dispatcher by return e-mail or at +45 36 99 00 00
Please consider the environment before printing this mail note.

From: Gora Mohanty <go...@mimirtech.com>
Reply-To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
Date: Wed, 24 Apr 2013 13:02:03 +0200
To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
Subject: Re: Solr consultant recommendation

On 24 April 2013 16:28, Christian von Wendt-Jensen
<Ch...@infopaq.com> wrote:

Hi

We have some detailed Solr setup issues we would like to discuss with a
Solr Expert (certified or self-declared), but we are having some
difficulties getting in contact with anyone here in Copenhagen, Denmark.

Therefore I would like to hear if anybody out there can drop me some names
of Solr Experts to contact, available in Denmark?
[...]

Have you looked at http://wiki.apache.org/solr/Support ?

Regards,
Gora


Re: Solr consultant recommendation

Posted by Gora Mohanty <go...@mimirtech.com>.
On 24 April 2013 16:28, Christian von Wendt-Jensen
<Ch...@infopaq.com> wrote:
>
> Hi
>
> We have some detailed Solr setup issues we would like to discuss with a
> Solr Expert (certified or self-declared), but we are having some
> difficulties getting in contact with anyone here in Copenhagen, Denmark.
>
> Therefore I would like to hear if anybody out there can drop me some names
> of Solr Experts to contact, available in Denmark?
[...]

Have you looked at http://wiki.apache.org/solr/Support ?

Regards,
Gora

Re: Solr consultant recommendation

Posted by Charlie Hull <ch...@flax.co.uk>.
On 24/04/2013 11:58, Christian von Wendt-Jensen wrote:
> Hi
>
> We have some detailed Solr setup issues we would like to discuss with
> a Solr Expert (certified or self-declared), but we are having some
> difficulties getting in contact with anyone here in Copenhagen,
> Denmark.
>
> Therefore I would like to hear if anybody out there can drop me some
> names of Solr Experts to contact, available in Denmark?
>
> We have issues regarding hardware setup (storage, RAM, cores per
> instance, instances per machine), Solr Cloud vs Classic Master/Slave,
> shard size, to store or not to store, automated deployment of (more)
> shards, cache optimization, garbage collection issues, field
> collapsing, PERFORMANCE. You name it and we probably have it as an
> issue to discuss.
>
> We are currently running a setup of ~450 million documents, receiving
> more than 1 million per day. Interesting challenge, if you ask me…
>
> If YOU are the one, then please get in contact.

Hi Christian,

We are based in the UK but have worked for a client in Copenhagen with a 
large Solr index - in fact I was there last week visiting another 
potential client. You can find out more about us from www.flax.co.uk - 
generally we work remotely but the flight from our local airport is only 
1hr20m. Do get in touch if I can tell you more.

Cheers

Charlie



-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

Re: Solr consultant recommendation

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Solr consultant recommendation
: In-Reply-To: <E8...@cominvent.com>

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss

Solr consultant recommendation

Posted by Christian von Wendt-Jensen <Ch...@infopaq.com>.
Hi

We have some detailed Solr setup issues we would like to discuss with a Solr Expert (certified or self-declared), but we are having some difficulties getting in contact with anyone here in Copenhagen, Denmark.

Therefore I would like to hear if anybody out there can drop me some names of Solr Experts to contact, available in Denmark?

We have issues regarding hardware setup (storage, RAM, cores per instance, instances per machine), Solr Cloud vs Classic Master/Slave, shard size, to store or not to store, automated deployment of (more) shards, cache optimization, garbage collection issues, field collapsing, PERFORMANCE. You name it and we probably have it as an issue to discuss.

We are currently running a setup of ~450 million documents, receiving more than 1 million per day. Interesting challenge, if you ask me…

If YOU are the one, then please get in contact.



Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K

Phone             +45 36 99 00 00
Mobile             +45 31 17 10 07
Email              christian.sonne.jensen@infopaq.com
Web                www.infopaq.com










Re: Listing Priority

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

Check out the new RegexpBoostProcessor, which does exactly this based on a config file:
https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/update/processor/RegexpBoostProcessor.html
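
To make the idea of regex-driven boosting concrete, here is a small Python sketch. It is not the RegexpBoostProcessor itself and does not reproduce its configuration file format; the rule list, the boost values, and the function name `boost_for_url` are all assumptions made for illustration.

```python
import re

# Hypothetical rules: (pattern matched against the URL, boost factor).
# Each pattern requires the host part of the URL to end with the suffix.
BOOST_RULES = [
    (re.compile(r"^https?://[^/]+\.edu\.az(?:[:/]|$)"), 2.0),
    (re.compile(r"^https?://[^/]+\.co\.uk(?:[:/]|$)"), 1.5),
    (re.compile(r"^https?://[^/]+\.edu(?:[:/]|$)"), 2.0),
]


def boost_for_url(url, default=1.0):
    """Return the boost of the first rule whose pattern matches the URL,
    or the default boost when no rule matches."""
    for pattern, boost in BOOST_RULES:
        if pattern.search(url):
            return boost
    return default
```

The computed value would be attached to the document at index time, so no extra work is needed per query beyond using that value in ranking.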

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

24. apr. 2013 kl. 00:22 skrev Furkan KAMACI <fu...@gmail.com>:

> Let's assume that I have written an update processor that extracts the
> domain and checks it against my predefined list. What should I then do at
> index time and at query (select) time?
> 
> 
> 2013/4/15 Alexandre Rafalovitch <ar...@gmail.com>
> 
>> You may find the work and code contributions by Jan Høydahl quite
>> relevant. See the presentation from 2 years ago:
>> 
>> http://www.slideshare.net/lucenerevolution/jan-hoydahl-improving-solrs-update-chain-eurocon2011
>> 
>> One of the things he/they contributed is URLClassify Update Processor,
>> it might be quite relevant.
>> 
>> https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
>> 
>> Regards,
>>   Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>> 
>> 
>> On Sun, Apr 14, 2013 at 4:59 PM, Furkan KAMACI <fu...@gmail.com>
>> wrote:
>>> I have crawled some internet pages and indexed them in Solr.
>>>
>>> When I list results from Solr, I want pages whose URL (my schema
>>> includes a URL field) ends with .edu, .edu.az or .co.uk to be given
>>> higher priority.
>>>
>>> What is an efficient way to do this in Solr?
>> 


Re: Listing Priority

Posted by Furkan KAMACI <fu...@gmail.com>.
Let's assume that I have written an update processor that extracts the
domain and checks it against my predefined list. What should I then do at
index time and at query (select) time?


2013/4/15 Alexandre Rafalovitch <ar...@gmail.com>

> You may find the work and code contributions by Jan Høydahl quite
> relevant. See the presentation from 2 years ago:
>
> http://www.slideshare.net/lucenerevolution/jan-hoydahl-improving-solrs-update-chain-eurocon2011
>
> One of the things he/they contributed is URLClassify Update Processor,
> it might be quite relevant.
>
> https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Sun, Apr 14, 2013 at 4:59 PM, Furkan KAMACI <fu...@gmail.com>
> wrote:
> > I have crawled some internet pages and indexed them in Solr.
> >
> > When I list results from Solr, I want pages whose URL (my schema
> > includes a URL field) ends with .edu, .edu.az or .co.uk to be given
> > higher priority.
> >
> > What is an efficient way to do this in Solr?
>

Re: Listing Priority

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You may find the work and code contributions by Jan Høydahl quite
relevant. See the presentation from 2 years ago:
http://www.slideshare.net/lucenerevolution/jan-hoydahl-improving-solrs-update-chain-eurocon2011

One of the things he/they contributed is URLClassify Update Processor,
it might be quite relevant.
https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sun, Apr 14, 2013 at 4:59 PM, Furkan KAMACI <fu...@gmail.com> wrote:
> I have crawled some internet pages and indexed them in Solr.
>
> When I list results from Solr, I want pages whose URL (my schema includes
> a URL field) ends with .edu, .edu.az or .co.uk to be given higher
> priority.
>
> What is an efficient way to do this in Solr?