Posted to solr-user@lucene.apache.org by Jim Glynn <jr...@hotmail.com> on 2013/12/05 01:47:55 UTC

Prioritize search returns by URL path?

We have a Telligent based community with Solr as the search engine. We want
to prioritize search returns from within the community by the type of
content: Wiki articles as most relevant, then blog posts, then Verified
answer and Suggested answer forum posts, then remaining forum posts. We have
also implemented a Helpful voting capability and would like to boost items
with more Helpful votes above those within their same category with fewer
votes.

Has anyone out there done something similar, or can someone suggest how to
do this? We're new to search engine tuning, so assume very little knowledge
on our part.

Thanks for your help!
JRG



--
View this message in context: http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Prioritize search returns by URL path?

Posted by Jim Glynn <jr...@hotmail.com>.
Thanks Chris. I think you've hit the nail on the head.

I understand your concern about prioritizing content simply by content type,
and generally I'd agree with you. However, our situation is a bit unusual.
We don't use our Wiki feature as true wikis. We publish only authoritative
content to them, and to our blogs, so those really are the things we want
returned first. The wikis most often contain the information we want our
customers to find.

Thanks again for the syntax help. We'll give it a try.

JRG




Re: Prioritize search returns by URL path?

Posted by Chris Hostetter <ho...@fucit.org>.

1) i would strongly advise you against falling into the trap of thinking 
things like "Wiki posts should always be returned higher than blog posts" 
... unless you truly want *any* wiki post that matches your keywords, no 
matter how tangentially and how poorly, to come back "higher" on the list 
of results than any blog post -- even if that blog post is 100% dedicated 
to the keywords the user searched for.

if that's really what you want, then all you need is "sort=doc_type desc, 
score desc" where you assign a numeric doc_type value at index time -- 
but i assure you, it's a terrible idea.
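
A minimal sketch of what that strict sort looks like as a request, for anyone who wants to try it and see the effect. The Solr URL and core name are hypothetical, and it assumes a numeric doc_type field was populated at index time (for example wiki=3, blog=2, forum=1).

```python
from urllib.parse import urlencode

# Assumptions: a core reachable at this hypothetical URL, and a numeric
# doc_type field assigned at index time (e.g. wiki=3, blog=2, forum=1).
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def strict_type_sort_url(user_query: str) -> str:
    """Build a select URL that orders by doc_type first, relevance second."""
    params = {
        "q": user_query,
        "sort": "doc_type desc, score desc",
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(strict_type_sort_url("password reset"))
```

With this sort, score only breaks ties inside one doc_type, which is exactly the behaviour warned about above.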

2) in general, what you are interested in is "domain boosting" ... where
because of the specifics of your domain knowledge, you know that certain
documents should generally score higher -- how much higher is an art form
that again is going to largely depend on the specifics of your domain, but
you will most likely want it to be something you can tweak and tune.

3) regardless of the specifics of the website you are dealing with, and 
the URL structure used, what really matters is how you convert the raw 
data on your website into documents to be indexed -- when you do that, 
however you do that, is when you can add fields to your documents to 
convey information like "this document is from the wiki" or "this document 
is from the forum" or "this document is a verified forum answer".  If the 
only way you can conceptually know this information is by parsing the URL, 
then so be it -- but more than likely, if you are reading this data 
directly from an authoritative source (instead of just crawling URLs), 
there are easy methods to determine this stuff.
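
As a rough illustration of adding that information while documents are being built, here is a sketch that derives a doc_type value from the URL path and, for forum posts, from the post's own metadata. The path fragments, the answer_status values, and the document field names are all assumptions made for the example; the real Telligent URL layout and metadata are not described in this thread.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical path fragments; adjust these to match the actual site.
PATH_FRAGMENT_TO_TYPE = (
    ("/wiki/", "wiki"),
    ("/blog/", "blog"),
    ("/forum/", "forum"),
)


def doc_type_for(url: str, answer_status: Optional[str] = None) -> str:
    """Return a doc_type value for one page.

    answer_status is assumed to come from the post's own metadata
    ("verified", "suggested", or None), since the URL does not carry it.
    """
    path = urlparse(url).path.lower()
    base_type = "other"
    for fragment, type_name in PATH_FRAGMENT_TO_TYPE:
        if fragment in path:
            base_type = type_name
            break
    if base_type == "forum" and answer_status in ("verified", "suggested"):
        return f"forum_{answer_status}"
    return base_type


def build_document(doc_id: str, url: str, title: str, body: str,
                   answer_status: Optional[str] = None) -> dict:
    """Assemble the dict that would be sent to Solr for indexing."""
    return {
        "id": doc_id,
        "url": url,
        "title": title,
        "body": body,
        "doc_type": doc_type_for(url, answer_status),
    }
```

If the content is read straight from the community's database or API rather than crawled, the URL parsing step can be replaced by whatever type flag that source already provides.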

	. . .

My initial suggestion would be to create a simple field called 
"doc_type" containing values like "wiki", "blog", "forum", 
"forum_verified", and "forum_suggested" ... with those values *indexed* 
for each doc, you can then use the ExternalFileField to associate a 
numeric value to each of those special values, and you can tune & tweak 
those numeric values w/o re-indexing.  Then you should look into how boost 
functions work to make those numeric values an input into the final score 
calculations.  
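
A small sketch of the two moving parts, under stated assumptions: the weights are invented, the field name doc_type_weight is hypothetical, and it assumes an ExternalFileField configured with keyField set to doc_type so that the external file can hold one key=value line per type. The query side uses the edismax "boost" parameter, which multiplies the relevance score by the function's value.

```python
# Hypothetical weights; these are the numbers to tweak and tune.
DOC_TYPE_WEIGHTS = {
    "wiki": 5.0,
    "blog": 4.0,
    "forum_verified": 3.0,
    "forum_suggested": 2.0,
    "forum": 1.0,
}


def external_file_lines(weights: dict) -> str:
    """Render key=value lines, the format an ExternalFileField file uses."""
    lines = [f"{key}={value}" for key, value in sorted(weights.items())]
    return "\n".join(lines) + "\n"


def boosted_query_params(user_query: str,
                         weight_field: str = "doc_type_weight") -> dict:
    """Query parameters that multiply the score by the external weight."""
    return {
        "q": user_query,
        "defType": "edismax",
        # weight_field is an assumed name for the ExternalFileField.
        "boost": f"field({weight_field})",
    }


print(external_file_lines(DOC_TYPE_WEIGHTS))
print(boosted_query_params("password reset"))
```

The rendered text would be written to a file named external_doc_type_weight in the index data directory; changing a weight then means rewriting that file and having Solr reload it, not re-indexing the documents.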

In the long run however, you may want to consider indexing a general
"importance" value for each doc that you re-compute periodically based not 
just on the *type* of the document, but also things like the number of 
page views, the number of votes for forum answers to be "verified", etc...
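
One possible shape for such an "importance" value, purely as a sketch: the weights and the formula below are invented for illustration and are not taken from this thread. The logarithms keep a page with a huge number of views or votes from swamping everything else.

```python
import math

# Hypothetical per-type base weights.
TYPE_WEIGHTS = {
    "wiki": 5.0,
    "blog": 4.0,
    "forum_verified": 3.0,
    "forum_suggested": 2.0,
    "forum": 1.0,
}


def importance(doc_type: str, page_views: int, helpful_votes: int) -> float:
    """Combine type, page views and votes into one number (illustrative)."""
    base = TYPE_WEIGHTS.get(doc_type, 1.0)
    view_factor = 1.0 + math.log10(1 + page_views)
    vote_factor = 1.0 + math.log10(1 + helpful_votes)
    return base * view_factor * vote_factor


# Within one type, more Helpful votes give a higher importance value.
print(importance("forum", 100, 1) < importance("forum", 100, 10))
```

A periodic job could recompute this per document and publish it either as an indexed field or through the same external file mechanism, keyed by document id.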


More information about "domain boosting"...

https://people.apache.org/~hossman/ac2012eu/
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630





On Fri, 6 Dec 2013, Jim Glynn wrote:

: Date: Fri, 6 Dec 2013 13:10:59 -0800 (PST)
: From: Jim Glynn <jr...@hotmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Prioritize search returns by URL path?
: 
: Thanks all. Yes, we can differentiate between content types by URL.
: Everything else being equal, Wiki posts should always be returned higher
: than blog posts, and blog posts should always be returned higher than forum
: posts.
: 
: Within forum posts, we want to rank Verified answered and Suggested answered
: posts higher than unanswered posts. These cannot be identified via path -
: only via metadata attached to the individual post. Any suggestions?
: 
: @Alex, I'll investigate the references you provided. Thanks!

-Hoss
http://www.lucidworks.com/

Re: Prioritize search returns by URL path?

Posted by manju16832003 <ma...@gmail.com>.
Could it be achieved using multiple request handlers?

Example:

http://localhost:8983/solr/my/wiki
http://localhost:8983/solr/my/blog
http://localhost:8983/solr/my/forum

We could configure each request handler with its own query settings.
Does Solr support querying those three request handlers together and
combining the result sets?






Re: Prioritize search returns by URL path?

Posted by Jim Glynn <jr...@hotmail.com>.
Thanks all. Yes, we can differentiate between content types by URL.
Everything else being equal, Wiki posts should always be returned higher
than blog posts, and blog posts should always be returned higher than forum
posts.

Within forum posts, we want to rank Verified answered and Suggested answered
posts higher than unanswered posts. These cannot be identified via path -
only via metadata attached to the individual post. Any suggestions?

@Alex, I'll investigate the references you provided. Thanks!




Re: Prioritize search returns by URL path?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Something like URLClassifyProcessor could be useful to work with URLs:
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html

Look for presentations and writing by Jan Hoydahl for background on this and
similar work.

Regards,
   Alex.



On Thu, Dec 5, 2013 at 7:47 AM, Jim Glynn <jr...@hotmail.com> wrote:

> We have a Telligent based community with Solr as the search engine. We want
> to prioritize search returns from within the community by the type of
> content: Wiki articles as most relevant, then blog posts, then Verified
> answer and Suggested answer forum posts, then remaining forum posts. We have
> also implemented a Helpful voting capability and would like to boost items
> with more Helpful votes above those within their same category with fewer
> votes.
>
> Has anyone out there done something similar, or can someone suggest how to
> do this? We're new to search engine tuning, so assume very little knowledge
> on our part.
>
> Thanks for your help!
> JRG

Re: Prioritize search returns by URL path?

Posted by manju16832003 <ma...@gmail.com>.
Hi Jim Glynn,

KAMACI is correct. How do you distinguish between your documents?

Jim and Kamaci,

I have a similar situation: I will be boosting documents on a regular basis
and expect documents with a higher score to appear at the top and those with
a lower score at the bottom.

Here is my requirement.

My entity name is LISTING.

I have a few hundred thousand listings, and I classify each one as *bumped,
featured or normal* by listing_type.

My implementation plan is:
 - All bumped (paid) listings appear above featured and normal listings
 - All featured listings appear above normal listings
 - Normal listings always stay at the bottom of the list

Within the bumped listings, when a user pays more money, that listing must
appear above the other bumped listings.

My questions are:
 - Can this be done by boosting those specific listings and sorting by
score?
 - Can it be done on the fly from an application, by pushing the change
directly to Solr and having it reflected in the application?
 - Under certain conditions, I might need to shuffle the bumped listings to
get them back in random order.
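
The ordering described in this message can be sketched in plain Python to make the intended result concrete. The field names paid_amount and score, and the numeric rank given to each listing_type, are assumptions for the example. In Solr the same ordering would roughly correspond to a multi-field sort such as "sort=listing_rank desc, paid_amount desc, score desc", with listing_rank being a hypothetical numeric field indexed alongside listing_type.

```python
import random

# Hypothetical numeric rank per listing_type: higher sorts first.
TIER_RANK = {"bumped": 2, "featured": 1, "normal": 0}


def order_listings(listings):
    """Order by tier, then amount paid, then relevance score, all descending."""
    return sorted(
        listings,
        key=lambda item: (
            TIER_RANK[item["listing_type"]],
            item.get("paid_amount", 0),
            item.get("score", 0.0),
        ),
        reverse=True,
    )


def shuffle_bumped(ordered, seed):
    """Randomize the bumped block only; other tiers keep their order."""
    bumped = [item for item in ordered if item["listing_type"] == "bumped"]
    rest = [item for item in ordered if item["listing_type"] != "bumped"]
    random.Random(seed).shuffle(bumped)
    return bumped + rest


listings = [
    {"id": "a", "listing_type": "normal", "score": 9.0},
    {"id": "b", "listing_type": "bumped", "paid_amount": 10, "score": 1.0},
    {"id": "c", "listing_type": "featured", "score": 5.0},
    {"id": "d", "listing_type": "bumped", "paid_amount": 50, "score": 0.5},
]
print([item["id"] for item in order_listings(listings)])  # ['d', 'b', 'c', 'a']
```

Note that this is a strict sort, not a boost: a normal listing never rises above a featured one, however well it matches the query.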







Re: Prioritize search returns by URL path?

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi;

How do you distinguish your content types, by their paths? If you don't
want to do any regex operations or other complex things, you can also send
the content type as a field of your document. Does every wiki page have
higher priority than every blog post? If yes, you can use faceting on that
new field (content type).

Thanks;
Furkan KAMACI

On Thursday, 5 December 2013, Jim Glynn <jr...@hotmail.com> wrote:
> We have a Telligent based community with Solr as the search engine. We want
> to prioritize search returns from within the community by the type of
> content: Wiki articles as most relevant, then blog posts, then Verified
> answer and Suggested answer forum posts, then remaining forum posts. We have
> also implemented a Helpful voting capability and would like to boost items
> with more Helpful votes above those within their same category with fewer
> votes.
>
> Has anyone out there done something similar, or can someone suggest how to
> do this? We're new to search engine tuning, so assume very little knowledge
> on our part.
>
> Thanks for your help!
> JRG