Posted to user@nutch.apache.org by RP <rp...@earthlink.net> on 2006/12/19 16:52:56 UTC

How best to add "sponsored link" support..??

Hi all,

I've been tasked with looking into this and am not a coder. That said,
Nutch is doing great, and the bean counters have asked me to look into
adding sponsored link results. I'm wondering how best to add this.

It would be nice to use the Nutch engine to come up with the pages,
rather than just doing a lookup on words and results in a flat file. But
the keyword data could change daily (even hourly) and would need to be
hand entered (or automated) as people sign up, so re-indexing is not
really an option.

I'm not sure this would fly within the main Nutch segments and index. I
could see maybe a separate index, or possibly adding a flag to the
existing data, but I've not seen any easy-to-use tools to change, update,
or insert records in what is already there (yes, Luke works on the index,
but that does not touch the segment data, right?).

I don't want to change the existing searched data. I don't see an issue
with duplicate results (sponsored up top and the existing entry somewhere
down below), but it would be more elegant to avoid them.

I also see issues with a simple flat-file lookup: a multiple-word search
is best handled inside Nutch so that it "scores" the results, rather than
having to do something similar for the sponsored results. I can see the
need to control the summary text displayed and to pass through any codes
in the URL, which are currently being stripped during the main crawl/index
cycle. I also see issues with seriously customizing the internals, as the
changes would have to be maintained as Nutch itself is updated.
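
To make the "separate index" idea concrete, here is a rough sketch only. It is not Nutch code and uses no Nutch API; the names SponsoredLink, SponsoredIndex, strip_query and merge are made up for illustration, and the scoring rule (fraction of query words an advertiser has signed up for) is an assumption. It keeps the sponsored entries outside the main index so they can be added or removed at any time, scores multi-word queries, keeps the advertiser's own summary text and full URL, and drops the organic duplicate of a sponsored hit.

```python
from dataclasses import dataclass, field


@dataclass
class SponsoredLink:
    url: str        # full landing URL; any tracking codes are kept intact
    summary: str    # advertiser-controlled text shown with the result
    keywords: set = field(default_factory=set)


class SponsoredIndex:
    """Tiny in-memory keyword index, kept separate from the main index."""

    def __init__(self):
        self._links = {}

    def add(self, url, summary, keywords):
        # Adding or replacing an entry takes effect at once; no re-index.
        self._links[url] = SponsoredLink(url, summary,
                                         {k.lower() for k in keywords})

    def remove(self, url):
        self._links.pop(url, None)

    def search(self, query, limit=3):
        terms = set(query.lower().split())
        scored = []
        for link in self._links.values():
            matched = terms & link.keywords
            if matched:
                # Assumed score: fraction of the query words this entry covers.
                scored.append((len(matched) / len(terms), link))
        # Highest score first; URL breaks ties so the order is stable.
        scored.sort(key=lambda pair: (-pair[0], pair[1].url))
        return [link for _, link in scored[:limit]]


def strip_query(url):
    """Drop everything after '?', mimicking URLs whose codes were stripped."""
    return url.split("?", 1)[0]


def merge(sponsored, organic_urls):
    """Return sponsored links plus organic URLs that do not duplicate them."""
    seen = {strip_query(link.url) for link in sponsored}
    return sponsored, [u for u in organic_urls if strip_query(u) not in seen]
```

In use, the search front end would call search() with the user's query, show those entries up top, and pass the normal result URLs through merge() before displaying them. A real deployment would need persistence and some admin screen for data entry, which this sketch leaves out.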

If anyone has looked at this and has at least some ideas on how best to
do it, let me know. I need to come up with a preliminary estimate
before I can engage and pay the coders to make this happen, so if there
are any easy or "best practices" ways of doing this, any help or pointers
would be appreciated.

-- 
rp




Re: How best to add "sponsored link" support..??

Posted by Jim Wilson <wi...@gmail.com>.
You may want to consider letting a third party handle your sponsored links,
unless of course you already have an infrastructure for handling everything
you mentioned as well as the following:

* Advertiser registration
* Advertiser purchase of keywords/page space
* Calculation of impressions and clicks
* Payment model based on impressions and clicks
* Collections from advertisers (who may dispute the numbers)
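
To give a feel for the bookkeeping behind the impression and click items above, here is a minimal sketch. The AdLedger class and the per-click and per-thousand-impression rates are hypothetical; the thread does not specify any pricing, and a real system would also need durable logs to settle disputes.

```python
from collections import Counter
from decimal import Decimal


class AdLedger:
    """Counts impressions and clicks per advertiser and computes a charge."""

    def __init__(self, cost_per_click, cost_per_thousand_impressions):
        # Decimal avoids floating-point rounding surprises in money amounts.
        self.cpc = Decimal(cost_per_click)
        self.cpm = Decimal(cost_per_thousand_impressions)
        self.impressions = Counter()
        self.clicks = Counter()

    def record_impression(self, advertiser):
        self.impressions[advertiser] += 1

    def record_click(self, advertiser):
        self.clicks[advertiser] += 1

    def amount_due(self, advertiser):
        # Assumed model: flat rate per click plus a rate per 1000 impressions.
        click_charge = self.clicks[advertiser] * self.cpc
        impression_charge = self.impressions[advertiser] * self.cpm / 1000
        return click_charge + impression_charge
```

Even this toy version shows why disputes arise: the amount due depends entirely on counts that only the site operator records.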

I'm not saying the idea is without merit, as it would certainly be useful. I
just hope that your time in developing the feature and the bean-counters'
time in managing the above items is worth the revenue - which many times
comes down to raw traffic numbers.

You could always try a third-party option, and if the revenue is good, then
manage it in house to try to skim a larger margin.

-- Jim R. Wilson
