Posted to dev@nutch.apache.org by MyD <my...@googlemail.com> on 2010/01/08 04:12:37 UTC

Injecting URLs and define Inlink?

Dear Nutch developers:

Is there any way to inject URLs and define the inlink for those URLs? How
and where can I find the inlink of a given URL?

Example:

We inject the URL www.example.com/john_doe. We start the crawl, and at some
point we are crawling the URL www.example.com/john_doe4.

=> www.example.com/john_doe
==> www.example.com/john_doe1
====> www.example.com/john_doe4
==> www.example.com/john_doe2
====> www.example.com/john_doe5
==> www.example.com/john_doe3
====> www.example.com/john_doe6

Is there any way to find the base (inlink) URL www.example.com/john_doe?

Thanks in advance.

Cheers,
MyD

Re: Injecting URLs and define Inlink?

Posted by MyD <my...@googlemail.com>.
Sorry for the confusion. I'd like to define the inlink for the URL
www.example.com/john_doe, let's say www.inlink.com. Is there a way to define
an inlink for a certain URL? If so, how can I get the inlink for a certain
URL? Thanks in advance. I hope that clears everything up.

Cheers,
MyD




On Fri, Jan 8, 2010 at 7:16 AM, xiao yang <ya...@gmail.com> wrote:

> What do you mean? You already know the url. Why do you want to find it?

Re: Injecting URLs and define Inlink?

Posted by xiao yang <ya...@gmail.com>.
What do you mean? You already know the url. Why do you want to find it?

On Thu, Jan 7, 2010 at 7:12 PM, MyD <my...@googlemail.com> wrote:
> Dear Nutch developers:
> Is there any way to inject URLs and define the inlink for those URLs? How
> and where can I find the inlink from a certain URL?
> Example:
> We inject a URL www.example.com/john_doe. We start the crawl and maybe we
> are crawling the URL www.example.com/john_doe4.
> => www.example.com/john_doe
> ==> www.example.com/john_doe1
> ====> www.example.com/john_doe4
> ==> www.example.com/john_doe2
> ====> www.example.com/john_doe5
> ==> www.example.com/john_doe3
> ====> www.example.com/john_doe6
> Is there any way to find the base (inlink) URL www.example.com/john_doe ???
> Thanks in advance.
> Cheers,
> MyD