Posted to user@manifoldcf.apache.org by ritika jain <ri...@gmail.com> on 2021/05/19 12:31:25 UTC

Manifoldcf Redirection process

Hi

I want to understand how ManifoldCF handles URL redirection in the web
crawler connector.

If a page redirects (through a 301) to another URL, the next crawl should
detect the redirect, index the new (final) URL, and display it in the search
results instead of the old URL that redirects, just as search engines like
Google and Bing do.

Is that true? Is ManifoldCF capable of skipping the URL that returns the 301,
picking up the URL it redirects to, and ingesting that URL instead?

If not, what process does ManifoldCF follow to ingest redirected URLs?

Thanks
Ritika

Re: Manifoldcf Redirection process

Posted by Karl Wright <da...@gmail.com>.
Yes, a 302 is also recognized as a redirection.


On Fri, May 28, 2021 at 5:07 AM ritika jain <ri...@gmail.com> wrote:

>> Is the process the same when the fetch/process status code returned is
>> 302? This is for a job that uses the web crawler connector and the ES
>> output connector.
>
> Does anybody have a clue about this?
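As a rough illustration of the answer above (a 302 is treated as a redirect just like a 301), the sketch below decides whether a fetch result is a redirect and, if so, resolves its target. It is a hypothetical helper, not ManifoldCF code: the name `redirect_target`, limiting the redirect set to 301 and 302 (the two codes confirmed in this thread), and resolving a relative `Location` header with `urljoin` are all assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

# Assumption: only the two status codes confirmed in this thread are treated
# as redirects; a real crawler may also handle others (e.g. 303, 307, 308).
REDIRECT_CODES = {301, 302}


def redirect_target(base_url: str, status: int, location: Optional[str]) -> Optional[str]:
    """Return the absolute URL a response redirects to, or None.

    Hypothetical helper: `base_url` is the URL that was fetched, `status`
    is the HTTP status code, and `location` is the Location response
    header value (None if the header is absent).
    """
    if status not in REDIRECT_CODES or not location:
        return None  # not a redirect: handle the response as a normal document
    # The Location header may be relative, so resolve it against the fetched URL.
    return urljoin(base_url, location)


if __name__ == "__main__":
    print(redirect_target("http://example.com/old/page", 301, "/new/page"))
    print(redirect_target("http://example.com/old/page", 302, "other"))
    print(redirect_target("http://example.com/old/page", 200, None))
```

With these inputs the 301 and the 302 both yield an absolute target URL (`http://example.com/new/page` and `http://example.com/old/other`), while the 200 response yields `None`.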

Re: Manifoldcf Redirection process

Posted by ritika jain <ri...@gmail.com>.
> Is the process the same when the fetch/process status code returned is
> 302? This is for a job that uses the web crawler connector and the ES
> output connector.

Does anybody have a clue about this?

Re: Manifoldcf Redirection process

Posted by ritika jain <ri...@gmail.com>.
Is the process the same when the fetch/process status code returned is 302?
This is for a job that uses the web crawler connector and the ES output
connector.

On Wed, May 19, 2021 at 10:35 PM Karl Wright <da...@gmail.com> wrote:

> ManifoldCF fetches every URL on its queue.
> If a fetch returns a 301, it detects this and pushes the new (target) URL
> onto the document queue.
> When it reaches the new URL, it processes it like any other.
>
> Karl

Re: Manifoldcf Redirection process

Posted by Karl Wright <da...@gmail.com>.
ManifoldCF fetches every URL on its queue.
If a fetch returns a 301, it detects this and pushes the new (target) URL
onto the document queue.
When it reaches the new URL, it processes it like any other.

Karl
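The queue-driven behaviour Karl describes can be sketched roughly as follows. This is a simplified, hypothetical model rather than ManifoldCF's implementation: the in-memory `FAKE_WEB` table stands in for real HTTP fetches, and the `crawl` function, the `seen` set used to avoid re-queueing URLs, and the choice to skip non-200 responses are all assumptions made for illustration. It treats both 301 and 302 as redirects, since the thread confirms both are recognized.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical canned responses standing in for real HTTP fetches:
# URL -> (status code, Location header or None).
FAKE_WEB: Dict[str, Tuple[int, Optional[str]]] = {
    "http://example.com/old": (301, "http://example.com/new"),
    "http://example.com/new": (200, None),
    "http://example.com/about": (200, None),
}


def crawl(seeds: List[str], web: Dict[str, Tuple[int, Optional[str]]]) -> List[str]:
    """Process a document queue and return the URLs that would be ingested.

    In this sketch a redirecting URL is not ingested itself; its target is
    pushed onto the queue and, when reached, processed like any other URL.
    """
    queue = deque(seeds)
    seen = set(seeds)  # assumption: never queue the same URL twice (also stops redirect loops)
    ingested: List[str] = []
    while queue:
        url = queue.popleft()
        status, location = web.get(url, (404, None))  # unknown URLs behave like a 404
        if status in (301, 302) and location:
            if location not in seen:
                seen.add(location)
                queue.append(location)  # push the new URL onto the document queue
        elif status == 200:
            ingested.append(url)  # an ordinary document: would go to the output connector
        # any other status (e.g. 404) is simply skipped in this sketch
    return ingested


if __name__ == "__main__":
    print(crawl(["http://example.com/old", "http://example.com/about"], FAKE_WEB))
```

Running it prints `['http://example.com/about', 'http://example.com/new']`: the redirecting `/old` URL only contributes its target to the queue, and `/new` is picked up later like any other document.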