Posted to user@manifoldcf.apache.org by ritika jain <ri...@gmail.com> on 2021/05/19 12:31:25 UTC
Manifoldcf Redirection process
Hi
I want to understand how ManifoldCF handles URL redirection in the case of
the web crawler connector.
If a page redirects (through a 301) to another URL, then the next
crawl will detect the redirect, index the new (final) URL, and display it
in the search results instead of the old URL that redirects, just as
search engines like Google / Bing do.
Is that true? Is ManifoldCF capable of skipping the URL that returns a 301,
picking up the URL to which it is redirected, and ingesting that URL?
If not, what process does ManifoldCF follow to ingest redirected URLs?
Thanks
Ritika
Re: Manifoldcf Redirection process
Posted by Karl Wright <da...@gmail.com>.
Yes, a 302 does get recognized as a redirection.
On Fri, May 28, 2021 at 5:07 AM ritika jain <ri...@gmail.com>
wrote:
> Is the process the same when the fetch/process status code returned is 302,
> when running a job with the web crawler and ES output connector?
>
> Does anybody have a clue about this?
Re: Manifoldcf Redirection process
Posted by ritika jain <ri...@gmail.com>.
> Is the process the same when the fetch/process status code returned is 302,
> when running a job with the web crawler and ES output connector?

Does anybody have a clue about this?
Re: Manifoldcf Redirection process
Posted by ritika jain <ri...@gmail.com>.
Is the process the same when the fetch/process status code returned is 302,
when running a job with the web crawler and ES output connector?
On Wed, May 19, 2021 at 10:35 PM Karl Wright <da...@gmail.com> wrote:
> ManifoldCF reads all the URLs on its queue.
> If it's a 301, it detects this and pushes the new URL onto the document
> queue.
> When it gets to the new URL, it processes it like any other.
>
> Karl
Re: Manifoldcf Redirection process
Posted by Karl Wright <da...@gmail.com>.
ManifoldCF reads all the URLs on its queue.
If fetching a URL returns a 301, it detects this and pushes the new URL onto
the document queue.
When it gets to the new URL, it processes it like any other.
Karl
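
[Editor's illustration] The queue behaviour described above can be sketched
roughly as follows. This is not ManifoldCF code; it is a minimal Python model
under stated assumptions. The names crawl, fake_fetch, and FAKE_SITE are
hypothetical, the "site" is an in-memory dictionary rather than real HTTP, and
treating 302 the same way as 301 follows the later reply in this thread.

```python
from collections import deque
from urllib.parse import urljoin

# Assumption: both permanent (301) and temporary (302) redirects are re-queued.
REDIRECT_STATUSES = {301, 302}


def crawl(seeds, fetch):
    """Drain a URL queue; redirect targets are queued, not indexed directly."""
    queue = deque(seeds)
    seen = set(seeds)
    indexed = []
    while queue:
        url = queue.popleft()
        status, headers, body = fetch(url)
        if status in REDIRECT_STATUSES and "Location" in headers:
            # Resolve a possibly relative Location against the current URL.
            target = urljoin(url, headers["Location"])
            if target not in seen:
                seen.add(target)
                queue.append(target)
            continue  # the redirecting URL itself is not indexed
        if status == 200:
            # The redirect target is handled like any other document.
            indexed.append((url, body))
    return indexed


# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.com/old": (301, {"Location": "/new"}, ""),
    "http://example.com/tmp": (302, {"Location": "http://example.com/new"}, ""),
    "http://example.com/new": (200, {}, "new page"),
}


def fake_fetch(url):
    """Return (status, headers, body) for a URL, or a 404 if unknown."""
    return FAKE_SITE.get(url, (404, {}, ""))


if __name__ == "__main__":
    print(crawl(["http://example.com/old", "http://example.com/tmp"], fake_fetch))
```

In this model only the final URL ends up in the indexed list, and it is
indexed once even though two seeds redirect to it. Whether a previously
indexed old URL is removed from the output index is not covered by this
thread, so the sketch does not model it.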