Posted to user@manifoldcf.apache.org by h0444xk8 <h0...@posteo.de> on 2022/01/31 11:10:28 UTC

Web connector HTTP response header content-encoding:gzip

Hi all,

I have a question regarding the Web connector. I want to crawl a website 
that responds with gzip-compressed content. The returned HTTP response 
header is Content-Encoding: gzip.

Is ManifoldCF able to handle such compressed websites? When I try to crawl 
the website, a 404 is shown in the history. But the site is accessible 
from the same machine, and if I crawl a non-compressed website, everything 
works fine. So the 404 may be misleading.

Can I set the Accept-Encoding request header for the Web connector 
repository connection to avoid getting compressed content?
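
To narrow this down outside of ManifoldCF, something like the following 
sketch could be run from the crawler machine. It is plain Python standard 
library, not the Web connector's code, and the function names are 
hypothetical. It requests the page with Accept-Encoding: identity (or gzip) 
and unpacks the body only if the server marks it as gzip, so the status 
code and Content-Encoding can be compared for both cases.

```python
import gzip
import urllib.request


def build_request(url, allow_gzip=False):
    """Build a GET request that either permits or refuses gzip responses."""
    # "identity" asks the server to send the body without any compression.
    encoding = "gzip" if allow_gzip else "identity"
    return urllib.request.Request(url, headers={"Accept-Encoding": encoding})


def decode_body(raw, content_encoding):
    """Return the plain body, unpacking it if the server marked it as gzip."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw


def fetch(url, allow_gzip=False):
    """Fetch url and return (status, content_encoding, decoded_body).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a real
    404 from the server would show up as an exception here.
    """
    request = build_request(url, allow_gzip)
    with urllib.request.urlopen(request, timeout=30) as response:
        content_encoding = response.headers.get("Content-Encoding")
        body = decode_body(response.read(), content_encoding)
        return response.status, content_encoding, body
```

Calling fetch(url) and fetch(url, allow_gzip=True) with the site's URL 
would show whether the server honours Accept-Encoding: identity or sends 
gzip regardless.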

Or is there any other solution?

Regards

Sebastian