Posted to users@httpd.apache.org by Good Guy <xf...@hotmail.com> on 2017/12/23 23:53:12 UTC

[users@httpd] Re: mirror a html site

On 23/12/2017 10:26, Miguel González wrote:
>
>   A hosting company with their builder tool created a static html site
> that can't be downloaded.
>
Did you try this tool?

<https://www.httrack.com/>

If not, please provide a link to the site, because there is no such thing
as "can't be downloaded" when the site is visible to the public.






---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Re: mirror a html site

Posted by Ruben Safir <mr...@panix.com>.
On 12/24/2017 01:49 PM, Miguel González wrote:
> On 12/24/17 12:53 AM, Good Guy wrote:
>> On 23/12/2017 10:26, Miguel González wrote:
>>>   A hosting company with their builder tool created a static html site
>>> that can't be downloaded.
>>>
>> Did you try this tool?
>>
>> <https://www.httrack.com/>
>>
>> If not please provide a link of the site because there is no such thing
>> as "can't be downloaded" when the site is visible to the public.
> What I mean is that the company doesn't provide any FTP access to
> download the files.
>
> I did use httrack, and at least I could keep a backup of the website (not
> complete, because it wasn't able to download links with Spanish characters).
>
> Unfortunately, as I said, httrack creates folders for the CDN entries, so
> the mirrored site ends up with a www.mysite.com/www.mysite.com/
> structure with subfolders for each CDN.
>
> For the time being I am using wget -mkEp, but the mirrored pages still
> point to the company's CDN entries. It's not the best solution, but in
> case they turn off the CDNs it will be much "easier" to change the links
> manually.
> 
> thanks!


How easy it is to scrape the website depends largely on the amount of
JavaScript garbage on the pages. The plain HTML and source can be pulled
fairly easily with LWP and w3m.
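The point here is that static HTML can be fetched without a browser. LWP is a Perl library, so as a swapped-in illustration (Python's standard library, not LWP itself), here is a minimal sketch that downloads a page's raw HTML and lists the `href`/`src` URLs it references. No JavaScript is executed, so any link a script builds at runtime will not appear, which is exactly the limitation described above. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from static HTML (no JavaScript runs)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Boolean attributes arrive with value None; skip them.
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every href/src found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_links(url):
    """Download the raw HTML exactly as served and list the links in it."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, url)
```

For example, `fetch_links("https://www.mysite.com/")` (using the thread's placeholder host) would return absolute URLs that could then be handed to a downloader one at a time.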

-- 
So many immigrant groups have swept through our town
that Brooklyn, like Atlantis, reaches mythological
proportions in the mind of the world - RI Safir 1998
http://www.mrbrklyn.com

DRM is THEFT - We are the STAKEHOLDERS - RI Safir 2002
http://www.nylxs.com - Leadership Development in Free Software
http://www2.mrbrklyn.com/resources - Unpublished Archive
http://www.coinhangout.com - coins!
http://www.brooklyn-living.com

Being so tracked is for FARM ANIMALS and extermination camps,
but incompatible with living as a free human being. -RI Safir 2013



Re: [users@httpd] Re: mirror a html site

Posted by Miguel González <mi...@yahoo.es.INVALID>.
On 12/24/17 12:53 AM, Good Guy wrote:
> On 23/12/2017 10:26, Miguel González wrote:
>>
>>   A hosting company with their builder tool created a static html site
>> that can't be downloaded.
>>
> Did you try this tool?
> 
> <https://www.httrack.com/>
> 
> If not please provide a link of the site because there is no such thing
> as "can't be downloaded" when the site is visible to the public.

What I mean is that the company doesn't provide any FTP access to
download the files.

I did use httrack, and at least I could keep a backup of the website (not
complete, because it wasn't able to download links with Spanish characters).
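The thread does not say why httrack skipped those links. One plausible cause (an assumption, not a diagnosis) is that URLs containing non-ASCII characters such as á or ñ have to be percent-encoded as UTF-8 bytes before they are requested, per RFC 3986. A minimal sketch of that encoding step using only `urllib.parse`; the helper name `encode_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# RFC 3986 reserved characters plus "%": kept as-is so that the URL's
# structure and any existing %XX escapes survive unchanged.
SAFE = "%:/?#[]@!$&'()*+,;="


def encode_url(url):
    """Percent-encode non-ASCII (and other unsafe) characters as UTF-8."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    # Scheme, host and fragment are passed through untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For example, `encode_url("https://www.mysite.com/páginas/niño.html")` returns `"https://www.mysite.com/p%C3%A1ginas/ni%C3%B1o.html"`. Non-ASCII host names (IDNs) need separate handling and are not covered by this sketch.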

Unfortunately, as I said, httrack creates folders for the CDN entries, so
the mirrored site ends up with a www.mysite.com/www.mysite.com/
structure with subfolders for each CDN.

For the time being I am using wget -mkEp, but the mirrored pages still
point to the company's CDN entries. It's not the best solution, but in
case they turn off the CDNs it will be much "easier" to change the links
manually.
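Two hedged options for the CDN problem. Within wget itself, adding `-H` (`--span-hosts`) together with `-D` (`--domains`, listing the site and its CDN hosts) should let `-p` fetch the CDN-hosted page requisites and let `-k` convert those links to local ones. Alternatively, once the CDN files have been saved, a small script can rewrite the absolute CDN URLs in the pages instead of editing them by hand. The sketch below assumes, hypothetically, that each CDN's files live under `/cdn/<host>/` at the root of the mirror; the host names and function names are placeholders.

```python
import re
from pathlib import Path


def rewrite_cdn_links(html, cdn_hosts, local_prefix="cdn"):
    """Replace absolute CDN URLs with root-relative local paths.

    Both "https://host/..." and protocol-relative "//host/..." forms are
    rewritten to "/<local_prefix>/<host>/...". The hosts are hypothetical
    and must be replaced with the ones that actually appear in the pages.
    """
    for host in cdn_hosts:
        # Require a "/" right after the host so look-alike hosts don't match.
        pattern = re.compile(r"(?:https?:)?//" + re.escape(host) + r"/")
        replacement = f"/{local_prefix}/{host}/"
        # A function replacement avoids backslash-escape surprises in re.sub.
        html = pattern.sub(lambda _match: replacement, html)
    return html


def rewrite_tree(root, cdn_hosts, local_prefix="cdn"):
    """Rewrite every .html file under the mirror directory in place."""
    for page in Path(root).rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        page.write_text(
            rewrite_cdn_links(text, cdn_hosts, local_prefix), encoding="utf-8"
        )
```

For example, `rewrite_tree("www.mysite.com", ["cdn1.example-host.com"])` would rewrite a wget mirror's pages in place, so it is best run on a copy. CSS files can also reference CDN URLs and are not covered by the `*.html` glob.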

thanks!


