Posted to dev@lenya.apache.org by Hubertus Groepper <hu...@groepper.com> on 2004/12/10 11:30:49 UTC

WGet static export

Hi there

Dumb question:
Are http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26987
(absolute path in pages of exported static site)
and http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26986
(Navigation inconsistent for export of static site)
and http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26708
(StaticHTMLExporter/WGet is not exporting css files in link tags)
still valid in light of comment #3 on that last bug, from Gregor on
2004-03-24?

	the static exporter will be replaced by the cocoon cli so it makes
	little sense to fix this now

If yes (or unrelated), maybe this is an issue with wrong parameters 
being passed on to WGet in j.o.a.lenya.net.WGet?
I do export my site (off the live area) with

	wget -r -k -np -nH -N -p --cut-dirs=3 http://localhost:8080/lenya/default/live/index.html

and don't experience any of the aforementioned problems.
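
For readers unfamiliar with the flags, here is a minimal sketch (not part of the original post) of what the command above asks wget to do, plus a small model of how -nH and --cut-dirs=3 map a URL to a local file. The flag descriptions follow the wget manual; the helper name local_path and the css/page.css path are made up for illustration, and the model assumes the URL path ends in a file name and ignores wget's other naming rules.

```python
from urllib.parse import urlsplit

# The command from the post, one flag per entry (meanings per the wget manual).
WGET_EXPORT = [
    "wget",
    "-r",            # recursive retrieval: follow links found in fetched pages
    "-k",            # convert links in saved pages so they work locally
    "-np",           # never ascend to the parent directory
    "-nH",           # do not create a host-named directory
    "-N",            # timestamping: skip files that are not newer than local copies
    "-p",            # also fetch page requisites such as images and CSS
    "--cut-dirs=3",  # drop the first 3 remote directory components
    "http://localhost:8080/lenya/default/live/index.html",
]


def local_path(url: str, cut_dirs: int = 3) -> str:
    """Simplified model of where `wget -nH --cut-dirs=N` saves a URL.

    Hypothetical helper for illustration only; real wget has more rules.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    dirs, filename = parts[:-1], parts[-1]
    kept = dirs[cut_dirs:]  # the leading directory components are discarded
    return "/".join(kept + [filename])


if __name__ == "__main__":
    print(" ".join(WGET_EXPORT))
    print(local_path("http://localhost:8080/lenya/default/live/index.html"))
```

Under this model the start page lands as index.html directly in the export directory, because the three leading components lenya/default/live are cut away.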

And in my usual attempt to melange two threads into one on my road to 
enlightenment: what's the current recommended approach to export to 
static?

Thanks.

hubertus


---------------------------------------------------------------------
To unsubscribe, e-mail:              dev-unsubscribe@lenya.apache.org
For additional commands, e-mail:            dev-help@lenya.apache.org
Apache Lenya Project                          http://lenya.apache.org


Re: WGet static export

Posted by Comiotto Thomas <th...@unicom.unizh.ch>.
>
> did you try build export -Dpublication=default
>
> i just checked in 2 fixes to that target, but it needs more work. at 
> least it crawled the exception ;)
>

Will give it another try then!

Thomas





Re: WGet static export

Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Comiotto Thomas wrote:

> Now since the cli export task seems to be broken - can I do something
> to help fix it?

did you try build export -Dpublication=default

i just checked in 2 fixes to that target, but it needs more work. at 
least it crawled the exception ;)

-- 
Gregor J. Rothfuss
COO, Wyona       Content Management Solutions    http://wyona.com
Apache Lenya                              http://lenya.apache.org
gregor.rothfuss@wyona.com                       gregor@apache.org



Re: WGet static export

Posted by Jean Pierre LeJacq <jp...@quoininc.com>.
On Mon, 13 Dec 2004, Comiotto Thomas wrote:

> Hello Jean-Pierre
>
>
> >> talking about exporting and not about re-importing ourselves, aren't we?
> >
> > I assume you mean using the lenya sitetree.xml file to crawl.  This
> > isn't sufficient since it doesn't list all resources such as CSS
> > files, lenya assets, etc.
>
> Now since the cli export task seems to be broken - can I do
> something to help fix it?

Well I'm moving over to 2.1.6 right now.  Once I have that in place
I'd like to look at how forrest uses it.  My thinking is to reuse
this if possible.

-- 
JP





Re: WGet static export

Posted by Comiotto Thomas <th...@unicom.unizh.ch>.
Hello Jean-Pierre


>> talking about exporting and not about re-importing ourselves, aren't we?
>
> I assume you mean using the lenya sitetree.xml file to crawl.  This
> isn't sufficient since it doesn't list all resources such as CSS
> files, lenya assets, etc.
>

Now since the cli export task seems to be broken - can I do something
to help fix it?

Thomas





Re: WGet static export

Posted by Jean Pierre LeJacq <jp...@quoininc.com>.
On Fri, 10 Dec 2004, Thomas Comiotto wrote:

> > I'm in fact working on a static site exporter based on the cocoon
> > CLI mode.  I hope to have this available soon.
>
> Maybe we can join forces then - I'm currently using a sitemap-only
> based hack that uses the sitetree to fetch pages because the
> WGet/crawling-approach isn't flexible, stable and fast enough for what
> I need (exporting user-configurable subsets of a publication,
> forms-based navigation). Now I want to move that to cocoon CLI.
>
> Still, in contrast to what was agreed here, crawling a publication in
> the sense of trying to find out something I already know (site
> structure & contents) doesn't really make sense to me. After all, we're
> talking about exporting and not about re-importing ourselves, aren't we?

I assume you mean using the lenya sitetree.xml file to crawl.  This
isn't sufficient since it doesn't list all resources such as CSS
files, lenya assets, etc.
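
To illustrate the point (a sketch, not Lenya code): the sitetree names documents, while stylesheets and assets only show up as references inside the rendered pages. The sitetree snippet below is a simplified, made-up stand-in rather than Lenya's real schema, the HTML is an invented example page, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Simplified, hypothetical stand-in for a sitetree: it lists documents only.
SITETREE = """
<site>
  <node id="index"/>
  <node id="about">
    <node id="team"/>
  </node>
</site>
"""

# Invented rendered page: it references resources the sitetree never mentions.
PAGE = """
<html><head><link rel="stylesheet" href="css/page.css"></head>
<body><img src="images/logo.png"><a href="about.html">About</a></body></html>
"""


def sitetree_pages(xml_text):
    """Return document paths derived from nested <node id="..."> elements."""
    def walk(element, prefix):
        for node in element.findall("node"):
            path = prefix + node.get("id")
            yield path + ".html"
            yield from walk(node, path + "/")
    return list(walk(ET.fromstring(xml_text.strip()), ""))


class RequisiteFinder(HTMLParser):
    """Collect stylesheet and image references from one HTML page."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.found.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.found.append(attrs["src"])


def page_requisites(html_text):
    """Return the resource references found in one rendered page."""
    finder = RequisiteFinder()
    finder.feed(html_text)
    return finder.found


if __name__ == "__main__":
    pages = sitetree_pages(SITETREE)
    extras = [r for r in page_requisites(PAGE) if r not in pages]
    print("from sitetree:", pages)
    print("only visible in the page:", extras)
```

In this toy setup the stylesheet and the image are discovered only by reading the page, which is the gap a sitetree-driven export would have to close some other way.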

-- 
JP





Re: WGet static export

Posted by Thomas Comiotto <co...@rcfmedia.ch>.
Hello Jean-Pierre

>> maybe that time would be better spent moving it over to cocoon cli.
>
> I'm in fact working on a static site exporter based on the cocoon
> CLI mode.  I hope to have this available soon.

Maybe we can join forces then - I'm currently using a sitemap-only 
based hack that uses the sitetree to fetch pages because the 
WGet/crawling-approach isn't flexible, stable and fast enough for what 
I need (exporting user-configurable subsets of a publication, 
forms-based navigation). Now I want to move that to cocoon CLI.

Still, in contrast to what was agreed here, crawling a publication in
the sense of trying to find out something I already know (site
structure & contents) doesn't really make sense to me. After all, we're
talking about exporting and not about re-importing ourselves, aren't we?

Regards
Thomas






Re: WGet static export

Posted by Comiotto Thomas <th...@unicom.unizh.ch>.
>
>>>
>>> on the contrary, this allows you to reuse the live pipelines very 
>>> easily, without having to second-guess lenya.
>>>
>> True - but why can't we just reuse pipelines internally?
>
> you would still have to build the part that fetches the pages and 
> stores them somewhere, with directory structure intact. which is what 
> WGet does, and CLI too.
>

Sure - but you still might need a facility to export a pub to some
other (possibly web-unaware) format, like ELML in my case
(markup targeted at eLearning scenarios - they put all of the
contents including navigational info into one file). It took me a couple
of hours to extend that to support XHTML and it works like a charm.

But yeah - I am of course also in favor of standardized hybridity! And 
in favor of the cli.

Bests
Thomas





Re: WGet static export

Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Comiotto Thomas wrote:
>>
>> on the contrary, this allows you to reuse the live pipelines very 
>> easily, without having to second-guess lenya.
>>
> 
> True - but why can't we just reuse pipelines internally?

you would still have to build the part that fetches the pages and stores 
them somewhere, with directory structure intact. which is what WGet 
does, and CLI too.
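
As a rough illustration of "the part that fetches the pages and stores them somewhere, with directory structure intact" (a sketch under assumptions, not how WGet or the Cocoon CLI is implemented): the export_pages name, the fetch callable and the sample paths are all hypothetical, and fetch stands in for whatever produces the rendered bytes, be it an HTTP request or an internal pipeline call.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def export_pages(paths: Iterable[str],
                 fetch: Callable[[str], bytes],
                 out_dir: Path) -> List[Path]:
    """Store each fetched page under out_dir, mirroring its URL path."""
    written = []
    for path in paths:
        target = out_dir / path.lstrip("/")
        # Refuse paths that would escape the export directory.
        if out_dir.resolve() not in target.resolve().parents:
            raise ValueError(f"path escapes export directory: {path}")
        # Recreate the directory part of the URL path before writing.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(path))
        written.append(target)
    return written


if __name__ == "__main__":
    # Stand-in for a real fetch (HTTP request or internal pipeline call).
    def fake_fetch(path: str) -> bytes:
        return f"<html><body>{path}</body></html>".encode("utf-8")

    with tempfile.TemporaryDirectory() as tmp:
        files = export_pages(["/index.html", "/about/team.html"],
                             fake_fetch, Path(tmp))
        for f in files:
            print(f.relative_to(tmp))
```

Whichever side of the argument one takes, this storing step is needed either way; the open question in the thread is only where the list of paths and the page bytes come from.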

-- 
Gregor J. Rothfuss
COO, Wyona       Content Management Solutions    http://wyona.com
Apache Lenya                              http://lenya.apache.org
gregor.rothfuss@wyona.com                       gregor@apache.org



Re: WGet static export

Posted by Comiotto Thomas <th...@unicom.unizh.ch>.
>
> on the contrary, this allows you to reuse the live pipelines very 
> easily, without having to second-guess lenya.
>

True - but why can't we just reuse pipelines internally?

Bests
Thomas



Re: WGet static export

Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Comiotto Thomas wrote:

> Maybe we can join forces - I'm currently using a sitemap-only based hack 
> that uses the sitetree to fetch pages because the WGet/crawling-approach 
> isn't flexible, stable and fast enough for what I need (exporting 
> user-configurable subsets of a publication, forms-based navigation). Now 
> I want to move that to cocoon CLI.
> 
> Still, in contrast to what was agreed here, crawling a publication in the
> sense of trying to find out something I already know (site structure &
> contents) doesn't really make sense to me. After all, we're talking about
> exporting and not about re-importing ourselves, aren't we?

on the contrary, this allows you to reuse the live pipelines very 
easily, without having to second-guess lenya.

-- 
Gregor J. Rothfuss
COO, Wyona       Content Management Solutions    http://wyona.com
Apache Lenya                              http://lenya.apache.org
gregor.rothfuss@wyona.com                       gregor@apache.org



Re: WGet static export

Posted by Comiotto Thomas <th...@unicom.unizh.ch>.
Hello Jean-Pierre

>> maybe that time would be better spent moving it over to cocoon cli.
>
> I'm in fact working on a static site exporter based on the cocoon
> CLI mode.  I hope to have this available soon.

Maybe we can join forces - I'm currently using a sitemap-only based 
hack that uses the sitetree to fetch pages because the 
WGet/crawling-approach isn't flexible, stable and fast enough for what 
I need (exporting user-configurable subsets of a publication, 
forms-based navigation). Now I want to move that to cocoon CLI.

Still, in contrast to what was agreed here, crawling a publication in
the sense of trying to find out something I already know (site
structure & contents) doesn't really make sense to me. After all, we're
talking about exporting and not about re-importing ourselves, aren't we?

Regards
Thomas






Re: WGet static export

Posted by Jean Pierre LeJacq <jp...@quoininc.com>.
On Fri, 10 Dec 2004, Gregor J. Rothfuss wrote:

> Hubertus Groepper wrote:
> >
> > Is http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26987 (absolute
> > path in pages of exported static site)
> > and http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26986
> > (Navigation inconsistent for export of static site)
> > and http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26708
> > (StaticHTMLExporter/WGet is not exporting css files in link tags)
> > still valid in light of comment #3:
> >     the static exporter will be replaced by the cocoon cli so it makes
> > little sense to fix this now
> > of that last bug, from Gregor on 2004-03-24?
> >
> > If yes (or unrelated), maybe this is an issue with wrong parameters
> > being passed on to WGet in j.o.a.lenya.net.WGet?
>
> not a dumb question at all. it seems unnecessary to have our own crawler
> (WGet) when Cocoon has a perfectly fine CLI mode for this purpose. thus
> my comment. if you are so inclined, you are welcome to fix WGet, but
> maybe that time would be better spent moving it over to cocoon cli.

I'm in fact working on a static site exporter based on the cocoon
CLI mode.  I hope to have this available soon.

-- 
JP





Re: WGet static export

Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Hubertus Groepper wrote:
> Hi there
> 
> Dumb question:
> Is http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26987 (absolute 
> path in pages of exported static site)
> and http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26986 
> (Navigation inconsistent for export of static site)
> and http://issues.eu.apache.org/bugzilla/show_bug.cgi?id=26708 
> (StaticHTMLExporter/WGet is not exporting css files in link tags)
> still valid in light of comment #3:
>     the static exporter will be replaced by the cocoon cli so it makes
> little sense to fix this now
> of that last bug, from Gregor on 2004-03-24?
> 
> If yes (or unrelated), maybe this is an issue with wrong parameters 
> being passed on to WGet in j.o.a.lenya.net.WGet?

not a dumb question at all. it seems unnecessary to have our own crawler
(WGet) when Cocoon has a perfectly fine CLI mode for this purpose. thus
my comment. if you are so inclined, you are welcome to fix WGet, but
maybe that time would be better spent moving it over to cocoon cli.

> I do export my site (off the live area) with
> 
>     wget -r -k -np -nH -N -p --cut-dirs=3 http://localhost:8080/lenya/default/live/index.html
> 
> and don't experience any of the aforementioned problems.

that is always an option, although not a portable one, and it does not
happen automatically when you publish (which is presumably what you'd want).

> And in my usual attempt to melange two threads into one on my road to 
> enlightenment: what's the current recommended approach to export to static?

if wget -r works fine for you, by all means use it. if you are
interested in improving lenya's own export facilities, then you should
take a look at cocoon cli.

-- 
Gregor J. Rothfuss
COO, Wyona       Content Management Solutions    http://wyona.com
Apache Lenya                              http://lenya.apache.org
gregor.rothfuss@wyona.com                       gregor@apache.org
