Posted to users@cocoon.apache.org by Bertrand Delacretaz <bd...@codeconsult.ch> on 2006/07/04 13:22:29 UTC

Re: caching RSS feeds or external xml files (WAS: Caching jx with flow)

On 7/4/06, Ard Schrijvers <a....@hippo.nl> wrote:

> ...I think it is not very hard to make an external rss/xml generator that meets these 4 points...but I am afraid that it must be in cocoon somewhere already...

A fairly effective poor man's way of doing this, as someone also
suggested recently, is to use an external cron job to fetch a feed's
XML, check that it is well-formed (in case you don't trust the backend
too much), and replace a local copy with the new version if different.

A full-blown generator is of course a cleaner way of handling this,
but if someone's interested, here's a script that I use to implement
such a scenario. I'm using xmlwf to check the XML, but many other
tools would do.

#!/bin/bash
ME=$(basename "$0")
URL=$1
FINAL_OUTPUT=$2
OUTPUT_DIR=/tmp/$ME-$$
USAGE="usage: $ME url_to_retrieve output_file (example: $ME http://somefeed.ch/rdf/ /tmp/output.xml)"

# this name is fixed by xmlwf, hence the variable OUTPUT_DIR
TEMP_FILE=$OUTPUT_DIR/STDIN

# print an error message, remove the (empty) temp dir and exit
fatal() {
    echo "$ME: $*"
    rmdir "$OUTPUT_DIR" 2>/dev/null
    exit 1
}

[[ -n "$URL" ]] || fatal "$USAGE"
[[ -n "$FINAL_OUTPUT" ]] || fatal "$USAGE"

# use wget to retrieve URL
# xmlwf checks well-formedness and, only if the input is ok, copies it
# to the output dir, using STDIN as the filename
mkdir -p "$OUTPUT_DIR"
rm -f "$TEMP_FILE"
wget -q -O- "$URL" | xmlwf -d "$OUTPUT_DIR" -c
[[ -f "$TEMP_FILE" ]] || fatal "did not get well-formed XML from $URL"
mv "$TEMP_FILE" "$FINAL_OUTPUT"
# remove the now-empty temp dir
rmdir "$OUTPUT_DIR" 2>/dev/null
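
Note that the script as posted replaces the local copy on every
successful fetch. The "replace only if different" step mentioned
earlier could look roughly like the sketch below. The function name
replace_if_changed is hypothetical and not part of the original
script, and cmp is just one way to compare the two files. Leaving an
unchanged file alone keeps its modification time, which presumably
helps anything that caches based on that timestamp.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original script):
# install NEW as TARGET only if TARGET is missing or its content differs.
replace_if_changed() {
    local new=$1 target=$2
    if [[ -f "$target" ]] && cmp -s "$new" "$target"; then
        # identical content: drop the fresh copy, keep the old file and its mtime
        rm -f "$new"
        echo "unchanged"
    else
        # first fetch or changed content: replace the local copy
        mv "$new" "$target"
        echo "updated"
    fi
}
```

In the script above, the final mv would then become something like
replace_if_changed "$TEMP_FILE" "$FINAL_OUTPUT".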


-Bertrand
