Posted to dev@forrest.apache.org by g4 <ja...@root10.net> on 2003/08/12 20:21:46 UTC

meta thoughts

Hi Jeff,

Looking into awk got the better of me, so I ended up going off at a 
tangent ;)

As you said, FTR. Nonetheless, I've come up with some interesting 
possibilities, so this is just to let you and the list know. Here is 
the result of a quick test using the Forrest home page:

[g4:~/awk-dev] g4%  /sw/bin/gawk-3.1.0 -f strip forrest.html
forrest 27
apache  14
website 7
documentation   7
document        7
content 7
project 6
write   5
dynamic 5
cocoon  5
status  4
skins   4
sites   4
Ended at Tue/Aug/2003 18:14:26
[g4:~/awk-dev] g4%

This is with the word length set to > 4 characters and the word frequency to > 3.

[g4:~/awk-dev] g4%  /sw/bin/gawk-3.1.0 -f strip forrest.html
forrest 27
apache  14
website 7
documentation   7
document        7
content 7
project 6
write   5
dynamic 5
cocoon  5
status  4
skins   4
sites   4
using   3
these   3
static  3
specific        3
software        3
rendered        3
projects        3
print   3
powerful        3
making  3
makes   3
formats 3
focus   3
Ended at Tue/Aug/2003 18:36:06

And this is with the word frequency set to > 2.
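For anyone wanting to try something similar, here is a minimal sketch of this kind of count in plain POSIX awk, wrapped in a shell function. It is not the actual "strip" script, which isn't shown here; the function name word_freq and its arguments are hypothetical. It crudely strips tags, lowercases the text, splits on anything that isn't a letter, keeps words longer than MINLEN that occur more than MINFREQ times, and sorts by count.

```sh
#!/bin/sh
# word_freq: hypothetical helper, NOT the original "strip" script.
# Usage: word_freq MINLEN MINFREQ [FILE...]   (reads stdin if no FILE)
# Prints "word count" for words longer than MINLEN characters that occur
# more than MINFREQ times, most frequent first (ties in alphabetical order).
word_freq() {
    minlen=$1
    minfreq=$2
    shift 2
    awk -v minlen="$minlen" -v minfreq="$minfreq" '
    {
        # Crude tag stripping: drop anything between < and > on one line.
        gsub(/<[^>]*>/, " ")
        $0 = tolower($0)
        # Turn every run of non-letters into a single space (re-splits fields).
        gsub(/[^a-z]+/, " ")
        for (i = 1; i <= NF; i++)
            if (length($i) > minlen)
                count[$i]++
    }
    END {
        for (w in count)
            if (count[w] > minfreq)
                print w, count[w]
    }' "$@" | sort -k2,2nr -k1,1
}
```

For example, word_freq 4 3 forrest.html uses the same thresholds as the first run above. The counts may not match exactly, since this tag stripping is crude (tags split across lines are not handled).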

As you can see, there are some words that shouldn't be there (these, 
makes, etc.). So I don't think managing keywords by frequency is really 
the way to go with something like this; a definitive list of excluded 
words would be needed instead. That would also have the benefit of 
being accessible and manageable. I will continue with this anyway; at 
least I'm getting to know awk ;)
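A rough sketch of the excluded-words idea, again in POSIX awk and again hypothetical (keywords and stopwords.txt are made-up names): the stop list lives in its own file, one word per line, so it can be edited without touching the script. Any word on the list is dropped before counting, and there is no frequency cut-off at all.

```sh
#!/bin/sh
# keywords: hypothetical helper sketching the excluded-words approach.
# Usage: keywords STOPFILE MINLEN [FILE...]   (reads stdin if no FILE)
# STOPFILE holds one excluded word per line. Prints "word count" for every
# word longer than MINLEN characters that is not in STOPFILE, most frequent
# first (ties in alphabetical order).
keywords() {
    stopfile=$1
    minlen=$2
    shift 2
    # Read stdin explicitly when no input files are given; otherwise awk
    # would stop after reading the stop list.
    if [ $# -eq 0 ]; then
        set -- -
    fi
    awk -v stopfile="$stopfile" -v minlen="$minlen" '
    # Lines from the stop list: remember each word, lowercased.
    FILENAME == stopfile { stop[tolower($1)] = 1; next }
    {
        # Crude tag stripping: drop anything between < and > on one line.
        gsub(/<[^>]*>/, " ")
        $0 = tolower($0)
        # Turn every run of non-letters into a single space (re-splits fields).
        gsub(/[^a-z]+/, " ")
        for (i = 1; i <= NF; i++)
            if (length($i) > minlen && !($i in stop))
                count[$i]++
    }
    END {
        for (w in count)
            print w, count[w]
    }' "$stopfile" "$@" | sort -k2,2nr -k1,1
}
```

Something like keywords stopwords.txt 4 forrest.html | head -20 would then give the top candidates, with "these", "makes" and friends kept out by the list rather than by a frequency threshold.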

Jason Lane