Posted to docs@httpd.apache.org by Mi...@telekurs.com on 2002/02/28 16:14:49 UTC

Re: Re: Re: [STATUS] (httpd-docs-1.3) Wed Feb 27 23:45:21 EST 2002

Hi,


>> This would at least show whether it is really just the "Hello" page or
>> whether there is a large number of pages available in translated form.
> Is this something that you would be willing to do as a start - count,
> for each language, what we have?

Um - I am not sure whether I am ready to commit myself enough to
learn how to cope with the CVS interface and the whole technical
procedure.
Currently I consider myself a listener here only, and more
committed to the mod_gzip area. It just doesn't hurt to see what's
going on here.

But where would you take this information from when you maintain
this list? The only source I currently know is the document tree
itself, which I suspect is the result of a "cvs checkout".
If there is a different information source with superior content
quality, the lines above would be subject to change.

I just had a look at the document tree. I believe we are talking
about the documents having a filename matching something similar to
(Perl regexp)
          "\w+\.html(\.\w+)?"
If so, then there might be some program traversing this document tree,
simply detecting the files having some non-empty $1 content and
counting them - is this the information you want in the list?
Maybe we need a mapping between language extension and clear-text
name for each language/encoding string, which may be just a simple
ASCII file with two columns, whitespace-separated.
We might also need some rule for how to group 'related' extensions
(like those '.ru' variants of the "Hello" page with just different
encodings - should these be handled as separate 'languages', or should
the script rather ignore the encoding?).
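
A rough sketch of what I have in mind - the document root path is
just an argument, and the decision to fold encoding variants into
their base language (e.g. counting 'ja.jis' as 'ja') is my own
assumption, not project policy:

  #!/usr/bin/perl -w
  # Count translated documents per language extension in a checkout.
  use strict;
  use File::Find;

  my $docroot = shift || '.';    # root of the checked-out document tree
  my %count;

  find(sub {
      return unless -f $_;
      # match e.g. "index.html.en" or "index.html.ja.jis"
      return unless /^\w+\.html((?:\.[\w-]+)+)$/;
      my @ext = split /\./, substr($1, 1);
      $count{ $ext[0] }++;       # fold encoding variants: keep "ja" only
  }, $docroot);

  printf "%-12s %5d\n", $_, $count{$_} for sort keys %count;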

What I intend to say is that I wouldn't want to maintain such a
table manually; I would rather generate it by running a simple
program. You might execute this one each time before sending your
mail. If this is something that would help, then I would try writing
such a script - this should not be too difficult, given exact
requirements. I am much more confident about being able to solve a
detail problem by writing a program than about committing myself to
a long-term maintenance job.

>> One more piece of information that could possibly be of value inside
>> such a list (which might then rather become a table than a list ...)
>> would be the date of the last update of each language line (i. e. the
>> date your "translation coordinator" or yourself updated the content
>> of this line for the last time) - this might help identifying
>> languages where nothing has happened for a long time.
> Well, I suppose this information could be immediately obtained from CVS.
> There's a cool little utility called cvs2cl that converts cvs log
> messages to a change log, which can be tinkered with to only see
> certain files. Or, I'm sure there are other ways to tackle this. But,
> yes, in the final analysis, this would be a manual process.

Again, if the script described above has to traverse the document
tree anyway to check the file name extensions, then
- doing a "stat()" call to retrieve the last modification date of each
  file and
- collecting the maximum date for each language
would be only a handful of code lines more.
If you have a document tree with 'meaningful' last modification dates,
then this information can be retrieved automatically. You might even
have a tool to extract a list of documents for each language, and by
comparing the modification dates of a language's version and the
English version you can easily detect documents whose translations are
outdated. Given meaningful and reliable information sources you might
get information of various types about the state of each translation,
as well as about the state of the original documentation (see below).
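
Again just a sketch - it assumes the English version is stored with
an explicit ".en" extension next to its translations, which may not
hold everywhere in the tree:

  #!/usr/bin/perl -w
  # Report the newest mtime per language and translations that are
  # older than their English counterpart.
  use strict;
  use File::Find;

  my $docroot = shift || '.';
  my (%newest, @outdated);

  find(sub {
      return unless -f $_ && /^(\w+\.html)\.([\w-]+)/;
      my ($base, $lang) = ($1, $2);
      return if $lang eq 'en';
      my $mtime = (stat $_)[9];
      $newest{$lang} = $mtime
          if !exists $newest{$lang} or $mtime > $newest{$lang};
      my $english = "$base.en";            # assumption: ".en" suffix
      push @outdated, "$File::Find::dir/$_"
          if -f $english and (stat $english)[9] > $mtime;
  }, $docroot);

  for my $lang (sort keys %newest) {
      printf "%-8s last change: %s\n",
             $lang, scalar localtime $newest{$lang};
  }
  print "outdated: $_\n" for @outdated;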

# from the DTD:
# <!ELEMENT modulesynopsis (name , status , identifier , sourcefile ,
#  compatibility? , description , summary , seealso+ , directivesynopsis+)>

The more structural information each single document contains,
the more information one might be able to retrieve from the whole
set of documents.
Although this is just about the first DTD I have ever read, I believe
I already see the equivalent of <meta name="description"> there, but
no equivalent of <meta name="keywords">, for example.
Other useful fields (for internal maintenance purposes) might be:
- date of last content change
- date of last layout change
- name of the last author having changed the content (+ mail address?)
If we had keyword definitions inside these documents, one might
well write a program to automatically generate a keyword index page,
and some search engine might even use some ranking algorithm based
upon these fields.
If we had the date of the last content change, it might be a reliable
source of the last-changed date of the document content as discussed
above, i. e. take the maximum of all date values of one language
and you automagically get the last content modification date for
this language's document subset.
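
To illustrate: the <contentdate> element below is purely an invented
placeholder, not part of the actual DTD, and a real tool should use
an XML parser rather than this regex shortcut. ISO dates have the
nice property that string comparison equals date comparison:

  #!/usr/bin/perl -w
  # Take the maximum "last content change" date per language from a
  # hypothetical <contentdate>YYYY-MM-DD</contentdate> element.
  use strict;
  use File::Find;

  my %latest;
  find(sub {
      return unless -f $_ && /\.xml(?:\.([\w-]+))?$/;
      my $lang = defined $1 ? $1 : 'en';
      open my $fh, '<', $_ or return;
      while (my $line = <$fh>) {
          next unless $line =~
              m{<contentdate>(\d{4}-\d{2}-\d{2})</contentdate>};
          $latest{$lang} = $1
              if !exists $latest{$lang} or $1 gt $latest{$lang};
      }
      close $fh;
  }, shift || '.');

  print "$_: $latest{$_}\n" for sort keys %latest;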

Maybe some of the fields mentioned above would be obsolete, as CVS
itself would provide the information (like the "last checkin date").
On the other hand, CVS (to my degree of CVS/RCS knowledge) would not
normally allow for 'types' of checkin (i. e. different handling of
content and layout changes), and it would save some userid of the
person who did the checkin, but maybe not their realname or even mail
address. (Maybe this information might be available from some other
source inside the big Apache universe.)

> I, for one, very much like having it sent as text to the mailing list,
> because of the conversation that it generates (like this note) and
> because I force myself to read it each time. If a URL was sent, I would
> not look at it, and would seldom know that there had been a change.

Well, there really is something to that. So I will now concentrate
upon whether this table can be auto-generated from reliable sources
and ASCII-formatted, so that it can be used in these mails.

> Or, more likely, I would never think to make my own changes to it.

And you shouldn't have to. I would look for changes made in the
original sources, i. e. the documents themselves - I don't like to
invest work into the maintenance of redundant data structures.

>> Another aspect might be that 3rd party modules tend to have version
>> numbers of their own, which might be worth mentioning already in the
>> texts of links leading from the index page to those documents.
>> If so, then the script mentioned by you might want to rely upon some
>> interface where to find this information. I would suggest using some
>> explicitly specified <META> tags inside the docs which could easily be
>> extracted by the script.
> Joshua, might it be useful to have a tag in the XML docs which specifies
> the latest version of the module? I can't think of any time I have ever
> needed this information, except for mod_perl and mod_php, but perhaps
> it would be useful somewhere?

I could imagine wanting this information for each and every Apache
document, just to see which documentation refers to which Apache
version.
If you have this information inside the document file, and have a
reliable change log of the module's source code telling which modules
were changed, you could get some program to automatically detect
any documentation that needs to be updated - and have this list
sorted by the number of changes since the last content update, or
by the date difference, or by whatever.
I would like to be able to automatically detect the points where
to invest time first.
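
Something along these lines, where both the documented versions and
the change list are hard-coded placeholders - in practice they would
come from a version tag in the XML docs and from the project's change
log (the string comparison only works because these fake version
numbers happen to sort lexically):

  #!/usr/bin/perl -w
  # Rank modules by the number of source changes since their
  # documentation was last updated.
  use strict;

  my %documented = ( mod_rewrite => '1.3.20', mod_include => '1.3.24' );
  my %changes    = (
      mod_rewrite => [ '1.3.22', '1.3.23', '1.3.26' ],
      mod_include => [ '1.3.26' ],
  );

  my %pending;
  for my $mod (keys %documented) {
      $pending{$mod} =
          grep { $_ gt $documented{$mod} } @{ $changes{$mod} || [] };
  }
  print "$_: $pending{$_} change(s) since last doc update\n"
      for sort { $pending{$b} <=> $pending{$a} } keys %pending;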

Maybe I am talking about tools that are only "nice to have".
But I would consider the Apache documentation a reasonably large
amount of information, and a structured one - especially if it goes
in the XML direction. The semantic model I have in mind while
writing these lines would be a relational database and SQL queries,
but I believe everything of the above would be possible with XML
just as well, and probably even with HTML documents containing
properly filled <meta> tags (i. e. "validating" to the requirements
of some "documentation maintenance tools").

I would like to discuss which information may be _worth_ being
extracted automatically from the documentation (this question may as
well be answered from the Apache user's perspective as from the
document maintainer's or even translator's perspective).
After having some "task description" of this kind, there will be
enough information to provide the necessary information by defining
appropriate DTD fields.
And finally there may be tools to extract the appropriate information
(just like having a tool that detects a new English version of some
document being committed, detects the names of all translators of the
same document - we know their file name patterns and find names and
even mail addresses inside the DTD structure - and automatically
informs them via e-mail about the change).
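
The skeleton of such a notification tool might look like this; the
changed file and the translator address are hard-coded placeholders
for what would really come from the commit and from the DTD fields:

  #!/usr/bin/perl -w
  # Mail the translators of a document whose English original changed.
  use strict;

  my $changed    = 'mod/mod_rewrite.xml';    # file seen in a new commit
  my %translator = (
      'mod/mod_rewrite.xml.ja' => 'translator@example.org',
  );

  for my $doc (grep { /^\Q$changed\E\./ } keys %translator) {
      open my $mail, '|-', '/usr/sbin/sendmail -t'
          or die "sendmail: $!";
      print $mail "To: $translator{$doc}\n",
                  "Subject: English original of $doc changed\n\n",
                  "The English original ($changed) was just updated;\n",
                  "please check whether the translation needs an update.\n";
      close $mail or warn "sendmail failed for $doc\n";
  }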

One final aspect: If we had the list of evaluations to be done for the
DTD information fields, we could derive which of these fields _must_
be filled to provide the required information - i. e. we would know
how strictly the documents must conform to the DTD requirements.
As the DTD is being made right now, maybe it's just the right time for
thinking about which tasks should be handled, and thus which
information fields would be necessary. Doing the evaluation scripts
might then be only a minor part of the work later, and could be done
individually by several programmers.


Regards, Michael



---------------------------------------------------------------------
To unsubscribe, e-mail: docs-unsubscribe@httpd.apache.org
For additional commands, e-mail: docs-help@httpd.apache.org