You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Bhuvaneswaran Arumugam <bh...@collab.net> on 2006/07/18 09:22:18 UTC

[PATCH] Atom 1.0 feeds for subversion

Hello,

Based on the discussion [1] i have written a new python script to
generate Atom 1.0 feeds for commit information. It uses xml dom library
to generate the feeds.

[1] http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=116571

Please find the attached patch.
TODO: Implement a scheme to insert the items to the file if it already
exists (using pickle?) and make use of --max-items option.

[[
* contrib/hook-scripts/svn2atom.py
  New script to generate Atom 1.0 feeds for commit information. It uses
  xml dom library to generate the feeds

* www/tools_contrib.html
  Include the documentation for contrib/hook-scripts/svn2atom.py script
]]

-- 
Regards,
Bhuvaneswaran

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Bhuvaneswaran Arumugam <bh...@collab.net>.
> I haven't run this, but I betcha the test case is, "Run svn2atom.py from
> inside the repository directory, and specify it as '.'" or somesuch.  If the
> code, is I suspect, nearly identical to that of svn2rss.py, then the line
> which reads:
> 
>    % (os.path.basename(self.repos_path))
> 
> Should actually be:
> 
>    % os.path.basename(os.path.abspath(self.repos_path)))

Yes, i can repeat the problem with this test case. The above statement
just fixes the issue!

-- 
Regards,
Bhuvaneswaran

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Max Bowsher <ma...@ukf.net>.
Malcolm Rowe wrote:
> On Wed, Jul 19, 2006 at 08:55:21AM -0700, C. Michael Pilato wrote:
>>>> In my testing, the script generated a file called '.atom', not
>>>> REPOS_NAME.atom.
>>>
>>> I'm unable to repeat this issue. Can you please provide the test case ?
>> I haven't run this, but I betcha the test case is, "Run svn2atom.py from
>> inside the repository directory, and specify it as '.'" or somesuch.  If the
>> code, is I suspect, nearly identical to that of svn2rss.py, then the line
>> which reads:
>>
>>    % (os.path.basename(self.repos_path))
>>
>> Should actually be:
>>
>>    % os.path.basename(os.path.abspath(self.repos_path)))
>>
> 
> Actually, it's caused by a trailing slash on the repo path:
> 
> $ ./svn2atom.py ~/svn/sample-repo/
> creates a file called '.atom', while
> $ ./svn2atom.py ~/svn/sample-repo
> creates 'sample-repo.atom'.
> 
> Will the above fix work?  (I'm not sure if Max has already fixed this
> in trunk's svn2rss - there was something like this that just went in).


I haven't been paying attention to this bug, I've been working solely on
getting things into a sane state to merge Atom support.

If I've fixed anything like this, it was accidental.

Max.


Re: [PATCH] Atom 1.0 feeds for subversion

Posted by "C. Michael Pilato" <cm...@collab.net>.
Bhuvaneswaran wrote:
>>Actually, it's caused by a trailing slash on the repo path:
>>
>>$ ./svn2atom.py ~/svn/sample-repo/
>>creates a file called '.atom', while
>>$ ./svn2atom.py ~/svn/sample-repo
>>creates 'sample-repo.atom'.
>>
>>Will the above fix work?  (I'm not sure if Max has already fixed this
>>in trunk's svn2rss - there was something like this that just went in).
>
>
> It was fixed by Mike in r20750.

I think it was a combination of fixes -- my abspath and Max's (?)
canonicalization -- that fixed this.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Bhuvaneswaran Arumugam <bh...@collab.net>.
> Actually, it's caused by a trailing slash on the repo path:
> 
> $ ./svn2atom.py ~/svn/sample-repo/
> creates a file called '.atom', while
> $ ./svn2atom.py ~/svn/sample-repo
> creates 'sample-repo.atom'.
> 
> Will the above fix work?  (I'm not sure if Max has already fixed this
> in trunk's svn2rss - there was something like this that just went in).

It was fixed by Mike in r20750.
-- 
Regards,
Bhuvaneswaran

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Jul 19, 2006 at 08:55:21AM -0700, C. Michael Pilato wrote:
> >>In my testing, the script generated a file called '.atom', not
> >>REPOS_NAME.atom.
> > 
> > 
> > I'm unable to repeat this issue. Can you please provide the test case ?
> 
> I haven't run this, but I betcha the test case is, "Run svn2atom.py from
> inside the repository directory, and specify it as '.'" or somesuch.  If the
> code, is I suspect, nearly identical to that of svn2rss.py, then the line
> which reads:
> 
>    % (os.path.basename(self.repos_path))
> 
> Should actually be:
> 
>    % os.path.basename(os.path.abspath(self.repos_path)))
> 

Actually, it's caused by a trailing slash on the repo path:

$ ./svn2atom.py ~/svn/sample-repo/
creates a file called '.atom', while
$ ./svn2atom.py ~/svn/sample-repo
creates 'sample-repo.atom'.

Will the above fix work?  (I'm not sure if Max has already fixed this
in trunk's svn2rss - there was something like this that just went in).

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by "C. Michael Pilato" <cm...@collab.net>.
>>In my testing, the script generated a file called '.atom', not
>>REPOS_NAME.atom.
> 
> 
> I'm unable to repeat this issue. Can you please provide the test case ?

I haven't run this, but I betcha the test case is, "Run svn2atom.py from
inside the repository directory, and specify it as '.'" or somesuch.  If the
code, is I suspect, nearly identical to that of svn2rss.py, then the line
which reads:

   % (os.path.basename(self.repos_path))

Should actually be:

   % os.path.basename(os.path.abspath(self.repos_path)))


-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Bhuvaneswaran Arumugam <bh...@collab.net>.
Malcolm, Thanks for your review!
I have made necessary changes and i'll attach the patch shortly in next
message. Please see my comments below which will be reflected in my
patch.

> * The usage message could be a lot clearer.  I couldn't determine what
>   the effect of some of the switches would be without using them.

Ok. I have added the enough document to such messages.

> * The output Atom file declares the XHTML namespace, but doesn't use it.

Yeah, you are correct. Henceforth i don't declare the XHTML namespace.

> * There's a fair bit of unnecessary trailing whitespace in the file.

I have corrected them.

> In my testing, the script generated a file called '.atom', not
> REPOS_NAME.atom.

I'm unable to repeat this issue. Can you please provide the test case ?

> The generated URL for each item is ITEM_URL?rev=N.  That's a bit odd
> (it assumes a particular URL scheme), but if we decide to keep it,
> we should document it here.

Yes, i've now documented.

> Atom feeds require both a feed URL and item URL - or perhaps you can drop
> the feed URL if every item has a URL, I'm not exactly sure.  Whichever it
> is, there's no point in allowing the user to be able to generate invalid
> Atom feeds (with URL's such as 'None', as currently happens).

The feed URL and item URL are required in the Atom feed. The domain part
of the feed URL and item URL may be different, so, we can't drop the
feed URL. If the user don't pass the feed URL or item URL, it is set as
"".

> * The generated feeds don't pass the feed validator[1], since they lack an
>   atom:link entry with rel="self".

Yeah, you are correct. But i guess, it is just a warning message.
Moreover, as per Atom syndication format specified in [1], this entry is
optional. Anyhow, i tried to add this entry with 'rel="self"' attribute,
but still the warning continues to appear. I'm unsure howto overcome
this warning message.

> Finally, I needed to apply the attached patch to get this to work
> at all.  I'm assuming you're using something outside of the standard
> Python libraries for XML pretty-printing, so I just disabled that bit.
> I think you should also be using a different method to get at the
> DOMImplementation object (see the patch).

Thanks for the patch. I've incorporated it in my patch.

[1] http://www.atomenabled.org/developers/syndication/#link , Atom
syndication format - Introduction
-- 
Regards,
Bhuvaneswaran

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Tue, Jul 18, 2006 at 02:52:18PM +0530, Bhuvaneswaran Arumugam wrote:
> Hello,
> 
> Based on the discussion [1] i have written a new python script to
> generate Atom 1.0 feeds for commit information. It uses xml dom library
> to generate the feeds.
> 

Nice!  Here's another quick review, though not so much focussed on the
implementation, since my Python's not that hot.

Some quick notes:

* The usage message could be a lot clearer.  I couldn't determine what
  the effect of some of the switches would be without using them.

* The output Atom file declares the XHTML namespace, but doesn't use it.

* There's a fair bit of unnecessary trailing whitespace in the file.

> + -f, --atom-file=PATH   Store the ATOM feed in the file located at PATH,
> +                        which will be created if it doesn't already exist.  
> +                        If not provided, the script will store the feed in 
> +                        the current working directory, in a file named
> +                        REPOS_NAME.atom (where REPOS_NAME is the basename
> +                        of the REPOS_PATH command-line argument). 
> + 

In my testing, the script generated a file called '.atom', not
REPOS_NAME.atom.

> + -u, --item-url=URL     Use URL as the basis for generating ATOM item links.
> + 

The generated URL for each item is ITEM_URL?rev=N.  That's a bit odd
(it assumes a particular URL scheme), but if we decide to keep it,
we should document it here.

> + -U, --feed-url=URL     Use URL as the global ATOM feed link.
> +

Atom feeds require both a feed URL and item URL - or perhaps you can drop
the feed URL if every item has a URL, I'm not exactly sure.  Whichever it
is, there's no point in allowing the user to be able to generate invalid
Atom feeds (with URL's such as 'None', as currently happens).


* The generated feeds don't pass the feed validator[1], since they lack an
  atom:link entry with rel="self".


Finally, I needed to apply the attached patch to get this to work
at all.  I'm assuming you're using something outside of the standard
Python libraries for XML pretty-printing, so I just disabled that bit.
I think you should also be using a different method to get at the
DOMImplementation object (see the patch).

(Okay, I don't think I really needed to remove the StringIO import,
but it didn't seem to be used either).

Regards,
Malcolm

[1] http://feedvalidator.org/, passing of which should be a prerequisite to
    checking this in, I feel.

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Bhuvaneswaran Arumugam <bh...@collab.net>.
> > Yes, i see you have incorporated all the changes and *more* than that.
> > It looks to be fully complete! Thank you very much for spending your
> > time on this!
> 
> Not _quite_ complete. There is still the issue of removing old items
> from atom feeds.

Yes and we have explicitly mentioned this.

> > Not really. Anyhow, the revision date is already part of the feed
> > content. So, to avoid redundant data we can retain as it is now.
> 
> Many feed readers have a separate display for "Date" apart from the feed
> content. I think we should put the revision date into the actual XML
> metadata, and stop putting the date into the feed summary at all.

I guess almost all feed readers have a separate display for "Date".
Infact, couple of feed readers (ex: sage, a firefox plugin. i'm unsure
about other feed readers) render the feeds based on this date field,
irrespective of the order of the item in the XML file. So, i guess, if
we start putting the revision date in the actual XML metadata, we will
be sure the latest revision information is always displayed on top.

IMO, it will be a good thing to do.

> Oops. First, refactored the code to use append() to replace the use of
> lots of local variables.
> 
> Then, later, I came back and refactored my code again to use string
> interpolation directly from the pipe results. Sorry for the confusion.

Yeah, it happens sometimes. No problem!

> The RSS generator puts new items at the *start* of the XML document.
> 
> The new Atom generator however, puts new items at the *end* of the XML
> document.
> 
> Does this matter?

As stated above, it seems to be depend on feed readers. Anyhow, the
following patch may address this:

     def add_revision_item(self, revision):
         item = self._make_atom_item(revision)
-        self.feed.appendChild(item)
+        self.feed.insertBefore(item, self.feed.firstChild)
         # FIXME: Process max_items

     def write_output(self):
-- 
Regards,
Bhuvaneswaran

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Max Bowsher <ma...@ukf.net>.
Bhuvaneswaran Arumugam wrote:
>>> This script is almost inherited from svn2rss.py. So, all the
>>> corrections/changes we do here must be done in that script as well.
>> Ah, I *see*. In that case we must merge the code into a unified feed
>> producing script. To do otherwise is a severe maintenance headache.
>>
>> So, that's what I have done. Please review the latest version of
>> svn2rss.py, in which I think I have incorporated all your changes,
>> though in substantially different form.
> 
> Yes, i see you have incorporated all the changes and *more* than that.
> It looks to be fully complete! Thank you very much for spending your
> time on this!

Not _quite_ complete. There is still the issue of removing old items
from atom feeds.

>>>>> +        doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))
>>>> Now? Shouldn't the time be taken from the revision's svn:date?
>>> I guess it should be the entry published/updated date instead of
>>> revision date.
>> Is it required to be so for feed readers to work properly? Otherwise, I
>> think it might make more sense to use the revision date.
> 
> Not really. Anyhow, the revision date is already part of the feed
> content. So, to avoid redundant data we can retain as it is now.

Many feed readers have a separate display for "Date" apart from the feed
content. I think we should put the revision date into the actual XML
metadata, and stop putting the date into the feed summary at all.

>>>> desc_lines.append('')
>>>> item_desc = "\n".join(desc_lines)
>>> Yes, i have incorporated this change.
>> Actually, I meant, use this to replace the use of individual variables
>> entirely  - have a look at my version to see what I meant.
> 
> I see you no more use append() function.

Oops. First, refactored the code to use append() to replace the use of
lots of local variables.

Then, later, I came back and refactored my code again to use string
interpolation directly from the pipe results. Sorry for the confusion.



One additional thing I noticed, but forgot to mention in a review until now:

The RSS generator puts new items at the *start* of the XML document.

The new Atom generator however, puts new items at the *end* of the XML
document.

Does this matter?

Max.


Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Bhuvaneswaran Arumugam <bh...@collab.net>.
> > This script is almost inherited from svn2rss.py. So, all the
> > corrections/changes we do here must be done in that script as well.
> 
> Ah, I *see*. In that case we must merge the code into a unified feed
> producing script. To do otherwise is a severe maintenance headache.
> 
> So, that's what I have done. Please review the latest version of
> svn2rss.py, in which I think I have incorporated all your changes,
> though in substantially different form.

Yes, i see you have incorporated all the changes and *more* than that.
It looks to be fully complete! Thank you very much for spending your
time on this!

> >>> +        doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))
> >> Now? Shouldn't the time be taken from the revision's svn:date?
> > 
> > I guess it should be the entry published/updated date instead of
> > revision date.
> 
> Is it required to be so for feed readers to work properly? Otherwise, I
> think it might make more sense to use the revision date.

Not really. Anyhow, the revision date is already part of the feed
content. So, to avoid redundant data we can retain as it is now.

> >> desc_lines.append('')
> >> item_desc = "\n".join(desc_lines)
> > 
> > Yes, i have incorporated this change.
> 
> Actually, I meant, use this to replace the use of individual variables
> entirely  - have a look at my version to see what I meant.

I see you no more use append() function.

> The point was to avoid creating a range at all - your code really just
> did what range() does, but manually.
> 
> Since this is pre-existing code in svn2rss.py, we can leave it and tidy
> it up later.

Ok!
-- 
Regards,
Bhuvaneswaran

Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Max Bowsher <ma...@ukf.net>.
Bhuvaneswaran Arumugam wrote:
> This script is almost inherited from svn2rss.py. So, all the
> corrections/changes we do here must be done in that script as well.

Ah, I *see*. In that case we must merge the code into a unified feed
producing script. To do otherwise is a severe maintenance headache.

So, that's what I have done. Please review the latest version of
svn2rss.py, in which I think I have incorporated all your changes,
though in substantially different form.

>>>> TODO: Implement a scheme to insert the items to the file if it already
>>>> exists (using pickle?) and make use of --max-items option.

Implementing insertion turned out to be really easy, so I just did it.

For now, item removal is NOT implemented, and needs to be. It shouldn't
be too difficult, but I do not know DOM very well, so I left it for now.

>>> +        self.init_output()
>>> +        self.add_header()
>> Why are these not one combined function?
> 
> We use it only here. So, IMO one combined function is unnecessary.

It seems very weird to have two functions which are only ever invoked
one after the other. Therefore, I have merged them.

>>> +        self.feed.setAttribute("xml:lang", "en")
>> How can you be certain of this? Better to leave it blank than set it to
>> a possibly incorrect value.
> 
> I have corrected it.

Actually, it was still in your second version of the patch. I dropped it.

>>> +        self.author.appendChild(self.aname)
>>> +        self.aname.appendChild(doc.createTextNode("subversion"))
>> Is it really necessary for all these nodes to be member variables?
> 
> These entries are necessary for an Atom feed. If i get you wrong, can
> you please elaborate?

The tags are required, yes, but they can be local variables of the
function. They do not need to be member variables of the Svn2Atom instance.

>>> +        doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))
>> Now? Shouldn't the time be taken from the revision's svn:date?
> 
> I guess it should be the entry published/updated date instead of
> revision date.

Is it required to be so for feed readers to work properly? Otherwise, I
think it might make more sense to use the revision date.

>> desc_lines.append('')
>> item_desc = "\n".join(desc_lines)
> 
> Yes, i have incorporated this change.

Actually, I meant, use this to replace the use of individual variables
entirely  - have a look at my version to see what I meant.

>>> +            revisions = range(start, end + 1)[-max_items:]
>> range() creates an explicit list. Nasty things will happen if the user
>> specifies a large revision range here. Rewrite to not use range(), by
>> simply using the start and end variables to control a loop variable
>> directly.
> 
> I have incorporated this change.

The point was to avoid creating a range at all - your code really just
did what range() does, but manually.

Since this is pre-existing code in svn2rss.py, we can leave it and tidy
it up later.

Max.



Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Bhuvaneswaran Arumugam <bh...@collab.net>.
Max, thanks for your review! Please find attached the revised patch and
see my comments below.

> > TODO: Implement a scheme to insert the items to the file if it already
> > exists (using pickle?) and make use of --max-items option.
> 
> If the file *only* contains Subversion commit entries, then all the data
> required to reproduce old items is in the Subversion repository, so
> there should be no need to pickle, or re-read the file.

Yep. But, the Atom feed should be updated with recent revisions on
periodic basis. For example, when the feed contains revision 5 .. 1 and
if we run the script, the feed should be updated to have revisions 6,
5 ... 1. Please refer svn2rss.py for more details.

This script is almost inherited from svn2rss.py. So, all the
corrections/changes we do here must be done in that script as well.

> 'Python xml.dom module' explains what you mean more clearly.
> 
> Also, missing '.' at end of sentences.

[[
Patch by: Bhuvaneswaran Arumugam <bh...@collab.net>
Tweaked by: malcolm
Reviewed by: malcolm
             maxb

* contrib/hook-scripts/svn2atom.py
  New script to generate Atom 1.0 feeds for commit information. It uses
  python xml.dom module to generate the feeds.

* www/tools_contrib.html
  Include the documentation for svn2atom.py script.
]]

> ATOM or Atom? Find out which is officially correct, and use it exclusively.

In all reference sites (ex: [1], [2]) they use Atom and never use ATOM.
So, i'm assuming it is officially correct and use it.

> Explain what happens if the file already exists. It will be overwritten?

Ok. I have addressed it.

> If keeping of old items isn't implemented yet, then the above comment is
>  wrong.
> 
> If you don't want to implement keeping of old items in the initial
> version, that is fine, but the documentation must reflect that. To put
> it another way: missing features are acceptable. Bugs are not.

Yeah, for now, i have not implemented the method to keep old items.

> > + -u, --item-url=URL     Use URL as the basis for generating ATOM item links.
> 
> It's unclear what you are supposed to use this for without reading the code.
> 
> > + -U, --feed-url=URL     Use URL as the global ATOM feed link.
> 
> Ditto.

I have added more description.

> > + -P, --svn-path=DIR     Look in DIR for the svnlook binary.  If not provided,
> > +                        the script will run "svnlook" via a typical $PATH hunt.
> 
> Say: "svnlook" must be on the PATH.
> Not: the script will run "svnlook" via a typical $PATH hunt.
> 
> because the old version implies that the script is hunting though PATH
> itself.

Yeah, i have corrected it.

> 
> > +"""
> > +
> > +import sys
> > +
> > +# All clear on the custom module checks.  Import some standard stuff.
> 
> Wrong comment. There were no custom module checks. Also "Import some
> standard stuff." is entirely self evident. Delete the comment.

I have corrected it.

> Please use PEP 8 style <http://www.python.org/dev/peps/pep-0008/>. In
> this case, that means each of the above imports gets a separate line.

Thanks for the link. Now, each import statement gets a separate line!

> > +    stream = errmsg is not None and sys.stderr or sys.stdout
> 
> This feels too unreadable for me, I'd prefer an explicit "if" statement.

I have incorporated this change.

> No. What if someone wants to put a svn:// URL in there? Or an ftp://
> one? Or even a relative URL?
> 
> I think this check should be removed.

The URL we are talking here is an Atom feed URL. It could be an http://,
https:// or file:// based URL. This value cannot be a svn:// or ftp://
based URL. So, we need not remove this check.

> > +class SVN2ATOM:
> 
> PEP 8 ==> Svn2Atom

I have corrected it.

> > +        self.init_output()
> > +        self.add_header()
> 
> Why are these not one combined function?

We use it only here. So, IMO one combined function is unnecessary.

> > +        self.feed.appendChild(doc.createTextNode("\n  "))
> > +        self.feed.setAttribute("xml:lang", "en")
> 
> How can you be certain of this? Better to leave it blank than set it to
> a possibly incorrect value.

I have corrected it.

> > +        self.title.appendChild(doc.createTextNode("%s SVN Commit's Feed" % \
> 
> I would prefer 'Subversion' in place of 'SVN'.
> Incorrect use of apostrophe. Remove.
> Do not use redundant backslashes.

I have corrected it.

> > +        self.id.appendChild(doc.createTextNode("%s" % self.feed_url or ""))
> 
> What's with the 'or ""' ?

It's intended to store "" in case self.feed_url is None. I have
corrected it.

> If you are going to require Python 2.3, then you must state that at the
> top of the script.

Now, I have stated it at the top of the script.

> > +        self.author.appendChild(self.aname)
> > +        self.aname.appendChild(doc.createTextNode("subversion"))
> 
> Is it really necessary for all these nodes to be member variables?

These entries are necessary for an Atom feed. If i get you wrong, can
you please elaborate?

> > +              self.item_url + "?rev=%d" % self.revision or ""
> 
> Hardcoding of ?rev= is not reasonable. Arrange for there to be a marker
> in item_url which you replace.

As i mentioned earlier, this script is almost inherited from svn2rss.py
script. I should come up with such marker for both these scripts and
i'll take it separately.

> > +        doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))
> 
> Now? Shouldn't the time be taken from the revision's svn:date?

I guess it should be the entry published/updated date instead of
revision date.

> > +        cmd = "%s info -r %s %s" % (self.svnlook_cmd, self.revision, self.repos_path)
> 
> Use list style commands, as you do later in main(), to avoid shell
> metacharacter issues.

Yes, i have incorporated this change.

> > +        out, x, y = popen2.popen3(cmd)
> 
> x? y? Yuck, use clear variable names.

Yes, i'm using clear variable names now!

> But don't the lines already have a newline attached?

Yes, they do have a newline attached. So, i have corrected it.

> Weird capitalization. Local variables should be lower-case.

Yes, i have corrected it.

> > +            changed_files = changed_files + item
> 
> +=

Yes, i have incorporated this change. Thanks to madan for clarifying
this comment :)

> desc_lines.append('')
> item_desc = "\n".join(desc_lines)

Yes, i have incorporated this change.

> > +    atom_file = os.path.basename(repos_path) + ".atom"
> 
> Is this a normal file extension convention for atom files?

Yes, as per [1] this is the normal file extension convention for atom
files.

> > +    if (commit_rev == None):
> 
> Un-Pythonic parentheses.
> 
> It is conventional to test "is None", not "== None".

Yes, i have incorporated this change.

> > +        except ValueError, msg:
> > +            usage_and_exit("svn2atom.py: Invalid value '%s' for --revision." \
> > +                           % (commit_rev))
> 
> What's this supposed to catch, exactly? I cannot think of any
> circumstances in which a ValueError would be raised here.

Earlier, the 'except' was not in correct loop. I have corrected it and
it is intended to handle invalid revision values (ex: 1:a).

> > +            revisions = range(start, end + 1)[-max_items:]
> 
> range() creates an explicit list. Nasty things will happen if the user
> specifies a large revision range here. Rewrite to not use range(), by
> simply using the start and end variables to control a loop variable
> directly.

I have incorporated this change.

[1]
http://en.wikipedia.org/wiki/Atom_(standard)#Atom_1.0_and_IETF_Standardization
[2] http://www.atomenabled.org/developers/syndication/
-- 
Regards,
Bhuvaneswaran


Re: [PATCH] Atom 1.0 feeds for subversion

Posted by Max Bowsher <ma...@ukf.net>.
Bhuvaneswaran Arumugam wrote:
> Based on the discussion [1] i have written a new python script to
> generate Atom 1.0 feeds for commit information. It uses xml dom library
> to generate the feeds.
> 
> [1] http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=116571
> 
> Please find the attached patch.

Thanks! Here's a skim-through review.

> TODO: Implement a scheme to insert the items to the file if it already
> exists (using pickle?) and make use of --max-items option.

If the file *only* contains Subversion commit entries, then all the data
required to reproduce old items is in the Subversion repository, so
there should be no need to pickle, or re-read the file.

> [[
> * contrib/hook-scripts/svn2atom.py
>   New script to generate Atom 1.0 feeds for commit information. It uses
>   xml dom library to generate the feeds

'Python xml.dom module' explains what you mean more clearly.

Also, missing '.' at end of sentences.

> * www/tools_contrib.html
>   Include the documentation for contrib/hook-scripts/svn2atom.py script
> ]]
> 
> Index: contrib/hook-scripts/svn2atom.py
> ===================================================================
> --- contrib/hook-scripts/svn2atom.py	(revision 0)
> +++ contrib/hook-scripts/svn2atom.py	(revision 0)
> @@ -0,0 +1,284 @@
> +#!/usr/bin/env python
> +
> +"""Usage: svn2atom.py [OPTION...] REPOS-PATH
> +
> +Generate an ATOM 1.0 feed file containing commit information for the

ATOM or Atom? Find out which is officially correct, and use it exclusively.

> +Subversion repository located at REPOS-PATH.  The item title is the 
> +revision number, and the item description contains the author, date, 
> +log messages and changed paths.
> +
> +Options:
> +
> + -h, --help             Show this help message.
> + 
> + -f, --atom-file=PATH   Store the ATOM feed in the file located at PATH,
> +                        which will be created if it doesn't already exist.  
> +                        If not provided, the script will store the feed in 
> +                        the current working directory, in a file named
> +                        REPOS_NAME.atom (where REPOS_NAME is the basename
> +                        of the REPOS_PATH command-line argument). 

Explain what happens if the file already exists. It will be overwritten?

> + -r, --revision=X[:Y]   Subversion revision (or revision range) to generate
> +                        ATOM info for.  If not provided, info for the single
> +                        youngest revision in the repository will be generated.
> + 
> + -m, --max-items=N      Keep only N items in the ATOM feed file.  By default,
> +                        20 items are kept.

If keeping of old items isn't implemented yet, then the above comment is
 wrong.

If you don't want to implement keeping of old items in the initial
version, that is fine, but the documentation must reflect that. To put
it another way: missing features are acceptable. Bugs are not.

> + -u, --item-url=URL     Use URL as the basis for generating ATOM item links.

It's unclear what you are supposed to use this for without reading the code.

> + -U, --feed-url=URL     Use URL as the global ATOM feed link.

Ditto.

> + -P, --svn-path=DIR     Look in DIR for the svnlook binary.  If not provided,
> +                        the script will run "svnlook" via a typical $PATH hunt.

Say: "svnlook" must be on the PATH.
Not: the script will run "svnlook" via a typical $PATH hunt.

because the old version implies that the script is hunting though PATH
itself.

> +"""
> +
> +import sys
> +
> +# All clear on the custom module checks.  Import some standard stuff.

Wrong comment. There were no custom module checks. Also "Import some
standard stuff." is entirely self evident. Delete the comment.

> +import getopt, os, popen2, datetime

Please use PEP 8 style <http://www.python.org/dev/peps/pep-0008/>. In
this case, that means each of the above imports gets a separate line.

> +from StringIO import StringIO
> +from xml.dom.DOMImplementation import implementation
> +import xml.utils
> +
> +def usage_and_exit(errmsg=None):
> +    """Print a usage message, plus an ERRMSG (if provided), then exit.
> +    If ERRMSG is provided, the usage message is printed to stderr and
> +    the script exits with a non-zero error code.  Otherwise, the usage
> +    message goes to stdout, and the script exits with a zero
> +    errorcode."""

> +    stream = errmsg is not None and sys.stderr or sys.stdout

This feels too unreadable for me, I'd prefer an explicit "if" statement.

> +    print >> stream, __doc__
> +    if errmsg:
> +        print >> stream, "\nError: %s" % (errmsg)
> +        sys.exit(2)
> +    sys.exit(0)
> +
> +def check_url(url, opt):
> +    """Verify that URL looks like a valid URL or option OPT."""
> +    if not (url.startswith('https://') \
> +            or url.startswith('http://') \
> +            or url.startswith('file://')):
> +      usage_and_exit("svn2atom.py: Invalid url '%s' is specified for " \
> +                     "'%s' option" % (url, opt))

No. What if someone wants to put a svn:// URL in there? Or an ftp://
one? Or even a relative URL?

I think this check should be removed.

> +class SVN2ATOM:

PEP 8 ==> Svn2Atom

> +    def __init__(self, svn_path, repos_path, item_url, atom_file, max_items, 
> +                 feed_url):
> +        self.repos_path = repos_path
> +        self.item_url = item_url
> +        self.atom_file = atom_file
> +        self.max_items = max_items
> +        self.feed_url = feed_url
> +        self.svnlook_cmd = 'svnlook'
> +        if svn_path is not None:
> +            self.svnlook_cmd = os.path.join(svn_path, 'svnlook')
> +
> +        self.init_output()
> +        self.add_header()

Why are these not one combined function?

> +    def init_output(self):
> +        """ initialize atom output """
> +        self.document = implementation.createDocument(None,None,None)
> +        self.feed = self.document.createElement("feed")
> +        self.document.appendChild(self.feed)
> +
> +    def add_header(self):
> +        """ add atom xml header """
> +
> +        doc = self.document
> +        self.feed.appendChild(doc.createTextNode("\n  "))
> +        self.feed.setAttribute("xml:lang", "en")

How can you be certain of this? Better to leave it blank than set it to
a possibly incorrect value.

> +        self.feed.setAttribute("xmlns", "http://www.w3.org/2005/Atom")
> +        self.feed.setAttribute("xmlns:xh", "http://www.w3.org/1999/xhtml")
> +
> +        self.title = self.document.createElement("title")
> +        self.feed.appendChild(self.title)
> +        self.title.appendChild(doc.createTextNode("%s SVN Commit's Feed" % \

I would prefer 'Subversion' in place of 'SVN'.
Incorrect use of apostrophe. Remove.
Do not use redundant backslashes.

> +        (os.path.basename(self.repos_path))))
> +
> +        self.id = self.document.createElement("id")
> +        self.feed.appendChild(self.id)
> +        self.id.appendChild(doc.createTextNode("%s" % self.feed_url or ""))

What's with the 'or ""' ?

> +        self.updated = self.document.createElement("updated")
> +        self.feed.appendChild(self.updated)
> +        self.updated.appendChild(\
> +        doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))

If you are going to require Python 2.3, then you must state that at the
top of the script.

> +        self.link = self.document.createElement("link")
> +        self.feed.appendChild(self.link)
> +        self.link.setAttribute("href", self.feed_url or "")
> +
> +        self.author = self.document.createElement("author")
> +        self.feed.appendChild(self.author)
> +        self.aname = self.document.createElement("name")
> +        self.author.appendChild(self.aname)
> +        self.aname.appendChild(doc.createTextNode("subversion"))

Is it really necessary for all these nodes to be member variables?

> +
> +    def add_entry(self, revision):
> +        """ add new atom entry for a revision """
> +        self.revision = revision
> +        url = self.item_url and \
> +              self.item_url + "?rev=%d" % self.revision or ""

Hardcoding of ?rev= is not reasonable. Arrange for there to be a marker
in item_url which you replace.

> +        doc = self.document
> +        self.entry = self.document.createElement("entry")
> +        self.feed.appendChild(self.entry)
> +        self.entry.appendChild(doc.createTextNode("\n  "))
> +
> +        self.id = self.document.createElement("id")
> +        self.entry.appendChild(self.id)
> +        self.id.appendChild(doc.createTextNode("%s" % url))
> +
> +        self.title = self.document.createElement("title")
> +        self.entry.appendChild(self.title)
> +        self.title.appendChild(doc.createTextNode(\
> +          "Revision %s" % self.revision))
> +
> +        self.updated = self.document.createElement("updated")
> +        self.entry.appendChild(self.updated)
> +        self.updated.appendChild(\
> +        doc.createTextNode("%s" % self.format_date(datetime.datetime.now())))

Now? Shouldn't the time be taken from the revision's svn:date?

> +        self.link = self.document.createElement("link")
> +        self.entry.appendChild(self.link)
> +        self.link.setAttribute("href", url)
> +
> +        self.summary = self.document.createElement("summary")
> +        self.entry.appendChild(self.summary)
> +
> +        description = self.make_atom_item_desc()
> +        self.summary.appendChild(doc.createTextNode(description))
> +
> +    def finish_output(self):
> +        """ write the atom feed to the file """
> +        t = self.document.createTextNode("\n")
> +        self.feed.appendChild(t)
> +        fp = open(self.atom_file, "w")
> +        xml.dom.ext.PrettyPrint(self.document, fp)
> +        fp.write("\n")
> +        fp.close()
> +
> +    def make_atom_item_desc(self):
> +        cmd = "%s info -r %s %s" % (self.svnlook_cmd, self.revision, self.repos_path)

Use list style commands, as you do later in main(), to avoid shell
metacharacter issues.

> +        out, x, y = popen2.popen3(cmd)

x? y? Yuck, use clear variable names.

> +        cmd_out = out.readlines()
> +        cmd_err  = y.readlines()

Hmm... I _think_ this is relying on OS buffering to avoid deadlocking.

> +        Author = "Author: " + cmd_out[0] + "\n"

But don't the lines already have a newline attached?

> +        Date = "Date: " + cmd_out[1] + "\n"
> +        New_Revision = "Revision: %s\n" % self.revision
> +        Log = "Log: " + cmd_out[3] + "\n"

Weird capitalization. Local variables should be lower-case.

> +        out.close()
> +        x.close()
> +        y.close()
> +
> +        cmd = "%s changed -r %s %s" % (self.svnlook_cmd, self.revision, self.repos_path)
> +        out, x, y = popen2.popen3(cmd)
> +        cmd_out = out.readlines()
> +        changed_files = "Modified: "
> +        for item in cmd_out:
> +            changed_files = changed_files + item

+=

> +        item_desc = Author + Date + New_Revision + Log + changed_files

Clearer would be:
desc_lines = []
desc_lines.append(...)
desc_lines.append(...)
desc_lines.append(...)
desc_lines.append(...)
desc_lines.append('')
item_desc = "\n".join(desc_lines)

> +        out.close()
> +        x.close()
> +        y.close()
> +
> +        return item_desc
> +
> +    def format_date(self, dt):
> +        """ input date must be in GMT """
> +        return "%04d-%02d-%02dT%02d:%02d:%02dZ" % \
> +                (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)
> +
> +def main():
> +    # Parse the command-line options and arguments.
> +    try:
> +        opts, args = getopt.gnu_getopt(sys.argv[1:], "hP:r:u:f:m:U:",
> +                                       ["help",
> +                                        "svn-path=",
> +                                        "revision=",
> +                                        "item-url=",
> +                                        "atom-file=",
> +                                        "max-items=",
> +                                        "feed-url=",
> +                                        ])
> +    except getopt.GetoptError, msg:
> +        usage_and_exit(msg)
> +
> +    # Make sure required arguments are present.
> +    if len(args) != 1:
> +        usage_and_exit("You must specify a repository path.")
> +    repos_path = args[0]
> +
> +    # Now deal with the options.
> +    max_items = 20
> +    commit_rev = None
> +    svn_path = item_url = feed_url = None
> +    atom_file = os.path.basename(repos_path) + ".atom"

Is this a normal file extension convention for atom files?

> +    for opt, arg in opts:
> +        if opt in ("-h", "--help"):
> +            usage_and_exit()
> +        elif opt in ("-P", "--svn-path"):
> +            svn_path = arg
> +        elif opt in ("-r", "--revision"):
> +            commit_rev = arg
> +        elif opt in ("-u", "--item-url"):
> +            item_url = arg
> +            check_url(item_url, opt)
> +        elif opt in ("-f", "--atom-file"):
> +            atom_file = arg
> +        elif opt in ("-m", "--max-items"):
> +            try:
> +               max_items = int(arg)
> +            except ValueError, msg:
> +               usage_and_exit("Invalid value '%s' for --max-items." % (arg))
> +            if max_items < 1:
> +               usage_and_exit("Value for --max-items must be a positive integer.")
> +        elif opt in ("-U", "--feed-url"):
> +            feed_url = arg
> +            check_url(feed_url, opt)
> +    
> +    if (commit_rev == None):

Un-Pythonic parentheses.

It is conventional to test "is None", not "== None".

> +        svnlook_cmd = 'svnlook'
> +        if svn_path is not None:
> +            svnlook_cmd = os.path.join(svn_path, 'svnlook')
> +        out, x, y = popen2.popen3([svnlook_cmd, 'youngest', repos_path])
> +        cmd_out = out.readlines()
> +        revisions = [int(cmd_out[0])]
> +        out.close()
> +        x.close()
> +        y.close()
> +    else:
> +        try:
> +            rev_range = commit_rev.split(':')
> +        except ValueError, msg:
> +            usage_and_exit("svn2atom.py: Invalid value '%s' for --revision." \
> +                           % (commit_rev))

What's this supposed to catch, exactly? I cannot think of any
circumstances in which a ValueError would be raised here.

> +        len_rev_range = len(rev_range)
> +        if len_rev_range == 1:
> +            revisions = [int(commit_rev)]
> +        elif len_rev_range == 2:
> +            start, end = rev_range
> +            start = int(start)
> +            end = int(end)
> +            if (start > end):
> +                tmp = start
> +                start = end
> +                end = tmp
> +            revisions = range(start, end + 1)[-max_items:]

range() creates an explicit list. Nasty things will happen if the user
specifies a large revision range here. Rewrite to not use range(), by
simply using the start and end variables to control a loop variable
directly.

> +        else:
> +            usage_and_exit("svn2atom.py: Invalid value '%s' for --revision." \
> +                           % (commit_rev))
> +    
> +    svn2atom = SVN2ATOM(svn_path, repos_path, item_url, atom_file, 
> +                        max_items, feed_url)
> +    for revision in revisions:
> +      svn2atom.add_entry(revision)
> +    svn2atom.finish_output()
> +    
> +if __name__ == "__main__":
> +    main()