You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Sahlberg <da...@gmail.com> on 2021/02/26 07:31:55 UTC

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Moving from users@ to dev@

Den tors 25 feb. 2021 kl 08:00 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Wed, Feb 24, 2021 at 23:27:18 +0100:
> > Learning Python has been on my todo list for ages,  so I've cobbled
> > together something that seems to do the job.
>
> ☺
>

Thank you very much for your detailed feedback! Based on it I more or less
did a ground-up rewrite, sorry for forcing you to restart the review from
scratch. I will address your commets below.


> > There are basically two modes of operation:
> >
> > ./store-plaintext-password.py --listpassword|--list
> >
> > Which lists all cached realms/usernames (and passwords). I've
> intentionally
> > split it in two in case someone prefer not to put the password on-screen,
> > but I've also added the option to display to show that the password is
> > really in plain text.
>
> Isn't this what `svn auth` does?
>
> Also, "do one thing and do it well".  Reasons:
>
> - So users have less code to audit before they run it.
>
> - Extensibility of interface
>
> - Clarity of interface (cf. «svn switch --relocate»)
>

Oops, missed that one.  I've removed the --list/--listpassword options and
instead hinted at svn auth in the help text.

> ./store-plaintext-password.py realm password [username]
> >
> > Which stores/updates the password for the given realm. If username is
> given
> > it is stored/updated as well. If there is no cached entry for the
> specified
> > realm a new file will be created (and username is a mandatory argument).
>
> Passwords shouldn't be passed in argv because the values in argv are
> visible to other non-root users.
>

I've switched to read password using getpass.getpass(). The realm and
username is still passed on the command line.

> TODO:
> > - Is the license ok? I stole it from svn.c
>
> I assume you're asking about the header format as opposed to the choice
> of license.  Sure, svn.c's header ought to be fine.
>
> > - Improve Python code
>
> Reviewing.
>

Thanks! I've taken note of your suggestions regarding the code related to
--list|--listpassword but since that code has been removed I'm not
considering it here. Sorry for the extra work!


> > - Improve documentation - especially on where to find the 'realm' string
>
> For one, by running `svn info` interactively and ^C'ing it at the prompt.
>

Added!


> > - Decide where to put it - I've gone with contrib for now as per Nathan's
> > and Daniel's suggestions
>
> This was a patch submission and my reply contains a review.  We may want
> to move to dev@.
>
> > > > > Regarding the FAQ,


...cut...

Nathan made a very good suggestion about the FAQ in a separate post,
incorporating both my ideas and danielsh's feedback. I'll review his
suggestion and post a reply there (if any).

> +++ contrib/client-side/store-plaintext-password.py   (working copy)
> > @@ -0,0 +1,207 @@
> > +#!/usr/bin/env python3
> > +
> > +# Script to store password in plaintext in
> ~/.subversion/auth/svn.simple/
> > +#
> > +# Script require the tabulate module: pip install tabulate
>
> s/require/requires/
>
> More importantly, it would be better not to require non-stdlib modules,
> at least in the "save passwords" codepath.  The easiest way to do this
> would be to move the «import» statement into the function that needs it.
>

Noted (but code removed).


> > +# ====================================================================
> > +#    Licensed to the Apache Software Foundation (ASF) under one
> > +#    or more contributor license agreements.  See the NOTICE file
> > +#    distributed with this work for additional information
> > +#    regarding copyright ownership.  The ASF licenses this file
> > +#    to you under the Apache License, Version 2.0 (the
> > +#    "License"); you may not use this file except in compliance
> > +#    with the License.  You may obtain a copy of the License at
> > +#
> > +#      http://www.apache.org/licenses/LICENSE-2.0
> > +#
> > +#    Unless required by applicable law or agreed to in writing,
> > +#    software distributed under the License is distributed on an
> > +#    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> > +#    KIND, either express or implied.  See the License for the
> > +#    specific language governing permissions and limitations
> > +#    under the License.
> > +# ====================================================================
> > +
> > +from hashlib import md5
> > +import argparse
> > +import os
> > +from os import listdir
> > +from os.path import isfile, join
>
> These two imports aren't very idiomatic, at least not in Subversion's
> use of Python.  We tend to just call these functions by their
> fully-qualified names.
>

Googled these up :-) Switched to import the lib and use fully-qualified
names.


> > +from pathlib import Path
> > +from tabulate import tabulate
> > +
> > +# Read a hashfile and return a key/value list
> > +#
>
> Err, what?  The function reads the hashfile and returns the first
> key-value pair in it as a list.  It doesn't read any subsequent
> key-value pairs, which is what this sentence seems to imply.
>
> Also, the function doesn't ensure the 'END\n' terminator and EOF happen
> where they're expected.
>
> Also, don't rely on the key-value pairs in the hash file being in any
> particular order.
>
> > +# A hashfile should contain the following four lines, repeted as
> needed, ending by the line END
>
> "repeated" is misspelled.  Also, this should reference the upstream docs
> (svn_hash_write2()), either in addition to or instead of repeating them.
>

Added this reference


> > +# K [length]
> > +# [keyname]
> > +# V [length]
> > +# [value]
> > +# ...
> > +# END
> > +#
> > +# The length is not validated
>
> It's not just "not validated"; it's not used at all.  The file format is
> binary — the lengths are in octets — but the Python code below treats
> the file as a line-based format.  That may be fine for now, but is not
> forward-compatible with ~/.subversion/auth/ files that may be created by
> future versions.
>

Length is now validated.

I think it is acceptable that we make some assumptions about the format. If
some future version would make a more complex format, the script can be
updated. (This part of the code is removed, but the comment is relevant for
the section updating an existing file).

I'm not sure what level of the API (public or private) third-party
> products would need to operate at in order to add their own custom keys
> to the cache files.  I don't know of anyone doing that, but it's not
> impossible.
>

Switched to us a dictionary (but the old code should also have been
agnostic of key ordering).

> +def readHash(file):
> > +    # Read K [length] or END but discard it
> > +    key = file.readline().strip()
> > +    if key == 'END':
> > +        return None
> > +    if key[:1] != 'K':
> > +        raise Exception('Parse failed, expected K...')
> > +
> > +    # Read keyname
> > +    key = file.readline().strip()
> > +
> > +    # Read V [length] but discard it
> > +    val = file.readline().strip()
> > +    if val[:1] != 'V':
> > +        raise Exception('Parse failed, expected V...')
> > +
> > +    # Read value
> > +    val = file.readline().strip()
> > +
> > +    return [key, val]
>
> Returning a tuple would be more idiomatic.
>

Noted (but code removed)!


> > +# Write a key/value pair to a hashfile
> > +def writeHash(file, key, val):
>
> The docstring should be a string literal as the first statement in the
> function, as in subversion/tests/cmdline/.  That then becomes the
> function's __doc__ attribute.
>

Fixed!


>
> > +    file.write('K ' + str(len(key)) + '\n')
> > +    file.write(key + '\n')
>
> str() gives the length in characters, but the file format demands
> a length in octets, so this would break as soon as somebody puts
> non-ASCII in their password — which, given the nature of passwords,
> is plausible.
>
> You'll want to convert from str to bytes.  See str.encode().  You'll
> need to use b'' literals in concatenations.
>

Better now?

> +    file.write('V ' + str(len(val)) + '\n')
> > +    file.write(val + '\n')
> > +
> > +# Main function
> > +parser = argparse.ArgumentParser(description='Store plain text password
> in ~/.subversion/auth/svn.simple/')
> > +parser.add_argument('realm', help='Realm string from server, required
> value for updates', nargs='?')
> > +parser.add_argument('password', help='Password, required value for
> updates', nargs='?')
>
> IIRC, nargs='?' means that both ['store-plaintext-password.py',
> '--password'] and ['store-plaintext-password.py', '--password', 'foo']
> are valid uses, which isn't what's needed here.
>

I've rebuilt the argument handling to specify -u (to store/update a
username), pass the realm in a positional argument and always read the
password using getpass().


> > +parser.add_argument('username', help='Username, optional value for
> updates', nargs='?')
> > +parser.add_argument('--list', help='Lists all stored realms/usernames',
> action='store_true')
> > +parser.add_argument('--listpassword', help='Lists all stored
> realms/usernames/passwords', action='store_true')
> > +
> > +args = parser.parse_args()
> > +authpath = str(Path.home()) + '/.subversion/auth/svn.simple/'
>
> Just use os.path.expanduser() directly?  You don't need pathlib.Path's
> bells and whistles.
>

Switched to os.path.expanduser() (and removed the import of pathlib).

> +if args.listpassword or args.list:
> > +    data = []
> > +    if args.listpassword:
> > +        header = ['Realm', 'Username', 'Password']
> > +    else:
> > +        header = ['Realm', 'Username']
> > +    for name in listdir(authpath):
> > +        fullname = join(authpath, name)
> > +        if isfile(fullname):
> > +            file = open(fullname, 'r')
> > +            realm = ''
> > +            user = ''
> > +            password = ''
> > +            while(True):
>
> The parentheses aren't idiomatic.
>

C-sick. ;-)


> > +                hash = readHash(file)
>
> So this just slurps the file, rather than return each keypair
> iteratively (with «yield»).  Good enough for the present use-case.
>

Yes, that's a correct analysis.

> +                if hash is None:
> > +                    break
> > +                elif hash[0] == 'svn:realmstring':
> > +                    realm = hash[1]
> > +                elif hash[0] == 'username':
> > +                    user = hash[1]
> > +                elif hash[0] == 'password':
> > +                    password = hash[1]
> > +            file.close()
> > +            if args.listpassword:
> > +                data.append([realm, user, password])
>
> This sounds like it could become a list of collections.namedtuple
> instance instances (sic), but as mentioned above, this entire function
> seems a reinvention of `svn auth`.
>

And thus removed.


> > +            else:
> > +                data.append([realm, user])
> > +    print(tabulate(data, headers=header))
> > +    quit()
> > +
> > +if args.realm is None:
> > +    print("Realm required\n")
>
> Errors should go to stderr, and begin with the name (argv[0]) of whoever
> generated them.
>
> Any reason not to let argparse enforce the mandatoriness of the argument?¨
>

Nothing more than not being able to make up my mind about the command line
arguements. No realm is mandatory (positional) and -u/-p flags.

> +    parser.print_help()
> > +    quit()
>
> This will NameError on 'quit'.  At least the exit code will be non-zero
> this way.
>
> > +if args.password is None:
> > +    print("Password required\n")
> > +    parser.print_help()
> > +    quit()
> > +
> > +# The file name is the md5encoding of the realm
> > +m = md5()
> > +m.update(args.realm.encode('utf-8'))
> > +fullname = join(authpath, m.hexdigest())
> > +
> > +# In an existing file, we add/replace password/username/passtype
>
> This bit of code conflates several distinct pieces of logic:
>
> - Parsing a serialized hash file
>
> - Serializing a hash
>
> - The actual patching of the hash
>
> - Implementing editing a file by an atomic rename
>
> I can't say I like this at all.  All these concerns should be cleanly
> separated from one another.
>

The rewrite first read the complete hash to a dictionary. Then edit the
dictionary. Finally writing the dictionary to a new file.

> +if isfile(fullname):
> > +    pwdAdded = False
> > +    passtypeAdded = False
> > +    usernameAdded = False
> > +
> > +    # Read existing file, write to .tmp which we rename in the end
> > +    inFile = open(fullname, 'r')
> > +    outFile = open(fullname + '.tmp', 'w')
>
> Also 'x' please for O_EXCL.  If nothing else, this'll plug a race
> condition when two instances of this script run simultaneously.
>

Fixed.

> +    while(True):
> > +        # Read K [length] or END and write back
> > +        line = inFile.readline()
> > +        if not line:
> > +            raise Exception('Parse failed, expected K ... or END')
> > +        if line.strip() == 'END':
>
> Duplicates logic from earlier in the file.
>
> > +            # If username, password and/or passtype has not been found
> in the file, add them here
> > +            if not usernameAdded and args.username is not None:
> > +                writeHash(outFile, 'username', args.username)
> > +            if not passtypeAdded:
> > +                writeHash(outFile, 'passtype', 'simple')
>
> Leaky abstraction: this layer needn't know that the file format
> serialized keys and values in the order K1 V1 K2 V2 … as opposed to,
> say, K1 K2 … V1 V2 ….
>
> > +            if not pwdAdded:
> > +                writeHash(outFile, 'password', args.password)
> > +            outFile.write(line)
> > +            break
> > +        outFile.write(line)
> > +
> > +        # Read keyname and write back, save keyname for later
> > +        line = inFile.readline()
> > +        outFile.write(line)
>
> Input is used unvalidated.
>
> > +        elif key == 'passtype':
> > +            outFile.write('V 6\n')
>
> Magic number.
>

Thanks again for your review!

Kind regards
Daniel Sahlberg

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Sun, Mar 07, 2021 at 19:34:15 +0100:
> I'm attaching v4 where I blatantly stole Danielsh's script to read the hash
> file
> (
> http://mail-archives.apache.org/mod_mbox/subversion-dev/202102.mbox/%3c20210226173525.GA24828@tarpaulin.shahaf.local2%3e
> )

> +++ contrib/client-side/store-plaintext-password.py	(working copy)
> @@ -0,0 +1,184 @@
> +def _read_one_datum(fd, letter):
> +    """\
> +    Read a 'K <length>\\n<key>\\n' or 'V <length>\\n<value>\\n' block from
> +    an svn_hash_write2()-format FD.
> +
> +    LETTER identifies the first letter, as a bytes object.
> +    """
> +    assert letter in {b'K', b'V'}
> +    
> +    # Read the letter and the space
> +    if fd.read(1) != letter or fd.read(1) != b' ':
> +        raise ...

This line simply raises a TypeError that's unrelated to the
business-logic-level problem.  Make it more user-friendly?

I know I wrote that line, but I meant it as a placeholder.  (Compare
r1869139.)

> +    if line[-1:] != b'\n':

Aside: the colon here isn't redundant, as it would be if «line» were
a str:

    >>> type(b'foo'[-1])
    <class 'int'>
    >>> type(b'foo'[-1:])
    <class 'bytes'>

    >>> type('foo'[-1])
    <class 'str'>
    >>> type('foo'[-1:])
    <class 'str'>

Cheers,

Daniel

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Sahlberg <da...@gmail.com>.
Den fre 20 aug. 2021 kl 09:05 skrev Barry <ba...@barrys-emacs.org>:

>
>
> > On 8 Mar 2021, at 08:10, Daniel Sahlberg <da...@gmail.com>
> wrote:
> >
> > 
> > Attaching version 5, with changes as below.
>
> I do not seem to see this in the subversion repo, I would find the code
> useful.
>
> Will it become part of contrib at some point?
>

Thanks for the reminder. It was a project during our winter holiday and I
lost steam when going back to work. As far as I can remember the script was
working fine, but danielsh had some remarks on coding style. My weekend is
busy but I will try to look at it next week. Of course, you are welcome to
pick up from where I left off and improve the script and post it here.

Kind regards,
Daniel

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Barry <ba...@barrys-emacs.org>.

> On 8 Mar 2021, at 08:10, Daniel Sahlberg <da...@gmail.com> wrote:
> 
> 
> Attaching version 5, with changes as below.

I do not seem to see this in the subversion repo, I would find the code useful.

Will it become part of contrib at some point?

Barry



Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Sahlberg <da...@gmail.com>.
Coming back to a thread from long time ago, with the intention to commit
the script in tools/client-side so we can link this from the FAQ based on a
recent question on IRC [1], [2], [3]

[1] https://colabti.org/irclogger/irclogger_log/svn-dev?date=2022-07-01
[2] https://colabti.org/irclogger/irclogger_log/svn-dev?date=2022-07-04
[3] https://colabti.org/irclogger/irclogger_log/svn-dev?date=2022-07-07

More below...

Den tis 9 mars 2021 kl 21:09 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Mon, Mar 08, 2021 at 09:15:21 +0100:
> > Attaching version 5, with changes as below.
> >
> > Den sön 7 mars 2021 kl 20:42 skrev Daniel Shahaf <d.s@daniel.shahaf.name
> >:
> >
> > > Daniel Sahlberg wrote on Sun, Mar 07, 2021 at 19:34:15 +0100:
> > > > +    # Parse arguments
> > > > +    import argparse
> > > > +    parser = argparse.ArgumentParser(
> > > ⋮
> > > > +    # The file name is the md5encoding of the realm
> > > > +    import hashlib
> > > > +    m = hashlib.md5()
> > >
> > > Hmm.  Should all function-local imports be grouped at the top of the
> > > function?  I don't remember the convention here.
>

[... long discussion about coding style with the conclusion that even
though PEP-8 says all import should be in the top of the file,
CPython's own source code contains function level imports. And it would be
more PEP-8-ish to have function level imports at the top of the function
...]

I've changed it this way and sorted them alphabetically.

> (Offtopic: When reviewing /trunk/doc, I had a look at
> > /trunk/doc/programmer/WritingChangeLogs.txt. Is it still relevant? It
> seems
> > very CVS-ish and mostly covered in HACKING).
>
> Everything up to the last two section headers is good advice for writing
> log messages.  For those parts, they should be added to HACKING or
> referenced therefrom.
>
> The remaining two sections don't seem applicable to us at all, but as
> the file is "an essay by jimb", I don't think we can change it.
>

I was more thinking about removing it if it doesn't apply to the Subversion
project (anymore).

> +++ contrib/client-side/store-plaintext-password.py   (working copy)
> > @@ -0,0 +1,189 @@
> > +def _read_one_datum(fd, letter):
> ⋮
> > +    # Read the letter and the space
> > +    if fd.read(1) != letter or fd.read(1) != b' ':
> > +        raise ...
>
> (Raising ellipses is discussed elsethread.)
>

Now raising a ValueError(). I'm not experienced in Python but it looked
like the most suitable (standard) exception class.


>
> > +def outputHash(fd, hash):
> > +    """\
> > +    Write a dictionary to an FD in the same format as used by
>
> Say the formal parameter's name somewhere in here?
>
> Calling it "hash" could be confusing, because svn_hash_write2() and
> hashlib use that term in different senses, but I don't have a suggestion
> about this.
>

Renamed the parameter to dict and added the reference in the docstring.


> > +def writeHashFile(filename, hash):
> > +    """\
> > +    Open a temporary file and write the dictionary to the temp
> > +    file in svn_hash_write2() format.
> > +
> > +    If the file is successfully written it is renamed to <filename>.
>
> Please have all docstrings use the same markup for formal parameter
> names.  outputHash() above uses the "capitalize the name" convention the
> C code uses for non-public-API function.  I'll also use that for the
> example I write in the next paragraph.
>
> Describe the contract, not the implementation.  For instance, "Write the
> dictionary HASH to a file named FILENAME in svn_hash_write2() format."
> would be a reasonable first sentence.  The docstring could then promise
> a postcondition that if the file exists upon return, then it's not
> truncated, mention what happens if at entry FILENAME is a regular file
> or an unbroken symlink, etc..
>

Docstring rewritten.


>
> > +    """
> > +    tmpFilename = filename + '.tmp'
> > +    try:
> > +        with open(tmpFilename, 'xb') as fd:
> > +            outputHash(fd, hash)
> > +            os.rename(tmpFilename, filename)
> > +    except FileExistsError:
> > +        print('{}: File {!r} already exist. Is the script already
> running?\n'.format(argv[0], tmpFilename), file=sys.stderr)
>
> s/the/this/
>
> Wrap to 80 columns.
>

done


>
> > +    except:
> > +        os.remove(tmpFilename)
> > +        raise
> > +
> > +def main():
> > +    # Parse arguments
> > +    import argparse
> > +    parser = argparse.ArgumentParser(
> > +        description=PARSERDESCR,
> > +        formatter_class=argparse.RawDescriptionHelpFormatter)
> > +    parser.add_argument('realm', help='Server authentication real')
> > +    parser.add_argument('-u', '--user', help='Set username', nargs='?')
>
> Just delete nargs entirely, for reasons given upthread?  Sorry for
> missing that in my reviews between then and now.
>

done


>
> > +    args = parser.parse_args()
> > +
> > +    # The file name is the md5encoding of the realm
> > +    import hashlib
> > +    m = hashlib.md5()
> > +    m.update(args.realm.encode('utf-8'))
> > +    authfileName =
> os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'),
> m.hexdigest())
> > +
> > +    # If new file, check that username was given as an argument before
> prompting for password
> > +    existingFile = os.path.exists(authfileName)
> > +    if not existingFile and args.user is None:
> > +        print('{}: New file, username required (see
> -u)\n'.print(argv[0]), file=sys.stderr)
>
> The phrase "new file" in the comment isn't clear, probably because it
> isn't a full sentence.
>
> The phrase "New file" in the error message has the same problem, as well
> as being phrased in implementation terms rather than in user-facing
> terms.
>

Rephrased, and also added an interactive prompt for the username if missing.


>
> > +        parser.print_help()
> > +        return
>
> The exit code should be non-zero whenever an error was printed to
> stderr.
>
> Aren't you reimplementing one of
> https://docs.python.org/3/library/argparse.html#exiting-methods
> ?
>

Changed to parser.exit()

I took the courage to commit as r1902590 and we can work any changes from
there.

Kind regards,
Daniel

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
> > I don't think there's anything written down, but I don't think we
> > invented anything either.  Python scripts are typically written to be
> > importable:
> > .
> >    $ printf '%s\n' 'def foo():' '    return 42' > foo.py
> >    $ python3 -c 'import foo; print(foo.foo())' 
> >    42
> >    $ 
> 
> I see that many of  the .py files in the folder cannot be imported either.

Past bugs are no excuse to write new ones :)

> Give the file name it has it cannot be imported, this is a mute point.

No, it's not.  It still matters if the file is renamed (either by us
before committing it, or by a user down the road) or if it is imported
using the appropriate stdlib functions.

> >> +def outputHash(fd, hash):
> >> +    """\
> >> +    Write a dictionary to an FD in the same format as used by
> > 
> > Say the formal parameter's name somewhere in here?
> 
> Why have a back slash at all?

Presumably so as to make the docstring easier to read inside the source.

> I avoid them in python code because of the problems of \<SP>. 

Even if there were a trailing space there, that wouldn't break anything.
Nobody's going to run a docstring extractor tool on this script (and if
they do, they'll discover the whitespace and remove it then).

I guess we could just remove the backslash, too.  No strong opinion here.

>

Incidentally, if anyone wants to drop the zsh implementation I posted
upthread somewhere, feel free.  I'm around to answer questions, review
docs, etc..

Cheers,

Daniel

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Barry Scott <ba...@barrys-emacs.org>.

> On 9 Mar 2021, at 20:09, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> 
> Daniel Sahlberg wrote on Mon, Mar 08, 2021 at 09:15:21 +0100:
>> Attaching version 5, with changes as below.

There are lines with trailing white space that should be cleaned up.

>> 
>> Den sön 7 mars 2021 kl 20:42 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
>> 
>>> Daniel Sahlberg wrote on Sun, Mar 07, 2021 at 19:34:15 +0100:
>>>> +    # Parse arguments
>>>> +    import argparse
>>>> +    parser = argparse.ArgumentParser(
>>> ⋮
>>>> +    # The file name is the md5encoding of the realm
>>>> +    import hashlib
>>>> +    m = hashlib.md5()
>>> 
>>> Hmm.  Should all function-local imports be grouped at the top of the
>>> function?  I don't remember the convention here.
>>> 
>> 
>> Is there a Subversion specific Style Guide for Python? I didn't find
>> anything in HACKING or trunk/doc.
> 
> I don't think there's anything written down, but I don't think we
> invented anything either.  Python scripts are typically written to be
> importable:
> .
>    $ printf '%s\n' 'def foo():' '    return 42' > foo.py
>    $ python3 -c 'import foo; print(foo.foo())' 
>    42
>    $ 

Give the file name it has it cannot be imported, this is a mute point.
The hypens would need to be underscores.

I see that many of  the .py files in the folder cannot be imported either.

> .
> Thus, putting the imports in main() rather than at the top level means
> they don't get executed when they aren't needed.
> 
>> PEP 8 says:
>> [[[
>> Imports are always put at the top of the file, just after any module
>> comments and docstrings, and before module globals and constants.
>> ]]]
>> 
>> If I understand this correctly, it prefers global imports only? If keeping
>> the imports in the function, then moving to the top is probably more
>> PEP-8-ish. I didn't change anything though.
> 
> In CPython's own source code,
> .
>    rm -rf Lib/test && grep -Rn '    import' Lib Tools | perl -lnE '/(\d+)/; print if $1 > 100'
> .
> has over 400 matches.  Spot checking a few (shuf(1) is my friend ☺),
> they seem like true positives.
> 
>> For reference, ordering should be
>> [[[
>> Imports should be grouped in the following order:
>> 1. standard library imports
>> 2. related third party imports
>> 3. local application/library specific imports
>> ]]]
>> No explicit mentioning of alphabetical sorting, but that seems like a good
>> idea.
>> 
>> (Offtopic: When reviewing /trunk/doc, I had a look at
>> /trunk/doc/programmer/WritingChangeLogs.txt. Is it still relevant? It seems
>> very CVS-ish and mostly covered in HACKING).
> 
> Everything up to the last two section headers is good advice for writing
> log messages.  For those parts, they should be added to HACKING or
> referenced therefrom.
> 
> The remaining two sections don't seem applicable to us at all, but as
> the file is "an essay by jimb", I don't think we can change it.
> 
>> +++ contrib/client-side/store-plaintext-password.py	(working copy)
>> @@ -0,0 +1,189 @@
>> +def _read_one_datum(fd, letter):
> ⋮
>> +    # Read the letter and the space
>> +    if fd.read(1) != letter or fd.read(1) != b' ':
>> +        raise ...
> 
> (Raising ellipses is discussed elsethread.)
> 
>> +def outputHash(fd, hash):
>> +    """\
>> +    Write a dictionary to an FD in the same format as used by
> 
> Say the formal parameter's name somewhere in here?

Why have a back slash at all?
I avoid them in python code because of the problems of \<SP>. 

> 
> Calling it "hash" could be confusing, because svn_hash_write2() and
> hashlib use that term in different senses, but I don't have a suggestion
> about this.
> 
>> +def writeHashFile(filename, hash):
>> +    """\
>> +    Open a temporary file and write the dictionary to the temp
>> +    file in svn_hash_write2() format.
>> +
>> +    If the file is successfully written it is renamed to <filename>.
> 
> Please have all docstrings use the same markup for formal parameter
> names.  outputHash() above uses the "capitalize the name" convention the
> C code uses for non-public-API function.  I'll also use that for the
> example I write in the next paragraph.
> 
> Describe the contract, not the implementation.  For instance, "Write the
> dictionary HASH to a file named FILENAME in svn_hash_write2() format."
> would be a reasonable first sentence.  The docstring could then promise
> a postcondition that if the file exists upon return, then it's not
> truncated, mention what happens if at entry FILENAME is a regular file
> or an unbroken symlink, etc..
> 
>> +    """
>> +    tmpFilename = filename + '.tmp'
>> +    try:
>> +        with open(tmpFilename, 'xb') as fd:
>> +            outputHash(fd, hash)
>> +            os.rename(tmpFilename, filename)
>> +    except FileExistsError:
>> +        print('{}: File {!r} already exist. Is the script already running?\n'.format(argv[0], tmpFilename), file=sys.stderr)
> 
> s/the/this/
> 
> Wrap to 80 columns.
> 
>> +    except:
>> +        os.remove(tmpFilename)
>> +        raise
>> +            
>> +def main():
>> +    # Parse arguments
>> +    import argparse
>> +    parser = argparse.ArgumentParser(
>> +        description=PARSERDESCR,
>> +        formatter_class=argparse.RawDescriptionHelpFormatter)
>> +    parser.add_argument('realm', help='Server authentication real')
>> +    parser.add_argument('-u', '--user', help='Set username', nargs='?')
> 
> Just delete nargs entirely, for reasons given upthread?  Sorry for
> missing that in my reviews between then and now.
> 
>> +    args = parser.parse_args()
>> +    
>> +    # The file name is the md5encoding of the realm
>> +    import hashlib
>> +    m = hashlib.md5()
>> +    m.update(args.realm.encode('utf-8'))
>> +    authfileName = os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'), m.hexdigest())
>> +    
>> +    # If new file, check that username was given as an argument before prompting for password
>> +    existingFile = os.path.exists(authfileName)
>> +    if not existingFile and args.user is None:
>> +        print('{}: New file, username required (see -u)\n'.print(argv[0]), file=sys.stderr)
> 
> The phrase "new file" in the comment isn't clear, probably because it
> isn't a full sentence.
> 
> The phrase "New file" in the error message has the same problem, as well
> as being phrased in implementation terms rather than in user-facing
> terms.
> 
>> +        parser.print_help()
>> +        return
> 
> The exit code should be non-zero whenever an error was printed to
> stderr.
> 
> Aren't you reimplementing one of 
> https://docs.python.org/3/library/argparse.html#exiting-methods
> ?
> 
> Cheers,
> 
> Daniel
> 

Barry



Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Mon, Mar 08, 2021 at 09:15:21 +0100:
> Attaching version 5, with changes as below.
> 
> Den sön 7 mars 2021 kl 20:42 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> 
> > Daniel Sahlberg wrote on Sun, Mar 07, 2021 at 19:34:15 +0100:
> > > +    # Parse arguments
> > > +    import argparse
> > > +    parser = argparse.ArgumentParser(
> > ⋮
> > > +    # The file name is the md5encoding of the realm
> > > +    import hashlib
> > > +    m = hashlib.md5()
> >
> > Hmm.  Should all function-local imports be grouped at the top of the
> > function?  I don't remember the convention here.
> >
> 
> Is there a Subversion specific Style Guide for Python? I didn't find
> anything in HACKING or trunk/doc.

I don't think there's anything written down, but I don't think we
invented anything either.  Python scripts are typically written to be
importable:
.
    $ printf '%s\n' 'def foo():' '    return 42' > foo.py
    $ python3 -c 'import foo; print(foo.foo())' 
    42
    $ 
.
Thus, putting the imports in main() rather than at the top level means
they don't get executed when they aren't needed.

> PEP 8 says:
> [[[
> Imports are always put at the top of the file, just after any module
> comments and docstrings, and before module globals and constants.
> ]]]
> 
> If I understand this correctly, it prefers global imports only? If keeping
> the imports in the function, then moving to the top is probably more
> PEP-8-ish. I didn't change anything though.

In CPython's own source code,
.
    rm -rf Lib/test && grep -Rn '    import' Lib Tools | perl -lnE '/(\d+)/; print if $1 > 100'
.
has over 400 matches.  Spot checking a few (shuf(1) is my friend ☺),
they seem like true positives.

> For reference, ordering should be
> [[[
> Imports should be grouped in the following order:
> 1. standard library imports
> 2. related third party imports
> 3. local application/library specific imports
> ]]]
> No explicit mentioning of alphabetical sorting, but that seems like a good
> idea.
> 
> (Offtopic: When reviewing /trunk/doc, I had a look at
> /trunk/doc/programmer/WritingChangeLogs.txt. Is it still relevant? It seems
> very CVS-ish and mostly covered in HACKING).

Everything up to the last two section headers is good advice for writing
log messages.  For those parts, they should be added to HACKING or
referenced therefrom.

The remaining two sections don't seem applicable to us at all, but as
the file is "an essay by jimb", I don't think we can change it.

> +++ contrib/client-side/store-plaintext-password.py	(working copy)
> @@ -0,0 +1,189 @@
> +def _read_one_datum(fd, letter):
⋮
> +    # Read the letter and the space
> +    if fd.read(1) != letter or fd.read(1) != b' ':
> +        raise ...

(Raising ellipses is discussed elsethread.)

> +def outputHash(fd, hash):
> +    """\
> +    Write a dictionary to an FD in the same format as used by

Say the formal parameter's name somewhere in here?

Calling it "hash" could be confusing, because svn_hash_write2() and
hashlib use that term in different senses, but I don't have a suggestion
about this.

> +def writeHashFile(filename, hash):
> +    """\
> +    Open a temporary file and write the dictionary to the temp
> +    file in svn_hash_write2() format.
> +
> +    If the file is successfully written it is renamed to <filename>.

Please have all docstrings use the same markup for formal parameter
names.  outputHash() above uses the "capitalize the name" convention the
C code uses for non-public-API function.  I'll also use that for the
example I write in the next paragraph.

Describe the contract, not the implementation.  For instance, "Write the
dictionary HASH to a file named FILENAME in svn_hash_write2() format."
would be a reasonable first sentence.  The docstring could then promise
a postcondition that if the file exists upon return, then it's not
truncated, mention what happens if at entry FILENAME is a regular file
or an unbroken symlink, etc..

> +    """
> +    tmpFilename = filename + '.tmp'
> +    try:
> +        with open(tmpFilename, 'xb') as fd:
> +            outputHash(fd, hash)
> +            os.rename(tmpFilename, filename)
> +    except FileExistsError:
> +        print('{}: File {!r} already exist. Is the script already running?\n'.format(argv[0], tmpFilename), file=sys.stderr)

s/the/this/

Wrap to 80 columns.

> +    except:
> +        os.remove(tmpFilename)
> +        raise
> +            
> +def main():
> +    # Parse arguments
> +    import argparse
> +    parser = argparse.ArgumentParser(
> +        description=PARSERDESCR,
> +        formatter_class=argparse.RawDescriptionHelpFormatter)
> +    parser.add_argument('realm', help='Server authentication real')
> +    parser.add_argument('-u', '--user', help='Set username', nargs='?')

Just delete nargs entirely, for reasons given upthread?  Sorry for
missing that in my reviews between then and now.

> +    args = parser.parse_args()
> +    
> +    # The file name is the md5encoding of the realm
> +    import hashlib
> +    m = hashlib.md5()
> +    m.update(args.realm.encode('utf-8'))
> +    authfileName = os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'), m.hexdigest())
> +    
> +    # If new file, check that username was given as an argument before prompting for password
> +    existingFile = os.path.exists(authfileName)
> +    if not existingFile and args.user is None:
> +        print('{}: New file, username required (see -u)\n'.print(argv[0]), file=sys.stderr)

The phrase "new file" in the comment isn't clear, probably because it
isn't a full sentence.

The phrase "New file" in the error message has the same problem, as well
as being phrased in implementation terms rather than in user-facing
terms.

> +        parser.print_help()
> +        return

The exit code should be non-zero whenever an error was printed to
stderr.

Aren't you reimplementing one of 
https://docs.python.org/3/library/argparse.html#exiting-methods
?

Cheers,

Daniel


Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Sahlberg <da...@gmail.com>.
Attaching version 5, with changes as below.

Den sön 7 mars 2021 kl 20:42 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Sun, Mar 07, 2021 at 19:34:15 +0100:
> > Den fre 26 feb. 2021 kl 16:11 skrev Daniel Shahaf <
> d.s@daniel.shahaf.name>:
> > > Daniel Sahlberg wrote on Fri, Feb 26, 2021 at 13:49:25 +0100:
> > > > +    with open(filename + '.tmp', 'x', encoding='utf-8') as file:
> > >
> > > See my previous mail about adding 'w' and ensuring CR isn't emitted on
> > > Windows.
> > >
> >
> > Python doesn't like wx combined:
> >
> > Traceback (most recent call last):
> >   File "./store-plaintext-password.py", line 112, in writeHashFile
> >     with open(tmpFilename, 'wxb') as fd:
> > ValueError: must have exactly one of create/read/write/append mode
>
> I stand corrected.
>
> > > > +parser.add_argument('realm', help='Server authentication real')
> > > > +parser.add_argument('-u', '--user', help='Set username', nargs='?')
> > > > +args = parser.parse_args()
> > > > +
> > > > +# The file name is the md5encoding of the realm
> > > > +m = hashlib.md5()
> > > > +m.update(args.realm.encode('utf-8'))
> > > > +authfileName =
> os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'),
> .hexdigest())
> > > > +
> > > > +# If new file, check that username was given as an argument before
> prompting for password
> > > > +if not os.path.isfile(authfileName) and args.user is None:
> > >
> > > «not isfile()» could mean it's a directory or a broken symlink.  How
> > > about:
> > > .
> > >     if isfile():
> > >     elif not exists():
> > >     else:
> > > .
> > > ?
> > >
> >
> > I switched to exists() only. I'm not sure it makes sense to care for
> every
> > possible variation - if it is a directory then open() will raise an
> > exception and I think that is acceptable in this case.
>
> +1
>
> > To reduce the number of stat() I'm storing the result form exists() in a
> > variable and reusing late. As for race condition... well it's not really
> > that kind of script where I could bother.
>
> Let's just say that "should X be done" and "could Daniel be bothered to
> do X" are not synonymous.
>

Agree. Patches welcome!


> > > (The point below about stat() applies to this pseudocode too.)
> > >
> > > > +    print('New file, username required (see -u)\n', file=sys.stderr)
> > >
> > > Error messages should start with the argv[0] of whoever generated them.
> > >
> >
> > Point taken, but in this particular case it is displayed by print_help().
>
> Yes, but that's printed later.  If this script is invoked by another,
> that's then invoked by another that's invoked by some CI, etc.,
> _prefixing_ the script's name to the message tends to be helpful.
>

Ok. Added now.

> > > +# In an existing file, we add/replace password/username/passtype
> > > > +if os.path.isfile(authfileName):
> > >
> > > Calling stat() twice has a performance impact and looks like a race
> > > condition.  Maybe in this case it doesn't matter, though…
> > >
> >
> > Using the result from the previous check. Race condition, agree..
> >
> > > > +# For a new file, set realmstring, username, password and passtype
> > > > +else:
> > > > +    hash = {}
> > > > +    hash['svn:realmstring'] = args.realm
> > > > +    hash['username'] = args.user
> > > > +    hash['passtype'] = 'simple'
> > > > +    hash['password'] = password
> > >
> > > Style: Use a single, multiline initializer rather than multiple
> > > assignments?
> > >
> >
> > Like this?
> >       hash = {
> >             b'svn:realmstring': args.realm.encode('utf-8'),
> >             b'username': args.user.encode('utf-8'),
> >             b'passtype': 'simple'.encode('utf-8'),
> >             b'password': password.encode('utf-8')
> >         }
>
> Yeah, except add a comma on the fourth line to minimize future diffs.
>

(y)

For future reference, there's also a «dict(foo=42)» syntax which is
> sometimes convenient.
>
> > > Forward cribbing compatibility: «del password» once it's no longer
> needed?
> > >
> >
> > Sorry, didn't quite understand this. Do you ask for an option to remove a
> > stored password? Isn't svn auth --remove enough?
>
> No, I meant literally add the line of code «del password» once the
> variable «password» is no longer needed.  That won't actually overwrite
> the value's bytes in memory, but it will prevent the variable from being
> accidentally used afterwards.
>

(y)


> > +++ contrib/client-side/store-plaintext-password.py   (working copy)
> > @@ -0,0 +1,184 @@
> > +def outputHash(fd, hash):
> > +    """\
> > +    Write a dictionary to an FD in the same format as used by
> > +    svn_hash_write2(), ie 'K <length>\\n<key>\\nV <length>\\n<value>\\n'
> > +    terminated by "END\n".
>
> Escape the newline.
>

(y)


> Mention the data type of HASH's keys and values?
>

(y)


> > +    """
> > +    assert 'b' in fd.mode
> > +
> > +    for key, val in hash.items():
> > +        fd.write(b'K ' + bytes(str(len(key)), 'utf-8') + b'\n')
> > +        fd.write(key + b'\n')
> > +        fd.write(b'V ' + bytes(str(len(val)), 'utf-8') + b'\n')
> > +        fd.write(val + b'\n')
> > +    fd.write(TERMINATOR)
> > +
> > +def writeHashFile(filename, hash):
> > +    """\
> > +    Open a temporary file and write the dictionary to the temp
> > +    file in svn_hash_write2() format.
> > +
> > +    If the file is successfully written it is renamed to <filename>.
> > +    """
> > +    tmpFilename = filename + '.tmp'
> > +    try:
> > +        with open(tmpFilename, 'xb') as fd:
> > +            outputHash(fd, hash)
> > +            os.rename(tmpFilename, filename)
> > +    except FileExistsError:
> > +        print('{} File {} already exist. Is the script already
> running?\n'.format(argv[0], tmpFilename), file=sys.stderr)
>
> s/{}/{}:/ (first time)
>

(y)


> s/{}/{!r}/ (second time) - in this case this will probably just add
> quotes, but still.  (tl;dr: This prints the repr() of the formatee
> rather than its str().)
>

(y)


> Is there any point in naming the exception and printing something for it
> (probably one of its attributes)?  I don't see a failure mode in which
> that would make a difference, but absence of evidence isn't evidence of
> absence.
>

Probably, and moving the error handling to a more "top level" position.
However I'm out of time this morning.

> +    except:
> > +        os.remove(tmpFilename)
> > +        raise
> > +
> > +def main():
> > +    """Main function"""
>
> I'm not a fan of docstrings that simply repeat what may be inferred from
> the identifier.  I don't think there's much this function's docstring
> _could_ say, though.  I'd consider removing the docstring entirely or
> having it use the term "entry point" so as to make it less tautological.
>

(y)


> > +    # Parse arguments
> > +    import argparse
> > +    parser = argparse.ArgumentParser(
> ⋮
> > +    # The file name is the md5encoding of the realm
> > +    import hashlib
> > +    m = hashlib.md5()
>
> Hmm.  Should all function-local imports be grouped at the top of the
> function?  I don't remember the convention here.
>

Is there a Subversion specific Style Guide for Python? I didn't find
anything in HACKING or trunk/doc.

PEP 8 says:
[[[
Imports are always put at the top of the file, just after any module
comments and docstrings, and before module globals and constants.
]]]

If I understand this correctly, it prefers global imports only? If keeping
the imports in the function, then moving to the top is probably more
PEP-8-ish. I didn't change anything though.

For reference, ordering should be
[[[
Imports should be grouped in the following order:
1. standard library imports
2. related third party imports
3. local application/library specific imports
]]]
No explicit mentioning of alphabetical sorting, but that seems like a good
idea.

(Offtopic: When reviewing /trunk/doc, I had a look at
/trunk/doc/programmer/WritingChangeLogs.txt. Is it still relevant? It seems
very CVS-ish and mostly covered in HACKING).

Kind regards,
Daniel Sahlberg

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Sun, Mar 07, 2021 at 19:34:15 +0100:
> Den fre 26 feb. 2021 kl 16:11 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> > Daniel Sahlberg wrote on Fri, Feb 26, 2021 at 13:49:25 +0100:
> > > +    with open(filename + '.tmp', 'x', encoding='utf-8') as file:
> >
> > See my previous mail about adding 'w' and ensuring CR isn't emitted on
> > Windows.
> >
> 
> Python doesn't like wx combined:
> 
> Traceback (most recent call last):
>   File "./store-plaintext-password.py", line 112, in writeHashFile
>     with open(tmpFilename, 'wxb') as fd:
> ValueError: must have exactly one of create/read/write/append mode

I stand corrected.

> > > +parser.add_argument('realm', help='Server authentication real')
> > > +parser.add_argument('-u', '--user', help='Set username', nargs='?')
> > > +args = parser.parse_args()
> > > +
> > > +# The file name is the md5encoding of the realm
> > > +m = hashlib.md5()
> > > +m.update(args.realm.encode('utf-8'))
> > > +authfileName = os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'), .hexdigest())
> > > +
> > > +# If new file, check that username was given as an argument before prompting for password
> > > +if not os.path.isfile(authfileName) and args.user is None:
> >
> > «not isfile()» could mean it's a directory or a broken symlink.  How
> > about:
> > .
> >     if isfile():
> >     elif not exists():
> >     else:
> > .
> > ?
> >
> 
> I switched to exists() only. I'm not sure it makes sense to care for every
> possible variation - if it is a directory then open() will raise an
> exception and I think that is acceptable in this case.

+1

> To reduce the number of stat() I'm storing the result form exists() in a
> variable and reusing late. As for race condition... well it's not really
> that kind of script where I could bother.

Let's just say that "should X be done" and "could Daniel be bothered to
do X" are not synonymous.

> > (The point below about stat() applies to this pseudocode too.)
> >
> > > +    print('New file, username required (see -u)\n', file=sys.stderr)
> >
> > Error messages should start with the argv[0] of whoever generated them.
> >
> 
> Point taken, but in this particular case it is displayed by print_help().

Yes, but that's printed later.  If this script is invoked by another,
that's then invoked by another that's invoked by some CI, etc.,
_prefixing_ the script's name to the message tends to be helpful.

> > > +# In an existing file, we add/replace password/username/passtype
> > > +if os.path.isfile(authfileName):
> >
> > Calling stat() twice has a performance impact and looks like a race
> > condition.  Maybe in this case it doesn't matter, though…
> >
> 
> Using the result from the previous check. Race condition, agree..
> 
> > > +# For a new file, set realmstring, username, password and passtype
> > > +else:
> > > +    hash = {}
> > > +    hash['svn:realmstring'] = args.realm
> > > +    hash['username'] = args.user
> > > +    hash['passtype'] = 'simple'
> > > +    hash['password'] = password
> >
> > Style: Use a single, multiline initializer rather than multiple
> > assignments?
> >
> 
> Like this?
>       hash = {
>             b'svn:realmstring': args.realm.encode('utf-8'),
>             b'username': args.user.encode('utf-8'),
>             b'passtype': 'simple'.encode('utf-8'),
>             b'password': password.encode('utf-8')
>         }

Yeah, except add a comma on the fourth line to minimize future diffs.

For future reference, there's also a «dict(foo=42)» syntax which is
sometimes convenient.

> > Forward cribbing compatibility: «del password» once it's no longer needed?
> >
> 
> Sorry, didn't quite understand this. Do you ask for an option to remove a
> stored password? Isn't svn auth --remove enough?

No, I meant literally add the line of code «del password» once the
variable «password» is no longer needed.  That won't actually overwrite
the value's bytes in memory, but it will prevent the variable from being
accidentally used afterwards.

> +++ contrib/client-side/store-plaintext-password.py	(working copy)
> @@ -0,0 +1,184 @@
> +def outputHash(fd, hash):
> +    """\
> +    Write a dictionary to an FD in the same format as used by
> +    svn_hash_write2(), ie 'K <length>\\n<key>\\nV <length>\\n<value>\\n'
> +    terminated by "END\n".

Escape the newline.

Mention the data type of HASH's keys and values?

> +    """
> +    assert 'b' in fd.mode
> +    
> +    for key, val in hash.items():
> +        fd.write(b'K ' + bytes(str(len(key)), 'utf-8') + b'\n')
> +        fd.write(key + b'\n')
> +        fd.write(b'V ' + bytes(str(len(val)), 'utf-8') + b'\n')
> +        fd.write(val + b'\n')
> +    fd.write(TERMINATOR)
> +    
> +def writeHashFile(filename, hash):
> +    """\
> +    Open a temporary file and write the dictionary to the temp
> +    file in svn_hash_write2() format.
> +
> +    If the file is successfully written it is renamed to <filename>.
> +    """
> +    tmpFilename = filename + '.tmp'
> +    try:
> +        with open(tmpFilename, 'xb') as fd:
> +            outputHash(fd, hash)
> +            os.rename(tmpFilename, filename)
> +    except FileExistsError:
> +        print('{} File {} already exist. Is the script already running?\n'.format(argv[0], tmpFilename), file=sys.stderr)

s/{}/{}:/ (first time)

s/{}/{!r}/ (second time) - in this case this will probably just add
quotes, but still.  (tl;dr: This prints the repr() of the formatee
rather than its str().)

Is there any point in naming the exception and printing something for it
(probably one of its attributes)?  I don't see a failure mode in which
that would make a difference, but absence of evidence isn't evidence of
absence.

> +    except:
> +        os.remove(tmpFilename)
> +        raise
> +            
> +def main():
> +    """Main function"""

I'm not a fan of docstrings that simply repeat what may be inferred from
the identifier.  I don't think there's much this function's docstring
_could_ say, though.  I'd consider removing the docstring entirely or
having it use the term "entry point" so as to make it less tautological.

> +    # Parse arguments
> +    import argparse
> +    parser = argparse.ArgumentParser(
⋮
> +    # The file name is the md5encoding of the realm
> +    import hashlib
> +    m = hashlib.md5()

Hmm.  Should all function-local imports be grouped at the top of the
function?  I don't remember the convention here.

Cheers,

Daniel

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Fri, Feb 26, 2021 at 13:49:25 +0100:
> +++ contrib/client-side/store-plaintext-password.py	(working copy)
> @@ -0,0 +1,127 @@
> +def readHashFile(filename):
> +    """Read a hashfile as written by svn_hash_write2() to a list of tuples (key, value)"""
> +    hash = {}
> +    with open(authfileName, 'r', encoding='utf-8') as file:
> +        while True:
> +            # Expecting K [length] or END
> +            line = file.readline()
> +            if not line:
> +                raise Exception('Parse failed, expected K [length] or END, got EOF')
> +            line = line.strip()
> +            if line.strip() == 'END':
> +                if file.readline():
> +                    raise Exception('Parse failed, unexpected data after END')
> +                return hash
> +            elif line[:1] != 'K':
> +                raise Exception('Parse failed, expected K [length]')
> +            
> +            # Read keyname
> +            key = file.readline()
> +            if not line:
> +                raise Exception('Parse failed, expected keyname')
> +            key = key.strip()
> +            if len(key.encode('utf-8')) != int(line[2:]):
> +                raise Exception('Parse failed, invalid key length {} expected {}'.format(len(key.encode('utf-8')), line[2:]))
> +            
> +            # Read V [length]
> +            line = file.readline()
> +            if not line:
> +                raise Exception('Parse failed, expected V [length], got EOF')
> +            line = line.strip()
> +            if line[:1] != 'V':
> +                raise Exception('Parse failed, expected V [length]')
> +            
> +            # Read value
> +            value = file.readline()
> +            if not value:
> +                raise Exception('Parse failed, expected value')
> +            value = value.strip()
> +            if len(value.encode('utf-8')) != int(line[2:]):
> +                raise Exception('Parse failed, invalid value length {} expected {}'.format(len(value.encode('utf-8')), line[2:]))
> +            
> +            # Store
> +            hash[key] = value

Couldn't resist.  Attached is my take of this function.  Preview:

[[[
% ./svn_hash_read.py ~/.subversion/auth/svn.simple/d3c8a345b14f6a1b42251aef8027ab57
{b'svn:realmstring': b'<https://svn.apache.org:443> ASF Committers',
 b'username': b'danielsh'}
]]]

The script should support everything the serialized hash format allows,
including binary data in both keys and values (even though we generally
use «const char *» keys).  I'm sure there are a few nits to pick, though.

The code is runnable as-is, despite the ellipses.

Cheers,

Daniel


[[[
#!/usr/bin/env python3

TERMINATOR = b"END\n"

def _read_one_datum(fd, letter):
    """\
    Read a 'K <length>\\n<key>\\n' or 'V <length>\\n<value>\\n' block from
    an svn_hash_write2()-format FD.

    LETTER identifies the first letter, as a bytes object.
    """
    assert letter in {b'K', b'V'}

    # Read the letter and the space
    if fd.read(1) != letter or fd.read(1) != b' ':
        raise ...

    # Read the length and the newline
    line = fd.readline()
    if line[-1:] != b'\n':
        raise ...
    expected_length = int(line[:-1])

    # Read the datum and its newline
    datum = fd.read(expected_length)
    if len(datum) != expected_length:
        raise ...
    if fd.read(1) != b'\n':
        raise ...

    return datum

def svn_hash_read(fd):
    """\
    Read an svn_hash_write2()-formatted dict from FD, terminated by "END".

    Return a dict mapping bytes to bytes.
    """
    assert 'b' in fd.mode
    assert TERMINATOR[0] not in {b'K', b'V'}

    ret = {}
    while True:
        if fd.peek(1)[0] == TERMINATOR[0]:
            if fd.readline() != TERMINATOR:
                raise ...
            if fd.peek(1):
                raise ...
            return ret

        key = _read_one_datum(fd, b'K')
        value = _read_one_datum(fd, b'V')
        ret[key] = value

def main():
    import sys
    parsed = svn_hash_read(open(sys.argv[1], 'rb'))

    import pprint
    pprint.pprint(parsed)

if __name__ == '__main__':
    main()
]]]

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Sahlberg <da...@gmail.com>.
I finally found some time to work on this, sorry for the delay!

I'm attaching v4 where I blatantly stole Danielsh's script to read the hash
file
(
http://mail-archives.apache.org/mod_mbox/subversion-dev/202102.mbox/%3c20210226173525.GA24828@tarpaulin.shahaf.local2%3e
)
and addressed the other comments below. I appreciate the "lead by example"
and I've tried to follow the best I could.

Den fre 26 feb. 2021 kl 16:11 skrev Daniel Shahaf <d....@daniel.shahaf.name>:

> Daniel Sahlberg wrote on Fri, Feb 26, 2021 at 13:49:25 +0100:
> > +++ contrib/client-side/store-plaintext-password.py   (working copy)
> > @@ -0,0 +1,127 @@
> > +import hashlib
> > +import argparse
> > +import os
> > +import getpass
> > +import sys
>
> For bonus points, sort these.  PEP 8 doesn't seem to mention this.
>

I've moved the imports to where they are being used (for most part)


> > +def readHashFile(filename):
>

[...] since the code to read the hash is based on Danielsh's script I will
not make any comments on this part of the rewiew, however I'm again
grateful for the feedback and I'll try to remember next time.


> > +            # Store
> > +            hash[key] = value
> > +
> > +def writeHashFile(filename, hash):
> > +    """Write a list of tuples (key, value) to a hashfile of the same
> format as svn_hash_write2()"""
>
> Same issue with the docstring.
>

This is adjusted. And I've also considered your commment about using FDs as
arguments and split the write function in two:
- one to open the tmp file (and handle FileExistsError) and the renaming
afterwards
- one accepting the FD and the hash and writing the actual content


>
> > +    with open(filename + '.tmp', 'x', encoding='utf-8') as file:
>
> See my previous mail about adding 'w' and ensuring CR isn't emitted on
> Windows.
>

Python doesn't like wx combined:

Traceback (most recent call last):
  File "./store-plaintext-password.py", line 112, in writeHashFile
    with open(tmpFilename, 'wxb') as fd:
ValueError: must have exactly one of create/read/write/append mode

The CR issue should be alright by using b mode, inspired by your code
reading in b mode

> +# Parse arguments
> > +parser = argparse.ArgumentParser(description='''Store plain-text
> password in ~/.subversion/auth/svn.simple/\n\n
> > +Existing passwords and authentication realms can be inspected by:\n\n
> > +  svn auth [--show-passwords]\n\n
> > +The authentication realm can also be found using:\n\n
> > +  svn info URL\n\n
> > +Realm will be read from stdin if not provided on the command line.''',
> formatter_class=argparse.RawDescriptionHelpFormatter)
>
> Don't embed multiline string literals this way.  Either make it a global
> variable, or indent the named actual parameters conventionally:
>
>     parser = argparse.ArgumentParser(
>         description='''\
>     Store plain-text password in ~/.subversion/auth/svn.simple/
>     ⋮
>     Realm will be read from stdin if not provided on the command line.
>     ''',
>         formatter_class=argparse.RawDescriptionHelpFormatter,
>     )
>
> I'm not sure whether you can indent the string literal's contents there
> and still have it output as desired.
>
> Note you have three \n's between every two lines (two escaped and one
> literal).
>

Indenting the text kept the indentation in the output, doesn't look good.

Moved this to a global variable and removed the escaped \n.


> > +parser.add_argument('realm', help='Server authentication real')
> > +parser.add_argument('-u', '--user', help='Set username', nargs='?')
> > +args = parser.parse_args()
> > +
> > +# The file name is the md5encoding of the realm
> > +m = hashlib.md5()
> > +m.update(args.realm.encode('utf-8'))
> > +authfileName =
> os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'),
> m.hexdigest())
> > +
> > +# If new file, check that username was given as an argument before
> prompting for password
> > +if not os.path.isfile(authfileName) and args.user is None:
>
> «not isfile()» could mean it's a directory or a broken symlink.  How
> about:
> .
>     if isfile():
>     elif not exists():
>     else:
> .
> ?
>

I switched to exists() only. I'm not sure it makes sense to care for every
possible variation - if it is a directory then open() will raise an
exception and I think that is acceptable in this case.

The check is here because I want to give the error message "username
required" before asking for the password.

To reduce the number of stat() I'm storing the result form exists() in a
variable and reusing late. As for race condition... well it's not really
that kind of script where I could bother.

(The point below about stat() applies to this pseudocode too.)
>
> > +    print('New file, username required (see -u)\n', file=sys.stderr)
>
> Error messages should start with the argv[0] of whoever generated them.
>

Point taken, but in this particular case it is displayed by print_help().


>
> > +    parser.print_help()
> > +    quit()
> > +
> > +# Prompt for password
> > +password = getpass.getpass("Enter password: ")
>
> How about "Enter password for {realm}" or "for {user}", etc?
>

"for {realm}"


>
> > +# In an existing file, we add/replace password/username/passtype
> > +if os.path.isfile(authfileName):
>
> Calling stat() twice has a performance impact and looks like a race
> condition.  Maybe in this case it doesn't matter, though…
>

Using the result from the previous check. Race condition, agree..


>
> > +    hash = readHashFile(authfileName)
> > +    if args.user is not None:
> > +        hash['username'] = args.user
> > +    hash['password'] = password
> > +    hash['passtype'] = 'simple'
> > +
> > +# For a new file, set realmstring, username, password and passtype
> > +else:
> > +    hash = {}
> > +    hash['svn:realmstring'] = args.realm
> > +    hash['username'] = args.user
> > +    hash['passtype'] = 'simple'
> > +    hash['password'] = password
>
> Style: Use a single, multiline initializer rather than multiple
> assignments?
>

Like this?
      hash = {
            b'svn:realmstring': args.realm.encode('utf-8'),
            b'username': args.user.encode('utf-8'),
            b'passtype': 'simple'.encode('utf-8'),
            b'password': password.encode('utf-8')
        }


> Forward cribbing compatibility: «del password» once it's no longer needed?
>

Sorry, didn't quite understand this. Do you ask for an option to remove a
stored password? Isn't svn auth --remove enough?


> > +
> > +# Write out the resulting file
> > +writeHashFile(authfileName, hash)
>

Kind regards
Daniel Sahlberg

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Fri, Feb 26, 2021 at 13:49:25 +0100:
> +++ contrib/client-side/store-plaintext-password.py	(working copy)
> @@ -0,0 +1,127 @@
> +import hashlib
> +import argparse
> +import os
> +import getpass
> +import sys

For bonus points, sort these.  PEP 8 doesn't seem to mention this.

> +def readHashFile(filename):
> +    """Read a hashfile as written by svn_hash_write2() to a list of tuples (key, value)"""

This returns a dict, not a list of tuples.

> +    hash = {}
> +    with open(authfileName, 'r', encoding='utf-8') as file:
> +        while True:
> +            # Expecting K [length] or END
> +            line = file.readline()
> +            if not line:
> +                raise Exception('Parse failed, expected K [length] or END, got EOF')

Rule of thumb: every error message should contain at least one
replaceable.  (Why?  To identify the context, the data at fault, etc.)
In this case, it might be easiest to wrap the whole block with
a try/except that just adds the filename to the error message.

Which reminds me: in idiomatic Python, functions are passed open
file-like objects, rather than filenames.  This way any file-like object
can be passed (e.g., those returned by gzip.open() or
urllib2.urlopen()).  Not asking to change this; just FYI.

> +            line = line.strip()
> +            if line.strip() == 'END':
> +                if file.readline():
> +                    raise Exception('Parse failed, unexpected data after END')
> +                return hash
> +            elif line[:1] != 'K':

There's line.startswith('K'), but up to you whether to change it.

> +            # Read value
> +            value = file.readline()
> +            if not value:
> +                raise Exception('Parse failed, expected value')
> +            value = value.strip()

s/strip()/rstrip('\n')/ to allow passwords with trailing whitespace?

> +            if len(value.encode('utf-8')) != int(line[2:]):
> +                raise Exception('Parse failed, invalid value length {} expected {}'.format(len(value.encode('utf-8')), line[2:]))

OK: This length check will cause the script to abort if someone has
a password that ends with whitespace, or a custom key whose value isn't
just one line, and so on.

I'm not sure file.readline().encode('utf-8') is guaranteed to be the
same number of octets as there are octets in the file between the two
successive newlines (even assuming there isn't any leading/trailing
whitespace).  This could happen in two cases: if .decode()/.encode() do
silent normalization (to NFC or to NFD), or if there are two different
sequences of octets that UTF-8 maps to the same codepoint.

> +            # Store
> +            hash[key] = value
> +
> +def writeHashFile(filename, hash):
> +    """Write a list of tuples (key, value) to a hashfile of the same format as svn_hash_write2()"""

Same issue with the docstring.

> +    with open(filename + '.tmp', 'x', encoding='utf-8') as file:

See my previous mail about adding 'w' and ensuring CR isn't emitted on
Windows.

> +# Parse arguments
> +parser = argparse.ArgumentParser(description='''Store plain-text password in ~/.subversion/auth/svn.simple/\n\n
> +Existing passwords and authentication realms can be inspected by:\n\n
> +  svn auth [--show-passwords]\n\n
> +The authentication realm can also be found using:\n\n
> +  svn info URL\n\n
> +Realm will be read from stdin if not provided on the command line.''', formatter_class=argparse.RawDescriptionHelpFormatter)

Don't embed multiline string literals this way.  Either make it a global
variable, or indent the named actual parameters conventionally:

    parser = argparse.ArgumentParser(
        description='''\
    Store plain-text password in ~/.subversion/auth/svn.simple/
    ⋮
    Realm will be read from stdin if not provided on the command line.
    ''',
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

I'm not sure whether you can indent the string literal's contents there
and still have it output as desired.

Note you have three \n's between every two lines (two escaped and one
literal).

> +parser.add_argument('realm', help='Server authentication real')
> +parser.add_argument('-u', '--user', help='Set username', nargs='?')
> +args = parser.parse_args()
> +
> +# The file name is the md5encoding of the realm
> +m = hashlib.md5()
> +m.update(args.realm.encode('utf-8'))
> +authfileName = os.path.join(os.path.expanduser('~/.subversion/auth/svn.simple/'), m.hexdigest())
> +
> +# If new file, check that username was given as an argument before prompting for password
> +if not os.path.isfile(authfileName) and args.user is None:

«not isfile()» could mean it's a directory or a broken symlink.  How
about:
.
    if isfile():
    elif not exists():
    else:
.
?

(The point below about stat() applies to this pseudocode too.)

> +    print('New file, username required (see -u)\n', file=sys.stderr)

Error messages should start with the argv[0] of whoever generated them.

> +    parser.print_help()
> +    quit()
> +    
> +# Prompt for password
> +password = getpass.getpass("Enter password: ")

How about "Enter password for {realm}" or "for {user}", etc?

> +# In an existing file, we add/replace password/username/passtype
> +if os.path.isfile(authfileName):

Calling stat() twice has a performance impact and looks like a race
condition.  Maybe in this case it doesn't matter, though…

> +    hash = readHashFile(authfileName)
> +    if args.user is not None:
> +        hash['username'] = args.user
> +    hash['password'] = password
> +    hash['passtype'] = 'simple'
> +    
> +# For a new file, set realmstring, username, password and passtype
> +else:
> +    hash = {}
> +    hash['svn:realmstring'] = args.realm
> +    hash['username'] = args.user
> +    hash['passtype'] = 'simple'
> +    hash['password'] = password

Style: Use a single, multiline initializer rather than multiple
assignments?

Forward cribbing compatibility: «del password» once it's no longer needed?

> +
> +# Write out the resulting file
> +writeHashFile(authfileName, hash)

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Sahlberg <da...@gmail.com>.
Den fre 26 feb. 2021 kl 08:31 skrev Daniel Sahlberg <
daniel.l.sahlberg@gmail.com>:

> Moving from users@ to dev@
>

[...]

I was in a bit of a hurry this morning and didn't test properly. The
previous version (store-plaintext-passwords2.txt) contained a number of
bugs which I hope I've corrected here:

- Didn't properly encode keys/values before checking length. Non-ascii
characters in the values raised an exception.
- Didn't properly add the lenght to K when writing the hash file (added the
key name instead)
- Reorganized prompting for password and checking for new file and the -u
argument to show the error message before prompting for a password [UX
improvement]

New version attached.

Kind regards,
Daniel

Re: A more permanent home for the add-a-password-to-a-cached-username script? (was: Re: using svn cli with --non-interactive (in scripts) securely, without exposing password)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Sahlberg wrote on Fri, Feb 26, 2021 at 08:31:55 +0100:
> Den tors 25 feb. 2021 kl 08:00 skrev Daniel Shahaf <d....@daniel.shahaf.name>:
> 
> > Daniel Sahlberg wrote on Wed, Feb 24, 2021 at 23:27:18 +0100:
> > > Learning Python has been on my todo list for ages,  so I've cobbled
> > > together something that seems to do the job.
> >
> > ☺
> >
> 
> Thank you very much for your detailed feedback! Based on it I more or less
> did a ground-up rewrite, sorry for forcing you to restart the review from
> scratch. I will address your commets below.

No worries.

> > > +# K [length]
> > > +# [keyname]
> > > +# V [length]
> > > +# [value]
> > > +# ...
> > > +# END
> > > +#
> > > +# The length is not validated
> >
> > It's not just "not validated"; it's not used at all.  The file format is
> > binary — the lengths are in octets — but the Python code below treats
> > the file as a line-based format.  That may be fine for now, but is not
> > forward-compatible with ~/.subversion/auth/ files that may be created by
> > future versions.
> >
> 
> Length is now validated.
> 
> I think it is acceptable that we make some assumptions about the format. If
> some future version would make a more complex format, the script can be
> updated. (This part of the code is removed, but the comment is relevant for
> the section updating an existing file).

*nod*

> > I'm not sure what level of the API (public or private) third-party
> > products would need to operate at in order to add their own custom keys
> > to the cache files.  I don't know of anyone doing that, but it's not
> > impossible.
> >
> 
> Switched to us a dictionary (but the old code should also have been
> agnostic of key ordering).

I was concerned not of key ordering but about cache files that have more
than just the standard set of keys.  In particular, such custom keys
aren't necessarily restricted to texty, single-line values.

> > > +# Write a key/value pair to a hashfile
> > > +def writeHash(file, key, val):
> >
> > The docstring should be a string literal as the first statement in the
> > function, as in subversion/tests/cmdline/.  That then becomes the
> > function's __doc__ attribute.
> >
> 
> Fixed!

Thanks.  That's a thing for the top-of-file comment too, by the way.  We
aren't consistent about this, but ezt.py and run_tests.py both do this.

> >
> > > +    file.write('K ' + str(len(key)) + '\n')
> > > +    file.write(key + '\n')
> >
> > str() gives the length in characters, but the file format demands
> > a length in octets, so this would break as soon as somebody puts
> > non-ASCII in their password — which, given the nature of passwords,
> > is plausible.
> >
> > You'll want to convert from str to bytes.  See str.encode().  You'll
> > need to use b'' literals in concatenations.
> >
> 
> Better now?

Better, yes.  However, I think as it stands in v3, it'll write CRLF on
Windows?  I.e., don't you need 'b' in open()'s mode=* argument?

> > +if args.listpassword or args.list:
> > > +    data = []
> > > +    if args.listpassword:
> > > +        header = ['Realm', 'Username', 'Password']
> > > +    else:
> > > +        header = ['Realm', 'Username']
> > > +    for name in listdir(authpath):
> > > +        fullname = join(authpath, name)
> > > +        if isfile(fullname):
> > > +            file = open(fullname, 'r')
> > > +            realm = ''
> > > +            user = ''
> > > +            password = ''
> > > +            while(True):
> >
> > The parentheses aren't idiomatic.
> >
> 
> C-sick. ;-)

Then write a C-shell script to upload C-metric eggs to C-pan?  I'm sure
I can come up with a cpan(1) invocation that will break a egg… :P

(https://www.cpan.org/; https://packaging.python.org/glossary/#term-Egg)

> > +    parser.print_help()
> > > +    quit()
> >
> > This will NameError on 'quit'.  At least the exit code will be non-zero
> > this way.

That's still applicable to v3.

> > > +# In an existing file, we add/replace password/username/passtype
> >
> > This bit of code conflates several distinct pieces of logic:
> >
> > - Parsing a serialized hash file
> >
> > - Serializing a hash
> >
> > - The actual patching of the hash
> >
> > - Implementing editing a file by an atomic rename
> >
> > I can't say I like this at all.  All these concerns should be cleanly
> > separated from one another.
> >
> 
> The rewrite first read the complete hash to a dictionary. Then edit the
> dictionary. Finally writing the dictionary to a new file.

Much better.

> > +if isfile(fullname):
> > > +    pwdAdded = False
> > > +    passtypeAdded = False
> > > +    usernameAdded = False
> > > +
> > > +    # Read existing file, write to .tmp which we rename in the end
> > > +    inFile = open(fullname, 'r')
> > > +    outFile = open(fullname + '.tmp', 'w')
> >
> > Also 'x' please for O_EXCL.  If nothing else, this'll plug a race
> > condition when two instances of this script run simultaneously.
> >
> 
> Fixed.

Thanks.  You changed it to 'x'; shouldn't the 'w' be retained as well,
i.e., 'wx' (and 'wxb' in combination with a previous point)?  O_EXCL
doesn't imply either O_RDWR or O_WRONLY.

> Thanks again for your review!

You're welcome.  I see you've posted v3 so I'll skip reviewing v2.

Cheers,

Daniel