You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by David James <ja...@cs.toronto.edu> on 2007/09/01 00:10:23 UTC

Re: Merging Ctypes Python Branch

On 8/30/07, Eric Gillespie <ep...@pretzelnet.org> wrote:
> "Sage La Torra" <sa...@gmail.com> writes:
>
> > As of last week, David James and I think that the Ctypes Python
> > Binding Branch is ready for merging, or at least a thorough review in
> > preparation for merging. The branch is available at
> > http://svn.collab.net/repos/svn/branches/ctypes-python-bindings/.
>
> As a first step to reviewing, I tried to build it.  Currently,
> you assume everything is installed on the linker search path, but
> this is frequently not the case.  So, I changed autogen.py and
> ctypesgen/wrap.py to take -L, -R, and --rpath options (as well as
> the -Wl,-prefixed forms) to specify libdirs to search.
>
> I also changed autogen.py to use apr-1-config --ldflags --link-ld
> so that ensuring autogen.py has the right apr-1-config is all a
> user needs to do if all svn stuff is in the same libdir.

Hi Eric,

Currently, autogen.py assumes that both apr and apr-util are installed
under the same prefix, according to the output of apr-1-config
--prefix. Further, you can specify where Subversion is installed by
using the "--prefix" option for autogen.py.

We add these directories to the default linker path in autogen.py by
setting the LIBRARY_PATH environment variable.

On my system, Subversion is installed in $HOME. APR is installed in
/usr/local/apache2. To configure the ctypes Python bindings, I run the
following command:
   python autogen.py --apr-config=/usr/local/apache2/bin/apr-1-config -p $HOME

Unfortunately, this didn't work, because of a bug in the ctypes
find_library function (the find_library function doesn't strip colons
from the output of gcc -Wl,-t correctly!). I patched wraptypes to work
around this bug and the bindings build like a charm without any other
changes.

Could you try the patched version of ctypesgen on your system and see
if it helps? You'll need to first update ctypesgen by typing ( cd
ctypesgen && svn up ). I'm particularly interested to see if this
works on OS X, which has a different linker and might require us to
play with different options.

Cheers,

David

P.S. If my patch solves your problem, are there still parts of your
patch that you'd still like me to commit? Let me know.



>
> [[[
> * autogen.py
>   (option_W): Add optparse callback to parse -Wl,-prefixed linker
>     options (typically used as -Wl,-R/some/dir to specify RPATH).
>   (parser): Handle -L, -R, --rpath, and -W (using option_W) options,
>     storing values into options.libdirs list.
>   (get_apr_config): Initialize ldflags to the output of 'apr-1-config
>     --ldflags --libs' so we pick up any libdir options, to be passed on to
>     ctypesgen/wrap.py .
>   Pass options.libdirs on to ctypesgen/wrap.py with -R option.  Handle
>   errors from ctypesgen/wrap.py .
>
> * wrap.py
>   (load_library): Take new optional libdirs argument; if provided, if CDLL
>     couldn't find the library, and if the path from find_library is
>     relative, try loading the library from each of the libdirs.
>   (CtypesWrapper.begin_output): Save new optional libdirs argument in
>     self.libdirs for foo.
>   (CtypesWrapper.print_preamble): Pass self.libdirs along to load_library.
>   (option_W): Add optparse callback to parse -Wl,-prefixed linker
>     options (typically used as -Wl,-R/some/dir to specify RPATH).
>   (main): Handle -L, -R, --rpath, and -W (using option_W) options, storing
>     values into options.libdirs list.
> ]]]
>
> Index: autogen.py
> ===================================================================
> --- autogen.py  (revision 26401)
> +++ autogen.py  (working copy)
> @@ -6,6 +6,17 @@
>  from tempfile import mkdtemp
>  from glob import glob
>
> +def option_W(option, opt, value, parser):
> +    if len(value) < 4 or value[0:3] != 'l,-':
> +        raise optparse.BadOptionError("not in '-Wl,<opt>' form: %s%s"
> +                                      % (opt, value))
> +    opt = value[2:]
> +    if opt not in ['-L', '-R', '--rpath']:
> +        raise optparse.BadOptionError("-Wl option must be -L, -R"
> +                                      " or --rpath, not " + value[2:])
> +    # Push the linker option onto the list for further parsing.
> +    parser.rargs.insert(0, value)
> +
>  parser = OptionParser()
>  parser.add_option("-a", "--apr-config", dest="apr_config",
>                    help="The full path to your apr-1-config or apr-config script")
> @@ -15,6 +26,12 @@
>  parser.add_option("", "--save-preprocessed-headers", dest="filename",
>                    help="Save the preprocessed headers to the specified "
>                         "FILENAME")
> +parser.add_option("-W", action="callback", callback=option_W, type='str',
> +                  metavar="l,OPTION",
> +                  help="where OPTION is -L, -R, or --rpath")
> +parser.add_option("-L", "-R", "--rpath", action="append", dest="libdirs",
> +                  metavar="LIBDIR", help="Add LIBDIR to the search path")
> +parser.set_defaults(libdirs=[])
>
>  (options, args) = parser.parse_args()
>
> @@ -48,6 +65,10 @@
>              apr_include_dir = run_cmd("%s --includedir" % apr_config).strip()
>              apr_version = run_cmd("%s --version" % apr_config).strip()
>              cpp  = run_cmd("%s --cpp" % apr_config).strip()
> +
> +            fout = run_cmd("%s --ldflags --link-ld" % apr_config)
> +            if fout:
> +                ldflags = fout.split()
>              break
>      else:
>          print ferr
> @@ -72,10 +93,9 @@
>                          "to your Subversion installation using the --prefix\n"
>                          "option.")
>
> -    ldflags = [
> -        "-lapr-%s" % apr_version[0],
> -        "-laprutil-%s" % apr_version[0],
> -    ]
> +    # TODO: Use apu-1-config, as above:
> +    # ldflags.extend(run_cmd('apu-1-config --ldflags --link-ld').split())
> +    ldflags.append("-laprutil-%s" % apr_version[0])
>
>      # List the libraries in the order they should be loaded
>      libraries = [
> @@ -124,16 +144,31 @@
>
>  os.environ["LIBRARY_PATH"] = library_path
>
> -cmd = ("cd %s && %s %s/ctypesgen/wrap.py --cpp '%s %s' %s "
> +cmd = ["cd %s && %s %s/ctypesgen/wrap.py --cpp '%s %s' %s "
>         "%s -o svn_all.py" % (tempdir, sys.executable, os.getcwd(),
> -                             cpp, flags, ldflags, includes))
> +                             cpp, flags, ldflags, includes)]
> +cmd.extend('-R ' + x for x in options.libdirs)
> +cmd = ' '.join(cmd)
>
>  if options.filename:
>      cmd += " --save-preprocessed-headers=%s" % \
>          os.path.abspath(options.filename)
>
>  print cmd
> -os.system(cmd)
> +status = os.system(cmd)
> +if os.WIFEXITED(status):
> +    status = os.WEXITSTATUS(status)
> +    if status != 0:
> +        sys.exit(status)
> +elif os.WIFSIGNALED(status):
> +    sys.stderr.write("wrap.py killed with signal %d" % (os.WTERMSIG(status),))
> +    sys.exit(2)
> +elif os.WIFSTOPPED(status):
> +    sys.stderr.write("wrap.py stopped with signal %d" % (os.WSTOPSIG(status),))
> +    sys.exit(2)
> +else:
> +    sys.stderr.write("wrap.py exited with invalid status %d" % (status,))
> +    sys.exit(2)
>
>  func_re = re.compile(r"CFUNCTYPE\(POINTER\((\w+)\)")
>  out = file("%s/svn_all2.py" % tempdir, "w")
> Index: ctypesgen/wrap.py
> ===================================================================
> --- ctypesgen/wrap.py   (revision 35)
> +++ ctypesgen/wrap.py   (working copy)
> @@ -14,11 +14,12 @@
>
>  from ctypesparser import *
>  import textwrap
> -import sys, re
> +import optparse, os, sys, re
> +from errno import ENOENT
>  from ctypes import CDLL, RTLD_GLOBAL, c_byte
>  from ctypes.util import find_library
>
> -def load_library(name, mode=RTLD_GLOBAL):
> +def load_library(name, mode=RTLD_GLOBAL, libdirs=None):
>      if os.name == "nt":
>          return CDLL(name, mode=mode)
>      path = find_library(name)
> @@ -26,14 +27,34 @@
>          # Maybe 'name' is not a library name in the linker style,
>          # give CDLL a last chance to find the library.
>          path = name
> -    return CDLL(path, mode=mode)
> +    try:
> +        return CDLL(path, mode=mode)
> +    except OSError, e:
> +        # XXX Sadly, ctypes raises OSError without setting errno.
> +        if e.errno not in [ENOENT, None]:
> +            raise
> +        if os.path.isabs(path) or libdirs is None:
> +            raise
> +        # path is relative and the linker couldn't find it; if the
> +        # user provided any libdirs, try those.
> +        # XXX This isn't quite right; if the user provided libdirs, we
> +        # should *only* try those; he may be specifically trying to
> +        # avoid one on the system search path.  But we're relying on
> +        # find_library to turn e.g. 'apr-1' into 'libapr-1.so.0'
> +        for libdir in set(libdirs):
> +            try:
> +                return CDLL(os.path.join(libdir, path), mode=mode)
> +            except OSError, e:
> +                if e.errno not in [ENOENT, None]:
> +                    raise
>
>  class CtypesWrapper(CtypesParser, CtypesTypeVisitor):
>      file=None
>      def begin_output(self, output_file, library, link_modules=(),
>                       emit_filenames=(), all_headers=False,
>                       include_symbols=None, exclude_symbols=None,
> -                     save_preprocessed_headers=None):
> +                     save_preprocessed_headers=None,
> +                     libdirs=None):
>          self.library = library
>          self.file = output_file
>          self.all_names = []
> @@ -51,6 +72,7 @@
>              self.exclude_symbols = re.compile(exclude_symbols)
>          self.preprocessor_parser.save_preprocessed_headers = \
>              save_preprocessed_headers
> +        self.libdirs = libdirs
>
>          self.linked_symbols = {}
>          for name in link_modules:
> @@ -180,7 +202,7 @@
>          }).lstrip()
>          self.loaded_libraries = []
>          for library in self.library:
> -            lib = load_library(library)
> +            lib = load_library(library, libdirs=self.libdirs)
>              if lib:
>                  self.loaded_libraries.append(lib)
>                  print >>self.file, textwrap.dedent("""
> @@ -314,11 +336,19 @@
>                      self.all_names.append(name)
>                      break
>
> +def option_W(option, opt, value, parser):
> +    if len(value) < 4 or value[0:3] != 'l,-':
> +        raise optparse.BadOptionError("not in '-Wl,<opt>' form: %s%s"
> +                                      % (opt, value))
> +    opt = value[2:]
> +    if opt not in ['-L', '-R', '--rpath']:
> +        raise optparse.BadOptionError("-Wl option must be -L, -R"
> +                                      " or --rpath, not " + value[2:])
> +    # Push the linker option onto the list for further parsing.
> +    parser.rargs.insert(0, value)
> +
>  def main(*argv):
>      from tempfile import NamedTemporaryFile
> -    import optparse
> -    import sys
> -    import os.path
>
>      usage = 'usage: %prog [options] <header.h>'
>      op = optparse.OptionParser(usage=usage)
> @@ -341,7 +371,12 @@
>      op.add_option('', '--save-preprocessed-headers', dest='filename',
>                    help='Save the preprocessed headers to the specified '
>                         'FILENAME')
> -
> +    op.add_option("-W", action="callback", callback=option_W, type='str',
> +                  metavar="l,OPTION",
> +                  help="where OPTION is -L, -R, or --rpath")
> +    op.add_option("-L", "-R", "--rpath", action="append", dest="libdirs",
> +                  metavar="LIBDIR", help="Add LIBDIR to the search path")
> +
>      (options, args) = op.parse_args(list(argv[1:]))
>      if len(args) < 1:
>          print >> sys.stderr, 'No header files specified.'
> @@ -363,7 +398,8 @@
>                           all_headers=options.all_headers,
>                           include_symbols=options.include_symbols,
>                           exclude_symbols=options.exclude_symbols,
> -                         save_preprocessed_headers=options.filename
> +                         save_preprocessed_headers=options.filename,
> +                         libdirs=options.libdirs,
>                           )
>      wrapper.preprocessor_parser.cpp = options.cpp
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by masaru tsuchiyama <m....@gmail.com>.

Hello

> Do you happen to have a copy of some recent DLLs for Subversion that
> work on Windows?
No. I often run gen-make.py with --disable-shared option.

> For now, though, I'd like to keep the ctypes python bindings as
> self-contained as possible, so that I don't have to constantly merge
> changes between trunk and the ctypes python bindings branch.
Yes. I agree too.

Regards.
Masaru.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by Eric Gillespie <ep...@pretzelnet.org>.

"David James" <ja...@cs.toronto.edu> writes:

> Currently, users can type any of the following:
>   - wc.update("foo")
>   - wc.update(["foo", "bar"])
>   - wc.update(("foo", "bar", "baz"))
> 
> You think that we should only allow the last two forms? Thinking about
> this carefully, I guess I agree -- it'll probably be easier to explain
> the API if we don't have to explain those special cases.

Yeah, the others are just needless permutations, not only
confusing, but more importantly, hiding bugs.  I see that Sage
disagrees, so I'm not sure where you guys will end up taking
this.  Hard experience has taught me that magic like this only
seems nice in the short term, and in the long term will bite
someone in the ass.  And for what?  So you can avoid typing
brackets, or avoid thinking about 1-element lists?  Bah.

Finally, I just don't think it's Pythonic: "Explicit is better
than implicit" and "There should be one-- and preferably only one
--obvious way to do it".  All the ass-biting magic I've seen over
the years has been in Perl.  Well, almost; Python has one
ass-biting misfeature: the str % operator optionally taking
non-iterables: '%s' % foo vs. '%s' % (foo,).  Works just fine
until foo is a sequence, and note that strings are sequences...

> self.iterpool is intended to be used for temporary allocations only.
> If you use iterpool in a public method, you should clear the pool
> before returning from the method.

It shouldn't be called "iterpool" unless iteration is involved.

> To solve the problem, we should either allocate return values in their
> own, dedicated pool, or -- even better -- convert the return values
> into true python datastructures which don't require pools at all.

That might work.  I'm still skeptical that all but the simplest
scripts can avoid pool management.  Not that we shouldn't offer
what freedom from pools we can; there are quick scripts that just
won't care.

>   2. Temporary objects should be allocated in an iterpool. When
>      our functions exit, we should clear the iterpool, so as to clear
>      out the temporary objects.

As long as we don't end up with some methods clearing the pool
upon return and others not.

Bleh, I just thought of another issue: exceptions.  If I'm
passing my own (possibly iter)pool into methods, I can clear it
or take the appropriate action when I get an exception.  If I
have no control over pools, then I might just start leaking.
That's why I'm skeptical; I just don't think all the issues are
really going to be obvious up front.

> If you invoke a low-level function pointer, and an error is reported,
> the function will simply return an error code. If you want to convert
> this error code into a Python exception, you'll need to wrap the
> function pointer invocation with SVN_ERR. This is unfortunately a
> limitation of ctypes.

I understand this, or at least I think I do.  I don't understand
what it has to do with callbacks, which is what the comment is
talking about.

> To see a list of the low-level function pointers accessible via the
> ctypes python bindings, grep functions.py for CFUNCTYPE. You'll note
> that all of the return values of these functions are marked as
> UNCHECKED because ctypes doesn't support any error checks on function
> pointer invocations.

They all seem to be callback types, which are called by svn, not
by my Python app.

> +1 on fixing both bugs. I don't care whether users return an error or
> raise an error -- whatever you think is best is fine.

For the high-level, I should be raising an exception.  I have a
partly working patch to generate separate exception classes (and
a hierarchy) from the SVN_* error codes.  Then I'll be able to
raise csvn.errors.Cancel("Caught signal").

> +1 on the concept here. I mainly wrote the RepositoryURI class for the
> benefit of mucc.py. Is there an equivalent to
> svn_path_get_longest_ancestor in urlparse?

posixpath.commonprefix

> > - Are you planning to use CallbackReceiver to turn all functions
> >   which call receiver callbacks into generators?
> 
> Sure. We could upgrade wc.info to be a generator, so that users can
> iterate over all of the different elements in the WC using a for loop.

I think log, info, and status are the ones.

> > - RemoteRepository.revprop_get should use get_dir or get_file.
> 
> Why should we use get_dir or get_file instead of revprop_list?

Heh, I guess I was more tired than I thought when I was writing
some of this.  I meant that it should use svn_ra_rev_prop to get
only the request revprop rather than fetching them all.

> > - LocalRepository.__init__
> >
> >   A separate create constructor would be better than having a
> >   create argument here.
> 
> To be clear, you think we should create a static method like
> 'LocalRepository.create' instead, which returns a repos object?
> 
> You would invoke it like this:
>   repos = LocalRepository.create("some/path")
> 
> How would the static method tell __init__ that we want to create a
> repository? The current boolean that's already in the __init__ method
> would suffice for the job -- or do you think we should use some other
> method of communication?

Yeah, it would need a bit of reworking in order to dump the
create argument, and opinions differ whether it's worth it.
Sometimes separate named constructors are provided, .open and
.create, with the documentation saying not to use cls() at all,
but I prefer lazy instantiation, which also makes this easier:

Index: repos.py
===================================================================
--- repos.py	(revision 26445)
+++ repos.py	(working copy)
@@ -344,7 +344,7 @@
        and implement the allow_access function.
     """

-    def __init__(self, path, create=False, user=User()):
+    def __init__(self, path, user=None):
         """Open the repository at PATH. If create is True,
            create a new repository.

@@ -353,14 +353,26 @@
         self.pool = Pool()
         self.iterpool = Pool()
         self._as_parameter_ = POINTER(svn_repos_t)()
-        self.user = user
-        if create:
-            svn_repos_create(byref(self._as_parameter_), path,
-                             None, None, None, None, self.pool)
+        if self.user is None:
+            self.user = User()
         else:
-            svn_repos_open(byref(self._as_parameter_), path, self.pool)
-        self.fs = _fs(self)
+            self.user = user

+    @classmethod
+    def create(cls, path, user=None):
+        self = cls(path, user)
+        self._fs = svn_repos_create(byref(self._as_parameter_), self.path,
+                                    None, None, None, None, self.pool)
+        return self
+
+    _fs = None
+    def _get_fs(self):
+        if self._fs is None:
+            self._fs = svn_repos_open(byref(self._as_parameter_), self.path,
+                                      self.pool)
+        return self._fs
+    fs = property(_get_fs)
+
     def latest_revnum(self):
         """Get the latest revision in the repository"""
         return self.fs.latest_revnum()

Another common approach is to provide an open constructor as
well, with callers never meant to use cls().

Note the user= default argument fix, too; user=User() means all
instances share the same default object.

> > - inconsistent assert
> >
> >   IMO, just drop all the asserts.  If someone fails to provide a
> >   working User instance, for example, that's their problem, and
> >   they'll get appropriate exceptions.  It's no different from
> >   passing a bool as get_rev_prop's name argument.
> >
> >   If we're going to go with the asserts, though, we should use
> >   them everywhere.  For example, LocalRepository.txn asserts
> >   self.user but _get_commit_editor does not.
> 
> Personally I find the assertions helpful because they point out that
> it's user error rather than a problem in the bindings. These
> assertions are especially helpful when they're adorned with a comment
> explaining the mistake the user made.

So how is this any different than passing in a bool as
get_rev_prop's name argument?  He's a poor programmer who assumes
that an exception in his program is someone else's fault ;->.

> _get_commit_editor is a private function so it doesn't need the
> assertion. It's already protected via LocalRepository.txn.

I see now; I missed it earlier.

> > - _get_commit_editor (and others) have comments that should
> >   become docstrings.
> 
> I typically don't write docstrings for private functions. Should I?

I'm sure opinions differ.  Documentation generators know to
separate public interfaces from private (those with a leading _
in the name).  I'm with those who say that doc strings are useful
not only to consumers but also implementors, and doc strings on
the private interfaces can help me as I work on the class itself.
Not to mention help code reviewers ;->.

> This code is borrowed from mucc.c. We build the transaction up first
> before driving the editor so that we can make sure that all of the
> editor operations happen in the right order.

Ahh, so that's the purpose.  You want to let callers not have to
worry about ordering.  Hm, I'm not sure.  We can disregard the
mucc program in thinking about the right thing to do here, as it
can sort the operations itself before driving the editor, as I
always assumed it did.

> If users want to drive the editor manually, they can use the low-level
> interface.

If we go this route, then we need a medium-level interface.
Currently, the choice is between the constraining high-level
interface and the insane, worse-than-swig ctypes interface, with
all the POINTER stuff that I don't yet understand.

> > - Txn check_path and copy use peg rev syntax in their doc
> >   strings, which is not right; I think they should say "PATH in
> >   REV" instead, to reduce confusion.
> 
> Personally, I think that PATH@REV is more precise than PATH in REV. If
> we don't specify the peg rev, folks might think we're talking about
> PATH@HEAD in REV, since the peg rev defaults to HEAD

If you want to be completely precise, say "PATH@REV in REV";
currently, it's saying "PATH@REV in HEAD", which is what I found
confusing.

> > - Txn.copy doesn't need an if for the path kind; both blocks are
> >   identical anyway.
> 
> Really? They look different to me.

Yeah, I missed the local_path=local_path part.  See earlier
comment about being tired ;->.

> > - Txn.commit -- the entire span when the edit is open needs to be
> >   wrapped in a try/except/abort, and for all exceptions.  And
> >   since an exception is already in progress, the inner except
> >   should also trap all exceptions:
> 
> Should we trap all exceptions? I suppose it's a good idea to abort the
> edit regardless of what exception happened, so +1.

I mean trap, abort, and re-raise.

> On 8/30/07, Eric Gillespie <ep...@pretzelnet.org> wrote:
> > remoterepos.RemoteRepositoryTestCase segfaults for me:
> 
> Thanks for the detailed bug report! Fixed in r26414.

I haven't looked at the core dump (probably won't this week), but
it still segfaults:

0 ctypes-python-bindings% python test/run_all.py -v
test_local_check_path (localrepos.LocalRepositoryTestCase) ... ok
test_local_get_prop (localrepos.LocalRepositoryTestCase) ... ok
test_local_latest_revnum (localrepos.LocalRepositoryTestCase) ... ok
test_remote_check_path (remoterepos.RemoteRepositoryTestCase) ... ok
test_remote_latest_revnum (remoterepos.RemoteRepositoryTestCase) ... ok
test_revprop_get (remoterepos.RemoteRepositoryTestCase) ... ok
test_revprop_list (remoterepos.RemoteRepositoryTestCase) ... ok
test_revprop_set (remoterepos.RemoteRepositoryTestCase) ... ok
test_svnimport (remoterepos.RemoteRepositoryTestCase) ... zsh: segmentation fault (core dumped)  python test/run_all.py -v

>   * I conditionalized your new error checking code on
>     os.name == "posix", because os.WIFEXITED are not available on
>     Windows.

Hm, so it's still ignoring errors and printing "Generated OK" on
Windows.  Maybe some Windows person can chime in with the correct
equivalent of the W* macros on Windows.  At the very least, you
should print out the status code and exit non-zero, not continue
on to print the "OK" message.

> For now, though, I'd like to keep the ctypes python bindings as
> self-contained as possible, so that I don't have to constantly merge
> changes between trunk and the ctypes python bindings branch.

+1

BTW, it may not sound like it from my posts so far, but it's
*because* I'm so excited by Sage's and your work that I'm looking
into this in such detail :).  And I don't feel strongly about
most of these, they're just opinions.

Thanks.

--  
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by David James <ja...@cs.toronto.edu>.

In this mail, I've attempted to answer all the questions posed in the
thread so far. So, it's a long mail -- feel free to skip down to the
parts you find interesting :)

On 8/31/07, Eric Gillespie <ep...@pretzelnet.org> wrote:
> I've finished reviewing the high-level interface.  A lot of these
> things are probably on someone's TODO list already.
> Nevertheless, I think this needs more work before merging onto
> trunk, especially now that we're starting to firm up the 1.5
> release schedule.

Agreed.

> - Style nits:

+1 on fixing all the style nits you mentioned. I'll try to get around
to this soon and fix everything in one big style commit.

> - WC._build_path_list
>
>   This sort of magic is generally a bad idea.  It isn't at all
>   easier for callers to pass in "file" than ["file"], but
>   allowing both can lead to hidden bugs.  I looked to see if
>   other functions taking path lists behave this way, but it seems
>   to be limited to wc.py.

Currently, users can type any of the following:
  - wc.update("foo")
  - wc.update(["foo", "bar"])
  - wc.update(("foo", "bar", "baz"))

You think that we should only allow the last two forms? Thinking about
this carefully, I guess I agree -- it'll probably be easier to explain
the API if we don't have to explain those special cases.

> Not all methods use _build_path_list; lock and unlock, for
> example, just take a list.

That's a bug. Our API should be consistent.

> - WC.pool and WC.iterpool
>
>   We use the name "iterpool" for pools we clear on each iteration
>   of a loop.  I'm not sure of its usage here.  Some methods use
>   it, others use self.pool (which is never cleared).  Some
>   methods that use self.iterpool clear it, some don't.  For those
>   that clear it, I wonder about the lifetime of objects returned.
>
>   Specifically, I was thinking about commit_info.  I see that
>   WC.commit does not clear self.iterpool; is that so the returned
>   commit_info will still work?  It's no good to have these
>   methods vary so.

self.iterpool is intended to be used for temporary allocations only.
If you use iterpool in a public method, you should clear the pool
before returning from the method.

It looks like wc.commit allocates commit_info from iterpool. This is
bad, because, as you suspect, the commit info will be invalidated as
soon as the iterpool is cleared.

Looking around the bindings, it seems like many other functions suffer
from this problem.

To solve the problem, we should either allocate return values in their
own, dedicated pool, or -- even better -- convert the return values
into true python datastructures which don't require pools at all.

>   I'm not sure we can do anything to save quick scripts from pool
>   management except allocating everything in one pool that we
>   never clear.  I've always assumed that's what we do in swig,
>   though I'm not sure, as I manage pools myself.

Ideally, users who use high-level ctypes Python bindings should never
have to worry about pool management.

To isolate users from pool management, I've setup the following system:

  1. Large, long-lived C objects (such as repository objects) should be
     allocated in their own dedicated memory pool. Any private helper
     objects which share the same lifetime as the connection should also
     be allocated out of the same pool.  When Python determines that
     the object should be deleted, we should destroy the entire memory
     pool.

  2. Temporary objects should be allocated in an iterpool. When
     our functions exit, we should clear the iterpool, so as to clear
     out the temporary objects.

  3. Typical Python objects (such as strings, lists and dictionaries)
     shouldn't be allocated in pools at all. They should just be regular
     Python objects. If Subversion returns an APR array or an APR hash,
     for example, our higher level API should convert these arrays and
     hashes into Python lists and dictionaries which are allocated
     in Python memory.

Does the above system make sense?

> - I don't understand this:
>
> # As of ctypes 1.0, ctypes does not support custom error-checking
> # functions on callbacks, nor does it support custom datatypes on
> # callbacks, so ctypesgen sets the return values for callbacks
> # to c_void_p.
> #
> # Callback functions, therefore, won't have their errors checked
> # automatically. Users who call these functions should wrap
> # their function call using SVN_ERR so as to ensure that any
> # exceptions are thrown appropriately.
>
>   Does it mean that I must wrap all svn calls I make inside my
>   progress_func callback with SVN_ERR()?  Or does it mean I must
>   wrap all calls I make to an svn function takes a callback in
>   SVN_ERR()?  Also, this needs to be in user-facing documentation
>   somewhere, not buried in a comment down here.

No, that's not it. Let me explain.

If you call a regular Subversion function from Python, and an error is
reported, Python will throw an appropriate exception. This is great --
nice and Pythonic.

If you invoke a low-level function pointer, and an error is reported,
the function will simply return an error code. If you want to convert
this error code into a Python exception, you'll need to wrap the
function pointer invocation with SVN_ERR. This is unfortunately a
limitation of ctypes.

To see a list of the low-level function pointers accessible via the
ctypes python bindings, grep functions.py for CFUNCTYPE. You'll note
that all of the return values of these functions are marked as
UNCHECKED because ctypes doesn't support any error checks on function
pointer invocations.

Yes, this does need to be documented somewhere more obvious. In the
README? We don't really have a getting started guide yet but that's
probably where it would belong.

> - Why the staticmethod on the callback wrappers?  That shouldn't
>   be necessary.  If you drop that, you'll get self as the first
>   argument as normal, and can ignore the baton entirely.  Batons
>   are just a way to fake closures in C; Python code should never
>   have to bother with them.

Very cool. Nice suggestion! I didn't think that ctypes would handle
instance methods correctly, but it happily turns out I'm wrong.

> - cancel_func
>
>   How is this supposed to work?  It seems to me that for the
>   high-level side, my callback should do something like 'raise
>   csvn.CancelError("Caught signal")' and the high-level wrapper
>   should catch that and return an appropriate svn_error_t.  The
>   current high-level cancel wrapper doesn't do anything like
>   that, but it also ignores my callback's return value, so I
>   can't even do it the low-level way.  I don't think these work
>   right now.

+1 on fixing both bugs. I don't care whether users return an error or
raise an error -- whatever you think is best is fine.

>           For details of apr_off_t objects, see the apr_off_t
>           documentation."""
>
>   Cute ;->.  The high-level cb wrapper should translate to normal
>   Python longs before calling the real cb.

+1

> WC.checkout should return result_rev

+1

> - RepositoryURI.__init__
>
>   More of the magic with the uri argument, this time with an
>   encoded parameter so callers can work around it.  As with
>   character encoding, the barriers should be at clearly defined
>   places (usually at end-uesr interaction), not at random layers.
>   Just require a URI-encoded argument.

We already require a URI-encoded argument by default. +1 on removing
the auto-encode feature, as it currently isn't used anywhere.

>   I'm not sure this class is a good idea in the first place,
>   though.  It seems to me that a high-level, Pythonic interface
>   should integrate well with standard Python modules and idioms,
>   not export the bits of svn and apr that Python already has.
>   It's the same reason the high level interface should take
>   lists, strs, dicts, and so on as opposed to ctyped apr
>   equivalents.  In this case, I want to use urlparse.

+1 on the concept here. I mainly wrote the RepositoryURI class for the
benefit of mucc.py. Is there an equivalent to
svn_path_get_longest_ancestor in urlparse?

> - RemoteRepository -- well, it's not always remote.  I suggest
>   just Repository and LocalRepository.  Furthermore, the Local
>   version is missing some things, such as the import method.
>   LocalRepository should be a superset of Repository.  I suggest
>   making Local a subclass of Repository; it can just prepend
>   file:/// and let the superclass methods do all the work.

+1.

> - Rename RemoteRepository.log's verbose argument to
>   discover_changed_paths.  Same for _LogMessageReceiver.  Also,
>   _LogMessageReceiver.receiver probably ought to set
>   entry.changed_paths based on whether the changed_paths hash is
>   NULL or not.

+1.

> - Are you planning to use CallbackReceiver to turn all functions
>   which call receiver callbacks into generators?

Sure. We could upgrade wc.info to be a generator, so that users can
iterate over all of the different elements in the WC using a for loop.

Any other candidates?

> - RemoteRepository.revprop_get should use get_dir or get_file.

Why should we use get_dir or get_file instead of revprop_list?

> - RemoteRepository._log_func_wrapper is duplicated in wc.py

Oops. We should fix that and only provide one implementation
(especially since the version in wc.py is now outdated.)

> - LocalRepository.__init__
>
>   A separate create constructor would be better than having a
>   create argument here.

To be clear, you think we should create a static method like
'LocalRepository.create' instead, which returns a repos object?

You would invoke it like this:
  repos = LocalRepository.create("some/path")

How would the static method tell __init__ that we want to create a
repository? The current boolean that's already in the __init__ method
would suffice for the job -- or do you think we should use some other
method of communication?

> - LocalRepository vs. RemoteRepository
>
> [...]
>
>   Yeah, I know, svn is awful about this, but no reason to
>   propagate the mess into the high-level layer.  I vote for
>   revprop and verb_noun.

+1

> - inconsistent assert
>
>   IMO, just drop all the asserts.  If someone fails to provide a
>   working User instance, for example, that's their problem, and
>   they'll get appropriate exceptions.  It's no different from
>   passing a bool as get_rev_prop's name argument.
>
>   If we're going to go with the asserts, though, we should use
>   them everywhere.  For example, LocalRepository.txn asserts
>   self.user but _get_commit_editor does not.

Personally I find the assertions helpful because they point out that
it's user error rather than a problem in the bindings. These
assertions are especially helpful when they're adorned with a comment
explaining the mistake the user made.

I don't think we need to have full coverage for the assertions, but I
do agree it's a good idea to be consistent.

_get_commit_editor is a private function so it doesn't need the
assertion. It's already protected via LocalRepository.txn.

> - _get_commit_editor (and others) have comments that should
>   become docstrings.

I typically don't write docstrings for private functions. Should I?

> - _fs and _fs_root can lose the NOTE about being private; the
>   leading _ already indicates that.
>
> - _txn_operation.__init__ should use SVN_INVALID_REVNUM instead
>   of -1 for copyfrom_rev default argument.

+1

>   _txn_operation had me scratching my head for a while, but I
>   think I understand now.  So, the Txn class is all about
>   building up a transaction in client memory and only driving the
>   editor when the caller commits?  Why?
>
>   I'm not saying it's never a good idea, but it seems odd for
>   this to be the default mode of operation, much less the only
>   mode.

This code is borrowed from mucc.c. We build the transaction up first
before driving the editor so that we can make sure that all of the
editor operations happen in the right order.

If users want to drive the editor manually, they can use the low-level
interface.

> Also, the open method isn't open at all; rename it
>   (add_op?  queue_op?  dunno).

+1 on queue_operation

> - Most of the Txn methods do a check_path first, which is really
>   going to hurt performance, but is redundant besides.  If I try
>   to mkdir an existing path, I'll get an error.  SVN_ERR_BAD_URL
>   is the wrong code anyway; it should be SVN_ERR_FS_NOT_FOUND.
>   Regarding "No such file or directory", I think we normally just
>   say "path not found".

We could possibly get rid of the check_path calls in 'delete',
'mkdir', and 'propset'. It looks like we'll still need at least one
check_path call in copy and _upload_file.

> - Txn check_path and copy use peg rev syntax in their doc
>   strings, which is not right; I think they should say "PATH in
>   REV" instead, to reduce confusion.

Personally, I think that PATH@REV is more precise than PATH in REV. If
we don't specify the peg rev, folks might think we're talking about
PATH@HEAD in REV, since the peg rev defaults to HEAD

> - Txn.copy doesn't need an if for the path kind; both blocks are
>   identical anyway.

Really? They look different to me.

> - Txn.commit -- the entire span when the edit is open needs to be
>   wrapped in a try/except/abort, and for all exceptions.  And
>   since an exception is already in progress, the inner except
>   should also trap all exceptions:

Should we trap all exceptions? I suppose it's a good idea to abort the
edit regardless of what exception happened, so +1.

> - SvnDate -- What I want in a high-level interface is easy
>   integration with standard Python date/time modules, not
>   svn_time_to_human_string.

+1, that's a TODO.

On 8/30/07, Eric Gillespie <ep...@pretzelnet.org> wrote:
> remoterepos.RemoteRepositoryTestCase segfaults for me:

Thanks for the detailed bug report! Fixed in r26414.

On 8/31/07, Eric Gillespie <ep...@pretzelnet.org> wrote:
> I have time to look at this again now.  If I add the correct -L
> flag to your gcc command, it works.  So, I think we need to merge
> my patch with your change: take the libdir option stuff from
> mine, drop my load_library change, and then your new find_library
> should make things work.

Thanks Eric. I committed this patch to ctypesgen in r38. I committed
your companion patch to the ctypes-python-bindings branch in r26415.

I made a few tweaks to your patch:

  * I updated get_apr_config to add appropriate -L flags for Subversion
    in case the Subversion lib directory is different from the APR lib
    directory, as it is on my system.

  * I removed the code which set the "LIBRARYPATH" variable, as it is
    no longer necessary, and has been replaced by your new, more
    portable, -L directives.

  * I conditionalized your new error checking code on
    os.name == "posix", because os.WIFEXITED are not available on
    Windows.

On 8/31/07, masaru tsuchiyama <m....@gmail.com> wrote:
> I tested it on Windows XP and got the following error.
> [...]
>     raise Exception("Cannot find apr-1-config or apr-config. Please specify\n"
> Exception: Cannot find apr-1-config or apr-config. Please specify
> the full path to your apr-1-config or apr-config script
> using the --apr-config option.
>
> I think autogen.py should have the following options like as gen-make.py.
>  --with-apr=DIR
>  --with-apr-util=DIR
>  --with-httpd=DIR
> [...]

We will definitely need to make a few changes to autogen.py to make it
work on Windows. I think that we will probably need to update more
than just the "--with-apr" option here though.

Do you happen to have a copy of some recent DLLs for Subversion that
work on Windows? I don't have Visual Studio on my machine but I might
be able to get the ctypes bindings working if I had a copy of the DLLs
at least.

> But If autogen.py will be merged to gen-make.gy when merging the
> branch to trunk, it may not be a problem.

That's a good idea. After we merge to trunk, I'm planning to update
gen-make.py to also tell the ctypes python bindings where APR and
APR-util live so that users don't need to configure this twice.

For now, though, I'd like to keep the ctypes python bindings as
self-contained as possible, so that I don't have to constantly merge
changes between trunk and the ctypes python bindings branch.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by Eric Gillespie <ep...@pretzelnet.org>.

Eric Gillespie <ep...@pretzelnet.org> writes:

> I have time to look at this again now.  If I add the correct -L
> flag to your gcc command, it works.  So, I think we need to merge
> my patch with your change: take the libdir option stuff from
> mine, drop my load_library change, and then your new find_library
> should make things work.

Yep, this works (autogen.py diff is the same):

Index: wrap.py
===================================================================
--- wrap.py	(revision 36)
+++ wrap.py	(working copy)
@@ -14,20 +14,27 @@
 
 from ctypesparser import *
 import textwrap
-import sys, re, os, tempfile
+import optparse, os, re, sys, tempfile
+from errno import ENOENT
 from ctypes import CDLL, RTLD_GLOBAL, c_byte
+from ctypes.util import find_library
 
 if os.name == "nt":
-    from ctypes.util import find_library
+    def find_library_in_dirs(name, libdirs=None):
+        return find_library(name)
 else:
     # ctypes.util.find_library is buggy in Python 2.5, unfortunately,
     # so we have to parse the output of gcc -Wl,-t manually
-    def find_library(name):
+    def find_library_in_dirs(name, libdirs=None):
         expr = r'[^\(\)\s]*lib%s\.[^\(\)\s:]*' % re.escape(name)
         fdout, ccout = tempfile.mkstemp()
         os.close(fdout)
-        cmd = 'if type gcc >/dev/null 2>&1; then CC=gcc; else CC=cc; fi;' \
-              '$CC -Wl,-t -o %s 2>&1 -l %s' % (ccout, name)
+        cmd = ['if type gcc >/dev/null 2>&1; then CC=gcc; else CC=cc; fi;',
+               '$CC']
+        if libdirs is not None:
+            cmd.extend('-L' + x for x in libdirs)
+        cmd.append('-Wl,-t -o %s 2>&1 -l%s' % (ccout, name))
+        cmd = ' '.join(cmd)
         try:
             f = os.popen(cmd)
             trace = f.read()
@@ -40,10 +47,10 @@
             return None
         return res.group(0)
 
-def load_library(name, mode=RTLD_GLOBAL):
+def load_library(name, mode=RTLD_GLOBAL, libdirs=None):
     if os.name == "nt":
         return CDLL(name, mode=mode)
-    path = find_library(name)
+    path = find_library_in_dirs(name, libdirs)
     if path is None:
         # Maybe 'name' is not a library name in the linker style,
         # give CDLL a last chance to find the library.
@@ -55,7 +62,8 @@
     def begin_output(self, output_file, library, link_modules=(), 
                      emit_filenames=(), all_headers=False,
                      include_symbols=None, exclude_symbols=None,
-                     save_preprocessed_headers=None):
+                     save_preprocessed_headers=None,
+                     libdirs=None):
         self.library = library
         self.file = output_file
         self.all_names = []
@@ -73,6 +81,7 @@
             self.exclude_symbols = re.compile(exclude_symbols)
         self.preprocessor_parser.save_preprocessed_headers = \
             save_preprocessed_headers
+        self.libdirs = libdirs
 
         self.linked_symbols = {}
         for name in link_modules:
@@ -202,7 +211,7 @@
         }).lstrip()
         self.loaded_libraries = []
         for library in self.library:
-            lib = load_library(library)
+            lib = load_library(library, libdirs=self.libdirs)
             if lib:
                 self.loaded_libraries.append(lib)
                 print >>self.file, textwrap.dedent("""
@@ -336,11 +345,19 @@
                     self.all_names.append(name)
                     break
 
+def option_W(option, opt, value, parser):
+    if len(value) < 4 or value[0:3] != 'l,-':
+        raise optparse.BadOptionError("not in '-Wl,<opt>' form: %s%s"
+                                      % (opt, value))
+    opt = value[2:]
+    if opt not in ['-L', '-R', '--rpath']:
+        raise optparse.BadOptionError("-Wl option must be -L, -R"
+                                      " or --rpath, not " + value[2:])
+    # Push the linker option onto the list for further parsing.
+    parser.rargs.insert(0, value)
+
 def main(*argv):
     from tempfile import NamedTemporaryFile
-    import optparse
-    import sys
-    import os.path
 
     usage = 'usage: %prog [options] <header.h>'
     op = optparse.OptionParser(usage=usage)
@@ -363,7 +380,12 @@
     op.add_option('', '--save-preprocessed-headers', dest='filename',
                   help='Save the preprocessed headers to the specified '
                        'FILENAME')
-    
+    op.add_option("-W", action="callback", callback=option_W, type='str',
+                  metavar="l,OPTION",
+                  help="where OPTION is -L, -R, or --rpath")
+    op.add_option("-L", "-R", "--rpath", action="append", dest="libdirs",
+                  metavar="LIBDIR", help="Add LIBDIR to the search path")
+
     (options, args) = op.parse_args(list(argv[1:]))
     if len(args) < 1:
         print >> sys.stderr, 'No header files specified.'
@@ -385,7 +407,8 @@
                          all_headers=options.all_headers,
                          include_symbols=options.include_symbols,
                          exclude_symbols=options.exclude_symbols,
-                         save_preprocessed_headers=options.filename
+                         save_preprocessed_headers=options.filename,
+                         libdirs=options.libdirs,
                          )
     wrapper.preprocessor_parser.cpp = options.cpp
 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by Eric Gillespie <ep...@pretzelnet.org>.

Eric Gillespie <ep...@pretzelnet.org> writes:

> But, you keep referring to your patch, which I do not see.  Is
> your patch committed, so updating is sufficient?

I have time to look at this again now.  If I add the correct -L
flag to your gcc command, it works.  So, I think we need to merge
my patch with your change: take the libdir option stuff from
mine, drop my load_library change, and then your new find_library
should make things work.

-- 
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by Eric Gillespie <ep...@pretzelnet.org>.

"David James" <ja...@cs.toronto.edu> writes:

> We add these directories to the default linker path in autogen.py by
> setting the LIBRARY_PATH environment variable.

I don't think that's not enough.  The libdirs must be on the
system search path (i.e. listed in /etc/ld.so.conf on Linux) or
listed in the LD_LIBRARY_PATH environment variable.

> Could you try the patched version of ctypesgen on your system and see
> if it helps? You'll need to first update ctypesgen by typing ( cd
> ctypesgen && svn up ). I'm particularly interested to see if this
> works on OS X, which has a different linker and might require us to
> play with different options.

But I'm not entirely sure I understand these things correctly, so
I reverted and updated in a new working copy and tried again,
this time using the --prefix option.  Same error:

OSError: apr-1: cannot open shared object file: No such file or directory

But, you keep referring to your patch, which I do not see.  Is
your patch committed, so updating is sufficient?

> P.S. If my patch solves your problem, are there still parts of your
> patch that you'd still like me to commit? Let me know.

Yes, apr-*-config scripts provide enough information for autogen
to work without any options at all, and my patch was mainly about
getting that information, while also allowing users to specify
libdirs by hand.  Also, I fixed the error reporting.

-- 
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Merging Ctypes Python Branch

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.

On 8/31/07, David James <ja...@cs.toronto.edu> wrote:
> Currently, autogen.py assumes that both apr and apr-util are installed
> under the same prefix, according to the output of apr-1-config
> --prefix. Further, you can specify where Subversion is installed by
> using the "--prefix" option for autogen.py.

You should be using apu-1-config to ask apr-util where it lives - not
assuming that they live in the same location.  And, the 'standard'
flags should be --with-apr and --with-apr-util.  -- justin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org