You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by Alexander Shigin <sh...@rambler-co.ru> on 2008/07/18 19:05:05 UTC

Python library enhance proposal

1. Object creation.

If we have a class Simple the initialization look something like

    simple = Simple(d={'num':1}) # or
    simple = Simple({'num':1}) 

but it's possible to make it 
    simple = Simple(num'=1) 

The solution is backward incompatible for common case, but in 90% the
issue can be solved:
    def __init__(self, __d=None, **kwargs):
        # well, there is a small risk of __d field
        if __d is not None:
            d = __d
        if 'd' in kwargs and isinstance(d, dict):
            # if 'd' is a struct field warn user
            if 'd' in (spec[2] for spec in self.thrift_spec if
spec):
                warning.warn("use first argument as a initiator")
            d = kwargs['d']
        else:
            d = kwargs
        # rest is the same

The solution doesn't work at all if thrift_spec isn't defined (fields
without number is used, please look at clause #4). 

2. Get rid of old-style classes.

I don't think if anyone need it.

3. Introduce common parent for thrift classes.

It can help if you need to create a JSON or XML representation of thrift
object (like me). It doesn't affect legacy codebase.

4. Generate thrift_spec if there is negative fields identifiers.

At the present time in the common case there is no way to get
information about thrift class in python.

There is a two possible solution:
  * make thrift_spec a map (it can give a little slowdown, but I think 
    we can't measure it);
  * introduce spec_offset variable, e.g.:
      spec_offset = 1
      thrift_spec = (
        (1, TType.STRING, 'phrase', None, None, ), # 1
        (2, TType.I16, 'count', None, None, ), # 2
      )

    and for negative values:
      spec_offset = -1
      thrift_spec = (
        (-1, TType.STRING, 'phrase', None, None, ), # -1
        None, # 0
        None, # 1
        (2, TType.I16, 'count', None, None, ), # 2
      )

The change will be backward compatible.

4. Use thrift_spec to serialize/deserialize thrift object instead of
generated code.

You can look at the possible solution in my git repository at
    http://github.com/shigin/thrift/tree/py-newlib

The change will be backward compatible.

5. Do only one import.
   
I do hate the current 
    from thrift.blabla.bla import TBla
    from thrift.foo import bar

So I prefer something like:
    import thrift
    thrift.TSocket(...)
    thrift.TBufferTransport(...)

The change will be backward compatible.

6. Get rid of T prefix for a one import.

The prefix need at the present time (I can't
write thrift.transport.TSocket.TSocket every time). But if we have one
import of thrift, it's really easy to determine that we have:
thrift.Socket.



If you find it useful I can implement it in next week or so.


Re: Python library enhance proposal

Posted by Alexander Shigin <sh...@rambler-co.ru>.
В Птн, 18/07/2008 в 11:12 -0700, David Reiss пишет:
> #2: We measured a significant slowdown with new-style classes, even
> when using slots.

Ooops, I haven't known this. I need an extra time to do some benchmarks.

> #3: I don't object to this in principle, but it forces a decision on
> new- vs. old-style classes.  It might also cause a slowdown when
> serializing structures with many unset fields because they have
> to be looked up in the parent class, though I don't expect that to
> be significant.  Can't the JSON or XML representation be produced
> by an external function like thrift.to_xml(my_object).  Think about
> len(my_list), which is not my_list.len().

I've just understood that I was on the wrong way all evening. Well, I've
done "thrift_xml(object)" method with a small troubles. But I see now
that the right way is creating my own XMLProtocol. 

I remove the proposal.

> #4a: I think I prefer the offset technique.  We'd have to modify
> fastbinary.c to handle the new style, but Ben or I can do that if
> you don't want to.

I don't think that it's a huge problem. If I have any troubles I'll beg
you for help.

> #6: As I told the Ruby guys, I prefer having the T because it makes
> it easier to just import the class directly and not have the longer
> "thrift." prefix.  I don't see the benefit of removing it.  But it
> looks like the Ruby guys ignored my opinion.  What's up there?

I have four arguments:
  * for me T prefix means "Type", not "Thrift";
  * it's a bad style to do "from module import *" (especially for
recursive inclusion);
  * it will give a huge amount of names in the top scope, i mean if we
create a new TJSONProtocol, but user has another one;
  * if you look at python standard library you won't find a huge amount
of prefixes (prefixes were forbidden by PEP8). It's a way often to use a
suffix like "Error" or "Handler".

On the other hand, every constant from errno module has a E prefix, like
it declared in POSIX. If ruby guys drop the prefix I don't see any
reason to stay with an ugly (for me) prefix.

As an alternative we can do a double interface, e.g. 
    # it's awful, but works 
    from thrift import *   # TSocket, TFramedTransport, etc
    import thrift
    thrift.Socket

But it gives another troubles: code can mix these two styles and for
legacy code it would be a pain to support one.

> I'm particularly excited about seeing #1 and #5.  I'll check out
> your git repo now.  

In me repo I use new style classes. I need a few time to review it.

> Do you think you could send your public key
> to the list so we can get your stuff on git.thrift-rpc.org?

Sure.

Re: Python library enhance proposal

Posted by David Reiss <dr...@facebook.com>.
#1: I think this is awesome.  I think we should warn whenever the
d={} style is used to phase out this behavior.

#2: We measured a significant slowdown with new-style classes, even
when using slots.

#3: I don't object to this in principle, but it forces a decision on
new- vs. old-style classes.  It might also cause a slowdown when
serializing structures with many unset fields because they have
to be looked up in the parent class, though I don't expect that to
be significant.  Can't the JSON or XML representation be produced
by an external function like thrift.to_xml(my_object).  Think about
len(my_list), which is not my_list.len().

#4a: I think I prefer the offset technique.  We'd have to modify
fastbinary.c to handle the new style, but Ben or I can do that if
you don't want to.

#4b: I haven't looked at the code yet, but I feel like this is the
right design for dynamic languages that support it because it shrinks
the (long, gross, repetitive, difficult-to-maintain) generator code
and moves more of the logic into the language in question.  It also
makes it a lot easier to make changes, especially for developers who
are not that comfortable with C++.

#5: I personally like this.  We use nested namespaces in C++ also,
and it seems like kind of a waste because we never have any name
collisions.

#6: As I told the Ruby guys, I prefer having the T because it makes
it easier to just import the class directly and not have the longer
"thrift." prefix.  I don't see the benefit of removing it.  But it
looks like the Ruby guys ignored my opinion.  What's up there?

I'm particularly excited about seeing #1 and #5.  I'll check out
your git repo now.  Do you think you could send your public key
to the list so we can get your stuff on git.thrift-rpc.org?

--David

Alexander Shigin wrote:
> 1. Object creation.
> 
> If we have a class Simple the initialization look something like
> 
>     simple = Simple(d={'num':1}) # or
>     simple = Simple({'num':1})
> 
> but it's possible to make it
>     simple = Simple(num'=1)
> 
> The solution is backward incompatible for common case, but in 90% the
> issue can be solved:
>     def __init__(self, __d=None, **kwargs):
>         # well, there is a small risk of __d field
>         if __d is not None:
>             d = __d
>         if 'd' in kwargs and isinstance(d, dict):
>             # if 'd' is a struct field warn user
>             if 'd' in (spec[2] for spec in self.thrift_spec if
> spec):
>                 warning.warn("use first argument as a initiator")
>             d = kwargs['d']
>         else:
>             d = kwargs
>         # rest is the same
> 
> The solution doesn't work at all if thrift_spec isn't defined (fields
> without number is used, please look at clause #4).
> 
> 2. Get rid of old-style classes.
> 
> I don't think if anyone need it.
> 
> 3. Introduce common parent for thrift classes.
> 
> It can help if you need to create a JSON or XML representation of thrift
> object (like me). It doesn't affect legacy codebase.
> 
> 4. Generate thrift_spec if there is negative fields identifiers.
> 
> At the present time in the common case there is no way to get
> information about thrift class in python.
> 
> There is a two possible solution:
>   * make thrift_spec a map (it can give a little slowdown, but I think
>     we can't measure it);
>   * introduce spec_offset variable, e.g.:
>       spec_offset = 1
>       thrift_spec = (
>         (1, TType.STRING, 'phrase', None, None, ), # 1
>         (2, TType.I16, 'count', None, None, ), # 2
>       )
> 
>     and for negative values:
>       spec_offset = -1
>       thrift_spec = (
>         (-1, TType.STRING, 'phrase', None, None, ), # -1
>         None, # 0
>         None, # 1
>         (2, TType.I16, 'count', None, None, ), # 2
>       )
> 
> The change will be backward compatible.
> 
> 4. Use thrift_spec to serialize/deserialize thrift object instead of
> generated code.
> 
> You can look at the possible solution in my git repository at
>     http://github.com/shigin/thrift/tree/py-newlib
> 
> The change will be backward compatible.
> 
> 5. Do only one import.
>   
> I do hate the current
>     from thrift.blabla.bla import TBla
>     from thrift.foo import bar
> 
> So I prefer something like:
>     import thrift
>     thrift.TSocket(...)
>     thrift.TBufferTransport(...)
> 
> The change will be backward compatible.
> 
> 6. Get rid of T prefix for a one import.
> 
> The prefix need at the present time (I can't
> write thrift.transport.TSocket.TSocket every time). But if we have one
> import of thrift, it's really easy to determine that we have:
> thrift.Socket.
> 
> 
> 
> If you find it useful I can implement it in next week or so.
>