Posted to dev@thrift.apache.org by "Noble Paul (JIRA)" <ji...@apache.org> on 2008/08/22 12:54:44 UTC

[jira] Created: (THRIFT-122) Allow for heterogeneous collections

Allow for heterogeneous collections
-----------------------------------

                 Key: THRIFT-122
                 URL: https://issues.apache.org/jira/browse/THRIFT-122
             Project: Thrift
          Issue Type: New Feature
            Reporter: Noble Paul


Currently Thrift supports only homogeneous collections. That is very restrictive for the many languages that allow heterogeneous collections.

Implementation details

The IDL could allow syntax such as:
{code}
list<?>
set<?>
map<?,?>
map<?,the-type>
map<the-type,?>
{code}


When writing data, use a type modifier to say whether the key (1), the value (2), or both (3) are wildcards.
For a list or set, use type modifier 1 to mark it as heterogeneous.

If the collection is homogeneous, write it the way it is done now.

Otherwise, add type information just before each element's data. This adds one extra byte per element.
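
A minimal sketch of the per-element tagging described above, not an actual Thrift encoding. The tag values, the big-endian 32-bit framing, and the names Element, make_i32, make_string and write_hetero_list are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical one-byte type tags; real Thrift type ids may differ.
enum class Tag : std::uint8_t { I32 = 1, String = 2 };

// A list element that is either an i32 or a string (tagged by hand to keep the sketch small).
struct Element {
    Tag tag;
    std::int32_t i32;
    std::string str;
};

inline Element make_i32(std::int32_t v) { return Element{Tag::I32, v, std::string()}; }
inline Element make_string(const std::string& s) { return Element{Tag::String, 0, s}; }

// Append a 32-bit value in big-endian byte order.
inline void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
}

// Heterogeneous list: element count, then for each element one tag byte followed by its payload.
inline std::vector<std::uint8_t> write_hetero_list(const std::vector<Element>& items) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(items.size()));
    for (const Element& e : items) {
        out.push_back(static_cast<std::uint8_t>(e.tag));  // the extra byte per element
        if (e.tag == Tag::I32) {
            put_u32(out, static_cast<std::uint32_t>(e.i32));
        } else {
            put_u32(out, static_cast<std::uint32_t>(e.str.size()));
            out.insert(out.end(), e.str.begin(), e.str.end());
        }
    }
    return out;
}
```

A homogeneous list would keep a single element type in the list header instead, so the cost of the extra byte is paid only by collections that are actually declared as wildcards.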

For ma



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624873#action_12624873 ] 

Alexander Shigin commented on THRIFT-122:
-----------------------------------------

Noble, I get your point. Have you tried using different versions to fit your requirement?

My point is: if you change a protocol, you have to change it for every existing language. Your proposal would be a huge change to the binary protocol and would affect the whole code base.

The backward compatibility issue can be resolved by creating new types, such as hlist for list, hmap for map and hset for set, but it still requires changing the binary protocol.



[jira] Updated: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Shigin updated THRIFT-122:
------------------------------------

    Attachment: thrift-any-type.patch

I can finally show a working implementation of the 'any' type for Python and C++. A lot of additional work remains:

- use shared_ptr in the C++ library;
- make -Wall happy (gcc complains about unused *Register);
- write a better any_cast for C++;
- add any support to the dense, debug and JSON protocols;
- add some documentation;
- extend the test suite and tidy it up;
- implement the any type for Python's AcceleratedBinaryProtocol.

Here is a brief description of how any is serialized:
- any: type-id followed by data;
- if type-id is struct, data consists of the MD5 fingerprint of the structure followed by the structure's data.
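
As a rough illustration of that layout (this is not code from the patch), the two cases could be framed as below. The type-id constants and the function names are placeholders chosen for the sketch, and the fingerprint is taken as an already-computed 16-byte value rather than calculated here.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Fingerprint = std::array<std::uint8_t, 16>;  // an MD5 digest is 16 bytes long

// Illustrative type ids, chosen for this sketch only.
const std::uint8_t kTypeI32 = 8;
const std::uint8_t kTypeStruct = 12;

// any holding a non-struct value: the type-id followed by the already-encoded data.
inline Bytes write_any_plain(std::uint8_t type_id, const Bytes& data) {
    Bytes out;
    out.push_back(type_id);
    out.insert(out.end(), data.begin(), data.end());
    return out;
}

// any holding a struct: the type-id, then the structure's fingerprint, then the struct data.
inline Bytes write_any_struct(const Fingerprint& fp, const Bytes& struct_data) {
    Bytes out;
    out.push_back(kTypeStruct);
    out.insert(out.end(), fp.begin(), fp.end());
    out.insert(out.end(), struct_data.begin(), struct_data.end());
    return out;
}
```

A reader would consume the type-id first and, for a struct, use the next 16 bytes to decide which generated class should decode the rest.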

If you use a child class of an auto-generated Thrift class, you can re-register your class, so that your class is the one deserialized and you don't need to copy your data.
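
One way such re-registration might work is a table from fingerprint to factory, where a later registration replaces an earlier one. This is only a sketch of the idea under that assumption; TBaseSketch, User, RichUser and Registry are hypothetical names and do not come from the patch or the Thrift library.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the base class of generated structs.
struct TBaseSketch {
    virtual ~TBaseSketch() {}
    virtual std::string name() const = 0;
};

// Pretend this class was generated from the IDL.
struct User : TBaseSketch {
    std::string name() const override { return "User"; }
};

// Hand-written child class that adds behaviour on top of the generated one.
struct RichUser : User {
    std::string name() const override { return "RichUser"; }
};

using Factory = std::function<std::unique_ptr<TBaseSketch>()>;

class Registry {
public:
    // Registering the same fingerprint again replaces the earlier factory.
    void add(const std::string& fingerprint, Factory f) {
        factories_[fingerprint] = std::move(f);
    }

    // Returns a fresh instance, or nullptr when the fingerprint is unknown.
    std::unique_ptr<TBaseSketch> create(const std::string& fingerprint) const {
        auto it = factories_.find(fingerprint);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};
```

After the child class is registered under the generated struct's fingerprint, the deserializer hands back the child type directly, which is what avoids the copy.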

The main problem is adding support for other languages. I think it is really easy for Ruby and Perl (it took only 123 lines for Python), but I know nothing about Java and Cocoa.

> Allow heterogeneous collections
> -------------------------------
>
>                 Key: THRIFT-122
>                 URL: https://issues.apache.org/jira/browse/THRIFT-122
>             Project: Thrift
>          Issue Type: New Feature
>            Reporter: Noble Paul
>         Attachments: thrift-any-type.patch
>
>
> Currently thrift only supports homogeneous collections. But that is very restrictive for many languages which allow heterogeneous collections. It does not have to be supported in BinaryProtocol. The new DenseProtocol may add support for this.
> implementation details 
> the IDL can allow syntax 
> {code}
> list<?>
> set<?>
> map<?,?>
> map<?,the-type>
> map<the-type,?>
> {code}
> While writing down data use a type modifier to say whether key (1), value(2) or both(3) are wild cards
> for a List/Set use a type modifier 1 to specify that it is heterogeneous
> If it is a homogeneous collection do it the way it is done now.
> Or else
> add type information just before the data. So it adds an extra byte/element 
> For ma



[jira] Issue Comment Edited: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624930#action_12624930 ] 

noble.paul edited comment on THRIFT-122 at 8/22/08 11:04 AM:
-------------------------------------------------------------

bq. I am familiar with Java and Python and I have found the use of heterogeneous collections to be extremely rare

If you ever use Solr you will see how useful that is. The output object is one huge mix of collection objects.
In our organization we tend to use a mix of collections to pass data around.

bq. The problem is that it is so hard to write code to deal with a collection that might contain a combination of number, strings, lists, and objects. You effectively have to do a switch statement on every element, which makes for unwieldy code

There are only a finite number of types to deal with in any structure: the basic types plus the struct types. I would be surprised to see more than 25 types in total. Of those, 15 are known types (primitives), so they do not have to be in the generated code; they can live in the runtime code. Say you have around 10 custom types. A switch with 10 cases really is not that complex. Take a look at readVal() in [this class | http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/util/NamedListCodec.java?revision=685640&view=markup ]; this is our custom binary format.
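
To give a feel for how small such a switch is, here is a sketch of a tag-dispatching reader. It is not the Solr readVal() code; the tag values, the one-byte string length and the name read_val are assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical one-byte tags; the real Solr and Thrift codes differ.
enum Tag : std::uint8_t { TAG_NULL = 0, TAG_BOOL = 1, TAG_I32 = 2, TAG_STRING = 3 };

// Reads one tagged value starting at pos, advances pos, and returns a printable form.
inline std::string read_val(const std::vector<std::uint8_t>& buf, std::size_t& pos) {
    const std::uint8_t tag = buf.at(pos++);
    switch (tag) {
        case TAG_NULL:
            return "null";
        case TAG_BOOL:
            return buf.at(pos++) != 0 ? "true" : "false";
        case TAG_I32: {
            // Four bytes, big-endian.
            std::uint32_t v = 0;
            for (int i = 0; i < 4; ++i) v = (v << 8) | buf.at(pos++);
            return std::to_string(static_cast<std::int32_t>(v));
        }
        case TAG_STRING: {
            const std::size_t len = buf.at(pos++);  // one-byte length keeps the sketch short
            std::string s;
            for (std::size_t i = 0; i < len; ++i) s.push_back(static_cast<char>(buf.at(pos++)));
            return s;
        }
        default:
            throw std::runtime_error("unknown type tag");
    }
}
```

Each custom struct type would add one more case that delegates to the struct's own reader, so the switch grows linearly with the number of types.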







[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626419#action_12626419 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

bq. How about using fingerprint of the structure instead of the name?
I thought the name was unique enough within an IDL file. Isn't the name itself good enough? The cost will be minimal if we write the name as an 'extern string'.



[jira] Issue Comment Edited: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624942#action_12624942 ] 

noble.paul edited comment on THRIFT-122 at 8/22/08 11:31 AM:
-------------------------------------------------------------

bq. I think it should be a structure instead of a map. This is safer, faster, and easier to reason about.

I wish we had that flexibility. The data is enormous. Each RequestHandler creates its own set of data. We do not create custom structures because there would be hundreds of them, and that is nearly impossible to manage. We resort to this trick just to avoid custom structures. Sometimes the value is another collection type, and so on. This way, every time a new value is added we do not have to generate classes, compile them, and update the server and the clients for so many different languages.

Moreover, we allow users to write their own request handlers. How would they write out their own data? They cannot create an IDL file, compile it, and deploy it.

Because every user knows what value is associated with a name, it all works well. With just one class (one for each language: php/ruby/json) we manage reading and writing of any data produced by our various request handlers. It is an unusual use case, but it is very convenient and works well. We actually use it in our organization as well, and I have seen it in many places.




[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624938#action_12624938 ] 

David Reiss commented on THRIFT-122:
------------------------------------

If the "map" has fields of a known name with a known type, I think it should be a structure instead of a map.  This is safer, faster, and easier to reason about.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624849#action_12624849 ] 

Alexander Shigin commented on THRIFT-122:
-----------------------------------------

It seems really hard to implement for C++. For a heterogeneous list, every Thrift type, including int, float and bool, would have to be a child of some TAbstractClass. It also breaks backward compatibility: it requires changing the binary and dense protocols.

I do not see any advantage in heterogeneous collections for a protocol like Thrift.



[jira] Updated: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated THRIFT-122:
------------------------------

    Summary: Allow heterogeneous collections  (was: Allow for heterogeneous collections)



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624958#action_12624958 ] 

David Reiss commented on THRIFT-122:
------------------------------------

If a user is deploying code for their own request handler, couldn't they also deploy the generated code needed to [de]serialize it?  If a client is willing to write code to handle a given type of response, wouldn't they be willing to build the generated code to [de]serialize it?  I would posit that with such a large variety of types, you might actually be better served by having them all defined in one authoritative IDL file (or a set of files) rather than having to maintain documentation for what fields are present and what types they might have, especially when that documentation can easily get out of date.  Also, having statically-typed data will make your code (both the [de]serialization code and the application code) faster, and you seem to be quite concerned with efficiency.

However, assuming that you are, for whatever reason, unable to change your data model, allow me to explain the variant idea that was previously discussed on the old mailing list.  A "variant" would be a type that would be encoded on the wire as a structure with a single field.  The field would either be a number, a string, a list of variants, or a map from strings to variants (this could be tweaked a bit).  Any language that can't properly represent a variant (or hasn't implemented it yet) can safely skip it because it is wire-compatible with a structure.  A language that does support variants would represent it in memory in the most natural way possible.  I know you said that backwards compatibility is not a concern *for you*, but this design has the benefit that it can be fairly easily implemented in a language that does not support the full compact protocol.  It also supports a few more use cases than heterogeneous collections, while adding less complexity (I think) to Thrift's data model.  Therefore, I think it is a better feature for us to try to implement.  (And it seems like it would cover your use case as well.)
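
To make the shape of such a variant concrete, here is a sketch of one possible in-memory representation, assuming a C++17 compiler with std::variant (which postdates this thread). The names Variant, VariantList, VariantMap, wire_field_id and count_leaves, and the field ids 1 to 4, are assumptions for illustration only. The recursive definition also relies on the standard containers accepting an incomplete element type, which mainstream implementations allow.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

struct Variant;
using VariantList = std::vector<Variant>;
using VariantMap = std::map<std::string, Variant>;

// Exactly one alternative is set: a number, a string, a list of variants,
// or a map from strings to variants.
struct Variant {
    std::variant<double, std::string, VariantList, VariantMap> value;
};

// Hypothetical field ids 1..4: on the wire the variant would look like a struct
// in which only the field with this id is present.
inline int wire_field_id(const Variant& v) {
    return static_cast<int>(v.value.index()) + 1;
}

// Counts the scalar leaves (numbers and strings) reachable from v.
inline std::size_t count_leaves(const Variant& v) {
    if (const VariantList* list = std::get_if<VariantList>(&v.value)) {
        std::size_t n = 0;
        for (const Variant& item : *list) n += count_leaves(item);
        return n;
    }
    if (const VariantMap* map = std::get_if<VariantMap>(&v.value)) {
        std::size_t n = 0;
        for (const auto& kv : *map) n += count_leaves(kv.second);
        return n;
    }
    return 1;
}
```

Because only one field is ever present, a reader that knows nothing about variants can treat the value as an ordinary struct and skip it field by field.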

It is actually possible for variants to contain structs as well while preserving compatibility, but it is pretty much impossible to implement in C++ and in other languages it would either require some sort of global registry of Thrift types (which I am opposed to) or instantiating a class by name (which is both tricky and inefficient).  I would request that anyone requesting the feature of mixing statically and dynamically typed data in this way think long and hard about whether it is truly necessary and how to implement it cleanly in several languages.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625134#action_12625134 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

A simple proposal for wildcard types in C++:
{code}
#include <string>

struct WildCard {
    std::string idl_name;  // the type name as declared in the IDL
    void* data;            // untyped pointer; the user casts it to the expected type.
                           // Java users work the same way, with instanceof making the check easier.
                           // It is easier still when the caller knows what to expect.
};
{code}

So if {{list<?>}} is the type, the generated code can declare it as {{list<WildCard>}}.

It is not very elegant, but it should work. Anybody using this feature is most likely using a language like Java, so it should be OK.

My knowledge of C++ is quite dated; correct me if I am wrong.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625973#action_12625973 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

Let us assume that the new 'denseprotocol' is not going to be binary-compatible with binaryprotocol, because it will have to be so different. We decided that we will keep the API common but the wire format different.

On the implementation of this feature on the wire:

Any wildcard collection will be written in the following way:

# Every element first writes the type associated with the object.
# If the type is a STRUCT, the IDL name of the STRUCT is written as an 'extern' string. This ensures that the name is written only once per struct per payload.
# Then the value is written using the object's write(TProtocol) method.

On the read side:

# The type is read first.
# If it is a non-struct type, read the data.
# If it is a struct, read the struct name.
# Every language can have its own implementation of how to read the struct.
#* In Java we can keep an idl-name->java-class map (in <filename>_constants.java), create a new instance of the class, and let the instance read the struct. If the name is unknown, skip the object; the value will be null.
#* C++ can read field by field into a byte[] and keep a void pointer to it. Users can cast it and use it as they wish.

The caveat is that every object put into the collection must be a known 'primitive' type or an IDL type. If it does not meet this criterion it will be written as null.
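
The 'extern' string step can be pictured as a per-payload table: the first use of a struct name writes the name itself, and later uses write only its index. The sketch below shows that bookkeeping under the assumption that indices are handed out in order of first use; the class name ExternStrings and the "def:"/"ref:" tokens are made up for illustration and are not a wire format.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Per-payload table of struct names.
class ExternStrings {
public:
    // Returns a token describing what would go on the wire for this name:
    // "def:<index>:<name>" on first use, "ref:<index>" on every later use.
    std::string write(const std::string& idl_name) {
        auto it = index_.find(idl_name);
        if (it != index_.end()) {
            return "ref:" + std::to_string(it->second);
        }
        const std::size_t id = index_.size();  // next free index, taken before inserting
        index_[idl_name] = id;
        return "def:" + std::to_string(id) + ":" + idl_name;
    }

private:
    std::map<std::string, std::size_t> index_;
};
```

The reader would keep the mirror image of this table, appending each newly defined name and resolving references by index, with a fresh table for every payload.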

bq.I see a little problem with versioning, if we have two structures 
bq.Can V2 read data from V1 or the first field should be skipped?

It must be skipped. If somebody wishes to change the type of a field, the 'id' of the field should change as well, because the generated code will be incompatible.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625843#action_12625843 ] 

Alexander Shigin commented on THRIFT-122:
-----------------------------------------

Noble, I don't see any way to implement wildcards for structures. We can use boost::any for float/string/int types (and perhaps for list, map and set). But neither the dense nor the binary protocol sends the structure or field names, and I find it really difficult to determine the structure type at run time.

If you need wildcards only for float, int and string types, I think it can be implemented. It is a backward-compatible feature.

If you have any thoughts about implementing wildcards for structures, I'll review my plan.
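To make the base-types-only case concrete, here is a rough sketch of a list whose elements each carry one type byte before the data, as the issue description suggests. The tag values, the function names and the exact layout (32-bit count, big-endian) are assumptions for illustration, not the layout of any existing Thrift protocol:

```python
import struct

# Illustrative one-byte type tags (assumed for this sketch).
T_DOUBLE, T_I64, T_STRING = 4, 10, 11


def write_any_list(items):
    """Encode ints, floats and strings, each prefixed with a type byte."""
    out = bytearray(struct.pack(">i", len(items)))
    for item in items:
        if isinstance(item, int):
            out += struct.pack(">Bq", T_I64, item)
        elif isinstance(item, float):
            out += struct.pack(">Bd", T_DOUBLE, item)
        elif isinstance(item, str):
            raw = item.encode("utf-8")
            out += struct.pack(">Bi", T_STRING, len(raw)) + raw
        else:
            raise TypeError("unsupported element type: %r" % type(item))
    return bytes(out)


def read_any_list(data):
    """Decode a buffer produced by write_any_list."""
    (count,) = struct.unpack_from(">i", data, 0)
    pos = 4
    items = []
    for _ in range(count):
        tag = data[pos]
        pos += 1
        if tag == T_I64:
            (value,) = struct.unpack_from(">q", data, pos)
            pos += 8
        elif tag == T_DOUBLE:
            (value,) = struct.unpack_from(">d", data, pos)
            pos += 8
        elif tag == T_STRING:
            (length,) = struct.unpack_from(">i", data, pos)
            pos += 4
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unknown type tag: %d" % tag)
        items.append(value)
    return items


encoded = write_any_list([42, 1.5, "name"])
print(read_any_list(encoded))
```

Because every element is self-describing, a reader can decode or skip it without knowing the element type in advance. Structures are the hard part precisely because a type byte alone does not say which structure follows.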



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624934#action_12624934 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

I may not really need a switch in the app code.
For instance, see the sample code that we have:
{code}
Long l = (Long)map.get("size");//because I know size is a Long val
String s = (String)map.get("name");
{code}
A union is a workaround. Besides that, a union of around 10-15 fields will be impractical.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624906#action_12624906 ] 

David Reiss commented on THRIFT-122:
------------------------------------

As I said on the other issue, I'm not convinced this is the right way to go.  I would recommend creating a union-like structure that can contain all of the types you want in your collection.  Other possibilities that were discussed in a previous thread are adding explicit support for unions or a generic variant type that would allow (nearly) arbitrary combinations of base types and collections (but not structures).  Either of these proposals would (IMO) cover the heterogeneous collection use case, and neither would require any changes to existing protocol implementations (they would be serialized as structures and only have different in-memory behavior).
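A rough sketch of what such a union-like structure might look like from application code. The Value class and its field names are hypothetical; in Thrift it would correspond to a struct with optional fields, for example `struct Value { 1: optional i64 int_value, 2: optional double double_value, 3: optional string string_value }`, so nothing changes on the wire:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Value:
    """Union-like struct: by convention exactly one field is set."""
    int_value: Optional[int] = None
    double_value: Optional[float] = None
    string_value: Optional[str] = None

    def get(self):
        """Return the single field that is set, or fail loudly."""
        candidates = (self.int_value, self.double_value, self.string_value)
        set_fields = [v for v in candidates if v is not None]
        if len(set_fields) != 1:
            raise ValueError("exactly one field must be set")
        return set_fields[0]


# A homogeneous list<Value> that carries mixed payloads.
mixed: List[Value] = [Value(int_value=10), Value(string_value="solr")]
print([v.get() for v in mixed])
```

The collection stays homogeneous as far as the protocol is concerned; only the in-memory convention ("one field set") is new.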



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624932#action_12624932 ] 

David Reiss commented on THRIFT-122:
------------------------------------

I didn't mean a switch in the generated code.  I meant in the application code.  Besides, if the set of types is as small as you say, it should be that much easier to write a union for it. ;)



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640794#action_12640794 ] 

Alexander Shigin commented on THRIFT-122:
-----------------------------------------

If you want to place structures into any in C++, you have to compile your thrift file with the :any switch:
{code}
gen-cpp/anytest_types.cpp: anytest.thrift
    $(THRIFT) --gen cpp:any $<
{code}

> Allow heterogeneous collections
> -------------------------------
>
>                 Key: THRIFT-122
>                 URL: https://issues.apache.org/jira/browse/THRIFT-122
>             Project: Thrift
>          Issue Type: New Feature
>            Reporter: Noble Paul
>         Attachments: thrift-any-type.patch
>


[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625868#action_12625868 ] 

Alexander Shigin commented on THRIFT-122:
-----------------------------------------

I see a small problem with versioning. Suppose we have two structures:
{code}
struct V1 {
  1: list<string> test
}
{code}

and 

{code}
struct V2 {
  1: list<?> test
}
{code}

Can V2 read data from V1, or should the first field be skipped?



[jira] Updated: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated THRIFT-122:
------------------------------

    Description: 
Currently thrift only supports homogeneous collections . But , that is very restrictive for many languages which allows heterogeneous collections. It does not have to be supported in BinaryProtocol The new DenseProtocol may add support for this

implementation details 

the IDL can allow syntax 
{code}
list<?>
set<?>
map<?,?>
map<?,the-type>
map<the-type,?>
{code}


While writing down data use a type modifier to say whether key (1), value(2) or both(3) are wild cards
for a List/Set use a type modifier 1 to specify that it is heterogeneous

If it is a homogeneous collection do it the way it is done now.

Or else

add type information just before the data. So it adds an extra byte/element 

For ma



  was:
Currently thrift only supports homogeneous collections . But , that is very restrictive for many languages which allows heterogeneous collections

implementation details 

the IDL can allow syntax 
{code}
list<?>
set<?>
map<?,?>
map<?,the-type>
map<the-type,?>
{code}


While writing down data use a type modifier to say whether key (1), value(2) or both(3) are wild cards
for a List/Set use a type modifier 1 to specify that it is heterogeneous

If it is a homogeneous collection do it the way it is done now.

Or else

add type information just before the data. So it adds an extra byte/element 

For ma






[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624921#action_12624921 ] 

David Reiss commented on THRIFT-122:
------------------------------------

I am familiar with Java and Python and I have found the use of heterogeneous collections to be extremely rare.  The problem is that it is so hard to write code to deal with a collection that might contain a combination of numbers, strings, lists, and objects.  You effectively have to do a switch statement on every element, which makes for unwieldy code.  The only compelling use case that I'm aware of is a collection of a superclass type where the individual objects are of different subclasses.  This works cleanly because of virtual methods.  However, no Thrift types support virtual methods, so this case does not apply to Thrift.  If you want to explain your use case in more detail, you might be able to convince me that this feature is "extremely important".  Otherwise, I will continue to argue against a feature that I consider to be adding a harmful amount of complexity to Thrift.



[jira] Updated: (THRIFT-122) Allow heterogeneous collections

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-122:
---------------------------------

    Priority: Trivial  (was: Major)



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625020#action_12625020 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------


Let us assume that some languages have practical difficulty with this kind of data structure. Even if a language does not support it, it will easily be able to skip the element, because the type info of each field is also encoded there.

From what you have been saying, C++ is the language which will have the most difficulty. For other languages it may just work fine. C++ can at least return a void* for such data structures.

Let us explore how we can elegantly support this in C++. Let us not say that it is *difficult* so we will not do it. It is our job to make the difficult things possible.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624914#action_12624914 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

I am not expecting this to be available in the existing binary protocol, so no backward compatibility problems should arise (if that is what you are concerned about). I can change the issue description.

I am willing to wait for the new dense protocol to be completed, because this release of Solr will not need it anyway. The next release is what we are aiming for.

bq.As I said on the other issue, I'm not convinced this is the right way to go.

This issue is not specifically for my own project. I may be able to work around the problems in some way. If you are familiar with programming in languages like Java, C#, Python etc., you may know that it is an extremely common practice to have heterogeneous collections. Having native support for those use cases is *extremely important*.









[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Alexander Shigin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626283#action_12626283 ] 

Alexander Shigin commented on THRIFT-122:
-----------------------------------------

How about using a fingerprint of the structure instead of the name?



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Chad Walters (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624861#action_12624861 ] 

Chad Walters commented on THRIFT-122:
-------------------------------------

Alex, clearly Noble sees some advantages to this for his use cases, so let's not dismiss it out of hand.

I agree that the C++ implementation has some challenges and I'd like to see some proposals regarding the C++ implementation as a key part of the design of this feature.

I also agree that this will require support in all the existing protocol implementations.

I will note that this can, however, be added in a backwards compatible manner by adding new first class types. Those who don't wish to use these types can simply avoid using them in the IDL. Unless they complicate the implementation tremendously, I don't see why we shouldn't at least explore what this might look like.

Noble, can you provide more specifics on what you would like to see? The initial comment is incomplete. Is it good enough to support only non-homogeneous base types (integer, float, or string)? Or do you require non-homogeneous collections to support compound types and nested collection types as well?



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625848#action_12625848 ] 

David Reiss commented on THRIFT-122:
------------------------------------

I agree with Alexander.  The problem is not representing the structures in memory, but writing the serialization and deserialization code.

I like the boost::any idea, but it still has the limitation that if we allow structures, clients can never be sure that their code covers all possible cases.  If we only allow variants to be numbers, strings, lists of variants, or maps of strings (or perhaps numbers) to variants, then it is fairly easy to write code that handles all cases.
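As an illustration of why the restricted variant is easy to handle, a recursive function over it needs only four branches. This is a sketch under the assumption that a variant is a number, a string, a list of variants, or a map from strings to variants; the describe function is a made-up example, not part of any patch:

```python
def describe(variant):
    """Return a short description of a restricted variant value.

    Every allowed shape has its own branch, so a client can be sure
    that nothing outside these four cases ever shows up.
    """
    if isinstance(variant, (int, float)):
        return "number"
    if isinstance(variant, str):
        return "string"
    if isinstance(variant, list):
        return "list[" + ", ".join(describe(v) for v in variant) + "]"
    if isinstance(variant, dict):
        for key in variant:
            if not isinstance(key, str):
                raise TypeError("map keys must be strings")
        inner = ", ".join(
            "%s: %s" % (key, describe(value)) for key, value in variant.items()
        )
        return "map{" + inner + "}"
    raise TypeError("not a variant: %r" % type(variant))


print(describe([1, "a", {"k": [2.5]}]))
```

If arbitrary structures were allowed, the final branch could never be exhaustive, because a sender might use a structure type the client has never seen.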



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Chad Walters (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624814#action_12624814 ] 

Chad Walters commented on THRIFT-122:
-------------------------------------

I am not sure that the type modification mechanism can/should be used here. Type modifications are optional hints and can be ignored by protocols. These non-homogeneous collection types cannot be ignored by the protocols -- they will require special handling by the existing protocol implementations. If we are adding support for these, I believe they should become full-fledged types.



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "David Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626423#action_12626423 ] 

David Reiss commented on THRIFT-122:
------------------------------------

Two different structures that are structurally identical will have the same fingerprint, even though they are different types with different field names.
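The collision can be shown with a toy fingerprint that hashes only field ids and field types. This is not Thrift's actual fingerprint algorithm, only a sketch of the property described above; structural_fingerprint and the two example structures are hypothetical:

```python
import hashlib


def structural_fingerprint(fields):
    """Hash field ids and types only; struct and field names are ignored."""
    canonical = ";".join(
        "%d:%s" % (field_id, field_type)
        for field_id, field_type, _name in sorted(fields)
    )
    return hashlib.sha1(canonical.encode("ascii")).hexdigest()


# Two unrelated structs, each given as (field_id, type, name) triples.
point = [(1, "double", "x"), (2, "double", "y")]
size = [(1, "double", "width"), (2, "double", "height")]
label = [(1, "string", "text")]

print(structural_fingerprint(point) == structural_fingerprint(size))   # True
print(structural_fingerprint(point) == structural_fingerprint(label))  # False
```

A reader that received only the fingerprint could not tell a point from a size, which is why a fingerprint alone does not identify the structure type.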



[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625196#action_12625196 ] 

Luke Lu commented on THRIFT-122:
--------------------------------

Unless I missed something, it seems boost::any would work just fine. variables_map in boost::program_options is such a heterogeneous collection.




[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624865#action_12624865 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

A lot of this is covered in THRIFT-110, but let us go over it once again.

Our (Solr) requirements are actually very simple.

Our output is an Object made up of lists, maps, and primitives, plus one custom object type. So we actually need base types as well as structs in these collections.

Currently we support many formats: xml, json, ruby, and a very compact binary format (which works only with java). If Thrift supports heterogeneous collections, we can consider it as a format for fast data transfer. I am not aware of any of our clients using C++. They are mostly web shops and use java/php/python/ruby.






[jira] Commented: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624942#action_12624942 ] 

Noble Paul commented on THRIFT-122:
-----------------------------------

bq.I think it should be a structure instead of a map. This is safer, faster, and easier to reason about.

I wish we had that flexibility. The data is enormous. Each RequestHandler creates its own set of data. We do not create custom structures because there would be hundreds of them, which is nearly impossible to manage. We resort to this trick just to avoid custom structures. Sometimes the value is another collection type, and so on. This way, every time a new value is added, we do not have to generate classes, compile them, and update the server and clients for so many different languages.

Because every user knows what value is associated with each name, it all works well. So with just one class (one for each language: php/ruby/json) we manage reading and writing of any data produced by our various request handlers. It is an interesting use case, but it is very convenient and works well. We also use it in our organization, and I have seen it in many places.




[jira] Issue Comment Edited: (THRIFT-122) Allow heterogeneous collections

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626419#action_12626419 ] 

noble.paul edited comment on THRIFT-122 at 8/27/08 10:40 PM:
-------------------------------------------------------------

bq.How about using fingerprint of the structure instead of the name? 
I thought the name is unique enough within an IDL file. Isn't the name good enough? The cost will be minimal if we use the name as an 'extern string'.

      was (Author: noble.paul):
    bq.How about using fingerprint of the structure instead of the name? 
I thought the name is quite unique enough in an IDL file. Isn't the name itself a good enough ? The cost will be minimal cost if we use the name as an 'extern string'. 
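
To make the cost argument concrete, here is a minimal sketch of one way an 'extern string' could work. It assumes the term means a string that is written in full the first time it appears and referenced by a small index afterwards. The class name ExternStringWriter, the helper read_names, the 16-bit index layout, and the struct names in the usage note are all hypothetical and not part of Thrift.

```python
import struct


class ExternStringWriter:
    """Writes each distinct name in full once, then only a small index."""

    def __init__(self):
        self.out = bytearray()
        self._index = {}  # name -> 1-based index of its first occurrence

    def write_name(self, name):
        idx = self._index.get(name)
        if idx is not None:
            # Seen before: emit only the 2-byte index.
            self.out.extend(struct.pack(">H", idx))
            return
        # First occurrence: index 0 signals that the literal string follows.
        data = name.encode("utf-8")
        self.out.extend(struct.pack(">HH", 0, len(data)) + data)
        self._index[name] = len(self._index) + 1


def read_names(buf, count):
    """Decode `count` names written by ExternStringWriter."""
    names, table, pos = [], [], 0
    for _ in range(count):
        (idx,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        if idx == 0:
            # Literal string: read its length, then the bytes, and remember it.
            (n,) = struct.unpack_from(">H", buf, pos)
            pos += 2
            name = bytes(buf[pos:pos + n]).decode("utf-8")
            pos += n
            table.append(name)
        else:
            name = table[idx - 1]
        names.append(name)
    return names
```

Under these assumptions, tagging every struct in a large collection with a name such as "SolrDocument" costs the full string once and two bytes for each later occurrence.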
  
