You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Mathias Herberts (JIRA)" <ji...@apache.org> on 2011/01/10 00:26:45 UTC

[jira] Created: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Container types containing binary data are parameterized with ByteBuffer in the generated Java code
---------------------------------------------------------------------------------------------------

                 Key: THRIFT-1035
                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
             Project: Thrift
          Issue Type: Bug
          Components: Java - Compiler, Java - Library
    Affects Versions: 0.5, 0.4, 0.6, 0.7
         Environment: All
            Reporter: Mathias Herberts


Since THRIFT-830, binary fields are internally handled using ByteBuffer.

Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).

THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.

During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].

list<binary> -> List<ByteBuffer>
set<binary> -> Set<ByteBuffer>
map<binary,...> -> Map<ByteBuffer,...>
map<...,binary> -> Map<...,ByteBuffer>

This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.

We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980342#action_12980342 ] 

Bryan Duxbury commented on THRIFT-1035:
---------------------------------------

The nested case is a good example. In that situation, my wrapper workaround wouldn't succeed.

Honestly, my inclination is not to fix this problem at all - if our two options are 1) toss out consistency and performance opportunities to fix a backwards compatibility issue that's already a version old or 2) do nothing, I lean towards option 2.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980333#action_12980333 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

I think developers using Thrift expect a.setBinary_map(mymap) to not alter or replace 'mymap' with anything else, at least that's what the set semantics of Java beans mean to me, so replacing 'mymap' with another map with the same content does not seem to be right as 'mymap' could have special properties (could be a ConcurrentHashMap or a fancy collection from com.google.common.collect).

And let's consider scenario B:

struct B {
  1: map<string, map<string, binary>> bin_map,
}

B b = new B();

b.setBin_map(new HashMap<String,Map<String,byte[]>>());
b.getBin_map().put("foo", new HashMap<String,byte[]>());

unless you wrap the original Map and catch the call to put there, I don't think the approach you mentioned works that easily.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979488#action_12979488 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

I'd like to have feedback on the following:

What if we modify the Java generator so that containers of binary data (list,set,map) are implemented as List, Set, Map parameterized with byte[], while retaining the ByteBuffer implementation for binary fields not in containers?

This would still provide the ability to benefit from the ByteBuffer optimization by replacing 'binary' with a simple wrapper 'struct BinaryWrapper { 1: binary content }' in container types which might benefit from the faster I/Os.

TProtocol would need to be  modified to have a writeBinary(byte[])
 

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979435#action_12979435 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

Another problem arising from this, even though it is not a strict requirement, as java.nio.ByteBuffer does not implement Serializable, the types used for the containers containing binary data can no longer be serialized (in the sense of Java Serialization).

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980383#action_12980383 ] 

Bryan Duxbury commented on THRIFT-1035:
---------------------------------------

bq. ByteBuffer are not Thread Safe and thus would impose extraneous precautions on their use.

Neither are byte[], in the sense that you can easily clobber the contents being used by other threads if you're not being careful. Besides, in 99.9% of cases, the ByteBuffer is going to be wrapped around a byte[], so you can use it just as thread-safely.

bq. I agree that my fix introduces an inconsistency internally, but on the user facing side we've worked after THRIFT-830 so binary fields could be set/retrieved using byte[], right now collections of binary fields cannot use byte[], so we suffer from a lack of external consistency which, in my point of view, is more important to have than internal consistency.

I'm actually not talking about internal consistency. I'm talking about consistently giving users a way to access the ByteBuffer version if that's what they want. Overall, if we were creating Thrift Java from scratch today, *everything* would be using ByteBuffer. We're only supporting the byte[] getter for backwards compatibility. From our discussion, I think that the cost of giving a backwards compatible interface to collections is very high, and that's why I'm counseling that we don't do it.

bq. On the performance side, could you pinpoint cases where using byte[] in collections would be less efficient that the ByteBuffers, since the retrieved arrays will be those backing the ByteBuffers when deserializing and the ByteBuffer passed to the serialization code are simply wrapping the byte[] in the collections.

If you're putting a collection *into* a Thrift struct, then you're correct - there is no positive performance impact of using ByteBuffers over byte[] (unless you already have ByteBuffers from NIO operations...). However, on the read side, having to read into a byte[] when you could just tag the underlying buffer with a ByteBuffer is a performance loss. That's what I'm referring to. If we make all collections use byte[] instead of ByteBuffer, you never get the ByteBuffer read benefits.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980326#action_12980326 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

I don't see how you can catch the puts if I do the following:

a.setBinary_map(new HashMap<String,byte[]>());

and then fiddle with the collection returned by a.getBinary_map()

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980441#action_12980441 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

I hear your arguments loud and clear, but I still think that you're underestimating the difficulty most people will have working with ByteBuffers.

And from my point of view the risk associated with the exposure of the internal buffers mitigates the benefits of the performance gain to the point that the ByteBuffer way should be a Java generator option.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980263#action_12980263 ] 

Bryan Duxbury commented on THRIFT-1035:
---------------------------------------

I don't think that going back to byte[] just for collections is the way to go. How about, instead, we offer two sets of accessors for collections with binary elements: one with the ByteBuffers, and one with byte[]. Then we could make a wrapper list/set/map implementation that pulls the byte[] out of the ByteBuffers as needed.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980359#action_12980359 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

I see this issue as more than a backwards compatibility one. To me it's a matter of making Thrift usable by a vast majority of programmers.

As we've seen in THRIFT-830 and THRIFT-882, using ByteBuffer can be tricky even for expert programmers, to boot, ByteBuffer are not Thread Safe and thus would impose extraneous precautions on their use.

For example, in the Cassandra  API, SlicePredicate has a list<binary> field, this will lead to List<ByteBuffer> objects being generated and thus their use would impose extra synchronization.

I agree that my fix introduces an inconsistency internally, but on the user facing side we've worked after THRIFT-830 so binary fields could be set/retrieved using byte[], right now collections of binary fields cannot use byte[], so we suffer from a lack of external consistency which, in my point of view, is more important to have than internal consistency.

On the performance side, could you pinpoint cases where using byte[] in collections would be less efficient that the ByteBuffers, since the retrieved arrays will be those backing the ByteBuffers when deserializing and the ByteBuffer passed to the serialization code are simply wrapping the byte[] in the collections.

I'd like to have other people express their view on this issue, I think it would be really sad to go with your second option.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979714#action_12979714 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

Another annoyance with ByteBuffer is that none of their method are thread safe, therefore bad things might happen :-(

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980406#action_12980406 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

I understand the performance benefit when the returned ByteBuffers are slices of the underlying buffer, but don't you feel this exposes internal structures too much?

With this approach you can access the array backing the transport's buffer and modify it without you knowing it (i.e. a user simply using the returned ByteBuffer), which means that modifying one binary field (the uderlying array) might modify another binary field unintentionnaly (because the user misunderstood how ByteBuffers worked and assumed the buffer's array was all his). Therefore I don't think ByteBuffers would be the best choice if Thrift was started from scratch today.

But we're not talking about rewriting Thrift...

bq. I think that the cost of giving a backwards compatible interface to collections is very high, and that's why I'm counseling that we don't do it.

On the performance benefit, the re-use of underlying buffers (byte[]) is only done in selected transports, for other transports (TSocket, THttpClient), the use of ByteBuffer induces a performance penalty since not only do we have to allocate a new byte[] but also a ByteBuffer that wraps that array.



> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980324#action_12980324 ] 

Bryan Duxbury commented on THRIFT-1035:
---------------------------------------

I don't see this as a problem at all. We can make the wrapping collection class do wraps on puts too.

The BinaryWrapper approach is costly and unnecessary, since it adds multiple bytes to the serialized form and extra references to the in-memory structure.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980300#action_12980300 ] 

Mathias Herberts commented on THRIFT-1035:
------------------------------------------

If we go the way you suggest, how do you imagine the following scenario:

struct A {
  1: map<string,binary> bin_map,
}

A a = new A();

a.putToBin_map("foo", "bar".getBytes());

// At that point you might have wrapped the value into a ByteBuffer, but then:

a.getBin_map().put("foo", "foobar".getBytes());

// Won't be caught by Thrift and thus the internal ByteBuffer representation will be broken.

I think it's better we use ByteBuffer internally for binary fields that are not included in collections but byte[] for those that are. If you really want to have ByteBuffers, you can always use

struct BinaryWrapper {
  1: binary content,
}

and declare collections which contain BinaryWrapper, this will add an additional Object but might still lead to better performance for large arrays.


> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathias Herberts updated THRIFT-1035:
-------------------------------------

    Attachment: THRIFT-1035-2.patch

New version of the patch which adds the generation of an additional constructor which accepts byte[] instead of ByteBuffer.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980430#action_12980430 ] 

Bryan Duxbury commented on THRIFT-1035:
---------------------------------------

bq. don't you feel this exposes internal structures too much?

No, because if you expose any less, *you lose the performance benefit.*

bq. modifying one binary field (the underlying array) might modify another binary field

This is impossible, if you're actually respecting the boundaries given to you in the ByteBuffer.

bq. the re-use of underlying buffers (byte[]) is only done in selected transports, for other transports (TSocket, THttpClient), the use of ByteBuffer induces a performance penalty since not only do we have to allocate a new byte[] but also a ByteBuffer that wraps that array.

Untrue. Firstly, users should never be using a raw TSocket - you always want a BufferedTransport or a FramedTransport on top of it. The performance without it is atrocious. Secondly, THttpClient *should* be using a Framed Transport or a TMemoryInputTransport to wrap the underlying buffer/stream/whatever it gets back for the same reasons. Thus, to my knowledge, every sane transport benefits from the use of ByteBuffers. Even TDeserializer takes advantage of the direct-buffer-access wrapping opportunities.

bq. But we're not talking about rewriting Thrift...

No, but we are talking about advancing it as a project. The switch to ByteBuffers was a discussed means to improving performance and saving memory, etc. As much as I like to make users happy, I don't think we should be bending over backwards to make Thrift "easy" if it means we can't offer other things like performance, flexibility, and consistency.

For that matter, I don't even buy the argument that it's that much easier. I think it's just that it's the familiar way. There are helper methods to bridge the gap, and the use of ByteBuffers encourages good application design.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathias Herberts closed THRIFT-1035.
------------------------------------

    Resolution: Won't Fix

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Mathias Herberts (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathias Herberts updated THRIFT-1035:
-------------------------------------

    Attachment: THRIFT-1035.patch

This patch modifies the Java generator so it generates byte[] for binary data inside collections.

All existing tests pass.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-1035) Container types containing binary data are parameterized with ByteBuffer in the generated Java code

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980329#action_12980329 ] 

Bryan Duxbury commented on THRIFT-1035:
---------------------------------------

Doing the set call you propose would definitely result in making a new Map<String,ByteBuffer> with the contents of the argument. Subsequent "fiddles" with the result of getBinary_map() would be against the new, wrapped version.

> Container types containing binary data are parameterized with ByteBuffer in the generated Java code
> ---------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1035
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1035
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Compiler, Java - Library
>    Affects Versions: 0.4, 0.5, 0.6, 0.7
>         Environment: All
>            Reporter: Mathias Herberts
>         Attachments: THRIFT-1035-2.patch, THRIFT-1035.patch
>
>
> Since THRIFT-830, binary fields are internally handled using ByteBuffer.
> Release 0.4.0 was the first to expose the ByteBuffer to the outside world (replacing previous methods returning/accepting byte[]).
> THRIFT-882 lead to the methods accepting/returning byte[] being available again, as it was deemed more reasonable not to expose the ByteBuffer too much as their use could be cumbersome. This lead to 0.5.0 being backward compatible with 0.3.0 on the binary fields front.
> During that time, nobody noticed that container types that contained binary data had their generated Java code changed to collections parameterized with ByteBuffer instead of byte[].
> list<binary> -> List<ByteBuffer>
> set<binary> -> Set<ByteBuffer>
> map<binary,...> -> Map<ByteBuffer,...>
> map<...,binary> -> Map<...,ByteBuffer>
> This introduces confusion in the API and still exposes ByteBuffer when discussion on THRIFT-882 concluded this should be avoided.
> We need to provide a way to offer the original parameterization with byte[] as this will simplify working with that type of collection and thus will increase the odds of Thrift's adoption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.