Posted to dev@thrift.apache.org by Jens Geyer <je...@hotmail.com> on 2014/03/06 09:06:54 UTC

list<byte>

How about a Thrift compiler warning whenever a list<byte> occurs in the IDL? It seems to be a common beginner's trap; I have seen it quite a few times now.

Opinions?
________________________________
From: Ben Craig
Sent: 05.03.2014 23:40
To: dev@thrift.apache.org
Subject: Re: File Upload through thrift

Don't use list<byte>.  Large list performance is pretty bad, because
elements tend to be serialized one at a time.  Use 'binary' or 'string'
instead.  You also get significantly fewer copies this way.
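
A minimal sketch of why this matters, assuming the usual layout of Thrift's binary protocol: a list is written as an element-type byte, a 32-bit count, and then one write per element, while a binary is a 32-bit length followed by the payload in one piece. The function names are hypothetical and field headers are omitted; this is illustrative Python, not code from the Thrift library.

```python
import struct

T_BYTE = 3  # element type id for 'byte' in Thrift's type table


def encode_list_of_bytes(data):
    """Encode like list<byte>: a header, then one write per element.

    Returns (encoded_bytes, number_of_writes).
    """
    out = bytearray()
    out += struct.pack(">bi", T_BYTE, len(data))  # element type + count
    writes = 1
    for value in data:
        # Thrift bytes are signed, so map 0..255 onto -128..127.
        signed = value - 256 if value > 127 else value
        out += struct.pack(">b", signed)
        writes += 1
    return bytes(out), writes


def encode_binary(data):
    """Encode like 'binary': a 32-bit length, then the payload in one write.

    Returns (encoded_bytes, number_of_writes).
    """
    out = bytearray()
    out += struct.pack(">i", len(data))  # length prefix
    out += data                          # whole payload at once
    return bytes(out), 2


payload = bytes(range(256))
as_list, list_writes = encode_list_of_bytes(payload)
as_binary, binary_writes = encode_binary(payload)
print(len(as_list), list_writes)
print(len(as_binary), binary_writes)
```

The two encodings are nearly the same size on the wire; the difference is the number of individual writes (and, on the receiving side, reads), which grows with the payload for the list and stays constant for binary.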

Rush Manbert <ru...@manbert.com> wrote on 03/05/2014 04:31:47 PM:

> From: Rush Manbert <ru...@manbert.com>
> To: dev@thrift.apache.org,
> Date: 03/05/2014 04:32 PM
> Subject: Re: File Upload through thrift
>
> Hi Sachith,
>
> I would define a thrift struct that has metadata about the file,
> followed by a member of type list<byte> that stores the file data. I
> would also define the metadata so it can support transmitting the
> file in chunks (data that says this is chunk m of n, for instance),
> where each chunk is sent in a separate message to the server. I
> would also include some unique cookie value so you can differentiate
> between two client processes sending the same file.
>
> On the client side, figure out how many chunks you need to send the
> file, then split the file data up between that many instances of
> your Thrift class. Iterate over the Thrift classes and send each to
> the server.
>
> On the server side, receive the sendFile messages. Use the unique
> cookie from within the structure to sort the structures you receive
> into a separate bin for each file you are receiving
> (std::map<std::string, std::vector<YourThriftStructType *> > would
> probably work). When you have received all n chunks for a file
> (which you can tell because of the metadata), then use the collected
> Thrift structures to reconstitute the file on the server side. The
> server method can return a value that tells whether or not it thinks
> the transmission has completed. This approach lets you field send
> messages from many clients simultaneously with a single server instance.
>
> That's my first cut. It probably requires some refinement. ;-)
>
> - Rush
>
>
> On Mar 5, 2014, at 1:45 PM, Sachith Withana wrote:
>
> > Thanks Henrique and Jens,
> >
> > Ideally we'd like to support multiple Gigabytes of data.
> > As you said, we will potentially end up having a server to do file
> > uploading and downloading through URLs for Big files.
> >
> > But we'd like to support small file (~10MB) uploads using the Thrift
> > interface at least for now.
> >
> > What would be the best approach in achieving that?
> >
> >
> > On Wed, Mar 5, 2014 at 4:32 PM, Jens Geyer <je...@hotmail.com> wrote:
> >
> >> [...] faster to send a URL and download it separately, [...]
> >>>
> >>
> >> Ack, I was about to write the same.
> >>
> >>
> >>
> >> -----Original Message----- From: Henrique Mendonça
> >> Sent: Wednesday, March 5, 2014 10:24 PM
> >> To: dev@thrift.apache.org
> >> Subject: Re: File Upload through thrift
> >>
> >>
> >> Hi,
> >> You can do this, how big are your files? You might find that some
> >> implementations do have a size limit, not sure about Java, though.
> >> Anyway, it is generally faster to send a URL and download it separately,
> >> if it fits you...
> >>
> >>
> >> On 5 March 2014 20:56, Sachith Withana <sw...@gmail.com> wrote:
> >>
> >>> I'm planning on converting the file into binary type and then do the
> >>> transfer to the server.
> >>>
> >>> What are the possible drawbacks of using this?
> >>>
> >>> This feature is for Apache Airavata. We are dealing with multiple language
> >>> clients and a main Java Server.
> >>> We have to upload ( and download) large files too.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 5, 2014 at 1:47 PM, Sachith Withana <sw...@gmail.com> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I need to transfer files between the Thrift clients and Servers.
> >>>> There can be large files to be transferred.
> >>>>
> >>>> Can someone please suggest a way of obtaining this?
> >>>>
> >>>> --
> >>>> Thanks,
> >>>> Sachith Withana
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Sachith Withana
> >>>
> >>>
> >>
> >
> >
> > --
> > Thanks,
> > Sachith Withana
>
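
The chunked-upload scheme Rush describes in the quoted message above can be sketched as follows. This is a minimal Python illustration, not generated Thrift code: FileChunk, split_file, UploadServer and send_file are hypothetical names, the data field stands in for a Thrift binary member (per Ben's advice) rather than list<byte>, and the sketch assumes chunk indices are valid.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileChunk:
    cookie: str     # unique per upload, distinguishes concurrent senders
    file_name: str  # metadata about the file
    index: int      # 0-based position of this chunk ("chunk m")
    total: int      # number of chunks in the upload ("of n")
    data: bytes     # would be a 'binary' field in the IDL


def split_file(cookie, file_name, payload, chunk_size):
    """Client side: split payload into chunks of at most chunk_size bytes."""
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division
    return [
        FileChunk(cookie, file_name, i, total,
                  payload[i * chunk_size:(i + 1) * chunk_size])
        for i in range(total)
    ]


class UploadServer:
    """Server side: one bin of received chunks per cookie."""

    def __init__(self):
        self._pending = defaultdict(dict)  # cookie -> {index: chunk}
        self.completed = {}                # cookie -> reassembled bytes

    def send_file(self, chunk):
        """Store one chunk; return True once the whole file has arrived."""
        received = self._pending[chunk.cookie]
        received[chunk.index] = chunk      # a repeated chunk just overwrites
        if len(received) < chunk.total:
            return False
        self.completed[chunk.cookie] = b"".join(
            received[i].data for i in range(chunk.total))
        del self._pending[chunk.cookie]
        return True


server = UploadServer()
chunks = split_file("client-a", "report.bin", b"0123456789", chunk_size=4)
results = [server.send_file(c) for c in reversed(chunks)]  # any order works
print(results, server.completed["client-a"])
```

Keying the bins by cookie is what lets a single server instance take interleaved chunks from several clients, and the boolean return value plays the role of the "transmission completed" result mentioned in the description.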

Re: list<byte>

Posted by Roger Meier <ro...@bufferoverflow.ch>.
I like the compiler warning idea, and it can be removed as soon as the
implementations are more efficient.
-roger
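
Until such a warning exists in the compiler, a standalone check over the IDL text could serve a similar purpose. The sketch below is a hypothetical Python lint, not part of the Thrift compiler; the function name is made up, and it only understands // and # line comments, so block comments and string literals would need more work.

```python
import re

# Matches list<byte>, tolerating whitespace around the angle brackets.
LIST_BYTE = re.compile(r"\blist\s*<\s*byte\s*>")


def find_list_byte(idl_text):
    """Return (line_number, stripped_line) for each use of list<byte>.

    Text after a // or # line comment marker is ignored.
    """
    hits = []
    for number, line in enumerate(idl_text.splitlines(), start=1):
        code = re.split(r"//|#", line, maxsplit=1)[0]
        if LIST_BYTE.search(code):
            hits.append((number, line.strip()))
    return hits


example_idl = """\
struct FileChunk {
  1: string name,
  2: list<byte> data, // payload
  3: binary better,
  // list<byte> mentioned only in a comment
}
"""

for number, line in find_list_byte(example_idl):
    print("warning: line %d uses list<byte>; consider 'binary': %s"
          % (number, line))
```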

Quoting Ben Craig <be...@apache.org>:

> Alternatively, list<byte> could be made efficient.  That would likely have
> some source compatibility (but not network compatibility) ramifications.
> The other numeric list types are more difficult because of endianness.
> Those can't be made particularly efficient without authoring a
> TLittleEndianBinaryProtocol.
>
> I'm fine with a warning at the .idl level.  I don't know if other
> languages have perf issues with list<byte>, but C++ does.  If list<byte>
> and string are about the same for other languages, then it may make sense
> as a c++ only warning.



Re: list<byte>

Posted by Ben Craig <be...@apache.org>.
Alternatively, list<byte> could be made efficient.  That would likely have
some source compatibility (but not network compatibility) ramifications.
The other numeric list types are more difficult because of endianness. 
Those can't be made particularly efficient without authoring a 
TLittleEndianBinaryProtocol.

I'm fine with a warning at the .idl level.  I don't know if other 
languages have perf issues with list<byte>, but C++ does.  If list<byte> 
and string are about the same for other languages, then it may make sense 
as a c++ only warning.
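
The endianness point can be illustrated with a short sketch. Thrift's binary protocol writes integers in big-endian (network) order, so on a little-endian host an in-memory i32 array cannot simply be copied to the wire, whereas single bytes have no byte order at all. The helper name below is hypothetical, and Python's struct module stands in for the protocol code.

```python
import struct


def pack_list(values, fmt_char, byte_order):
    """Pack values contiguously.

    fmt_char is a struct code ('b' for a signed byte, 'i' for a 32-bit int);
    byte_order is '>' for big-endian or '<' for little-endian.
    """
    return struct.pack("%s%d%s" % (byte_order, len(values), fmt_char), *values)


ints = [1, 256, -2]
wire = pack_list(ints, "i", ">")    # layout a big-endian wire format carries
memory = pack_list(ints, "i", "<")  # layout a little-endian host holds

# The i32 layouts differ, so every element has to be swapped on the way out.
print(wire != memory)

small = [1, 2, -3]
# The byte layouts are identical, so a bulk copy would be safe for list<byte>.
print(pack_list(small, "b", ">") == pack_list(small, "b", "<"))
```

This is the gap a little-endian protocol variant would close for the wider numeric types: if the wire order matched the host order, those lists could also be copied in bulk.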
