Posted to user@thrift.apache.org by Zee <co...@gmail.com> on 2010/11/15 19:35:26 UTC

Java version of TFileTransport

Hi there,

I am trying to find a way to serialize some Flume events to a file via
Thrift's C++ API and read them back with the Java API. I tried the following
combinations and found that they don't work:

c++: ( TBinaryProtocol ( TFileTransport ) )
java: ( TBinaryProtocol ( TIOStreamTransport ) )

c++: ( TBinaryProtocol ( TFileTransport ) )
java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )

Is there any solution out there without having to apply the patch in (
https://issues.apache.org/jira/browse/THRIFT-377)? Thanks!

Codano

RE: Java version of TFileTransport

Posted by Mark Slee <ms...@facebook.com>.
TFileTransport chunks up the file on disk so that there are known "good" boundaries. If part of the file gets corrupted or partially written, you know you will always be able to seek to some offset (e.g. a multiple of 200MB or something) and resume reading from the beginning of a record.

I haven't been in the TFileTransport code in ages, so I'm not entirely sure of the specifics. But that's the general gist - TFileTransport does include special formatting in the file to aid in recoverability.
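
To make the "seek to a known good boundary" idea concrete, here is a minimal sketch of the offset arithmetic only. It is not taken from the TFileTransport source: the class name, the method names, and the 16MB chunk size are all hypothetical, and the real chunk size may well differ.

```java
// Sketch of chunk-boundary arithmetic for a chunked file format.
// All names and the chunk size are illustrative assumptions, not Thrift's API.
final class ChunkBoundaries {
    /** Hypothetical chunk size; the value TFileTransport actually uses may differ. */
    static final long CHUNK_SIZE = 16L * 1024 * 1024;

    /** Start offset of the chunk that contains the given offset. */
    static long chunkStart(long offset, long chunkSize) {
        return (offset / chunkSize) * chunkSize;
    }

    /** First chunk boundary strictly after the given offset. */
    static long nextChunkStart(long offset, long chunkSize) {
        return chunkStart(offset, chunkSize) + chunkSize;
    }

    public static void main(String[] args) {
        // Suppose a reader hits corrupted data at this (made-up) offset.
        long corruptAt = 40_000_000L;
        // It can skip ahead to the next boundary, where a record is known to begin.
        System.out.println("resume at " + nextChunkStart(corruptAt, CHUNK_SIZE));
    }
}
```

The point is only that if no record ever straddles a boundary, recovery is a division and a multiplication; the reader never has to scan for a record start.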

We really should change all the naming here.
TFileTransport should be TChunkedFileTransport or TRecoverableFileTransport
TSimpleFileTransport should be TFileTransport

Cheers,
mcslee

From: Codano [mailto:codano@gmail.com]
Sent: Monday, November 15, 2010 3:45 PM
To: user@thrift.apache.org; bryan@rapleaf.com; Adam Simpkins; Mark Slee
Subject: Re: Java version of TFileTransport

Mark, Adam:

I tried writing with TSimpleFileTransport and reading with TIOStreamTransport and indeed it worked. What benefit does TFileTransport offer compared to TSimpleFileTransport?

Bryan:

When I used the following combo, the Java code simply reads 4 bytes from the FileInputStream (verified via FIS::available()) and then leaves the fields of the record uninitialized. There was no exception nor any other error condition flagged.

c++: ( TBinaryProtocol ( TFileTransport ) )

java: ( TBinaryProtocol ( TIOStreamTransport ) )

c++: ( TBinaryProtocol ( TFileTransport ) )

java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )


Re: Java version of TFileTransport

Posted by Codano <co...@gmail.com>.
Mark, Adam:

I tried writing with TSimpleFileTransport and reading with
TIOStreamTransport and indeed it worked. What benefit does TFileTransport
offer compared to TSimpleFileTransport?

Bryan:

When I used the following combo, the Java code simply reads 4 bytes from the
FileInputStream (verified via FIS::available()) and then leaves the fields of
the record uninitialized. There was no exception nor any other error
condition flagged.

c++: ( TBinaryProtocol ( TFileTransport ) )
java: ( TBinaryProtocol ( TIOStreamTransport ) )

c++: ( TBinaryProtocol ( TFileTransport ) )
java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )

Re: Java version of TFileTransport

Posted by Codano <co...@gmail.com>.
Does TFileTransport have better protection against a partial write of a record
in case the writer dies unexpectedly?

On Mon, Nov 15, 2010 at 11:53 AM, Adam Simpkins <si...@facebook.com> wrote:

> You could try using TFDTransport in C++ instead of TFileTransport.
> TFDTransport writes directly to a file descriptor, whereas TFileTransport
> adds additional header information around each message that
> TIOStreamTransport won't be able to understand.
>
> --
> Adam Simpkins
> simpkins@facebook.com
>
> On Mon, Nov 15, 2010 at 10:35:26AM -0800, Zee wrote:
> > Hi there,
> >
> > I am trying to find a way to serialize some Flume events to a file via
> > Thrift's C++ API and read them back with the Java API. I tried the following
> > combinations and found that they don't work:
> >
> > c++: ( TBinaryProtocol ( TFileTransport ) )
> > java: ( TBinaryProtocol ( TIOStreamTransport ) )
> >
> > c++: ( TBinaryProtocol ( TFileTransport ) )
> > java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )
> >
> > Is there any solution out there without having to apply the patch in (
> > https://issues.apache.org/jira/browse/THRIFT-377)? Thanks!
> >
> > Codano
>

Re: Java version of TFileTransport

Posted by Adam Simpkins <si...@facebook.com>.
You could try using TFDTransport in C++ instead of TFileTransport.
TFDTransport writes directly to a file descriptor, whereas TFileTransport
adds additional header information around each message that
TIOStreamTransport won't be able to understand.
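
As a rough illustration of why per-message headers trip up a reader that expects something else, the sketch below writes records as a 4-byte length followed by the payload, then reads them back twice: once with the matching byte order and once with the other one. The layout, the byte orders, and the class and method names are assumptions made for this example; this is not the actual TFileTransport format and it does not use the Thrift library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a length-prefixed record file. The layout is assumed for
// illustration and is not the real TFileTransport on-disk format.
final class PrefixedRecords {
    /** Encodes each payload as [4-byte length][payload bytes]. */
    static byte[] write(List<byte[]> payloads, ByteOrder order) {
        int total = 0;
        for (byte[] p : payloads) {
            total += 4 + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total).order(order);
        for (byte[] p : payloads) {
            buf.putInt(p.length);
            buf.put(p);
        }
        return buf.array();
    }

    /** Decodes records; stops at the first length that cannot be valid. */
    static List<byte[]> read(byte[] file, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(order);
        List<byte[]> out = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len <= 0 || len > buf.remaining()) {
                break; // padding, truncation, or a header we misread
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            out.add(payload);
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> records = Arrays.asList(
                "event-1".getBytes(StandardCharsets.UTF_8),
                "event-2".getBytes(StandardCharsets.UTF_8));
        byte[] file = write(records, ByteOrder.LITTLE_ENDIAN);
        // Reader agrees with the writer about the header: both records come back.
        System.out.println(read(file, ByteOrder.LITTLE_ENDIAN).size());
        // Reader assumes the other byte order: the first length is nonsense
        // and nothing is decoded.
        System.out.println(read(file, ByteOrder.BIG_ENDIAN).size());
    }
}
```

A reader with no notion of the header at all is in a worse position still, since it would hand the length bytes straight to the protocol layer as if they were the start of a struct.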

-- 
Adam Simpkins
simpkins@facebook.com

On Mon, Nov 15, 2010 at 10:35:26AM -0800, Zee wrote:
> Hi there,
> 
> I am trying to find a way to serialize some Flume events to a file via
> Thrift's C++ API and read them back with the Java API. I tried the following
> combinations and found that they don't work:
> 
> c++: ( TBinaryProtocol ( TFileTransport ) )
> java: ( TBinaryProtocol ( TIOStreamTransport ) )
> 
> c++: ( TBinaryProtocol ( TFileTransport ) )
> java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )
> 
> Is there any solution out there without having to apply the patch in (
> https://issues.apache.org/jira/browse/THRIFT-377)? Thanks!
> 
> Codano

RE: Java version of TFileTransport

Posted by Mark Slee <ms...@facebook.com>.
The issue here is one of opaque naming. TFileTransport in C++ is actually a more advanced file transport which does chunking and writes boundary headers.

This should work fine if you use the C++ TSimpleFileTransport instead.

Cheers,
mcslee

-----Original Message-----
From: Bryan Duxbury [mailto:bryan@rapleaf.com] 
Sent: Monday, November 15, 2010 10:39 AM
To: user@thrift.apache.org
Subject: Re: Java version of TFileTransport

It would be helpful if you could provide stack traces or error messages.

On Mon, Nov 15, 2010 at 10:35 AM, Zee <co...@gmail.com> wrote:

> Hi there,
>
> I am trying to find a way to serialize some Flume events to a file via
> Thrift's C++ API and read them back with the Java API. I tried the following
> combinations and found that they don't work:
>
> c++: ( TBinaryProtocol ( TFileTransport ) )
> java: ( TBinaryProtocol ( TIOStreamTransport ) )
>
> c++: ( TBinaryProtocol ( TFileTransport ) )
> java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )
>
> Is there any solution out there without having to apply the patch in (
> https://issues.apache.org/jira/browse/THRIFT-377)? Thanks!
>
> Codano
>

Re: Java version of TFileTransport

Posted by Bryan Duxbury <br...@rapleaf.com>.
It would be helpful if you could provide stack traces or error messages.

On Mon, Nov 15, 2010 at 10:35 AM, Zee <co...@gmail.com> wrote:

> Hi there,
>
> I am trying to find a way to serialize some Flume events to a file via
> Thrift's C++ API and read them back with the Java API. I tried the following
> combinations and found that they don't work:
>
> c++: ( TBinaryProtocol ( TFileTransport ) )
> java: ( TBinaryProtocol ( TIOStreamTransport ) )
>
> c++: ( TBinaryProtocol ( TFileTransport ) )
> java: ( TBinaryProtocol ( TFramedTransport ( TIOStreamTransport ) ) )
>
> Is there any solution out there without having to apply the patch in (
> https://issues.apache.org/jira/browse/THRIFT-377)? Thanks!
>
> Codano
>