Posted to user@cassandra.apache.org by Eric Lubow <er...@gmail.com> on 2009/10/15 17:26:08 UTC

Thrift Perl API Timeout Issues

Using the Thrift Perl API with Cassandra, I am running into what is
endearingly referred to as the 4 bytes of doom:
 TSocket: timed out reading 4 bytes from localhost:9160

The script I am using is fairly simple.  I have a text file that has about
3.6 million lines that are formatted like:  foo@bar.com  1234

The Cassandra dataset is a single column family called Users in the Mailings
keyspace with a data layout of:
Users = {
    'foo@example.com': {
        email: 'foo@example.com',
        person_id: '123456',
        send_dates_2009-09-30: '2245',
        send_dates_2009-10-01: '2247',
    },
}
There are about 3.5 million rows in the Users column family and each row has
no more than 4 columns (listed above).  Some only have 3 (one of the
send_dates_YYYY-MM-DD isn't there).

The script parses it and then connects to Cassandra and does a get_slice and
counts the return values adding that to a hash:
     my ($value) = $client->get_slice(
         'Mailings',
         $email,
         Cassandra::ColumnParent->new({
                 column_family => 'Users',
             }),
         Cassandra::SlicePredicate->new({
                 slice_range => Cassandra::SliceRange->new({
                         start => 'send_dates_2009-09-29',
                         finish => 'send_dates_2009-10-30',
                     }),
             }),
         Cassandra::ConsistencyLevel::ONE
     );
     $counter{($#{$value} + 1)}++;

For the most part, this script times out after 1 minute or so. Replacing the
get_slice with a get_count, I can get it to about 2 million queries before I
get the timeout.  Replacing the get_slice with a get, I make it to about 2.5
million before I get the timeout.  The only way I could get it to run all
the way through was to add a 1/100 of a second sleep during every iteration.
 I was able to get the script to complete when I shut down everything else
on the machine (and it took 177m to complete).  But since this is a
semi-production machine, I had to turn everything back on afterwards.

So for poops and laughs (at the recommendation of jbellis), I rewrote the
script in Python and it has since run (using get_slice) 3 times fully
without timing out (approximately 130m in Python) with everything else
running on the machine.
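For scale, those two full runs work out to roughly the following sustained query rates (simple arithmetic on the figures above; 177m and 130m are wall-clock minutes):

```python
# Rough throughput implied by the two successful full runs above.
queries = 3_600_000          # lines in the input file, one query each

perl_minutes = 177           # Perl run, with everything else shut down
python_minutes = 130         # Python run, with everything else running

perl_qps = queries / (perl_minutes * 60)
python_qps = queries / (python_minutes * 60)

print(round(perl_qps))    # ~339 queries/second
print(round(python_qps))  # ~462 queries/second
```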

My question: I have seen this same behavior in the PHP API, and it is my
understanding that the Perl API was based on the PHP API. Could
http://issues.apache.org/jira/browse/THRIFT-347 apply to Perl here too?  Is
anyone else seeing this issue?  If so, how have you gotten around it?

Thanks.

-e

Re: Thrift Perl API Timeout Issues

Posted by Jake Luciani <ja...@gmail.com>.
Perhaps we should add this to the thrift/Cassandra FAQ?


On Oct 16, 2009, at 9:36 AM, Simon Smith <si...@gmail.com> wrote:

> I don't have an opinion on the default timeout.  But in my experience
> with other applications, you want to consciously make a choice about
> what your timeout is, based on your architecture and performance
> requirements.  You're much better off explicitly setting a timeout
> that will cause your transaction to finish in a time a little longer
> than you'd like and then either retry or error out the transaction.
> An alternate approach is to set a quick timeout, one that is just
> over the 99.?th percentile of transaction times, and then retry.  (But
> whatever you do, don't just retry endlessly, or you may end up with
> this terrible growing mess of transactions retrying.)
>
> In either case, it's a good idea to be monitoring the frequency of
> timeouts, so if they increase over the baseline you can track down the
> cause and fix it.
>
> Just my $0.02.
>
> Simon
>
> On Thu, Oct 15, 2009 at 11:33 PM, Eric Lubow <er...@gmail.com>  
> wrote:
>> So I ran the tests again twice with a huge timeout and it managed  
>> to run in
>> just under 3 hours both times.  So this issue is definitely related  
>> to the
>> timeouts.  It might be worth changing the default timeouts for Perl  
>> to match
>> the infinite timeouts for Python.  Thanks for the quick responses.
>> -e

Re: Thrift Perl API Timeout Issues

Posted by Eric Lubow <er...@gmail.com>.
Simon,
   I understand what you're saying and tend to agree with that philosophy.
I think the issue has more to do with the undocumentedness (if that's not a
word, it should be) of the Perl Thrift/Cassandra API in general.  That is
something I hope to change in the near future.  Timeouts are definitely
something I am going to make sure gets noted when I write up some code
examples as well.

-e

On Fri, Oct 16, 2009 at 9:36 AM, Simon Smith <si...@gmail.com> wrote:

> [quoted message snipped]
>
> On Thu, Oct 15, 2009 at 11:33 PM, Eric Lubow <er...@gmail.com> wrote:
> > [quoted message snipped]

Re: Thrift Perl API Timeout Issues

Posted by Simon Smith <si...@gmail.com>.
I don't have an opinion on the default timeout.  But in my experience
with other applications, you want to consciously make a choice about
what your timeout is, based on your architecture and performance
requirements.  You're much better off explicitly setting a timeout
that will cause your transaction to finish in a time a little longer
than you'd like and then either retry or error out the transaction.
An alternate approach is to set a quick timeout, one that is just
over the 99.?th percentile of transaction times, and then retry.  (But
whatever you do, don't just retry endlessly, or you may end up with
this terrible growing mess of transactions retrying.)
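
The bounded-retry approach described above can be sketched like this (Python
for brevity; `TimeoutError` and `flaky` are stand-ins for the client's actual
timeout exception and the actual query call):

```python
import time

def call_with_retries(fn, max_attempts=3, backoff_s=0.1):
    """Call fn(); on a timeout, retry a bounded number of times.

    Bounding the attempts avoids the "growing mess of transactions
    retrying" failure mode described above.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except TimeoutError as err:  # stand-in for the client's timeout error
            last_err = err
            time.sleep(backoff_s * (attempt + 1))  # brief, growing pause
    raise last_err

# Usage with a stub that times out twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("timed out reading 4 bytes")
    return "ok"

print(call_with_retries(flaky))  # ok
```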

In either case, it's a good idea to be monitoring the frequency of
timeouts, so if they increase over the baseline you can track down the
cause and fix it.

Just my $0.02.

Simon

On Thu, Oct 15, 2009 at 11:33 PM, Eric Lubow <er...@gmail.com> wrote:
> So I ran the tests again twice with a huge timeout and it managed to run in
> just under 3 hours both times.  So this issue is definitely related to the
> timeouts.  It might be worth changing the default timeouts for Perl to match
> the infinite timeouts for Python.  Thanks for the quick responses.
> -e

Re: Thrift Perl API Timeout Issues

Posted by Eric Lubow <er...@gmail.com>.
So I ran the tests again twice with a huge timeout and it managed to run in
just under 3 hours both times.  So this issue is definitely related to the
timeouts.  It might be worth changing the default timeouts for Perl to match
the infinite timeouts for Python.  Thanks for the quick responses.
-e

On Thu, Oct 15, 2009 at 2:48 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Are you also using Perl?
>
> On Thu, Oct 15, 2009 at 1:38 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> > I see a similar thing happening all the time.  I get around it by closing
> > the current connection and reconnecting after a sleep.  Although I am
> able
> > to do quite a few inserts between errors, so I'm not sure if it's the
> > exact problem.
> >
> > -Anthony
>

Re: Thrift Perl API Timeout Issues

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
Yes, I use Perl to do ETL-type stuff: dumping MySQL and Postgres tables and
loading them into Cassandra.  Although I also see these errors in front-end
PHP systems which connect to Cassandra.

-Anthony

On Thu, Oct 15, 2009 at 01:48:14PM -0500, Jonathan Ellis wrote:
> Are you also using Perl?
> 
> On Thu, Oct 15, 2009 at 1:38 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> > I see a similar thing happening all the time.  I get around it by closing
> > the current connection and reconnecting after a sleep.  Although I am able
> > to do quite a few inserts between errors, so I'm not sure if it's the
> > exact problem.
> >
> > -Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Thrift Perl API Timeout Issues

Posted by Jonathan Ellis <jb...@gmail.com>.
Are you also using Perl?

On Thu, Oct 15, 2009 at 1:38 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> I see a similar thing happening all the time.  I get around it by closing
> the current connection and reconnecting after a sleep.  Although I am able
> to do quite a few inserts between errors, so I'm not sure if it's the
> exact problem.
>
> -Anthony

Re: Thrift Perl API Timeout Issues

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
I see a similar thing happening all the time.  I get around it by closing
the current connection and reconnecting after a sleep.  Although I am able
to do quite a few inserts between errors, so I'm not sure if it's the
exact problem.

-Anthony
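
That close-then-reconnect workaround can be sketched like this (Python for
brevity; `connect` and `op` are hypothetical stand-ins for the Thrift
transport open and the actual insert/get_slice call):

```python
import time

def run_with_reconnect(connect, op, items, pause_s=0.5):
    """Feed items through op(conn, item); on an error, drop the
    connection, sleep briefly, reconnect, and retry the item once."""
    conn = connect()
    results = []
    for item in items:
        try:
            results.append(op(conn, item))
        except Exception:
            time.sleep(pause_s)  # give the server a moment
            conn = connect()     # fresh connection, then retry once
            results.append(op(conn, item))
    return results

# Usage with stubs: the first connection times out on one item.
connects = {"n": 0}
def connect():
    connects["n"] += 1
    return {"id": connects["n"]}

def op(conn, item):
    if item == "bad" and conn["id"] == 1:
        raise RuntimeError("TSocket: timed out reading 4 bytes")
    return item.upper()

print(run_with_reconnect(connect, op, ["a", "bad", "b"], pause_s=0))
# ['A', 'BAD', 'B']
```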

On Thu, Oct 15, 2009 at 11:26:08AM -0400, Eric Lubow wrote:
> [original message snipped]

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Thrift Perl API Timeout Issues

Posted by Jake Luciani <ja...@gmail.com>.
What happens if you set it to 100000?



On Oct 15, 2009, at 11:48 AM, Eric Lubow <er...@gmail.com> wrote:

> My connection section of the script is here:
>  # Connect to the database
>  my $socket = new Thrift::Socket('localhost',9160);
>     $socket->setSendTimeout(2500);
>     $socket->setRecvTimeout(7500);
>  my $transport = new Thrift::BufferedTransport($socket,2048,2048);
>  my $protocol = new Thrift::BinaryProtocol($transport);
>  my $client = Cassandra::CassandraClient->new($protocol);
>
> I even tried it with combinations of 1024 as the size and 1000 as  
> the SendTimeout and 5000 as the RecvTimeout.
>
> -e
>
> On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani <ja...@gmail.com>  
> wrote:
> I think it's 100ms. I need to increase it to match python I guess.
>
> Sent from my iPhone
>
>
> On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jb...@gmail.com>  
> wrote:
>
> What is the default?
>
> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <ja...@gmail.com>  
> wrote:
> You need to call
> $socket->setRecvTimeout()
> With a higher number in ms.
>
>
> On Oct 15, 2009, at 11:26 AM, Eric Lubow <er...@gmail.com> wrote:
>
> [original message snipped]

Re: Thrift Perl API Timeout Issues

Posted by Eric Lubow <er...@gmail.com>.
My connection section of the script is here:
 # Connect to the database
 my $socket = new Thrift::Socket('localhost',9160);
    $socket->setSendTimeout(2500);
    $socket->setRecvTimeout(7500);
 my $transport = new Thrift::BufferedTransport($socket,2048,2048);
 my $protocol = new Thrift::BinaryProtocol($transport);
 my $client = Cassandra::CassandraClient->new($protocol);

I even tried it with combinations of 1024 as the size and 1000 as the
SendTimeout and 5000 as the RecvTimeout.

-e

On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani <ja...@gmail.com> wrote:

> I think it's 100ms. I need to increase it to match python I guess.
>
> Sent from my iPhone
>
>
> On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>  What is the default?
>>
>> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <ja...@gmail.com> wrote:
>>
>>> You need to call
>>> $socket->setRecvTimeout()
>>> With a higher number in ms.
>>>
>>>
>>> On Oct 15, 2009, at 11:26 AM, Eric Lubow <er...@gmail.com> wrote:
>>>
>>> [original message snipped]

Re: Thrift Perl API Timeout Issues

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Oct 15, 2009 at 10:51 AM, Simon Smith <si...@gmail.com> wrote:
> While on the topic, I'm using the python Thrift interface - if I
> wanted to, how would I change the timeout? I currently do:
>
>  socket = TSocket.TSocket(host,port)
>
> If I wanted to change the timeout would I do something like:
>
> socket.setTimeout(timeout)

Yes, with timeout in ms.

(The default appears to be "never timeout.")

-Jonathan
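
Putting that together, the Python connection setup being discussed looks
roughly like this (a sketch following the standard Thrift Python pattern;
the generated Cassandra client line is commented out as it depends on your
generated bindings):

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

socket = TSocket.TSocket('localhost', 9160)
socket.setTimeout(5000)  # in milliseconds; the default is to never time out

transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
# client = Cassandra.Client(protocol)  # from the Thrift-generated bindings
transport.open()
```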

Re: Thrift Perl API Timeout Issues

Posted by Simon Smith <si...@gmail.com>.
While on the topic, I'm using the python Thrift interface - if I
wanted to, how would I change the timeout? I currently do:

  socket = TSocket.TSocket(host,port)

If I wanted to change the timeout would I do something like:

socket.setTimeout(timeout)

or...?

Sorry if I should be able to see this by looking at the code - I'm new
to python.

Thanks,

Simon

On Thu, Oct 15, 2009 at 11:42 AM, Jake Luciani <ja...@gmail.com> wrote:
> I think it's 100ms. I need to increase it to match python I guess.
>
> Sent from my iPhone
>
> On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> What is the default?
>>
>> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <ja...@gmail.com> wrote:
>>>
>>> You need to call
>>> $socket->setRecvTimeout()
>>> With a higher number in ms.
>>>
>>>

Re: Thrift Perl API Timeout Issues

Posted by Jake Luciani <ja...@gmail.com>.
I think it's 100ms. I need to increase it to match python I guess.

Sent from my iPhone

On Oct 15, 2009, at 11:40 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> What is the default?
>
> On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <ja...@gmail.com>  
> wrote:
>> You need to call
>> $socket->setRecvTimeout()
>> With a higher number in ms.
>>
>>
>> On Oct 15, 2009, at 11:26 AM, Eric Lubow <er...@gmail.com>  
>> wrote:
>>
>> [original message snipped]

Re: Thrift Perl API Timeout Issues

Posted by Jonathan Ellis <jb...@gmail.com>.
What is the default?

On Thu, Oct 15, 2009 at 10:37 AM, Jake Luciani <ja...@gmail.com> wrote:
> You need to call
> $socket->setRecvTimeout()
> With a higher number in ms.
>
>
> On Oct 15, 2009, at 11:26 AM, Eric Lubow <er...@gmail.com> wrote:
>
> [original message snipped]

Re: Thrift Perl API Timeout Issues

Posted by Jake Luciani <ja...@gmail.com>.
You need to call
$socket->setRecvTimeout()

With a higher number in ms.


On Oct 15, 2009, at 11:26 AM, Eric Lubow <er...@gmail.com> wrote:

> [original message snipped]