Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/06/24 06:21:29 UTC

Adding large text blob causes read timeout...

I have a table with a schema of mostly small fields, about 30 of them.

The primary key is:

    primary key( bucket, sequence )

… I have 100 buckets, and the idea is that sequence is ever increasing.
This way I can read from bucket zero, fetch everything after sequence N, and
get all the writes ordered by time.
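As a sanity check on the design, here is a toy model of that access pattern (plain Python, no Cassandra involved; the write count and payloads are made up for illustration):

```python
from itertools import count

# Toy model of the bucket/sequence scheme: 100 buckets, one
# ever-increasing sequence shared across all writes.
NUM_BUCKETS = 100
seq = count(1)

# Simulate 500 writes landing round-robin across the buckets.
rows = [(i % NUM_BUCKETS, next(seq), f"write-{i}") for i in range(500)]

def read_bucket(bucket, after, limit=1000):
    """Mimics: SELECT ... WHERE bucket=? AND sequence>? ORDER BY sequence ASC LIMIT ?"""
    hits = [r for r in rows if r[0] == bucket and r[1] > after]
    return sorted(hits, key=lambda r: r[1])[:limit]

# Reading bucket zero after sequence 0 returns its writes in time order.
page = read_bucket(0, after=0)
assert all(a[1] < b[1] for a, b in zip(page, page[1:]))
```

Since sequence only ever grows, paging forward is just repeating the read with `after` set to the last sequence seen.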

I'm running

SELECT ... FROM content WHERE bucket=0 AND sequence>0 ORDER BY sequence ASC
LIMIT 1000;

… using the Java driver.

If I select ALL the fields except one, so 29 fields, the query is fast.
Only 129ms…

However, if I add the 'html' field, which is obviously a snapshot of HTML,
the query times out…

I'm going to add tracing and try to track it down further, but I suspect
I'm doing something stupid.

Is it going to burn me that the data is UTF8 encoded? I can't imagine
decoding UTF8 is going to be THAT slow, but perhaps Cassandra is doing
something silly under the covers?

cqlsh doesn't time out … it actually works fine, but it uses 100% CPU while
writing out the data, so it's not a good comparison unfortunately.


Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ...:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
	at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
	at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
	at com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
	at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
	at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.

Re: Adding large text blob causes read timeout...

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Can you run your query in the CLI after setting "tracing on"?



-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

Re: Adding large text blob causes read timeout...

Posted by DuyHai Doan <do...@gmail.com>.
Yes, but adding the extra one ends up multiplied by 1000. The LIMIT in CQL3
specifies the number of logical rows, not the number of physical columns in
the storage engine.

Re: Adding large text blob causes read timeout...

Posted by Kevin Burton <bu...@spinn3r.com>.
oh.. the difference between the ONE field and the remaining 29 is
massive.

It's like 200ms for just the 29 columns.. adding the extra one causes it to
time out .. > 5000ms...





Re: Adding large text blob causes read timeout...

Posted by DuyHai Doan <do...@gmail.com>.
Don't forget that when you do the SELECT with LIMIT set to 1000, Cassandra
is actually fetching 1000 * 29 physical columns (29 fields per logical
row).

Adding one extra big html column may be too much and cause the timeout. Try
to:

1. Select only the big html column by itself
2. Or reduce the limit incrementally until there is no timeout
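To put rough numbers on that multiplication (the per-column byte sizes below are assumptions for illustration, not measurements from this thread; only the ratio matters):

```python
# Hypothetical sizes: 29 small columns of ~100 bytes each, plus one
# html snapshot column of ~100 KB.
LIMIT = 1000
SMALL_COLS = 29
SMALL_COL_BYTES = 100
HTML_BYTES = 100 * 1024

without_html = LIMIT * SMALL_COLS * SMALL_COL_BYTES   # 2,900,000 bytes
with_html = without_html + LIMIT * HTML_BYTES         # 105,300,000 bytes

# Under these assumptions, the one extra column makes the response
# roughly 36x larger, which can push a single read past a 5s timeout.
print(without_html, with_html, with_html // without_html)
```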