Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/06/24 06:21:29 UTC
Adding large text blob causes read timeout...
I have a table with a schema mostly of small fields. About 30 of them.
The primary key is:
primary key( bucket, sequence )
… I have 100 buckets and the idea is that sequence is ever-increasing.
This way I can read from bucket zero, and everything after sequence N, and
get all the writes ordered by time.
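A hedged sketch of what a table like this might look like in CQL (the non-key column names here are invented for illustration; the real table has about 30 columns):

```sql
-- Illustrative sketch only: columns other than the key are assumptions.
CREATE TABLE content (
    bucket   int,
    sequence bigint,
    title    text,   -- stand-in for one of the ~29 small fields
    html     text,   -- the large HTML snapshot column
    PRIMARY KEY (bucket, sequence)
) WITH CLUSTERING ORDER BY (sequence ASC);
```

With sequence as the clustering column, rows within a bucket come back in order, which is what makes a range query on sequence work.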
I'm running
SELECT ... FROM content WHERE bucket=0 AND sequence>0 ORDER BY sequence ASC
LIMIT 1000;
… using the Java driver.
If I add ALL the fields except one, so 29 fields, the query is fast: only
129ms…
However, if I add the 'html' field, which is obviously a snapshot of HTML,
the query times out…
I'm going to add tracing and try to track it down further, but I suspect
I'm doing something stupid.
Is it going to burn me that the data is UTF-8 encoded? I can't imagine
decoding UTF-8 is going to be THAT slow, but perhaps Cassandra is doing
something silly under the covers?
cqlsh doesn't time out… it actually works fine, but it uses 100% CPU while
writing out the data, so it's not a good comparison, unfortunately.
Exception in thread "main"
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (tried: ...:9042
(com.datastax.driver.core.exceptions.DriverException: Timeout during read))
at
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
at
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
at com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
at
com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (tried:
dev4.wdc.sl.spinn3r.com/10.24.23.94:9042
(com.datastax.driver.core.exceptions.DriverException: Timeout during read))
at
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.
Re: Adding large text blob causes read timeout...
Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Can you run your query in the CLI after setting "tracing on"?
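For reference, tracing in cqlsh is just:

```
TRACING ON;
SELECT ... FROM content WHERE bucket=0 AND sequence>0 LIMIT 1000;
```

and a trace of each request (which stages took how long, on which replica) is printed along with the result.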
--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: Adding large text blob causes read timeout...
Posted by DuyHai Doan <do...@gmail.com>.
Yes, but adding the extra one ends up multiplied by 1000. The LIMIT in CQL3
specifies the number of logical rows, not the number of physical columns in
the storage engine.
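A quick back-of-the-envelope calculation makes this concrete. The per-column sizes below are assumptions for illustration; only the row and column counts come from the thread:

```python
# Rough payload estimate for the query in the thread.
# Sizes per column are assumed, not measured.
rows = 1000          # LIMIT counts logical rows
small_cols = 29      # the fast-returning fields
small_size = 100     # assumed average bytes per small column
html_size = 100_000  # assumed bytes for one HTML snapshot

without_html = rows * small_cols * small_size
with_html = without_html + rows * html_size

print(without_html)  # 2900000  (~2.9 MB of cell data)
print(with_html)     # 102900000 (~103 MB once 'html' is included)
```

Under these assumptions, one wide column inflates the response by more than 35x, which is plenty to push a query from ~200ms past a 5-second read timeout.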
Re: Adding large text blob causes read timeout...
Posted by Kevin Burton <bu...@spinn3r.com>.
Oh… the difference between the ONE field and the remaining 29 is
massive.
It's like 200ms for just the 29 columns… adding the extra one causes it to
time out… > 5000ms.
Re: Adding large text blob causes read timeout...
Posted by DuyHai Doan <do...@gmail.com>.
Don't forget that when you do the SELECT with LIMIT set to 1000, Cassandra
is actually fetching 1000 * 29 physical columns (29 fields per logical
row).
Adding one extra big html column may be too much and cause the timeout. Try to:
1. Select only the big html column by itself
2. Or reduce the limit incrementally until there is no timeout
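Suggestion 2 can be sketched as a simple halving search. This is a hypothetical illustration: run_query stands in for a real driver call that raises TimeoutError on a read timeout.

```python
# Halve the LIMIT until the query stops timing out.
# `run_query` is a hypothetical stand-in for a driver call
# that raises TimeoutError when the read times out.
def find_workable_limit(run_query, start=1000, floor=1):
    limit = start
    while limit >= floor:
        try:
            run_query(limit)
            return limit      # first limit that succeeds
        except TimeoutError:
            limit //= 2       # too many rows: try half as many
    return None               # even the smallest limit timed out

# Fake backend that only copes with <= 100 rows:
def fake(n):
    if n > 100:
        raise TimeoutError("read timeout")

print(find_workable_limit(fake))  # -> 62
```

Starting from 1000, the search tries 1000, 500, 250, 125, then 62, which succeeds. The limit that works tells you roughly how much data per request the cluster can serve within the timeout.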