Posted to user@cassandra.apache.org by Yan Chunlu <sp...@gmail.com> on 2011/08/14 18:23:30 UTC

node restart taking too long

I have 3 nodes and RF=3. When I was repairing node3, it seems a lot of data
was generated, and the server could not handle the load and crashed.
After it came back, node3 has not returned for more than 96 hours.

With 34GB of data, node2 could restart and come back online within 1 hour.

I am not sure what is wrong with node3; should I restart node3 again?
thanks!

Address         Status State   Load            Owns    Token
                                                       113427455640312821154458202477256070484
node1     Up     Normal  34.11 GB        33.33%  0
node2     Up     Normal  31.44 GB        33.33%  56713727820156410577229101238628035242
node3     Down   Normal  177.55 GB       33.33%  113427455640312821154458202477256070484


the log shows it is still going on, not sure why it is so slow:


 INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening
/cassandra/data/COMMENT
 INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
reading saved cache /cassandra/saved_caches/COMMENT-RowCache
 INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
completed loading (1744370 ms; 200000 keys) row cache for COMMENT
 INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
reading saved cache /cassandra/saved_caches/COMMENT-RowCache
 INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line
96) Saved COMMENT-RowCache (200000 items) in 2535 ms

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
If I removed the migration and schema sstables, it would show "Couldn't find
cfId=1000"; as I remember, if I left the error alone it would eventually show
"InstanceAlreadyExistsException".
I found the entry in the cassandra log file (but I could not reproduce it);
it was like this:

ERROR [MutationStage:2834] 2011-08-18 06:30:56,667
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException:
org.apache.cassandra.db:type=ColumnFamilies,keyspace=prjspace,columnfamily=FriendsByAccount
    at
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:261)
    at
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:472)
    at
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:453)
    at org.apache.cassandra.db.Table.initCf(Table.java:317)
    at org.apache.cassandra.db.Table.<init>(Table.java:254)
    at org.apache.cassandra.db.Table.open(Table.java:110)
    at
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:76)
    at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: javax.management.InstanceAlreadyExistsException:
org.apache.cassandra.db:type=ColumnFamilies,keyspace=prjspace,columnfamily=FriendsByAccount
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:467)
    at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1520)
    at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:986)
    at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:938)
    at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:330)
    at
com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:516)
    at
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:257)








On Mon, Aug 22, 2011 at 5:42 AM, aaron morton <aa...@thelastpickle.com>wrote:

> cf already exists is not the same.
>
> Would need the call stack.
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/08/2011, at 1:03 AM, Yan Chunlu wrote:
>
> does that mean I could just wait and it will be okay eventually?
>
> I also saw a "column family already exists" (not exact, something like
> that) exception, also caused after I deleted the migration and schema
> sstables.  But I cannot reproduce it; is that a similar problem?
>
> On Sun, Aug 21, 2011 at 7:57 PM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> I've seen "Couldn't find cfId=1000" in a mutation stage happen when a node
>> joins a cluster with existing data after having its schema cleared.
>>
>> The migrations received from another node are applied one CF at a time;
>> when each CF is added the node will open the existing data files, which can
>> take a while. In the meantime it's joined on gossip and is receiving
>> mutations from other nodes that have all the CFs. Once the returning node
>> gets through applying the migrations, the errors should stop.
>>
>> Read is a similar story.
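>>
>> (A rough way to watch for that from cassandra-cli on any reachable node:
>>
>>   [default@unknown] connect node1/9160;
>>   [default@unknown] describe cluster;
>>
>> once it reports a single schema version for all nodes, the migrations have
>> been applied.)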
>>
>> Cheers
>>
>>
>>
>>  -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:
>>
>> actually I didn't drop any CF;  maybe my understanding was totally
>> wrong. I'll just describe what I thought, as below:
>>
>> I thought "deleted CFs" meant sstables that are useless (since "node
>> repair" could copy data to another node, the original sstable might be
>> deleted but not yet).  When I deleted all the migration and schema sstables, it
>> somehow "forgot" that those files should be deleted, so it read the file and "can
>> not find cfId"...
>>
>>
>> I got into this situation by the following steps: at first I did "node
>> repair" on node2, which failed in the middle (node3 was down) and left the Load
>> at 170GB while the average is 30GB.
>>
>> After I brought up node3, node2 started up very slowly; 4 days passed and it was
>> still starting.  It seemed to be loading the row cache and key cache, so I disabled
>> those caches by setting their values to 0 via cassandra-cli. During this procedure,
>> node2 was of course not reachable, so it could not pick up the schema change.
>>
>> After that node2 could start very quickly, but "describe cluster"
>> showed it was "UNREACHABLE", so I did as the FAQ says: deleted the schema and
>> migration sstables and restarted node2.
>>
>> Then the "Couldn't find cfId=1000" error started showing up.
>>
>>
>>
>>
>>
>> I have just moved those migration && schema sstables back and started
>> cassandra; it still showed "UNREACHABLE", but after waiting a couple of hours,
>> "describe cluster" shows they are on the same version now.
>>
>>
>> Even though this problem is solved, I am not sure HOW....... really curious why
>> just removing the "migration*" and "schema*" sstables could cause the "Couldn't find
>> cfId=1000" error.
>>
>> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jb...@gmail.com>wrote:
>>
>>> I'm not sure what problem you're trying to solve.  The exception you
>>> pasted should stop once your clients are no longer trying to use the
>>> dropped CF.
>>>
>>> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <sp...@gmail.com>
>>> wrote:
>>> > that could be the reason; I did nodetool repair (unfinished, data size
>>> > increased 6 times, 30G vs 170G) and there should be some unclean
>>> > sstables on that node.
>>> > however, upgrading is tough work for me right now.  could nodetool scrub
>>> > help?  or should I decommission the node and join it again?
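>>> > (i.e. something along the lines of "nodetool -h node2 scrub", or
>>> > "nodetool -h node2 decommission" followed by re-joining it?)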
>>> >
>>> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com>
>>> wrote:
>>> >>
>>> >> This means you should upgrade, because we've fixed bugs about ignoring
>>> >> deleted CFs since 0.7.4.
>>> >>
>>> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <sp...@gmail.com>
>>> wrote:
>>> >> > the log file shows the following; not sure what 'Couldn't find
>>> >> > cfId=1000'
>>> >> > means (google just returned useless results):
>>> >> >
>>> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line
>>> 453)
>>> >> > Found
>>> >> > table data in data directories. Consider using JMX to call
>>> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>>> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
>>> >> > Creating new commitlog segment
>>> >> > /cassandra/commitlog/CommitLog-1313670197705.log
>>> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155)
>>> Replaying
>>> >> > /cassandra/commitlog/CommitLog-1313670030512.log
>>> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314)
>>> Finished
>>> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
>>> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
>>> >> > replay
>>> >> > complete
>>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
>>> >> > Cassandra version: 0.7.4
>>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
>>> >> > Thrift
>>> >> > API version: 19.4.0
>>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
>>> >> > Loading
>>> >> > persisted ring state
>>> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
>>> >> > Starting
>>> >> > up server gossip
>>> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line
>>> 1048)
>>> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
>>> >> > operations)
>>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line
>>> 157)
>>> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line
>>> 164)
>>> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db
>>> (80
>>> >> > bytes)
>>> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
>>> >> > CompactionManager.java
>>> >> > (line 396) Compacting
>>> >> >
>>> >> >
>>> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>>> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
>>> >> > Using
>>> >> > saved token 113427455640312821154458202477256070484
>>> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line
>>> 1048)
>>> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
>>> >> > operations)
>>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line
>>> 157)
>>> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>>> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
>>> >> > RowMutationVerbHandler.java
>>> >> > (line 86) Error in row mutation
>>> >> > org.apache.cassandra.db.UnserializableColumnFamilyException:
>>> Couldn't
>>> >> > find
>>> >> > cfId=1000
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>> >> >     at
>>> >> >
>>> >> >
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> >> >     at
>>> >> >
>>> >> >
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> >> >     at java.lang.Thread.run(Thread.java:636)
>>> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line
>>> 623)
>>> >> > Node
>>> >> > /node1 has restarted, now UP again
>>> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
>>> >> > DebuggableThreadPoolExecutor.java (line 103) Error in
>>> ThreadPoolExecutor
>>> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
>>> >> > keyspace prjkeyspace
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>>> >> >     at
>>> >> >
>>> org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>>> >> >     at
>>> >> >
>>> >> >
>>> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>>> >> >     at
>>> >> >
>>> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>>> >> >
>>> >> >
>>> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <
>>> aaron@thelastpickle.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Look in the logs to find out why the migration did not get to
>>> >> >> node2.
>>> >> >> Otherwise, yes, you can drop those files.
>>> >> >> Cheers
>>> >> >> -----------------
>>> >> >> Aaron Morton
>>> >> >> Freelance Cassandra Developer
>>> >> >> @aaronmorton
>>> >> >> http://www.thelastpickle.com
>>> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>>> >> >>
>>> >> >> just found out that after the changes via cassandra-cli, the schema change
>>> >> >> didn't reach node2, and node2 became unreachable....
>>> >> >> I did as this document says:
>>> >> >> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>>> >> >> but after that I still have two schema versions:
>>> >> >>
>>> >> >>
>>> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>>> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>>> >> >>
>>> >> >> is it enough to delete the Schema* && Migrations* sstables and restart
>>> >> >> the node?
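>>> >> >>
>>> >> >> i.e. roughly this on node2, I assume (paths are just from my setup):
>>> >> >>
>>> >> >>   <stop cassandra on node2>
>>> >> >>   mv /cassandra/data/system/Schema* /cassandra/data/system/Migrations* /tmp/   # or delete, per the FAQ
>>> >> >>   <start cassandra on node2>
>>> >> >>   # then check "describe cluster;" from cassandra-cli until the versions agree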
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com
>>> >
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> thanks a lot for all the help!  I have gone through the steps and
>>> >> >>> successfully brought up node2 :)
>>> >> >>>
>>> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > Because the file only preserves the "key" of each record, not the whole
>>> >> >>> > record.
>>> >> >>> > Records for those saved keys will be loaded into cassandra during the
>>> >> >>> > startup of cassandra.
>>> >> >>> >
>>> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <
>>> springrider@gmail.com>
>>> >> >>> > wrote:
>>> >> >>> >>
>>> >> >>> >> but the data sizes in the saved_caches are relatively small:
>>> >> >>> >>
>>> >> >>> >> will that cause the load problem?
>>> >> >>> >>
>>> >> >>> >>  ls  -lh  /cassandra/saved_caches/
>>> >> >>> >> total 32M
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>>> >> >>> >> cass-CommentSortsCache-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>>> >> >>> >> cass-CommentSortsCache-RowCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50
>>> >> >>> >> cass-CommentVote-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
>>> >> >>> >> cass-device_images-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53
>>> cass-images-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53
>>> >> >>> >> cass-LinksByUrl-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50
>>> cass-LinkVote-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50
>>> cass-cache-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51
>>> cass-cache-RowCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
>>> >> >>> >> cass-SavesByAccount-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49
>>> >> >>> >> cass-VotesByDay-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49
>>> >> >>> >> cass-VotesByLink-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>>> >> >>> >> system-HintsColumnFamily-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
>>> >> >>> >> system-LocationInfo-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
>>> >> >>> >> system-Migrations-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30
>>> system-Schema-KeyCache
>>> >> >>> >>
>>> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
>>> >> >>> >> <aa...@thelastpickle.com>
>>> >> >>> >> wrote:
>>> >> >>> >> > If you have a node that cannot start up due to issues loading the
>>> >> >>> >> > saved cache, delete the files in the saved_cache directory before
>>> >> >>> >> > starting it.
>>> >> >>> >> >
>>> >> >>> >> > The settings to save the row and key cache are per CF. You
>>> can
>>> >> >>> >> > change
>>> >> >>> >> > them with an update column family statement via the CLI when
>>> >> >>> >> > attached to any
>>> >> >>> >> > node. You may then want to check the saved_caches directory
>>> and
>>> >> >>> >> > delete any
>>> >> >>> >> > files that are left (not sure if they are automatically
>>> deleted).
>>> >> >>> >> >
>>> >> >>> >> > I would recommend (rough commands sketched below):
>>> >> >>> >> > - stop node 2
>>> >> >>> >> > - delete its saved_cache
>>> >> >>> >> > - make the schema change via another node
>>> >> >>> >> > - start up node 2
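>>> >> >>> >> >
>>> >> >>> >> > Roughly, as a sketch (paths are from this thread; the stop/start steps are
>>> >> >>> >> > placeholders for however you run cassandra):
>>> >> >>> >> >
>>> >> >>> >> >   # on node 2
>>> >> >>> >> >   <stop the cassandra process>
>>> >> >>> >> >   rm /cassandra/saved_caches/*
>>> >> >>> >> >
>>> >> >>> >> >   # from cassandra-cli connected to node1 or node3 ("connect node1/9160;"),
>>> >> >>> >> >   # run the "update column family ... with rows_cached=0 and keys_cached=0;" change
>>> >> >>> >> >
>>> >> >>> >> >   # then start cassandra on node 2 again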
>>> >> >>> >> >
>>> >> >>> >> > Cheers
>>> >> >>> >> >
>>> >> >>> >> > -----------------
>>> >> >>> >> > Aaron Morton
>>> >> >>> >> > Freelance Cassandra Developer
>>> >> >>> >> > @aaronmorton
>>> >> >>> >> > http://www.thelastpickle.com
>>> >> >>> >> >
>>> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>>> >> >>> >> >
>>> >> >>> >> >> does this need to be cluster wide? or could I just modify the caches
>>> >> >>> >> >> on one node?   I could not connect to that node with
>>> >> >>> >> >> cassandra-cli, it says "connection refused"
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >> [default@unknown] connect node2/9160;
>>> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection
>>> refused.
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >> so if I change the cache size via other nodes, how would node2 be
>>> >> >>> >> >> notified of the change?    would killing cassandra and starting it again
>>> >> >>> >> >> make it update the schema?
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
>>> >> >>> >> >> <th...@wetafx.co.nz>
>>> >> >>> >> >> wrote:
>>> >> >>> >> >>> Hi,
>>> >> >>> >> >>>
>>> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these
>>> by
>>> >> >>> >> >>> doing
>>> >> >>> >> >>> the
>>> >> >>> >> >>> following:
>>> >> >>> >> >>>
>>> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via
>>> cassandra-cli
>>> >> >>> >> >>> * Kill Cassandra
>>> >> >>> >> >>> * Remove all files in the saved_caches directory
>>> >> >>> >> >>> * Start Cassandra
>>> >> >>> >> >>> * Slowly bring back row & key caches (if desired, we left
>>> them
>>> >> >>> >> >>> off)
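>>> >> >>> >> >>>
>>> >> >>> >> >>> For the first step it is something along these lines in cassandra-cli,
>>> >> >>> >> >>> per CF (the CF name here is just an example):
>>> >> >>> >> >>>
>>> >> >>> >> >>>   update column family COMMENT with rows_cached=0 and keys_cached=0;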
>>> >> >>> >> >>>
>>> >> >>> >> >>> Cheers,
>>> >> >>> >> >>>
>>> >> >>> >> >>>        T.
>>> >> >>> >> >>>
>>> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>  I saw a lot of SliceQueryFilter entries after changing the log level to
>>> >> >>> >> >>>> DEBUG.  just thought even bringing up a new node would be faster than
>>> >> >>> >> >>>> starting the old one..... it is weird
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:225@1313068845474382
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:453@1310999270198313
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:26@1313199902088827
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:157@1313097239332314
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:41729@1313190821826229
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:6@1313174157301203
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:98@1312011362250907
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:42@1313201711997005
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:96@1312939986190155
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 123)
>>> >> >>> >> >>>> collecting 0 of 2147483647:
>>> >> >>> >> >>>> 76616c7565:false:621@1313192538616112
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu
>>> >> >>> >> >>>> <springrider@gmail.com
>>> >> >>> >> >>>> <ma...@gmail.com>> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>    but it seems the row cache is cluster wide, how will
>>>  the
>>> >> >>> >> >>>> change
>>> >> >>> >> >>>> of row
>>> >> >>> >> >>>>    cache affect the read speed?
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis
>>> >> >>> >> >>>> <jbellis@gmail.com
>>> >> >>> >> >>>>    <ma...@gmail.com>> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>        Or leave row cache enabled but disable cache saving
>>> >> >>> >> >>>> (and
>>> >> >>> >> >>>> remove the
>>> >> >>> >> >>>>        one already on disk).
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>>> >> >>> >> >>>> <aaron@thelastpickle.com
>>> >> >>> >> >>>>        <ma...@thelastpickle.com>> wrote:
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>>> >> >>> >> >>>> ColumnFamilyStore.java
>>> >> >>> >> >>>> (line 547)
>>> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
>>> >> >>> >> >>>> cache
>>> >> >>> >> >>>> for
>>> >> >>> >> >>>> COMMENT
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the row cache. That's a
>>> >> >>> >> >>>>         > pretty big row cache, I would suggest reducing or disabling it.
>>> >> >>> >> >>>>         > Background
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > and server can not afford the load then crashed.
>>> >> >>> >> >>>> after
>>> >> >>> >> >>>> come
>>> >> >>> >> >>>> back,
>>> >> >>> >> >>>>        node 3 can
>>> >> >>> >> >>>>         > not return for more than 96 hours
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > Crashed how ?
>>> >> >>> >> >>>>         > You may be seeing
>>> >> >>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build finishes
>>> >> >>> >> >>>>         > and nodetool netstats to see which CF's are streaming.
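>>> >> >>> >> >>>>         > e.g. (host name is just an example):
>>> >> >>> >> >>>>         >   nodetool -h node3 compactionstats
>>> >> >>> >> >>>>         >   nodetool -h node3 netstats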
>>> >> >>> >> >>>>         > Cheers
>>> >> >>> >> >>>>         > -----------------
>>> >> >>> >> >>>>         > Aaron Morton
>>> >> >>> >> >>>>         > Freelance Cassandra Developer
>>> >> >>> >> >>>>         > @aaronmorton
>>> >> >>> >> >>>>         > http://www.thelastpickle.com
>>> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3,
>>> it
>>> >> >>> >> >>>> seems
>>> >> >>> >> >>>> alot
>>> >> >>> >> >>>> data
>>> >> >>> >> >>>>         > generated.  and server can not afford the load
>>> then
>>> >> >>> >> >>>> crashed.
>>> >> >>> >> >>>>         > after come back, node 3 can not return for more
>>> than
>>> >> >>> >> >>>> 96
>>> >> >>> >> >>>> hours
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > for 34GB data, the node 2 could restart and back
>>> >> >>> >> >>>> online
>>> >> >>> >> >>>> within 1
>>> >> >>> >> >>>> hour.
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > I am not sure what's wrong with node3 and should
>>> I
>>> >> >>> >> >>>> restart
>>> >> >>> >> >>>> node
>>> >> >>> >> >>>> 3 again?
>>> >> >>> >> >>>>         > thanks!
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > Address         Status State   Load
>>>  Owns
>>> >> >>> >> >>>>  Token
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > 113427455640312821154458202477256070484
>>> >> >>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%
>>>  0
>>> >> >>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>>> >> >>> >> >>>>         > 56713727820156410577229101238628035242
>>> >> >>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>>> >> >>> >> >>>>         > 113427455640312821154458202477256070484
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > the log shows it is still going on, not sure why
>>> it
>>> >> >>> >> >>>> is
>>> >> >>> >> >>>> so
>>> >> >>> >> >>>> slow:
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734
>>> >> >>> >> >>>> SSTableReader.java
>>> >> >>> >> >>>> (line
>>> >> >>> >> >>>> 154)
>>> >> >>> >> >>>>        Opening
>>> >> >>> >> >>>>         > /cassandra/data/COMMENT
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
>>> >> >>> >> >>>> ColumnFamilyStore.java
>>> >> >>> >> >>>> (line 275)
>>> >> >>> >> >>>>         > reading saved cache
>>> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>>> >> >>> >> >>>> ColumnFamilyStore.java
>>> >> >>> >> >>>> (line 547)
>>> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
>>> >> >>> >> >>>> cache
>>> >> >>> >> >>>> for
>>> >> >>> >> >>>> COMMENT
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
>>> >> >>> >> >>>> ColumnFamilyStore.java
>>> >> >>> >> >>>> (line 275)
>>> >> >>> >> >>>>         > reading saved cache
>>> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14
>>> 10:24:55,480
>>> >> >>> >> >>>>        CacheWriter.java (line
>>> >> >>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in
>>> 2535 ms
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>        --
>>> >> >>> >> >>>>        Jonathan Ellis
>>> >> >>> >> >>>>        Project Chair, Apache Cassandra
>>> >> >>> >> >>>>        co-founder of DataStax, the source for professional
>>> >> >>> >> >>>> Cassandra
>>> >> >>> >> >>>> support
>>> >> >>> >> >>>>        http://www.datastax.com
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>
>>> >> >>> >> >>>
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >
>>> >> >>> >
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jonathan Ellis
>>> >> Project Chair, Apache Cassandra
>>> >> co-founder of DataStax, the source for professional Cassandra support
>>> >> http://www.datastax.com
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>>
>>
>
>

Re: node restart taking too long

Posted by aaron morton <aa...@thelastpickle.com>.
cf already exists is not the same. 

Would need the call stack. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 1:03 AM, Yan Chunlu wrote:

> does that mean I could just wait and it will be okay eventually?
> 
> I also saw a "column family already exists" (not exact, something like that) exception, also caused after I deleted the migration and schema sstables.  But I cannot reproduce it; is that a similar problem?
> 
> On Sun, Aug 21, 2011 at 7:57 PM, aaron morton <aa...@thelastpickle.com> wrote:
> I've seen "Couldn't find cfId=1000" in a mutation stage happen when a node joins a cluster with existing data after having its schema cleared. 
> 
> The migrations received from another node are applied one CF at a time; when each CF is added the node will open the existing data files, which can take a while. In the meantime it's joined on gossip and is receiving mutations from other nodes that have all the CFs. Once the returning node gets through applying the migrations, the errors should stop. 
> 
> Read is a similar story.
> 
> Cheers
>  
> 
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:
> 
>> actually I didn't dropped any CF,  maybe my understanding was totally wrong, I just describe what I thought as belows: 
>> 
>> I thought by "deleted CFs" means the sstable that useless(since "node repair" and could copy data to another node,  the original sstable might be deleted but not yet).  when I deleted all migration and schema sstables, it somehow "forgot" those files should be deleted, so it read the file and "can not find cfId"...
>> 
>> 
>> I got to this situation by the following steps: at first I did "node repair" on node2 which failed in the middle(node3 down), and leave the Load as 170GB while average is 30GB.
>> 
>> after I brought up node3,  the node2 start up very slow, 4 days past it stil starting.  it seems loading row cache and key cache.  so I disabled those cache by set the value to 0 via cassandra-cli. during this procedure, of course node2 was not reachable so it can not update the schema.
>> 
>> after that node2 could be start very quickly, but the "describe cluster" shows it was "UNREACHABLE", so I did as the FAQ says, delete schema, migration sstables and restart node2. 
>> 
>> then the "Couldn't find cfId=1000'" error start showing up.
>> 
>> 
>> 
>> 
>> 
>> I have just moved those migration && schema sstables back and start cassandra, it still shows "UNREACHABLE", after wait for couple of hours, the "describe cluster" shows they are the same version now.
>> 
>> 
>> even this problem solved, I am not sure HOW....... really curious that why just remove "migration* and schema*" sstables could cause  "Couldn't find cfId=1000'"  error.
>> 
>> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> I'm not sure what problem you're trying to solve.  The exception you
>> pasted should stop once your clients are no longer trying to use the
>> dropped CF.
>> 
>> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <sp...@gmail.com> wrote:
>> > that could be the reason, I did nodetool repair(unfinished, data size
>> > increased 6 times bigger 30G vs 170G) and there should be some unclean
>> > sstables on that node.
>> > however upgrade it a tough work for me right now.  could the nodetool scrub
>> > help?  or decommission the node and join it again?
>> >
>> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>> >>
>> >> This means you should upgrade, because we've fixed bugs about ignoring
>> >> deleted CFs since 0.7.4.
>> >>
>> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <sp...@gmail.com> wrote:
>> >> > the log file shows as follows, not sure what does 'Couldn't find
>> >> > cfId=1000'
>> >> > means(google just returned useless results):
>> >> >
>> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
>> >> > Found
>> >> > table data in data directories. Consider using JMX to call
>> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
>> >> > Creating new commitlog segment
>> >> > /cassandra/commitlog/CommitLog-1313670197705.log
>> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
>> >> > /cassandra/commitlog/CommitLog-1313670030512.log
>> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
>> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
>> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
>> >> > replay
>> >> > complete
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
>> >> > Cassandra version: 0.7.4
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
>> >> > Thrift
>> >> > API version: 19.4.0
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
>> >> > Loading
>> >> > persisted ring state
>> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
>> >> > Starting
>> >> > up server gossip
>> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
>> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
>> >> > operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
>> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
>> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
>> >> > bytes)
>> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
>> >> > CompactionManager.java
>> >> > (line 396) Compacting
>> >> >
>> >> > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
>> >> > Using
>> >> > saved token 113427455640312821154458202477256070484
>> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
>> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
>> >> > operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
>> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
>> >> > RowMutationVerbHandler.java
>> >> > (line 86) Error in row mutation
>> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
>> >> > find
>> >> > cfId=1000
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>> >> >     at
>> >> >
>> >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >> >     at
>> >> >
>> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >> >     at java.lang.Thread.run(Thread.java:636)
>> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623)
>> >> > Node
>> >> > /node1 has restarted, now UP again
>> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
>> >> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
>> >> > keyspace prjkeyspace
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>> >> >     at
>> >> > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>> >> >     at
>> >> >
>> >> > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>> >> >     at
>> >> > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>> >> >
>> >> >
>> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com>
>> >> > wrote:
>> >> >>
>> >> >> Look in the logs to work find out why the migration did not get to
>> >> >> node2.
>> >> >> Otherwise yes you can drop those files.
>> >> >> Cheers
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Cassandra Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>> >> >>
>> >> >> just found out that changes via cassandra-cli, the schema change didn't
>> >> >> reach node2. and node2 became unreachable....
>> >> >> I did as this
>> >> >> document:http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>> >> >> but after that I just got two schema versons:
>> >> >>
>> >> >>
>> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>> >> >>
>> >> >> is that enough delete Schema* && Migrations* sstables and restart the
>> >> >> node?
>> >> >>
>> >> >>
>> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> thanks a lot for  all the help!  I have gone through the steps and
>> >> >>> successfully brought up the node2 :)
>> >> >>>
>> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com>
>> >> >>> wrote:
>> >> >>> > Because the file only preserve the "key" of records, not the whole
>> >> >>> > record.
>> >> >>> > Records for those saved key will be loaded into cassandra during the
>> >> >>> > startup
>> >> >>> > of cassandra.
>> >> >>> >
>> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> but the data size in the saved_cache are relatively small:
>> >> >>> >>
>> >> >>> >> will that cause the load problem?
>> >> >>> >>
>> >> >>> >>  ls  -lh  /cassandra/saved_caches/
>> >> >>> >> total 32M
>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>> >> >>> >> cass-CommentSortsCache-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>> >> >>> >> cass-CommentSortsCache-RowCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50
>> >> >>> >> cass-CommentVote-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
>> >> >>> >> cass-device_images-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53
>> >> >>> >> cass-LinksByUrl-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
>> >> >>> >> cass-SavesByAccount-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49
>> >> >>> >> cass-VotesByDay-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49
>> >> >>> >> cass-VotesByLink-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>> >> >>> >> system-HintsColumnFamily-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
>> >> >>> >> system-LocationInfo-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
>> >> >>> >> system-Migrations-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>> >> >>> >>
>> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
>> >> >>> >> <aa...@thelastpickle.com>
>> >> >>> >> wrote:
>> >> >>> >> > If you have a node that cannot start up due to issues loading the
>> >> >>> >> > saved
>> >> >>> >> > cache delete the files in the saved_cache directory before
>> >> >>> >> > starting
>> >> >>> >> > it.
>> >> >>> >> >
>> >> >>> >> > The settings to save the row and key cache are per CF. You can
>> >> >>> >> > change
>> >> >>> >> > them with an update column family statement via the CLI when
>> >> >>> >> > attached to any
>> >> >>> >> > node. You may then want to check the saved_caches directory and
>> >> >>> >> > delete any
>> >> >>> >> > files that are left (not sure if they are automatically deleted).
>> >> >>> >> >
>> >> >>> >> > i would recommend:
>> >> >>> >> > - stop node 2
>> >> >>> >> > - delete it's saved_cache
>> >> >>> >> > - make the schema change via another node
>> >> >>> >> > - startup node 2
>> >> >>> >> >
>> >> >>> >> > Cheers
>> >> >>> >> >
>> >> >>> >> > -----------------
>> >> >>> >> > Aaron Morton
>> >> >>> >> > Freelance Cassandra Developer
>> >> >>> >> > @aaronmorton
>> >> >>> >> > http://www.thelastpickle.com
>> >> >>> >> >
>> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >> >>> >> >
>> >> >>> >> >> does this need to be cluster wide? or I could just modify the
>> >> >>> >> >> caches
>> >> >>> >> >> on one node?   since I could not connect to the node with
>> >> >>> >> >> cassandra-cli, it says "connection refused"
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> [default@unknown] connect node2/9160;
>> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> so if I change the cache size via other nodes, how could node2
>> >> >>> >> >> be
>> >> >>> >> >> notified the changing?    kill cassandra and start it again
>> >> >>> >> >> could
>> >> >>> >> >> make
>> >> >>> >> >> it update the schema?
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
>> >> >>> >> >> <th...@wetafx.co.nz>
>> >> >>> >> >> wrote:
>> >> >>> >> >>> Hi,
>> >> >>> >> >>>
>> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these by
>> >> >>> >> >>> doing
>> >> >>> >> >>> the
>> >> >>> >> >>> following:
>> >> >>> >> >>>
>> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >> >>> >> >>> * Kill Cassandra
>> >> >>> >> >>> * Remove all files in the saved_caches directory
>> >> >>> >> >>> * Start Cassandra
>> >> >>> >> >>> * Slowly bring back row & key caches (if desired, we left them
>> >> >>> >> >>> off)
>> >> >>> >> >>>
>> >> >>> >> >>> Cheers,
>> >> >>> >> >>>
>> >> >>> >> >>>        T.
>> >> >>> >> >>>
>> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>  I saw alot slicequeryfilter things if changed the log level
>> >> >>> >> >>>> to
>> >> >>> >> >>>> DEBUG.
>> >> >>> >> >>>>  just
>> >> >>> >> >>>> thought even bring up a new node will be faster than start the
>> >> >>> >> >>>> old
>> >> >>> >> >>>> one..... it
>> >> >>> >> >>>> is wired
>> >> >>> >> >>>>
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:225@1313068845474382
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:453@1310999270198313
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:26@1313199902088827
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:157@1313097239332314
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:41729@1313190821826229
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:6@1313174157301203
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:98@1312011362250907
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:42@1313201711997005
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:96@1312939986190155
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:621@1313192538616112
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu
>> >> >>> >> >>>> <springrider@gmail.com
>> >> >>> >> >>>> <ma...@gmail.com>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>    but it seems the row cache is cluster wide, how will  the
>> >> >>> >> >>>> change
>> >> >>> >> >>>> of row
>> >> >>> >> >>>>    cache affect the read speed?
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis
>> >> >>> >> >>>> <jbellis@gmail.com
>> >> >>> >> >>>>    <ma...@gmail.com>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>        Or leave row cache enabled but disable cache saving
>> >> >>> >> >>>> (and
>> >> >>> >> >>>> remove the
>> >> >>> >> >>>>        one already on disk).
>> >> >>> >> >>>>
>> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> >> >>> >> >>>> <aaron@thelastpickle.com
>> >> >>> >> >>>>        <ma...@thelastpickle.com>> wrote:
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 547)
>> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
>> >> >>> >> >>>> cache
>> >> >>> >> >>>> for
>> >> >>> >> >>>> COMMENT
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the
>> >> >>> >> >>>>  row
>> >> >>> >> >>>> cache.
>> >> >>> >> >>>> Thats a
>> >> >>> >> >>>>         > pretty big row cache, I would suggest reducing or
>> >> >>> >> >>>> disabling
>> >> >>> >> >>>> it.
>> >> >>> >> >>>>         > Background
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > and server can not afford the load then crashed.
>> >> >>> >> >>>> after
>> >> >>> >> >>>> come
>> >> >>> >> >>>> back,
>> >> >>> >> >>>>        node 3 can
>> >> >>> >> >>>>         > not return for more than 96 hours
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > Crashed how ?
>> >> >>> >> >>>>         > You may be seeing
>> >> >>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the
>> >> >>> >> >>>> Merkle
>> >> >>> >> >>>> tree
>> >> >>> >> >>>> build
>> >> >>> >> >>>>        finishes
>> >> >>> >> >>>>         > and nodetool netstats to see which CF's are
>> >> >>> >> >>>> streaming.
>> >> >>> >> >>>>         > Cheers
>> >> >>> >> >>>>         > -----------------
>> >> >>> >> >>>>         > Aaron Morton
>> >> >>> >> >>>>         > Freelance Cassandra Developer
>> >> >>> >> >>>>         > @aaronmorton
>> >> >>> >> >>>>         > http://www.thelastpickle.com
>> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it
>> >> >>> >> >>>> seems
>> >> >>> >> >>>> alot
>> >> >>> >> >>>> data
>> >> >>> >> >>>>         > generated.  and server can not afford the load then
>> >> >>> >> >>>> crashed.
>> >> >>> >> >>>>         > after come back, node 3 can not return for more than
>> >> >>> >> >>>> 96
>> >> >>> >> >>>> hours
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > for 34GB data, the node 2 could restart and back
>> >> >>> >> >>>> online
>> >> >>> >> >>>> within 1
>> >> >>> >> >>>> hour.
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > I am not sure what's wrong with node3 and should I
>> >> >>> >> >>>> restart
>> >> >>> >> >>>> node
>> >> >>> >> >>>> 3 again?
>> >> >>> >> >>>>         > thanks!
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > Address         Status State   Load            Owns
>> >> >>> >> >>>>  Token
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > 113427455640312821154458202477256070484
>> >> >>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>> >> >>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>> >> >>> >> >>>>         > 56713727820156410577229101238628035242
>> >> >>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>> >> >>> >> >>>>         > 113427455640312821154458202477256070484
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > the log shows it is still going on, not sure why it
>> >> >>> >> >>>> is
>> >> >>> >> >>>> so
>> >> >>> >> >>>> slow:
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734
>> >> >>> >> >>>> SSTableReader.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 154)
>> >> >>> >> >>>>        Opening
>> >> >>> >> >>>>         > /cassandra/data/COMMENT
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 275)
>> >> >>> >> >>>>         > reading saved cache
>> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 547)
>> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
>> >> >>> >> >>>> cache
>> >> >>> >> >>>> for
>> >> >>> >> >>>> COMMENT
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 275)
>> >> >>> >> >>>>         > reading saved cache
>> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>> >> >>> >> >>>>        CacheWriter.java (line
>> >> >>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>        --
>> >> >>> >> >>>>        Jonathan Ellis
>> >> >>> >> >>>>        Project Chair, Apache Cassandra
>> >> >>> >> >>>>        co-founder of DataStax, the source for professional
>> >> >>> >> >>>> Cassandra
>> >> >>> >> >>>> support
>> >> >>> >> >>>>        http://www.datastax.com
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>
>> >> >>> >> >>>
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >
>> >> >>> >
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of DataStax, the source for professional Cassandra support
>> >> http://www.datastax.com
>> >
>> >
>> 
>> 
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>> 
> 
> 


Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
does that mean I could just wait and it will be okay eventually?

I also saw a "column family already exists" (not exact, something like
that) exception, also caused after I deleted the migration and schema
sstables.  But I cannot reproduce it; is that a similar problem?

On Sun, Aug 21, 2011 at 7:57 PM, aaron morton <aa...@thelastpickle.com>wrote:

> I've seen "Couldn't find cfId=1000" in a mutation stage happen when a node
> joins a cluster with existing data after having its schema cleared.
>
> The migrations received from another node are applied one CF at a time;
> when each CF is added the node will open the existing data files, which can
> take a while. In the meantime it's joined on gossip and is receiving
> mutations from other nodes that have all the CFs. Once the returning node
> gets through applying the migrations, the errors should stop.
>
> Read is a similar story.
>
> Cheers
>
>
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:
>
> actually I didn't drop any CF;  maybe my understanding was totally
> wrong. I'll just describe what I thought, as below:
>
> I thought "deleted CFs" meant sstables that are useless (since "node
> repair" could copy data to another node, the original sstable might be
> deleted but not yet).  When I deleted all the migration and schema sstables, it
> somehow "forgot" that those files should be deleted, so it read the file and "can
> not find cfId"...
>
>
> I got into this situation by the following steps: at first I did "node
> repair" on node2, which failed in the middle (node3 was down) and left the Load
> at 170GB while the average is 30GB.
>
> After I brought up node3, node2 started up very slowly; 4 days passed and it was
> still starting.  It seemed to be loading the row cache and key cache, so I disabled
> those caches by setting their values to 0 via cassandra-cli. During this procedure,
> node2 was of course not reachable, so it could not pick up the schema change.
>
> After that node2 could start very quickly, but "describe cluster"
> showed it was "UNREACHABLE", so I did as the FAQ says: deleted the schema and
> migration sstables and restarted node2.
>
> Then the "Couldn't find cfId=1000" error started showing up.
>
>
>
>
>
> I have just moved those migration && schema sstables back and started
> cassandra; it still showed "UNREACHABLE", but after waiting a couple of hours,
> "describe cluster" shows they are on the same version now.
>
>
> Even though this problem is solved, I am not sure HOW....... really curious why
> just removing the "migration*" and "schema*" sstables could cause the "Couldn't find
> cfId=1000" error.
>
> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jb...@gmail.com>wrote:
>
>> I'm not sure what problem you're trying to solve.  The exception you
>> pasted should stop once your clients are no longer trying to use the
>> dropped CF.
>>
>> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <sp...@gmail.com>
>> wrote:
>> > that could be the reason, I did nodetool repair(unfinished, data size
>> > increased 6 times bigger 30G vs 170G) and there should be some unclean
>> > sstables on that node.
>> > however upgrade it a tough work for me right now.  could the nodetool
>> scrub
>> > help?  or decommission the node and join it again?
>> >
>> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>> >>
>> >> This means you should upgrade, because we've fixed bugs about ignoring
>> >> deleted CFs since 0.7.4.
>> >>
>> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <sp...@gmail.com>
>> wrote:
>> >> > the log file shows as follows, not sure what does 'Couldn't find
>> >> > cfId=1000'
>> >> > means(google just returned useless results):
>> >> >
>> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line
>> 453)
>> >> > Found
>> >> > table data in data directories. Consider using JMX to call
>> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
>> >> > Creating new commitlog segment
>> >> > /cassandra/commitlog/CommitLog-1313670197705.log
>> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155)
>> Replaying
>> >> > /cassandra/commitlog/CommitLog-1313670030512.log
>> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314)
>> Finished
>> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
>> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
>> >> > replay
>> >> > complete
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
>> >> > Cassandra version: 0.7.4
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
>> >> > Thrift
>> >> > API version: 19.4.0
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
>> >> > Loading
>> >> > persisted ring state
>> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
>> >> > Starting
>> >> > up server gossip
>> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line
>> 1048)
>> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
>> >> > operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line
>> 157)
>> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line
>> 164)
>> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db
>> (80
>> >> > bytes)
>> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
>> >> > CompactionManager.java
>> >> > (line 396) Compacting
>> >> >
>> >> >
>> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
>> >> > Using
>> >> > saved token 113427455640312821154458202477256070484
>> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line
>> 1048)
>> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
>> >> > operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line
>> 157)
>> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
>> >> > RowMutationVerbHandler.java
>> >> > (line 86) Error in row mutation
>> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
>> >> > find
>> >> > cfId=1000
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>> >> >     at
>> >> >
>> >> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >> >     at
>> >> >
>> >> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >> >     at java.lang.Thread.run(Thread.java:636)
>> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line
>> 623)
>> >> > Node
>> >> > /node1 has restarted, now UP again
>> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
>> >> > DebuggableThreadPoolExecutor.java (line 103) Error in
>> ThreadPoolExecutor
>> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
>> >> > keyspace prjkeyspace
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>> >> >     at
>> >> >
>> org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>> >> >     at
>> >> >
>> >> >
>> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>> >> >     at
>> >> >
>> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>> >> >
>> >> >
>> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <
>> aaron@thelastpickle.com>
>> >> > wrote:
>> >> >>
>> >> >> Look in the logs to work find out why the migration did not get to
>> >> >> node2.
>> >> >> Otherwise yes you can drop those files.
>> >> >> Cheers
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Cassandra Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>> >> >>
>> >> >> just found out that changes via cassandra-cli, the schema change
>> didn't
>> >> >> reach node2. and node2 became unreachable....
>> >> >> I did as this
>> >> >> document:http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>> >> >> but after that I just got two schema versons:
>> >> >>
>> >> >>
>> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>> >> >>
>> >> >> is that enough delete Schema* && Migrations* sstables and restart
>> the
>> >> >> node?
>> >> >>
>> >> >>
>> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> thanks a lot for  all the help!  I have gone through the steps and
>> >> >>> successfully brought up the node2 :)
>> >> >>>
>> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com>
>> >> >>> wrote:
>> >> >>> > Because the file only preserve the "key" of records, not the
>> whole
>> >> >>> > record.
>> >> >>> > Records for those saved key will be loaded into cassandra during
>> the
>> >> >>> > startup
>> >> >>> > of cassandra.
>> >> >>> >
>> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <
>> springrider@gmail.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> but the data size in the saved_cache are relatively small:
>> >> >>> >>
>> >> >>> >> will that cause the load problem?
>> >> >>> >>
>> >> >>> >>  ls  -lh  /cassandra/saved_caches/
>> >> >>> >> total 32M
>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>> >> >>> >> cass-CommentSortsCache-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>> >> >>> >> cass-CommentSortsCache-RowCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50
>> >> >>> >> cass-CommentVote-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
>> >> >>> >> cass-device_images-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53
>> cass-images-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53
>> >> >>> >> cass-LinksByUrl-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50
>> cass-LinkVote-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
>> >> >>> >> cass-SavesByAccount-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49
>> >> >>> >> cass-VotesByDay-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49
>> >> >>> >> cass-VotesByLink-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>> >> >>> >> system-HintsColumnFamily-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
>> >> >>> >> system-LocationInfo-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
>> >> >>> >> system-Migrations-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30
>> system-Schema-KeyCache
>> >> >>> >>
>> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
>> >> >>> >> <aa...@thelastpickle.com>
>> >> >>> >> wrote:
>> >> >>> >> > If you have a node that cannot start up due to issues loading
>> the
>> >> >>> >> > saved
>> >> >>> >> > cache delete the files in the saved_cache directory before
>> >> >>> >> > starting
>> >> >>> >> > it.
>> >> >>> >> >
>> >> >>> >> > The settings to save the row and key cache are per CF. You can
>> >> >>> >> > change
>> >> >>> >> > them with an update column family statement via the CLI when
>> >> >>> >> > attached to any
>> >> >>> >> > node. You may then want to check the saved_caches directory
>> and
>> >> >>> >> > delete any
>> >> >>> >> > files that are left (not sure if they are automatically
>> deleted).
>> >> >>> >> >
>> >> >>> >> > i would recommend:
>> >> >>> >> > - stop node 2
>> >> >>> >> > - delete it's saved_cache
>> >> >>> >> > - make the schema change via another node
>> >> >>> >> > - startup node 2
>> >> >>> >> >
>> >> >>> >> > Cheers
>> >> >>> >> >
>> >> >>> >> > -----------------
>> >> >>> >> > Aaron Morton
>> >> >>> >> > Freelance Cassandra Developer
>> >> >>> >> > @aaronmorton
>> >> >>> >> > http://www.thelastpickle.com
>> >> >>> >> >
>> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >> >>> >> >
>> >> >>> >> >> does this need to be cluster wide? or I could just modify the
>> >> >>> >> >> caches
>> >> >>> >> >> on one node?   since I could not connect to the node with
>> >> >>> >> >> cassandra-cli, it says "connection refused"
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> [default@unknown] connect node2/9160;
>> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection
>> refused.
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> so if I change the cache size via other nodes, how could
>> node2
>> >> >>> >> >> be
>> >> >>> >> >> notified the changing?    kill cassandra and start it again
>> >> >>> >> >> could
>> >> >>> >> >> make
>> >> >>> >> >> it update the schema?
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
>> >> >>> >> >> <th...@wetafx.co.nz>
>> >> >>> >> >> wrote:
>> >> >>> >> >>> Hi,
>> >> >>> >> >>>
>> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these
>> by
>> >> >>> >> >>> doing
>> >> >>> >> >>> the
>> >> >>> >> >>> following:
>> >> >>> >> >>>
>> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via
>> cassandra-cli
>> >> >>> >> >>> * Kill Cassandra
>> >> >>> >> >>> * Remove all files in the saved_caches directory
>> >> >>> >> >>> * Start Cassandra
>> >> >>> >> >>> * Slowly bring back row & key caches (if desired, we left
>> them
>> >> >>> >> >>> off)
>> >> >>> >> >>>
>> >> >>> >> >>> Cheers,
>> >> >>> >> >>>
>> >> >>> >> >>>        T.
>> >> >>> >> >>>
>> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>  I saw alot slicequeryfilter things if changed the log
>> level
>> >> >>> >> >>>> to
>> >> >>> >> >>>> DEBUG.
>> >> >>> >> >>>>  just
>> >> >>> >> >>>> thought even bring up a new node will be faster than start
>> the
>> >> >>> >> >>>> old
>> >> >>> >> >>>> one..... it
>> >> >>> >> >>>> is wired
>> >> >>> >> >>>>
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:225@1313068845474382
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:453@1310999270198313
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:26@1313199902088827
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:157@1313097239332314
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:41729@1313190821826229
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:6@1313174157301203
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:98@1312011362250907
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:42@1313201711997005
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:96@1312939986190155
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 123)
>> >> >>> >> >>>> collecting 0 of 2147483647:
>> >> >>> >> >>>> 76616c7565:false:621@1313192538616112
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu
>> >> >>> >> >>>> <springrider@gmail.com
>> >> >>> >> >>>> <ma...@gmail.com>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>    but it seems the row cache is cluster wide, how will
>>  the
>> >> >>> >> >>>> change
>> >> >>> >> >>>> of row
>> >> >>> >> >>>>    cache affect the read speed?
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis
>> >> >>> >> >>>> <jbellis@gmail.com
>> >> >>> >> >>>>    <ma...@gmail.com>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>        Or leave row cache enabled but disable cache saving
>> >> >>> >> >>>> (and
>> >> >>> >> >>>> remove the
>> >> >>> >> >>>>        one already on disk).
>> >> >>> >> >>>>
>> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> >> >>> >> >>>> <aaron@thelastpickle.com
>> >> >>> >> >>>>        <ma...@thelastpickle.com>> wrote:
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 547)
>> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
>> >> >>> >> >>>> cache
>> >> >>> >> >>>> for
>> >> >>> >> >>>> COMMENT
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in
>> the
>> >> >>> >> >>>>  row
>> >> >>> >> >>>> cache.
>> >> >>> >> >>>> Thats a
>> >> >>> >> >>>>         > pretty big row cache, I would suggest reducing or
>> >> >>> >> >>>> disabling
>> >> >>> >> >>>> it.
>> >> >>> >> >>>>         > Background
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > and server can not afford the load then crashed.
>> >> >>> >> >>>> after
>> >> >>> >> >>>> come
>> >> >>> >> >>>> back,
>> >> >>> >> >>>>        node 3 can
>> >> >>> >> >>>>         > not return for more than 96 hours
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > Crashed how ?
>> >> >>> >> >>>>         > You may be seeing
>> >> >>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the
>> >> >>> >> >>>> Merkle
>> >> >>> >> >>>> tree
>> >> >>> >> >>>> build
>> >> >>> >> >>>>        finishes
>> >> >>> >> >>>>         > and nodetool netstats to see which CF's are
>> >> >>> >> >>>> streaming.
>> >> >>> >> >>>>         > Cheers
>> >> >>> >> >>>>         > -----------------
>> >> >>> >> >>>>         > Aaron Morton
>> >> >>> >> >>>>         > Freelance Cassandra Developer
>> >> >>> >> >>>>         > @aaronmorton
>> >> >>> >> >>>>         > http://www.thelastpickle.com
>> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3,
>> it
>> >> >>> >> >>>> seems
>> >> >>> >> >>>> alot
>> >> >>> >> >>>> data
>> >> >>> >> >>>>         > generated.  and server can not afford the load
>> then
>> >> >>> >> >>>> crashed.
>> >> >>> >> >>>>         > after come back, node 3 can not return for more
>> than
>> >> >>> >> >>>> 96
>> >> >>> >> >>>> hours
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > for 34GB data, the node 2 could restart and back
>> >> >>> >> >>>> online
>> >> >>> >> >>>> within 1
>> >> >>> >> >>>> hour.
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > I am not sure what's wrong with node3 and should
>> I
>> >> >>> >> >>>> restart
>> >> >>> >> >>>> node
>> >> >>> >> >>>> 3 again?
>> >> >>> >> >>>>         > thanks!
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > Address         Status State   Load
>>  Owns
>> >> >>> >> >>>>  Token
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > 113427455640312821154458202477256070484
>> >> >>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%
>>  0
>> >> >>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>> >> >>> >> >>>>         > 56713727820156410577229101238628035242
>> >> >>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>> >> >>> >> >>>>         > 113427455640312821154458202477256070484
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > the log shows it is still going on, not sure why
>> it
>> >> >>> >> >>>> is
>> >> >>> >> >>>> so
>> >> >>> >> >>>> slow:
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734
>> >> >>> >> >>>> SSTableReader.java
>> >> >>> >> >>>> (line
>> >> >>> >> >>>> 154)
>> >> >>> >> >>>>        Opening
>> >> >>> >> >>>>         > /cassandra/data/COMMENT
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 275)
>> >> >>> >> >>>>         > reading saved cache
>> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 547)
>> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
>> >> >>> >> >>>> cache
>> >> >>> >> >>>> for
>> >> >>> >> >>>> COMMENT
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
>> >> >>> >> >>>> ColumnFamilyStore.java
>> >> >>> >> >>>> (line 275)
>> >> >>> >> >>>>         > reading saved cache
>> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14
>> 10:24:55,480
>> >> >>> >> >>>>        CacheWriter.java (line
>> >> >>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535
>> ms
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>        --
>> >> >>> >> >>>>        Jonathan Ellis
>> >> >>> >> >>>>        Project Chair, Apache Cassandra
>> >> >>> >> >>>>        co-founder of DataStax, the source for professional
>> >> >>> >> >>>> Cassandra
>> >> >>> >> >>>> support
>> >> >>> >> >>>>        http://www.datastax.com
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>
>> >> >>> >> >>>
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >
>> >> >>> >
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of DataStax, the source for professional Cassandra support
>> >> http://www.datastax.com
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>

Re: node restart taking too long

Posted by aaron morton <aa...@thelastpickle.com>.
I've seen "Couldn't find cfId=1000" in a mutation stage happen when a node joins a cluster with existing data after having it's schema cleared. 

The migrations received from another node are applied one CF at a time, when each CF is added the node will open the existing data files which can take a while. In the mean time it's joined on gossip and is receiving mutations from other nodes that have all the CF's. One the returning node gets through applying the migration the errors should stop. 

Read is a similar story.
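
If you want to confirm that is what is happening, tail the log and the mutation errors should dry up once the migrations have been applied. A rough sketch, assuming the default log4j location of /var/log/cassandra/system.log:

    tail -f /var/log/cassandra/system.log | grep --line-buffered "Couldn't find cfId"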

Cheers
 


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:

> actually I didn't dropped any CF,  maybe my understanding was totally wrong, I just describe what I thought as belows: 
> 
> I thought by "deleted CFs" means the sstable that useless(since "node repair" and could copy data to another node,  the original sstable might be deleted but not yet).  when I deleted all migration and schema sstables, it somehow "forgot" those files should be deleted, so it read the file and "can not find cfId"...
> 
> 
> I got to this situation by the following steps: at first I did "node repair" on node2 which failed in the middle(node3 down), and leave the Load as 170GB while average is 30GB.
> 
> after I brought up node3,  the node2 start up very slow, 4 days past it stil starting.  it seems loading row cache and key cache.  so I disabled those cache by set the value to 0 via cassandra-cli. during this procedure, of course node2 was not reachable so it can not update the schema.
> 
> after that node2 could be start very quickly, but the "describe cluster" shows it was "UNREACHABLE", so I did as the FAQ says, delete schema, migration sstables and restart node2. 
> 
> then the "Couldn't find cfId=1000'" error start showing up.
> 
> 
> 
> 
> 
> I have just moved those migration && schema sstables back and start cassandra, it still shows "UNREACHABLE", after wait for couple of hours, the "describe cluster" shows they are the same version now.
> 
> 
> even this problem solved, I am not sure HOW....... really curious that why just remove "migration* and schema*" sstables could cause  "Couldn't find cfId=1000'"  error.
> 
> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> I'm not sure what problem you're trying to solve.  The exception you
> pasted should stop once your clients are no longer trying to use the
> dropped CF.
> 
> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <sp...@gmail.com> wrote:
> > that could be the reason, I did nodetool repair(unfinished, data size
> > increased 6 times bigger 30G vs 170G) and there should be some unclean
> > sstables on that node.
> > however upgrade it a tough work for me right now.  could the nodetool scrub
> > help?  or decommission the node and join it again?
> >
> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> >>
> >> This means you should upgrade, because we've fixed bugs about ignoring
> >> deleted CFs since 0.7.4.
> >>
> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <sp...@gmail.com> wrote:
> >> > the log file shows as follows, not sure what does 'Couldn't find
> >> > cfId=1000'
> >> > means(google just returned useless results):
> >> >
> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> >> > Found
> >> > table data in data directories. Consider using JMX to call
> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> >> > Creating new commitlog segment
> >> > /cassandra/commitlog/CommitLog-1313670197705.log
> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
> >> > /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
> >> > replay
> >> > complete
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> >> > Cassandra version: 0.7.4
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
> >> > Thrift
> >> > API version: 19.4.0
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
> >> > Loading
> >> > persisted ring state
> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> >> > Starting
> >> > up server gossip
> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
> >> > bytes)
> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
> >> > CompactionManager.java
> >> > (line 396) Compacting
> >> >
> >> > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
> >> > Using
> >> > saved token 113427455640312821154458202477256070484
> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> >> > RowMutationVerbHandler.java
> >> > (line 86) Error in row mutation
> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
> >> > find
> >> > cfId=1000
> >> >     at
> >> >
> >> > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> >> >     at
> >> >
> >> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> >> >     at
> >> >
> >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> >     at
> >> >
> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> >     at java.lang.Thread.run(Thread.java:636)
> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623)
> >> > Node
> >> > /node1 has restarted, now UP again
> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> >> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> >> > keyspace prjkeyspace
> >> >     at
> >> >
> >> > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
> >> >     at
> >> > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
> >> >     at
> >> > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
> >> >
> >> >
> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com>
> >> > wrote:
> >> >>
> >> >> Look in the logs to work find out why the migration did not get to
> >> >> node2.
> >> >> Otherwise yes you can drop those files.
> >> >> Cheers
> >> >> -----------------
> >> >> Aaron Morton
> >> >> Freelance Cassandra Developer
> >> >> @aaronmorton
> >> >> http://www.thelastpickle.com
> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
> >> >>
> >> >> just found out that changes via cassandra-cli, the schema change didn't
> >> >> reach node2. and node2 became unreachable....
> >> >> I did as this
> >> >> document:http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> >> >> but after that I just got two schema versons:
> >> >>
> >> >>
> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
> >> >>
> >> >> is that enough delete Schema* && Migrations* sstables and restart the
> >> >> node?
> >> >>
> >> >>
> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> thanks a lot for  all the help!  I have gone through the steps and
> >> >>> successfully brought up the node2 :)
> >> >>>
> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com>
> >> >>> wrote:
> >> >>> > Because the file only preserve the "key" of records, not the whole
> >> >>> > record.
> >> >>> > Records for those saved key will be loaded into cassandra during the
> >> >>> > startup
> >> >>> > of cassandra.
> >> >>> >
> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> but the data size in the saved_cache are relatively small:
> >> >>> >>
> >> >>> >> will that cause the load problem?
> >> >>> >>
> >> >>> >>  ls  -lh  /cassandra/saved_caches/
> >> >>> >> total 32M
> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
> >> >>> >> cass-CommentSortsCache-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
> >> >>> >> cass-CommentSortsCache-RowCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50
> >> >>> >> cass-CommentVote-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
> >> >>> >> cass-device_images-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53
> >> >>> >> cass-LinksByUrl-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
> >> >>> >> cass-SavesByAccount-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49
> >> >>> >> cass-VotesByDay-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49
> >> >>> >> cass-VotesByLink-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
> >> >>> >> system-HintsColumnFamily-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
> >> >>> >> system-LocationInfo-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
> >> >>> >> system-Migrations-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >> >>> >>
> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
> >> >>> >> <aa...@thelastpickle.com>
> >> >>> >> wrote:
> >> >>> >> > If you have a node that cannot start up due to issues loading the
> >> >>> >> > saved
> >> >>> >> > cache delete the files in the saved_cache directory before
> >> >>> >> > starting
> >> >>> >> > it.
> >> >>> >> >
> >> >>> >> > The settings to save the row and key cache are per CF. You can
> >> >>> >> > change
> >> >>> >> > them with an update column family statement via the CLI when
> >> >>> >> > attached to any
> >> >>> >> > node. You may then want to check the saved_caches directory and
> >> >>> >> > delete any
> >> >>> >> > files that are left (not sure if they are automatically deleted).
> >> >>> >> >
> >> >>> >> > i would recommend:
> >> >>> >> > - stop node 2
> >> >>> >> > - delete it's saved_cache
> >> >>> >> > - make the schema change via another node
> >> >>> >> > - startup node 2
> >> >>> >> >
> >> >>> >> > Cheers
> >> >>> >> >
> >> >>> >> > -----------------
> >> >>> >> > Aaron Morton
> >> >>> >> > Freelance Cassandra Developer
> >> >>> >> > @aaronmorton
> >> >>> >> > http://www.thelastpickle.com
> >> >>> >> >
> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >>> >> >
> >> >>> >> >> does this need to be cluster wide? or I could just modify the
> >> >>> >> >> caches
> >> >>> >> >> on one node?   since I could not connect to the node with
> >> >>> >> >> cassandra-cli, it says "connection refused"
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> [default@unknown] connect node2/9160;
> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> so if I change the cache size via other nodes, how could node2
> >> >>> >> >> be
> >> >>> >> >> notified the changing?    kill cassandra and start it again
> >> >>> >> >> could
> >> >>> >> >> make
> >> >>> >> >> it update the schema?
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
> >> >>> >> >> <th...@wetafx.co.nz>
> >> >>> >> >> wrote:
> >> >>> >> >>> Hi,
> >> >>> >> >>>
> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these by
> >> >>> >> >>> doing
> >> >>> >> >>> the
> >> >>> >> >>> following:
> >> >>> >> >>>
> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >> >>> >> >>> * Kill Cassandra
> >> >>> >> >>> * Remove all files in the saved_caches directory
> >> >>> >> >>> * Start Cassandra
> >> >>> >> >>> * Slowly bring back row & key caches (if desired, we left them
> >> >>> >> >>> off)
> >> >>> >> >>>
> >> >>> >> >>> Cheers,
> >> >>> >> >>>
> >> >>> >> >>>        T.
> >> >>> >> >>>
> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>  I saw alot slicequeryfilter things if changed the log level
> >> >>> >> >>>> to
> >> >>> >> >>>> DEBUG.
> >> >>> >> >>>>  just
> >> >>> >> >>>> thought even bring up a new node will be faster than start the
> >> >>> >> >>>> old
> >> >>> >> >>>> one..... it
> >> >>> >> >>>> is wired
> >> >>> >> >>>>
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:225@1313068845474382
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:453@1310999270198313
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:26@1313199902088827
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:157@1313097239332314
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:41729@1313190821826229
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:6@1313174157301203
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:98@1312011362250907
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:42@1313201711997005
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:96@1312939986190155
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:621@1313192538616112
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu
> >> >>> >> >>>> <springrider@gmail.com
> >> >>> >> >>>> <ma...@gmail.com>> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>    but it seems the row cache is cluster wide, how will  the
> >> >>> >> >>>> change
> >> >>> >> >>>> of row
> >> >>> >> >>>>    cache affect the read speed?
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis
> >> >>> >> >>>> <jbellis@gmail.com
> >> >>> >> >>>>    <ma...@gmail.com>> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>        Or leave row cache enabled but disable cache saving
> >> >>> >> >>>> (and
> >> >>> >> >>>> remove the
> >> >>> >> >>>>        one already on disk).
> >> >>> >> >>>>
> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
> >> >>> >> >>>> <aaron@thelastpickle.com
> >> >>> >> >>>>        <ma...@thelastpickle.com>> wrote:
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 547)
> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
> >> >>> >> >>>> cache
> >> >>> >> >>>> for
> >> >>> >> >>>> COMMENT
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the
> >> >>> >> >>>>  row
> >> >>> >> >>>> cache.
> >> >>> >> >>>> Thats a
> >> >>> >> >>>>         > pretty big row cache, I would suggest reducing or
> >> >>> >> >>>> disabling
> >> >>> >> >>>> it.
> >> >>> >> >>>>         > Background
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > and server can not afford the load then crashed.
> >> >>> >> >>>> after
> >> >>> >> >>>> come
> >> >>> >> >>>> back,
> >> >>> >> >>>>        node 3 can
> >> >>> >> >>>>         > not return for more than 96 hours
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > Crashed how ?
> >> >>> >> >>>>         > You may be seeing
> >> >>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the
> >> >>> >> >>>> Merkle
> >> >>> >> >>>> tree
> >> >>> >> >>>> build
> >> >>> >> >>>>        finishes
> >> >>> >> >>>>         > and nodetool netstats to see which CF's are
> >> >>> >> >>>> streaming.
> >> >>> >> >>>>         > Cheers
> >> >>> >> >>>>         > -----------------
> >> >>> >> >>>>         > Aaron Morton
> >> >>> >> >>>>         > Freelance Cassandra Developer
> >> >>> >> >>>>         > @aaronmorton
> >> >>> >> >>>>         > http://www.thelastpickle.com
> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it
> >> >>> >> >>>> seems
> >> >>> >> >>>> alot
> >> >>> >> >>>> data
> >> >>> >> >>>>         > generated.  and server can not afford the load then
> >> >>> >> >>>> crashed.
> >> >>> >> >>>>         > after come back, node 3 can not return for more than
> >> >>> >> >>>> 96
> >> >>> >> >>>> hours
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > for 34GB data, the node 2 could restart and back
> >> >>> >> >>>> online
> >> >>> >> >>>> within 1
> >> >>> >> >>>> hour.
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > I am not sure what's wrong with node3 and should I
> >> >>> >> >>>> restart
> >> >>> >> >>>> node
> >> >>> >> >>>> 3 again?
> >> >>> >> >>>>         > thanks!
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > Address         Status State   Load            Owns
> >> >>> >> >>>>  Token
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > 113427455640312821154458202477256070484
> >> >>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
> >> >>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
> >> >>> >> >>>>         > 56713727820156410577229101238628035242
> >> >>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
> >> >>> >> >>>>         > 113427455640312821154458202477256070484
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > the log shows it is still going on, not sure why it
> >> >>> >> >>>> is
> >> >>> >> >>>> so
> >> >>> >> >>>> slow:
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734
> >> >>> >> >>>> SSTableReader.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 154)
> >> >>> >> >>>>        Opening
> >> >>> >> >>>>         > /cassandra/data/COMMENT
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 275)
> >> >>> >> >>>>         > reading saved cache
> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 547)
> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row
> >> >>> >> >>>> cache
> >> >>> >> >>>> for
> >> >>> >> >>>> COMMENT
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 275)
> >> >>> >> >>>>         > reading saved cache
> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
> >> >>> >> >>>>        CacheWriter.java (line
> >> >>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>        --
> >> >>> >> >>>>        Jonathan Ellis
> >> >>> >> >>>>        Project Chair, Apache Cassandra
> >> >>> >> >>>>        co-founder of DataStax, the source for professional
> >> >>> >> >>>> Cassandra
> >> >>> >> >>>> support
> >> >>> >> >>>>        http://www.datastax.com
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>
> >> >>> >> >>>
> >> >>> >> >
> >> >>> >> >
> >> >>> >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
> 
> 
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 


Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
actually I didn't drop any CF; maybe my understanding was totally wrong, so
I'll just describe what I thought, as below:

I thought "deleted CFs" meant sstables that are now useless (since "node
repair" can copy data to another node, the original sstable might be due to
be deleted but not deleted yet).  When I deleted all the migration and schema
sstables, it somehow "forgot" that those files should be deleted, so it read
the files and "can not find cfId"...


I got to this situation by the following steps: at first I did "node repair"
on node2, which failed in the middle (node3 went down) and left the Load at
170GB while the average is 30GB.
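
for reference, the repair progress can be watched with something like the
following (the exact nodetool flags may vary by version):

    nodetool -h node2 compactionstats   # when the Merkle tree builds finish
    nodetool -h node2 netstats          # which CFs are streaming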

after I brought up node3, node2 started up very slowly; 4 days passed and it
was still starting.  It seemed to be loading the row cache and key cache, so
I disabled those caches by setting the values to 0 via cassandra-cli. During
this procedure node2 was of course not reachable, so it could not pick up the
schema update.
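
the change itself was just an "update column family" statement run against
one of the live nodes; roughly like this for the COMMENT CF (the exact
attribute names, rows_cached / keys_cached here, may differ between CLI
versions):

    use prjkeyspace;
    update column family COMMENT with rows_cached = 0 and keys_cached = 0;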

after that node2 could start very quickly, but "describe cluster" showed it
as "UNREACHABLE", so I did as the FAQ says: deleted the schema and migration
sstables and restarted node2.
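
concretely, what I did was roughly the following (the /cassandra/data path is
from my setup, and the backup location is just an example):

    # with cassandra stopped on node2
    mkdir -p /root/schema-backup    # example backup location
    mv /cassandra/data/system/Schema-*     /root/schema-backup/
    mv /cassandra/data/system/Migrations-* /root/schema-backup/
    # start cassandra again; it should pull the schema from a live node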

then the "Couldn't find cfId=1000'" error start showing up.





I have just moved those migration && schema sstables back and started
cassandra; it still showed "UNREACHABLE", but after waiting for a couple of
hours, "describe cluster" shows they are on the same schema version now.


even though this problem is solved, I am not sure HOW....... really curious
why just removing the "migration*" and "schema*" sstables could cause the
"Couldn't find cfId=1000" error.

On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> I'm not sure what problem you're trying to solve.  The exception you
> pasted should stop once your clients are no longer trying to use the
> dropped CF.
>
> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <sp...@gmail.com>
> wrote:
> > that could be the reason, I did nodetool repair(unfinished, data size
> > increased 6 times bigger 30G vs 170G) and there should be some unclean
> > sstables on that node.
> > however upgrade it a tough work for me right now.  could the nodetool
> scrub
> > help?  or decommission the node and join it again?
> >
> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >>
> >> This means you should upgrade, because we've fixed bugs about ignoring
> >> deleted CFs since 0.7.4.
> >>
> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <sp...@gmail.com>
> wrote:
> >> > the log file shows as follows, not sure what does 'Couldn't find
> >> > cfId=1000'
> >> > means(google just returned useless results):
> >> >
> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> >> > Found
> >> > table data in data directories. Consider using JMX to call
> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> >> > Creating new commitlog segment
> >> > /cassandra/commitlog/CommitLog-1313670197705.log
> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155)
> Replaying
> >> > /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314)
> Finished
> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
> >> > replay
> >> > complete
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> >> > Cassandra version: 0.7.4
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
> >> > Thrift
> >> > API version: 19.4.0
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
> >> > Loading
> >> > persisted ring state
> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> >> > Starting
> >> > up server gossip
> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line
> 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db
> (80
> >> > bytes)
> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
> >> > CompactionManager.java
> >> > (line 396) Compacting
> >> >
> >> >
> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
> >> > Using
> >> > saved token 113427455640312821154458202477256070484
> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line
> 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> >> > RowMutationVerbHandler.java
> >> > (line 86) Error in row mutation
> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
> >> > find
> >> > cfId=1000
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> >> >     at
> >> >
> >> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> >     at
> >> >
> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> >     at java.lang.Thread.run(Thread.java:636)
> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623)
> >> > Node
> >> > /node1 has restarted, now UP again
> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> >> > DebuggableThreadPoolExecutor.java (line 103) Error in
> ThreadPoolExecutor
> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> >> > keyspace prjkeyspace
> >> >     at
> >> >
> >> >
> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
> >> >     at
> >> > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
> >> >     at
> >> >
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
> >> >
> >> >
> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <
> aaron@thelastpickle.com>
> >> > wrote:
> >> >>
> >> >> Look in the logs to work find out why the migration did not get to
> >> >> node2.
> >> >> Otherwise yes you can drop those files.
> >> >> Cheers
> >> >> -----------------
> >> >> Aaron Morton
> >> >> Freelance Cassandra Developer
> >> >> @aaronmorton
> >> >> http://www.thelastpickle.com
> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
> >> >>
> >> >> just found out that changes via cassandra-cli, the schema change
> didn't
> >> >> reach node2. and node2 became unreachable....
> >> >> I did as this
> >> >> document:http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> >> >> but after that I just got two schema versons:
> >> >>
> >> >>
> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
> >> >>
> >> >> is that enough delete Schema* && Migrations* sstables and restart the
> >> >> node?

Re: node restart taking too long

Posted by Jonathan Ellis <jb...@gmail.com>.
I'm not sure what problem you're trying to solve.  The exception you
pasted should stop once your clients are no longer trying to use the
dropped CF.
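
(As a hedged illustration, one way to check which column families the cluster
still defines, and to confirm no client is pointed at a dropped one, is the
0.7-era cassandra-cli; the host and keyspace names below are only examples
taken from the ring listing and error messages quoted in this thread:

     connect node1/9160;
     describe keyspace prjkeyspace;
     describe cluster;

"Unknown ColumnFamily" and "Couldn't find cfId" errors will generally keep
appearing for as long as clients or other nodes are still sending reads or
writes for the dropped CF.)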

On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <sp...@gmail.com> wrote:
> That could be the reason: I ran nodetool repair (it never finished, and the
> data size grew about six-fold, from 30G to 170G), so there are probably some
> stale sstables on that node.
> However, upgrading is tough work for me right now.  Would nodetool scrub
> help?  Or should I decommission the node and join it again?
>
> On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> This means you should upgrade, because we've fixed bugs about ignoring
>> deleted CFs since 0.7.4.
>>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
That could be the reason: I ran nodetool repair (it never finished, and the data
size grew about six-fold, from 30G to 170G), so there are probably some stale
sstables on that node.

However, upgrading is tough work for me right now.  Would nodetool scrub help?
Or should I decommission the node and join it again?
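
(For reference, a minimal sketch of the commands in question, assuming the
0.7-era nodetool and the node names from the ring listing; whether they
actually help here is exactly what is being asked, so treat this as syntax
only, not a recommendation:

     nodetool -h node3 scrub          # rewrite the node's sstables in place
     nodetool -h node3 compact        # major compaction; merges overlapping sstables into one
     nodetool -h node3 decommission   # remove the node; wipe its data dirs before re-bootstrapping

Note that with 3 nodes and RF=3 every node holds a full copy of the data, so
decommissioning and re-joining means streaming roughly the whole data set back.)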


On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> This means you should upgrade, because we've fixed bugs about ignoring
> deleted CFs since 0.7.4.
>

Re: node restart taking too long

Posted by Jonathan Ellis <jb...@gmail.com>.
This means you should upgrade, because we've fixed bugs about ignoring
deleted CFs since 0.7.4.
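
(A rough outline of a rolling upgrade within the 0.7 line, one node at a time;
the exact steps depend on how Cassandra was installed, so this is a sketch
rather than a recipe:

     nodetool -h <node> drain    # flush memtables and stop accepting writes
     # stop the cassandra process
     # install the newer 0.7.x release, keeping cassandra.yaml and the data dirs
     # start cassandra and watch the log for a clean startup
     nodetool -h <node> ring     # confirm the node is Up/Normal before moving on

Repeat for each node before making further schema changes.)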

On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <sp...@gmail.com> wrote:
> the log file shows as follows; I'm not sure what 'Couldn't find cfId=1000'
> means (google just returned useless results):
>
> INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found
> table data in data directories. Consider using JMX to call
> org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> Creating new commitlog segment
> /cassandra/commitlog/CommitLog-1313670197705.log
>  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
> /cassandra/commitlog/CommitLog-1313670030512.log
>  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
> reading /cassandra/commitlog/CommitLog-1313670030512.log
>  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay
> complete
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> Cassandra version: 0.7.4
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift
> API version: 19.4.0
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading
> persisted ring state
>  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting
> up server gossip
>  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
> Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
> bytes)
>  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java
> (line 396) Compacting
> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using
> saved token 113427455640312821154458202477256070484
>  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
> Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java
> (line 86) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find
> cfId=1000
>     at
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>     at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>     at
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>     at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:636)
>  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node
> /node1 has restarted, now UP again
> ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> keyspace prjkeyspace
>     at
> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>     at
> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>     at
> org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>     at
> org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>     at
> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>     at
> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>     at
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>
>
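(Side note on reading those errors: "Couldn't find cfId=1000" is raised when
another node sends this one a write for a column family id that is missing
from the local schema, which fits a node whose schema migrations never arrived
or whose CF has since been dropped. A quick check, using the same CLI syntax
shown elsewhere in the thread:

     connect node1/9160;
     describe cluster;

should list a single schema version with all three nodes behind it once the
cluster has converged.)
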
> On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com>
> wrote:
>>
>> Look in the logs to find out why the migration did not get to node2.
>> Otherwise yes you can drop those files.
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>>
>> just found out that changes via cassandra-cli, the schema change didn't
>> reach node2. and node2 became unreachable....
>> I did as this
>> document:http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>> but after that I just got two schema versons:
>>
>>
>> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>>
>> is that enough delete Schema* && Migrations* sstables and restart the
>> node?
>>
>>
>> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com> wrote:
>>>
>>> thanks a lot for  all the help!  I have gone through the steps and
>>> successfully brought up the node2 :)
>>>
>>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com> wrote:
>>> > Because the file only preserve the "key" of records, not the whole
>>> > record.
>>> > Records for those saved key will be loaded into cassandra during the
>>> > startup
>>> > of cassandra.
>>> >
>>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com>
>>> > wrote:
>>> >>
>>> >> but the data size in the saved_cache are relatively small:
>>> >>
>>> >> will that cause the load problem?
>>> >>
>>> >>  ls  -lh  /cassandra/saved_caches/
>>> >> total 32M
>>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>>> >> cass-CommentSortsCache-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>>> >> cass-CommentSortsCache-RowCache
>>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
>>> >> cass-device_images-KeyCache
>>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
>>> >> cass-SavesByAccount-KeyCache
>>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>>> >> system-HintsColumnFamily-KeyCache
>>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
>>> >> system-LocationInfo-KeyCache
>>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
>>> >> system-Migrations-KeyCache
>>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>> >>
>>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
>>> >> <aa...@thelastpickle.com>
>>> >> wrote:
>>> >> > If you have a node that cannot start up due to issues loading the
>>> >> > saved
>>> >> > cache delete the files in the saved_cache directory before starting
>>> >> > it.
>>> >> >
>>> >> > The settings to save the row and key cache are per CF. You can
>>> >> > change
>>> >> > them with an update column family statement via the CLI when
>>> >> > attached to any
>>> >> > node. You may then want to check the saved_caches directory and
>>> >> > delete any
>>> >> > files that are left (not sure if they are automatically deleted).
>>> >> >
>>> >> > i would recommend:
>>> >> > - stop node 2
>>> >> > - delete it's saved_cache
>>> >> > - make the schema change via another node
>>> >> > - startup node 2
>>> >> >
>>> >> > Cheers
>>> >> >
>>> >> > -----------------
>>> >> > Aaron Morton
>>> >> > Freelance Cassandra Developer
>>> >> > @aaronmorton
>>> >> > http://www.thelastpickle.com
>>> >> >
>>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>>> >> >
>>> >> >> does this need to be cluster wide? or I could just modify the
>>> >> >> caches
>>> >> >> on one node?   since I could not connect to the node with
>>> >> >> cassandra-cli, it says "connection refused"
>>> >> >>
>>> >> >>
>>> >> >> [default@unknown] connect node2/9160;
>>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>>> >> >>
>>> >> >>
>>> >> >> so if I change the cache size via other nodes, how could node2 be
>>> >> >> notified the changing?    kill cassandra and start it again could
>>> >> >> make
>>> >> >> it update the schema?
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
>>> >> >> <th...@wetafx.co.nz>
>>> >> >> wrote:
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> yes, we saw exactly the same messages. We got rid of these by
>>> >> >>> doing
>>> >> >>> the
>>> >> >>> following:
>>> >> >>>
>>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>>> >> >>> * Kill Cassandra
>>> >> >>> * Remove all files in the saved_caches directory
>>> >> >>> * Start Cassandra
>>> >> >>> * Slowly bring back row & key caches (if desired, we left them
>>> >> >>> off)
>>> >> >>>
>>> >> >>> Cheers,
>>> >> >>>
>>> >> >>>        T.
>>> >> >>>
>>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>>> >> >>>>
>>> >> >>>>  I saw alot slicequeryfilter things if changed the log level to
>>> >> >>>> DEBUG.
>>> >> >>>>  just
>>> >> >>>> thought even bring up a new node will be faster than start the
>>> >> >>>> old
>>> >> >>>> one..... it
>>> >> >>>> is wired
>>> >> >>>>
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647:
>>> >> >>>> 76616c7565:false:41729@1313190821826229
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line
>>> >> >>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu
>>> >> >>>> <springrider@gmail.com
>>> >> >>>> <ma...@gmail.com>> wrote:
>>> >> >>>>
>>> >> >>>>    but it seems the row cache is cluster wide, how will  the
>>> >> >>>> change
>>> >> >>>> of row
>>> >> >>>>    cache affect the read speed?
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis
>>> >> >>>> <jbellis@gmail.com
>>> >> >>>>    <ma...@gmail.com>> wrote:
>>> >> >>>>
>>> >> >>>>        Or leave row cache enabled but disable cache saving (and
>>> >> >>>> remove the
>>> >> >>>>        one already on disk).
>>> >> >>>>
>>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>>> >> >>>> <aaron@thelastpickle.com
>>> >> >>>>        <ma...@thelastpickle.com>> wrote:
>>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>>> >> >>>> ColumnFamilyStore.java
>>> >> >>>> (line 547)
>>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache
>>> >> >>>> for
>>> >> >>>> COMMENT
>>> >> >>>>         >
>>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
>>> >> >>>> cache.
>>> >> >>>> Thats a
>>> >> >>>>         > pretty big row cache, I would suggest reducing or
>>> >> >>>> disabling
>>> >> >>>> it.
>>> >> >>>>         > Background
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>> >> >>>>         >
>>> >> >>>>         > and server can not afford the load then crashed. after
>>> >> >>>> come
>>> >> >>>> back,
>>> >> >>>>        node 3 can
>>> >> >>>>         > not return for more than 96 hours
>>> >> >>>>         >
>>> >> >>>>         > Crashed how ?
>>> >> >>>>         > You may be seeing
>>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle
>>> >> >>>> tree
>>> >> >>>> build
>>> >> >>>>        finishes
>>> >> >>>>         > and nodetool netstats to see which CF's are streaming.
>>> >> >>>>         > Cheers
>>> >> >>>>         > -----------------
>>> >> >>>>         > Aaron Morton
>>> >> >>>>         > Freelance Cassandra Developer
>>> >> >>>>         > @aaronmorton
>>> >> >>>>         > http://www.thelastpickle.com
>>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it
>>> >> >>>> seems
>>> >> >>>> alot
>>> >> >>>> data
>>> >> >>>>         > generated.  and server can not afford the load then
>>> >> >>>> crashed.
>>> >> >>>>         > after come back, node 3 can not return for more than 96
>>> >> >>>> hours
>>> >> >>>>         >
>>> >> >>>>         > for 34GB data, the node 2 could restart and back online
>>> >> >>>> within 1
>>> >> >>>> hour.
>>> >> >>>>         >
>>> >> >>>>         > I am not sure what's wrong with node3 and should I
>>> >> >>>> restart
>>> >> >>>> node
>>> >> >>>> 3 again?
>>> >> >>>>         > thanks!
>>> >> >>>>         >
>>> >> >>>>         > Address         Status State   Load            Owns
>>> >> >>>>  Token
>>> >> >>>>         >
>>> >> >>>>         > 113427455640312821154458202477256070484
>>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>>> >> >>>>         > 56713727820156410577229101238628035242
>>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>>> >> >>>>         > 113427455640312821154458202477256070484
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         > the log shows it is still going on, not sure why it is
>>> >> >>>> so
>>> >> >>>> slow:
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
>>> >> >>>> (line
>>> >> >>>> 154)
>>> >> >>>>        Opening
>>> >> >>>>         > /cassandra/data/COMMENT
>>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
>>> >> >>>> ColumnFamilyStore.java
>>> >> >>>> (line 275)
>>> >> >>>>         > reading saved cache
>>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>>> >> >>>> ColumnFamilyStore.java
>>> >> >>>> (line 547)
>>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache
>>> >> >>>> for
>>> >> >>>> COMMENT
>>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
>>> >> >>>> ColumnFamilyStore.java
>>> >> >>>> (line 275)
>>> >> >>>>         > reading saved cache
>>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>>> >> >>>>        CacheWriter.java (line
>>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>        --
>>> >> >>>>        Jonathan Ellis
>>> >> >>>>        Project Chair, Apache Cassandra
>>> >> >>>>        co-founder of DataStax, the source for professional
>>> >> >>>> Cassandra
>>> >> >>>> support
>>> >> >>>>        http://www.datastax.com
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>>
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: node restart taking too long

Posted by Peter Schuller <pe...@infidyne.com>.
Can you post the complete Cassandra log starting with the initial
start-up of the node after having removed schema/migrations?

-- 
/ Peter Schuller (@scode on twitter)

Re: node restart taking too long

Posted by Peter Schuller <pe...@infidyne.com>.
> the log file shows the following; not sure what 'Couldn't find cfId=1000'
> means (Google just returned useless results):

Those should be the indication that the schema is wrong on the node.
Reads and writes are being received from other nodes pertaining to
column families it does not know about.

I don't know, without investigation, why the instructions from the
wiki don't work though. You did the procedure of restarting the node
with the migrations/schema removed, right?

-- 
/ Peter Schuller (@scode on twitter)

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
any suggestion? thanks!

On Fri, Aug 19, 2011 at 10:26 PM, Yan Chunlu <sp...@gmail.com> wrote:

> the log file shows the following; not sure what 'Couldn't find cfId=1000'
> means (Google just returned useless results):
>
>
> INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> Found table data in data directories. Consider using JMX to call
> org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> Creating new commitlog segment
> /cassandra/commitlog/CommitLog-1313670197705.log
>  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
> /cassandra/commitlog/CommitLog-1313670030512.log
>  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
> reading /cassandra/commitlog/CommitLog-1313670030512.log
>  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay
> complete
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> Cassandra version: 0.7.4
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift
> API version: 19.4.0
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading
> persisted ring state
>  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> Starting up server gossip
>  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
> Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
> bytes)
>  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java
> (line 396) Compacting
> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using
> saved token 113427455640312821154458202477256070484
>  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
> Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> RowMutationVerbHandler.java (line 86) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find
> cfId=1000
>     at
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>     at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>     at
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>     at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:636)
>  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node
> /node1 has restarted, now UP again
> ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> keyspace prjkeyspace
>     at
> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>     at
> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>     at
> org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>     at
> org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>     at
> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>     at
> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>     at
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>
>
>
> On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> Look in the logs to work out why the migration did not get to node2.
>>
>> Otherwise yes you can drop those files.
>>
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>>
>> just found out that changes via cassandra-cli, the schema change didn't
>> reach node2. and node2 became unreachable....
>>
>> I did as this document:
>> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>>
>> but after that I just got two schema versons:
>>
>>
>>
>> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>>  2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>>
>>
>> is that enough delete Schema* && Migrations* sstables and restart the
>> node?
>>
>>
>>
>> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com>wrote:
>>
>>> thanks a lot for  all the help!  I have gone through the steps and
>>> successfully brought up the node2 :)
>>>
>>>
>>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com> wrote:
>>> > Because the file only preserve the "key" of records, not the whole
>>> record.
>>> > Records for those saved key will be loaded into cassandra during the
>>> startup
>>> > of cassandra.
>>> >
>>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com>
>>> wrote:
>>> >>
>>> >> but the data size in the saved_cache are relatively small:
>>> >>
>>> >> will that cause the load problem?
>>> >>
>>> >>  ls  -lh  /cassandra/saved_caches/
>>> >> total 32M
>>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>>> >> cass-CommentSortsCache-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>>> >> cass-CommentSortsCache-RowCache
>>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
>>> cass-device_images-KeyCache
>>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
>>> cass-SavesByAccount-KeyCache
>>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>>> >> system-HintsColumnFamily-KeyCache
>>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
>>> system-LocationInfo-KeyCache
>>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
>>> system-Migrations-KeyCache
>>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>> >>
>>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <
>>> aaron@thelastpickle.com>
>>> >> wrote:
>>> >> > If you have a node that cannot start up due to issues loading the
>>> saved
>>> >> > cache delete the files in the saved_cache directory before starting
>>> it.
>>> >> >
>>> >> > The settings to save the row and key cache are per CF. You can
>>> change
>>> >> > them with an update column family statement via the CLI when
>>> attached to any
>>> >> > node. You may then want to check the saved_caches directory and
>>> delete any
>>> >> > files that are left (not sure if they are automatically deleted).
>>> >> >
>>> >> > i would recommend:
>>> >> > - stop node 2
>>> >> > - delete it's saved_cache
>>> >> > - make the schema change via another node
>>> >> > - startup node 2
>>> >> >
>>> >> > Cheers
>>> >> >
>>> >> > -----------------
>>> >> > Aaron Morton
>>> >> > Freelance Cassandra Developer
>>> >> > @aaronmorton
>>> >> > http://www.thelastpickle.com
>>> >> >
>>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>>> >> >
>>> >> >> does this need to be cluster wide? or I could just modify the
>>> caches
>>> >> >> on one node?   since I could not connect to the node with
>>> >> >> cassandra-cli, it says "connection refused"
>>> >> >>
>>> >> >>
>>> >> >> [default@unknown] connect node2/9160;
>>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>>> >> >>
>>> >> >>
>>> >> >> so if I change the cache size via other nodes, how could node2 be
>>> >> >> notified the changing?    kill cassandra and start it again could
>>> make
>>> >> >> it update the schema?
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <
>>> tholzer@wetafx.co.nz>
>>> >> >> wrote:
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> yes, we saw exactly the same messages. We got rid of these by
>>> doing
>>> >> >>> the
>>> >> >>> following:
>>> >> >>>
>>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>>> >> >>> * Kill Cassandra
>>> >> >>> * Remove all files in the saved_caches directory
>>> >> >>> * Start Cassandra
>>> >> >>> * Slowly bring back row & key caches (if desired, we left them
>>> off)
>>> >> >>>
>>> >> >>> Cheers,
>>> >> >>>
>>> >> >>>        T.
>>> >> >>>
>>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>>> >> >>>>
>>> >> >>>>  I saw alot slicequeryfilter things if changed the log level to
>>> >> >>>> DEBUG.
>>> >> >>>>  just
>>> >> >>>> thought even bring up a new node will be faster than start the
>>> old
>>> >> >>>> one..... it
>>> >> >>>> is wired
>>> >> >>>>
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647:
>>> 76616c7565:false:225@1313068845474382
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647:
>>> 76616c7565:false:453@1310999270198313
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647:
>>> 76616c7565:false:157@1313097239332314
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647:
>>> 76616c7565:false:41729@1313190821826229
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line
>>> 123)
>>> >> >>>> collecting 0 of 2147483647:
>>> 76616c7565:false:621@1313192538616112
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <
>>> springrider@gmail.com
>>> >> >>>> <ma...@gmail.com>> wrote:
>>> >> >>>>
>>> >> >>>>    but it seems the row cache is cluster wide, how will  the
>>> change
>>> >> >>>> of row
>>> >> >>>>    cache affect the read speed?
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <
>>> jbellis@gmail.com
>>> >> >>>>    <ma...@gmail.com>> wrote:
>>> >> >>>>
>>> >> >>>>        Or leave row cache enabled but disable cache saving (and
>>> >> >>>> remove the
>>> >> >>>>        one already on disk).
>>> >> >>>>
>>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>>> >> >>>> <aaron@thelastpickle.com
>>> >> >>>>        <ma...@thelastpickle.com>> wrote:
>>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>>> ColumnFamilyStore.java
>>> >> >>>> (line 547)
>>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache
>>> for
>>> >> >>>> COMMENT
>>> >> >>>>         >
>>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
>>> >> >>>> cache.
>>> >> >>>> Thats a
>>> >> >>>>         > pretty big row cache, I would suggest reducing or
>>> disabling
>>> >> >>>> it.
>>> >> >>>>         > Background
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>> >> >>>>         >
>>> >> >>>>         > and server can not afford the load then crashed. after
>>> come
>>> >> >>>> back,
>>> >> >>>>        node 3 can
>>> >> >>>>         > not return for more than 96 hours
>>> >> >>>>         >
>>> >> >>>>         > Crashed how ?
>>> >> >>>>         > You may be seeing
>>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle
>>> tree
>>> >> >>>> build
>>> >> >>>>        finishes
>>> >> >>>>         > and nodetool netstats to see which CF's are streaming.
>>> >> >>>>         > Cheers
>>> >> >>>>         > -----------------
>>> >> >>>>         > Aaron Morton
>>> >> >>>>         > Freelance Cassandra Developer
>>> >> >>>>         > @aaronmorton
>>> >> >>>>         > http://www.thelastpickle.com
>>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it
>>> seems
>>> >> >>>> alot
>>> >> >>>> data
>>> >> >>>>         > generated.  and server can not afford the load then
>>> >> >>>> crashed.
>>> >> >>>>         > after come back, node 3 can not return for more than 96
>>> >> >>>> hours
>>> >> >>>>         >
>>> >> >>>>         > for 34GB data, the node 2 could restart and back online
>>> >> >>>> within 1
>>> >> >>>> hour.
>>> >> >>>>         >
>>> >> >>>>         > I am not sure what's wrong with node3 and should I
>>> restart
>>> >> >>>> node
>>> >> >>>> 3 again?
>>> >> >>>>         > thanks!
>>> >> >>>>         >
>>> >> >>>>         > Address         Status State   Load            Owns
>>> >> >>>>  Token
>>> >> >>>>         >
>>> >> >>>>         > 113427455640312821154458202477256070484
>>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>>> >> >>>>         > 56713727820156410577229101238628035242
>>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>>> >> >>>>         > 113427455640312821154458202477256070484
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         > the log shows it is still going on, not sure why it is
>>> so
>>> >> >>>> slow:
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
>>> >> >>>> (line
>>> >> >>>> 154)
>>> >> >>>>        Opening
>>> >> >>>>         > /cassandra/data/COMMENT
>>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
>>> ColumnFamilyStore.java
>>> >> >>>> (line 275)
>>> >> >>>>         > reading saved cache
>>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>>> ColumnFamilyStore.java
>>> >> >>>> (line 547)
>>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache
>>> for
>>> >> >>>> COMMENT
>>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
>>> ColumnFamilyStore.java
>>> >> >>>> (line 275)
>>> >> >>>>         > reading saved cache
>>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>>> >> >>>>        CacheWriter.java (line
>>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>         >
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>        --
>>> >> >>>>        Jonathan Ellis
>>> >> >>>>        Project Chair, Apache Cassandra
>>> >> >>>>        co-founder of DataStax, the source for professional
>>> Cassandra
>>> >> >>>> support
>>> >> >>>>        http://www.datastax.com
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >>>
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>>
>>
>>
>

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
the log file shows the following; not sure what 'Couldn't find cfId=1000'
means (Google just returned useless results):


INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found
table data in data directories. Consider using JMX to call
org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
 INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
Creating new commitlog segment
/cassandra/commitlog/CommitLog-1313670197705.log
 INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
/cassandra/commitlog/CommitLog-1313670030512.log
 INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
reading /cassandra/commitlog/CommitLog-1313670030512.log
 INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay
complete
 INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
Cassandra version: 0.7.4
 INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift
API version: 19.4.0
 INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading
persisted ring state
 INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting
up server gossip
 INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
 INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
 INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
bytes)
 INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java
(line 396) Compacting
[SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
 INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using
saved token 113427455640312821154458202477256070484
 INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
 INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java
(line 86) Error in row mutation
org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find
cfId=1000
    at
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
    at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
    at
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
    at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
 INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node
/node1 has restarted, now UP again
ERROR [ReadStage:1] 2011-08-18 07:23:18,254
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
keyspace prjkeyspace
    at
org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
    at
org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
    at
org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
    at
org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
    at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
    at
org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
    at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
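
As a rough check (assuming the node's thrift port is reachable again, and using the keyspace name from the errors above), the column families each node reports can be compared from cassandra-cli:

[default@unknown] connect node2/9160;
[default@unknown] describe keyspace prjkeyspace;
[default@unknown] connect node1/9160;
[default@unknown] describe keyspace prjkeyspace;

If prjcache is missing from node2's listing but present on node1's, node2 is still running on a stale schema, which would match the 'Couldn't find cfId=1000' errors.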



On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aa...@thelastpickle.com>wrote:

> Look in the logs to work out why the migration did not get to node2.
>
> Otherwise yes you can drop those files.
>
> Cheers
>
>   -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>
> just found out that changes via cassandra-cli, the schema change didn't
> reach node2. and node2 became unreachable....
>
> I did as this document:
> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>
> but after that I just got two schema versons:
>
>
>
> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>  2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>
>
> is that enough delete Schema* && Migrations* sstables and restart the node?
>
>
>
> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com> wrote:
>
>> thanks a lot for  all the help!  I have gone through the steps and
>> successfully brought up the node2 :)
>>
>>
>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com> wrote:
>> > Because the file only preserve the "key" of records, not the whole
>> record.
>> > Records for those saved key will be loaded into cassandra during the
>> startup
>> > of cassandra.
>> >
>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com>
>> wrote:
>> >>
>> >> but the data size in the saved_cache are relatively small:
>> >>
>> >> will that cause the load problem?
>> >>
>> >>  ls  -lh  /cassandra/saved_caches/
>> >> total 32M
>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>> >> cass-CommentSortsCache-KeyCache
>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>> >> cass-CommentSortsCache-RowCache
>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
>> cass-device_images-KeyCache
>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
>> cass-SavesByAccount-KeyCache
>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>> >> system-HintsColumnFamily-KeyCache
>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
>> system-LocationInfo-KeyCache
>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>> >>
>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aaron@thelastpickle.com
>> >
>> >> wrote:
>> >> > If you have a node that cannot start up due to issues loading the
>> saved
>> >> > cache delete the files in the saved_cache directory before starting
>> it.
>> >> >
>> >> > The settings to save the row and key cache are per CF. You can change
>> >> > them with an update column family statement via the CLI when attached
>> to any
>> >> > node. You may then want to check the saved_caches directory and
>> delete any
>> >> > files that are left (not sure if they are automatically deleted).
>> >> >
>> >> > i would recommend:
>> >> > - stop node 2
>> >> > - delete it's saved_cache
>> >> > - make the schema change via another node
>> >> > - startup node 2
>> >> >
>> >> > Cheers
>> >> >
>> >> > -----------------
>> >> > Aaron Morton
>> >> > Freelance Cassandra Developer
>> >> > @aaronmorton
>> >> > http://www.thelastpickle.com
>> >> >
>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >> >
>> >> >> does this need to be cluster wide? or I could just modify the caches
>> >> >> on one node?   since I could not connect to the node with
>> >> >> cassandra-cli, it says "connection refused"
>> >> >>
>> >> >>
>> >> >> [default@unknown] connect node2/9160;
>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >> >>
>> >> >>
>> >> >> so if I change the cache size via other nodes, how could node2 be
>> >> >> notified the changing?    kill cassandra and start it again could
>> make
>> >> >> it update the schema?
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <tholzer@wetafx.co.nz
>> >
>> >> >> wrote:
>> >> >>> Hi,
>> >> >>>
>> >> >>> yes, we saw exactly the same messages. We got rid of these by doing
>> >> >>> the
>> >> >>> following:
>> >> >>>
>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >> >>> * Kill Cassandra
>> >> >>> * Remove all files in the saved_caches directory
>> >> >>> * Start Cassandra
>> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
>> >> >>>
>> >> >>> Cheers,
>> >> >>>
>> >> >>>        T.
>> >> >>>
>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >> >>>>
>> >> >>>>  I saw alot slicequeryfilter things if changed the log level to
>> >> >>>> DEBUG.
>> >> >>>>  just
>> >> >>>> thought even bring up a new node will be faster than start the old
>> >> >>>> one..... it
>> >> >>>> is wired
>> >> >>>>
>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647:
>> 76616c7565:false:41729@1313190821826229
>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line
>> 123)
>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <
>> springrider@gmail.com
>> >> >>>> <ma...@gmail.com>> wrote:
>> >> >>>>
>> >> >>>>    but it seems the row cache is cluster wide, how will  the
>> change
>> >> >>>> of row
>> >> >>>>    cache affect the read speed?
>> >> >>>>
>> >> >>>>
>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <
>> jbellis@gmail.com
>> >> >>>>    <ma...@gmail.com>> wrote:
>> >> >>>>
>> >> >>>>        Or leave row cache enabled but disable cache saving (and
>> >> >>>> remove the
>> >> >>>>        one already on disk).
>> >> >>>>
>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> >> >>>> <aaron@thelastpickle.com
>> >> >>>>        <ma...@thelastpickle.com>> wrote:
>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>> ColumnFamilyStore.java
>> >> >>>> (line 547)
>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache
>> for
>> >> >>>> COMMENT
>> >> >>>>         >
>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
>> >> >>>> cache.
>> >> >>>> Thats a
>> >> >>>>         > pretty big row cache, I would suggest reducing or
>> disabling
>> >> >>>> it.
>> >> >>>>         > Background
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >> >>>>         >
>> >> >>>>         > and server can not afford the load then crashed. after
>> come
>> >> >>>> back,
>> >> >>>>        node 3 can
>> >> >>>>         > not return for more than 96 hours
>> >> >>>>         >
>> >> >>>>         > Crashed how ?
>> >> >>>>         > You may be seeing
>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle
>> tree
>> >> >>>> build
>> >> >>>>        finishes
>> >> >>>>         > and nodetool netstats to see which CF's are streaming.
>> >> >>>>         > Cheers
>> >> >>>>         > -----------------
>> >> >>>>         > Aaron Morton
>> >> >>>>         > Freelance Cassandra Developer
>> >> >>>>         > @aaronmorton
>> >> >>>>         > http://www.thelastpickle.com
>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems
>> >> >>>> alot
>> >> >>>> data
>> >> >>>>         > generated.  and server can not afford the load then
>> >> >>>> crashed.
>> >> >>>>         > after come back, node 3 can not return for more than 96
>> >> >>>> hours
>> >> >>>>         >
>> >> >>>>         > for 34GB data, the node 2 could restart and back online
>> >> >>>> within 1
>> >> >>>> hour.
>> >> >>>>         >
>> >> >>>>         > I am not sure what's wrong with node3 and should I
>> restart
>> >> >>>> node
>> >> >>>> 3 again?
>> >> >>>>         > thanks!
>> >> >>>>         >
>> >> >>>>         > Address         Status State   Load            Owns
>> >> >>>>  Token
>> >> >>>>         >
>> >> >>>>         > 113427455640312821154458202477256070484
>> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>> >> >>>>         > 56713727820156410577229101238628035242
>> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>> >> >>>>         > 113427455640312821154458202477256070484
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         > the log shows it is still going on, not sure why it is
>> so
>> >> >>>> slow:
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
>> >> >>>> (line
>> >> >>>> 154)
>> >> >>>>        Opening
>> >> >>>>         > /cassandra/data/COMMENT
>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
>> ColumnFamilyStore.java
>> >> >>>> (line 275)
>> >> >>>>         > reading saved cache
>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
>> ColumnFamilyStore.java
>> >> >>>> (line 547)
>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache
>> for
>> >> >>>> COMMENT
>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
>> ColumnFamilyStore.java
>> >> >>>> (line 275)
>> >> >>>>         > reading saved cache
>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>> >> >>>>        CacheWriter.java (line
>> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>         >
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>        --
>> >> >>>>        Jonathan Ellis
>> >> >>>>        Project Chair, Apache Cassandra
>> >> >>>>        co-founder of DataStax, the source for professional
>> Cassandra
>> >> >>>> support
>> >> >>>>        http://www.datastax.com
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >
>> >> >
>> >
>> >
>>
>>
>
>

Re: node restart taking too long

Posted by aaron morton <aa...@thelastpickle.com>.
Look in the logs to work out why the migration did not get to node2.

Otherwise, yes, you can drop those files.
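
For the log check, something as rough as the following should be enough to see whether any schema or migration activity reached node2 around the time the change was made (the log path is an assumption, adjust it to wherever your install writes system.log):

grep -iE 'migration|schema' /var/log/cassandra/system.log | tail -n 50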

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:

> just found out that changes via cassandra-cli, the schema change didn't reach node2. and node2 became unreachable....
> 
> I did as this document:http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> 
> but after that I just got two schema versons:
> 
> 
> 
> 	ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
> 	2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
> 
> 
> is that enough delete Schema* && Migrations* sstables and restart the node?
> 
> 
>  
> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com> wrote:
> thanks a lot for  all the help!  I have gone through the steps and successfully brought up the node2 :)
> 
> 
> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com> wrote:
> > Because the file only preserve the "key" of records, not the whole record.
> > Records for those saved key will be loaded into cassandra during the startup
> > of cassandra.
> >
> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com> wrote:
> >>
> >> but the data size in the saved_cache are relatively small:
> >>
> >> will that cause the load problem?
> >>
> >>  ls  -lh  /cassandra/saved_caches/
> >> total 32M
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
> >> cass-CommentSortsCache-KeyCache
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
> >> cass-CommentSortsCache-RowCache
> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
> >> system-HintsColumnFamily-KeyCache
> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >>
> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com>
> >> wrote:
> >> > If you have a node that cannot start up due to issues loading the saved
> >> > cache delete the files in the saved_cache directory before starting it.
> >> >
> >> > The settings to save the row and key cache are per CF. You can change
> >> > them with an update column family statement via the CLI when attached to any
> >> > node. You may then want to check the saved_caches directory and delete any
> >> > files that are left (not sure if they are automatically deleted).
> >> >
> >> > i would recommend:
> >> > - stop node 2
> >> > - delete it's saved_cache
> >> > - make the schema change via another node
> >> > - startup node 2
> >> >
> >> > Cheers
> >> >
> >> > -----------------
> >> > Aaron Morton
> >> > Freelance Cassandra Developer
> >> > @aaronmorton
> >> > http://www.thelastpickle.com
> >> >
> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >
> >> >> does this need to be cluster wide? or I could just modify the caches
> >> >> on one node?   since I could not connect to the node with
> >> >> cassandra-cli, it says "connection refused"
> >> >>
> >> >>
> >> >> [default@unknown] connect node2/9160;
> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>
> >> >>
> >> >> so if I change the cache size via other nodes, how could node2 be
> >> >> notified the changing?    kill cassandra and start it again could make
> >> >> it update the schema?
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz>
> >> >> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> yes, we saw exactly the same messages. We got rid of these by doing
> >> >>> the
> >> >>> following:
> >> >>>
> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >> >>> * Kill Cassandra
> >> >>> * Remove all files in the saved_caches directory
> >> >>> * Start Cassandra
> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
> >> >>>
> >> >>> Cheers,
> >> >>>
> >> >>>        T.
> >> >>>
> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >> >>>>
> >> >>>>  I saw alot slicequeryfilter things if changed the log level to
> >> >>>> DEBUG.
> >> >>>>  just
> >> >>>> thought even bring up a new node will be faster than start the old
> >> >>>> one..... it
> >> >>>> is wired
> >> >>>>
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
> >> >>>> <ma...@gmail.com>> wrote:
> >> >>>>
> >> >>>>    but it seems the row cache is cluster wide, how will  the change
> >> >>>> of row
> >> >>>>    cache affect the read speed?
> >> >>>>
> >> >>>>
> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
> >> >>>>    <ma...@gmail.com>> wrote:
> >> >>>>
> >> >>>>        Or leave row cache enabled but disable cache saving (and
> >> >>>> remove the
> >> >>>>        one already on disk).
> >> >>>>
> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
> >> >>>> <aaron@thelastpickle.com
> >> >>>>        <ma...@thelastpickle.com>> wrote:
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
> >> >>>> (line 547)
> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
> >> >>>> COMMENT
> >> >>>>         >
> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
> >> >>>> cache.
> >> >>>> Thats a
> >> >>>>         > pretty big row cache, I would suggest reducing or disabling
> >> >>>> it.
> >> >>>>         > Background
> >> >>>>
> >> >>>>
> >> >>>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >> >>>>         >
> >> >>>>         > and server can not afford the load then crashed. after come
> >> >>>> back,
> >> >>>>        node 3 can
> >> >>>>         > not return for more than 96 hours
> >> >>>>         >
> >> >>>>         > Crashed how ?
> >> >>>>         > You may be seeing
> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree
> >> >>>> build
> >> >>>>        finishes
> >> >>>>         > and nodetool netstats to see which CF's are streaming.
> >> >>>>         > Cheers
> >> >>>>         > -----------------
> >> >>>>         > Aaron Morton
> >> >>>>         > Freelance Cassandra Developer
> >> >>>>         > @aaronmorton
> >> >>>>         > http://www.thelastpickle.com
> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >> >>>>         >
> >> >>>>         >
> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems
> >> >>>> alot
> >> >>>> data
> >> >>>>         > generated.  and server can not afford the load then
> >> >>>> crashed.
> >> >>>>         > after come back, node 3 can not return for more than 96
> >> >>>> hours
> >> >>>>         >
> >> >>>>         > for 34GB data, the node 2 could restart and back online
> >> >>>> within 1
> >> >>>> hour.
> >> >>>>         >
> >> >>>>         > I am not sure what's wrong with node3 and should I restart
> >> >>>> node
> >> >>>> 3 again?
> >> >>>>         > thanks!
> >> >>>>         >
> >> >>>>         > Address         Status State   Load            Owns  
> >> >>>>  Token
> >> >>>>         >
> >> >>>>         > 113427455640312821154458202477256070484
> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
> >> >>>>         > 56713727820156410577229101238628035242
> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
> >> >>>>         > 113427455640312821154458202477256070484
> >> >>>>         >
> >> >>>>         >
> >> >>>>         > the log shows it is still going on, not sure why it is so
> >> >>>> slow:
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
> >> >>>> (line
> >> >>>> 154)
> >> >>>>        Opening
> >> >>>>         > /cassandra/data/COMMENT
> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java
> >> >>>> (line 275)
> >> >>>>         > reading saved cache
> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
> >> >>>> (line 547)
> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
> >> >>>> COMMENT
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java
> >> >>>> (line 275)
> >> >>>>         > reading saved cache
> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
> >> >>>>        CacheWriter.java (line
> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>        --
> >> >>>>        Jonathan Ellis
> >> >>>>        Project Chair, Apache Cassandra
> >> >>>>        co-founder of DataStax, the source for professional Cassandra
> >> >>>> support
> >> >>>>        http://www.datastax.com
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >
> >> >
> >
> >
> 
> 


Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
Just found out that when I made changes via cassandra-cli, the schema change didn't
reach node2, and node2 became unreachable....

I followed this document:
http://wiki.apache.org/cassandra/FAQ#schema_disagreement

but after that I just got two schema versions:



ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]


is it enough to delete the Schema* && Migrations* sstables and restart the node?
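
For reference, the FAQ steps boil down to roughly the following on the disagreeing node (node2 here); this is only a sketch, and the system keyspace path is just the one visible in the logs earlier in this thread:

# stop cassandra on node2 first
mkdir -p /tmp/schema-backup
mv /cassandra/data/system/Schema* /tmp/schema-backup/
mv /cassandra/data/system/Migrations* /tmp/schema-backup/
# start cassandra on node2 again; once gossip is up it should fetch
# the current schema definitions from a live node

Moving the files aside instead of deleting them leaves a way back if anything goes wrong.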



On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <sp...@gmail.com> wrote:

> thanks a lot for  all the help!  I have gone through the steps and
> successfully brought up the node2 :)
>
>
> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com> wrote:
> > Because the file only preserve the "key" of records, not the whole
> record.
> > Records for those saved key will be loaded into cassandra during the
> startup
> > of cassandra.
> >
> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com>
> wrote:
> >>
> >> but the data size in the saved_cache are relatively small:
> >>
> >> will that cause the load problem?
> >>
> >>  ls  -lh  /cassandra/saved_caches/
> >> total 32M
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
> >> cass-CommentSortsCache-KeyCache
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
> >> cass-CommentSortsCache-RowCache
> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
> cass-SavesByAccount-KeyCache
> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
> >> system-HintsColumnFamily-KeyCache
> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
> system-LocationInfo-KeyCache
> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >>
> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com>
> >> wrote:
> >> > If you have a node that cannot start up due to issues loading the
> saved
> >> > cache delete the files in the saved_cache directory before starting
> it.
> >> >
> >> > The settings to save the row and key cache are per CF. You can change
> >> > them with an update column family statement via the CLI when attached
> to any
> >> > node. You may then want to check the saved_caches directory and delete
> any
> >> > files that are left (not sure if they are automatically deleted).
> >> >
> >> > i would recommend:
> >> > - stop node 2
> >> > - delete it's saved_cache
> >> > - make the schema change via another node
> >> > - startup node 2
> >> >
> >> > Cheers
> >> >
> >> > -----------------
> >> > Aaron Morton
> >> > Freelance Cassandra Developer
> >> > @aaronmorton
> >> > http://www.thelastpickle.com
> >> >
> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >
> >> >> does this need to be cluster wide? or I could just modify the caches
> >> >> on one node?   since I could not connect to the node with
> >> >> cassandra-cli, it says "connection refused"
> >> >>
> >> >>
> >> >> [default@unknown] connect node2/9160;
> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>
> >> >>
> >> >> so if I change the cache size via other nodes, how could node2 be
> >> >> notified the changing?    kill cassandra and start it again could
> make
> >> >> it update the schema?
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz>
> >> >> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> yes, we saw exactly the same messages. We got rid of these by doing
> >> >>> the
> >> >>> following:
> >> >>>
> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >> >>> * Kill Cassandra
> >> >>> * Remove all files in the saved_caches directory
> >> >>> * Start Cassandra
> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
> >> >>>
> >> >>> Cheers,
> >> >>>
> >> >>>        T.
> >> >>>
> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >> >>>>
> >> >>>>  I saw alot slicequeryfilter things if changed the log level to
> >> >>>> DEBUG.
> >> >>>>  just
> >> >>>> thought even bring up a new node will be faster than start the old
> >> >>>> one..... it
> >> >>>> is wired
> >> >>>>
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647:
> 76616c7565:false:41729@1313190821826229
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line
> 123)
> >> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
> >> >>>> <ma...@gmail.com>> wrote:
> >> >>>>
> >> >>>>    but it seems the row cache is cluster wide, how will  the change
> >> >>>> of row
> >> >>>>    cache affect the read speed?
> >> >>>>
> >> >>>>
> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <
> jbellis@gmail.com
> >> >>>>    <ma...@gmail.com>> wrote:
> >> >>>>
> >> >>>>        Or leave row cache enabled but disable cache saving (and
> >> >>>> remove the
> >> >>>>        one already on disk).
> >> >>>>
> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
> >> >>>> <aaron@thelastpickle.com
> >> >>>>        <ma...@thelastpickle.com>> wrote:
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
> ColumnFamilyStore.java
> >> >>>> (line 547)
> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
> >> >>>> COMMENT
> >> >>>>         >
> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
> >> >>>> cache.
> >> >>>> Thats a
> >> >>>>         > pretty big row cache, I would suggest reducing or
> disabling
> >> >>>> it.
> >> >>>>         > Background
> >> >>>>
> >> >>>>
> >> >>>>
> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >> >>>>         >
> >> >>>>         > and server can not afford the load then crashed. after
> come
> >> >>>> back,
> >> >>>>        node 3 can
> >> >>>>         > not return for more than 96 hours
> >> >>>>         >
> >> >>>>         > Crashed how ?
> >> >>>>         > You may be seeing
> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
> >> >>>>         > Watch nodetool compactionstats to see when the Merkle
> tree
> >> >>>> build
> >> >>>>        finishes
> >> >>>>         > and nodetool netstats to see which CF's are streaming.
> >> >>>>         > Cheers
> >> >>>>         > -----------------
> >> >>>>         > Aaron Morton
> >> >>>>         > Freelance Cassandra Developer
> >> >>>>         > @aaronmorton
> >> >>>>         > http://www.thelastpickle.com
> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >> >>>>         >
> >> >>>>         >
> >> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems
> >> >>>> alot
> >> >>>> data
> >> >>>>         > generated.  and server can not afford the load then
> >> >>>> crashed.
> >> >>>>         > after come back, node 3 can not return for more than 96
> >> >>>> hours
> >> >>>>         >
> >> >>>>         > for 34GB data, the node 2 could restart and back online
> >> >>>> within 1
> >> >>>> hour.
> >> >>>>         >
> >> >>>>         > I am not sure what's wrong with node3 and should I
> restart
> >> >>>> node
> >> >>>> 3 again?
> >> >>>>         > thanks!
> >> >>>>         >
> >> >>>>         > Address         Status State   Load            Owns
> >> >>>>  Token
> >> >>>>         >
> >> >>>>         > 113427455640312821154458202477256070484
> >> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
> >> >>>>         > node2     Up     Normal  31.44 GB        33.33%
> >> >>>>         > 56713727820156410577229101238628035242
> >> >>>>         > node3     Down   Normal  177.55 GB       33.33%
> >> >>>>         > 113427455640312821154458202477256070484
> >> >>>>         >
> >> >>>>         >
> >> >>>>         > the log shows it is still going on, not sure why it is so
> >> >>>> slow:
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
> >> >>>> (line
> >> >>>> 154)
> >> >>>>        Opening
> >> >>>>         > /cassandra/data/COMMENT
> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
> ColumnFamilyStore.java
> >> >>>> (line 275)
> >> >>>>         > reading saved cache
> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
> ColumnFamilyStore.java
> >> >>>> (line 547)
> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
> >> >>>> COMMENT
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
> ColumnFamilyStore.java
> >> >>>> (line 275)
> >> >>>>         > reading saved cache
> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
> >> >>>>        CacheWriter.java (line
> >> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>        --
> >> >>>>        Jonathan Ellis
> >> >>>>        Project Chair, Apache Cassandra
> >> >>>>        co-founder of DataStax, the source for professional
> Cassandra
> >> >>>> support
> >> >>>>        http://www.datastax.com
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >
> >> >
> >
> >
>
>

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
thanks a lot for all the help!  I have gone through the steps and
successfully brought node2 back up :)

On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yu...@gmail.com> wrote:
> Because the file only preserve the "key" of records, not the whole record.
> Records for those saved key will be loaded into cassandra during the
startup
> of cassandra.
>
> On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com> wrote:
>>
>> but the data size in the saved_cache are relatively small:
>>
>> will that cause the load problem?
>>
>>  ls  -lh  /cassandra/saved_caches/
>> total 32M
>> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>> cass-CommentSortsCache-KeyCache
>> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>> cass-CommentSortsCache-RowCache
>> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>> system-HintsColumnFamily-KeyCache
>> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>
>> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>> > If you have a node that cannot start up due to issues loading the saved
>> > cache delete the files in the saved_cache directory before starting it.
>> >
>> > The settings to save the row and key cache are per CF. You can change
>> > them with an update column family statement via the CLI when attached
to any
>> > node. You may then want to check the saved_caches directory and delete
any
>> > files that are left (not sure if they are automatically deleted).
>> >
>> > i would recommend:
>> > - stop node 2
>> > - delete it's saved_cache
>> > - make the schema change via another node
>> > - startup node 2
>> >
>> > Cheers
>> >
>> > -----------------
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >
>> >> does this need to be cluster wide? or I could just modify the caches
>> >> on one node?   since I could not connect to the node with
>> >> cassandra-cli, it says "connection refused"
>> >>
>> >>
>> >> [default@unknown] connect node2/9160;
>> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >>
>> >>
>> >> so if I change the cache size via other nodes, how could node2 be
>> >> notified the changing?    kill cassandra and start it again could make
>> >> it update the schema?
>> >>
>> >>
>> >>
>> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz>
>> >> wrote:
>> >>> Hi,
>> >>>
>> >>> yes, we saw exactly the same messages. We got rid of these by doing
>> >>> the
>> >>> following:
>> >>>
>> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >>> * Kill Cassandra
>> >>> * Remove all files in the saved_caches directory
>> >>> * Start Cassandra
>> >>> * Slowly bring back row & key caches (if desired, we left them off)
>> >>>
>> >>> Cheers,
>> >>>
>> >>>        T.
>> >>>
>> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >>>>
>> >>>>  I saw alot slicequeryfilter things if changed the log level to
>> >>>> DEBUG.
>> >>>>  just
>> >>>> thought even bring up a new node will be faster than start the old
>> >>>> one..... it
>> >>>> is wired
>> >>>>
>> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line
123)
>> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
>> >>>> <ma...@gmail.com>> wrote:
>> >>>>
>> >>>>    but it seems the row cache is cluster wide, how will  the change
>> >>>> of row
>> >>>>    cache affect the read speed?
>> >>>>
>> >>>>
>> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <
jbellis@gmail.com
>> >>>>    <ma...@gmail.com>> wrote:
>> >>>>
>> >>>>        Or leave row cache enabled but disable cache saving (and
>> >>>> remove the
>> >>>>        one already on disk).
>> >>>>
>> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> >>>> <aaron@thelastpickle.com
>> >>>>        <ma...@thelastpickle.com>> wrote:
>> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
ColumnFamilyStore.java
>> >>>> (line 547)
>> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
>> >>>> COMMENT
>> >>>>         >
>> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
>> >>>> cache.
>> >>>> Thats a
>> >>>>         > pretty big row cache, I would suggest reducing or
disabling
>> >>>> it.
>> >>>>         > Background
>> >>>>
>> >>>>
>> >>>>
http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >>>>         >
>> >>>>         > and server can not afford the load then crashed. after
come
>> >>>> back,
>> >>>>        node 3 can
>> >>>>         > not return for more than 96 hours
>> >>>>         >
>> >>>>         > Crashed how ?
>> >>>>         > You may be seeing
>> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >>>>         > Watch nodetool compactionstats to see when the Merkle tree
>> >>>> build
>> >>>>        finishes
>> >>>>         > and nodetool netstats to see which CF's are streaming.
>> >>>>         > Cheers
>> >>>>         > -----------------
>> >>>>         > Aaron Morton
>> >>>>         > Freelance Cassandra Developer
>> >>>>         > @aaronmorton
>> >>>>         > http://www.thelastpickle.com
>> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >>>>         >
>> >>>>         >
>> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems
>> >>>> alot
>> >>>> data
>> >>>>         > generated.  and server can not afford the load then
>> >>>> crashed.
>> >>>>         > after come back, node 3 can not return for more than 96
>> >>>> hours
>> >>>>         >
>> >>>>         > for 34GB data, the node 2 could restart and back online
>> >>>> within 1
>> >>>> hour.
>> >>>>         >
>> >>>>         > I am not sure what's wrong with node3 and should I restart
>> >>>> node
>> >>>> 3 again?
>> >>>>         > thanks!
>> >>>>         >
>> >>>>         > Address         Status State   Load            Owns
>> >>>>  Token
>> >>>>         >
>> >>>>         > 113427455640312821154458202477256070484
>> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>> >>>>         > node2     Up     Normal  31.44 GB        33.33%
>> >>>>         > 56713727820156410577229101238628035242
>> >>>>         > node3     Down   Normal  177.55 GB       33.33%
>> >>>>         > 113427455640312821154458202477256070484
>> >>>>         >
>> >>>>         >
>> >>>>         > the log shows it is still going on, not sure why it is so
>> >>>> slow:
>> >>>>         >
>> >>>>         >
>> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
>> >>>> (line
>> >>>> 154)
>> >>>>        Opening
>> >>>>         > /cassandra/data/COMMENT
>> >>>>         >  INFO [main] 2011-08-14 08:55:47,828
ColumnFamilyStore.java
>> >>>> (line 275)
>> >>>>         > reading saved cache
>> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >>>>         >  INFO [main] 2011-08-14 09:24:52,198
ColumnFamilyStore.java
>> >>>> (line 547)
>> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
>> >>>> COMMENT
>> >>>>         >  INFO [main] 2011-08-14 09:24:52,299
ColumnFamilyStore.java
>> >>>> (line 275)
>> >>>>         > reading saved cache
>> >>>> /cassandra/saved_caches/COMMENT-RowCache
>> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>> >>>>        CacheWriter.java (line
>> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>         >
>> >>>>
>> >>>>
>> >>>>
>> >>>>        --
>> >>>>        Jonathan Ellis
>> >>>>        Project Chair, Apache Cassandra
>> >>>>        co-founder of DataStax, the source for professional Cassandra
>> >>>> support
>> >>>>        http://www.datastax.com
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >
>> >
>
>

Re: node restart taking too long

Posted by Boris Yen <yu...@gmail.com>.
Because the file only preserves the "keys" of the records, not the whole
records. The records for those saved keys are loaded back into Cassandra
during startup.
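
As a rough back-of-envelope from the log lines quoted below: loading the
COMMENT row cache took 1744370 ms for 200000 keys, i.e. about 8.7 ms per
key, because each saved key has to be read back from the sstables:

  200000 keys * ~8.7 ms/key  ~=  1,744,000 ms  ~=  29 minutes

so a saved_caches file of only a few MB can still take a very long time to
load at startup.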

On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <sp...@gmail.com> wrote:

> but the data size in the saved_cache are relatively small:
>
> will that cause the load problem?
>
>  ls  -lh  /cassandra/saved_caches/
> total 32M
> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
> cass-CommentSortsCache-KeyCache
> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
> cass-CommentSortsCache-RowCache
> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
> system-HintsColumnFamily-KeyCache
> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>
> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
> > If you have a node that cannot start up due to issues loading the saved
> cache delete the files in the saved_cache directory before starting it.
> >
> > The settings to save the row and key cache are per CF. You can change
> them with an update column family statement via the CLI when attached to any
> node. You may then want to check the saved_caches directory and delete any
> files that are left (not sure if they are automatically deleted).
> >
> > i would recommend:
> > - stop node 2
> > - delete it's saved_cache
> > - make the schema change via another node
> > - startup node 2
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >
> >> does this need to be cluster wide? or I could just modify the caches
> >> on one node?   since I could not connect to the node with
> >> cassandra-cli, it says "connection refused"
> >>
> >>
> >> [default@unknown] connect node2/9160;
> >> Exception connecting to node2/9160. Reason: Connection refused.
> >>
> >>
> >> so if I change the cache size via other nodes, how could node2 be
> >> notified the changing?    kill cassandra and start it again could make
> >> it update the schema?
> >>
> >>
> >>
> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz>
> wrote:
> >>> Hi,
> >>>
> >>> yes, we saw exactly the same messages. We got rid of these by doing the
> >>> following:
> >>>
> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >>> * Kill Cassandra
> >>> * Remove all files in the saved_caches directory
> >>> * Start Cassandra
> >>> * Slowly bring back row & key caches (if desired, we left them off)
> >>>
> >>> Cheers,
> >>>
> >>>        T.
> >>>
> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >>>>
> >>>>  I saw alot slicequeryfilter things if changed the log level to DEBUG.
> >>>>  just
> >>>> thought even bring up a new node will be faster than start the old
> >>>> one..... it
> >>>> is wired
> >>>>
> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
> >>>> <ma...@gmail.com>> wrote:
> >>>>
> >>>>    but it seems the row cache is cluster wide, how will  the change of
> row
> >>>>    cache affect the read speed?
> >>>>
> >>>>
> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
> >>>>    <ma...@gmail.com>> wrote:
> >>>>
> >>>>        Or leave row cache enabled but disable cache saving (and remove
> the
> >>>>        one already on disk).
> >>>>
> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
> >>>> <aaron@thelastpickle.com
> >>>>        <ma...@thelastpickle.com>> wrote:
> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
> >>>> (line 547)
> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
> >>>> COMMENT
> >>>>         >
> >>>>         > It's taking 29 minutes to load 200,000 rows in the  row
> cache.
> >>>> Thats a
> >>>>         > pretty big row cache, I would suggest reducing or disabling
> it.
> >>>>         > Background
> >>>>
> >>>>
> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >>>>         >
> >>>>         > and server can not afford the load then crashed. after come
> >>>> back,
> >>>>        node 3 can
> >>>>         > not return for more than 96 hours
> >>>>         >
> >>>>         > Crashed how ?
> >>>>         > You may be seeing
> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
> >>>>         > Watch nodetool compactionstats to see when the Merkle tree
> build
> >>>>        finishes
> >>>>         > and nodetool netstats to see which CF's are streaming.
> >>>>         > Cheers
> >>>>         > -----------------
> >>>>         > Aaron Morton
> >>>>         > Freelance Cassandra Developer
> >>>>         > @aaronmorton
> >>>>         > http://www.thelastpickle.com
> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >>>>         >
> >>>>         >
> >>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems
> alot
> >>>> data
> >>>>         > generated.  and server can not afford the load then crashed.
> >>>>         > after come back, node 3 can not return for more than 96
> hours
> >>>>         >
> >>>>         > for 34GB data, the node 2 could restart and back online
> within 1
> >>>> hour.
> >>>>         >
> >>>>         > I am not sure what's wrong with node3 and should I restart
> node
> >>>> 3 again?
> >>>>         > thanks!
> >>>>         >
> >>>>         > Address         Status State   Load            Owns    Token
> >>>>         >
> >>>>         > 113427455640312821154458202477256070484
> >>>>         > node1     Up     Normal  34.11 GB        33.33%  0
> >>>>         > node2     Up     Normal  31.44 GB        33.33%
> >>>>         > 56713727820156410577229101238628035242
> >>>>         > node3     Down   Normal  177.55 GB       33.33%
> >>>>         > 113427455640312821154458202477256070484
> >>>>         >
> >>>>         >
> >>>>         > the log shows it is still going on, not sure why it is so
> slow:
> >>>>         >
> >>>>         >
> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java
> (line
> >>>> 154)
> >>>>        Opening
> >>>>         > /cassandra/data/COMMENT
> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java
> >>>> (line 275)
> >>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
> >>>> (line 547)
> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for
> >>>> COMMENT
> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java
> >>>> (line 275)
> >>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
> >>>>        CacheWriter.java (line
> >>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >>>>         >
> >>>>         >
> >>>>         >
> >>>>         >
> >>>>         >
> >>>>         >
> >>>>
> >>>>
> >>>>
> >>>>        --
> >>>>        Jonathan Ellis
> >>>>        Project Chair, Apache Cassandra
> >>>>        co-founder of DataStax, the source for professional Cassandra
> >>>> support
> >>>>        http://www.datastax.com
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >
> >
>

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
but the data size in saved_caches is relatively small:

will that cause the loading problem?

 ls  -lh  /cassandra/saved_caches/
total 32M
-rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
-rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
-rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
-rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
-rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
-rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
-rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
-rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
-rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
-rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
-rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
-rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
-rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
-rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
-rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
-rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
-rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
-rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache

On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aa...@thelastpickle.com> wrote:
> If you have a node that cannot start up due to issues loading the saved cache delete the files in the saved_cache directory before starting it.
>
> The settings to save the row and key cache are per CF. You can change them with an update column family statement via the CLI when attached to any node. You may then want to check the saved_caches directory and delete any files that are left (not sure if they are automatically deleted).
>
> i would recommend:
> - stop node 2
> - delete it's saved_cache
> - make the schema change via another node
> - startup node 2
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>
>> does this need to be cluster wide? or I could just modify the caches
>> on one node?   since I could not connect to the node with
>> cassandra-cli, it says "connection refused"
>>
>>
>> [default@unknown] connect node2/9160;
>> Exception connecting to node2/9160. Reason: Connection refused.
>>
>>
>> so if I change the cache size via other nodes, how could node2 be
>> notified the changing?    kill cassandra and start it again could make
>> it update the schema?
>>
>>
>>
>> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz> wrote:
>>> Hi,
>>>
>>> yes, we saw exactly the same messages. We got rid of these by doing the
>>> following:
>>>
>>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>>> * Kill Cassandra
>>> * Remove all files in the saved_caches directory
>>> * Start Cassandra
>>> * Slowly bring back row & key caches (if desired, we left them off)
>>>
>>> Cheers,
>>>
>>>        T.
>>>
>>> On 16/08/11 23:35, Yan Chunlu wrote:
>>>>
>>>>  I saw alot slicequeryfilter things if changed the log level to DEBUG.
>>>>  just
>>>> thought even bring up a new node will be faster than start the old
>>>> one..... it
>>>> is wired
>>>>
>>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>>>>
>>>>
>>>>
>>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
>>>> <ma...@gmail.com>> wrote:
>>>>
>>>>    but it seems the row cache is cluster wide, how will  the change of row
>>>>    cache affect the read speed?
>>>>
>>>>
>>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
>>>>    <ma...@gmail.com>> wrote:
>>>>
>>>>        Or leave row cache enabled but disable cache saving (and remove the
>>>>        one already on disk).
>>>>
>>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>>>> <aaron@thelastpickle.com
>>>>        <ma...@thelastpickle.com>> wrote:
>>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>>>> (line 547)
>>>>         > completed loading (1744370 ms; 200000 keys) row cache for
>>>> COMMENT
>>>>         >
>>>>         > It's taking 29 minutes to load 200,000 rows in the  row cache.
>>>> Thats a
>>>>         > pretty big row cache, I would suggest reducing or disabling it.
>>>>         > Background
>>>>
>>>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>>>         >
>>>>         > and server can not afford the load then crashed. after come
>>>> back,
>>>>        node 3 can
>>>>         > not return for more than 96 hours
>>>>         >
>>>>         > Crashed how ?
>>>>         > You may be seeing
>>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>>>         > Watch nodetool compactionstats to see when the Merkle tree build
>>>>        finishes
>>>>         > and nodetool netstats to see which CF's are streaming.
>>>>         > Cheers
>>>>         > -----------------
>>>>         > Aaron Morton
>>>>         > Freelance Cassandra Developer
>>>>         > @aaronmorton
>>>>         > http://www.thelastpickle.com
>>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>>>         >
>>>>         >
>>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot
>>>> data
>>>>         > generated.  and server can not afford the load then crashed.
>>>>         > after come back, node 3 can not return for more than 96 hours
>>>>         >
>>>>         > for 34GB data, the node 2 could restart and back online within 1
>>>> hour.
>>>>         >
>>>>         > I am not sure what's wrong with node3 and should I restart node
>>>> 3 again?
>>>>         > thanks!
>>>>         >
>>>>         > Address         Status State   Load            Owns    Token
>>>>         >
>>>>         > 113427455640312821154458202477256070484
>>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>>>>         > node2     Up     Normal  31.44 GB        33.33%
>>>>         > 56713727820156410577229101238628035242
>>>>         > node3     Down   Normal  177.55 GB       33.33%
>>>>         > 113427455640312821154458202477256070484
>>>>         >
>>>>         >
>>>>         > the log shows it is still going on, not sure why it is so slow:
>>>>         >
>>>>         >
>>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line
>>>> 154)
>>>>        Opening
>>>>         > /cassandra/data/COMMENT
>>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java
>>>> (line 275)
>>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>>>> (line 547)
>>>>         > completed loading (1744370 ms; 200000 keys) row cache for
>>>> COMMENT
>>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java
>>>> (line 275)
>>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>>>>        CacheWriter.java (line
>>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>>>         >
>>>>         >
>>>>         >
>>>>         >
>>>>         >
>>>>         >
>>>>
>>>>
>>>>
>>>>        --
>>>>        Jonathan Ellis
>>>>        Project Chair, Apache Cassandra
>>>>        co-founder of DataStax, the source for professional Cassandra
>>>> support
>>>>        http://www.datastax.com
>>>>
>>>>
>>>>
>>>
>>>
>
>

Re: node restart taking too long

Posted by aaron morton <aa...@thelastpickle.com>.
If you have a node that cannot start up due to issues loading the saved cache, delete the files in the saved_caches directory before starting it. 

The settings to save the row and key cache are per CF. You can change them with an update column family statement via the CLI when attached to any node. You may then want to check the saved_caches directory and delete any files that are left (not sure if they are automatically deleted). 

I would recommend (a rough sketch of the commands follows below):
- stop node 2
- delete its saved_caches
- make the schema change via another node
- start up node 2
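
Something like this (a rough sketch only; the keyspace name is a
placeholder, COMMENT is just one example CF, and the CLI syntax follows the
connect form used earlier in this thread):

  # on node2
  kill <cassandra pid>
  rm /cassandra/saved_caches/*

  # on any live node, e.g. node1
  cassandra-cli
  [default@unknown] connect node1/9160;
  [default@unknown] use <your_keyspace>;
  [default@<your_keyspace>] update column family COMMENT with rows_cached = 0;

  # then start cassandra on node2 again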

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:

> does this need to be cluster wide? or I could just modify the caches
> on one node?   since I could not connect to the node with
> cassandra-cli, it says "connection refused"
> 
> 
> [default@unknown] connect node2/9160;
> Exception connecting to node2/9160. Reason: Connection refused.
> 
> 
> so if I change the cache size via other nodes, how could node2 be
> notified the changing?    kill cassandra and start it again could make
> it update the schema?
> 
> 
> 
> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz> wrote:
>> Hi,
>> 
>> yes, we saw exactly the same messages. We got rid of these by doing the
>> following:
>> 
>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> * Kill Cassandra
>> * Remove all files in the saved_caches directory
>> * Start Cassandra
>> * Slowly bring back row & key caches (if desired, we left them off)
>> 
>> Cheers,
>> 
>>        T.
>> 
>> On 16/08/11 23:35, Yan Chunlu wrote:
>>> 
>>>  I saw alot slicequeryfilter things if changed the log level to DEBUG.
>>>  just
>>> thought even bring up a new node will be faster than start the old
>>> one..... it
>>> is wired
>>> 
>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>>> 
>>> 
>>> 
>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
>>> <ma...@gmail.com>> wrote:
>>> 
>>>    but it seems the row cache is cluster wide, how will  the change of row
>>>    cache affect the read speed?
>>> 
>>> 
>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
>>>    <ma...@gmail.com>> wrote:
>>> 
>>>        Or leave row cache enabled but disable cache saving (and remove the
>>>        one already on disk).
>>> 
>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>>> <aaron@thelastpickle.com
>>>        <ma...@thelastpickle.com>> wrote:
>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>>> (line 547)
>>>         > completed loading (1744370 ms; 200000 keys) row cache for
>>> COMMENT
>>>         >
>>>         > It's taking 29 minutes to load 200,000 rows in the  row cache.
>>> Thats a
>>>         > pretty big row cache, I would suggest reducing or disabling it.
>>>         > Background
>>> 
>>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>>         >
>>>         > and server can not afford the load then crashed. after come
>>> back,
>>>        node 3 can
>>>         > not return for more than 96 hours
>>>         >
>>>         > Crashed how ?
>>>         > You may be seeing
>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>>         > Watch nodetool compactionstats to see when the Merkle tree build
>>>        finishes
>>>         > and nodetool netstats to see which CF's are streaming.
>>>         > Cheers
>>>         > -----------------
>>>         > Aaron Morton
>>>         > Freelance Cassandra Developer
>>>         > @aaronmorton
>>>         > http://www.thelastpickle.com
>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>>         >
>>>         >
>>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot
>>> data
>>>         > generated.  and server can not afford the load then crashed.
>>>         > after come back, node 3 can not return for more than 96 hours
>>>         >
>>>         > for 34GB data, the node 2 could restart and back online within 1
>>> hour.
>>>         >
>>>         > I am not sure what's wrong with node3 and should I restart node
>>> 3 again?
>>>         > thanks!
>>>         >
>>>         > Address         Status State   Load            Owns    Token
>>>         >
>>>         > 113427455640312821154458202477256070484
>>>         > node1     Up     Normal  34.11 GB        33.33%  0
>>>         > node2     Up     Normal  31.44 GB        33.33%
>>>         > 56713727820156410577229101238628035242
>>>         > node3     Down   Normal  177.55 GB       33.33%
>>>         > 113427455640312821154458202477256070484
>>>         >
>>>         >
>>>         > the log shows it is still going on, not sure why it is so slow:
>>>         >
>>>         >
>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line
>>> 154)
>>>        Opening
>>>         > /cassandra/data/COMMENT
>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java
>>> (line 275)
>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>>> (line 547)
>>>         > completed loading (1744370 ms; 200000 keys) row cache for
>>> COMMENT
>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java
>>> (line 275)
>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>>>        CacheWriter.java (line
>>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>>         >
>>>         >
>>>         >
>>>         >
>>>         >
>>>         >
>>> 
>>> 
>>> 
>>>        --
>>>        Jonathan Ellis
>>>        Project Chair, Apache Cassandra
>>>        co-founder of DataStax, the source for professional Cassandra
>>> support
>>>        http://www.datastax.com
>>> 
>>> 
>>> 
>> 
>> 


Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
does this need to be cluster wide, or could I just modify the caches
on one node?   I ask because I could not connect to that node with
cassandra-cli; it says "connection refused"


[default@unknown] connect node2/9160;
Exception connecting to node2/9160. Reason: Connection refused.


so if I change the cache size via other nodes, how would node2 be
notified of the change?    would killing cassandra and starting it again
make it pick up the new schema?



On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <th...@wetafx.co.nz> wrote:
> Hi,
>
> yes, we saw exactly the same messages. We got rid of these by doing the
> following:
>
> * Set all row & key caches in your CFs to 0 via cassandra-cli
> * Kill Cassandra
> * Remove all files in the saved_caches directory
> * Start Cassandra
> * Slowly bring back row & key caches (if desired, we left them off)
>
> Cheers,
>
>        T.
>
> On 16/08/11 23:35, Yan Chunlu wrote:
>>
>>  I saw alot slicequeryfilter things if changed the log level to DEBUG.
>>  just
>> thought even bring up a new node will be faster than start the old
>> one..... it
>> is wired
>>
>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>>
>>
>>
>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>    but it seems the row cache is cluster wide, how will  the change of row
>>    cache affect the read speed?
>>
>>
>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
>>    <ma...@gmail.com>> wrote:
>>
>>        Or leave row cache enabled but disable cache saving (and remove the
>>        one already on disk).
>>
>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> <aaron@thelastpickle.com
>>        <ma...@thelastpickle.com>> wrote:
>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>> (line 547)
>>         > completed loading (1744370 ms; 200000 keys) row cache for
>> COMMENT
>>         >
>>         > It's taking 29 minutes to load 200,000 rows in the  row cache.
>> Thats a
>>         > pretty big row cache, I would suggest reducing or disabling it.
>>         > Background
>>
>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>         >
>>         > and server can not afford the load then crashed. after come
>> back,
>>        node 3 can
>>         > not return for more than 96 hours
>>         >
>>         > Crashed how ?
>>         > You may be seeing
>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>         > Watch nodetool compactionstats to see when the Merkle tree build
>>        finishes
>>         > and nodetool netstats to see which CF's are streaming.
>>         > Cheers
>>         > -----------------
>>         > Aaron Morton
>>         > Freelance Cassandra Developer
>>         > @aaronmorton
>>         > http://www.thelastpickle.com
>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>         >
>>         >
>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot
>> data
>>         > generated.  and server can not afford the load then crashed.
>>         > after come back, node 3 can not return for more than 96 hours
>>         >
>>         > for 34GB data, the node 2 could restart and back online within 1
>> hour.
>>         >
>>         > I am not sure what's wrong with node3 and should I restart node
>> 3 again?
>>         > thanks!
>>         >
>>         > Address         Status State   Load            Owns    Token
>>         >
>>         > 113427455640312821154458202477256070484
>>         > node1     Up     Normal  34.11 GB        33.33%  0
>>         > node2     Up     Normal  31.44 GB        33.33%
>>         > 56713727820156410577229101238628035242
>>         > node3     Down   Normal  177.55 GB       33.33%
>>         > 113427455640312821154458202477256070484
>>         >
>>         >
>>         > the log shows it is still going on, not sure why it is so slow:
>>         >
>>         >
>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line
>> 154)
>>        Opening
>>         > /cassandra/data/COMMENT
>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java
>> (line 275)
>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>> (line 547)
>>         > completed loading (1744370 ms; 200000 keys) row cache for
>> COMMENT
>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java
>> (line 275)
>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>>        CacheWriter.java (line
>>         > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >
>>
>>
>>
>>        --
>>        Jonathan Ellis
>>        Project Chair, Apache Cassandra
>>        co-founder of DataStax, the source for professional Cassandra
>> support
>>        http://www.datastax.com
>>
>>
>>
>
>

Re: node restart taking too long

Posted by Teijo Holzer <th...@wetafx.co.nz>.
Hi,

yes, we saw exactly the same messages. We got rid of these by doing the following:

* Set all row & key caches in your CFs to 0 via cassandra-cli (see the sketch below)
* Kill Cassandra
* Remove all files in the saved_caches directory
* Start Cassandra
* Slowly bring back row & key caches (if desired, we left them off)
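
For example (a rough sketch; the keyspace name, CF names and cache sizes
are placeholders, and the exact attribute names can be checked with "help
update column family;" in cassandra-cli):

  [default@your_keyspace] update column family COMMENT with rows_cached = 0 and keys_cached = 0;
  # ... repeat for every column family ...

  # later, once the node is stable again, re-enable gradually, e.g.
  [default@your_keyspace] update column family COMMENT with keys_cached = 200000;
  [default@your_keyspace] update column family COMMENT with rows_cached = 20000;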

Cheers,

	T.

On 16/08/11 23:35, Yan Chunlu wrote:
>   I saw alot slicequeryfilter things if changed the log level to DEBUG.  just
> thought even bring up a new node will be faster than start the old one..... it
> is wired
>
> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>
>
>
> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
> <ma...@gmail.com>> wrote:
>
>     but it seems the row cache is cluster wide, how will  the change of row
>     cache affect the read speed?
>
>
>     On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
>     <ma...@gmail.com>> wrote:
>
>         Or leave row cache enabled but disable cache saving (and remove the
>         one already on disk).
>
>         On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aaron@thelastpickle.com
>         <ma...@thelastpickle.com>> wrote:
>          >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>          > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>          >
>          > It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
>          > pretty big row cache, I would suggest reducing or disabling it.
>          > Background
>         http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>          >
>          > and server can not afford the load then crashed. after come back,
>         node 3 can
>          > not return for more than 96 hours
>          >
>          > Crashed how ?
>          > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>          > Watch nodetool compactionstats to see when the Merkle tree build
>         finishes
>          > and nodetool netstats to see which CF's are streaming.
>          > Cheers
>          > -----------------
>          > Aaron Morton
>          > Freelance Cassandra Developer
>          > @aaronmorton
>          > http://www.thelastpickle.com
>          > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>          >
>          >
>          > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
>          > generated.  and server can not afford the load then crashed.
>          > after come back, node 3 can not return for more than 96 hours
>          >
>          > for 34GB data, the node 2 could restart and back online within 1 hour.
>          >
>          > I am not sure what's wrong with node3 and should I restart node 3 again?
>          > thanks!
>          >
>          > Address         Status State   Load            Owns    Token
>          >
>          > 113427455640312821154458202477256070484
>          > node1     Up     Normal  34.11 GB        33.33%  0
>          > node2     Up     Normal  31.44 GB        33.33%
>          > 56713727820156410577229101238628035242
>          > node3     Down   Normal  177.55 GB       33.33%
>          > 113427455640312821154458202477256070484
>          >
>          >
>          > the log shows it is still going on, not sure why it is so slow:
>          >
>          >
>          >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154)
>         Opening
>          > /cassandra/data/COMMENT
>          >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
>          > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>          >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>          > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>          >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
>          > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>          >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480
>         CacheWriter.java (line
>          > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>          >
>          >
>          >
>          >
>          >
>          >
>
>
>
>         --
>         Jonathan Ellis
>         Project Chair, Apache Cassandra
>         co-founder of DataStax, the source for professional Cassandra support
>         http://www.datastax.com
>
>
>


Re: node restart taking too long

Posted by aaron morton <aa...@thelastpickle.com>.
The logs show it took a long time to read a saved row cache. Try removing the files from the saved_caches dir as Jonathan suggested. 

The 'collecting' log lines with the Integer.MAX_VALUE count (2147483647) are indicative of the IdentityQueryFilter. One of the places it is used is when adding rows to the row cache. 
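
If you want to confirm the saved row cache is what is slowing the restart, a
quick sanity check is to look at the size of the saved cache files on disk
before removing them (the path below is taken from the log lines earlier in
this thread; adjust it to your own saved_caches directory):

ls -lh /cassandra/saved_caches/
du -sh /cassandra/saved_caches/COMMENT-RowCache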

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16/08/2011, at 11:35 PM, Yan Chunlu wrote:

>  I saw alot slicequeryfilter things if changed the log level to DEBUG.  just thought even bring up a new node will be faster than start the old one..... it is wired
> 
> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> 
> 
> 
> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <sp...@gmail.com> wrote:
> but it seems the row cache is cluster wide, how will  the change of row cache affect the read speed?
> 
> 
> On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> Or leave row cache enabled but disable cache saving (and remove the
> one already on disk).
> 
> On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >
> > It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
> > pretty big row cache, I would suggest reducing or disabling it.
> > Background http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >
> > and server can not afford the load then crashed. after come back, node 3 can
> > not return for more than 96 hours
> >
> > Crashed how ?
> > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> > Watch nodetool compactionstats to see when the Merkle tree build finishes
> > and nodetool netstats to see which CF's are streaming.
> > Cheers
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >
> >
> > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
> > generated.  and server can not afford the load then crashed.
> > after come back, node 3 can not return for more than 96 hours
> >
> > for 34GB data, the node 2 could restart and back online within 1 hour.
> >
> > I am not sure what's wrong with node3 and should I restart node 3 again?
> > thanks!
> >
> > Address         Status State   Load            Owns    Token
> >
> > 113427455640312821154458202477256070484
> > node1     Up     Normal  34.11 GB        33.33%  0
> > node2     Up     Normal  31.44 GB        33.33%
> > 56713727820156410577229101238628035242
> > node3     Down   Normal  177.55 GB       33.33%
> > 113427455640312821154458202477256070484
> >
> >
> > the log shows it is still going on, not sure why it is so slow:
> >
> >
> >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening
> > /cassandra/data/COMMENT
> >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line
> > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >
> >
> >
> >
> >
> >
> 
> 
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 
> 


Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
 I saw a lot of SliceQueryFilter entries after changing the log level to DEBUG.
I would have thought even bringing up a new node would be faster than
restarting the old one... it is weird

DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112



On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <sp...@gmail.com> wrote:

> but it seems the row cache is cluster wide, how will  the change of row
> cache affect the read speed?
>
>
> On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> Or leave row cache enabled but disable cache saving (and remove the
>> one already on disk).
>>
>> On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>> > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >
>> > It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
>> > pretty big row cache, I would suggest reducing or disabling it.
>> > Background
>> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >
>> > and server can not afford the load then crashed. after come back, node 3
>> can
>> > not return for more than 96 hours
>> >
>> > Crashed how ?
>> > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>> > Watch nodetool compactionstats to see when the Merkle tree build
>> finishes
>> > and nodetool netstats to see which CF's are streaming.
>> > Cheers
>> > -----------------
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >
>> >
>> > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
>> > generated.  and server can not afford the load then crashed.
>> > after come back, node 3 can not return for more than 96 hours
>> >
>> > for 34GB data, the node 2 could restart and back online within 1 hour.
>> >
>> > I am not sure what's wrong with node3 and should I restart node 3 again?
>> > thanks!
>> >
>> > Address         Status State   Load            Owns    Token
>> >
>> > 113427455640312821154458202477256070484
>> > node1     Up     Normal  34.11 GB        33.33%  0
>> > node2     Up     Normal  31.44 GB        33.33%
>> > 56713727820156410577229101238628035242
>> > node3     Down   Normal  177.55 GB       33.33%
>> > 113427455640312821154458202477256070484
>> >
>> >
>> > the log shows it is still going on, not sure why it is so slow:
>> >
>> >
>> >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154)
>> Opening
>> > /cassandra/data/COMMENT
>> >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
>> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>> > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
>> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java
>> (line
>> > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>

Re: node restart taking too long

Posted by Yan Chunlu <sp...@gmail.com>.
But it seems the row cache setting is cluster wide; how will changing the row
cache affect read speed?

On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> Or leave row cache enabled but disable cache saving (and remove the
> one already on disk).
>
> On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >
> > It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
> > pretty big row cache, I would suggest reducing or disabling it.
> > Background
> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >
> > and server can not afford the load then crashed. after come back, node 3
> can
> > not return for more than 96 hours
> >
> > Crashed how ?
> > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> > Watch nodetool compactionstats to see when the Merkle tree build finishes
> > and nodetool netstats to see which CF's are streaming.
> > Cheers
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >
> >
> > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
> > generated.  and server can not afford the load then crashed.
> > after come back, node 3 can not return for more than 96 hours
> >
> > for 34GB data, the node 2 could restart and back online within 1 hour.
> >
> > I am not sure what's wrong with node3 and should I restart node 3 again?
> > thanks!
> >
> > Address         Status State   Load            Owns    Token
> >
> > 113427455640312821154458202477256070484
> > node1     Up     Normal  34.11 GB        33.33%  0
> > node2     Up     Normal  31.44 GB        33.33%
> > 56713727820156410577229101238628035242
> > node3     Down   Normal  177.55 GB       33.33%
> > 113427455640312821154458202477256070484
> >
> >
> > the log shows it is still going on, not sure why it is so slow:
> >
> >
> >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154)
> Opening
> > /cassandra/data/COMMENT
> >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java
> (line
> > 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: node restart taking too long

Posted by Jonathan Ellis <jb...@gmail.com>.
Or leave row cache enabled but disable cache saving (and remove the
one already on disk).
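
A rough sketch of what that looks like (MyKeyspace is a placeholder keyspace
name; the attribute is called row_cache_save_period in the cassandra-cli
versions I have at hand, so confirm the exact name with
'help update column family;' on yours):

cassandra-cli -h localhost -p 9160
  use MyKeyspace;
  update column family COMMENT with row_cache_save_period=0;

# then remove the cache already saved on disk (path from the logs in this thread)
rm /cassandra/saved_caches/COMMENT-RowCache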

On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
>  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>
> It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
> pretty big row cache, I would suggest reducing or disabling it.
> Background http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>
> and server can not afford the load then crashed. after come back, node 3 can
> not return for more than 96 hours
>
> Crashed how ?
> You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> Watch nodetool compactionstats to see when the Merkle tree build finishes
> and nodetool netstats to see which CF's are streaming.
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>
>
> I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
> generated.  and server can not afford the load then crashed.
> after come back, node 3 can not return for more than 96 hours
>
> for 34GB data, the node 2 could restart and back online within 1 hour.
>
> I am not sure what's wrong with node3 and should I restart node 3 again?
> thanks!
>
> Address         Status State   Load            Owns    Token
>
> 113427455640312821154458202477256070484
> node1     Up     Normal  34.11 GB        33.33%  0
> node2     Up     Normal  31.44 GB        33.33%
> 56713727820156410577229101238628035242
> node3     Down   Normal  177.55 GB       33.33%
> 113427455640312821154458202477256070484
>
>
> the log shows it is still going on, not sure why it is so slow:
>
>
>  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening
> /cassandra/data/COMMENT
>  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
> reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
> reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line
> 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>
>
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: node restart taking too long

Posted by aaron morton <aa...@thelastpickle.com>.
>  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
It's taking 29 minutes to load 200,000 rows into the row cache. That's a pretty big row cache; I would suggest reducing or disabling it. 
Background http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra

> and server can not afford the load then crashed. after come back, node 3 can not return for more than 96 hours
Crashed how ?

You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280 
Watch nodetool compactionstats to see when the Merkle tree build finishes, and nodetool netstats to see which CFs are streaming. 
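
Concretely, something along these lines (substitute the node's hostname; 7199
is the default JMX port on recent releases, older ones used 8080):

nodetool -h node3.example.com -p 7199 compactionstats
nodetool -h node3.example.com -p 7199 netstats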

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15 Aug 2011, at 04:23, Yan Chunlu wrote:

> 
> 
> I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data generated.  and server can not afford the load then crashed.
> after come back, node 3 can not return for more than 96 hours
> 
> for 34GB data, the node 2 could restart and back online within 1 hour.
> 
> I am not sure what's wrong with node3 and should I restart node 3 again? thanks!
> 
> Address         Status State   Load            Owns    Token
>                                                        113427455640312821154458202477256070484
> node1     Up     Normal  34.11 GB        33.33%  0
> node2     Up     Normal  31.44 GB        33.33%  56713727820156410577229101238628035242
> node3     Down   Normal  177.55 GB       33.33%  113427455640312821154458202477256070484
> 
> 
> the log shows it is still going on, not sure why it is so slow:
> 
> 
>  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> 
> 
> 
>