Posted to user@cassandra.apache.org by Elias Ross <ge...@noderunner.net> on 2013/11/06 06:47:01 UTC

cleanup failure; FileNotFoundException deleting (wrong?) db file

I'm seeing the following:

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db (No such file or directory)
        at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
        at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1212)
        at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:54)
        at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:1032)
        at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:594)
        at org.apache.cassandra.db.compaction.CompactionManager.access$500(CompactionManager.java:73)
        at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:327)
        at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:253)

This is on an install with multiple data directories. The directory named in
the exception contains only files with a different generation number (ic-6
rather than ic-1):

[rhq@st11p01ad-rhq006 ~]$ ls -l /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-*
-rw-r--r-- 1 rhq rhq 849924573 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Data.db
-rw-r--r-- 1 rhq rhq        75 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Digest.sha1
-rw-r--r-- 1 rhq rhq    151696 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Filter.db
-rw-r--r-- 1 rhq rhq   2186766 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Index.db
-rw-r--r-- 1 rhq rhq      5957 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Statistics.db
-rw-r--r-- 1 rhq rhq     15276 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Summary.db
-rw-r--r-- 1 rhq rhq        72 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-TOC.txt


It seems Cassandra is looking for files that do not exist in that directory.
Is there something I can do here?
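
One way to narrow this down is to check whether the missing generation exists
under any of the other data directories. The sketch below is only an
illustration: the function name find_sstable_files is hypothetical, and the
file layout (<data dir>/<keyspace>/<table>/<keyspace>-<table>-<version>-<generation>-<component>)
is inferred from the paths shown above rather than taken from Cassandra's code.

```python
import glob
import os


def find_sstable_files(data_dirs, keyspace, table, generation):
    """Return all files of one SSTable generation found under any data dir.

    Hypothetical helper; the naming pattern is an assumption based on the
    paths in the error message and the directory listing.
    """
    matches = []
    for data_dir in data_dirs:
        # e.g. <data_dir>/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db
        pattern = os.path.join(
            data_dir, keyspace, table,
            "{0}-{1}-*-{2}-*".format(keyspace, table, generation),
        )
        matches.extend(sorted(glob.glob(pattern)))
    return matches
```

For the case above it might be called with the list of configured data
directories (for example "/data05/rhq/data" and its siblings, which are
assumed here), keyspace "rhq", table "six_hour_metrics", and generation 1. An
empty result would suggest the files are gone everywhere, not just misplaced.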

Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

Posted by Elias Ross <ge...@noderunner.net>.
On Fri, Nov 8, 2013 at 10:31 AM, Elias Ross <ge...@noderunner.net> wrote:


> On Thu, Nov 7, 2013 at 7:01 PM, Krishna Chaitanya <bn...@gmail.com> wrote:
>
>> Check if it's an issue with permissions or broken links.
>>
>>
> I don't think permissions are an issue. You might be on to something
> regarding the links.
>
>
As it turns out (and as I already noted in CASSANDRA-6298), this was a user
error. Two of my links were pointing to the same drive:

lrwxrwxrwx    1 root root     6 Oct 30 18:37 data05 -> /data5
lrwxrwxrwx    1 root root     6 Oct 30 18:37 data06 -> /data5

Thanks for the help everyone, I'm happy it's all working. I'm not so happy
that I messed up my configuration like this.
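
A misconfiguration like this can be caught with a small check that resolves
each configured directory and reports any that land on the same real location.
This is a minimal sketch, not part of Cassandra or RHQ; the function name
find_shared_targets is made up for illustration.

```python
import os
from collections import defaultdict


def find_shared_targets(paths):
    """Group paths by the location they resolve to; keep groups of two or more."""
    by_target = defaultdict(list)
    for path in paths:
        # os.path.realpath follows symlinks, so two links that point at the
        # same directory end up under the same key.
        by_target[os.path.realpath(path)].append(path)
    return {target: names for target, names in by_target.items() if len(names) > 1}
```

Passing it the list of data directories from cassandra.yaml before starting
the node would flag a pair such as data05 and data06 above. An empty
dictionary means every directory resolves to a distinct location.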

Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

Posted by Elias Ross <ge...@noderunner.net>.
On Thu, Nov 7, 2013 at 7:01 PM, Krishna Chaitanya <bn...@gmail.com> wrote:

> Check if it's an issue with permissions or broken links.
>
>
I don't think permissions are an issue. You might be on to something
regarding the links.

I've been seeing this on 4 nodes, configured identically.

Here's what I think the problem may be (or it may be a combination of a few
problems):

1. I have symlinked the data directories. This confuses Cassandra in some
way, causing it to create multiple files. Does Cassandra care if the data
directory is a symlink to somewhere else? Would this cause an issue?

lrwxrwxrwx    1 root root     6 Oct 30 18:37 data01 -> /data1 # [1]

Evidence for:
a. Somehow it's creating duplicate hard links.
b. It is unlikely other Cassandra users would have set up their directories
like this, and this seems like a serious bug.
c. Also, my other cluster is nearly identical (OS, JVM, 6 drives, same
Cassandra/RHQ, hardware similar) and not seeing the same issues, although
that is a two node cluster.

If I were to grep through the source, I would check whether the path that
Java sees, perhaps via File.getAbsoluteFile() (which might resolve the link),
can fail to match the path recorded for another file. In other words, it
would be a Cassandra bug, based on some assumptions about the JVM.


2. When I created the cluster, I had a single data directory for each node.
I then added 5 more. Somehow Cassandra mis-remembers where the data was
put, causing all sorts of issues. How does Cassandra decide where to put
its data and where to read it from? What happens when additional data
directories are added? There could be a bug in the code.

Evidence for:
a. Somehow it's looking for data in the wrong directory. It also seems
unlikely a user would create a cluster, then add 5 more drives.

# [1] The links are set up because the mount points didn't match my Puppet
setup, which sets up my directory permissions. So I added the links to
compensate.
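
On the guess in point 1: in Java, File.getAbsoluteFile() only makes a path
absolute and does not follow symlinks; File.getCanonicalFile() is the call
that resolves them. The sketch below shows the same distinction using Python's
os.path.abspath and os.path.realpath as stand-ins for the two Java calls. It
says nothing about which of them Cassandra actually uses, and the directory
names are created in a temporary location purely for the demonstration.

```python
import os
import tempfile


def absolute_and_resolved(path):
    """Return (absolute path, symlink-resolved path) so the two can be compared."""
    return os.path.abspath(path), os.path.realpath(path)


def demo():
    """Create a data01 -> data1 link in a temp dir and compare both views of it."""
    base = os.path.realpath(tempfile.mkdtemp())
    target = os.path.join(base, "data1")
    link = os.path.join(base, "data01")
    os.mkdir(target)
    os.symlink(target, link)
    absolute, resolved = absolute_and_resolved(link)
    # absolute still names the link; resolved names the directory behind it.
    return link, target, absolute, resolved
```

If one part of a program records the absolute form and another the resolved
form, the same file can appear under two different names, which is the kind of
mismatch speculated about above.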

Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

Posted by Krishna Chaitanya <bn...@gmail.com>.
Check if it's an issue with permissions or broken links.

Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

Posted by Elias Ross <ge...@noderunner.net>.
On Wed, Nov 6, 2013 at 9:10 AM, Keith Freeman <8f...@gmail.com> wrote:

> Is it possible that the keyspace was dropped then re-created (
> https://issues.apache.org/jira/browse/CASSANDRA-4857)? I've seen similar
> stack traces in that case.
>
>
Thanks for the pointer.

There's a program (RHQ) that's managing my server and may have done the
create-drop-create sequence by mistake.

I also wonder whether adding data directories to an existing node may cause
issues. What I mean is adding more dirs to 'data_file_directories' in
cassandra.yaml, then restarting the server.

Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

Posted by Keith Freeman <8f...@gmail.com>.
Is it possible that the keyspace was dropped then re-created ( 
https://issues.apache.org/jira/browse/CASSANDRA-4857)? I've seen similar 
stack traces in that case.
