Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/02/01 06:51:28 UTC

[jira] Created: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Temp files for failed compactions/streaming not cleaned up
----------------------------------------------------------

                 Key: CASSANDRA-2088
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Stu Hood
             Fix For: 0.7.2


From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003435#comment-13003435 ] 

Jonathan Ellis commented on CASSANDRA-2088:
-------------------------------------------

bq. Are the sorts of failures we're considering for compaction ones that come from the CompactionIterator or SSTableScanner ?

Both. I suppose the writer could also error out from lack of disk space, since it only checks for space at the beginning and doesn't "reserve" it against flushes.

bq. Are we considering failures during the streaming or when processing the data after the stream has finished?

The former is much more common (I've never seen the latter reported), so I'd start with that.

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.7.4
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] Updated: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2088:
--------------------------------------

    Fix Version/s:     (was: 0.7.4)
                   0.8

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Updated] (CASSANDRA-2088) Clean up after failed (repair) streaming operation

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2088:
--------------------------------------

         Reviewer: jbellis
      Description:     (was: From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.)
         Priority: Minor  (was: Major)
    Fix Version/s:     (was: 0.8)
                   0.7.5
         Assignee: Sylvain Lebresne  (was: Aaron Morton)
          Summary: Clean up after failed (repair) streaming operation  (was: Temp files for failed compactions/streaming not cleaned up)

Created CASSANDRA-2468 for compaction cleanup. Will close this one for streaming.

> Clean up after failed (repair) streaming operation
> --------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 0.7.5
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>



[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018942#comment-13018942 ] 

Jonathan Ellis commented on CASSANDRA-2088:
-------------------------------------------

bq. I'm attaching a patch against 0.7

Is that 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch?  I don't see the connection to .tmp files.  (Also: have you verified that the channel will actually infinite-loop returning 0?  Kind of odd behavior, although I guess it's technically within-spec.)

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] Assigned: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton reassigned CASSANDRA-2088:
---------------------------------------

    Assignee: Aaron Morton

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.7.4
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018972#comment-13018972 ] 

Jonathan Ellis commented on CASSANDRA-2088:
-------------------------------------------

+1, and can you move some of that explanation inline as a comment?

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019005#comment-13019005 ] 

Sylvain Lebresne commented on CASSANDRA-2088:
---------------------------------------------

Committed that first part. I think we should keep this ticket open to fix the tmp files for failed compaction, and move the rest to another ticket (CASSANDRA-2433, for instance).

About the attached patch on cleaning up failed compaction:
  * We should also handle cleanup and scrub.
  * We should handle SSTableWriter.Builder, as it is yet another place where we could fail to clean up a tmp file on error.
  * In theory a failed flush could leave a tmp file behind. If that happens, a leftover tmp file would be the least of your problems, but for completeness' sake we could handle it.
  * The logging when failing to close iwriter and dataFile in SSTableWriter could probably be at error level (we should not be failing there; if we do, something is wrong).
  * This is a nitpick, but I'm not a huge fan of catching RuntimeException in this case, as it pollutes the code for something that would be a programming error (that's probably debatable though). Another solution might be to do this in the finally block. That means making sure closeAndDelete() is OK with the file being already closed and/or deleted, and having this finally block *after* the closeAndOpenReader call.
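The finally-block idea in the last bullet can be sketched roughly as follows. This is only an illustration under assumptions, not the attached patch: TmpFileWriterSketch, closeAndRename() and writeAll() are hypothetical names standing in for SSTableWriter, closeAndOpenReader() and the compaction loop. The key property is that closeAndDelete() tolerates a writer that is already closed and a tmp file that is already gone, so it can run unconditionally.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical stand-in for a writer of one temporary SSTable component.
final class TmpFileWriterSketch implements Closeable {
    private final Path tmpPath;
    private final Path finalPath;
    private final OutputStream out;
    private boolean closed;

    TmpFileWriterSketch(Path tmpPath, Path finalPath) throws IOException {
        this.tmpPath = tmpPath;
        this.finalPath = finalPath;
        this.out = Files.newOutputStream(tmpPath);
    }

    void append(byte[] data) throws IOException {
        out.write(data);
    }

    // Idempotent: a second call does nothing.
    @Override
    public void close() throws IOException {
        if (!closed) {
            closed = true;
            out.close();
        }
    }

    // Success path: close the file, then rename it to its final name.
    Path closeAndRename() throws IOException {
        close();
        return Files.move(tmpPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }

    // Safe to call when already closed and/or when the tmp file no longer exists.
    // The file is closed before deletion so that the delete also works on Windows.
    void closeAndDelete() {
        try {
            close();
        } catch (IOException e) {
            System.err.println("ERROR: failed to close " + tmpPath + ": " + e);
        }
        try {
            Files.deleteIfExists(tmpPath);
        } catch (IOException e) {
            System.err.println("ERROR: failed to delete " + tmpPath + ": " + e);
        }
    }

    // Cleanup lives in finally, so no catch of RuntimeException is needed.
    static Path writeAll(Path tmp, Path dest, Iterable<byte[]> rows) throws IOException {
        TmpFileWriterSketch writer = new TmpFileWriterSketch(tmp, dest);
        try {
            for (byte[] row : rows) {
                writer.append(row);
            }
            return writer.closeAndRename();
        } finally {
            writer.closeAndDelete(); // no-op after a successful rename
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tmp-writer-sketch");
        Path dest = writeAll(dir.resolve("data-tmp.db"), dir.resolve("data.db"),
                Arrays.asList(new byte[] {1, 2}, new byte[] {3}));
        System.out.println("wrote " + Files.size(dest) + " bytes to " + dest);
    }
}
```

Because the cleanup runs after the rename on the success path, it finds nothing to delete there; on any failure, checked or unchecked, the partial tmp file is removed.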


> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] Commented: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003264#comment-13003264 ] 

Aaron Morton commented on CASSANDRA-2088:
-----------------------------------------

I'm keen to try this ticket (to learn more about compaction and repair) if it's not already being worked on, and if it's OK for me to take a couple of days while I dig into it.

For compaction I'm looking in
- CompactionManager.doCompaction where it creates a new SSTableWriter via cfs.createCompactionWriter() 
- CompactionManager.doCleanupCompaction() also uses an SSTableWriter

Are the sorts of failures we're considering for compaction ones that come from the CompactionIterator or SSTableScanner ?

For repair I'm looking in:
- IncomingStreamReader appears to clean up the temporary pending file in some error situations. Do we have any more info on the sorts of failures here? E.g. if there is an IOException sending the re-stream message, or an unchecked exception, it will fail to clean up the file.
- I'm looking into what happens in StreamInSession.finished() closeIfFinished()
- Are we considering failures during the streaming or when processing the data after the stream has finished?

Any guidance welcome. 

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.7.4
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Updated] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2088:
----------------------------------------

    Attachment: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch

I think there are a few different things here, and I think we should separate them somehow.

Fixing the fact that streaming leaves tmp files around when it fails is a two-line fix, and I think it is simple enough to go into 0.7. I'm attaching a patch against 0.7. It's extracted from Aaron's first patch, although rebased on 0.7 (and it fixes a bug).

Making repair aware that there have been failures is actually more complicated, so that should go into 0.8.1 or so (and should move to CASSANDRA-2433 or another ticket that describes the problem better).

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Updated] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-2088:
------------------------------------

    Attachment: 0002-delete-partial-sstable-if-compaction-error.patch
                0001-detect-streaming-failures-and-cleanup-temp-files.patch

Patch 0001 tracks failures during AES streaming: files for failed Stream sessions are cleaned up and repair is allowed to continue. Failed files are logged at the StreamSession, TreeRequest, and RepairSession level.

Patch 0002 handles exceptions during a (normal) compaction and deletes the temp SSTable. The SSTableWriter components are closed before deletion so that Windows will delete the files correctly.

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] Updated: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2088:
----------------------------------------

    Fix Version/s:     (was: 0.7.2)
                   0.7.3

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.7.3
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] Commented: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989053#comment-12989053 ] 

Stu Hood commented on CASSANDRA-2088:
-------------------------------------

Regarding repair: http://www.mail-archive.com/user@cassandra.apache.org/msg09259.html
And compaction: CASSANDRA-2084

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 0.7.2
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019171#comment-13019171 ] 

Aaron Morton commented on CASSANDRA-2088:
-----------------------------------------

Thanks, I will take another look at the cleanup for compaction.


> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.


[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018965#comment-13018965 ] 

Sylvain Lebresne commented on CASSANDRA-2088:
---------------------------------------------

bq. Is that 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch? I don't see the connection to .tmp files. (Also: have you verified that the channel will actually infinite-loop returning 0? Kind of odd behavior, although I guess it's technically within-spec.)

Yes. IncomingStreamReader does clean up the tmp file when there is an exception (there's an enclosing 'try catch'). The problem is that no exception is raised if the other side of the connection dies. What happens then is that the read returns 0 bytes indefinitely. So this patch actually avoids the infinite loop returning 0 (which I think answers your second question; sorry if it wasn't very clear).

Note that without this patch, there is an infinite loop that will hold a socket open forever (and consume CPU, though probably very little in that case). So this is not merely a fix for deleting the tmp files, but it does delete them as a consequence of correctly raising an exception when one should be raised.
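For readers without the patch at hand, the mechanism can be sketched as below. This is a hedged illustration, not the committed code: StreamReceiveSketch and receive() are hypothetical names, and the sketch assumes the source channel is in blocking mode, so that a zero-byte transfer means the peer is gone rather than "no data yet". FileChannel.transferFrom() reports end-of-stream on its source by returning 0, not by throwing, so a loop that only waits for the expected byte count would spin forever. Turning that 0 into an exception lets the enclosing catch delete the tmp file.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a stream receiver; not the code from the attached patch.
final class StreamReceiveSketch {

    // Copies exactly `expected` bytes from `source` into `tmpFile`.
    // Assumes `source` is blocking, so a zero-byte transfer means the
    // other side closed the connection.
    static void receive(ReadableByteChannel source, Path tmpFile, long expected) throws IOException {
        try (FileChannel out = FileChannel.open(tmpFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            long received = 0;
            while (received < expected) {
                long n = out.transferFrom(source, received, expected - received);
                if (n == 0) {
                    // Without this check the loop would never terminate.
                    throw new EOFException("stream ended after " + received
                            + " of " + expected + " bytes");
                }
                received += n;
            }
        } catch (IOException | RuntimeException e) {
            // The channel is already closed here (try-with-resources closes it
            // before the catch block runs), so the delete also works on Windows.
            Files.deleteIfExists(tmpFile);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stream-sketch", ".tmp");
        // A 5-byte source standing in for a peer that dies before sending all 10 bytes.
        ReadableByteChannel truncated =
                Channels.newChannel(new ByteArrayInputStream(new byte[5]));
        try {
            receive(truncated, tmp, 10);
        } catch (EOFException e) {
            System.out.println("receive failed: " + e.getMessage());
        }
        System.out.println("tmp file still exists: " + Files.exists(tmp));
    }
}
```

With a non-blocking channel a zero return would not necessarily mean the peer is gone, so the check would need a different signal (for example a read timeout).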

> Temp files for failed compactions/streaming not cleaned up
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-2088
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Aaron Morton
>             Fix For: 0.8
>
>         Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch
>
>
> From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures.
