Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/06/09 16:00:20 UTC

[jira] [Commented] (KAFKA-2170) 10 LogTest cases failed for file.renameTo failed under windows

    [ https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044582#comment-16044582 ] 

ASF GitHub Bot commented on KAFKA-2170:
---------------------------------------

GitHub user nxmbriggs404 opened a pull request:

    https://github.com/apache/kafka/pull/3283

    KAFKA-2170: Updated Fixes For Windows Platform

    During stress testing of Kafka 0.10.2.1 on a Windows platform, our group has encountered some issues that appear to be known to the community but are not fully addressed by Kafka.  Using:
    
    https://github.com/apache/kafka/pull/154
    
    as a guide, we have derived changes to the source code and automated tests such that the "clients" and "core" tests pass for us on Windows and Linux platforms.  Our stress tests succeed as well.
    
    This pull request adapts those changes to merge and build with kafka/trunk.  The "clients" and "core" tests from kafka/trunk pass on Linux for us with these changes in place, and all tests pass on Windows except:
    
    ConsumerBounceTest (intermittent failures)
    TransactionsTest
    DeleteTopicTest.testDeleteTopicWithCleaner
    EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards
    
    Our intention is to help efforts to further Kafka's support for the Windows platform.  Our changes are the work of engineers from Nexidia, building upon the work found in the aforementioned pull request, and they are contributed to the community under Kafka's open source license.
    
    We welcome all feedback and look forward to working with the Kafka community.
    
    Matt Briggs
    Principal Software Engineer
    Nexidia, a NICE Analytics Company
    www.nexidia.com

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nxmbriggs404/kafka nx-windows-fixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3283
    
----
commit 6ee3c167c6e2daa8ce4564d98f9f63967a0efece
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-06T15:10:58Z

    Handle log file deletion and renaming on Windows
    
    Special treatment is needed for the deletion and renaming of log and
    index files on Windows, due to Windows' general inability to perform
    those operations while a file is open or memory-mapped.
    
    The changes in this commit are essentially adapted from:
    
    https://github.com/apache/kafka/pull/154
    
    More detailed background information on the issues can also be found
    via that link.
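    
    As an illustration of the constraint described above (not Kafka code),
    the following standalone Scala sketch memory-maps a file and then
    attempts to rename it; the file names here are made up.  On Windows
    the rename typically fails while the mapping is live, whereas on
    Linux it succeeds.
    
        import java.io.RandomAccessFile
        import java.nio.channels.FileChannel
        import java.nio.file.{Files, Paths, StandardCopyOption}
        
        object MappedRenameDemo {
          def main(args: Array[String]): Unit = {
            val path = Paths.get("segment.log")
            Files.write(path, "some log data".getBytes("UTF-8"))
        
            // Map the file, much as Kafka maps its offset/time index files.
            val raf  = new RandomAccessFile(path.toFile, "rw")
            val mmap = raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, raf.length)
        
            // On Windows this rename usually fails while the mapping is live;
            // on Linux it succeeds because the mapping keeps the old inode valid.
            try {
              Files.move(path, Paths.get("segment.log.deleted"),
                StandardCopyOption.REPLACE_EXISTING)
              println("rename succeeded")
            } catch {
              case e: java.io.IOException => println(s"rename failed: ${e.getMessage}")
            } finally {
              raf.close()  // releases the channel; the mapping is freed only at GC
            }
          }
        }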

commit a0cd773a8d89d7df90fc75ce55a46fd8bb93d368
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-06T15:21:23Z

    Colliding log filenames cause test failures on Windows
    
    This commit addresses an edge case with compaction and asynchronous
    deletion of log files initially encountered when debugging:
    
    LogCleanerTest.testRecoveryAfterCrash
    
    failures on Windows.  It appears that trouble arises when compaction
    of logs results in two segments having the same base offset, hence
    the same file names, and both segments get scheduled for background
    deletion.  If one segment's files are pending deletion at the time
    the other segment's files are scheduled for deletion, the rename
    attempted during the latter will fail on Windows (due to the typical
    Windows issues with open/memory-mapped files).  It doesn't appear
    that we can simply close out the former files, as Kafka seems to
    intend to keep them open for concurrent readers until the file
    deletion interval has fully passed.
    
    The fix in this commit basically sidesteps the issue by ensuring files
    scheduled for background delete are renamed uniquely (by injecting a
    UUID into the filename).  Essentially this follows the approach taken
    with LogManager.asyncDelete and Log.DeleteDirSuffix.
    
    Collision related errors were also observed when running a custom
    stress test on Windows against a standalone kafka server.  The test
    code caused extremely frequent compaction of the __consumer_offsets
    topic partitions, which resulted in collisions of the latter's log
    files when they were scheduled for deletion.  Use of the UUID was
    successful in avoiding collision related issues in this context.
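    
    A minimal Scala sketch of the renaming scheme described above; the
    helper and suffix names are illustrative, not Kafka's actual
    identifiers.
    
        import java.io.{File, IOException}
        import java.util.UUID
        
        object AsyncDeleteNaming {
          // Give a segment file a unique "deleted" name before scheduling it
          // for background removal, so two segments sharing a base offset can
          // never collide while both are pending deletion.
          def markForAsyncDelete(segmentFile: File): File = {
            val unique = new File(segmentFile.getParentFile,
              s"${segmentFile.getName}.${UUID.randomUUID()}.deleted")
            if (!segmentFile.renameTo(unique))
              throw new IOException(s"Failed to rename $segmentFile for deletion")
            unique
          }
        }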

commit 3633d493bc3c0de3f177eecd11e374be74d4ac32
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-06T15:27:59Z

    Fixing log recovery crash on Windows
    
    When a sanity check failure was detected by the log recovery code, an
    attempt to delete index files that were memory-mapped would lead to
    a crash on Windows.  This commit adjusts the code to unmap, delete,
    recreate, and remap index files so that recovery can continue.
    
    The issue was found via the LogTest.testCorruptLog test.
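    
    A simplified sketch of the unmap/delete/recreate/remap sequence, not
    Kafka's actual code; the explicit unmapping here assumes the Java 8
    internal cleaner API (sun.nio.ch.DirectBuffer), and Java 9+ would
    need a different mechanism.
    
        import java.io.RandomAccessFile
        import java.nio.MappedByteBuffer
        import java.nio.channels.FileChannel
        import java.nio.file.{Files, Path}
        
        object IndexReset {
          // Release the mapping eagerly; without this the mapping stays live
          // until GC and Windows refuses to delete the file.
          private def unmap(buffer: MappedByteBuffer): Unit =
            buffer.asInstanceOf[sun.nio.ch.DirectBuffer].cleaner() match {
              case null    => // nothing to clean
              case cleaner => cleaner.clean()
            }
        
          // Unmap, delete, recreate, and remap an index file so recovery can
          // rebuild it from scratch.
          def resetIndex(indexPath: Path, mapped: MappedByteBuffer,
                         sizeBytes: Int): MappedByteBuffer = {
            unmap(mapped)
            Files.deleteIfExists(indexPath)
            val raf = new RandomAccessFile(indexPath.toFile, "rw")
            try {
              raf.setLength(sizeBytes.toLong)
              raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes.toLong)
            } finally raf.close()
          }
        }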

commit 6b8debd2b906d8691dba73fbbc917f10febf4959
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-06T19:11:31Z

    Windows-driven resource cleanup in tests
    
    Fix exceptions encountered when running automated tests on Windows,
    arising from Windows' issues with removing or renaming files that are
    open or memory-mapped.
    
    Includes changes adapted from:
    
    https://github.com/apache/kafka/pull/154

commit f0c6af8997f1b05a0d29b211dd6cf0d587370f8e
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-07T18:29:18Z

    Fixing timing-related ProducerTest failures
    
    Discovered when running on Windows virtual machines.

commit 2b9a9e2bafd8b7fb48f54c0813641e22508be58c
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-07T18:37:55Z

    Dangling open lock files cause Windows test failures
    
    Log manager code would fail to close an open lock file if it was
    unsuccessful in actually locking the file.  On Windows this would lead
    to automated test failures as cleanup code attempted to remove the
    directory while the lock file remained open.  This could be seen
    for example in:
    
    LogManagerTest.testTwoLogManagersUsingSameDirFails
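    
    A sketch of the pattern the fix follows (not Kafka's actual code): if
    the lock cannot be acquired, close the channel so the lock file itself
    is not left open, since an open lock file blocks directory removal on
    Windows.
    
        import java.io.File
        import java.nio.channels.FileChannel
        import java.nio.file.StandardOpenOption.{CREATE, WRITE}
        
        object DirLock {
          // Returns the channel holding the lock, or None if another process
          // already holds it; in both failure paths the channel is closed.
          def tryLock(lockFile: File): Option[FileChannel] = {
            val channel = FileChannel.open(lockFile.toPath, CREATE, WRITE)
            try {
              if (channel.tryLock() != null) Some(channel)
              else { channel.close(); None }
            } catch {
              case e: Throwable => channel.close(); throw e  // don't leak the handle
            }
          }
        }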

commit cb083c0ccd2a220c983e65802df1eb16e65b109d
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-07T15:05:49Z

    Fixed SASL test failures on Windows
    
    The JAAS config file written by test code contained a keytab location
    which caused numerous SASL tests to fail.  The issue was the keytab
    location containing backslashes on Windows, which were escaped when
    writing out the JAAS config.  Switching over to Unix forward slashes
    resolved the issue.
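    
    A hypothetical Scala sketch of the normalization described above; the
    helper name is made up.  Forward slashes are accepted by Windows file
    APIs, so the path remains valid.
    
        import java.io.File
        
        object JaasKeytabPath {
          // Normalize a Windows keytab path to forward slashes before writing
          // it into a JAAS config entry, since backslashes in the config are
          // treated as escape characters.
          def keytabEntry(keytab: File): String = {
            val normalized = keytab.getAbsolutePath.replace('\\', '/')
            "keyTab=\"" + normalized + "\""
          }
        }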

commit 319ce1d2da9ebac2d99b384de871e34f9d184a84
Author: Matt Briggs <mb...@nexidia.com>
Date:   2017-06-07T17:00:37Z

    Fixed checkstyle suppressions for Windows
    
    Some suppressions were not being obeyed, resulting in checkstyle
    failures on Windows but not Linux.  The root issue is described here:
    
    http://rolf-engelhard.de/2012/11/using-checkstyles-suppression-filters-on-windows-and-linux/

----


> 10 LogTest cases failed for  file.renameTo failed under windows
> ---------------------------------------------------------------
>
>                 Key: KAFKA-2170
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2170
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.10.1.0
>         Environment: Windows
>            Reporter: Honghai Chen
>            Assignee: Jay Kreps
>
> Get the latest code from trunk, then run the test:
> gradlew -i core:test --tests kafka.log.LogTest
> 10 cases failed for the same reason:
> kafka.common.KafkaStorageException: Failed to change the log file suffix from  to .deleted for log segment 0
> 	at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:259)
> 	at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:756)
> 	at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:747)
> 	at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:514)
> 	at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:514)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at kafka.log.Log.deleteOldSegments(Log.scala:514)
> 	at kafka.log.LogTest.testAsyncDelete(LogTest.scala:633)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> 	at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> 	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
> 	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
> 	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
> 	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)
> 	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> 	at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> 	at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> 	at $Proxy2.processTestClass(Unknown Source)
> 	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:105)
> 	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> 	at org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:355)
> 	at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:64)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
>  
>  
> testCompactedTopicConstraints
> java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
> 	at java.io.RandomAccessFile.setLength(Native Method)
> 	at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
> 	at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
> 	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:298)
> 	at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
> 	at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
> 	at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> 	at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> 	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:298)
> 	at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
> 	at kafka.log.LogSegment.recover(LogSegment.scala:198)
> 	at kafka.log.Log.recoverLog(Log.scala:238)
> 	at kafka.log.Log.loadSegments(Log.scala:210)
> 	at kafka.log.Log.<init>(Log.scala:83)
> 	at kafka.log.LogTest.testCompactedTopicConstraints(LogTest.scala:370)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> 	at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> 	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
> 	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
> 	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
> 	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)
> 	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> 	at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> 	at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> 	at $Proxy2.processTestClass(Unknown Source)
> 	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:105)
> 	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> 	at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> 	at org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:355)
> 	at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:64)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> LogTest.testAsyncDelete
> LogTest.testCompactedTopicConstraints
> LogTest.testCorruptLog
> LogTest.testIndexRebuild
> LogTest.testIndexResizingAtTruncation
> LogTest.testLogRecoversToCorrectOffset
> LogTest.testOpenDeletesObsoleteFiles
> LogTest.testReopenThenTruncate
> LogTest.testThatGarbageCollectingSegmentsDoesntChangeOffset
> LogTest.testTruncateTo
> Do we need to call log.close and then rename, and then reopen it?


