You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/06/09 16:00:20 UTC
[jira] [Commented] (KAFKA-2170) 10 LogTest cases failed for
file.renameTo failed under windows
[ https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044582#comment-16044582 ]
ASF GitHub Bot commented on KAFKA-2170:
---------------------------------------
GitHub user nxmbriggs404 opened a pull request:
https://github.com/apache/kafka/pull/3283
KAFKA-2170: Updated Fixes For Windows Platform
During stress testing of kafka 0.10.2.1 on a Windows platform, our group has encountered some issues that appear to be known to the community but not fully addressed by kafka. Using:
https://github.com/apache/kafka/pull/154
as a guide, we have made derived changes to the source code and automated tests such that the "clients" and "core" tests pass for us on Windows and Linux platforms. Our stress tests succeed as well.
This pull request adapts those changes to merge and build with kafka/trunk. The "clients" and "core" tests from kafka/trunk pass on Linux for us with these changes in place, and all tests pass on Windows except:
ConsumerBounceTest (intermittent failures)
TransactionsTest
DeleteTopicTest.testDeleteTopicWithCleaner
EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards
Our intention is to help efforts to further kafka support for the Windows platform. Our changes are the work of engineers from Nexidia building upon the work found in the aforementioned pull request link, and they are contributed to the community per kafka's open source license.
We welcome all feedback and look forward to working with the kafka community.
Matt Briggs
Principal Software Engineer
Nexidia, a NICE Analytics Company
www.nexidia.com
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nxmbriggs404/kafka nx-windows-fixes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/3283.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3283
----
commit 6ee3c167c6e2daa8ce4564d98f9f63967a0efece
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-06T15:10:58Z
Handle log file deletion and renaming on Windows
Special treatment is needed for the deletion and renaming of log and
index files on Windows, due to the latter's general inability to
perform those operations while a file is opened or memory mapped.
The changes in this commit are essentially adapted from:
https://github.com/apache/kafka/pull/154
More detailed background information on the issues can also be found
via that link.
commit a0cd773a8d89d7df90fc75ce55a46fd8bb93d368
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-06T15:21:23Z
Colliding log filenames cause test failures on Windows
This commit addresses an edge case with compaction and asynchronous
deletion of log files initially encountered when debugging:
LogCleanerTest.testRecoveryAfterCrash
failures on Windows. It appears that troubles arise when compaction
of logs results in two segments having the same base address, hence
the same file names, and the segments get scheduled for background
deletion. If one segment's files are pending deletion at the time the
other segment's files are scheduled for deletion, the file rename
attempted during the latter will fail on Windows (due to the typical
Windows issues with open/memory-mapped files). It doesn't appear like
we can simply close out the former files, as it seems that kafka
intends to have them open for concurrent readers until the file
deletion interval has fully passed.
The fix in this commit basically sidesteps the issue by ensuring files
scheduled for background delete are renamed uniquely (by injecting a
UUID into the filename). Essentially this follows the approach taken
with LogManager.asyncDelete and Log.DeleteDirSuffix.
Collision related errors were also observed when running a custom
stress test on Windows against a standalone kafka server. The test
code caused extremely frequent compaction of the __consumer_offsets
topic partitions, which resulted in collisions of the latter's log
files when they were scheduled for deletion. Use of the UUID was
successful in avoiding collision related issues in this context.
commit 3633d493bc3c0de3f177eecd11e374be74d4ac32
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-06T15:27:59Z
Fixing log recovery crash on Windows
When a sanity check failure was detected by log recovery code, an
attempt to delete index files that were memory-mapped would lead to
a crash on Windows. This commit adjusts the code to unmap, delete,
recreate, and remap index files such the recovery can continue.
Issue was found via the LogTest.testCorruptLog test
commit 6b8debd2b906d8691dba73fbbc917f10febf4959
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-06T19:11:31Z
Windows-driven resource cleanup in tests
Fix exceptions encountered when running automated tests on Windows,
arising from the latter's issues with removal/renaming of
opened/memory-mapped files.
Includes changes adapted from:
https://github.com/apache/kafka/pull/154
commit f0c6af8997f1b05a0d29b211dd6cf0d587370f8e
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-07T18:29:18Z
Fixing timing related ProducerTest failures
Discovered when running on Windows virtual machines
commit 2b9a9e2bafd8b7fb48f54c0813641e22508be58c
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-07T18:37:55Z
Dangling open lock files cause Windows test failures
Log manager code would fail to close an open lock file if it was
unsuccessful in actually locking the file. On Windows this would lead
to automated test failures as cleanup code attempted to remove the
directory while the lock file remained open. This could be seen
for example in:
LogManagerTest.testTwoLogManagersUsingSameDirFails
commit cb083c0ccd2a220c983e65802df1eb16e65b109d
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-07T15:05:49Z
Fixed SASL test failures on Windows
JAAS config file written by test code contained a keytab location
which caused numerous SASL tests to fail. Issue was the keytab
location containing backslashes on Windows, which were escaped when
writing out the the JAAS config. Switching over to Unix forward
slashes resolved the issue.
commit 319ce1d2da9ebac2d99b384de871e34f9d184a84
Author: Matt Briggs <mb...@nexidia.com>
Date: 2017-06-07T17:00:37Z
Fixed checkstyle suppressions for Windows
Some suppressions were not being obeyed, resulting in checkstyle
failures on Windows but not linux. Root issue is described here:
http://rolf-engelhard.de/2012/11/using-checkstyles-suppression-filters-on-windows-and-linux/
----
> 10 LogTest cases failed for file.renameTo failed under windows
> ---------------------------------------------------------------
>
> Key: KAFKA-2170
> URL: https://issues.apache.org/jira/browse/KAFKA-2170
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 0.10.1.0
> Environment: Windows
> Reporter: Honghai Chen
> Assignee: Jay Kreps
>
> get latest code from trunk, then run test
> gradlew -i core:test --tests kafka.log.LogTest
> Got 10 cases failed for same reason:
> kafka.common.KafkaStorageException: Failed to change the log file suffix from to .deleted for log segment 0
> at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:259)
> at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:756)
> at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:747)
> at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:514)
> at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:514)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at kafka.log.Log.deleteOldSegments(Log.scala:514)
> at kafka.log.LogTest.testAsyncDelete(LogTest.scala:633)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
> at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
> at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
> at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at $Proxy2.processTestClass(Unknown Source)
> at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:105)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:355)
> at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:64)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>
>
> testCompactedTopicConstraints
> java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
> at java.io.RandomAccessFile.setLength(Native Method)
> at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:285)
> at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:298)
> at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
> at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
> at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:298)
> at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
> at kafka.log.LogSegment.recover(LogSegment.scala:198)
> at kafka.log.Log.recoverLog(Log.scala:238)
> at kafka.log.Log.loadSegments(Log.scala:210)
> at kafka.log.Log.<init>(Log.scala:83)
> at kafka.log.LogTest.testCompactedTopicConstraints(LogTest.scala:370)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
> at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
> at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
> at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
> at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
> at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at $Proxy2.processTestClass(Unknown Source)
> at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:105)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:355)
> at org.gradle.internal.concurrent.DefaultExecutorFactory$StoppableExecutorImpl$1.run(DefaultExecutorFactory.java:64)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> LogTest. testAsyncDelete
> LogTest. testCompactedTopicConstraints
> LogTest. testCorruptLog
> LogTest. testIndexRebuild
> LogTest. testIndexResizingAtTruncation
> LogTest. testLogRecoversToCorrectOffset
> LogTest. testOpenDeletesObsoleteFiles
> LogTest. testReopenThenTruncate
> LogTest. testThatGarbageCollectingSegmentsDoesntChangeOffset
> LogTest. testTruncateTo
> Do we need call log.close and then call rename, and then reopen it?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)