You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@maven.apache.org by "Michael Osipov (Jira)" <ji...@apache.org> on 2021/05/02 09:31:00 UTC

[jira] [Comment Edited] (MRESOLVER-153) resolver-status.properties file is corrupted due to concurrent writes

    [ https://issues.apache.org/jira/browse/MRESOLVER-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332767#comment-17332767 ] 

Michael Osipov edited comment on MRESOLVER-153 at 5/2/21, 9:30 AM:
-------------------------------------------------------------------

So, first of all, I made a mistake. My analysis for removing the synchronization was not good enough. I need to apologize for that. The purpose of the change was to remove synchronization because it is done on the outside ({{SyncContext}}) already and analyzing the hierarchy from above it truly is except one small case: {{o.e.a.internal.impl.DefaultMetadataResolver.ResolveTask.run()}}.
So what is causing this race condition? Resolver wants to resolve (download) {{maven-metadata.xml}} from all known repositories for the *same* {{G(A(V)?)?}}. Each file instance has its distinct name on the file system (no race conditions possible), *but* some management data is written to {{resolver-status.properties}} next (same level) to these metadata. Since {{executor.execute( errorForwarder.wrap( task ) )}} is running in parallel multiple threads write to the same file concurrently. The {{SyncContext}} does not apply here and it causes the race condition you see. The read fails because the previous writes have corrupted the file.
There are several entry points for {{DefaultMetadataResolver.resolveMetadata()}}.  I can reproduce the issue with the IT case.
[~cstamas], it'd be nice if you could confirm my observations.

While we need to discuss a proper solution, I consider this behavior ugly because it completely breaks our sync idea.

I think there are two options:
* Add named sync into {{ResolveTask}}
* Inline code of {{ResolveTask}}, use {{RepositoryConnector}} directly, and record stuff in serial to avoid race conditions.

Looking at {{ResolveTask}} is runs off a parallel executor which hands off to a parallel executor again. I don't think that there is huge gain in that.

Please comment.


was (Author: michael-o):
So, first of all, I made a mistake. My analysis for removing the synchronization was not good enough. I need to apologize for that. The purpose of the change was to remove synchronization because it is done on the outside ({{SyncContext}}) already and analyzing the hierarchy from above it truly is except one small case: {{o.e.a.internal.impl.DefaultMetadataResolver.ResolveTask.run()}}.
So what is causing this race condition? Resolver wants to resolve (download) {{maven-metadata.xml}} from all known repositories for the *same* {{G(A(V)?)?}}. Each file instance has its distinct name on the file system (no race conditions possible), *but* some management data is written to {{resolver-status.properties}} next (same level) to these metadata. Since {{executor.execute( errorForwarder.wrap( task ) )}} is running in parallel multiple threads write to the same file concurrently. The {{SyncContext}} does not apply here and it causes the race condition you see. The read fails because the previous writes have corrupted the file.
There are several entry points for {{DefaultMetadataResolver.resolveMetadata()}}.  I can reproduce the issue with the IT case.
[~cstamas], it'd be nice if you could confirm my observations.

While we need to discuss a proper solution, I consider this behavior ugly because it completely breaks our sync idea.

I think there are two options:
* Add named sync into {{ResolverTask}}
* Inline code of {{ResolverTask}}, use {{RepositoryConnector}} directly, and record stuff in serial to avoid race conditions.

Looking at {{ResolverTask}} is runs off a parallel executor which hands off to a parallel executor again. I don't think that there is huge gain in that.

Please comment.

> resolver-status.properties file is corrupted due to concurrent writes
> ---------------------------------------------------------------------
>
>                 Key: MRESOLVER-153
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-153
>             Project: Maven Resolver
>          Issue Type: Bug
>          Components: Resolver
>    Affects Versions: 1.6.1
>         Environment: OSX 11.1 on adoptopenjdk-11.0.8+10
>            Reporter: Guy Brand
>            Assignee: Michael Osipov
>            Priority: Major
>             Fix For: 1.7.0
>
>         Attachments: screenshot-1.png, with-global-sync-context.txt, without-global-sync-context.txt
>
>
> In our integration tests which run with Maven {{4.0.0-alpha-1-SNAPSHOT}} after [this commit|https://github.com/apache/maven/commit/7c7de41562a8d83635e8fa21c3a3340298b508a1], we face the following error:
> {code:java}
> [main] [INFO] ------------------------------------------------------------------------
> [main] [INFO] BUILD FAILURE
> [main] [INFO] ------------------------------------------------------------------------
> [main] [INFO] Total time: 0.243 s
> [main] [INFO] Finished at: 2020-12-23T13:48:59+01:00
> [main] [INFO] ------------------------------------------------------------------------
> [main] [ERROR] Malformed \uxxxx encoding.
> java.lang.IllegalArgumentException: Malformed \uxxxx encoding.
>  at java.util.Properties.loadConvert (Properties.java:675)
>  at java.util.Properties.load0 (Properties.java:451)
>  at java.util.Properties.load (Properties.java:404)
>  at org.eclipse.aether.internal.impl.TrackingFileManager.read (TrackingFileManager.java:56)
>  at org.eclipse.aether.internal.impl.DefaultUpdateCheckManager.read (DefaultUpdateCheckManager.java:511)
>  at org.eclipse.aether.internal.impl.DefaultUpdateCheckManager.checkMetadata (DefaultUpdateCheckManager.java:250)
>  at org.eclipse.aether.internal.impl.DefaultMetadataResolver.resolve (DefaultMetadataResolver.java:302)
>  at org.eclipse.aether.internal.impl.DefaultMetadataResolver.resolveMetadata (DefaultMetadataResolver.java:181)
>  at org.eclipse.aether.internal.impl.DefaultRepositorySystem.resolveMetadata (DefaultRepositorySystem.java:277)
>  at org.apache.maven.plugin.version.internal.DefaultPluginVersionResolver.resolveFromRepository (DefaultPluginVersionResolver.java:134)
>  at org.apache.maven.plugin.version.internal.DefaultPluginVersionResolver.resolve (DefaultPluginVersionResolver.java:97)
>  at org.apache.maven.lifecycle.internal.LifecyclePluginResolver.resolveMissingPluginVersions (LifecyclePluginResolver.java:67)
>  at org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments (DefaultLifecycleTaskSegmentCalculator.java:104)
>  at org.apache.maven.lifecycle.internal.DefaultLifecycleTaskSegmentCalculator.calculateTaskSegments (DefaultLifecycleTaskSegmentCalculator.java:86)
>  at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:92)
>  at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:321)
>  at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:206)
>  at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:119)
>  at org.apache.maven.cli.MavenCli.execute (MavenCli.java:982)
>  at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:296)
>  at org.apache.maven.cli.MavenCli.main (MavenCli.java:200)
>  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke (Method.java:566)
>  at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
>  at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
>  at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
>  at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
> {code}
> It's not consistently failing and OSX based CI agents fail more often than Linux or Windows ones. After some investigations we saw that as part of https://issues.apache.org/jira/browse/MRESOLVER-132 the synchronization has been removed on the {{TrackingFileManager}} ([https://github.com/apache/maven-resolver/pull/67]). This now leads to concurrent writes on the {{resolver-status.properties}} file in our tests and causes the error during the {{Properties#load()}} method wich then throws the above error. See this screenshot on how these files look like (cannot add the text here as {{null}} characters aren't shown):
>  !screenshot-1.png! 
> When enabling debug log we also see that the {{DefaultMetadataResolver}} threads try to write concurrently to this file:
> {code:java}
> [DefaultMetadataResolver-0-0] [INFO] Downloading from custom-repo2: https://repo.maven.apache.org/maven2/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/maven-metadata.xml
> [DefaultMetadataResolver-0-2] [INFO] Downloading from central-proxy: https://repo.maven.apache.org/maven2/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/maven-metadata.xml
> [DefaultMetadataResolver-0-1] [INFO] Downloading from custom-repo: https://repository.apache.org/snapshots/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/maven-metadata.xml
> [DefaultMetadataResolver-0-1] [DEBUG] Writing tracking file /.../core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties
> [DefaultMetadataResolver-0-2] [DEBUG] Writing tracking file /.../core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties
> [DefaultMetadataResolver-0-0] [DEBUG] Writing tracking file /.../core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties
> {code}
> We [created a Maven integration test|https://github.com/guylabs/maven-integration-testing/commit/56edeb28895a6704ad28e76655ee792d19797a8c] which reproduces the issue. The test is not always failing as in some runs the {{null}} characters are normalized to {{\u0000}} Strings and then written back to the {{resolver-status.properties}} file which then can be read properly by the {{Properties#load()}} method. You can monitor the {{core-it-suite/target/test-classes/mng-xxxx-concurrent-writes-to-resolver-status-file/repo/mng-xxxx-concurrent-writes-to-resolver-status-file/plugin/resolver-status.properties}} file while running the test as there you'll see the corrupted updates.
> Readding the thread synchronization to the {{TrackingFileManager}} would solve this race condition, as file locking is indeed not required as stated in MRESOLVER-132.
> Can we get this change into a new Maven {{4.0.0-alpha-1-SNAPSHOT}} version to be able to test the fix?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)