Posted to dev@drill.apache.org by ppadma <gi...@git.apache.org> on 2016/11/16 17:11:48 UTC

[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

GitHub user ppadma opened a pull request:

    https://github.com/apache/drill/pull/653

    DRILL-4831: Running refresh table metadata concurrently randomly fails with JsonParseException
    
    To prevent metadata cache files from getting corrupted, write to a temporary file and rename it to the final name at the end. All unit and regression tests pass.
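The fix described above — write to a uniquely named temporary file, then atomically rename it over the cache file — can be sketched in plain Java. This is an illustration only: Drill's Metadata.java uses Hadoop's FileSystem/FileContext APIs, while this sketch assumes java.nio on a POSIX-style local filesystem, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class AtomicCacheWrite {
    // Write the payload to a uniquely named temp file, then atomically
    // rename it over the final name. Readers see either the complete old
    // file or the complete new one, never a partially written file.
    public static void writeAtomically(Path dir, String fileName, String json)
            throws IOException {
        Path tmp = dir.resolve(fileName + "." + UUID.randomUUID());
        Files.write(tmp, json.getBytes());
        // On POSIX filesystems ATOMIC_MOVE maps to rename(2), which
        // atomically replaces an existing target.
        Files.move(tmp, dir.resolve(fileName),
                StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache");
        writeAtomically(dir, ".drill.parquet_metadata", "{\"files\":[]}");
        System.out.println(new String(
                Files.readAllBytes(dir.resolve(".drill.parquet_metadata"))));
    }
}
```

Because the rename is atomic, a concurrent reader can never observe a half-written cache file, which is what was producing the JsonParseException.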

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ppadma/drill DRILL-4831

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #653
    
----
commit fec9b7ee468cda73dec27f475069898d763fa1c7
Author: Padma Penumarthy <pp...@yahoo.com>
Date:   2016-10-20T12:03:13Z

    DRILL-4831: Running refresh table metadata concurrently randomly fails with JsonParseException

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #653: DRILL-4831: Running refresh table metadata concurrently ra...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/653
  
    +1



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88404356
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,75 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    --- End diff --
    
    Is there a way to get the global queryId created by the Foreman? The name of the temp file would best be associated with the queryId. If that is not possible (I am not sure whether the query context is easily accessible from here), at the very least we should reuse the same UUID for both METADATA_FILENAME and METADATA_DIRECTORIES_FILENAME. It is not a correctness issue, but it will avoid confusion.
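The suggestion to reuse one UUID for both cache files could look like the sketch below. The constant values mirror Drill's metadata cache file names as commonly documented, but the class and method are hypothetical, for illustration only.

```java
import java.util.UUID;

public class TempNames {
    // File-name constants mirroring Drill's cache files (illustrative).
    static final String METADATA_FILENAME = ".drill.parquet_metadata";
    static final String METADATA_DIRECTORIES_FILENAME =
            ".drill.parquet_metadata_directories";

    // One UUID per refresh, reused for both temp files, so the pair of
    // in-flight files is visibly related even without a queryId.
    public static String[] tempNamesFor(UUID refreshId) {
        return new String[] {
                METADATA_FILENAME + "." + refreshId,
                METADATA_DIRECTORIES_FILENAME + "." + refreshId
        };
    }

    public static void main(String[] args) {
        for (String name : tempNamesFor(UUID.randomUUID())) {
            System.out.println(name);
        }
    }
}
```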



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88399778
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,75 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    +    Path tmpPath = new Path(path, METADATA_FILENAME + "." + randomUUID);
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadata);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path, METADATA_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Metadata cache file rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +      throw new IOException("metadata cache file rename failed");
    +    } finally {
    +      if (fs.exists(tmpPath)) {
    +        fs.delete(tmpPath, false);
    +      }
    +    }
       }
     
    -  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID = UUID.randomUUID();
    +    Path tmpPath = new Path(path, METADATA_DIRECTORIES_FILENAME + "." + randomUUID);
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadataDirs);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    --- End diff --
    
    The creation and renaming of the temp file is common to both writeFile() methods except for METADATA_FILENAME vs. METADATA_DIRECTORIES_FILENAME. Can you create a utility method and call it from both places?



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by gparai <gi...@git.apache.org>.
Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88292269
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,65 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    +    Path tmpPath = new Path(path, new String(METADATA_FILENAME + "." + randomUUID));
    --- End diff --
    
    We can avoid creating the additional string. 
    Path tmpPath = new Path(path, METADATA_FILENAME + "." + randomUUID); 



[GitHub] drill issue #653: DRILL-4831: Running refresh table metadata concurrently ra...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on the issue:

    https://github.com/apache/drill/pull/653
  
    LGTM.  +1



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by amansinha100 <gi...@git.apache.org>.
Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88399438
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,75 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    +    Path tmpPath = new Path(path, METADATA_FILENAME + "." + randomUUID);
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadata);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path, METADATA_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Metadata cache file rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +      throw new IOException("metadata cache file rename failed");
    +    } finally {
    +      if (fs.exists(tmpPath)) {
    +        fs.delete(tmpPath, false);
    +      }
    +    }
       }
     
    -  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID = UUID.randomUUID();
    +    Path tmpPath = new Path(path, METADATA_DIRECTORIES_FILENAME + "." + randomUUID);
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadataDirs);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path,  METADATA_DIRECTORIES_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Metadata cache file rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +      throw new IOException("metadata cache file rename failed");
    --- End diff --
    
    This IOException is masking the original exception e. Better to rethrow using the IOException(message, cause) constructor.
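The masking the comment points out, and the chained alternative it asks for, look like this in plain Java. The class and method names are illustrative, not Drill's actual code; the triggered exception is a stand-in for whatever the rename throws.

```java
import java.io.IOException;

public class RenameErrors {
    static void rename(boolean fail) throws IOException {
        try {
            if (fail) {
                // Stand-in for a failure inside fileContext.rename(...).
                throw new IllegalStateException("disk quota exceeded");
            }
        } catch (Exception e) {
            // Masking (what the patch originally did) discards e:
            //   throw new IOException("metadata cache file rename failed");
            // Chaining keeps e as the cause, so the real failure reason
            // survives in logs and stack traces:
            throw new IOException("metadata cache file rename failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            rename(true);
        } catch (IOException e) {
            System.out.println(e.getCause().getMessage()); // disk quota exceeded
        }
    }
}
```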



[GitHub] drill issue #653: DRILL-4831: Running refresh table metadata concurrently ra...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/653
  
    Thanks Gautam for the review.  Updated with all review comments addressed.



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by gparai <gi...@git.apache.org>.
Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88306937
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,65 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    +    Path tmpPath = new Path(path, new String(METADATA_FILENAME + "." + randomUUID));
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadata);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path, METADATA_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +    }
       }
     
    -  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID = UUID.randomUUID();
    +    Path tmpPath = new Path(path, new String(METADATA_DIRECTORIES_FILENAME + "." + randomUUID));
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadataDirs);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path,  METADATA_DIRECTORIES_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +    }
    --- End diff --
    
    It looks like the function throws an IOException. We should figure out which code throws it and handle it if it is related to the tmp file creation.



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/653



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by gparai <gi...@git.apache.org>.
Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88306202
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,65 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    +    Path tmpPath = new Path(path, new String(METADATA_FILENAME + "." + randomUUID));
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadata);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path, METADATA_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +    }
    --- End diff --
    
    Should we add a `finally` section that removes the temp file if it still exists, i.e. in case the rename failed?
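The `finally` cleanup being suggested can be sketched like this, using java.nio in place of Hadoop's FileSystem/FileContext APIs; the class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpCleanup {
    public static void renameWithCleanup(Path tmp, Path finalPath)
            throws IOException {
        try {
            Files.move(tmp, finalPath,
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Runs whether or not the move succeeded. After a successful
            // move, tmp no longer exists, so this is a no-op on the happy
            // path; after a failure it removes the stray temp file.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("t");
        Path tmp = dir.resolve("cache.tmp");
        Files.write(tmp, "{}".getBytes());
        renameWithCleanup(tmp, dir.resolve("cache"));
        System.out.println(Files.exists(tmp)); // false
    }
}
```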



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88770685
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,75 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    --- End diff --
    
    Yes, I wanted to use the queryId as well, but it is not easily accessible, as you mentioned. I made the change to use the same UUID for both METADATA_FILENAME and METADATA_DIRECTORIES_FILENAME.



[GitHub] drill issue #653: DRILL-4831: Running refresh table metadata concurrently ra...

Posted by ppadma <gi...@git.apache.org>.
Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/653
  
    Updated diffs with all review comments taken care of.



[GitHub] drill pull request #653: DRILL-4831: Running refresh table metadata concurre...

Posted by gparai <gi...@git.apache.org>.
Github user gparai commented on a diff in the pull request:

    https://github.com/apache/drill/pull/653#discussion_r88306537
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
    @@ -495,31 +499,65 @@ private ParquetFileMetadata_v3 getParquetFileMetadata_v3(ParquetTableMetadata_v3
        * @param p
        * @throws IOException
        */
    -  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadata_v3 parquetTableMetadata, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         module.addSerializer(ColumnMetadata_v3.class, new ColumnMetadata_v3.Serializer());
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID =  UUID.randomUUID();
    +    Path tmpPath = new Path(path, new String(METADATA_FILENAME + "." + randomUUID));
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadata);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path, METADATA_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +    }
       }
     
    -  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, Path p) throws IOException {
    +  private void writeFile(ParquetTableMetadataDirs parquetTableMetadataDirs, String path) throws IOException {
         JsonFactory jsonFactory = new JsonFactory();
         jsonFactory.configure(Feature.AUTO_CLOSE_TARGET, false);
         jsonFactory.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
         ObjectMapper mapper = new ObjectMapper(jsonFactory);
         SimpleModule module = new SimpleModule();
         mapper.registerModule(module);
    -    FSDataOutputStream os = fs.create(p);
    +
    +    // If multiple clients are updating metadata cache file concurrently, the cache file
    +    // can get corrupted. To prevent this, write to a unique temporary file and then do
    +    // atomic rename.
    +    UUID randomUUID = UUID.randomUUID();
    +    Path tmpPath = new Path(path, new String(METADATA_DIRECTORIES_FILENAME + "." + randomUUID));
    +
    +    FSDataOutputStream os = fs.create(tmpPath);
         mapper.writerWithDefaultPrettyPrinter().writeValue(os, parquetTableMetadataDirs);
         os.flush();
         os.close();
    +
    +    // Use fileContext API as FileSystem rename is deprecated.
    +    FileContext fileContext = FileContext.getFileContext(tmpPath.toUri());
    +    Path finalPath = new Path(path,  METADATA_DIRECTORIES_FILENAME);
    +
    +    try {
    +      fileContext.rename(tmpPath, finalPath, Options.Rename.OVERWRITE);
    +    } catch (Exception e) {
    +      logger.info("Rename from {} to {} failed", tmpPath.toString(), finalPath.toString(), e);
    +    }
    --- End diff --
    
    Same comment as earlier regarding `finally`

