Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/04/27 06:58:33 UTC

[GitHub] [incubator-seatunnel] quanzhian commented on issue #1733: [Bug] [Spark] Can't run SeaTunnel on spark standalone cluster.

quanzhian commented on issue #1733:
URL: https://github.com/apache/incubator-seatunnel/issues/1733#issuecomment-1110617046

   @BenJFan There is a bug in the decompression code. Here is the fix.
   
   The affected class is `org.apache.seatunnel.utils.CompressionUtils` (method `unTar`).
   
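   Fixed code (it drops the `Bad zip entry` path check and creates missing parent directories before writing files):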
   
   ```
       /**
        * Untar an input .tar file into the output directory.
        * <p>
        * Each entry is written to the path named by the entry, resolved
        * relative to the output directory.
        *
        * @param inputFile the input .tar file
        * @param outputDir the output directory file.
        * @throws IOException           io exception
        * @throws FileNotFoundException file not found exception
        * @throws ArchiveException      archive exception
        */
       public static void unTar(final File inputFile, final File outputDir) throws IOException, ArchiveException {
   
           LOGGER.info("Untaring {} to dir {}.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath());
   
           final List<File> untaredFiles = new LinkedList<>();
           try (final InputStream is = new FileInputStream(inputFile);
                final TarArchiveInputStream debInputStream = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", is)) {
               TarArchiveEntry entry = null;
               while ((entry = (TarArchiveEntry) debInputStream.getNextEntry()) != null) {
                   final File outputFile = new File(outputDir, entry.getName()).toPath().normalize().toFile();
                   if (entry.isDirectory()) {
                       LOGGER.info("Attempting to write output directory {}.", outputFile.getAbsolutePath());
                       if (!outputFile.exists()) {
                           LOGGER.info("Attempting to create output directory {}.", outputFile.getAbsolutePath());
                           if (!outputFile.mkdirs()) {
                               throw new IllegalStateException(String.format("Couldn't create directory %s.", outputFile.getAbsolutePath()));
                           }
                       }
                   } else {
                       LOGGER.info("Creating output file {}.", outputFile.getAbsolutePath());
                       File outputParentFile = outputFile.getParentFile();
                       if (outputParentFile != null && !outputParentFile.exists()) {
                           outputParentFile.mkdirs();
                       }
                       // try-with-resources closes the output file even if the copy throws.
                       try (final OutputStream outputFileStream = new FileOutputStream(outputFile)) {
                           IOUtils.copy(debInputStream, outputFileStream);
                       }
                   }
                   untaredFiles.add(outputFile);
               }
           }
       }
   ```
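   
   The fixed version also changes how files are written. Before opening each output file, it creates any missing parent directory. This matters when the archive lists a file before (or without) its directory entry.

   The following is a minimal, stdlib-only sketch of that file branch, assuming Java 7+ NIO:
   - `Files.createDirectories` stands in for `mkdirs()`.
   - `Files.copy` stands in for `IOUtils.copy`.
   - `EntryWriter.writeEntry` is a hypothetical helper, not SeaTunnel code.
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;

   // Hypothetical helper (not part of SeaTunnel): writes one non-directory tar entry.
   class EntryWriter {

       static void writeEntry(InputStream entryData, Path outputFile) throws IOException {
           Path parent = outputFile.getParent();
           if (parent != null) {
               // Creates all missing ancestor directories; no-op if they already exist.
               Files.createDirectories(parent);
           }
           // Files.copy opens and closes the target file itself, but does NOT
           // close entryData, so a shared archive stream stays usable.
           Files.copy(entryData, outputFile, StandardCopyOption.REPLACE_EXISTING);
       }

       public static void main(String[] args) throws IOException {
           Path workDir = Files.createTempDirectory("untar-demo");
           // "plugins/jdbc/lib" does not exist yet under workDir.
           Path target = workDir.resolve("plugins/jdbc/lib/connector.jar");
           InputStream data = new ByteArrayInputStream("jar-bytes".getBytes(StandardCharsets.UTF_8));
           writeEntry(data, target);
           // prints "jar-bytes"
           System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
       }
   }
   ```
   
   `Files.copy` leaves the input stream open, so the same call can be repeated on one shared archive stream, one entry at a time.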
   
   Old code (contains the bug):
   
   ```
      /**
        * Untar an input file into an output file.
        * <p>
        * The output file is created in the output folder, having the same name
        * as the input file, minus the '.tar' extension.
        *
        * @param inputFile the input .tar file
        * @param outputDir the output directory file.
        * @throws IOException           io exception
        * @throws FileNotFoundException file not found exception
        * @throws ArchiveException      archive exception
        */
       public static void unTar(final File inputFile, final File outputDir) throws  IOException, ArchiveException {
   
           LOGGER.info("Untaring {} to dir {}.", inputFile.getAbsolutePath(), outputDir.getAbsolutePath());
   
           final List<File> untaredFiles = new LinkedList<>();
           try (final InputStream is = new FileInputStream(inputFile);
                final TarArchiveInputStream debInputStream = (TarArchiveInputStream) new ArchiveStreamFactory().createArchiveInputStream("tar", is)) {
               TarArchiveEntry entry = null;
               while ((entry = (TarArchiveEntry) debInputStream.getNextEntry()) != null) {
                   final File outputFile = new File(outputDir, entry.getName());
                   if (!outputFile.toPath().normalize().startsWith(outputDir.toPath())) {
                       throw new IllegalStateException("Bad zip entry");
                   }
                   if (entry.isDirectory()) {
                       LOGGER.info("Attempting to write output directory {}.", outputFile.getAbsolutePath());
                       if (!outputFile.exists()) {
                           LOGGER.info("Attempting to create output directory {}.", outputFile.getAbsolutePath());
                           if (!outputFile.mkdirs()) {
                               throw new IllegalStateException(String.format("Couldn't create directory %s.", outputFile.getAbsolutePath()));
                           }
                       }
                   } else {
                       LOGGER.info("Creating output file {}.", outputFile.getAbsolutePath());
                       final OutputStream outputFileStream = new FileOutputStream(outputFile);
                       IOUtils.copy(debInputStream, outputFileStream);
                       outputFileStream.close();
                   }
                   untaredFiles.add(outputFile);
               }
           }
       }
   ```
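   
   A likely reason the old check rejects valid entries is that `Path.startsWith` compares name elements, and only the entry path is normalized. Suppose `outputDir` is a relative path such as `new File(".")`; the comment does not say which directory is passed. Then `./plugins/...` normalizes to `plugins/...`, which does not start with `.`.

   The sketch below reproduces the old comparison. It also shows a hypothetical `fixedCheck` that makes both sides absolute and normalized. That version accepts these entries while still rejecting `../` traversal, which the fixed version above no longer checks at all.
   
   ```java
   import java.io.File;
   import java.nio.file.Path;

   // Illustration only; class and method names are hypothetical.
   class ZipSlipCheckDemo {

       // Same comparison as the old unTar code: only the left side is normalized.
       static boolean oldCheck(File outputDir, String entryName) {
           File outputFile = new File(outputDir, entryName);
           return outputFile.toPath().normalize().startsWith(outputDir.toPath());
       }

       // Makes both sides absolute and normalized before comparing.
       static boolean fixedCheck(File outputDir, String entryName) {
           Path base = outputDir.toPath().toAbsolutePath().normalize();
           Path target = base.resolve(entryName).normalize();
           return target.startsWith(base);
       }

       public static void main(String[] args) {
           File relative = new File(".");
           File absolute = new File("/opt/work");
           System.out.println(oldCheck(relative, "plugins/jdbc/lib/x.jar"));   // false: "plugins" != "."
           System.out.println(fixedCheck(relative, "plugins/jdbc/lib/x.jar")); // true
           System.out.println(oldCheck(absolute, "plugins/jdbc/lib/x.jar"));   // true: already normalized
           System.out.println(fixedCheck(absolute, "../outside.txt"));         // false: escapes the dir
       }
   }
   ```
   
   Under this assumption, the old check only works when `outputDir` is already absolute and normalized. Normalizing `outputDir` as well avoids both the false rejection and the loss of traversal protection.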
   
   Here are my test details. With the fix, the job submitted in YARN cluster mode finishes with final status SUCCEEDED:
   
   ```
   [xxxxxx@bigdata-app03 apache-seatunnel-incubating-2.1.1-SNAPSHOT]# ./bin/start-seatunnel-spark.sh --master yarn --deploy-mode cluster --config /mnt/services/seatunnel/spark_batch.conf
   22/04/27 14:33:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   22/04/27 14:33:44 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
   22/04/27 14:33:44 INFO EsServiceCredentialProvider: Loaded EsServiceCredentialProvider
   22/04/27 14:33:44 INFO Client: Requesting a new application from cluster with 5 NodeManagers
   22/04/27 14:33:44 INFO Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
   22/04/27 14:33:44 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
   22/04/27 14:33:44 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
   22/04/27 14:33:44 INFO Client: Setting up container launch context for our AM
   22/04/27 14:33:44 INFO Client: Setting up the launch environment for our AM container
   22/04/27 14:33:44 INFO Client: Preparing resources for our AM container
   22/04/27 14:33:45 INFO EsServiceCredentialProvider: Hadoop Security Enabled = [false]
   22/04/27 14:33:45 INFO EsServiceCredentialProvider: ES Auth Method = [SIMPLE]
   22/04/27 14:33:45 INFO EsServiceCredentialProvider: Are creds required = [false]
   22/04/27 14:33:45 INFO Client: Source and destination file systems are the same. Not copying hdfs:/hdp/apps/3.1.4.0-315/spark2/spark2-hdp-yarn-archive.tar.gz
   22/04/27 14:33:45 INFO Client: Uploading resource file:/mnt/services/seatunnel/apache-seatunnel-incubating-2.1.1-SNAPSHOT/lib/seatunnel-core-spark.jar -> hdfs://nameservice1/user/xxx_user/.sparkStaging/application_1643094720025_42454/seatunnel-core-spark.jar
   22/04/27 14:33:46 INFO Client: Uploading resource file:/mnt/services/seatunnel/apache-seatunnel-incubating-2.1.1-SNAPSHOT/plugins.tar.gz -> hdfs://nameservice1/user/xxx_user/.sparkStaging/application_1643094720025_42454/plugins.tar.gz
   22/04/27 14:33:46 INFO Client: Uploading resource file:/mnt/services/seatunnel/spark_batch.conf -> hdfs://nameservice1/user/xxx_user/.sparkStaging/application_1643094720025_42454/spark_batch.conf
   22/04/27 14:33:46 INFO Client: Uploading resource file:/tmp/spark-5d399c9e-df19-4881-8a0b-67dd57f3f6c2/__spark_conf__1201408946509169751.zip -> hdfs://nameservice1/user/xxx_user/.sparkStaging/application_1643094720025_42454/__spark_conf__.zip
   22/04/27 14:33:46 INFO SecurityManager: Changing view acls to: xxxxxx,xxx_user
   22/04/27 14:33:46 INFO SecurityManager: Changing modify acls to: xxxxxx,xxx_user
   22/04/27 14:33:46 INFO SecurityManager: Changing view acls groups to: 
   22/04/27 14:33:46 INFO SecurityManager: Changing modify acls groups to: 
   22/04/27 14:33:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(xxxxxx, xxx_user); groups with view permissions: Set(); users  with modify permissions: Set(xxxxxx, xxx_user); groups with modify permissions: Set()
   22/04/27 14:33:46 INFO Client: Submitting application application_1643094720025_42454 to ResourceManager
   22/04/27 14:33:46 INFO YarnClientImpl: Submitted application application_1643094720025_42454
   22/04/27 14:33:47 INFO Client: Application report for application_1643094720025_42454 (state: ACCEPTED)
   22/04/27 14:33:47 INFO Client: 
   	 client token: N/A
   	 diagnostics: AM container is launched, waiting for AM container to Register with RM
   	 ApplicationMaster host: N/A
   	 ApplicationMaster RPC port: -1
   	 queue: default
   	 start time: 1651041226887
   	 final status: UNDEFINED
   	 tracking URL: http://bigdata-master01:8088/proxy/application_1643094720025_42454/
   	 user: xxx_user
   22/04/27 14:33:48 INFO Client: Application report for application_1643094720025_42454 (state: ACCEPTED)
   22/04/27 14:33:49 INFO Client: Application report for application_1643094720025_42454 (state: ACCEPTED)
   22/04/27 14:33:50 INFO Client: Application report for application_1643094720025_42454 (state: ACCEPTED)
   22/04/27 14:33:51 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:51 INFO Client: 
   	 client token: N/A
   	 diagnostics: N/A
   	 ApplicationMaster host: 172.18.247.16
   	 ApplicationMaster RPC port: 0
   	 queue: default
   	 start time: 1651041226887
   	 final status: UNDEFINED
   	 tracking URL: http://bigdata-master01:8088/proxy/application_1643094720025_42454/
   	 user: xxx_user
   22/04/27 14:33:52 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:53 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:54 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:55 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:56 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:57 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:58 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:33:59 INFO Client: Application report for application_1643094720025_42454 (state: RUNNING)
   22/04/27 14:34:00 INFO Client: Application report for application_1643094720025_42454 (state: FINISHED)
   22/04/27 14:34:00 INFO Client: 
   	 client token: N/A
   	 diagnostics: N/A
   	 ApplicationMaster host: 172.18.247.16
   	 ApplicationMaster RPC port: 0
   	 queue: default
   	 start time: 1651041226887
   	 final status: SUCCEEDED
   	 tracking URL: http://bigdata-master01:8088/proxy/application_1643094720025_42454/
   	 user: xxx_user
   22/04/27 14:34:00 INFO Client: Deleted staging directory hdfs://nameservice1/user/xxx_user/.sparkStaging/application_1643094720025_42454
   22/04/27 14:34:00 INFO ShutdownHookManager: Shutdown hook called
   22/04/27 14:34:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-5d399c9e-df19-4881-8a0b-67dd57f3f6c2
   22/04/27 14:34:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-121ad009-6b38-468d-a4eb-a5faf4dbb28d
   
   ```
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
