Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/08/12 06:50:05 UTC

[GitHub] [dolphinscheduler] xiaohei88 opened a new issue, #11449: [Bug] [data-quality] Data quality process failed.

xiaohei88 opened a new issue, #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449

   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   After running a data quality workflow, the task fails with the following error just before it finishes:
   ```java
   22/08/12 05:59:06 ERROR Client: Application diagnostics message: User class threw exception: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
   		at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
   		at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:139)
   		at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:356)
   		at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:290)
   		at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:171)
   		at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3364)
   		at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
   		at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3413)
   		at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3381)
   		at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:486)
   		at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
   		at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:470)
   		at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:572)
   		at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   		at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   		at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   		at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:979)
   		at org.apache.dolphinscheduler.data.quality.flow.batch.writer.file.BaseFileWriter.outputImpl(BaseFileWriter.java:113)
   		at org.apache.dolphinscheduler.data.quality.flow.batch.writer.file.HdfsFileWriter.write(HdfsFileWriter.java:40)
   		at org.apache.dolphinscheduler.data.quality.execution.SparkBatchExecution.executeWriter(SparkBatchExecution.java:130)
   		at org.apache.dolphinscheduler.data.quality.execution.SparkBatchExecution.execute(SparkBatchExecution.java:58)
   		at org.apache.dolphinscheduler.data.quality.context.DataQualityContext.execute(DataQualityContext.java:62)
   		at org.apache.dolphinscheduler.data.quality.DataQualityApplication.main(DataQualityApplication.java:70)
   		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   		at java.lang.reflect.Method.invoke(Method.java:498)
   		at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
   	Caused by: java.net.UnknownHostException: mycluster
   		... 28 more
   ```
   
   ### What you expected to happen
   
   [INFO] 2022-08-12 05:58:36.927 +0000 [taskAppId=TASK-20220812-6487978935456_7-26-30] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[181] - data quality task command: ....
   'hdfs://mycluster:8020/user/hadoop/data_quality_error_data/0_26_onlyone' ....
   
   This address is wrong. My guess is that the task fails when it tries to write to it.
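
   For reference, the host name that the stack trace reports as unknown is the authority part of this URI. A small sketch using only `java.net.URI` makes that explicit; the class name `ErrorPathHost` is hypothetical and is not part of DolphinScheduler or Hadoop:

   ```java
   import java.net.URI;

   // Hypothetical helper: splits an HDFS location into the parts a client must resolve.
   class ErrorPathHost {

       /** Returns the host (or logical name) from the URI authority. */
       static String hostOf(String location) {
           return URI.create(location).getHost();
       }

       /** Returns the port from the URI authority, or -1 if none is given. */
       static int portOf(String location) {
           return URI.create(location).getPort();
       }

       public static void main(String[] args) {
           String location = "hdfs://mycluster:8020/user/hadoop/data_quality_error_data/0_26_onlyone";
           System.out.println("host=" + hostOf(location) + ", port=" + portOf(location));
       }
   }
   ```

   If the host printed here cannot be resolved from the machine that runs the Spark job, the write would be expected to fail in the way shown above.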
   
   Looking at the source code, I suspect that org.apache.dolphinscheduler.common.utils.PropertyUtils loads the configuration incorrectly.
   ```java
   try (InputStream fis = PropertyUtils.class.getResourceAsStream("/common.properties");) {
       properties.load(fis);
   } catch (IOException e) {
       logger.error(e.getMessage(), e);
       System.exit(1);
   }
   ```
   I have already changed the HDFS address in common.properties on the worker node.
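
   One way to check which copy of common.properties is actually picked up is to print where the classpath resource resolves from and which HDFS address it contains. The following is a minimal sketch, not DolphinScheduler code: the class `CommonPropertiesCheck` is hypothetical, and the key name `fs.defaultFS` is an assumption about what common.properties contains.

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.net.URL;
   import java.util.Properties;

   // Hypothetical diagnostic, not part of DolphinScheduler.
   class CommonPropertiesCheck {

       // Assumed key for the HDFS address; adjust if your common.properties uses another name.
       static final String DEFAULT_FS_KEY = "fs.defaultFS";

       /** Parses properties from a stream; a null stream means the resource was not found. */
       static Properties load(InputStream in) throws IOException {
           if (in == null) {
               throw new IOException("resource not found on classpath");
           }
           Properties props = new Properties();
           props.load(in);
           return props;
       }

       /** Reports where a classpath resource resolves from and which HDFS address it holds. */
       static String describe(Class<?> anchor, String resource) throws IOException {
           URL location = anchor.getResource(resource);
           if (location == null) {
               return resource + " not found on classpath";
           }
           try (InputStream in = location.openStream()) {
               Properties props = load(in);
               return location + " -> " + DEFAULT_FS_KEY + "=" + props.getProperty(DEFAULT_FS_KEY);
           }
       }

       public static void main(String[] args) throws IOException {
           System.out.println(describe(CommonPropertiesCheck.class, "/common.properties"));
       }
   }
   ```

   This would need to run with the same classpath as the service being checked. If the printed location or address differs from the file that was edited, a different copy of common.properties is being loaded.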
   
   ### How to reproduce
   
   Create a new data quality workflow.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   3.0.0
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] lordk911 commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
lordk911 commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1328727020

   > @SbloodyS Hi, please reopen this issue. I encountered the same error and can continue to follow up on it.
   
   @xinxingi I have opened a new issue.




[GitHub] [dolphinscheduler] xinxingi commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
xinxingi commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1319661703

   @SbloodyS Hi, please reopen this issue. I encountered the same error and can continue to follow up on it.




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1212786445



[GitHub] [dolphinscheduler] SbloodyS commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1212795908

   Hi @xiaohei88, please describe the issue in English next time.




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1243079984

   This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1250422381

   This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.




[GitHub] [dolphinscheduler] github-actions[bot] closed issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #11449: [Bug] [data-quality] Data quality process failed.
URL: https://github.com/apache/dolphinscheduler/issues/11449




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1212786558

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * So that we can understand your request as soon as possible, please provide detailed information, the version, or screenshots.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to the `#troubleshooting` channel.




[GitHub] [dolphinscheduler] xiaohei88 commented on issue #11449: [Bug] [data-quality] Data quality process failed.

Posted by GitBox <gi...@apache.org>.
xiaohei88 commented on issue #11449:
URL: https://github.com/apache/dolphinscheduler/issues/11449#issuecomment-1212997638

   Why should I have to modify common.properties on the master node?

