Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/10/14 03:27:09 UTC

[GitHub] [pulsar] handywc opened a new issue #8254: pulsar io hdfs could not execute

handywc opened a new issue #8254:
URL: https://github.com/apache/pulsar/issues/8254


   **Describe the bug**
   
   execute method:
   ![image](https://user-images.githubusercontent.com/25581080/95939580-fa80ad00-0e0e-11eb-84c4-175ef0750f42.png)
   sinkconfig.yaml configuration:
   ![image](https://user-images.githubusercontent.com/25581080/95939610-0cfae680-0e0f-11eb-9bae-dc33cfa02d2f.png)
   but the log shows some error:
   ![image](https://user-images.githubusercontent.com/25581080/95939794-7a0e7c00-0e0f-11eb-81cf-c1df1819b113.png)
   If I change the configuration of directory to “/tmp/bar”, it writes the file to the local file system
   ![image](https://user-images.githubusercontent.com/25581080/95939720-52b7af00-0e0f-11eb-8b84-3e874a846af1.png)
   Even when I keep producing messages to the topic, the local file is still not written.
   
   pulsar 2.6.0
   pulsar-io-hdfs2-2.6.0.nar
   
   Could you give some advice, please?
   Thank you.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-709755159


   Anyway, the reason the HDFS sink wrote to the local file system is that the configuration was not picked up, so the Hadoop `FileSystem` used `file:///` as the default file system.
   
   But it's weird that local runner works but functions worker doesn't work. Could you give some help? @wolfstudy @tuteng @congbobo184 





[GitHub] [pulsar] BewareMyPower commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-723571315


   @handywc see https://stackoverflow.com/questions/32231105/why-is-hsync-not-flushing-my-hdfs-file
   
   The default HDFS sink doesn't specify the `UPDATE_LENGTH` flag to update the metadata of namenode, see https://github.com/apache/pulsar/blob/d7f65451dadc573fc2bb75dbb03cce705ed04d0a/pulsar-io/hdfs2/src/main/java/org/apache/pulsar/io/hdfs2/sink/HdfsSyncThread.java#L73





[GitHub] [pulsar] BewareMyPower commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-708858891


   `localrun` uses `bin/function-localrunner` to run your sink, while `create` runs your sink in the functions worker.
   
   It looks like you deployed the functions worker together with the broker, so it may behave differently from the local runner. Could you try to [deploy your functions worker separately](https://pulsar.apache.org/docs/en/functions-worker/#run-functions-worker-separately)?





[GitHub] [pulsar] congbobo184 commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717646203


   @handywc can you add the hadoop.tmp.dir = /tmp/bar configuration in core-site.xml and test again?





[GitHub] [pulsar] congbobo184 commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717756191


    @handywc can you change the config hadoop.tmp.dir = tmp/bar and change the directory = tmp/bar?





[GitHub] [pulsar] congbobo184 edited a comment on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 edited a comment on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717645920


   can you add the hadoop.tmp.dir = /tmp/bar configuration in core-site.xml and test again? @handynwc





[GitHub] [pulsar] congbobo184 commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-713269927


   Can I have a look at your core-site.xml and hdfs-site.xml?





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717754413


   I added the port, but the behavior is the same as before.
   ![image](https://user-images.githubusercontent.com/25581080/97405415-9394e500-1932-11eb-9ade-5a6faacc30f3.png)
   





[GitHub] [pulsar] congbobo184 edited a comment on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 edited a comment on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717756191


    @handywc can you change the config hadoop.tmp.dir = tmp/bar and change the directory = tmp/bar then test again





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-708231778


   Hi, thanks for your reply.
   
   I copied core-site.xml and hdfs-site.xml from the namenode of the Hadoop cluster, and the hosts file is set up.
   
   I will write some example code later to verify whether the XML configuration is correct.
   
   Thanks again.





[GitHub] [pulsar] congbobo184 commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717654717


   @handywc the fs.defaultFS configuration in your core-site.xml is hdfs://dims32, should we add the port for this? like hdfs://dims32:port





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717753828


   I added the port, but the behavior is the same as before.
   ![image](https://user-images.githubusercontent.com/25581080/97405161-2ed98a80-1932-11eb-8490-a9e888e98299.png)
   





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-718376348


   @congbobo184 pulsar01 didn't have core-site.xml. After I copied the file to pulsar01, the sink writes to HDFS.
   Thank you.
   But I have another question.
   ![image](https://user-images.githubusercontent.com/25581080/97530487-3ad65280-19ed-11eb-9824-74b08032d472.png)
   I set `syncInterval` and `maxPendingRecords` in the configuration, but they don't seem to take effect.
   I produced five messages:
   ![image](https://user-images.githubusercontent.com/25581080/97530714-a9b3ab80-19ed-11eb-8ed5-aa5ffa81ebd9.png)
   But the file in HDFS stays empty until I delete the sink.
   ![image](https://user-images.githubusercontent.com/25581080/97530781-cea81e80-19ed-11eb-935b-6ea877c93b3e.png)
   





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717768683


   Thank you for your patience.
   I changed the settings, but it still doesn't work.
   ![image](https://user-images.githubusercontent.com/25581080/97408777-8d553780-1937-11eb-956a-aa4b9d3e403c.png)
   





[GitHub] [pulsar] congbobo184 edited a comment on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 edited a comment on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717654717


   @handywc the fs.defaultFS configuration in your core-site.xml is `hdfs://dims32`, should we add the port for this? like hdfs://dims32:port





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717649072


   Hi, thanks for your reply.
   I added the hadoop.tmp.dir = /tmp/bar configuration to core-site.xml and tested again, but the behavior is the same as before.
   ![image](https://user-images.githubusercontent.com/25581080/97382027-641bb380-1905-11eb-9def-363133f5fd91.png)
   





[GitHub] [pulsar] congbobo184 commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717808659


   @handywc I think pulsar01 doesn't have a core-site.xml, so the sink falls back to the local file system. Maybe you should copy core-site.xml to pulsar01. I'm not sure this is the cause, but it is worth trying. Also, I looked at the code: if you set hadoop.tmp.dir = /tmp/bar, then the directory should be ""; you only need to set one of the two.
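
One way to test this diagnosis on a worker host is to check whether core-site.xml is present and what it sets for fs.defaultFS. The sketch below does that with the JDK's XML parser only. It is not how Hadoop itself loads its configuration, and the names `CoreSiteCheck` and `effectiveDefaultFs` are made up for illustration. It falls back to `file:///`, the default file system mentioned earlier in this thread, when the file or the property is missing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper; not part of Pulsar or Hadoop.
class CoreSiteCheck {
    // Default file system used when nothing overrides fs.defaultFS.
    static final String LOCAL_DEFAULT = "file:///";

    /**
     * Returns the fs.defaultFS value found in the given core-site.xml,
     * or file:/// if the file or the property is absent.
     */
    static String effectiveDefaultFs(Path coreSite) throws Exception {
        if (!Files.isRegularFile(coreSite)) {
            // No file on this host: the situation suspected for pulsar01.
            return LOCAL_DEFAULT;
        }
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(coreSite.toFile());
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if ("fs.defaultFS".equals(childText(prop, "name"))) {
                String value = childText(prop, "value");
                return (value == null || value.isEmpty()) ? LOCAL_DEFAULT : value;
            }
        }
        return LOCAL_DEFAULT;
    }

    // Text of the first child element with the given tag, or null if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Calling `CoreSiteCheck.effectiveDefaultFs(...)` with the location of core-site.xml on each host that runs the sink should return the `hdfs://...` value. A result of `file:///` would suggest that the file is missing or incomplete on that host.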





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-708847010


   hi
   When I execute this command:
   ![image](https://user-images.githubusercontent.com/25581080/96067764-0b8ff380-0ecd-11eb-970f-0c7020a117e5.png)
   it writes to HDFS successfully.
   But when I execute this command:
   ![image](https://user-images.githubusercontent.com/25581080/96067936-5c075100-0ecd-11eb-8020-3e0faaebe02a.png)
   it writes to the local file system.
   The only difference is "localrun" versus "create".





[GitHub] [pulsar] congbobo184 commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717645920


   can you add the hadoop.tmp.dir = /tmp/bar configuration in core-site.xml and test again?





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717633569


   Hi, I have uploaded core-site.xml and hdfs-site.xml.
   By the way, the two files were copied from my Hadoop cluster.
   
   [hdfs-site&core-site.zip](https://github.com/apache/pulsar/files/5448931/hdfs-site.core-site.zip)
   





[GitHub] [pulsar] BewareMyPower commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-732518152


   @BewareMyPower Yeah, you're right. I think it may need an option that lets the user choose whether to set the `UPDATE_LENGTH` flag.





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-731955716


   @BewareMyPower Hi, thanks for your reply.
   So the default HDFS sink is actually not available, right?





[GitHub] [pulsar] BewareMyPower commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-708165443


   See https://github.com/apache/pulsar/blob/3a298f3404d597e6a94de981c5fbe570264dcba1/pulsar-io/hdfs2/src/main/java/org/apache/pulsar/io/hdfs2/sink/HdfsAbstractSink.java#L92-L99
   
   `path` is created from your `directory` and other config params.
   
   The HDFS sink just uses a Hadoop client to create `FileSystem` and open a file under the configured `directory` to write or append. So you should not add any prefix to the path of `directory`.
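
As a rough illustration of the point about prefixes, the check below flags a `directory` value that carries a URI scheme such as `hdfs://`. It is only a sketch built on `java.net.URI` from the JDK. `DirectoryCheck` and `hasSchemePrefix` are hypothetical names, and nothing like this is claimed to exist in the connector.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical pre-flight check for the sink's `directory` setting.
class DirectoryCheck {
    /** Returns true if the value starts with a URI scheme, e.g. "hdfs://host/tmp/bar". */
    static boolean hasSchemePrefix(String directory) {
        try {
            return new URI(directory).getScheme() != null;
        } catch (URISyntaxException e) {
            // Not parseable as a URI, so it cannot carry a scheme prefix.
            return false;
        }
    }
}
```

With this check, a plain path such as `/tmp/bar` passes, while a value that begins with `hdfs://` is reported as carrying a prefix.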
   
   > if i change the configuration of directory to “/tmp/bar”, it will write the file to the local file system
   
   That means something is wrong with your `hdfs-site.xml`: the sink cannot load the Hadoop file system, so it uses the local file system instead.
   
   You can write a simple Hadoop client example to verify whether the issue is related to the `hdfs-site.xml`, like:
   
   ```java
   Configuration conf = new Configuration();
   conf.addResource(new Path(/* path of your hdfs-site.xml */));
   FileSystem fs = FileSystem.get(conf);
   FSDataOutputStream stream = fs.create(/* file path */);
   // Then use `stream` to write some data
   ```
   
   





[GitHub] [pulsar] congbobo184 removed a comment on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
congbobo184 removed a comment on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717645920


   can you add the hadoop.tmp.dir = /tmp/bar configuration in core-site.xml and test again? @handynwc





[GitHub] [pulsar] handywc removed a comment on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc removed a comment on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717754040


   ![image](https://user-images.githubusercontent.com/25581080/97405335-7102cc00-1932-11eb-9e0d-83e1f8f3da26.png)
   





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717754040


   ![image](https://user-images.githubusercontent.com/25581080/97405335-7102cc00-1932-11eb-9e0d-83e1f8f3da26.png)
   





[GitHub] [pulsar] handywc removed a comment on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc removed a comment on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-717753828


   I added the port, but the behavior is the same as before.
   ![image](https://user-images.githubusercontent.com/25581080/97405161-2ed98a80-1932-11eb-8490-a9e888e98299.png)
   





[GitHub] [pulsar] handywc commented on issue #8254: pulsar io hdfs could not execute

Posted by GitBox <gi...@apache.org>.
handywc commented on issue #8254:
URL: https://github.com/apache/pulsar/issues/8254#issuecomment-709672512


   hi
   I have deployed functions worker separately.
   ![image](https://user-images.githubusercontent.com/25581080/96201707-dd271c80-0f8f-11eb-9cbd-0d0c384de034.png)
   But the behavior is the same as before. I changed the admin-url setting to the functions worker's IP and port.
   ![image](https://user-images.githubusercontent.com/25581080/96201791-13fd3280-0f90-11eb-8287-6115479e7ba7.png)
   It still writes to the local file system.
   

