You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/11 04:01:13 UTC
[GitHub] [hudi] gaoasi opened a new issue #4790: [SUPPORT] hudi-kafka-connect data missing!
gaoasi opened a new issue #4790:
URL: https://github.com/apache/hudi/issues/4790
i used hudi-kafka-connect to load logs from kafka to hudi
config is :
bootstrap.servers=localhost:9092
group.id=hudi-connect-cluster-test
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=60000
listeners=HTTP://localhost:8083
plugin.path=/opt/hudi-kaka-connect/kafka/plugins
the result is:
in the hudi table path,there is only the logs file,no data files
![image](https://user-images.githubusercontent.com/43560964/153535657-4c1ac24e-40d6-4824-8c6e-2b2fde73ae70.png)
why?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] gaoasi commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
gaoasi commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1038681195
thanks for answer!
i have a doubt,may i should configure a separate async compaction to execute compactions every time when the new data arrives,or i can configure a separate async compaction to execute compactions only once when the task first start
@nsivabalan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] gaoasi closed issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
gaoasi closed issue #4790:
URL: https://github.com/apache/hudi/issues/4790
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] gaoasi commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
gaoasi commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039994066
@nsivabalan Thank you very much for your patient answer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] gaoasi commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
gaoasi commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039994066
@nsivabalan Thank you very much for your patient answer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039511479
I will let experts weigh in here. But to my knowledge, scheduling will be done by kafka writers (connect). but for execution, you might have to configure a separate job and ensure compaction gets executed by it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039148288
@yihua @rmahindra123 : Can you folks assist here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1037429147
yes, in kafka connect hudi directly writes to log files. Expectation is that, user should configure a separate async compaction to execute compactions. And then you can start to see base parquet files. But actual writing from kafka nodes directly go into log files.
so this is expected.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan edited a comment on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039511479
I will let experts weigh in here. But to my knowledge, scheduling will be done by kafka writers (connect). but for execution, you might have to configure a separate job and ensure compaction gets executed by it.
https://hudi.apache.org/docs/compaction/#hudi-compactor-utility
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039511479
I will let experts weigh in here. But to my knowledge, scheduling will be done by kafka writers (connect). but for execution, you might have to configure a separate job and ensure compaction gets executed by it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1037429147
yes, in kafka connect hudi directly writes to log files. Expectation is that, user should configure a separate async compaction to execute compactions. And then you can start to see base parquet files. But actual writing from kafka nodes directly go into log files.
so this is expected.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan edited a comment on issue #4790: [SUPPORT] hudi-kafka-connect data missing!
Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #4790:
URL: https://github.com/apache/hudi/issues/4790#issuecomment-1039511479
I will let experts weigh in here. But to my knowledge, scheduling will be done by kafka writers (connect). but for execution, you might have to configure a separate job and ensure compaction gets executed by it.
https://hudi.apache.org/docs/compaction/#hudi-compactor-utility
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org