Posted to commits@hudi.apache.org by "LiJie20190102 (via GitHub)" <gi...@apache.org> on 2023/03/17 09:55:28 UTC

[GitHub] [hudi] LiJie20190102 opened a new issue, #8215: [SUPPORT]

LiJie20190102 opened a new issue, #8215:
URL: https://github.com/apache/hudi/issues/8215

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I use HoodieDeltaStreamer to continuously ingest data from Kafka and sync it to Hive, but when I query the table from spark-shell, the results do not change: new data never shows up.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Prepare a Kafka topic and continuously produce data to it
   2. Use hudi-utilities-bundle_2.12-0.13.0.jar to continuously receive data from Kafka and synchronize it to Hive
   3. Query from spark-shell: spark.sql("select * from test_aa").show(10,false)
   
   **Expected behavior**
   In spark-shell, each execution of "select * from test_aa" should return the latest data in the table.
   
   
   **Environment Description**
   
   * Hudi version : 0.13.0 (hudi-utilities-bundle_2.12, i.e. the Scala 2.12 build)
   
   * Spark version : 3.2.3
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.4
   
   * Storage (HDFS/S3/GCS..) : hdfs
   
   * Running on Docker? (yes/no) : no
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1478976949

   @LiJie20190102 Closing this issue. I have added a comment on the issue. Most likely that's a bug.




[GitHub] [hudi] ad1happy2go commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1477846114

   @LiJie20190102 You can also pass "--conf spark.sql.filesourceTableRelationCacheSize=0" when starting spark-shell. This stops Spark from caching the relation, so you will always get the latest data.
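
   For reference, the launch command described above would look like `spark-shell --conf spark.sql.filesourceTableRelationCacheSize=0`. The following is a minimal sketch, not a verified fix, of how one might confirm inside the shell that the setting took effect before rerunning the query. It assumes it is pasted into a running spark-shell (where `spark` is the pre-built SparkSession) and that `test_aa` is the table from the issue report.

   ```scala
   // Run inside spark-shell; `spark` is the SparkSession the shell creates.
   // getOption returns None instead of throwing if the key cannot be resolved.
   val cacheSize: Option[String] =
     spark.conf.getOption("spark.sql.filesourceTableRelationCacheSize")
   println(s"relation cache size: ${cacheSize.getOrElse("<not resolved>")}")

   // Rerun the query from the issue report (table name taken from the report).
   spark.sql("select * from test_aa").show(10, false)
   ```

   Because the flag is given on the command line, it has to be supplied at launch; the snippet only reads the value back and does not attempt to change it in a live session.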




[GitHub] [hudi] LiJie20190102 commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "LiJie20190102 (via GitHub)" <gi...@apache.org>.
LiJie20190102 commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1477786444

   > @LiJie20190102 Are you able to query the latest data using Hive queries?
   
   Yes, the latest data can be queried; that is not a problem.




[GitHub] [hudi] ad1happy2go commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1477347225

   @LiJie20190102 Let us know if restarting spark-shell resolved your issue. We can close it.




[GitHub] [hudi] ad1happy2go commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1476740454

   @LiJie20190102 I was able to reproduce the issue. While my spark-shell session stayed open, it kept returning old data, whereas the same query on Hive returned the latest data.
   
   The Spark shell caches the old data frame, and when the query is run again it reads from that cached data frame. That is why we see old data. When I terminated the shell and started it again, spark-shell was also able to fetch the latest data.
   
   <img width="1723" alt="image" src="https://user-images.githubusercontent.com/63430370/226430841-e3923efe-f475-4172-b3b7-27da224a328f.png">
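
   Restarting the shell is the workaround confirmed in this thread. As a possible alternative that the thread does not mention, Spark's catalog API can invalidate cached metadata for a table within a live session. The sketch below assumes it runs inside spark-shell and uses the table name `test_aa` from the report; whether a refresh alone is enough for a Hudi table written by HoodieDeltaStreamer is an assumption that has not been verified here.

   ```scala
   // Run inside spark-shell; `spark` is the SparkSession the shell creates.
   // Assumption: invalidating Spark's cached metadata for the table is enough
   // to pick up new commits. This thread only confirms that a restart works.
   spark.catalog.refreshTable("test_aa")

   // Rerun the query from the issue report after the refresh.
   spark.sql("select * from test_aa").show(10, false)
   ```

   If the refresh does not surface new rows, the restart described above remains the confirmed fallback.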
   
   




[GitHub] [hudi] lokeshj1703 commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "lokeshj1703 (via GitHub)" <gi...@apache.org>.
lokeshj1703 commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1475999391

   @LiJie20190102 Are you able to query the latest data from Hive?




[GitHub] [hudi] LiJie20190102 commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "LiJie20190102 (via GitHub)" <gi...@apache.org>.
LiJie20190102 commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1477786755

   > @LiJie20190102 Let us know if restarting spark-shell resolved your issue. We can close it.
   
   Yes, the latest data can be queried; that is not a problem.




[GitHub] [hudi] codope closed issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #8215: [SUPPORT] spark-shell cannot obtain the latest data
URL: https://github.com/apache/hudi/issues/8215




[GitHub] [hudi] LiJie20190102 commented on issue #8215: [SUPPORT] spark-shell cannot obtain the latest data

Posted by "LiJie20190102 (via GitHub)" <gi...@apache.org>.
LiJie20190102 commented on issue #8215:
URL: https://github.com/apache/hudi/issues/8215#issuecomment-1477857175

   > @LiJie20190102 Also you can set "--conf spark.sql.filesourceTableRelationCacheSize=0" while starting spark shell. It will make spark not cache the relation and you will always get the latest data.
   
   Okay, thank you. If you are interested, could you also help me with issue #8257?

