Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/17 17:20:40 UTC

[GitHub] [hudi] nochimow opened a new issue #4622: [SUPPORT]

nochimow opened a new issue #4622:
URL: https://github.com/apache/hudi/issues/4622




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1014828799


   @nochimow I can confirm that, due to the time precision change in 0.10.0, a fix is needed for Redshift. I can't confirm the exact behavior, since it is affected by the bug reported in that JIRA, and it happens in Redshift, which I can't verify against. Hope you understand. We just need to prioritize the fix and follow up with AWS support. Please close this if you don't have further questions.





[GitHub] [hudi] nochimow commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nochimow commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1039183574


   @nsivabalan Wouldn't it be useful to list some kind of "Known Issues and Limitations" for each release? I don't think it's clear to users that there is this kind of incompatibility with Redshift Spectrum from 0.10 onwards. For many users the Redshift integration is a core feature.





[GitHub] [hudi] nsivabalan commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1017010022


   @nochimow: let us know how it went. Or, if you need hudi-cli commands to do the restore, let us know.





[GitHub] [hudi] JorgenG commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
JorgenG commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1037012212


   @nsivabalan Considering that we seem to be at AWS's mercy here, could there be an option to have a config flag that uses the old precision? Or are there new features that rely on this change?
   
   We have a chicken-and-egg problem with adopting dbt now. We want to use Redshift Spectrum for queries and dbt Spark for transforms (which was added in 0.10), but that renders Spectrum unusable.
   
   Best regards, Jørgen





[GitHub] [hudi] nsivabalan commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1047399531


   Yes, makes sense. I will see where we can document this.





[GitHub] [hudi] xushiyan commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1014778381


   @nochimow this is likely caused by the same issue as https://issues.apache.org/jira/browse/HUDI-3056, where the timeline's time precision is not handled properly. When you write data with 0.9.0, the old time precision is used, so only partial data is shown in Redshift. While we can't troubleshoot AWS services, as we're open source, I'm adding this issue to the ticket to push up the priority.
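
To make the precision change concrete: as far as I understand it, Hudi instant times before 0.10.0 are 14-digit strings with second precision (`yyyyMMddHHmmss`), while 0.10.0 and later append three millisecond digits, giving 17 digits. The thread itself only says that the time precision changed, so the exact layouts below are an assumption, and `parse_instant` is a hypothetical helper, not part of Hudi. A minimal Python sketch that tells the two layouts apart:

```python
from datetime import datetime
from typing import Tuple

# Assumed layouts (not confirmed in this thread):
#   before 0.10.0: yyyyMMddHHmmss     (14 digits, second precision)
#   0.10.0 and up: yyyyMMddHHmmssSSS  (17 digits, millisecond precision)
SECONDS_FORMAT = "%Y%m%d%H%M%S"
SECONDS_DIGITS = 14
MILLIS_DIGITS = 17


def parse_instant(instant: str) -> Tuple[datetime, str]:
    """Parse a timeline instant string and report which precision it uses."""
    if not instant.isdigit():
        raise ValueError(f"not a numeric instant time: {instant!r}")
    if len(instant) == SECONDS_DIGITS:
        return datetime.strptime(instant, SECONDS_FORMAT), "seconds"
    if len(instant) == MILLIS_DIGITS:
        base = datetime.strptime(instant[:SECONDS_DIGITS], SECONDS_FORMAT)
        millis = int(instant[SECONDS_DIGITS:])
        return base.replace(microsecond=millis * 1000), "millis"
    raise ValueError(f"unexpected instant length {len(instant)}: {instant!r}")
```

A reader that only expects the 14-digit layout could mishandle the 17-digit instants, which would be consistent with the symptoms reported in this issue, though the thread does not confirm how Redshift Spectrum actually parses them.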








[GitHub] [hudi] nochimow edited a comment on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nochimow edited a comment on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1022567915


   Hello,
   I found a workaround for this issue, so I didn't need to restore my table.
   Apparently, after downgrading the Hudi table to 0.9, when I do a Hudi upsert operation into a partition, the rows of that partition become visible to Redshift Spectrum again, so I forced a kind of fake update for all the partitions in my table.
   I'm still trying to correct all the rows in all my partitions. But 95% of my rows are already back to normal; the remaining 5% may be something I'm still missing on my side, which I'm currently checking.
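
The "fake update" workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the comment does not show the actual job, `upsert_options` and `touch_partition` are hypothetical helper names, and the table and field names in the usage note are placeholders. The option keys are the standard Hudi Spark datasource write options. `touch_partition` expects a live Spark session with the Hudi 0.9 bundle on the classpath and is not exercised here.

```python
from typing import Dict

# Metadata columns Hudi adds to every record; dropped before writing back.
HOODIE_META_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]


def upsert_options(table_name: str, record_key: str,
                   partition_field: str, precombine_field: str) -> Dict[str, str]:
    """Build Hudi datasource write options for an upsert."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def touch_partition(spark, base_path: str, partition_field: str,
                    partition_value: str, options: Dict[str, str]) -> None:
    """Re-upsert every row of one partition unchanged (the 'fake update')."""
    df = (
        spark.read.format("hudi")
        .load(base_path)
        .where(f"{partition_field} = '{partition_value}'")
        .drop(*HOODIE_META_COLUMNS)
    )
    # Append mode plus the upsert operation rewrites the matching records
    # in place instead of replacing the table.
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

For example, `touch_partition(spark, "s3://bucket/table", "dt", "2022-01-10", upsert_options("my_table", "id", "dt", "updated_at"))` would rewrite a single partition; all names in that call are placeholders. Running it partition by partition keeps each commit small, at the cost of one commit per partition.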





[GitHub] [hudi] nsivabalan commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1014943007


   You will probably have to do a restore to an older commit. Can you give it a try?
   Add a savepoint to a commit that was created with 0.9.0, and then trigger a restore to that savepointed commit.
   It should restore your entire table to an older snapshot. (It's a destructive operation, though, which is something to keep in mind.)
   Essentially, restore will delete all data files and timeline files written between the savepoint you are restoring to and now.
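
The savepoint-and-restore steps above could be prepared along these lines. This is a sketch, not a verified procedure. It assumes that commits written before the upgrade have 14-digit instant times and that 0.10 commits have 17-digit ones, and that completed commits appear in the table's `.hoodie` folder as `<instant>.commit` or `<instant>.deltacommit`. Both function names are hypothetical. The hudi-cli command strings follow the `connect`, `savepoint create` and `savepoint rollback` commands as I recall them from the Hudi CLI documentation, so check them against `help` in your hudi-cli version before running anything, since the restore is destructive.

```python
from typing import Iterable, List, Optional

COMPLETED_SUFFIXES = ("commit", "deltacommit")


def last_commit_before_upgrade(timeline_files: Iterable[str]) -> Optional[str]:
    """Pick the newest completed 14-digit commit older than any 17-digit one."""
    old, new = [], []
    for name in timeline_files:
        instant, _, suffix = name.partition(".")
        # Skips inflight/requested files such as "<instant>.commit.inflight".
        if suffix not in COMPLETED_SUFFIXES or not instant.isdigit():
            continue
        if len(instant) == 14:
            old.append(instant)
        elif len(instant) == 17:
            new.append(instant)
    if new:
        # Compare on the shared 14-digit prefix of the first 0.10-style commit.
        first_new = min(new)[:14]
        old = [instant for instant in old if instant < first_new]
    return max(old) if old else None


def restore_commands(table_path: str, instant: str) -> List[str]:
    """Build the hudi-cli command lines for a savepoint followed by a restore."""
    return [
        f"connect --path {table_path}",
        f"savepoint create --commit {instant}",
        f"savepoint rollback --savepoint {instant}",
    ]
```

Generating the commands as plain strings keeps the destructive step manual: the output can be reviewed, and the table backed up, before anything is pasted into hudi-cli.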





[GitHub] [hudi] nochimow commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nochimow commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1078326355


   @xushiyan Is it possible to try to escalate this issue with AWS again? Is there still no response from their side?





[GitHub] [hudi] JorgenG commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
JorgenG commented on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1036334002


   So for now the status is that Hudi 0.10.x is incompatible with Redshift Spectrum? We are exploring using Hudi for our data lake and came across this very same problem, which confused me a lot. I'll do the downgrade now.





[GitHub] [hudi] nochimow edited a comment on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

Posted by GitBox <gi...@apache.org>.
nochimow edited a comment on issue #4622:
URL: https://github.com/apache/hudi/issues/4622#issuecomment-1014790581


   @xushiyan I also opened a ticket with AWS support about this to try to get it more priority.
   
   Can you confirm what the Hudi behaviour is in this case? If I write new data into a partition, will all rows in this partition become visible to Redshift again, or just the updated/inserted rows?







