Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/14 11:56:46 UTC

[GitHub] [hudi] satishmalladi-m opened a new issue, #6107: [SUPPORT]

satishmalladi-m opened a new issue, #6107:
URL: https://github.com/apache/hudi/issues/6107

   Hi 
   
   We are facing an issue. We first do a bulk_insert for the batch load, which had 9 million records, and that gives us two tables: an RT table and an RO table. After the bulk_insert everything looks good, but when we upsert the delta records only the RT table gets updated; the RO table does not. Could you please help me with this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1209987093

   @satishmalladi-m as mentioned by @KnightChess and @glory9211, it's possible that compaction has not run, which would leave the RO table not updated. Can you confirm whether the sync succeeds after compaction?
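A rough way to check whether compaction has run at all is to list the files under the table's `.hoodie` directory. The sketch below assumes the Hudi 0.x timeline layout, where a completed delta commit shows up as `<instant>.deltacommit` and a completed compaction on a MOR table shows up as `<instant>.commit`. The helper `compaction_status` is hypothetical (not a Hudi API) and takes the file names as input instead of reading storage.

```python
from typing import Iterable, NamedTuple, Optional


class CompactionStatus(NamedTuple):
    last_deltacommit: Optional[str]  # latest completed delta commit instant
    last_compaction: Optional[str]   # latest completed compaction instant
    pending_compactions: int         # scheduled but not yet completed


def compaction_status(timeline_files: Iterable[str]) -> CompactionStatus:
    """Summarise a MERGE_ON_READ timeline listing (hypothetical helper).

    Assumes the 0.x layout: '<instant>.deltacommit' marks a completed delta
    commit, '<instant>.commit' a completed compaction on a MOR table, and
    '<instant>.compaction.requested' / '.compaction.inflight' a compaction
    that has been scheduled. Any other file name is ignored.
    """
    deltacommits, compactions, scheduled = [], [], set()
    for name in timeline_files:
        instant, _, action = name.partition(".")
        if action == "deltacommit":
            deltacommits.append(instant)
        elif action == "commit":
            compactions.append(instant)
        elif action in ("compaction.requested", "compaction.inflight"):
            scheduled.add(instant)
    # Assumes all instant times share one fixed-width format,
    # so string order is the same as time order.
    return CompactionStatus(
        last_deltacommit=max(deltacommits, default=None),
        last_compaction=max(compactions, default=None),
        pending_compactions=len(scheduled - set(compactions)),
    )
```

If `last_compaction` is `None`, or older than `last_deltacommit`, the updates from the newer delta commits are most likely still sitting in log files, which the RO table does not read.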




[GitHub] [hudi] xushiyan commented on issue #6107: [SUPPORT]

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1186393779

   Looks like something to do with meta sync, where RO is not getting synced. Please provide scripts and configs for reproducing the issue, and then we can help from there.




[GitHub] [hudi] xushiyan commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1296304199

   Analysis and suggestions were provided above. Closing due to inactivity.




[GitHub] [hudi] satishmalladi-m commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
satishmalladi-m commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1188921945

   > looks like something to do with meta sync where RO is not getting sync'ed. please provide scripts and configs for reproducing then we can help from there.
   
   
   Please find below the configuration we are currently using:
   
   hudi_options = {
       'hoodie.datasource.write.table.type': self._write_table_type,
       'hoodie.table.name': self._table_name,
       'hoodie.datasource.write.recordkey.field': self._record_key,
       'hoodie.datasource.write.partitionpath.field': self._partition_field,
       'hoodie.datasource.write.precombine.field': self._combine_key,
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
       'hoodie.parquet.max.file.size': "20971520",
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.table': self._table_name.lower(),
       'hoodie.datasource.hive_sync.partition_fields': self._partition_field,
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.datasource.hive_sync.database': self._hive_database.lower(),
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.support_timestamp': 'true'
   }
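When `hoodie.datasource.write.table.type` is `MERGE_ON_READ`, Hive sync normally registers two tables: `<table>_ro` (read optimized) and `<table>_rt` (snapshot/real time). With `hoodie.datasource.hive_sync.skip_ro_suffix` set to `true`, the read-optimized table keeps the plain name. The helper below is hypothetical (not part of Hudi) and derives the expected names from an options dict like the one above, which makes it easier to check which table a query is actually hitting. It assumes `self._write_table_type` resolves to `'MERGE_ON_READ'` for this job.

```python
def expected_hive_tables(options: dict) -> list:
    """Return the Hive table names meta sync is expected to register (sketch).

    MERGE_ON_READ: '<table>_ro' and '<table>_rt', unless skip_ro_suffix is
    'true', in which case the read-optimized table keeps the bare name.
    Anything else (COPY_ON_WRITE, the default) syncs a single table.
    """
    table = options["hoodie.datasource.hive_sync.table"]
    table_type = str(
        options.get("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
    ).upper()
    if table_type != "MERGE_ON_READ":
        return [table]
    skip_ro_suffix = str(
        options.get("hoodie.datasource.hive_sync.skip_ro_suffix", "false")
    ).lower() == "true"
    ro_name = table if skip_ro_suffix else f"{table}_ro"
    return [ro_name, f"{table}_rt"]
```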
   




[GitHub] [hudi] glory9211 commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
glory9211 commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1195603690

   > > looks like something to do with meta sync where RO is not getting sync'ed. please provide scripts and configs for reproducing then we can help from there.
   
   
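   As mentioned by @KnightChess, the RT and RO tables of a Hudi MOR table are only in sync after you run compaction,
   i.e. after the delta (Avro) log files are merged into the Parquet base files. The RO table reads only the Parquet base files, so upserts that are still in the log files are not visible there until compaction runs. In Hudi:
   COW tables == data in Parquet files
   MOR tables == data in Avro log files + Parquet base files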
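To make the RO/RT difference concrete, here is a toy, in-memory model of a single MOR file slice. It is only a conceptual sketch: the class `ToyFileSlice` is hypothetical and has nothing to do with Hudi's internals. It assumes every upsert is an update appended to the log and that the latest log record wins, whereas Hudi uses the precombine field to pick a record and may write inserts straight to base files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyFileSlice:
    """Conceptual model of one MOR file slice (not Hudi's API).

    base: records in the Parquet base file, keyed by record key.
    logs: batches of upserts appended to log files since the last compaction.
    """
    base: Dict[str, dict] = field(default_factory=dict)
    logs: List[Dict[str, dict]] = field(default_factory=list)

    def upsert(self, records: Dict[str, dict]) -> None:
        # A delta commit appends to the log; the base file is left untouched.
        self.logs.append(dict(records))

    def read_optimized(self) -> Dict[str, dict]:
        # RO query: base file only, so pending log records are invisible.
        return dict(self.base)

    def snapshot(self) -> Dict[str, dict]:
        # RT (snapshot) query: base file merged with log records at read time.
        merged = dict(self.base)
        for batch in self.logs:
            merged.update(batch)
        return merged

    def compact(self) -> None:
        # Compaction rewrites the base file from the merged view and drops the logs.
        self.base = self.snapshot()
        self.logs = []
```

The only way the RO view changes in this model is through `compact()`, which mirrors the behaviour described above.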
   
   You can read about the configs in the docs [here](https://hudi.apache.org/docs/configurations/).
   
   Some sample configs you could add:
   
   ```python
   ## Compaction
   compaction_options = {
       'hoodie.compact.inline.max.delta.seconds': 60,
       'hoodie.compact.inline.max.delta.commits': 4,
       'hoodie.compact.inline.trigger.strategy': 'NUM_OR_TIME',
       'hoodie.compact.inline': True,
       'hoodie.datasource.compaction.async.enable': True,
   }
   # merge into the existing write options before writing
   hudi_options.update(compaction_options)
   ```
   
   This will trigger compaction every 60 seconds or every 4 delta commits, whichever comes first, for a streaming job.
   Read more about compaction in Hudi [here](https://hudi.apache.org/docs/compaction).
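With `NUM_OR_TIME`, compaction is scheduled as soon as either threshold is reached. The function below is a simplified model of that decision, not Hudi's code: `should_compact` is a hypothetical name, both counters are assumed to be measured since the last compaction, and the defaults mirror the sample values above (4 delta commits, 60 seconds). It covers only four trigger strategies; your Hudi release may offer others.

```python
def should_compact(delta_commits_since_last: int,
                   seconds_since_last: float,
                   strategy: str = "NUM_OR_TIME",
                   max_delta_commits: int = 4,
                   max_delta_seconds: float = 60) -> bool:
    """Approximate the compaction trigger check (hypothetical sketch)."""
    by_num = delta_commits_since_last >= max_delta_commits
    by_time = seconds_since_last >= max_delta_seconds
    if strategy == "NUM_COMMITS":
        return by_num
    if strategy == "TIME_ELAPSED":
        return by_time
    if strategy == "NUM_AND_TIME":
        return by_num and by_time
    if strategy == "NUM_OR_TIME":
        return by_num or by_time
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")
```

For example, 3 delta commits within 59 seconds would not trigger compaction, while a single delta commit arriving 61 seconds after the last compaction would.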
   
   




[GitHub] [hudi] xushiyan closed issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
xushiyan closed issue #6107: [SUPPORT] RO table did not get updated while RT table did
URL: https://github.com/apache/hudi/issues/6107




[GitHub] [hudi] KnightChess commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1186743466

   For the RO table, data from a delta commit can only be seen after a compaction action.




[GitHub] [hudi] satishmalladi-m commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
satishmalladi-m commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1186795668

   Please find below the configuration we are currently using:
   
   hudi_options = {
       'hoodie.datasource.write.table.type': self._write_table_type,
       'hoodie.table.name': self._table_name,
       'hoodie.datasource.write.recordkey.field': self._record_key,
       'hoodie.datasource.write.partitionpath.field': self._partition_field,
       'hoodie.datasource.write.precombine.field': self._combine_key,
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator',
       'hoodie.parquet.max.file.size': "20971520",
       'hoodie.datasource.hive_sync.enable': 'true',
       'hoodie.datasource.hive_sync.table': self._table_name.lower(),
       'hoodie.datasource.hive_sync.partition_fields': self._partition_field,
       'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
       'hoodie.datasource.hive_sync.database': self._hive_database.lower(),
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.hive_sync.mode': 'hms',
       'hoodie.datasource.hive_sync.support_timestamp': 'true'
   }




[GitHub] [hudi] nsivabalan commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1229347692

   @satishmalladi-m: any updates, please?




[GitHub] [hudi] nsivabalan commented on issue #6107: [SUPPORT] RO table did not get updated while RT table did

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6107:
URL: https://github.com/apache/hudi/issues/6107#issuecomment-1216270037

   @satishmalladi-m @KnightChess @glory9211: any updates on this?

