Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/06 12:05:00 UTC

[GitHub] [hudi] TamilselvanBalaiah opened a new issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

TamilselvanBalaiah opened a new issue #3423:
URL: https://github.com/apache/hudi/issues/3423


   Hi,
   
   I am getting a "Table or view not found" error when I use the transformer SQL below in the Hoodie properties file:
   hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID, a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD, a.PATIENT_STATUS_LKPCD, a.CREATED_BY, to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE, a.MODIFIED_BY, to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE, b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid
   
   The "default.ad_claim_header_ro" table was already loaded successfully into S3 partitioned buckets (Hudi datasets). Since that table is already present in the target, I am trying to pull a column from it using an inner join.
   
   While reading data from the Ad_Claim_Header table, I get the "Table or view not found" error, even though the table already exists in that database (default) and at the S3 path.
   
   I can query the Ad_Claim_Header table in Hive and Spark SQL without any issues; the problem occurs only in Apache Hudi. Is there any configuration needed to read existing tables while processing another dataset?
   
   Can anyone please help me with this?
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  \
     --packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.5 \
     --master yarn --deploy-mode cluster \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.sql.hive.convertMetastoreParquet=false \
     /usr/lib/hudi/org.apache.hudi_hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
     --table-type MERGE_ON_READ \
     --op BULK_INSERT \
     --source-ordering-field CLM_HDR_ADMISSION_DETAIL_SID \
     --props s3://aws-glue-udp-e2e-bkt-raw/properties/ad-clm-hdr-admission-detail.properties \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --target-base-path s3://aws-glue-udp-e2e-bkt-dtlke-raw/adj-claim/PRDMMIS/ad_clm_hdr_admission_detail --target-table default.ad_clm_hdr_admission_detail \
     --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --enable-hive-sync
     
   Properties File
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.partitionpath.field=year,month
   hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID, a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD, a.PATIENT_STATUS_LKPCD, a.CREATED_BY, to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE, a.MODIFIED_BY, to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE, b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid
   hoodie.datasource.write.recordkey.field=CLM_HDR_ADMISSION_DETAIL_SID
   hoodie.datasource.write.hive_style_partitioning=true
   #hive sync settings, uncomment if using flag --enable-hive-sync
   hoodie.datasource.hive_sync.partition_fields=year,month
   hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
   hoodie.datasource.hive_sync.table=ad_clm_hdr_admission_detail
   # DFS Source
   hoodie.deltastreamer.source.dfs.root=s3://aws-glue-udp-e2e-bkt-raw/adj-claim/PRDMMIS/AD_CLM_HDR_ADMISSION_DETAIL

   **Environment Description**
   * EMR version : 5.33
   
   * Hudi version : hudi-utilities-bundle_2.11-0.5.2-incubating.jar
   
   * Spark version : 2.4.7
   
   * Hive version : 2.3.7
   
   * Hadoop version : 2.10.1
   
   * Storage (HDFS/S3/GCS..) :S3
   
   * Running on Docker? (yes/no) : No
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-901490970


   I am not sure you can join with a Hudi table from within a transformer. From what I know, the incoming df is registered as a temp table (sparkSession.registerTempTable()) and then the SQL query runs on top of it to produce the output df before ingesting to Hudi.
   
   @bvaradar: can you help here?
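The point above can be illustrated with a rough sketch of the placeholder mechanics (the temp-table name `HOODIE_SRC_TMP_TABLE_x` below is hypothetical; the real name is generated internally by the transformer): only `<SRC>` is rewritten to point at the incoming batch, so any other table in the query, such as `default.ad_claim_header_ro`, must already be resolvable through Spark's catalog.

```shell
# Hedged sketch of how a SQL-template transformer resolves <SRC>.
# Only the <SRC> placeholder is substituted with the temp table that
# holds the incoming batch; every other table name is left for Spark's
# catalog (and hence the Hive metastore) to resolve.
TEMPLATE='SELECT a.*, b.clm_enc_flag FROM <SRC> a, default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid'
TMP_TABLE='HOODIE_SRC_TMP_TABLE_x'   # hypothetical generated name
RESOLVED=$(printf '%s' "$TEMPLATE" | sed "s/<SRC>/$TMP_TABLE/")
# Prints the query with <SRC> replaced; default.ad_claim_header_ro is untouched.
echo "$RESOLVED"
```

This is why the join fails if the Spark session running the DeltaStreamer cannot see the Hive metastore: the second table name is handed to Spark as-is.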
   





[GitHub] [hudi] nsivabalan commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-989919588


   We have a tracking [jira](https://issues.apache.org/jira/browse/HUDI-1627) to support querying Hive tables via the SQL transformer. We plan to support this in 0.11.0. Closing the issue for now.





[GitHub] [hudi] nsivabalan closed issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3423:
URL: https://github.com/apache/hudi/issues/3423


   





[GitHub] [hudi] nsivabalan commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-903013013


   The Spark invocation needs the Hive configuration set up correctly so that it can talk to the HMS and Spark SQL works against it. Can you confirm that?
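As a quick sanity check (a sketch, assuming the Spark CLI tools are installed on the cluster node), one way to confirm that Spark can reach the metastore holding the joined table:

```shell
# If Spark is wired to the Hive metastore, this should list
# default.ad_claim_header_ro among the results.
spark-sql -e "SHOW TABLES IN default"

# Equivalent check from spark-shell:
#   spark.sql("SHOW TABLES IN default").show()
```

If the table does not appear here, the DeltaStreamer's SQL transformer will not be able to resolve it either.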





[GitHub] [hudi] TamilselvanBalaiah commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
TamilselvanBalaiah commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-900170433


   Can anyone please help me with this?
   
   Thanks in advance.





[GitHub] [hudi] vinothchandar commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-926274535


   @TamilselvanBalaiah any updates on this? 





[GitHub] [hudi] nsivabalan edited a comment on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-901490970


   Sorry for the late turnaround.
   I am not sure you can join with a Hudi table from within a transformer. From what I know, the incoming df is registered as a temp table (sparkSession.registerTempTable()) and then the SQL query runs on top of it to produce the output df before ingesting to Hudi.
   
   @bvaradar: can you help here?
   





[GitHub] [hudi] codope commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-905184155


   @TamilselvanBalaiah You have to place hive-site.xml in the `<SPARK_INSTALL>/conf/` directory.
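On EMR this usually comes down to one of the two options sketched below (the `/etc/hive/conf` and `/etc/spark/conf` paths are the typical EMR defaults, not verified against this particular cluster):

```shell
# Option 1: copy the Hive config into Spark's conf directory on the
# nodes (typical EMR locations; adjust for your install).
sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/

# Option 2: ship it with the job when using cluster deploy mode,
# so the driver container picks it up.
spark-submit --files /etc/hive/conf/hive-site.xml ...
```

Either way the goal is the same: the Spark session that runs the DeltaStreamer must see hive-site.xml so its catalog can resolve `default.ad_claim_header_ro`.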

