Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/06 12:05:00 UTC
[GitHub] [hudi] TamilselvanBalaiah opened a new issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
TamilselvanBalaiah opened a new issue #3423:
URL: https://github.com/apache/hudi/issues/3423
Hi,
I am getting a "Table or view not found" error when I use the transformer SQL below in the Hoodie properties file.
" hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID, a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD, a.PATIENT_STATUS_LKPCD, a.CREATED_BY, to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE, a.MODIFIED_BY, to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE, b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid"
The "default.ad_claim_header_ro" table was already loaded successfully into partitioned S3 buckets (Hudi datasets). Since that table exists in the target, I am trying to fetch a column from it using an inner join.
While reading data from the Ad_Claim_Header table, I get the "Table or view not found" error, even though the table already exists in the database (default) and at the S3 path.
I can query the Ad_Claim_Header table in Hive and Spark SQL without any issues; the problem occurs only in Apache Hudi. Is there any configuration needed for reading existing tables while processing another dataset?
Can anyone please help me with this?
spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
--packages org.apache.hudi:hudi-utilities-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.5 \
--master yarn --deploy-mode cluster \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.hive.convertMetastoreParquet=false \
/usr/lib/hudi/org.apache.hudi_hudi-utilities-bundle_2.11-0.5.2-incubating.jar \
--table-type MERGE_ON_READ \
--op BULK_INSERT \
--source-ordering-field CLM_HDR_ADMISSION_DETAIL_SID \
--props s3://aws-glue-udp-e2e-bkt-raw/properties/ad-clm-hdr-admission-detail.properties \
--source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
--target-base-path s3://aws-glue-udp-e2e-bkt-dtlke-raw/adj-claim/PRDMMIS/ad_clm_hdr_admission_detail --target-table default.ad_clm_hdr_admission_detail \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
--payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
--enable-hive-sync
Properties File
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.partitionpath.field=year,month
hoodie.deltastreamer.transformer.sql=SELECT a.CLM_HDR_ADMISSION_DETAIL_SID, a.CLAIM_HEADER_SID, a.ADMISSION_TYPE_LKPCD, a.ADMISSION_SOURCE_LKPCD, a.PATIENT_STATUS_LKPCD, a.CREATED_BY, to_date(a.CREATED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as CREATED_DATE, a.MODIFIED_BY, to_date(a.MODIFIED_DATE,'DD-MON-YYYY HH24:MI:SS AM') as MODIFIED_DATE, b.clm_enc_flag, string(year(to_date(a.CREATED_DATE))) as year, string(month(to_date(a.CREATED_DATE))) as month FROM <SRC> a, default.ad_claim_header_ro b WHERE a.claim_header_sid = b.claim_header_sid
hoodie.datasource.write.recordkey.field=CLM_HDR_ADMISSION_DETAIL_SID
hoodie.datasource.write.hive_style_partitioning=true
#hive sync settings, uncomment if using flag --enable-hive-sync
hoodie.datasource.hive_sync.partition_fields=year,month
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
hoodie.datasource.hive_sync.table=ad_clm_hdr_admission_detail
# DFS Source
hoodie.deltastreamer.source.dfs.root=s3://aws-glue-udp-e2e-bkt-raw/adj-claim/PRDMMIS/AD_CLM_HDR_ADMISSION_DETAIL
**Environment Description**
* EMR version : 5.33
* Hudi version : hudi-utilities-bundle_2.11-0.5.2-incubating.jar
* Spark version : 2.4.7
* Hive version : 2.3.7
* Hadoop version : 2.10.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-901490970
I am not sure you can join with a Hudi table from within a transformer. From what I know, the incoming df is registered as a table (sparkSession.registerTempTable()) and then you can run a SQL query on top of it to produce the output df before ingesting to Hudi.
@bvaradar: Can you help here?
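The temp-table mechanism described above can be illustrated with a plain-SQL analogy. This is a minimal sketch, not Hudi code: sqlite3 stands in for Spark SQL, and the function name, table name, and schema are all hypothetical. The point is that the configured SQL only sees the `<SRC>` placeholder (swapped for the registered temp table) plus whatever tables the session can already resolve.

```python
import sqlite3

# Hypothetical stand-in for a SQL-based transformer: the incoming batch is
# registered as a temp table, the <SRC> placeholder in the configured SQL is
# replaced with that table's name, and the query runs against the session.
SRC_PLACEHOLDER = "<SRC>"

def run_transformer_sql(conn, rows, sql):
    # Register the incoming batch as a temp table
    # (analogue of sparkSession.registerTempTable()).
    conn.execute("CREATE TEMP TABLE incoming_batch (id INTEGER, val TEXT)")
    conn.executemany("INSERT INTO incoming_batch VALUES (?, ?)", rows)
    # Substitute the placeholder, then run the transformed query.
    return conn.execute(sql.replace(SRC_PLACEHOLDER, "incoming_batch")).fetchall()

conn = sqlite3.connect(":memory:")
out = run_transformer_sql(
    conn,
    [(1, "a"), (2, "b")],
    "SELECT id, upper(val) FROM <SRC> WHERE id > 1",
)
print(out)  # [(2, 'B')]
```

Any other table referenced in the SQL (like `default.ad_claim_header_ro` here) must already be resolvable by the session's catalog, which is why the metastore configuration matters.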
[GitHub] [hudi] nsivabalan commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-989919588
We have a tracking [jira](https://issues.apache.org/jira/browse/HUDI-1627) to support querying Hive tables via the SQL transformer. We plan to support this in 0.11.0. Closing the issue for now.
[GitHub] [hudi] nsivabalan closed issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3423:
URL: https://github.com/apache/hudi/issues/3423
[GitHub] [hudi] nsivabalan commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-903013013
The Spark invocation needs to have the Hive configurations set up correctly so that it can talk to the HMS and Spark SQL works. Can you confirm that?
[GitHub] [hudi] TamilselvanBalaiah commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
TamilselvanBalaiah commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-900170433
Can anyone please help me on this.
Thanks in advance.
[GitHub] [hudi] vinothchandar commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-926274535
@TamilselvanBalaiah any updates on this?
[GitHub] [hudi] nsivabalan edited a comment on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-901490970
Sorry for the late turnaround.
I am not sure you can join with a Hudi table from within a transformer. From what I know, the incoming df is registered as a table (sparkSession.registerTempTable()) and then you can run a SQL query on top of it to produce the output df before ingesting to Hudi.
@bvaradar: Can you help here?
[GitHub] [hudi] codope commented on issue #3423: [SUPPORT] Table or view not found error in Hoodie Delta Streamer
Posted by GitBox <gi...@apache.org>.
codope commented on issue #3423:
URL: https://github.com/apache/hudi/issues/3423#issuecomment-905184155
@TamilselvanBalaiah You have to place hive-site.xml in the `<SPARK_INSTALL>/conf/` directory.
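That placement step can be scripted. The sketch below is an assumption-laden illustration, not a Hudi or EMR utility: the function name is made up, and the commented EMR paths are typical defaults you should verify on your own cluster.

```python
import os
import shutil

def sync_hive_conf(hive_conf_dir: str, spark_conf_dir: str) -> str:
    """Copy hive-site.xml into Spark's conf dir so the SparkSession
    (and hence the SQL transformer) can reach the Hive metastore."""
    src = os.path.join(hive_conf_dir, "hive-site.xml")
    dst = os.path.join(spark_conf_dir, "hive-site.xml")
    shutil.copyfile(src, dst)
    return dst

# Typical EMR locations (assumed; verify on your cluster):
# sync_hive_conf("/etc/hive/conf", "/usr/lib/spark/conf")
```

With the file in place, Spark resolves catalog tables (like `default.ad_claim_header_ro`) through the metastore instead of a local Derby catalog.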