Posted to issues@ozone.apache.org by "Ritesh H Shukla (Jira)" <ji...@apache.org> on 2022/06/08 22:06:00 UTC

[jira] [Commented] (HDDS-6584) [spark] Spark-HWC Error log with AcidUtils

    [ https://issues.apache.org/jira/browse/HDDS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551804#comment-17551804 ] 

Ritesh H Shukla commented on HDDS-6584:
---------------------------------------

Based on my initial look at the Hive code, this error occurs because Hive attempts to list files by FileID, an operation that OzoneFileSystem does not support.

Can you try setting the following config? It stops Hive from attempting FileID-based operations, and can be specified when running the Hive query against an Ozone datastore.
{code:java}
hive.orc.splits.include.fileid=false {code}
After the first occurrence of the failed check, the flag is set to false internally for subsequent operations, and from the testing above it looks like the query does complete.
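As a sketch of how the flag can be applied in practice (the host, jar path, and the assumption that Spark's {{spark.hadoop.*}} prefix forwards this Hive setting are illustrative, not confirmed against this cluster):

{code:bash}
# Session-scoped: pass the flag when connecting with Beeline
beeline -u "jdbc:hive2://<hs2-host>:10000" \
  --hiveconf hive.orc.splits.include.fileid=false

# Or set it inside an existing Hive/Beeline session before the query:
#   SET hive.orc.splits.include.fileid=false;

# For the spark-hwc reproduction above, the same setting could be
# forwarded via Spark's Hadoop-conf prefix (assumption: spark.hadoop.*
# forwarding reaches this Hive property):
spark-shell --conf spark.hadoop.hive.orc.splits.include.fileid=false ...
{code}
Setting it cluster-wide in hive-site.xml would make the change permanent rather than per-session.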

 

> [spark] Spark-HWC Error log with AcidUtils
> ------------------------------------------
>
>                 Key: HDDS-6584
>                 URL: https://issues.apache.org/jira/browse/HDDS-6584
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.3.0
>            Reporter: Soumitra Sulav
>            Priority: Major
>         Attachments: spark-hwc-aicderror-debug.log, spark-hwc-aicderror-info.log
>
>
> AcidUtils error messages are observed with Spark HiveWarehouseConnector with OzoneFilesystem.
> The job doesn't abort but this might lead to issues in acid scenarios.
> *Test:* TPCDS queries are run via spark-hwc on the ozone filesystem.
> Table Info under query
> {code:java}
> |# Detailed Table Information|
> |Database                    |tpcds_src
> |Table                       |store_sales
> |Owner                       |hrt_qa
> |Created Time                |Fri Mar 04 14:48:15 UTC 2022
> |Last Access                 |Thu Jan 01 00:00:00 UTC 1970
> |Created By                  |Spark 2.2 or prior
> |Type                        |EXTERNAL
> |Provider                    |hive
> |Table Properties            |[numFilesErasureCoded=0, bucketing_version=2, transient_lastDdlTime=1646405295]
> |Statistics                  |388445409 bytes
> |Location  |o3fs://hivetest.ozonestage.ozone1/user/hrt_qa/tpcds/tests/data/store_sales
> |Serde Library          |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> |InputFormat            |org.apache.hadoop.mapred.TextInputFormat
> |OutputFormat  |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> |Storage Properties          |[serialization.format=|, field.delim=|]
> |Partition Provider          |Catalog {code}
> *Info Logs :*
> {code:java}
> # spark-shell --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-114.jar --master yarn --deploy-mode client --conf spark.sql.broadcastTimeout=1000 --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf spark.driver.memory=15g --conf spark.network.timeout=1000s --conf spark.sql.crossJoin.enabled=true --conf spark.eventLog.enabled=false --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/quasar-whkave-8.quasar-whkave.root.hwx.site@ROOT.HWX.SITE --conf spark.executor.memory=2g --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator --conf spark.driver.log.persistToDfs.enabled=false --conf spark.security.credentials.hiveserver2.enabled=true --name "PySparkShellT" {code}
> {code:java}
> scala> spark.sql("SELECT * FROM ( SELECT i_category, i_class, i_brand, s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) OVER (PARTITION BY i_category, i_brand, s_store_name, s_company_name) avg_monthly_sales FROM item, store_sales, date_dim, store WHERE ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND ss_store_sk = s_store_sk AND d_year IN (1999) AND ((i_category IN ('Books', 'Electronics', 'Sports') AND i_class IN ('computers', 'stereo', 'football')) OR (i_category IN ('Men', 'Jewelry', 'Women') AND i_class IN ('shirts', 'birdal', 'dresses'))) GROUP BY i_category, i_class, i_brand, s_store_name, s_company_name, d_moy) tmp1 WHERE CASE WHEN (avg_monthly_sales <> 0) THEN (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) ELSE NULL END > 0.1 ORDER BY sum_sales - avg_monthly_sales, s_store_name LIMIT 100").show()
> 22/03/07 12:45:09 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> Hive Session ID = 500e8d1c-0481-4dd8-96ce-7040f9ebea0f
> 22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:15 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:15 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:17 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem {code}
> Attached: [^spark-hwc-aicderror-debug.log]
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org