Posted to issues@ozone.apache.org by "Ritesh H Shukla (Jira)" <ji...@apache.org> on 2022/06/08 22:06:00 UTC

[jira] [Resolved] (HDDS-6584) [spark] Spark-HWC Error log with AcidUtils

     [ https://issues.apache.org/jira/browse/HDDS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ritesh H Shukla resolved HDDS-6584.
-----------------------------------
    Resolution: Fixed

> [spark] Spark-HWC Error log with AcidUtils
> ------------------------------------------
>
>                 Key: HDDS-6584
>                 URL: https://issues.apache.org/jira/browse/HDDS-6584
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.3.0
>            Reporter: Soumitra Sulav
>            Priority: Major
>         Attachments: spark-hwc-aicderror-debug.log, spark-hwc-aicderror-info.log
>
>
> AcidUtils error messages are observed when using the Spark HiveWarehouseConnector (HWC) with OzoneFileSystem.
> The job does not abort, but this might lead to issues in ACID scenarios.
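> For background, the error appears to come from a DFS-only guard in the listing path: Hive's AcidUtils first tries a file-ID-based listing that its Hadoop shims support only on HDFS ({{DistributedFileSystem}}), and for any other FileSystem it logs the error and retries with the regular listing API. The sketch below is a simplified illustration of that flow, not the actual Hive source; the class and method names are made up for clarity.
> {code:java}
> import java.io.IOException;
> import java.util.Arrays;
> import java.util.List;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> // Simplified sketch of the guard behind the log message; illustrative names only.
> public class AcidListingSketch {
>   private static final Logger LOG = LoggerFactory.getLogger(AcidListingSketch.class);
>
>   static List<FileStatus> listDirectory(FileSystem fs, Path dir) throws IOException {
>     try {
>       if (!(fs instanceof DistributedFileSystem)) {
>         // OzoneFileSystem is not a DistributedFileSystem, so it lands here.
>         throw new UnsupportedOperationException(
>             "Only supported for DFS; got " + fs.getClass());
>       }
>       return listWithFileIds((DistributedFileSystem) fs, dir); // HDFS-only fast path
>     } catch (Exception e) {
>       // Produces the ERROR line seen in the logs below, then falls back.
>       LOG.error("Failed to get files with ID; using regular API: " + e.getMessage());
>       return Arrays.asList(fs.listStatus(dir)); // portable FileSystem listing
>     }
>   }
>
>   // Stand-in for the HDFS file-ID listing; the real logic lives in Hive's Hadoop shims.
>   private static List<FileStatus> listWithFileIds(DistributedFileSystem dfs, Path dir)
>       throws IOException {
>     return Arrays.asList(dfs.listStatus(dir));
>   }
> }
> {code}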
> *Test:* TPC-DS queries are run via Spark-HWC on the Ozone filesystem.
> Table information for the table under query:
> {code:java}
> |# Detailed Table Information|
> |Database                    |tpcds_src
> |Table                       |store_sales
> |Owner                       |hrt_qa
> |Created Time                |Fri Mar 04 14:48:15 UTC 2022
> |Last Access                 |Thu Jan 01 00:00:00 UTC 1970
> |Created By                  |Spark 2.2 or prior
> |Type                        |EXTERNAL
> |Provider                    |hive
> |Table Properties            |[numFilesErasureCoded=0, bucketing_version=2, transient_lastDdlTime=1646405295]
> |Statistics                  |388445409 bytes
> |Location                    |o3fs://hivetest.ozonestage.ozone1/user/hrt_qa/tpcds/tests/data/store_sales
> |Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> |InputFormat                 |org.apache.hadoop.mapred.TextInputFormat
> |OutputFormat                |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> |Storage Properties          |[serialization.format=|, field.delim=|]
> |Partition Provider          |Catalog {code}
> *Info Logs:*
> {code:java}
> # spark-shell \
>     --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-114.jar \
>     --master yarn --deploy-mode client \
>     --conf spark.sql.broadcastTimeout=1000 \
>     --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
>     --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
>     --conf spark.driver.memory=15g \
>     --conf spark.network.timeout=1000s \
>     --conf spark.sql.crossJoin.enabled=true \
>     --conf spark.eventLog.enabled=false \
>     --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/quasar-whkave-8.quasar-whkave.root.hwx.site@ROOT.HWX.SITE \
>     --conf spark.executor.memory=2g \
>     --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
>     --conf spark.driver.log.persistToDfs.enabled=false \
>     --conf spark.security.credentials.hiveserver2.enabled=true \
>     --name "PySparkShellT" {code}
> {code:java}
> scala> spark.sql("SELECT * FROM ( SELECT i_category, i_class, i_brand, s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) OVER (PARTITION BY i_category, i_brand, s_store_name, s_company_name) avg_monthly_sales FROM item, store_sales, date_dim, store WHERE ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND ss_store_sk = s_store_sk AND d_year IN (1999) AND ((i_category IN ('Books', 'Electronics', 'Sports') AND i_class IN ('computers', 'stereo', 'football')) OR (i_category IN ('Men', 'Jewelry', 'Women') AND i_class IN ('shirts', 'birdal', 'dresses'))) GROUP BY i_category, i_class, i_brand, s_store_name, s_company_name, d_moy) tmp1 WHERE CASE WHEN (avg_monthly_sales <> 0) THEN (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) ELSE NULL END > 0.1 ORDER BY sum_sales - avg_monthly_sales, s_store_name LIMIT 100").show()
> 22/03/07 12:45:09 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> Hive Session ID = 500e8d1c-0481-4dd8-96ce-7040f9ebea0f
> 22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
> 22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
> 22/03/07 12:45:15 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:15 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:17 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem {code}
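> Since the error fires whenever the table location does not resolve to a {{DistributedFileSystem}}, the filesystem class behind the location can be confirmed independently of Hive. A minimal standalone check (assuming the Ozone client jars on the classpath and the o3fs location from the table info above):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class FsClassCheck {
>   public static void main(String[] args) throws Exception {
>     Path p = new Path("o3fs://hivetest.ozonestage.ozone1/user/hrt_qa/tpcds/tests/data/store_sales");
>     FileSystem fs = p.getFileSystem(new Configuration());
>     // Expected to print org.apache.hadoop.fs.ozone.OzoneFileSystem, which is not
>     // a DistributedFileSystem, so AcidUtils takes the regular-API fallback.
>     System.out.println(fs.getClass().getName());
>   }
> }
> {code}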
> Attached are [^spark-hwc-aicderror-debug.log] and [^spark-hwc-aicderror-info.log].
>  



