Posted to issues@ozone.apache.org by "Soumitra Sulav (Jira)" <ji...@apache.org> on 2022/04/14 14:59:00 UTC

[jira] [Created] (HDDS-6584) [spark] Spark-HWC Error log with AcidUtils

Soumitra Sulav created HDDS-6584:
------------------------------------

             Summary: [spark] Spark-HWC Error log with AcidUtils
                 Key: HDDS-6584
                 URL: https://issues.apache.org/jira/browse/HDDS-6584
             Project: Apache Ozone
          Issue Type: Bug
          Components: build
    Affects Versions: 1.3.0
            Reporter: Soumitra Sulav
         Attachments: spark-hwc-aicderror-debug.log, spark-hwc-aicderror-info.log

AcidUtils error messages are observed when using the Spark HiveWarehouseConnector with OzoneFileSystem.

The job does not abort, but this might lead to issues in ACID scenarios.
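The error text in the logs ("Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem") suggests a capability check keyed on the concrete FileSystem class, with a fallback to the regular listing API when the store is not HDFS. Below is a minimal, hypothetical sketch of that pattern; the class and method names (other than the message text) are illustrative stand-ins, not Hive's actual AcidUtils source:

```java
// Hypothetical sketch: a DFS-only fast path that falls back to the regular
// listing API for any other FileSystem implementation. The stand-in classes
// below only mirror the Hadoop class hierarchy for illustration.

class FileSystem {}                                // stand-in for org.apache.hadoop.fs.FileSystem
class DistributedFileSystem extends FileSystem {}  // stand-in for HDFS
class OzoneFileSystem extends FileSystem {}        // stand-in for o3fs

class AcidListingSketch {
    static String listFiles(FileSystem fs) {
        try {
            if (!(fs instanceof DistributedFileSystem)) {
                // This is the condition the observed log line reports:
                throw new UnsupportedOperationException(
                        "Only supported for DFS; got class " + fs.getClass().getName());
            }
            return "fast path: listing with file IDs";
        } catch (UnsupportedOperationException e) {
            // Logged as ERROR, then the caller continues on the slower path —
            // consistent with the job completing despite the messages.
            System.err.println(
                    "ERROR io.AcidUtils: Failed to get files with ID; using regular API: "
                    + e.getMessage());
            return "fallback: regular listing API";
        }
    }

    public static void main(String[] args) {
        System.out.println(listFiles(new DistributedFileSystem()));
        System.out.println(listFiles(new OzoneFileSystem()));
    }
}
```

If this reading is right, the message is noisy but benign for reads: the listing still succeeds via the fallback, which matches the observed behavior (ERROR logged at every listing, query completes).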

*Test:* TPCDS queries are run via Spark HWC against the Ozone filesystem.

Info for the table under query:
{code:java}
|# Detailed Table Information|
|Database                    |tpcds_src
|Table                       |store_sales
|Owner                       |hrt_qa
|Created Time                |Fri Mar 04 14:48:15 UTC 2022
|Last Access                 |Thu Jan 01 00:00:00 UTC 1970
|Created By                  |Spark 2.2 or prior
|Type                        |EXTERNAL
|Provider                    |hive
|Table Properties            |[numFilesErasureCoded=0, bucketing_version=2, transient_lastDdlTime=1646405295]
|Statistics                  |388445409 bytes
|Location                    |o3fs://hivetest.ozonestage.ozone1/user/hrt_qa/tpcds/tests/data/store_sales
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
|InputFormat                 |org.apache.hadoop.mapred.TextInputFormat
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
|Storage Properties          |[serialization.format=|, field.delim=|]
|Partition Provider          |Catalog {code}
*Info Logs :*
{code:java}
# spark-shell --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-114.jar --master yarn --deploy-mode client --conf spark.sql.broadcastTimeout=1000 --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf spark.driver.memory=15g --conf spark.network.timeout=1000s --conf spark.sql.crossJoin.enabled=true --conf spark.eventLog.enabled=false --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/quasar-whkave-8.quasar-whkave.root.hwx.site@ROOT.HWX.SITE --conf spark.executor.memory=2g --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator --conf spark.driver.log.persistToDfs.enabled=false --conf spark.security.credentials.hiveserver2.enabled=true --name "PySparkShellT" {code}
{code:java}
scala> spark.sql("SELECT * FROM ( SELECT i_category, i_class, i_brand, s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) OVER (PARTITION BY i_category, i_brand, s_store_name, s_company_name) avg_monthly_sales FROM item, store_sales, date_dim, store WHERE ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND ss_store_sk = s_store_sk AND d_year IN (1999) AND ((i_category IN ('Books', 'Electronics', 'Sports') AND i_class IN ('computers', 'stereo', 'football')) OR (i_category IN ('Men', 'Jewelry', 'Women') AND i_class IN ('shirts', 'birdal', 'dresses'))) GROUP BY i_category, i_class, i_brand, s_store_name, s_company_name, d_moy) tmp1 WHERE CASE WHEN (avg_monthly_sales <> 0) THEN (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) ELSE NULL END > 0.1 ORDER BY sum_sales - avg_monthly_sales, s_store_name LIMIT 100").show()
22/03/07 12:45:09 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = 500e8d1c-0481-4dd8-96ce-7040f9ebea0f
22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:15 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:15 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:17 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem {code}
Attached are [^spark-hwc-aicderror-debug.log] and [^spark-hwc-aicderror-info.log].

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org