Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/13 14:27:04 UTC
[GitHub] [hudi] sassai opened a new issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
sassai opened a new issue #1962:
URL: https://github.com/apache/hudi/issues/1962
**Describe the problem you faced**
I'm running a Spark Structured Streaming application that reads data from Kafka and saves it to a partitioned Hudi MERGE_ON_READ table. Hive sync is enabled, and I'm able to query the table with the Hive CLI, e.g.:
```sql
SELECT * FROM iot_device_ro LIMIT 5;
```
```console
+------------------------------------+-------------------------------------+--------------------------------------------+---------------------------------------------+----------------------------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------------------------+----------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
| iot_device_ro._hoodie_commit_time | iot_device_ro._hoodie_commit_seqno | iot_device_ro._hoodie_record_key | iot_device_ro._hoodie_partition_path | iot_device_ro._hoodie_file_name | iot_device_ro.deviceid | iot_device_ro.sensorid | iot_device_ro.measurement | iot_device_ro.measure_ts | iot_device_ro.uuid | iot_device_ro.its | iot_device_ro.year | iot_device_ro.month | iot_device_ro.day | iot_device_ro.hour | iot_device_ro.minute |
+------------------------------------+-------------------------------------+--------------------------------------------+---------------------------------------------+----------------------------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------------------------+----------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
| 20200813121124 | 20200813121124_0_1 | uuid:3d387a37-f288-456b-87b7-2b6865cf32e0 | year=2020/month=8/day=13/hour=12/minute=11 | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4 | 1 | 30.228266831690732 | 2020-08-13T08:39:04.528Z | 3d387a37-f288-456b-87b7-2b6865cf32e0 | 2020-08-13 12:11:24 | 2020 | 8 | 13 | 12 | 11 |
| 20200813121124 | 20200813121124_0_2 | uuid:5bed809e-758f-46dc-b1ab-837ad3eb5a6a | year=2020/month=8/day=13/hour=12/minute=11 | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4 | 1 | 31.453188991515226 | 2020-08-13T08:39:19.588Z | 5bed809e-758f-46dc-b1ab-837ad3eb5a6a | 2020-08-13 12:11:24 | 2020 | 8 | 13 | 12 | 11 |
| 20200813121124 | 20200813121124_0_3 | uuid:6d37be34-6e4b-49b0-b3fe-e6552c2aee22 | year=2020/month=8/day=13/hour=12/minute=11 | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4 | 1 | 34.68735798194983 | 2020-08-13T07:45:05.958Z | 6d37be34-6e4b-49b0-b3fe-e6552c2aee22 | 2020-08-13 12:11:24 | 2020 | 8 | 13 | 12 | 11 |
| 20200813121124 | 20200813121124_0_4 | uuid:5c2dbea8-9668-4652-84c6-c82d06aa2805 | year=2020/month=8/day=13/hour=12/minute=11 | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4 | 1 | 33.680806905962264 | 2020-08-12T13:33:20.159Z | 5c2dbea8-9668-4652-84c6-c82d06aa2805 | 2020-08-13 12:11:24 | 2020 | 8 | 13 | 12 | 11 |
| 20200813121124 | 20200813121124_0_5 | uuid:528e6c74-bb44-49da-aa76-059781cc7676 | year=2020/month=8/day=13/hour=12/minute=11 | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4 | 1 | 31.38529683936205 | 2020-08-13T10:57:58.448Z | 528e6c74-bb44-49da-aa76-059781cc7676 | 2020-08-13 12:11:24 | 2020 | 8 | 13 | 12 | 11 |
+------------------------------------+-------------------------------------+--------------------------------------------+---------------------------------------------+----------------------------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------------------------+----------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
```
However, if I apply a filter on a partition column, the result is empty:
```sql
SELECT * FROM iot_device_ro WHERE day=13 LIMIT 10;
```
```console
+------------------------------------+-------------------------------------+-----------------------------------+---------------------------------------+----------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------+--------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
| iot_device_ro._hoodie_commit_time | iot_device_ro._hoodie_commit_seqno | iot_device_ro._hoodie_record_key | iot_device_ro._hoodie_partition_path | iot_device_ro._hoodie_file_name | iot_device_ro.deviceid | iot_device_ro.sensorid | iot_device_ro.measurement | iot_device_ro.measure_ts | iot_device_ro.uuid | iot_device_ro.its | iot_device_ro.year | iot_device_ro.month | iot_device_ro.day | iot_device_ro.hour | iot_device_ro.minute |
+------------------------------------+-------------------------------------+-----------------------------------+---------------------------------------+----------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------+--------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
+------------------------------------+-------------------------------------+-----------------------------------+---------------------------------------+----------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------+--------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
No rows selected (70.91 seconds)
```
**To Reproduce**
Steps to reproduce the behavior:
Hudi Datasource Configuration in Spark:
```java
Dataset<Row> output =
    streamingInput.withColumn("its", date_format(current_timestamp(), DATE_FORMAT));

DataStreamWriter<Row> writer =
    output
        .writeStream()
        .format("hudi")
        .option("hoodie.insert.shuffle.parallelism", "2")
        .option("hoodie.upsert.shuffle.parallelism", "2")
        .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), tableType)
        .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "uuid")
        .option(
            DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(),
            ComplexKeyGenerator.class.getCanonicalName())
        .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), partitions)
        .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "its")
        .option(HoodieWriteConfig.TABLE_NAME, tableName)
        .option("checkpointLocation", streamingCheckpointingPath)
        .option(DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH_OPT_KEY(), "false")
        .option(DataSourceWriteOptions.STREAMING_RETRY_CNT_OPT_KEY(), "10")
        .outputMode(OutputMode.Append())
        .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), tableName)
        .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), hiveDB)
        .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), hiveJdbcUrl)
        .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), "true")
        .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), "true")
        .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), partitions)
        .option(
            DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY(),
            MultiPartKeysValueExtractor.class.getCanonicalName());
```
Hudi Table Description in Hive:
```console
+--------------------------+------------+----------+
| col_name | data_type | comment |
+--------------------------+------------+----------+
| _hoodie_commit_time | string | |
| _hoodie_commit_seqno | string | |
| _hoodie_record_key | string | |
| _hoodie_partition_path | string | |
| _hoodie_file_name | string | |
| deviceid | string | |
| sensorid | string | |
| measurement | double | |
| measure_ts | string | |
| uuid | string | |
| its | string | |
| year | int | |
| month | int | |
| day | int | |
| hour | int | |
| minute | int | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| year | int | |
| month | int | |
| day | int | |
| hour | int | |
| minute | int | |
+--------------------------+------------+----------+
```
Hive partitions created by Hudi Hive sync:
```console
+---------------------------------------------+
| partition |
+---------------------------------------------+
| year=2020/month=8/day=13/hour=12/minute=11 |
| year=2020/month=8/day=13/hour=12/minute=17 |
| year=2020/month=8/day=13/hour=12/minute=18 |
| year=2020/month=8/day=13/hour=12/minute=19 |
| year=2020/month=8/day=13/hour=12/minute=20 |
| year=2020/month=8/day=13/hour=12/minute=21 |
```
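The hive-style partition paths above (written because `HIVE_STYLE_PARTITIONING_OPT_KEY` is enabled) decompose into one column/value pair per path segment. A minimal illustrative sketch of that decomposition (not Hudi's actual `MultiPartKeysValueExtractor` code):

```python
def parse_partition_path(path: str) -> dict:
    """Split a hive-style partition path like 'year=2020/month=8/day=13'
    into a {column: value} mapping (values stay strings here)."""
    return {k: v for k, v in (seg.split("=", 1) for seg in path.split("/"))}

print(parse_partition_path("year=2020/month=8/day=13/hour=12/minute=11"))
```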
Additional table information:
```console
Detailed Table Information
--
Database: | xxx|
OwnerType: | USER |
Owner: | xxx|
CreateTime: | Thu Aug 13 12:11:49 UTC 2020 |
LastAccessTime: | UNKNOWN |
Retention: | 0 |
Location: | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/streaming/tables/xxx/iot_device |
Table Type: | EXTERNAL_TABLE |
Table Parameters: | |
| EXTERNAL | TRUE
| bucketing_version | 2
| discover.partitions | true
| last_commit_time_sync | 20200813133453
| numFiles | 77
| numPartitions | 68
| numRows | 0
| rawDataSize | 0
| spark.sql.create.version | 2.2 or prior
| spark.sql.sources.schema.numPartCols | 5
| spark.sql.sources.schema.numParts | 1
| spark.sql.sources.schema.part.0 | {\"type\":\"struct\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_commit_seqno\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_record_key\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_partition_path\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_file_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"deviceid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"sensorid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"measurement\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"measure_ts\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"uuid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"its\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"y
ear\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"month\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"day\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"hour\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"minute\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}
| spark.sql.sources.schema.partCol.0 | year
| spark.sql.sources.schema.partCol.1 | month
| spark.sql.sources.schema.partCol.2 | day
| spark.sql.sources.schema.partCol.3 | hour
| spark.sql.sources.schema.partCol.4 | minute
| totalSize | 34194101
| transient_lastDdlTime | 1597324063
| |
Storage Information
SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
InputFormat: | org.apache.hudi.hadoop.HoodieParquetInputFormat |
OutputFormat: | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat |
Compressed: | No |
Num Buckets: | -1 |
Bucket Columns: | [] |
Sort Columns: | [] |
Storage Desc Params: | |
| serialization.format | 1
```
Strangely enough, if I run the same query in Spark, everything works as expected.
**Expected behavior**
Filters on partition columns can be applied to Hudi tables in the Hive CLI.
**Environment Description**
* Hudi version : 0.5.3
* Spark version : 2.4.0
* Hive version : 3.1
* Hadoop version : 3
* Storage (HDFS/S3/GCS..) : ADLS
* Running on Docker? (yes/no) : no
What am I missing? Any help is very much appreciated!
Thank you in advance!
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674071601
I'm running Hive within Cloudera Data Platform Public Cloud. I connected to Hive using Hue and the Hive CLI.
Beeline Connection:
```console
[root@engineeringhub-master3 ~]# beeline --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://engineeringhub-master3.mtag-fas.ft71-kqhq.cloudera.site:2181/default;httpPath=cliservice;principal=hive/_HOST@MTAG-FAS.FT71-KQHQ.CLOUDERA.SITE;serviceDiscoveryMode=zooKeeper;ssl=true;transportMode=http;zooKeeperNamespace=hiveserver2
20/08/14 13:14:12 [main]: INFO jdbc.HiveConnection: Connected to engineeringhub-master3.mtag-fas.ft71-kqhq.cloudera.site:10001
20/08/14 13:14:12 [main]: WARN jdbc.HiveConnection: Failed to connect to engineeringhub-master3.mtag-fas.ft71-kqhq.cloudera.site:10001
20/08/14 13:14:12 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.input.format at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Beeline version 3.1.3000.7.2.0.0-237 by Apache Hive
beeline>
```
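For reference, the "Cannot modify hive.input.format at runtime" error above is raised by HiveServer2's config whitelist. One hedged sketch of a server-side fix, assuming SQL-standard authorization is what blocks the setting, is to append the keys to the whitelist in `hive-site.xml` (the property name is a real Hive config; the exact regex value is an illustration, not taken from this thread):

```xml
<!-- Illustrative hive-site.xml fragment: allow these keys to be set
     at runtime via beeline hiveconf or SET statements. -->
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hive\.input\.format|hive\.stats\.autogather</value>
</property>
```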
Later on we would like to use JDBC connections from external tools like Power BI or DBeaver.
I can query and filter on all columns except the partition columns: year, month, day, hour, minute.
Oddly, if I apply aggregations to the partition columns, they work. For example, this returns results:
```sql
SELECT DISTINCT(minute), count(minute)
FROM greentech.iot_device_ro
GROUP BY minute
;
```
but
```sql
SELECT *
FROM greentech.iot_device_ro
WHERE minute=50
;
```
returns an empty result list.
Thank you for the quick response.
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175
@bvaradar : I guess you missed following up on this thread. Can you check it out and respond when you can?
[GitHub] [hudi] vinothchandar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
vinothchandar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-855191281
https://issues.apache.org/jira/browse/HUDI-1972
[GitHub] [hudi] vinothchandar closed issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
vinothchandar closed issue #1962:
URL: https://github.com/apache/hudi/issues/1962
[GitHub] [hudi] sassai edited a comment on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
sassai edited a comment on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-704079007
@bvaradar: I dropped and synced the table again. The issue still exists unless I run `set hive.fetch.task.conversion=none;` before executing the query; then everything works as expected.
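As a concrete sketch, a session using that workaround would look like this (table and column names are taken from earlier comments in this thread):

```sql
-- Disable fetch-task conversion so queries go through the table's
-- InputFormat (HoodieParquetInputFormat); partition filters then work.
SET hive.fetch.task.conversion=none;

SELECT * FROM greentech.iot_device_ro WHERE minute = 50 LIMIT 10;
```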
Please find the table description with and without partition below.
describe formatted address:
```console
+-------------------------------+----------------------------------------------------+-----------------------+
| col_name | data_type | comment |
+-------------------------------+----------------------------------------------------+-----------------------+
| _hoodie_commit_time | string | |
| _hoodie_commit_seqno | string | |
| _hoodie_record_key | string | |
| _hoodie_partition_path | string | |
| _hoodie_file_name | string | |
| id | int | |
| zipcode | int | |
| city | string | |
| street | string | |
| streetnumber | int | |
| uuid | string | |
| start_date | string | |
| end_date | string | |
| is_current | boolean | |
| event_time | string | |
| its | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| year | int | |
| month | int | |
| day | int | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | nyc_taxi | NULL |
| OwnerType: | USER | NULL |
| Owner: | srv_tu_usecase2_producer | NULL |
| CreateTime: | Tue Oct 06 07:06:03 UTC 2020 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | abfs://engineering@mtagdatalakeshowcase.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | bucketing_version | 2 |
| | discover.partitions | true |
| | last_commit_time_sync | 20201006065754 |
| | numFiles | 82 |
| | numPartitions | 1 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 642825967 |
| | transient_lastDdlTime | 1601967963 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL |
| InputFormat: | org.apache.hudi.hadoop.HoodieParquetInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-------------------------------+----------------------------------------------------+-----------------------+
```
describe formatted address partition(year=2020,month=10,day=6):
```console
+-----------------------------------+----------------------------------------------------+-----------------------+
| col_name | data_type | comment |
+-----------------------------------+----------------------------------------------------+-----------------------+
| _hoodie_commit_time | string | |
| _hoodie_commit_seqno | string | |
| _hoodie_record_key | string | |
| _hoodie_partition_path | string | |
| _hoodie_file_name | string | |
| id | int | |
| zipcode | int | |
| city | string | |
| street | string | |
| streetnumber | int | |
| uuid | string | |
| start_date | string | |
| end_date | string | |
| is_current | boolean | |
| event_time | string | |
| its | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| year | int | |
| month | int | |
| day | int | |
| | NULL | NULL |
| # Detailed Partition Information | NULL | NULL |
| Partition Value: | [2020, 10, 6] | NULL |
| Database: | nyc_taxi | NULL |
| Table: | address | NULL |
| CreateTime: | Tue Oct 06 07:06:04 UTC 2020 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Location: | abfs://engineering@mtagdatalakeshowcase.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=6 | NULL |
| Partition Parameters: | NULL | NULL |
| | numFiles | 82 |
| | totalSize | 642825967 |
| | transient_lastDdlTime | 1601967964 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL |
| InputFormat: | org.apache.hudi.hadoop.HoodieParquetInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-----------------------------------+----------------------------------------------------+-----------------------+
```
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-673694878
Did you set the Hive input format? Also, can you confirm the settings given [here](https://hudi.apache.org/docs/docker_demo.html#step-4-a-run-hive-queries) are applied?
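For reference, the relevant session-level settings look like this (hedged: verify the exact list against the linked docs; both keys also appear in the beeline command later in this thread):

```sql
-- Force the Hive input format so HoodieParquetInputFormat is honored,
-- and disable stats autogathering as in the Hudi demo setup.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET hive.stats.autogather=false;
```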
[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
bvaradar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-691772192
Closing this due to inactivity
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674067152
How are you connecting to Hive: beeline, or some other means? If it's beeline, you should be able to pass in the configs as given on the quick start page. If it's some other means, you probably have to find out how to set Hive configs when you bring up the tool. Alternatively, you can paste the command you used to bring up Hive (obfuscating any confidential info) and we can try to help you out.
[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
bvaradar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-703735047
@sassai : the Location is set incorrectly:
| Location: | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1 | NULL |
It should be abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address only, without the partition suffix.
Can you try dropping the table and running Hive sync again?
[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-702568499
@bvaradar: Sorry for the late reply. I was not able to investigate this issue further until now.
In the meantime I updated Hudi to 0.6.0 to check whether the issue still occurs. Unfortunately, it does. I created a COPY_ON_WRITE table with test data for further debugging. Please find the requested information below:
Hudi data set:
```console
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux/.bootstrap
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux/.bootstrap/.fileids
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux/.bootstrap/.partitions
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.temp
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 9133 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171431.commit
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171431.commit.requested
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 999 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171431.inflight
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 1169 2020-10-01 17:18 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171823.commit
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:18 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171823.commit.requested
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 380 2020-10-01 17:18 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171823.inflight
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 2986 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001173346.commit
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001173346.commit.requested
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 1653 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001173346.inflight
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/archived
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 228 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/hoodie.properties
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10
drwxr-xr-x - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 0 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 93 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/.hoodie_partition_metadata
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7095995 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_0-89-2258_20201001173346.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7096113 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_8-25-180_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7126955 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_0-25-172_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7126790 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_1-89-2259_20201001173346.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7144341 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/4f7d1a69-112a-42d1-b0ac-adf8de1e8dad-0_7-25-179_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7120178 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/669c9159-a795-40ac-9827-4551965e1750-0_3-25-175_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7197719 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/ab9c4c36-ea10-47bc-bf05-1555bf07c4ad-0_2-25-174_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7158006 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/ad939d4e-d8dc-4723-b5a5-8ec2c064f3e3-0_6-25-178_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7170312 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/c4c0b379-79e8-4856-9605-e56b1beb1b09-0_4-25-176_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7118844 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/f5fae4bc-396d-4ff6-8b89-01c08559cb50-0_5-25-177_20201001171431.parquet
-rw-r--r-- 1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer 7156753 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/f804937c-8de2-4692-90d9-4b485305d721-0_1-25-173_20201001171431.parquet
```
Latest commit:
```json
{
  "partitionToWriteStats" : {
    "year=2020/month=10/day=1" : [ {
      "fileId" : "08b5ed87-a749-4a82-a298-59071381dbc9-0",
      "path" : "year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_0-89-2258_20201001173346.parquet",
      "prevCommit" : "20201001171431",
      "numWrites" : 110569,
      "numDeletes" : 0,
      "numUpdateWrites" : 1,
      "numInserts" : 0,
      "totalWriteBytes" : 7095995,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "year=2020/month=10/day=1",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 7095995
    }, {
      "fileId" : "3c49d05f-8a9b-4365-8158-a32f879d674f-0",
      "path" : "year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_1-89-2259_20201001173346.parquet",
      "prevCommit" : "20201001171431",
      "numWrites" : 111111,
      "numDeletes" : 0,
      "numUpdateWrites" : 0,
      "numInserts" : 1,
      "totalWriteBytes" : 7126790,
      "totalWriteErrors" : 0,
      "tempPath" : null,
      "partitionPath" : "year=2020/month=10/day=1",
      "totalLogRecords" : 0,
      "totalLogFilesCompacted" : 0,
      "totalLogSizeCompacted" : 0,
      "totalUpdatedRecordsCompacted" : 0,
      "totalLogBlocks" : 0,
      "totalCorruptLogBlock" : 0,
      "totalRollbackBlocks" : 0,
      "fileSizeInBytes" : 7126790
    } ]
  },
  "compacted" : false,
  "extraMetadata" : {
    "schema" : "{\"type\":\"record\",\"name\":\"address_record\",\"namespace\":\"hoodie.address\",\"fields\":[{\"name\":\"id\",\"type\":[\"int\",\"null\"]},{\"name\":\"zipCode\",\"type\":[\"int\",\"null\"]},{\"name\":\"city\",\"type\":[\"string\",\"null\"]},{\"name\":\"street\",\"type\":[\"string\",\"null\"]},{\"name\":\"streetNumber\",\"type\":[\"int\",\"null\"]},{\"name\":\"uuid\",\"type\":[\"string\",\"null\"]},{\"name\":\"start_date\",\"type\":[\"string\",\"null\"]},{\"name\":\"end_date\",\"type\":[\"string\",\"null\"]},{\"name\":\"is_current\",\"type\":\"boolean\"},{\"name\":\"event_time\",\"type\":[\"string\",\"null\"]},{\"name\":\"year\",\"type\":\"int\"},{\"name\":\"month\",\"type\":\"int\"},{\"name\":\"day\",\"type\":\"int\"},{\"name\":\"its\",\"type\":\"string\"}]}"
  },
  "operationType" : "UPSERT",
  "fileIdAndRelativePaths" : {
    "08b5ed87-a749-4a82-a298-59071381dbc9-0" : "year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_0-89-2258_20201001173346.parquet",
    "3c49d05f-8a9b-4365-8158-a32f879d674f-0" : "year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_1-89-2259_20201001173346.parquet"
  },
  "totalRecordsDeleted" : 0,
  "totalLogRecordsCompacted" : 0,
  "totalScanTime" : 0,
  "totalCreateTime" : 0,
  "totalUpsertTime" : 11307,
  "totalCompactedRecordsUpdated" : 0,
  "totalLogFilesCompacted" : 0,
  "totalLogFilesSize" : 0
}
```
Hudi properties file:
```console
#Properties saved on Thu Oct 01 17:14:31 UTC 2020
#Thu Oct 01 17:14:31 UTC 2020
hoodie.table.name=address
hoodie.archivelog.folder=archived
hoodie.table.type=COPY_ON_WRITE
hoodie.table.version=1
hoodie.timeline.layout.version=1
```
Describe table:
```console
+-----------------------------------+----------------------------------------------------+-----------------------+
| col_name | data_type | comment |
+-----------------------------------+----------------------------------------------------+-----------------------+
| _hoodie_commit_time | string | |
| _hoodie_commit_seqno | string | |
| _hoodie_record_key | string | |
| _hoodie_partition_path | string | |
| _hoodie_file_name | string | |
| id | int | |
| zipcode | int | |
| city | string | |
| street | string | |
| streetnumber | int | |
| uuid | string | |
| start_date | string | |
| end_date | string | |
| is_current | boolean | |
| event_time | string | |
| its | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| year | int | |
| month | int | |
| day | int | |
| | NULL | NULL |
| # Detailed Partition Information | NULL | NULL |
| Partition Value: | [2020, 10, 1] | NULL |
| Database: | nyc_taxi | NULL |
| Table: | address | NULL |
| CreateTime: | Thu Oct 01 17:15:51 UTC 2020 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Location: | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1 | NULL |
| Partition Parameters: | NULL | NULL |
| | numFiles | 9 |
| | totalSize | 64289221 |
| | transient_lastDdlTime | 1601572551 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL |
| InputFormat: | org.apache.hudi.hadoop.HoodieParquetInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-----------------------------------+----------------------------------------------------+-----------------------+
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-704079007
@bvaradar: I dropped and re-synced the table. The issue still exists unless I run `set hive.fetch.task.conversion=none;` before executing the query; then everything works as expected.
----------------------------------------------------------------
[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-703652716
Update:
Setting `set hive.fetch.task.conversion=none;` within the Hive session fixed the issue.
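For reference, the full sequence in a Hive/Beeline session looks roughly like this (a sketch using the `iot_device_ro` table and partition columns shown above; the setting disables Hive's fetch-task optimization so the query is planned through Hudi's `HoodieParquetInputFormat`):
```sql
-- Disable fetch-task conversion so partition filters go through the
-- registered HoodieParquetInputFormat rather than Hive's direct fetch path.
SET hive.fetch.task.conversion=none;

-- Filtering on partition columns now works as expected:
SELECT *
FROM iot_device_ro
WHERE year = 2020 AND month = 10 AND day = 1
LIMIT 5;
```
Note that this setting only affects the current session; to apply it globally it would have to be set on the server side.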
----------------------------------------------------------------
[GitHub] [hudi] bvaradar closed issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1962:
URL: https://github.com/apache/hudi/issues/1962
----------------------------------------------------------------
[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-673930053
Hi Sivabalan,
when I try to apply the suggested settings, I get the following error in the Hive CLI:
```console
20/08/14 07:19:51 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.input.format at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
Beeline version 3.1.3000.7.2.0.0-237 by Apache Hive
```
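The error above means HiveServer2's SQL-standard authorization is rejecting per-session changes to `hive.input.format`. One common server-side workaround (an assumption — the exact mechanism depends on your distribution's authorization setup, e.g. Cloudera's safety valves) is to append the needed properties to the modifiable-at-runtime whitelist in hive-site.xml:
```xml
<!-- hive-site.xml (HiveServer2): allow these properties to be modified
     per-session. The value is a regex of additional whitelisted names. -->
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hive\.input\.format|hive\.fetch\.task\.conversion</value>
</property>
```
HiveServer2 must be restarted for the whitelist change to take effect.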
----------------------------------------------------------------
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175
@bvaradar : I guess you missed following up on this thread. Can you check it out and respond when you can?
----------------------------------------------------------------
[GitHub] [hudi] vinothchandar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-855191204
I think we should document better when `set hive.fetch.task.conversion=none;` needs to be set with Hudi tables. Closing this.
--
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674067788
Also, are you having issues with all columns or just a few of them?
----------------------------------------------------------------
[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674302417
@bhasudha / @bvaradar : do you folks have any pointers here? It looks like the input format is not getting set.
```
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.input.format at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
```
Would that be the issue here?
----------------------------------------------------------------
[GitHub] [hudi] sassai edited a comment on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
sassai edited a comment on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-704079007
@bvaradar: I dropped and re-synced the table. The issue still exists unless I run `set hive.fetch.task.conversion=none;` before executing the query; then everything works as expected.
Please find the table descriptions, with and without the partition specification, below.
describe formatted address:
```console
+-------------------------------+----------------------------------------------------+-----------------------+
| col_name | data_type | comment |
+-------------------------------+----------------------------------------------------+-----------------------+
| _hoodie_commit_time | string | |
| _hoodie_commit_seqno | string | |
| _hoodie_record_key | string | |
| _hoodie_partition_path | string | |
| _hoodie_file_name | string | |
| id | int | |
| zipcode | int | |
| city | string | |
| street | string | |
| streetnumber | int | |
| uuid | string | |
| start_date | string | |
| end_date | string | |
| is_current | boolean | |
| event_time | string | |
| its | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| year | int | |
| month | int | |
| day | int | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | nyc_taxi | NULL |
| OwnerType: | USER | NULL |
| Owner: | srv_tu_usecase2_producer | NULL |
| CreateTime: | Tue Oct 06 07:06:03 UTC 2020 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | bucketing_version | 2 |
| | discover.partitions | true |
| | last_commit_time_sync | 20201006065754 |
| | numFiles | 82 |
| | numPartitions | 1 |
| | numRows | 0 |
| | rawDataSize | 0 |
| | totalSize | 642825967 |
| | transient_lastDdlTime | 1601967963 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL |
| InputFormat: | org.apache.hudi.hadoop.HoodieParquetInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-------------------------------+----------------------------------------------------+-----------------------+
```
describe formatted address partition(year=2020,month=10,day=6):
```console
+-----------------------------------+----------------------------------------------------+-----------------------+
| col_name | data_type | comment |
+-----------------------------------+----------------------------------------------------+-----------------------+
| _hoodie_commit_time | string | |
| _hoodie_commit_seqno | string | |
| _hoodie_record_key | string | |
| _hoodie_partition_path | string | |
| _hoodie_file_name | string | |
| id | int | |
| zipcode | int | |
| city | string | |
| street | string | |
| streetnumber | int | |
| uuid | string | |
| start_date | string | |
| end_date | string | |
| is_current | boolean | |
| event_time | string | |
| its | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| year | int | |
| month | int | |
| day | int | |
| | NULL | NULL |
| # Detailed Partition Information | NULL | NULL |
| Partition Value: | [2020, 10, 6] | NULL |
| Database: | nyc_taxi | NULL |
| Table: | address | NULL |
| CreateTime: | Tue Oct 06 07:06:04 UTC 2020 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Location: | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=6 | NULL |
| Partition Parameters: | NULL | NULL |
| | numFiles | 82 |
| | totalSize | 642825967 |
| | transient_lastDdlTime | 1601967964 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL |
| InputFormat: | org.apache.hudi.hadoop.HoodieParquetInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | serialization.format | 1 |
+-----------------------------------+----------------------------------------------------+-----------------------+
```
----------------------------------------------------------------
[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column
Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-679316776
For the second case, the Hive Metastore would be filtering out partitions and only returning specific paths. I think there is some inconsistency between the path used in the filesystem and the one that is present in the metastore.
@sassai : Sorry for the delay. Can you recursively list your Hudi dataset and attach the output? Please also add the contents of the latest .commit or .deltacommit file.
Also, add the output of `describe formatted` for one of the partitions, including its location:
describe formatted table_name partition (year=xxx,month=xxx,day=xxx,hour=xxx,minute=xxx);
----------------------------------------------------------------