Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/13 14:27:04 UTC

[GitHub] [hudi] sassai opened a new issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

sassai opened a new issue #1962:
URL: https://github.com/apache/hudi/issues/1962


   **Describe the problem you faced**
   
   I'm running a spark structured streaming application that reads data from kafka and saves it to a partitioned Hudi MERGE_ON_READ table. Hive sync is enabled and I'm able to query the table with the Hive CLI, e.g.:
   
   ```sql
   SELECT * FROM iot_device_ro LIMIT 5;
   ```
   
   ```console
   +------------------------------------+-------------------------------------+--------------------------------------------+---------------------------------------------+----------------------------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------------------------+----------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
   | iot_device_ro._hoodie_commit_time  | iot_device_ro._hoodie_commit_seqno  |      iot_device_ro._hoodie_record_key      |    iot_device_ro._hoodie_partition_path     |          iot_device_ro._hoodie_file_name           | iot_device_ro.deviceid  | iot_device_ro.sensorid  | iot_device_ro.measurement  | iot_device_ro.measure_ts  |          iot_device_ro.uuid           |  iot_device_ro.its   | iot_device_ro.year  | iot_device_ro.month  | iot_device_ro.day  | iot_device_ro.hour  | iot_device_ro.minute  |
   +------------------------------------+-------------------------------------+--------------------------------------------+---------------------------------------------+----------------------------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------------------------+----------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
   | 20200813121124                     | 20200813121124_0_1                  | uuid:3d387a37-f288-456b-87b7-2b6865cf32e0  | year=2020/month=8/day=13/hour=12/minute=11  | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4              | 1                       | 30.228266831690732         | 2020-08-13T08:39:04.528Z  | 3d387a37-f288-456b-87b7-2b6865cf32e0  | 2020-08-13 12:11:24  | 2020                | 8                    | 13                 | 12                  | 11                    |
   | 20200813121124                     | 20200813121124_0_2                  | uuid:5bed809e-758f-46dc-b1ab-837ad3eb5a6a  | year=2020/month=8/day=13/hour=12/minute=11  | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4              | 1                       | 31.453188991515226         | 2020-08-13T08:39:19.588Z  | 5bed809e-758f-46dc-b1ab-837ad3eb5a6a  | 2020-08-13 12:11:24  | 2020                | 8                    | 13                 | 12                  | 11                    |
   | 20200813121124                     | 20200813121124_0_3                  | uuid:6d37be34-6e4b-49b0-b3fe-e6552c2aee22  | year=2020/month=8/day=13/hour=12/minute=11  | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4              | 1                       | 34.68735798194983          | 2020-08-13T07:45:05.958Z  | 6d37be34-6e4b-49b0-b3fe-e6552c2aee22  | 2020-08-13 12:11:24  | 2020                | 8                    | 13                 | 12                  | 11                    |
   | 20200813121124                     | 20200813121124_0_4                  | uuid:5c2dbea8-9668-4652-84c6-c82d06aa2805  | year=2020/month=8/day=13/hour=12/minute=11  | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4              | 1                       | 33.680806905962264         | 2020-08-12T13:33:20.159Z  | 5c2dbea8-9668-4652-84c6-c82d06aa2805  | 2020-08-13 12:11:24  | 2020                | 8                    | 13                 | 12                  | 11                    |
   | 20200813121124                     | 20200813121124_0_5                  | uuid:528e6c74-bb44-49da-aa76-059781cc7676  | year=2020/month=8/day=13/hour=12/minute=11  | 53c3c919-ff1c-49f6-ba74-4498b635dfb6-0_0-21-23_20200813121124.parquet | iotdevice4              | 1                       | 31.38529683936205          | 2020-08-13T10:57:58.448Z  | 528e6c74-bb44-49da-aa76-059781cc7676  | 2020-08-13 12:11:24  | 2020                | 8                    | 13                 | 12                  | 11                    |
   +------------------------------------+-------------------------------------+--------------------------------------------+---------------------------------------------+----------------------------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------------------------+----------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
   ```
   
   However, if I apply a filter on a partition column, the result is empty:
   
   ```sql
   SELECT * FROM iot_device_ro WHERE day=13 LIMIT 10;
   ```
   
   ```console
   +------------------------------------+-------------------------------------+-----------------------------------+---------------------------------------+----------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------+--------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
   | iot_device_ro._hoodie_commit_time  | iot_device_ro._hoodie_commit_seqno  | iot_device_ro._hoodie_record_key  | iot_device_ro._hoodie_partition_path  | iot_device_ro._hoodie_file_name  | iot_device_ro.deviceid  | iot_device_ro.sensorid  | iot_device_ro.measurement  | iot_device_ro.measure_ts  | iot_device_ro.uuid  | iot_device_ro.its  | iot_device_ro.year  | iot_device_ro.month  | iot_device_ro.day  | iot_device_ro.hour  | iot_device_ro.minute  |
   +------------------------------------+-------------------------------------+-----------------------------------+---------------------------------------+----------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------+--------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
   +------------------------------------+-------------------------------------+-----------------------------------+---------------------------------------+----------------------------------+-------------------------+-------------------------+----------------------------+---------------------------+---------------------+--------------------+---------------------+----------------------+--------------------+---------------------+-----------------------+
   No rows selected (70.91 seconds)
   ```
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   Hudi Datasource Configuration in Spark:
   
   ```java
   // Add an ingestion timestamp column, then configure the Hudi streaming sink.
   Dataset<Row> output =
       streamingInput.withColumn("its", date_format(current_timestamp(), DATE_FORMAT));

   DataStreamWriter<Row> writer =
       output
           .writeStream()
           .format("hudi")
           // Write/upsert tuning and key configuration
           .option("hoodie.insert.shuffle.parallelism", "2")
           .option("hoodie.upsert.shuffle.parallelism", "2")
           .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), tableType)
           .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "uuid")
           .option(
               DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(),
               ComplexKeyGenerator.class.getCanonicalName())
           .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), partitions)
           .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "its")
           .option(HoodieWriteConfig.TABLE_NAME, tableName)
           .option("checkpointLocation", streamingCheckpointingPath)
           .option(DataSourceWriteOptions.STREAMING_IGNORE_FAILED_BATCH_OPT_KEY(), "false")
           .option(DataSourceWriteOptions.STREAMING_RETRY_CNT_OPT_KEY(), "10")
           .outputMode(OutputMode.Append())
           // Hive sync configuration
           .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), tableName)
           .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), hiveDB)
           .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), hiveJdbcUrl)
           .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), "true")
           .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), "true")
           .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), partitions)
           .option(
               DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY(),
               MultiPartKeysValueExtractor.class.getCanonicalName());
   ```
   
   Hudi Table Description in Hive:
   
   ```console
   +--------------------------+------------+----------+
   |         col_name         | data_type  | comment  |
   +--------------------------+------------+----------+
   | _hoodie_commit_time      | string     |          |
   | _hoodie_commit_seqno     | string     |          |
   | _hoodie_record_key       | string     |          |
   | _hoodie_partition_path   | string     |          |
   | _hoodie_file_name        | string     |          |
   | deviceid                 | string     |          |
   | sensorid                 | string     |          |
   | measurement              | double     |          |
   | measure_ts               | string     |          |
   | uuid                     | string     |          |
   | its                      | string     |          |
   | year                     | int        |          |
   | month                    | int        |          |
   | day                      | int        |          |
   | hour                     | int        |          |
   | minute                   | int        |          |
   |                          | NULL       | NULL     |
   | # Partition Information  | NULL       | NULL     |
   | # col_name               | data_type  | comment  |
   | year                     | int        |          |
   | month                    | int        |          |
   | day                      | int        |          |
   | hour                     | int        |          |
   | minute                   | int        |          |
   +--------------------------+------------+----------+
   ```
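   
   For reference, the schema listing above and the partition listing below can presumably be reproduced with the standard Hive commands (the commands themselves are not shown in the original report):
   
   ```sql
   -- Standard Hive CLI commands (assumed; table name as used throughout this report):
   DESCRIBE iot_device_ro;
   SHOW PARTITIONS iot_device_ro;
   ```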
   
   Hive Partitions created by HudiHiveSync:
   
   ```console
   +---------------------------------------------+
   |                  partition                  |
   +---------------------------------------------+
   | year=2020/month=8/day=13/hour=12/minute=11  |
   | year=2020/month=8/day=13/hour=12/minute=17  |
   | year=2020/month=8/day=13/hour=12/minute=18  |
   | year=2020/month=8/day=13/hour=12/minute=19  |
   | year=2020/month=8/day=13/hour=12/minute=20  |
   | year=2020/month=8/day=13/hour=12/minute=21  |
   +---------------------------------------------+
   ```
   
   Additional table information:
   
   ```console
   
   Detailed Table Information
   Database:        xxx
   OwnerType:       USER
   Owner:           xxx
   CreateTime:      Thu Aug 13 12:11:49 UTC 2020
   LastAccessTime:  UNKNOWN
   Retention:       0
   Location:        abfs://xxx@xxx.dfs.core.windows.net/data/hudi/streaming/tables/xxx/iot_device
   Table Type:      EXTERNAL_TABLE
   Table Parameters:
     EXTERNAL                              TRUE
     bucketing_version                     2
     discover.partitions                   true
     last_commit_time_sync                 20200813133453
     numFiles                              77
     numPartitions                         68
     numRows                               0
     rawDataSize                           0
     spark.sql.create.version              2.2 or prior
     spark.sql.sources.schema.numPartCols  5
     spark.sql.sources.schema.numParts     1
     spark.sql.sources.schema.part.0       {\"type\":\"struct\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_commit_seqno\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_record_key\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_partition_path\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"_hoodie_file_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"deviceid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"sensorid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"measurement\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"measure_ts\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"uuid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"its\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"year\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"month\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"day\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"hour\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"minute\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}
     spark.sql.sources.schema.partCol.0    year
     spark.sql.sources.schema.partCol.1    month
     spark.sql.sources.schema.partCol.2    day
     spark.sql.sources.schema.partCol.3    hour
     spark.sql.sources.schema.partCol.4    minute
     totalSize                             34194101
     transient_lastDdlTime                 1597324063

   Storage Information
   SerDe Library:   org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
   InputFormat:     org.apache.hudi.hadoop.HoodieParquetInputFormat
   OutputFormat:    org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
   Compressed:      No
   Num Buckets:     -1
   Bucket Columns:  []
   Sort Columns:    []
   Storage Desc Params:
     serialization.format  1
   
   ```
   
   Strangely enough, if I run the same query in Spark, everything works as expected.
   
   **Expected behavior**
   
   Filters on partition columns can be applied to Hudi tables from the Hive CLI.
   
   **Environment Description**
   
   * Hudi version : 0.5.3
   
   * Spark version : 2.4.0
   
   * Hive version : 3.1
   
   * Hadoop version : 3
   
   * Storage (HDFS/S3/GCS..) : ADLS
   
   * Running on Docker? (yes/no) : no
   
   What am I missing? Any help is very much appreciated!
   
   Thank you in advance!
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674071601


   I'm running Hive within Cloudera Data Platform Public Cloud. I connected to Hive using Hue and the Hive CLI.
   
   Beeline Connection:
   
   ```console
   [root@engineeringhub-master3 ~]# beeline --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
   ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
   WARNING: Use "yarn jar" to launch YARN applications.
   SLF4J: Class path contains multiple SLF4J bindings.
   SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.2.0-1.cdh7.2.0.p0.3758356/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
   SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
   SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
   Connecting to jdbc:hive2://engineeringhub-master3.mtag-fas.ft71-kqhq.cloudera.site:2181/default;httpPath=cliservice;principal=hive/_HOST@MTAG-FAS.FT71-KQHQ.CLOUDERA.SITE;serviceDiscoveryMode=zooKeeper;ssl=true;transportMode=http;zooKeeperNamespace=hiveserver2
   20/08/14 13:14:12 [main]: INFO jdbc.HiveConnection: Connected to engineeringhub-master3.mtag-fas.ft71-kqhq.cloudera.site:10001
   20/08/14 13:14:12 [main]: WARN jdbc.HiveConnection: Failed to connect to engineeringhub-master3.mtag-fas.ft71-kqhq.cloudera.site:10001
   20/08/14 13:14:12 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
   Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.input.format at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
   Beeline version 3.1.3000.7.2.0.0-237 by Apache Hive
   beeline>
   ```
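   
   The connection error above comes from HiveServer2 rejecting runtime changes to properties that are not whitelisted. A possible way around it (my assumption; not verified in this thread) is to whitelist the properties server-side and then set them per session instead of on the connect line:
   
   ```sql
   -- Requires an admin-side change first, e.g. in hive-site.xml (assumption):
   --   hive.security.authorization.sqlstd.confwhitelist.append=hive\.input\.format|hive\.stats\.autogather
   SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
   SET hive.stats.autogather=false;
   ```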
   
   Later on, we would like to use JDBC connections from external tools like Power BI or DBeaver.
   
   I can query and filter on all columns except the partition columns: year, month, day, hour, minute.
   
   Oddly enough, if I apply aggregations on the partition columns, I can filter on them. This works:
   
   ```sql
   SELECT DISTINCT(minute), count(minute)
   FROM greentech.iot_device_ro
   GROUP BY minute
   ; 
   ```
   
   but 
   
   ```sql
   SELECT *
   FROM greentech.iot_device_ro
   WHERE minute=50
   ; 
   ```
   
   returns an empty result set.
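   
   One way to see the difference between the two cases (a diagnostic sketch on my side, not from the thread) is to compare the query plans; the aggregate runs as a full job through the table's registered input format, while the plain filtered `SELECT` may be served by Hive's fetch-task shortcut:
   
   ```sql
   -- The first plan typically shows only a Fetch Operator, the second a full
   -- map/reduce stage (exact behavior depends on hive.fetch.task.conversion).
   EXPLAIN SELECT * FROM greentech.iot_device_ro WHERE minute = 50;
   EXPLAIN SELECT minute, count(minute) FROM greentech.iot_device_ro GROUP BY minute;
   ```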
   
   Thank you for the quick response.





[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175


   @bvaradar : I guess you missed following up on this thread. Can you check it out and respond when you can?





[GitHub] [hudi] vinothchandar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-855191281


   https://issues.apache.org/jira/browse/HUDI-1972 





[GitHub] [hudi] vinothchandar closed issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
vinothchandar closed issue #1962:
URL: https://github.com/apache/hudi/issues/1962


   





[GitHub] [hudi] sassai edited a comment on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai edited a comment on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-704079007


   @bvaradar: I dropped and synced the table again. The issue still exists unless I run `set hive.fetch.task.conversion=none;` before executing the query; then everything works as expected.
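   
   Spelled out as a session, using the query from the original report:
   
   ```sql
   -- Disable Hive's fetch-task shortcut so the query goes through the table's
   -- input format (HoodieParquetInputFormat); partition filters then return rows.
   SET hive.fetch.task.conversion=none;
   
   SELECT * FROM iot_device_ro WHERE day=13 LIMIT 10;
   ```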
   
   Please find the table description with and without partition below.
   
   Output of `describe formatted address`:
   
   ```console
   +-------------------------------+----------------------------------------------------+-----------------------+
   |           col_name            |                     data_type                      |        comment        |
   +-------------------------------+----------------------------------------------------+-----------------------+
   | _hoodie_commit_time           | string                                             |                       |
   | _hoodie_commit_seqno          | string                                             |                       |
   | _hoodie_record_key            | string                                             |                       |
   | _hoodie_partition_path        | string                                             |                       |
   | _hoodie_file_name             | string                                             |                       |
   | id                            | int                                                |                       |
   | zipcode                       | int                                                |                       |
   | city                          | string                                             |                       |
   | street                        | string                                             |                       |
   | streetnumber                  | int                                                |                       |
   | uuid                          | string                                             |                       |
   | start_date                    | string                                             |                       |
   | end_date                      | string                                             |                       |
   | is_current                    | boolean                                            |                       |
   | event_time                    | string                                             |                       |
   | its                           | string                                             |                       |
   |                               | NULL                                               | NULL                  |
   | # Partition Information       | NULL                                               | NULL                  |
   | # col_name                    | data_type                                          | comment               |
   | year                          | int                                                |                       |
   | month                         | int                                                |                       |
   | day                           | int                                                |                       |
   |                               | NULL                                               | NULL                  |
   | # Detailed Table Information  | NULL                                               | NULL                  |
   | Database:                     | nyc_taxi                                           | NULL                  |
   | OwnerType:                    | USER                                               | NULL                  |
   | Owner:                        | srv_tu_usecase2_producer                           | NULL                  |
   | CreateTime:                   | Tue Oct 06 07:06:03 UTC 2020                       | NULL                  |
   | LastAccessTime:               | UNKNOWN                                            | NULL                  |
   | Retention:                    | 0                                                  | NULL                  |
   | Location:                     | abfs://engineering@mtagdatalakeshowcase.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address | NULL                  |
   | Table Type:                   | EXTERNAL_TABLE                                     | NULL                  |
   | Table Parameters:             | NULL                                               | NULL                  |
   |                               | EXTERNAL                                           | TRUE                  |
   |                               | bucketing_version                                  | 2                     |
   |                               | discover.partitions                                | true                  |
   |                               | last_commit_time_sync                              | 20201006065754        |
   |                               | numFiles                                           | 82                    |
   |                               | numPartitions                                      | 1                     |
   |                               | numRows                                            | 0                     |
   |                               | rawDataSize                                        | 0                     |
   |                               | totalSize                                          | 642825967             |
   |                               | transient_lastDdlTime                              | 1601967963            |
   |                               | NULL                                               | NULL                  |
   | # Storage Information         | NULL                                               | NULL                  |
   | SerDe Library:                | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                  |
   | InputFormat:                  | org.apache.hudi.hadoop.HoodieParquetInputFormat    | NULL                  |
   | OutputFormat:                 | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                  |
   | Compressed:                   | No                                                 | NULL                  |
   | Num Buckets:                  | -1                                                 | NULL                  |
   | Bucket Columns:               | []                                                 | NULL                  |
   | Sort Columns:                 | []                                                 | NULL                  |
   | Storage Desc Params:          | NULL                                               | NULL                  |
   |                               | serialization.format                               | 1                     |
   +-------------------------------+----------------------------------------------------+-----------------------+
   ```
   
   Output of `describe formatted address partition(year=2020, month=10, day=6)`:
   
   ```console
   +-----------------------------------+----------------------------------------------------+-----------------------+
   |             col_name              |                     data_type                      |        comment        |
   +-----------------------------------+----------------------------------------------------+-----------------------+
   | _hoodie_commit_time               | string                                             |                       |
   | _hoodie_commit_seqno              | string                                             |                       |
   | _hoodie_record_key                | string                                             |                       |
   | _hoodie_partition_path            | string                                             |                       |
   | _hoodie_file_name                 | string                                             |                       |
   | id                                | int                                                |                       |
   | zipcode                           | int                                                |                       |
   | city                              | string                                             |                       |
   | street                            | string                                             |                       |
   | streetnumber                      | int                                                |                       |
   | uuid                              | string                                             |                       |
   | start_date                        | string                                             |                       |
   | end_date                          | string                                             |                       |
   | is_current                        | boolean                                            |                       |
   | event_time                        | string                                             |                       |
   | its                               | string                                             |                       |
   |                                   | NULL                                               | NULL                  |
   | # Partition Information           | NULL                                               | NULL                  |
   | # col_name                        | data_type                                          | comment               |
   | year                              | int                                                |                       |
   | month                             | int                                                |                       |
   | day                               | int                                                |                       |
   |                                   | NULL                                               | NULL                  |
   | # Detailed Partition Information  | NULL                                               | NULL                  |
   | Partition Value:                  | [2020, 10, 6]                                      | NULL                  |
   | Database:                         | nyc_taxi                                           | NULL                  |
   | Table:                            | address                                            | NULL                  |
   | CreateTime:                       | Tue Oct 06 07:06:04 UTC 2020                       | NULL                  |
   | LastAccessTime:                   | UNKNOWN                                            | NULL                  |
   | Location:                         | abfs://engineering@mtagdatalakeshowcase.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=6 | NULL                  |
   | Partition Parameters:             | NULL                                               | NULL                  |
   |                                   | numFiles                                           | 82                    |
   |                                   | totalSize                                          | 642825967             |
   |                                   | transient_lastDdlTime                              | 1601967964            |
   |                                   | NULL                                               | NULL                  |
   | # Storage Information             | NULL                                               | NULL                  |
   | SerDe Library:                    | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                  |
   | InputFormat:                      | org.apache.hudi.hadoop.HoodieParquetInputFormat    | NULL                  |
   | OutputFormat:                     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                  |
   | Compressed:                       | No                                                 | NULL                  |
   | Num Buckets:                      | -1                                                 | NULL                  |
   | Bucket Columns:                   | []                                                 | NULL                  |
   | Sort Columns:                     | []                                                 | NULL                  |
   | Storage Desc Params:              | NULL                                               | NULL                  |
   |                                   | serialization.format                               | 1                     |
   +-----------------------------------+----------------------------------------------------+-----------------------+
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-673694878


   Did you set the Hive input format? Also, can you confirm that the settings given [here](https://hudi.apache.org/docs/docker_demo.html#step-4-a-run-hive-queries) are applied?
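
   For reference, the session-level setting that the docker demo page asks for looks roughly like this (class name as documented for Hudi at the time; a sketch only — verify the exact value against the docs for your Hudi version):

   ```sql
   -- run inside the Hive session before querying the Hudi table,
   -- so splits are generated by Hudi's input format
   SET hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
   ```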





[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-691772192


   Closing this due to inactivity 





[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674067152


   How are you connecting to Hive: beeline, or some other means? If it's beeline, you should be able to pass in the configs as given on the quick start page. If by some other means, you probably have to find out how to set Hive configs when you bring up the tool. Alternatively, you can paste the command you used to bring up Hive (obfuscating any confidential info) and we can try to help you out.





[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-703735047


   @sassai : The location is set incorrectly.
   
   | Location:                         | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1 | NULL                  |
   
   
   It should be abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address only, i.e. the table base path without the partition suffix.
   
   Can you try dropping the table and running Hive sync again?
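
   A sketch of what that could look like from a Hive session (database, table, and path names taken from this thread; the re-sync step is whatever mechanism originally registered the table, e.g. Hudi's Hive sync tool or the Spark datasource hive_sync options):

   ```sql
   -- drop the mis-registered table so the next Hive sync recreates it
   DROP TABLE IF EXISTS nyc_taxi.address;
   -- after re-syncing, the table location should be the base path only:
   --   abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address
   -- with partitions (year=.../month=.../day=...) registered underneath it
   ```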
   
   





[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-702568499


   @bvaradar: Sorry for the late reply; I was not able to investigate this issue further until now.
   
   In the meantime I updated Hudi to 0.6.0 to check whether the issue still occurs; unfortunately, it does. I created a COPY_ON_WRITE table with test data for further debugging. Please find the requested information below:
   
   Hudi data set:
   
   ```console
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux/.bootstrap
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux/.bootstrap/.fileids
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.aux/.bootstrap/.partitions
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/.temp
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer       9133 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171431.commit
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171431.commit.requested
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer        999 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171431.inflight
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer       1169 2020-10-01 17:18 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171823.commit
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:18 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171823.commit.requested
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer        380 2020-10-01 17:18 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001171823.inflight
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer       2986 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001173346.commit
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001173346.commit.requested
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer       1653 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201001173346.inflight
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/archived
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer        228 2020-10-01 17:14 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/hoodie.properties
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10
   drwxr-xr-x   - 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer          0 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer         93 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/.hoodie_partition_metadata
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7095995 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_0-89-2258_20201001173346.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7096113 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_8-25-180_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7126955 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_0-25-172_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7126790 2020-10-01 17:34 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_1-89-2259_20201001173346.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7144341 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/4f7d1a69-112a-42d1-b0ac-adf8de1e8dad-0_7-25-179_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7120178 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/669c9159-a795-40ac-9827-4551965e1750-0_3-25-175_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7197719 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/ab9c4c36-ea10-47bc-bf05-1555bf07c4ad-0_2-25-174_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7158006 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/ad939d4e-d8dc-4723-b5a5-8ec2c064f3e3-0_6-25-178_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7170312 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/c4c0b379-79e8-4856-9605-e56b1beb1b09-0_4-25-176_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7118844 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/f5fae4bc-396d-4ff6-8b89-01c08559cb50-0_5-25-177_20201001171431.parquet
   -rw-r--r--   1 3d88417a-c602-4b19-b581-ac7265074929 srv_tu_usecase2_producer    7156753 2020-10-01 17:15 abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1/f804937c-8de2-4692-90d9-4b485305d721-0_1-25-173_20201001171431.parquet
   ```
   
   Latest commit:
   
   ```json
   {
     "partitionToWriteStats" : {
       "year=2020/month=10/day=1" : [ {
         "fileId" : "08b5ed87-a749-4a82-a298-59071381dbc9-0",
         "path" : "year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_0-89-2258_20201001173346.parquet",
         "prevCommit" : "20201001171431",
         "numWrites" : 110569,
         "numDeletes" : 0,
         "numUpdateWrites" : 1,
         "numInserts" : 0,
         "totalWriteBytes" : 7095995,
         "totalWriteErrors" : 0,
         "tempPath" : null,
         "partitionPath" : "year=2020/month=10/day=1",
         "totalLogRecords" : 0,
         "totalLogFilesCompacted" : 0,
         "totalLogSizeCompacted" : 0,
         "totalUpdatedRecordsCompacted" : 0,
         "totalLogBlocks" : 0,
         "totalCorruptLogBlock" : 0,
         "totalRollbackBlocks" : 0,
         "fileSizeInBytes" : 7095995
       }, {
         "fileId" : "3c49d05f-8a9b-4365-8158-a32f879d674f-0",
         "path" : "year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_1-89-2259_20201001173346.parquet",
         "prevCommit" : "20201001171431",
         "numWrites" : 111111,
         "numDeletes" : 0,
         "numUpdateWrites" : 0,
         "numInserts" : 1,
         "totalWriteBytes" : 7126790,
         "totalWriteErrors" : 0,
         "tempPath" : null,
         "partitionPath" : "year=2020/month=10/day=1",
         "totalLogRecords" : 0,
         "totalLogFilesCompacted" : 0,
         "totalLogSizeCompacted" : 0,
         "totalUpdatedRecordsCompacted" : 0,
         "totalLogBlocks" : 0,
         "totalCorruptLogBlock" : 0,
         "totalRollbackBlocks" : 0,
         "fileSizeInBytes" : 7126790
       } ]
     },
     "compacted" : false,
     "extraMetadata" : {
       "schema" : "{\"type\":\"record\",\"name\":\"address_record\",\"namespace\":\"hoodie.address\",\"fields\":[{\"name\":\"id\",\"type\":[\"int\",\"null\"]},{\"name\":\"zipCode\",\"type\":[\"int\",\"null\"]},{\"name\":\"city\",\"type\":[\"string\",\"null\"]},{\"name\":\"street\",\"type\":[\"string\",\"null\"]},{\"name\":\"streetNumber\",\"type\":[\"int\",\"null\"]},{\"name\":\"uuid\",\"type\":[\"string\",\"null\"]},{\"name\":\"start_date\",\"type\":[\"string\",\"null\"]},{\"name\":\"end_date\",\"type\":[\"string\",\"null\"]},{\"name\":\"is_current\",\"type\":\"boolean\"},{\"name\":\"event_time\",\"type\":[\"string\",\"null\"]},{\"name\":\"year\",\"type\":\"int\"},{\"name\":\"month\",\"type\":\"int\"},{\"name\":\"day\",\"type\":\"int\"},{\"name\":\"its\",\"type\":\"string\"}]}"
     },
     "operationType" : "UPSERT",
     "fileIdAndRelativePaths" : {
       "08b5ed87-a749-4a82-a298-59071381dbc9-0" : "year=2020/month=10/day=1/08b5ed87-a749-4a82-a298-59071381dbc9-0_0-89-2258_20201001173346.parquet",
       "3c49d05f-8a9b-4365-8158-a32f879d674f-0" : "year=2020/month=10/day=1/3c49d05f-8a9b-4365-8158-a32f879d674f-0_1-89-2259_20201001173346.parquet"
     },
     "totalRecordsDeleted" : 0,
     "totalLogRecordsCompacted" : 0,
     "totalScanTime" : 0,
     "totalCreateTime" : 0,
     "totalUpsertTime" : 11307,
     "totalCompactedRecordsUpdated" : 0,
     "totalLogFilesCompacted" : 0,
     "totalLogFilesSize" : 0
   }
   ```
   
   Hudi properties file:
   
   ```console
   #Properties saved on Thu Oct 01 17:14:31 UTC 2020
   #Thu Oct 01 17:14:31 UTC 2020
   hoodie.table.name=address
   hoodie.archivelog.folder=archived
   hoodie.table.type=COPY_ON_WRITE
   hoodie.table.version=1
   hoodie.timeline.layout.version=1
   ```
   
   Describe table:
   
   ```console
   +-----------------------------------+----------------------------------------------------+-----------------------+
   |             col_name              |                     data_type                      |        comment        |
   +-----------------------------------+----------------------------------------------------+-----------------------+
   | _hoodie_commit_time               | string                                             |                       |
   | _hoodie_commit_seqno              | string                                             |                       |
   | _hoodie_record_key                | string                                             |                       |
   | _hoodie_partition_path            | string                                             |                       |
   | _hoodie_file_name                 | string                                             |                       |
   | id                                | int                                                |                       |
   | zipcode                           | int                                                |                       |
   | city                              | string                                             |                       |
   | street                            | string                                             |                       |
   | streetnumber                      | int                                                |                       |
   | uuid                              | string                                             |                       |
   | start_date                        | string                                             |                       |
   | end_date                          | string                                             |                       |
   | is_current                        | boolean                                            |                       |
   | event_time                        | string                                             |                       |
   | its                               | string                                             |                       |
   |                                   | NULL                                               | NULL                  |
   | # Partition Information           | NULL                                               | NULL                  |
   | # col_name                        | data_type                                          | comment               |
   | year                              | int                                                |                       |
   | month                             | int                                                |                       |
   | day                               | int                                                |                       |
   |                                   | NULL                                               | NULL                  |
   | # Detailed Partition Information  | NULL                                               | NULL                  |
   | Partition Value:                  | [2020, 10, 1]                                      | NULL                  |
   | Database:                         | nyc_taxi                                           | NULL                  |
   | Table:                            | address                                            | NULL                  |
   | CreateTime:                       | Thu Oct 01 17:15:51 UTC 2020                       | NULL                  |
   | LastAccessTime:                   | UNKNOWN                                            | NULL                  |
   | Location:                         | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1 | NULL                  |
   | Partition Parameters:             | NULL                                               | NULL                  |
   |                                   | numFiles                                           | 9                     |
   |                                   | totalSize                                          | 64289221              |
   |                                   | transient_lastDdlTime                              | 1601572551            |
   |                                   | NULL                                               | NULL                  |
   | # Storage Information             | NULL                                               | NULL                  |
   | SerDe Library:                    | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                  |
   | InputFormat:                      | org.apache.hudi.hadoop.HoodieParquetInputFormat    | NULL                  |
   | OutputFormat:                     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                  |
   | Compressed:                       | No                                                 | NULL                  |
   | Num Buckets:                      | -1                                                 | NULL                  |
   | Bucket Columns:                   | []                                                 | NULL                  |
   | Sort Columns:                     | []                                                 | NULL                  |
   | Storage Desc Params:              | NULL                                               | NULL                  |
   |                                   | serialization.format                               | 1                     |
   +-----------------------------------+----------------------------------------------------+-----------------------+
   ```





[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-704079007


   @bvaradar: I dropped and synced the table again. The issue still exists unless I run `set hive.fetch.task.conversion=none;` before executing the query; then everything works as expected.
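
   A minimal session illustrating the workaround (table and partition-column names assumed from this thread):

   ```sql
   -- disable fetch-task conversion so Hive goes through the Hudi input format,
   -- which is what resolves the partition filter correctly
   SET hive.fetch.task.conversion=none;
   SELECT * FROM nyc_taxi.address WHERE year = 2020 AND month = 10 AND day = 1 LIMIT 5;
   ```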
   
   





[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-703652716


   Update: 
   
   Using `set hive.fetch.task.conversion=none;` within the Hive session fixed the issue.





[GitHub] [hudi] bvaradar closed issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1962:
URL: https://github.com/apache/hudi/issues/1962


   





[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-673930053


   Hi Sivabalan,
   
   when I try to apply the given settings, I get the following error in the Hive CLI:
   
   ```console
   20/08/14 07:19:51 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
   Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.input.format at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
   Beeline version 3.1.3000.7.2.0.0-237 by Apache Hive
   ```
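
   That error comes from HiveServer2's whitelist of parameters that may be modified at runtime. One common remedy (a sketch; the property name is standard Hive, but confirm it fits your cluster's security setup) is to append the needed keys to the whitelist in hive-site.xml and restart HiveServer2:

   ```xml
   <!-- hive-site.xml: allow these configs to be set per-session
        (the value is a regex, so dots are escaped) -->
   <property>
     <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
     <value>hive\.input\.format|hive\.fetch\.task\.conversion</value>
   </property>
   ```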





[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-767209175


   @bvaradar : I guess you missed following up on this thread; can you check it out and respond when you can?





[GitHub] [hudi] vinothchandar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-855191204


   I think we should document better when `set hive.fetch.task.conversion=none;` needs to be set with Hudi tables. Closing this 





[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674067788


   Also, are you having issues with all columns or just a few columns?





[GitHub] [hudi] nsivabalan commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-674302417


   @bhasudha / @bvaradar : do you folks have any pointers here? It looks like the input format is not getting set.
   ```
   Error: Could not open client transport for any of the Server URI's in ZooKeeper: Failed to open new session: java.lang.IllegalArgumentException: Cannot modify hive.input.format at runtime. It is not in list of params that are allowed to be modified at runtime (state=08S01,code=0)
   ```
   Would that be the issue here? 





[GitHub] [hudi] sassai edited a comment on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
sassai edited a comment on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-704079007


   @bvaradar: I dropped and synced the table again. The issue still exists unless I run `set hive.fetch.task.conversion=none;` before executing the query; then everything works as expected.
   
   Please find the table description with and without partition below.
   
   describe formatted address:
   
   ```console
   +-------------------------------+----------------------------------------------------+-----------------------+
   |           col_name            |                     data_type                      |        comment        |
   +-------------------------------+----------------------------------------------------+-----------------------+
   | _hoodie_commit_time           | string                                             |                       |
   | _hoodie_commit_seqno          | string                                             |                       |
   | _hoodie_record_key            | string                                             |                       |
   | _hoodie_partition_path        | string                                             |                       |
   | _hoodie_file_name             | string                                             |                       |
   | id                            | int                                                |                       |
   | zipcode                       | int                                                |                       |
   | city                          | string                                             |                       |
   | street                        | string                                             |                       |
   | streetnumber                  | int                                                |                       |
   | uuid                          | string                                             |                       |
   | start_date                    | string                                             |                       |
   | end_date                      | string                                             |                       |
   | is_current                    | boolean                                            |                       |
   | event_time                    | string                                             |                       |
   | its                           | string                                             |                       |
   |                               | NULL                                               | NULL                  |
   | # Partition Information       | NULL                                               | NULL                  |
   | # col_name                    | data_type                                          | comment               |
   | year                          | int                                                |                       |
   | month                         | int                                                |                       |
   | day                           | int                                                |                       |
   |                               | NULL                                               | NULL                  |
   | # Detailed Table Information  | NULL                                               | NULL                  |
   | Database:                     | nyc_taxi                                           | NULL                  |
   | OwnerType:                    | USER                                               | NULL                  |
   | Owner:                        | srv_tu_usecase2_producer                           | NULL                  |
   | CreateTime:                   | Tue Oct 06 07:06:03 UTC 2020                       | NULL                  |
   | LastAccessTime:               | UNKNOWN                                            | NULL                  |
   | Retention:                    | 0                                                  | NULL                  |
   | Location:                     | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address | NULL                  |
   | Table Type:                   | EXTERNAL_TABLE                                     | NULL                  |
   | Table Parameters:             | NULL                                               | NULL                  |
   |                               | EXTERNAL                                           | TRUE                  |
   |                               | bucketing_version                                  | 2                     |
   |                               | discover.partitions                                | true                  |
   |                               | last_commit_time_sync                              | 20201006065754        |
   |                               | numFiles                                           | 82                    |
   |                               | numPartitions                                      | 1                     |
   |                               | numRows                                            | 0                     |
   |                               | rawDataSize                                        | 0                     |
   |                               | totalSize                                          | 642825967             |
   |                               | transient_lastDdlTime                              | 1601967963            |
   |                               | NULL                                               | NULL                  |
   | # Storage Information         | NULL                                               | NULL                  |
   | SerDe Library:                | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                  |
   | InputFormat:                  | org.apache.hudi.hadoop.HoodieParquetInputFormat    | NULL                  |
   | OutputFormat:                 | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                  |
   | Compressed:                   | No                                                 | NULL                  |
   | Num Buckets:                  | -1                                                 | NULL                  |
   | Bucket Columns:               | []                                                 | NULL                  |
   | Sort Columns:                 | []                                                 | NULL                  |
   | Storage Desc Params:          | NULL                                               | NULL                  |
   |                               | serialization.format                               | 1                     |
   +-------------------------------+----------------------------------------------------+-----------------------+
   ```
   
   describe formatted address partition(year=2020,month=10,day=6):
   
   ```console
   +-----------------------------------+----------------------------------------------------+-----------------------+
   |             col_name              |                     data_type                      |        comment        |
   +-----------------------------------+----------------------------------------------------+-----------------------+
   | _hoodie_commit_time               | string                                             |                       |
   | _hoodie_commit_seqno              | string                                             |                       |
   | _hoodie_record_key                | string                                             |                       |
   | _hoodie_partition_path            | string                                             |                       |
   | _hoodie_file_name                 | string                                             |                       |
   | id                                | int                                                |                       |
   | zipcode                           | int                                                |                       |
   | city                              | string                                             |                       |
   | street                            | string                                             |                       |
   | streetnumber                      | int                                                |                       |
   | uuid                              | string                                             |                       |
   | start_date                        | string                                             |                       |
   | end_date                          | string                                             |                       |
   | is_current                        | boolean                                            |                       |
   | event_time                        | string                                             |                       |
   | its                               | string                                             |                       |
   |                                   | NULL                                               | NULL                  |
   | # Partition Information           | NULL                                               | NULL                  |
   | # col_name                        | data_type                                          | comment               |
   | year                              | int                                                |                       |
   | month                             | int                                                |                       |
   | day                               | int                                                |                       |
   |                                   | NULL                                               | NULL                  |
   | # Detailed Partition Information  | NULL                                               | NULL                  |
   | Partition Value:                  | [2020, 10, 6]                                      | NULL                  |
   | Database:                         | nyc_taxi                                           | NULL                  |
   | Table:                            | address                                            | NULL                  |
   | CreateTime:                       | Tue Oct 06 07:06:04 UTC 2020                       | NULL                  |
   | LastAccessTime:                   | UNKNOWN                                            | NULL                  |
   | Location:                         | abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=6 | NULL                  |
   | Partition Parameters:             | NULL                                               | NULL                  |
   |                                   | numFiles                                           | 82                    |
   |                                   | totalSize                                          | 642825967             |
   |                                   | transient_lastDdlTime                              | 1601967964            |
   |                                   | NULL                                               | NULL                  |
   | # Storage Information             | NULL                                               | NULL                  |
   | SerDe Library:                    | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | NULL                  |
   | InputFormat:                      | org.apache.hudi.hadoop.HoodieParquetInputFormat    | NULL                  |
   | OutputFormat:                     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | NULL                  |
   | Compressed:                       | No                                                 | NULL                  |
   | Num Buckets:                      | -1                                                 | NULL                  |
   | Bucket Columns:                   | []                                                 | NULL                  |
   | Sort Columns:                     | []                                                 | NULL                  |
   | Storage Desc Params:              | NULL                                               | NULL                  |
   |                                   | serialization.format                               | 1                     |
   +-----------------------------------+----------------------------------------------------+-----------------------+
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1962:
URL: https://github.com/apache/hudi/issues/1962#issuecomment-679316776


   For the second case, the Hive Metastore would be filtering out partitions and returning only specific paths. I think there is some inconsistency between the path used in the filesystem and the one that is present in the metastore.
   
   @sassai : Sorry for the delay. Can you recursively list your Hudi dataset and attach the output? Also, please add the file contents of the latest .commit or .deltacommit file.
   
   Also, add the output for one of the partitions, including its location:
   describe formatted table_name partition (year=xxx,month=xxx,day=xxx,hour=xxx,minute=xxx);
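   
   As a sketch, the requested listing and commit inspection could look like this (the table path is taken from the `describe formatted` output above, and the commit filename `20201006065754.deltacommit` is illustrative, based on the table's `last_commit_time_sync` value):
   
   ```console
   # Recursively list the dataset, including the .hoodie metadata folder
   hdfs dfs -ls -R abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address
   
   # Print the contents of the latest commit/deltacommit metadata file
   hdfs dfs -cat abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/.hoodie/20201006065754.deltacommit
   ```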
   

