Posted to issues@paimon.apache.org by "huyuanfeng2018 (via GitHub)" <gi...@apache.org> on 2023/11/16 02:04:11 UTC

[I] [Bug] Spark use `>` or `<` determining the timestamp type field, the result is not as expected [incubator-paimon]

huyuanfeng2018 opened a new issue, #2325:
URL: https://github.com/apache/incubator-paimon/issues/2325

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/incubator-paimon/issues) and found nothing similar.
   
   
   ### Paimon version
   
   paimon 0.5
   
   
   ### Compute Engine
   
   spark 3.3
   
   ### Minimal reproduce step
   
   1. CDC writes into a table whose primary key consists of two fields: one of int type and one of datetime type.
   2. Run two different statements in Spark:
   `select count(*) from *** where collect_time < '2023-11-14 00:00:00' and collect_time >= '2023-11-13 00:00:00';`
   result : 1151980
   `select count(*) from *** where date(collect_time) ='2023-11-13';`
   result: 1270271
   
   ### What doesn't meet your expectations?
   
   I expect the two counts to be identical. I also ran both statements in Trino, and there the results matched.
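
   The two predicates above are logically equivalent: `date(collect_time) = '2023-11-13'` selects exactly the half-open range `'2023-11-13 00:00:00' <= collect_time < '2023-11-14 00:00:00'`. A minimal standalone Python sketch of that equivalence (no Spark involved, just to pin down the expectation):

   ```python
   from datetime import datetime, timedelta

   def date_filter(ts: datetime) -> bool:
       # Equivalent of: date(collect_time) = '2023-11-13'
       return ts.date().isoformat() == "2023-11-13"

   def range_filter(ts: datetime) -> bool:
       # Equivalent of: collect_time >= '2023-11-13 00:00:00'
       #            and collect_time <  '2023-11-14 00:00:00'
       return datetime(2023, 11, 13) <= ts < datetime(2023, 11, 14)

   # Timestamps sampled across both day boundaries
   start = datetime(2023, 11, 12, 23, 59, 58)
   samples = [start + timedelta(seconds=s) for s in range(0, 2 * 86400, 997)]

   # Both predicates select exactly the same rows, so the counts must match
   assert all(date_filter(t) == range_filter(t) for t in samples)
   ```

   So a count mismatch means one of the two query plans is dropping rows it should not.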
   
   
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug] Spark use `>` or `<` determining the timestamp type field, the result is not as expected [incubator-paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on issue #2325:
URL: https://github.com/apache/incubator-paimon/issues/2325#issuecomment-1813694787

   Can you provide some of the rows missing from the smaller count, so that we can better analyze the cause~




Re: [I] [Bug] Spark use `>` or `<` determining the timestamp type field, the result is not as expected [incubator-paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on issue #2325:
URL: https://github.com/apache/incubator-paimon/issues/2325#issuecomment-1814014862

   Here is my test on master with Spark 3.3.3; it seems to work well:
   ![image](https://github.com/apache/incubator-paimon/assets/37108074/3952539c-036f-4b1d-8526-5d6b14626167)
   




Re: [I] [Bug] Spark use `>` or `<` determining the timestamp type field, the result is not as expected [incubator-paimon]

Posted by "huyuanfeng2018 (via GitHub)" <gi...@apache.org>.
huyuanfeng2018 commented on issue #2325:
URL: https://github.com/apache/incubator-paimon/issues/2325#issuecomment-1813796871

   When I commented out the filter push-down code in `org.apache.paimon.spark.SparkFilterConverter#convert`, the results became normal:
   ![image](https://github.com/apache/incubator-paimon/assets/40817998/8b8775a5-97a1-45c6-8754-8f40c38782e0)
   So two causes are possible:
   
   1. The pushed-down filter is evaluated incorrectly against the files.
   
   2. The file metadata statistics are incorrect.
   




Re: [I] [Bug] Spark use `>` or `<` determining the timestamp type field, the result is not as expected [incubator-paimon]

Posted by "huyuanfeng2018 (via GitHub)" <gi...@apache.org>.
huyuanfeng2018 commented on issue #2325:
URL: https://github.com/apache/incubator-paimon/issues/2325#issuecomment-1813782742

   > Can you provide some missing query result of the count smaller one, so that we can better analyze the reasons~
   I don't have the missing rows at hand.
   When I use `where date(collect_time) = '2023-11-13'`, the filter is not pushed down to Paimon, so no data skipping happens.
   But `where collect_time < '2023-11-14 00:00:00' and collect_time >= '2023-11-13 00:00:00'` does trigger the pushed-down filter, so I think the problem is most likely in data skipping, probably a bug caused by incorrect min/max statistics in the file metadata. If that is the case, I think it is a very serious bug.
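
   If the min/max statistics really were wrong, the failure mode would look exactly like this: the data-skipping check compares the query range against each file's stored `[min, max]`, and a file whose stats are shifted gets pruned even though its rows match. A minimal Python sketch of that mechanism (the `FileMeta` and `may_contain` names and the 8-hour skew are hypothetical, purely for illustration, not Paimon's actual code):

   ```python
   from dataclasses import dataclass

   @dataclass
   class FileMeta:
       # Hypothetical per-file min/max statistics for collect_time,
       # stored as epoch milliseconds (UTC)
       min_ts: int
       max_ts: int

   def may_contain(meta: FileMeta, lo: int, hi: int) -> bool:
       # Data skipping: scan a file only if its [min, max] range
       # can overlap the half-open query range [lo, hi)
       return meta.max_ts >= lo and meta.min_ts < hi

   # Query range: 2023-11-13 00:00:00 <= collect_time < 2023-11-14 00:00:00 (UTC)
   lo, hi = 1_699_833_600_000, 1_699_920_000_000

   # A file whose rows all fall inside the query range...
   correct = FileMeta(min_ts=lo + 3_600_000, max_ts=lo + 7_200_000)
   # ...and the same file with stats skewed by -8 hours (e.g. a timezone bug)
   skewed = FileMeta(min_ts=correct.min_ts - 28_800_000,
                     max_ts=correct.max_ts - 28_800_000)

   print(may_contain(correct, lo, hi))  # True: file is scanned
   print(may_contain(skewed, lo, hi))   # False: file is wrongly pruned
   ```

   That would also explain why the `date(collect_time)` query is unaffected: a filter that is not pushed down never consults the stats, so every file is scanned.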




Re: [I] [Bug] Spark use `>` or `<` determining the timestamp type field, the result is not as expected [incubator-paimon]

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on issue #2325:
URL: https://github.com/apache/incubator-paimon/issues/2325#issuecomment-1813798763

   Thanks, I'll check it

