Posted to issues@orc.apache.org by "chenpingzeng (via GitHub)" <gi...@apache.org> on 2023/05/23 08:23:51 UTC

[GitHub] [orc] chenpingzeng commented on issue #1512: [C++] RowReaderImpl::next return inconsistant data in certain case

chenpingzeng commented on issue #1512:
URL: https://github.com/apache/orc/issues/1512#issuecomment-1558774853

   Thanks for pointing me to that PR. Its main intent was to avoid calling memset to set notNull.data() to 1, because of the small performance cost.
   I would like to share some experience with consuming the result of orc::RowReader::next. We directly copy orc::StructVectorBatch.fields[i] into the destination object's memory and, when hasNulls=0, bypass the unnecessary check of notNull.data()[i]. In a TPC-DS 99-query test on a 3TB data set this gave no obvious performance improvement, although the refactoring is certainly still worthwhile. So I do not think it would be a waste of performance to ensure that all values in notNull.data() are 1 when hasNulls=0.
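The copy path described above can be sketched roughly as follows. This is only an illustration: `LongColumn` and `DestColumn` are hypothetical stand-ins, not ORC types. `LongColumn` mirrors only the `hasNulls`, `notNull` and `data` members that `orc::LongVectorBatch` exposes, and the destination layout is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the members of an ORC column batch that matter
// here (hasNulls, notNull, data). It is NOT the real orc::LongVectorBatch.
struct LongColumn {
  bool hasNulls;
  std::vector<char> notNull;   // only meaningful when hasNulls is true
  std::vector<int64_t> data;
};

// Hypothetical destination layout, assumed for this example only.
struct DestColumn {
  std::vector<int64_t> values;
  std::vector<uint8_t> isNull;
};

// Copy a column, consulting notNull only when hasNulls is set.
void copyColumn(const LongColumn& src, std::size_t numElements, DestColumn& dst) {
  dst.values.resize(numElements);
  dst.isNull.assign(numElements, 0);
  if (!src.hasNulls) {
    // Fast path: every row is valid, so notNull is never read.
    std::copy_n(src.data.begin(), numElements, dst.values.begin());
    return;
  }
  // Slow path: check each row's notNull flag before copying its value.
  for (std::size_t i = 0; i < numElements; ++i) {
    if (src.notNull[i]) {
      dst.values[i] = src.data[i];
    } else {
      dst.isNull[i] = 1;
    }
  }
}
```

With this shape, stale 0 bytes left in `notNull` cannot affect the result when `hasNulls` is false, because the fast path never looks at them.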
   On the other hand, our misuse of notNull.data() went unnoticed for half a year, until a TPC-DS 99 consistency check caught it very recently. As I mentioned in the issue, it was extremely difficult to work out the conditions needed to locate the bad rows, since the store_sales table holds over 8 billion records, and other scenarios have even more. That is to say, from an expert's or god's-eye view, it is certainly the user's mistake to read notNull.data() when hasNulls=0. But has anyone considered this question: should we stop users from stepping into this trap? (Yes, I think it is a trap that notNull.data() contains some 0 values when hasNulls=0; in my opinion it is a data inconsistency.)
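One way a caller could protect itself today, without any change to the library, is to route every null check through a helper and to scan for the inconsistent state in debug builds. The sketch below is an assumption-laden illustration: `ColumnView`, `isNullAt` and `firstStaleNotNull` are hypothetical names, not part of the ORC API, and `ColumnView` only mirrors the `hasNulls` and `notNull` members of an ORC batch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the null-tracking members of an ORC batch.
struct ColumnView {
  bool hasNulls;
  std::vector<char> notNull;   // only meaningful when hasNulls is true
};

// Safe accessor: a row is null only if hasNulls is set AND its flag is 0.
inline bool isNullAt(const ColumnView& col, std::size_t row) {
  return col.hasNulls && col.notNull[row] == 0;
}

// Debug aid: returns the index of the first row whose notNull flag is 0
// even though hasNulls is false, or numElements if no such row exists.
inline std::size_t firstStaleNotNull(const ColumnView& col, std::size_t numElements) {
  if (col.hasNulls) {
    return numElements;  // zeros are legitimate here
  }
  for (std::size_t i = 0; i < numElements; ++i) {
    if (col.notNull[i] == 0) {
      return i;
    }
  }
  return numElements;
}
```

Running something like `firstStaleNotNull` on each batch in a test build would flag the situation described above at the first affected batch, rather than through a downstream result comparison.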
   
   From: Gang Wu ***@***.***>
   Sent: May 23, 2023 15:32
   To: apache/orc ***@***.***>
   Cc: Chenpingzeng ***@***.***>; Mention ***@***.***>
   Subject: Re: [apache/orc] [C++] RowReaderImpl::next return inconsistant data in certain case (Issue #1512)
   
   
   Hi @chenpingzeng<https://github.com/chenpingzeng>, https://github.com/apache/orc/pull/1469/files has discussed the same thing. Please check the comment below to see if it solves this issue.
   
   https://github.com/apache/orc/blob/main/c%2B%2B/include/orc/Vector.hh#L40-L44
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org