You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Sikan Chen (Jira)" <ji...@apache.org> on 2021/02/08 06:51:00 UTC

[jira] [Updated] (ARROW-11549) Expression::ToString() doesn't distinguish null and string literal 'null', causing issues with FilterCacheKey

     [ https://issues.apache.org/jira/browse/ARROW-11549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sikan Chen updated ARROW-11549:
-------------------------------
    Description: 
Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return the correct cached filter. {{FilterCacheKey}}'s hash function factors in the string representation of a given expression, however {{Expression::ToString()}} doesn't really distinguish null and string literal 'null'. As a result, incorrect filters may be returned from the cache.

In our case, we are building a SQL parser on top of gandiva, but, for instance, both of
{code:java}
WHERE null = null
{code}
and
{code:java}
WHERE 'null' = 'null'
{code}
result in the same string representation of gandiva expression:
{code:java}
bool equal((const string) null, (const string) null)
{code}
A simple test to demonstrate the issue:
{code:java}
  auto f = field("foo", utf8());
  auto schema = arrow::schema({f});
  auto node_a = TreeExprBuilder::MakeStringLiteral("null");
  auto node_b = TreeExprBuilder::MakeStringLiteral("null");
  auto equal_func =
      TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
  auto condition = TreeExprBuilder::MakeCondition(equal_func);
  std::shared_ptr<Filter> filter1;
  auto status = Filter::Make(schema, condition, &filter1);
  EXPECT_TRUE(status.ok());

  auto string_type = std::make_shared<arrow::StringType>();
  node_a = TreeExprBuilder::MakeNull(string_type);
  node_b = TreeExprBuilder::MakeNull(string_type);
  equal_func =
      TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
  condition = TreeExprBuilder::MakeCondition(equal_func);
  std::shared_ptr<Filter> filter2;
  status = Filter::Make(schema, condition, &filter2);
  EXPECT_TRUE(status.ok());

  EXPECT_TRUE(filter1.get() != filter2.get());
{code}
 
 Making {{LiteralToStream}} add quotes around the literal seems like a quick-and-dirty fix.

  was:
Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return the correct cached filter. {{FilterCacheKey}}'s hash function factors in the string representation of a given expression, however {{Expression::ToString()}} doesn't really distinguish null and string literal 'null'. As a result, incorrect filters may be returned from the cache.

In our case, we are building a SQL parser on top of gandiva, but, for instance, both of
{code:java}
WHERE null = null
{code}
and
{code:java}
WHERE 'null' = 'null'
{code}
result in the same string representation of gandiva expression:
{code:java}
bool equal((const string) null, (const string) null)
{code}
A simple test to demonstrate the issue:
{code:java}
  auto f = field("foo", utf8());
  auto schema = arrow::schema({f});
  auto node_a = TreeExprBuilder::MakeStringLiteral("null");
  auto node_b = TreeExprBuilder::MakeStringLiteral("null");
  auto equal_func =
      TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
  auto condition = TreeExprBuilder::MakeCondition(equal_func);
  std::shared_ptr<Filter> filter1;
  auto status = Filter::Make(schema, condition, &filter1);
  EXPECT_TRUE(status.ok());

  auto string_type = std::make_shared<arrow::StringType>();
  node_a = TreeExprBuilder::MakeNull(string_type);
  node_b = TreeExprBuilder::MakeNull(string_type);
  equal_func =
      TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
  condition = TreeExprBuilder::MakeCondition(equal_func);
  std::shared_ptr<Filter> filter2;
  status = Filter::Make(schema, condition, &filter2);
  EXPECT_TRUE(status.ok());

  EXPECT_TRUE(filter1.get() != filter2.get());
{code}
 
 Making {{LiteralToStream}} adding quotes around the literal seems like a quick-and-dirty fix.


> Expression::ToString() doesn't distinguish null and string literal 'null', causing issues with FilterCacheKey
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11549
>                 URL: https://issues.apache.org/jira/browse/ARROW-11549
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, C++ - Gandiva
>            Reporter: Sikan Chen
>            Priority: Critical
>
> Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return the correct cached filter. {{FilterCacheKey}}'s hash function factors in the string representation of a given expression, however {{Expression::ToString()}} doesn't really distinguish null and string literal 'null'. As a result, incorrect filters may be returned from the cache.
> In our case, we are building a SQL parser on top of gandiva, but, for instance, both of
> {code:java}
> WHERE null = null
> {code}
> and
> {code:java}
> WHERE 'null' = 'null'
> {code}
> result in the same string representation of gandiva expression:
> {code:java}
> bool equal((const string) null, (const string) null)
> {code}
> A simple test to demonstrate the issue:
> {code:java}
>   auto f = field("foo", utf8());
>   auto schema = arrow::schema({f});
>   auto node_a = TreeExprBuilder::MakeStringLiteral("null");
>   auto node_b = TreeExprBuilder::MakeStringLiteral("null");
>   auto equal_func =
>       TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
>   auto condition = TreeExprBuilder::MakeCondition(equal_func);
>   std::shared_ptr<Filter> filter1;
>   auto status = Filter::Make(schema, condition, &filter1);
>   EXPECT_TRUE(status.ok());
>   auto string_type = std::make_shared<arrow::StringType>();
>   node_a = TreeExprBuilder::MakeNull(string_type);
>   node_b = TreeExprBuilder::MakeNull(string_type);
>   equal_func =
>       TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
>   condition = TreeExprBuilder::MakeCondition(equal_func);
>   std::shared_ptr<Filter> filter2;
>   status = Filter::Make(schema, condition, &filter2);
>   EXPECT_TRUE(status.ok());
>   EXPECT_TRUE(filter1.get() != filter2.get());
> {code}
>  
>  Making {{LiteralToStream}} add quotes around the literal seems like a quick-and-dirty fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)