Posted to notifications@asterixdb.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/09/20 04:32:00 UTC

[jira] [Commented] (ASTERIXDB-2129) UTF8StringUtil key normalization failure

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606877#comment-17606877 ] 

ASF subversion and git services commented on ASTERIXDB-2129:
------------------------------------------------------------

Commit c60b7f4f011548189dc78372ac109e228d849685 in asterixdb's branch refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=c60b7f4f01 ]

[ASTERIXDB-2129][RT] Fix normalizing non-ascii strings

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
For example, normalizing a single-char string with a 3-byte char can read
past the string's buffer boundary

Change-Id: Ic169d5ff20f9bf5ce2ca36bab4ebd241bbc50dca
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17230
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Ali Alsuliman <al...@gmail.com>
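
The failure mode named in the Details above (a single-char string whose only char takes 3 bytes) can be illustrated with a minimal, self-contained sketch. This is not the code from the commit. The class name NormalizedKeySketch, the method signatures, and the key layout (first two chars packed into a non-negative int) are assumptions made for illustration. The sketch also assumes well-formed input that uses only 1-, 2- and 3-byte UTF-8 sequences. Its one point is that the loop is bounded by the byte offset rather than by a character count, so a 3-byte char at the end of the buffer is never followed by a second read.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the AsterixDB/Hyracks implementation.
public final class NormalizedKeySketch {
    private NormalizedKeySketch() {
    }

    /** Size in bytes of the UTF-8 sequence starting with the given lead byte (1 to 3 bytes only). */
    public static int charSize(byte lead) {
        int v = lead & 0xff;
        if (v < 0x80) {
            return 1;
        }
        if ((v & 0xe0) == 0xc0) {
            return 2;
        }
        if ((v & 0xf0) == 0xe0) {
            return 3;
        }
        throw new IllegalArgumentException("unsupported lead byte: 0x" + Integer.toHexString(v));
    }

    /** Decodes the char that starts at offset; assumes the whole sequence lies inside the array. */
    public static char charAt(byte[] bytes, int offset) {
        switch (charSize(bytes[offset])) {
            case 1:
                return (char) bytes[offset];
            case 2:
                return (char) (((bytes[offset] & 0x1f) << 6) | (bytes[offset + 1] & 0x3f));
            default:
                return (char) (((bytes[offset] & 0x0f) << 12) | ((bytes[offset + 1] & 0x3f) << 6)
                        | (bytes[offset + 2] & 0x3f));
        }
    }

    /**
     * Packs up to the first two chars of the string into an int key.
     * The bound is the byte offset 'end', not a char index compared to a byte length.
     */
    public static int normalize(byte[] bytes, int start, int byteLength) {
        int end = start + byteLength;
        int offset = start;
        long key = 0;
        for (int i = 0; i < 2; i++) {
            key <<= 16;
            if (offset < end) { // stop once all bytes of the string are consumed
                key += charAt(bytes, offset) & 0xffff;
                offset += charSize(bytes[offset]);
            }
        }
        return (int) (key >> 1); // drop one bit so the key stays non-negative
    }

    public static void main(String[] args) {
        // U+20AC (euro sign) is one char encoded as 3 bytes: E2 82 AC.
        byte[] single = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(normalize(single, 0, single.length)));
    }
}
```

With a guard such as "i < byteLength" instead of "offset < end", the second loop iteration for the euro-sign input would try to decode at offset 3, one position past the 3-byte buffer.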


> UTF8StringUtil key normalization failure
> ----------------------------------------
>
>                 Key: ASTERIXDB-2129
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2129
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: RT - Runtime, TYPE - Data Model
>            Reporter: Ian Maxon
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>
> This query:
> SELECT text,c
> FROM(
> SELECT h.text AS text, datetime_from_unix_time_in_ms(to_bigint(t.timestamp_ms)) as time
> FROM aca_int AS t
> UNNEST t.hashtags h
> where t.isRelated = 1 and t.`SA-OM` is not missing and t.createdDate is not missing
> ) AS g
> GROUP BY g.text AS text WITH c AS count(g.time)
> ORDER BY c DESC;
> where the unnested hashtag field text is in a closed schema, causes this failure:
> {quote}{{Oct 11, 2017 7:10:05 AM org.apache.hyracks.control.cc.dataset.DatasetDirectoryService reportJobFailure
> INFO: job JID:4 failed and is being reported to DatasetDirectoryService
> org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.IllegalArgumentException
>     at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:134)
>     at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:63)
>     at org.apache.hyracks.control.nc.Task.run(Task.java:362)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException
>     at org.apache.hyracks.util.string.UTF8StringUtil.charAt(UTF8StringUtil.java:60)
>     at org.apache.hyracks.util.string.UTF8StringUtil.normalize(UTF8StringUtil.java:228)
>     at org.apache.hyracks.dataflow.common.data.normalizers.UTF8StringNormalizedKeyComputerFactory$1.normalize(UTF8StringNormalizedKeyComputerFactory.java:33)
>     at org.apache.asterix.dataflow.data.nontagged.keynormalizers.AWrappedAscNormalizedKeyComputerFactory$1.normalize(AWrappedAscNormalizedKeyComputerFactory.java:46)
>     at org.apache.hyracks.dataflow.std.sort.AbstractFrameSorter.sort(AbstractFrameSorter.java:139)
>     at org.apache.hyracks.dataflow.std.sort.AbstractSortRunGenerator.flushFramesToRun(AbstractSortRunGenerator.java:60)
>     at org.apache.hyracks.dataflow.std.sort.AbstractSortRunGenerator.close(AbstractSortRunGenerator.java:50)
>     at org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$SortActivity$1.close(AbstractSorterOperatorDescriptor.java:132)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.close(AssignRuntimeFactory.java:119)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.std.StreamSelectRuntimeFactory$1.close(StreamSelectRuntimeFactory.java:112)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.close(AssignRuntimeFactory.java:119)
>     at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.close(AbstractOneInputOneOutputOneFramePushRuntime.java:60)
>     at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.close(AlgebricksMetaOperatorDescriptor.java:140)
>     at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.close(IndexSearchOperatorNodePushable.java:243)
>     at org.apache.hyracks.algebricks.runtime.operators.std.EmptyTupleSourceRuntimeFactory$1.close(EmptyTupleSourceRuntimeFactory.java:65)
>     at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$1.initialize(AlgebricksMetaOperatorDescriptor.java:104)
>     at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$1(SuperActivityOperatorNodePushable.java:204)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     ... 3 more
> }}{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)