Posted to notifications@asterixdb.apache.org by "Ildar Absalyamov (JIRA)" <ji...@apache.org> on 2017/10/24 04:28:01 UTC
[jira] [Updated] (ASTERIXDB-2141) Pre-sorted bulkload failure

     [ https://issues.apache.org/jira/browse/ASTERIXDB-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ildar Absalyamov updated ASTERIXDB-2141:
----------------------------------------
    Description: 
Bulkloading pre-sorted input fails due to a concurrency issue in the hash_partition_merge connector. The error is non-deterministic, but the chance of hitting it increases with the length of the input.
The following DDL generates "HYR0046: Unsorted load input" error.
{code:java}
drop dataverse experiments if exists;
create dataverse experiments;
use dataverse experiments;
set hash_merge "true"

create type TweetMessageType as open {
    tweetid: int64
}
create dataset Tweets(TweetMessageType) primary key tweetid; 
load dataset Tweets using localfs (("path"="asterix_nc1://tweets.adm,asterix_nc2://tweets2.adm"),("format"="adm")) pre-sorted;
{code}
even though each input split (tweets.adm and tweets2.adm) is individually sorted:
{code:title=tweets.adm}
{"tweetid":int64("2")}
{"tweetid":int64("4")}
{"tweetid":int64("6")}
{"tweetid":int64("8")}
{"tweetid":int64("10")}
{"tweetid":int64("12")}
{"tweetid":int64("14")}
{"tweetid":int64("16")}
{"tweetid":int64("18")}
{"tweetid":int64("20")}
{code}
{code:title=tweets2.adm}
{"tweetid":int64("1")}
{"tweetid":int64("3")}
{"tweetid":int64("5")}
{"tweetid":int64("7")}
{"tweetid":int64("9")}
{"tweetid":int64("11")}
{"tweetid":int64("13")}
{"tweetid":int64("15")}
{"tweetid":int64("17")}
{"tweetid":int64("19")}
{code}
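To illustrate what the connector is expected to do, here is a minimal Python sketch (not AsterixDB code). It assumes a single receiving partition to which all keys hash. That partition has to k-way merge the individually sorted streams from asterix_nc1 and asterix_nc2 before they reach the bulk loader, which rejects a key smaller than its predecessor. `check_sorted`, `merge_splits` and `UnsortedLoadInput` are hypothetical names, and the "key < previous key" rule is an assumption about how the loader's check behaves.

{code:python}
import heapq
from typing import Iterable, Iterator, List


class UnsortedLoadInput(Exception):
    """Hypothetical stand-in for the HYR0046 "Unsorted load input" error."""


def check_sorted(keys: Iterable[int]) -> List[int]:
    """Accept keys in arrival order, rejecting any key smaller than its
    predecessor (assumed to approximate the bulk loader's check)."""
    accepted: List[int] = []
    for key in keys:
        if accepted and key < accepted[-1]:
            raise UnsortedLoadInput(f"key {key} arrived after {accepted[-1]}")
        accepted.append(key)
    return accepted


def merge_splits(*splits: List[int]) -> Iterator[int]:
    """k-way merge of individually sorted splits into one sorted stream."""
    return heapq.merge(*splits)


# Primary keys of tweets.adm (asterix_nc1) and tweets2.adm (asterix_nc2).
nc1_split = list(range(2, 21, 2))
nc2_split = list(range(1, 20, 2))

# Expected behaviour: the receiving partition merges both sorted streams.
merged = check_sorted(merge_splits(nc1_split, nc2_split))
print(merged == list(range(1, 21)))  # True

# Simulated broken merge: frames forwarded without merging (one split after the other).
try:
    check_sorted(nc1_split + nc2_split)
    rejected = None
except UnsortedLoadInput as err:
    rejected = str(err)
print(rejected)  # key 1 arrived after 20
{code}

`heapq.merge` only produces sorted output if every input is already sorted, which is exactly the contract the pre-sorted option declares.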

  was:
Bulkloading pre-sorted input fails due to a concurrency issue in the hash_partition_merge connector. The following DDL generates "HYR0046: Unsorted load input" error.
The error is non-deterministic, but the chance of hitting it increases with the length of the input.
{code:java}
drop dataverse experiments if exists;
create dataverse experiments;
use dataverse experiments;
set hash_merge "true"

create type TweetMessageType as open {
    tweetid: int64
}
create dataset Tweets(TweetMessageType) primary key tweetid; 
load dataset Tweets using localfs (("path"="asterix_nc1://tweets.adm,asterix_nc2://tweets2.adm"),("format"="adm")) pre-sorted;
{code}
even though each input split (tweets.adm and tweets2.adm) is individually sorted:
{code:title=tweets.adm}
{"tweetid":int64("2")}
{"tweetid":int64("4")}
{"tweetid":int64("6")}
{"tweetid":int64("8")}
{"tweetid":int64("10")}
{"tweetid":int64("12")}
{"tweetid":int64("14")}
{"tweetid":int64("16")}
{"tweetid":int64("18")}
{"tweetid":int64("20")}
{code}
{code:title=tweets2.adm}
{"tweetid":int64("1")}
{"tweetid":int64("3")}
{"tweetid":int64("5")}
{"tweetid":int64("7")}
{"tweetid":int64("9")}
{"tweetid":int64("11")}
{"tweetid":int64("13")}
{"tweetid":int64("15")}
{"tweetid":int64("17")}
{"tweetid":int64("19")}
{code}


> Pre-sorted bulkload failure
> ---------------------------
>
>                 Key: ASTERIXDB-2141
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2141
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Ildar Absalyamov
>            Assignee: Ian Maxon
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)