Posted to user@kudu.apache.org by Paul Brannan <pa...@thesystech.com> on 2017/02/24 18:53:58 UTC

mixing range and hash partitioning

I'm trying to create a table with one column range-partitioned and another
column hash-partitioned.  The documentation for add_hash_partitions and
set_range_partition_columns suggests this should be possible ("Tables must
be created with either range, hash, or range and hash partitioning").

I have a schema with three INT64 columns ("time", "key", and "value").
When I create the table, I set up the partitioning:

check_ok((*table_creator)
  .table_name("test_table")
  .schema(&schema)
  .add_hash_partitions({"key"}, 2)
  .set_range_partition_columns({"time"})
  .num_replicas(1)
  .Create());

I later try to add a partition:

auto timesplit(KuduSchema & schema, std::int64_t t) {
  auto split = schema.NewRow();
  check_ok(split->SetInt64("time", t));
  return split;
}

alterer->AddRangePartition(
  timesplit(schema, date_start),
  timesplit(schema, next_date_start));

check_ok(alterer->Alter());

But I get an error "Invalid argument: New range partition conflicts with
existing range partition".

How are hash and range partitioning intended to be mixed?

Re: mixing range and hash partitioning

Posted by Paul Brannan <pa...@thesystech.com>.
I think you're right; I'm on 1.2 now, and can't reproduce the behavior I was
seeing.

On Mon, Mar 6, 2017 at 7:57 PM, Dan Burkert <da...@apache.org> wrote:

> Hi Paul,
>
> Sorry for the slow followup, I've been pulled a few different ways with the
> upcoming 1.3 release.  The issue you ran into is KUDU-1792
> <https://issues.apache.org/jira/browse/KUDU-1792>, which was fixed in
> Kudu 1.2.  KUDU-1792 only comes into play when adding a range partition
> where either the upper or lower bound is unbounded, but this is actually
> the case in your repro example due to a copy/paste error where the lower
> limit is being set twice and the upper limit is not being set.  I think the
> fix is to upgrade to Kudu 1.2 and recreate the table if it still has the
> buggy partitions.  Thanks again for the report!
>
> - Dan

Re: mixing range and hash partitioning

Posted by Dan Burkert <da...@apache.org>.
Hi Paul,

Sorry for the slow followup, I've been pulled a few different ways with the
upcoming 1.3 release.  The issue you ran into is KUDU-1792
<https://issues.apache.org/jira/browse/KUDU-1792>, which was fixed in Kudu
1.2.  KUDU-1792 only comes into play when adding a range partition where
either the upper or lower bound is unbounded, but this is actually the case
in your repro example due to a copy/paste error where the lower limit is
being set twice and the upper limit is not being set.  I think the fix is
to upgrade to Kudu 1.2 and recreate the table if it still has the buggy
partitions.  Thanks again for the report!
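
To illustrate the copy/paste slip described above, here is a minimal sketch
in plain C++.  It is a toy model, not Kudu client code: BoundRow,
buggy_fill, and is_bounded_range are hypothetical names invented for this
example, and an empty optional stands in for a bound row whose "time"
column was never set (that is, an unbounded side).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for a partial row that carries only the "time" column.
struct BoundRow {
  std::optional<std::int64_t> time;  // nullopt = never set = unbounded
};

// The slip: the lower row is assigned twice and the upper row is never
// touched, so the resulting range has no upper bound.
void buggy_fill(BoundRow& lower, BoundRow& upper,
                std::int64_t start, std::int64_t end) {
  (void)upper;       // intentionally unused: this is the bug
  lower.time = start;
  lower.time = end;  // intended: upper.time = end
}

// Hypothetical guard: true only if both bounds are set and lower < upper.
bool is_bounded_range(const BoundRow& lower, const BoundRow& upper) {
  return lower.time && upper.time && *lower.time < *upper.time;
}
```

A guard like is_bounded_range, called just before the range is handed to
the alterer, would flag the unset upper bound on the client side.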

- Dan

On Tue, Feb 28, 2017 at 1:03 PM, Dan Burkert <da...@apache.org> wrote:

> Yep: https://issues.apache.org/jira/browse/KUDU-1903
>
> - Dan

Re: mixing range and hash partitioning

Posted by Dan Burkert <da...@apache.org>.
Yep: https://issues.apache.org/jira/browse/KUDU-1903

- Dan

On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey Dan,
>
> Mind filing a critical or blocker JIRA against 1.3 so we can track
> remaining things that should go into the branch before release?
>
> -Todd

Re: mixing range and hash partitioning

Posted by Todd Lipcon <to...@cloudera.com>.
Hey Dan,

Mind filing a critical or blocker JIRA against 1.3 so we can track
remaining things that should go into the branch before release?

-Todd

On Tue, Feb 28, 2017 at 10:05 AM, Dan Burkert <da...@apache.org> wrote:

> Hey Paul,
>
> Thanks for checking that out and following up.  I'm going to try and root
> cause this today so that we have plenty of time to get a fix in to 1.3 if
> it requires one.   Thanks again for the report. In the meantime, let me
> know if the alter table workaround is not enough for you to make progress
> with Kudu.
>
> -Dan


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: mixing range and hash partitioning

Posted by Dan Burkert <da...@apache.org>.
Hey Paul,

Thanks for checking that out and following up.  I'm going to try and root
cause this today so that we have plenty of time to get a fix in to 1.3 if
it requires one.   Thanks again for the report. In the meantime, let me
know if the alter table workaround is not enough for you to make progress
with Kudu.

-Dan


On Mon, Feb 27, 2017 at 3:02 PM Paul Brannan <pa...@thesystech.com>
wrote:

One side-effect of neglecting to drop the unbounded range partition: I get
a stack trace when I try to scan:

F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it != collection.end() Map key not found: ▒3
*** Check failure stack trace: ***
    @     0x7fca2a5506ad  (unknown)
    @     0x7fca2a55271c  (unknown)
    @     0x7fca2a550209  (unknown)
    @     0x7fca2a5530af  (unknown)
    @     0x7fca2a3de482  (unknown)
    @     0x7fca2a3dae70  (unknown)
    @     0x7fca2a3dc100  (unknown)
    @     0x7fca2a429a44  (unknown)
    @     0x7fca2a42ab47  (unknown)
    @     0x7fca2a42e94c  (unknown)
    @     0x7fca2a43081c  (unknown)
    @     0x7fca2a5a9a56  (unknown)
    @     0x7fca2a5aa948  (unknown)
    @     0x7fca2a41ac8b  (unknown)
    @     0x7fca2a4dcfc8  (unknown)
    @     0x7fca290d6182  start_thread
    @     0x7fca2980947d  clone
    @              (nil)  (unknown)


On Sun, Feb 26, 2017 at 6:53 PM, Paul Brannan <pa...@thesystech.com>
wrote:

Is that 4TB per tablet server, regardless of how many tablets it has?

If I have 128GB of data per day, then each tablet server hits the
recommended limit after about a month.  To store 10 years of data, I would
need 120 tablet servers to avoid going over the limit.  Is that the best
solution or is there another alternative?
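
The arithmetic behind those figures can be sketched as follows.  This is
only a back-of-the-envelope check under stated assumptions: 1TB is taken
as 1024GB, a year as 365 days, replication is ignored (num_replicas(1)),
and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Days until one tablet server reaches the per-server limit, assuming it
// absorbs the entire daily volume (integer division rounds down).
constexpr std::int64_t days_until_limit(std::int64_t limit_gb,
                                        std::int64_t gb_per_day) {
  return limit_gb / gb_per_day;
}

// Servers needed to hold `days` of data with no server over the limit
// (rounded up).  Replication is ignored.
constexpr std::int64_t servers_needed(std::int64_t limit_gb,
                                      std::int64_t gb_per_day,
                                      std::int64_t days) {
  const std::int64_t total_gb = gb_per_day * days;
  return (total_gb + limit_gb - 1) / limit_gb;
}

constexpr std::int64_t kLimitGb = 4 * 1024;       // 4TB, assuming 1TB = 1024GB
constexpr std::int64_t kGbPerDay = 128;           // daily volume from above
constexpr std::int64_t kTenYearsDays = 10 * 365;  // leap days ignored

static_assert(days_until_limit(kLimitGb, kGbPerDay) == 32,
              "roughly one month per server");
static_assert(servers_needed(kLimitGb, kGbPerDay, kTenYearsDays) == 115,
              "close to the one-server-per-month estimate of 120");
```

Under these assumptions the exact count is 115 servers; the figure of 120
comes from rounding to one server per month over 120 months.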

How many cores are recommended per tablet server?  If I typically only scan
one day of data at a time, could a single core service multiple tablet
servers?


On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <pa...@thesystech.com>
wrote:

The test doesn't exactly reproduce what I did in my sample program.

I'm able to successfully drop the unbounded partition in both cases
(calling set_range_partition_columns only vs calling
set_range_partition_columns+add_hash_partitions).  However, if I omit the
call to DropRangePartition, then AddRangePartition succeeds in the first
case and fails in the second case.  I expect it to succeed in both cases or
fail in both cases.

I've attached a simple program which demonstrates.


On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <da...@apache.org> wrote:

Hi Paul,

I can't reproduce the behavior you are describing; I always get a single
unbounded range partition when creating the table without specifying range
bounds or splits (regardless of hash partitioning). I searched and couldn't
find a unit test for this behavior, so I wrote one - you might compare your
code against my test. https://gerrit.cloudera.org/#/c/6153/

Thanks,
Dan

On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <pa...@thesystech.com>
wrote:

I can verify that dropping the unbounded range partition allows me to later
add bounded partitions.

If I only have range partitioning (by commenting out the call to
add_hash_partitions), adding a bounded partition succeeds, regardless of
whether I first drop the unbounded partition.  This seems surprising; why
the difference?

On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <da...@apache.org> wrote:

Hi Paul,

I think the issue you are running into is that if you don't add a range
partition explicitly during table creation (by calling add_range_partition
or inserting a split with add_range_partition_split), Kudu will default to
creating 1 unbounded range partition.  So your two options are to add the
range partition during table creation time, or if you only know that
partition you want at a later time, you can drop the existing partition
(alterer->DropRangePartition with two empty rows), then add the range
partition.  Note that dropping the range partition will effectively
truncate the table.  This can be done with the same alterer in a single
transaction.  If you want to see a bunch of examples, you can check out
this unit test:
https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106
.

- Dan

On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <pa...@thesystech.com>
wrote:

I'm trying to create a table with one-column range-partitioned and another
column hash-partitioned.  Documentation for add_hash_partitions and
set_range_partition_columns suggest this should be possible ("Tables must
be created with either range, hash, or range and hash partitioning").

I have a schema with three INT64 columns ("time", "key", and "value").
When I create the table, I set up the partitioning:

(*table_creator)
  .table_name("test_table")
  .schema(&schema)
  .add_hash_partitions({"key"}, 2)
  .set_range_partition_columns({"time"})
  .num_replicas(1)
  .Create()

I later try to add a partition:

auto timesplit(KuduSchema & schema, std::int64_t t) {
  auto split = schema.NewRow();
  check_ok(split->SetInt64("time", t));
  return split;
}

alterer->AddRangePartition(
  timesplit(schema, date_start),
  timesplit(schema, next_date_start));

check_ok(alterer->Alter());

But I get an error "Invalid argument: New range partition conflicts with
existing range partition".

How are hash and range partitioning intended to be mixed?

Re: mixing range and hash partitioning

Posted by Paul Brannan <pa...@thesystech.com>.
One side-effect of neglecting to drop the unbounded range partition: I get
a stack trace when I try to scan:

F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it !=
collection.end() Map key not found: ▒3
*** Check failure stack trace: ***
    @     0x7fca2a5506ad  (unknown)
    @     0x7fca2a55271c  (unknown)
    @     0x7fca2a550209  (unknown)
    @     0x7fca2a5530af  (unknown)
    @     0x7fca2a3de482  (unknown)
    @     0x7fca2a3dae70  (unknown)
    @     0x7fca2a3dc100  (unknown)
    @     0x7fca2a429a44  (unknown)
    @     0x7fca2a42ab47  (unknown)
    @     0x7fca2a42e94c  (unknown)
    @     0x7fca2a43081c  (unknown)
    @     0x7fca2a5a9a56  (unknown)
    @     0x7fca2a5aa948  (unknown)
    @     0x7fca2a41ac8b  (unknown)
    @     0x7fca2a4dcfc8  (unknown)
    @     0x7fca290d6182  start_thread
    @     0x7fca2980947d  clone
    @              (nil)  (unknown)



Re: mixing range and hash partitioning

Posted by Paul Brannan <pa...@thesystech.com>.
Is that 4TB per tablet server, regardless of how many tablets it has?

If I have 128GB of data per day, then each tablet server hits the
recommended limit after about a month.  To store 10 years of data, I would
need 120 tablet servers to avoid going over the limit.  Is that the best
solution or is there another alternative?

How many cores are recommended per tablet server?  If I typically only scan
one day of data at a time, could a single core service multiple tablet
servers?
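
The sizing estimate above can be checked with a few lines of arithmetic. This
is only a rough sketch under stated assumptions: 1 TB is taken as 1024 GB, a
year as 365 days, data is spread evenly, and replication and compression are
ignored. The function names are made up for illustration and are not part of
any Kudu API.

```cpp
#include <cstdint>

// Ceiling division for positive integers.
constexpr std::int64_t ceil_div(std::int64_t a, std::int64_t b) {
  return (a + b - 1) / b;
}

// Whole days of ingest one tablet server can hold before reaching the limit.
constexpr std::int64_t days_until_limit(std::int64_t limit_gb,
                                        std::int64_t gb_per_day) {
  return limit_gb / gb_per_day;
}

// Fewest tablet servers that keep each server at or under the limit,
// assuming the data is spread perfectly evenly (hypothetical best case).
constexpr std::int64_t servers_needed(std::int64_t total_days,
                                      std::int64_t gb_per_day,
                                      std::int64_t limit_gb) {
  return ceil_div(total_days * gb_per_day, limit_gb);
}

// 4 TB at 128 GB per day fills in 32 days, i.e. about a month.
static_assert(days_until_limit(4 * 1024, 128) == 32, "about one month");

// Ten years (3650 days) of data needs at least 115 such servers.
static_assert(servers_needed(10 * 365, 128, 4 * 1024) == 115, "ten years");
```

Rounding one server up to one calendar month gives the figure of 120 servers
(one per month for 120 months), so the two estimates agree to within rounding.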



Re: mixing range and hash partitioning

Posted by Paul Brannan <pa...@thesystech.com>.
The test doesn't exactly reproduce what I did in my sample program.

I'm able to successfully drop the unbounded partition in both cases
(calling set_range_partition_columns only vs calling
set_range_partition_columns+add_hash_partitions).  However, if I omit the
call to DropRangePartition, then AddRangePartition succeeds in the first
case and fails in the second case.  I expect it to succeed in both cases or
fail in both cases.

I've attached a simple program which demonstrates.



Re: mixing range and hash partitioning

Posted by Dan Burkert <da...@apache.org>.
Hi Paul,

I can't reproduce the behavior you are describing; I always get a single
unbounded range partition when creating the table without specifying range
bounds or splits (regardless of hash partitioning). I searched and couldn't
find a unit test for this behavior, so I wrote one - you might compare your
code against my test. https://gerrit.cloudera.org/#/c/6153/

Thanks,
Dan


Re: mixing range and hash partitioning

Posted by Paul Brannan <pa...@thesystech.com>.
I can verify that dropping the unbounded range partition allows me to later
add bounded partitions.

If I only have range partitioning (by commenting out the call to
add_hash_partitions), adding a bounded partition succeeds, regardless of
whether I first drop the unbounded partition.  This seems surprising; why
the difference?
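
For background on how the two schemes combine: when a Kudu table uses both
hash and range partitioning, each (hash bucket, range partition) pair is its
own tablet. The sketch below is a toy model of that mapping, not Kudu's code.
The function names are hypothetical, the key is treated as unsigned for
simplicity, and the modulo is only a stand-in for Kudu's real hash function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the range partition containing `time`, given sorted split points.
// Splits {100, 200} define three ranges: below 100, [100, 200), 200 and up.
inline std::size_t range_index(const std::vector<std::int64_t>& splits,
                               std::int64_t time) {
  std::size_t i = 0;
  while (i < splits.size() && time >= splits[i]) ++i;
  return i;
}

// Stand-in bucket function (assumption: plain modulo, not Kudu's hash).
inline std::size_t bucket_index(std::uint64_t key, std::size_t num_buckets) {
  return static_cast<std::size_t>(key % num_buckets);
}

// One tablet per (bucket, range) pair.
inline std::size_t tablet_count(std::size_t num_buckets,
                                std::size_t num_ranges) {
  return num_buckets * num_ranges;
}

// Flattened position of a (bucket, range) pair in this toy layout.
inline std::size_t tablet_index(std::size_t bucket, std::size_t range,
                                std::size_t num_ranges) {
  return bucket * num_ranges + range;
}
```

With the 2 hash buckets on "key" from the original post, every range
partition on "time" corresponds to 2 tablets in this model, so adding or
dropping one range partition adds or drops one tablet per bucket.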


Re: mixing range and hash partitioning

Posted by Dan Burkert <da...@apache.org>.
Hi Paul,

I think the issue you are running into is that if you don't add a range
partition explicitly during table creation (by calling add_range_partition
or inserting a split with add_range_partition_split), Kudu will default to
creating one unbounded range partition.  Your two options are to add the
range partition at table creation time or, if you only know which
partition you want at a later time, to drop the existing partition
(alterer->DropRangePartition with two empty rows) and then add the range
partition.  Note that dropping the range partition will effectively
truncate the table.  The drop and the add can be done with the same alterer
in a single transaction.  If you want to see a bunch of examples, you can
check out this unit test:
https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106

- Dan
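
The conflict described above can be illustrated with a small self-contained
model. This is a sketch of the idea only, not Kudu's implementation: the
RangePartition struct and the overlaps/try_add helpers are hypothetical names,
and ranges are treated as half-open intervals on the "time" column, with a
missing bound meaning unbounded on that side.

```cpp
#include <cstdint>
#include <vector>

// A half-open range [lower, upper) on the "time" column.
// has_lower == false or has_upper == false means that side is unbounded.
struct RangePartition {
  bool has_lower;
  std::int64_t lower;
  bool has_upper;
  std::int64_t upper;
};

// Two half-open ranges overlap when each one starts before the other ends.
inline bool overlaps(const RangePartition& a, const RangePartition& b) {
  const bool a_starts_before_b_ends =
      !a.has_lower || !b.has_upper || a.lower < b.upper;
  const bool b_starts_before_a_ends =
      !b.has_lower || !a.has_upper || b.lower < a.upper;
  return a_starts_before_b_ends && b_starts_before_a_ends;
}

// Appends `added` and returns true only if it overlaps no existing range.
inline bool try_add(std::vector<RangePartition>& existing,
                    const RangePartition& added) {
  for (const RangePartition& p : existing) {
    if (overlaps(p, added)) return false;  // would conflict
  }
  existing.push_back(added);
  return true;
}
```

In this model a table that starts with the single default range (unbounded on
both sides) rejects every bounded range, because everything overlaps it. Once
that range is dropped, adjacent bounded ranges such as [100, 200) and
[200, 300) can be added without conflict.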
