You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hawq.apache.org by Ivan Weng <iw...@pivotal.io> on 2016/09/24 23:33:02 UTC

Re: Handling updates and deletes in HAWQ

Hi all,

It looks like there are pressing needs for the update/delete support in the
HAWQ and the good news is that this feature is already in the HAWQ roadmap.
+1 for vote this JIRA to get higher priority, https://issues.
apache.org/jira/plugins/servlet/mobile#issue/HAWQ-304

cc  developers,

For update/delete support, maybe we need to think about different scenarios
and minimize the impact of select/insert operation. The scenarios includes
single row update, few columns update for multiple rows, batch update most
of the table, etc. Hopefully we'll have a quick prototype for it.


Regards,
Ivan



On Sat, Sep 24, 2016 at 12:58 AM, Michael André Pearce <
michael.andre.pearce@me.com> wrote:

> The one we use for the versioning data and "compaction" is attached to the
> jira I sent as a sample work around we shared, whilst we await true update
> delete feature.
>
> Sent from my iPhone
>
> On 23 Sep 2016, at 17:17, mujtaba.qadri@gmail.com wrote:
>
> Thanks you for the inputs, anyone has a sample script that I can reuse?
>
> Regards,
> Muji
>
> On Sep 23, 2016, at 11:27 AM, Luis Macedo <lm...@pivotal.io> wrote:
>
> Hi all,
>
> If you have a daily batch that updates the table you can rebuild the table
> in a mirror table and then swap with "alter table ... exchange
> partition...".
>
> The problem with this approach is that you will need to get an exclusive
> lock on the table and any running query will block the operation.
>
> One possibility is to hide the underlying table below a view so the we can
> try to minimize the lock problem. I am not sure if that will relive the
> problem but its a fair try. Maybe someone can comment.
>
>
> Rgds,
>
>
> *Luis Macedo | Sr Platform Architect | **Pivotal Inc *
>
> *Mobile:* +55 11 97616-6438
> *Pivotal.io <http://pivotal.io>*
> *Take care of the customers and the rest takes care of itself*
>
> 2016-09-23 8:37 GMT-03:00 Michael André Pearce <
> michael.andre.pearce@me.com>:
>
>> Ps. Please up vote the jira to help it get priority
>>
>> Sent from my iPhone
>>
>> On 23 Sep 2016, at 12:30, Michael André Pearce <
>> michael.andre.pearce@me.com> wrote:
>>
>> Hi
>>
>> In my company we faced similar issue.
>>
>> I noted on this jira how we're handling this.
>>
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/HAWQ-304
>>
>> Essentially we can have updates as we version the rows and on select use
>> the data of the latest version this allows us during the day to have
>> queries and data changes. Then over night or at weekend depending on
>> frequency of updates to keep performance we "compact" it which at this time
>> we cannot have any selects but we can manage this.
>>
>> Here I guess you can replace our compaction/clean up with simple ctas
>> recreate strategy with your partition exchange strategy.
>>
>> I hope this helps. And would be better if we had native update / delete
>> support as per jira.
>>
>> Cheers
>> Mike
>>
>> Sent from my iPhone
>>
>> On 23 Sep 2016, at 09:47, mujtaba.qadri@gmail.com wrote:
>>
>> Hi Guys,
>>
>> I'm curious to know about any experience simulating updates/deletes to
>> rows using partition exchange strategy.
>>
>> Small lookup tables could be rebuilt by creating staging tables, but
>> since select queries might be running how do we swap out the new table with
>> the old one without select queries failing.
>>
>> Any simple way to do this, if anyone has implemented this workaround
>> please share insights.
>>
>> Regards,
>> Muji
>>
>>
>

Re: Handling updates and deletes in HAWQ

Posted by Ming Li <ml...@pivotal.io>.
Thanks for your contributions.

@mujtaba, If the new version rows are correlate to date, then we can create
partition table for emulating UPDATE, however most scenarios are not in
this case. Moreover the other problem is at present we have partition
performance problem if the number (segment node number*default virtual
segment number*sub partition number * file numbers for each subpartition
table) is too big, it will greatly slow down SELECT query because open such
many files in so many processes need a lot of time. Maybe we need to think
a way to speed up this case.

@Michael, thanks for sharing your workaround with the community. I don't
know the history why HAWQ doesn't support LOCK TABLE, but I think maybe it
is hard to control them in multiple node. Maybe we should open another mail
thread for more deep discussion separately.

Also I want to know the performance downgrade to select max version for
each row. Is there any test benchmark for UPDATE/DELETE



On Sun, Sep 25, 2016 at 7:33 AM, Ivan Weng <iw...@pivotal.io> wrote:

> Hi all,
>
> It looks like there are pressing needs for the update/delete support in the
> HAWQ and the good news is that this feature is already in the HAWQ roadmap.
> +1 for vote this JIRA to get higher priority, https://issues.
> apache.org/jira/plugins/servlet/mobile#issue/HAWQ-304
>
> cc  developers,
>
> For update/delete support, maybe we need to think about different scenarios
> and minimize the impact of select/insert operation. The scenarios includes
> single row update, few columns update for multiple rows, batch update most
> of the table, etc. Hopefully we'll have a quick prototype for it.
>
>
> Regards,
> Ivan
>
>
>
> On Sat, Sep 24, 2016 at 12:58 AM, Michael André Pearce <
> michael.andre.pearce@me.com> wrote:
>
> > The one we use for the versioning data and "compaction" is attached to
> the
> > jira I sent as a sample work around we shared, whilst we await true
> update
> > delete feature.
> >
> > Sent from my iPhone
> >
> > On 23 Sep 2016, at 17:17, mujtaba.qadri@gmail.com wrote:
> >
> > Thanks you for the inputs, anyone has a sample script that I can reuse?
> >
> > Regards,
> > Muji
> >
> > On Sep 23, 2016, at 11:27 AM, Luis Macedo <lm...@pivotal.io> wrote:
> >
> > Hi all,
> >
> > If you have a daily batch that updates the table you can rebuild the
> table
> > in a mirror table and then swap with "alter table ... exchange
> > partition...".
> >
> > The problem with this approach is that you will need to get an exclusive
> > lock on the table and any running query will block the operation.
> >
> > One possibility is to hide the underlying table below a view so the we
> can
> > try to minimize the lock problem. I am not sure if that will relive the
> > problem but its a fair try. Maybe someone can comment.
> >
> >
> > Rgds,
> >
> >
> > *Luis Macedo | Sr Platform Architect | **Pivotal Inc *
> >
> > *Mobile:* +55 11 97616-6438
> > *Pivotal.io <http://pivotal.io>*
> > *Take care of the customers and the rest takes care of itself*
> >
> > 2016-09-23 8:37 GMT-03:00 Michael André Pearce <
> > michael.andre.pearce@me.com>:
> >
> >> Ps. Please up vote the jira to help it get priority
> >>
> >> Sent from my iPhone
> >>
> >> On 23 Sep 2016, at 12:30, Michael André Pearce <
> >> michael.andre.pearce@me.com> wrote:
> >>
> >> Hi
> >>
> >> In my company we faced similar issue.
> >>
> >> I noted on this jira how we're handling this.
> >>
> >> https://issues.apache.org/jira/plugins/servlet/mobile#issue/HAWQ-304
> >>
> >> Essentially we can have updates as we version the rows and on select use
> >> the data of the latest version this allows us during the day to have
> >> queries and data changes. Then over night or at weekend depending on
> >> frequency of updates to keep performance we "compact" it which at this
> time
> >> we cannot have any selects but we can manage this.
> >>
> >> Here I guess you can replace our compaction/clean up with simple ctas
> >> recreate strategy with your partition exchange strategy.
> >>
> >> I hope this helps. And would be better if we had native update / delete
> >> support as per jira.
> >>
> >> Cheers
> >> Mike
> >>
> >> Sent from my iPhone
> >>
> >> On 23 Sep 2016, at 09:47, mujtaba.qadri@gmail.com wrote:
> >>
> >> Hi Guys,
> >>
> >> I'm curious to know about any experience simulating updates/deletes to
> >> rows using partition exchange strategy.
> >>
> >> Small lookup tables could be rebuilt by creating staging tables, but
> >> since select queries might be running how do we swap out the new table
> with
> >> the old one without select queries failing.
> >>
> >> Any simple way to do this, if anyone has implemented this workaround
> >> please share insights.
> >>
> >> Regards,
> >> Muji
> >>
> >>
> >
>

Re: Handling updates and deletes in HAWQ

Posted by Ming Li <ml...@pivotal.io>.
Thanks for your contributions.

@mujtaba, If the new version rows are correlate to date, then we can create
partition table for emulating UPDATE, however most scenarios are not in
this case. Moreover the other problem is at present we have partition
performance problem if the number (segment node number*default virtual
segment number*sub partition number * file numbers for each subpartition
table) is too big, it will greatly slow down SELECT query because open such
many files in so many processes need a lot of time. Maybe we need to think
a way to speed up this case.

@Michael, thanks for sharing your workaround with the community. I don't
know the history why HAWQ doesn't support LOCK TABLE, but I think maybe it
is hard to control them in multiple node. Maybe we should open another mail
thread for more deep discussion separately.

Also I want to know the performance downgrade to select max version for
each row. Is there any test benchmark for UPDATE/DELETE



On Sun, Sep 25, 2016 at 7:33 AM, Ivan Weng <iw...@pivotal.io> wrote:

> Hi all,
>
> It looks like there are pressing needs for the update/delete support in the
> HAWQ and the good news is that this feature is already in the HAWQ roadmap.
> +1 for vote this JIRA to get higher priority, https://issues.
> apache.org/jira/plugins/servlet/mobile#issue/HAWQ-304
>
> cc  developers,
>
> For update/delete support, maybe we need to think about different scenarios
> and minimize the impact of select/insert operation. The scenarios includes
> single row update, few columns update for multiple rows, batch update most
> of the table, etc. Hopefully we'll have a quick prototype for it.
>
>
> Regards,
> Ivan
>
>
>
> On Sat, Sep 24, 2016 at 12:58 AM, Michael André Pearce <
> michael.andre.pearce@me.com> wrote:
>
> > The one we use for the versioning data and "compaction" is attached to
> the
> > jira I sent as a sample work around we shared, whilst we await true
> update
> > delete feature.
> >
> > Sent from my iPhone
> >
> > On 23 Sep 2016, at 17:17, mujtaba.qadri@gmail.com wrote:
> >
> > Thanks you for the inputs, anyone has a sample script that I can reuse?
> >
> > Regards,
> > Muji
> >
> > On Sep 23, 2016, at 11:27 AM, Luis Macedo <lm...@pivotal.io> wrote:
> >
> > Hi all,
> >
> > If you have a daily batch that updates the table you can rebuild the
> table
> > in a mirror table and then swap with "alter table ... exchange
> > partition...".
> >
> > The problem with this approach is that you will need to get an exclusive
> > lock on the table and any running query will block the operation.
> >
> > One possibility is to hide the underlying table below a view so the we
> can
> > try to minimize the lock problem. I am not sure if that will relive the
> > problem but its a fair try. Maybe someone can comment.
> >
> >
> > Rgds,
> >
> >
> > *Luis Macedo | Sr Platform Architect | **Pivotal Inc *
> >
> > *Mobile:* +55 11 97616-6438
> > *Pivotal.io <http://pivotal.io>*
> > *Take care of the customers and the rest takes care of itself*
> >
> > 2016-09-23 8:37 GMT-03:00 Michael André Pearce <
> > michael.andre.pearce@me.com>:
> >
> >> Ps. Please up vote the jira to help it get priority
> >>
> >> Sent from my iPhone
> >>
> >> On 23 Sep 2016, at 12:30, Michael André Pearce <
> >> michael.andre.pearce@me.com> wrote:
> >>
> >> Hi
> >>
> >> In my company we faced similar issue.
> >>
> >> I noted on this jira how we're handling this.
> >>
> >> https://issues.apache.org/jira/plugins/servlet/mobile#issue/HAWQ-304
> >>
> >> Essentially we can have updates as we version the rows and on select use
> >> the data of the latest version this allows us during the day to have
> >> queries and data changes. Then over night or at weekend depending on
> >> frequency of updates to keep performance we "compact" it which at this
> time
> >> we cannot have any selects but we can manage this.
> >>
> >> Here I guess you can replace our compaction/clean up with simple ctas
> >> recreate strategy with your partition exchange strategy.
> >>
> >> I hope this helps. And would be better if we had native update / delete
> >> support as per jira.
> >>
> >> Cheers
> >> Mike
> >>
> >> Sent from my iPhone
> >>
> >> On 23 Sep 2016, at 09:47, mujtaba.qadri@gmail.com wrote:
> >>
> >> Hi Guys,
> >>
> >> I'm curious to know about any experience simulating updates/deletes to
> >> rows using partition exchange strategy.
> >>
> >> Small lookup tables could be rebuilt by creating staging tables, but
> >> since select queries might be running how do we swap out the new table
> with
> >> the old one without select queries failing.
> >>
> >> Any simple way to do this, if anyone has implemented this workaround
> >> please share insights.
> >>
> >> Regards,
> >> Muji
> >>
> >>
> >
>