Posted to user@phoenix.apache.org by James Taylor <ja...@apache.org> on 2017/12/01 07:14:42 UTC

Re: Help: setting hbase row timestamp in phoenix upserts ?

The only way I can think of accomplishing this is by using the raw HBase
APIs to write the data, but using our utilities to write it in a
Phoenix-compatible manner. For example, you could run an UPSERT VALUES
statement, use the PhoenixRuntime.getUncommittedDataIterator() method to get
the Cells that would have been written, update the Cell timestamp as needed,
and do an htable.batch() call to commit them.
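
As a rough illustration of why re-stamping the cells gives the behaviour asked for, here is a plain-Java simulation. It does not call Phoenix or HBase: the `Cell`, `Store`, and `restamp` names are hypothetical stand-ins, and the store only mimics the idea that, with VERSIONS=1, the cell with the highest timestamp is the one that stays visible. A sketch under those assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java simulation; none of these types are Phoenix or HBase classes. */
final class TimestampedStoreSketch {

    /** Hypothetical stand-in for an HBase cell: row key, column, timestamp, value. */
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Cell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Returns copies of the cells carrying the given timestamp (the "update the Cell timestamp" step). */
    static List<Cell> restamp(List<Cell> cells, long eventTimestamp) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.row, c.column, eventTimestamp, c.value));
        }
        return out;
    }

    /** Simulates a VERSIONS=1 column: only the cell with the highest timestamp stays visible. */
    static final class Store {
        private final Map<String, Cell> latest = new HashMap<>();

        void put(Cell cell) {
            String key = cell.row + "/" + cell.column;
            Cell current = latest.get(key);
            // A write with an older timestamp is shadowed by the newer cell already present.
            // Ties go to the later write in this sketch.
            if (current == null || cell.timestamp >= current.timestamp) {
                latest.put(key, cell);
            }
        }

        void putAll(List<Cell> cells) {
            for (Cell c : cells) {
                put(c);
            }
        }

        String get(String row, String column) {
            Cell c = latest.get(row + "/" + column);
            return c == null ? null : c.value;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        // Cells as produced at write time (999), then re-stamped with the CDC event time.
        List<Cell> newer = restamp(List.of(new Cell("a", "v", 999L, "2")), 200L);
        List<Cell> older = restamp(List.of(new Cell("a", "v", 999L, "1")), 100L);

        store.putAll(newer); // the later change arrives first
        store.putAll(older); // the earlier change arrives late
        store.putAll(newer); // at-least-once delivery: a duplicate

        System.out.println(store.get("a", "v")); // prints 2
    }
}
```

In a real pipeline the cells would come from the uncommitted-data iterator and be written with a batch call, as described above; the simulation only shows that out-of-order and duplicated events converge on the latest change once each write carries its event timestamp.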

On Wed, Nov 29, 2017 at 11:46 AM Pedro Boado <pe...@gmail.com> wrote:

> Hi,
>
> I'm looking for a little help understanding ROW_TIMESTAMP.
>
> Some background on the problem (simplified): I'm working on a project
> that needs to create an "enriched" replica of an RDBMS table based on a
> stream of CDC changes from that table.
>
> Each CDC event contains the timestamp of the change plus all the column
> values 'before' and 'after' the change, and each event is pushed to a
> Kafka topic. Because of certain "non-negotiable" design decisions, Kafka
> guarantees delivering each event at least once, but doesn't guarantee
> ordering for changes to the same row in the source table.
>
> The final step of the kafka-based flow is sinking the information into
> HBase/Phoenix.
>
> As I cannot get an in-order delivery guarantee from Kafka, I need to use
> the CDC event timestamp to ensure that HBase keeps the latest change to a
> row.
>
> This fits perfectly well with an HBase table design with VERSIONS=1 that
> uses the source event timestamp as the HBase row/cell timestamp.
>
> The thing is that I cannot find a way to set the timestamp of the HBase
> cell from a Phoenix upsert.
>
> I came across the ROW_TIMESTAMP functionality, but I've just found (I'm
> devastated now) that ROW_TIMESTAMP columns store the date in both
> HBase's cell timestamp and the primary key, meaning that I cannot
> leverage that functionality to keep only the latest change.
>
> Is there a way of defining HBase's row timestamp when doing the UPSERT,
> even by setting it through some obscure hidden JDBC property?
>
> I want to avoid doing a checkAndPut by all means, as the volume of changes
> is going to be quite big.
>
>
>
> --
> Un saludo.
> Pedro Boado.
>

Re: Help: setting hbase row timestamp in phoenix upserts ?

Posted by Pedro Boado <pe...@gmail.com>.

I hadn't seen this JIRA. Yes, that is essentially it.

On Wed, 11 Jul 2018, 15:49 James Taylor, <ja...@apache.org> wrote:

> I think the answer is PHOENIX-4552. There's an outline of the work involved
> on the JIRA. I think passing through data like that for hints would get
> unwieldy quickly.

Re: Help: setting hbase row timestamp in phoenix upserts ?

Posted by James Taylor <ja...@apache.org>.

I think the answer is PHOENIX-4552. There's an outline of the work involved
on the JIRA. I think passing through data like that for hints would get
unwieldy quickly.

On Tue, Jul 10, 2018 at 1:31 PM, Pedro Boado <pe...@gmail.com> wrote:

> Hi guys, just a refloat from the @user list.
>
> Would it be of interest to have a way of defining HBase timestamps on a
> per-row basis as part of an UPSERT VALUES?
>
> For a table defined as
> CREATE TABLE T0001 ( k VARCHAR PRIMARY KEY, v INTEGER)
>
> Allow a hint to extract and override the HBase put timestamp through a
> "virtual" column?
> UPSERT /*+ ROW_TIMESTAMP(ts) */ INTO T0001(k,v,ts) VALUES
> ('a',1, 1531253959043)
>
> If the column existed and had an appropriate type, it would also be
> populated with the same value.
>
> Thanks,
> Pedro.

Re: Help: setting hbase row timestamp in phoenix upserts ?

Posted by Pedro Boado <pe...@gmail.com>.

Hi guys, just a refloat from the @user list.

Would it be of interest to have a way of defining HBase timestamps on a
per-row basis as part of an UPSERT VALUES?

For a table defined as
CREATE TABLE T0001 ( k VARCHAR PRIMARY KEY, v INTEGER)

Allow a hint to extract and override the HBase put timestamp through a
"virtual" column?
UPSERT /*+ ROW_TIMESTAMP(ts) */ INTO T0001(k,v,ts) VALUES
('a',1, 1531253959043)

If the column existed and had an appropriate type, it would also be populated
with the same value.

Thanks,
Pedro.


On Fri, 1 Dec 2017 at 07:15, James Taylor <ja...@apache.org> wrote:

> The only way I can think of accomplishing this is by using the raw HBase
> APIs to write the data, but using our utilities to write it in a
> Phoenix-compatible manner. For example, you could run an UPSERT VALUES
> statement, use the PhoenixRuntime.getUncommittedDataIterator() method to
> get the Cells that would have been written, update the Cell timestamp as
> needed, and do an htable.batch() call to commit them.

-- 
Un saludo.
Pedro Boado.