Posted to user@phoenix.apache.org by Yang Zhang <zh...@gmail.com> on 2016/10/13 10:12:56 UTC

Can phoenix support HBase's TimeStamp?

Hello everyone

I saw that we can create a Phoenix table from an existing HBase table (for
details, see
<https://phoenix.apache.org/faq.html#How_I_map_Phoenix_table_to_an_existing_HBase_table>).
My question is whether Phoenix can support the historical versions of my rows.

I am trying to use Phoenix to store some data in which many columns hold
common values.
For example, in a table "T1 ( c1, c2, c3, c4 )", many rows share the same
c1, c2, c3, and the only column that varies is c4.
Using HBase we can run: put 'T1', 'key1', 'f:c4', 'new value', timestamp

I can then get previous versions of this row. They all share the same c1, c2, c3,
which HBase stores only once.
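
To make the HBase behaviour described above concrete, here is a minimal in-memory sketch in Java. It is not HBase client code: the class VersionedRow and its methods are hypothetical names that only model the idea that each column keeps its own timestamped versions, so writing a new c4 adds a version without rewriting c1, c2, c3. In real HBase, how many versions are retained depends on the column family's VERSIONS setting.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a single HBase row; not the HBase client API.
final class VersionedRow {
    // column name -> (timestamp -> value), ordered newest timestamp first
    private final Map<String, NavigableMap<Long, String>> cells = new HashMap<>();

    // Models: put 'T1', 'key1', 'f:c4', 'new value', timestamp
    void put(String column, long timestamp, String value) {
        cells.computeIfAbsent(column,
                c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
             .put(timestamp, value);
    }

    // Latest value of a column, or null if the column was never written.
    String get(String column) {
        NavigableMap<Long, String> versions = cells.get(column);
        return (versions == null || versions.isEmpty())
                ? null : versions.firstEntry().getValue();
    }

    // Value of a column as of a timestamp: the newest version with ts <= asOf.
    String getAsOf(String column, long asOf) {
        NavigableMap<Long, String> versions = cells.get(column);
        if (versions == null) {
            return null;
        }
        // The map is in descending order, so ceilingEntry(asOf) is the
        // entry with the largest timestamp that does not exceed asOf.
        Map.Entry<Long, String> e = versions.ceilingEntry(asOf);
        return e == null ? null : e.getValue();
    }

    // Number of versions currently held for a column.
    int versionCount(String column) {
        NavigableMap<Long, String> versions = cells.get(column);
        return versions == null ? 0 : versions.size();
    }
}
```

In this model, writing c1, c2, c3 once and c4 twice leaves one version each for the shared columns and two versions for c4, which is the storage saving the question is about.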

Does Phoenix support querying historical versions of my row?

I found this JIRA issue <https://issues.apache.org/jira/browse/PHOENIX-590>,
which is the same as my question.

Hadoop is used for big data, and multiple versions can help us reduce
unnecessary data.
I think Phoenix should support this feature too.

If Phoenix shouldn't support multiple versions, please tell me the reason.

Anyway, thanks in advance for your help.

Re:Re: Can phoenix support HBase's TimeStamp?

Posted by William <yh...@163.com>.

Agreed. It is really difficult to support per-cell timestamp control and multi-version queries in a standard and general way. What Phoenix supports today is a well-balanced compromise.

If per-cell timestamp control and multi-version queries are inescapable requirements, you can try my solution. Otherwise, I strongly recommend using the existing features that James described.

Thanks,
William


Re: Can phoenix support HBase's TimeStamp?

Posted by James Taylor <ja...@apache.org>.

FYI, a couple of timestamp-related features that Phoenix supports today
include:
- specify/filter on timestamp of a row:
http://phoenix.apache.org/rowtimestamp.html
- query as of a past timestamp:
http://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API

These were determined to be a good fit with SQL and surface some of the
power of HBase. Exposing per-cell timestamp control and multi-version
queries is difficult in the SQL model, but we're open to suggestions if it
can be done in a standard, general way.
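
As a rough illustration of the two features above, the following Java sketch shows a ROW_TIMESTAMP table definition and a connection opened with the CurrentSCN property, as described on the two linked pages. The table names, column names, and the JDBC URL passed in by the caller are assumptions made for the example, and actually running printAsOf requires the Phoenix JDBC driver on the classpath and a reachable cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Sketch only: table/column names and the JDBC URL are illustrative assumptions.
final class AsOfQuerySketch {
    // Feature 1: map a DATE primary-key column onto the HBase cell timestamp.
    static final String ROW_TIMESTAMP_DDL =
        "CREATE TABLE IF NOT EXISTS METRICS ("
      + " CREATED DATE NOT NULL, METRIC_ID VARCHAR NOT NULL, METRIC_VALUE BIGINT"
      + " CONSTRAINT PK PRIMARY KEY (CREATED ROW_TIMESTAMP, METRIC_ID))";

    // Feature 2: connection properties that pin the connection to a past timestamp.
    static Properties asOf(long timestampMillis) {
        Properties props = new Properties();
        props.setProperty("CurrentSCN", Long.toString(timestampMillis));
        return props;
    }

    // Runs a query that sees the table as of the given timestamp.
    // jdbcUrl would look like "jdbc:phoenix:<zookeeper quorum>" (assumption).
    static void printAsOf(String jdbcUrl, long timestampMillis) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, asOf(timestampMillis));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT K, C4 FROM T1")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getString(2));
            }
        }
    }
}
```

Note that this gives one snapshot of the whole row per connection. It does not return several versions of the same cell in one query, which is the gap the original question is about.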

Thanks,
James


Re:Can phoenix support HBase's TimeStamp?

Posted by William <yh...@163.com>.

Hi, Zhang Yang,
   I've implemented the multi-version feature in my own Phoenix branch, but the implementation only works in very limited scenarios, because there were so many things to consider when designing it. Here are the main problems that we must solve:
   * Add new syntax to support selecting by timestamp. We should support selecting a single version, multiple versions within a time range, and a limited number of versions. For example:
     select * from test timestamps min, max;     // select all versions within the specified time range
     select * from test timestamps ts;           // select a specified version
     select * from test version number;          // select specified number of versions
     select * from test version number timestamps min, max; // select specified number of versions within a specified time range
     Note that this is not standard SQL syntax, so it is not recommended.
   * Timestamp is a cell-level property in HBase, so we should support the same thing in Phoenix. But how can we allow different timestamps for different columns in the same row? I modified the ResultSet class and added methods like 'public Map<Long, T> getAllT(index)' that return all selected versions of a single column. One can call such a method on different columns of the same row to retrieve everything needed. Users must use PhoenixResultSet instead of ResultSet, which is not recommended either.
   * How do we handle index updates/selects for multiple versions? This is a messy problem, so my implementation does not support multi-version for index tables.
   * GROUP BY, ORDER BY, and any nested query/upsert are not supported.
   * For batch commits, when you upsert the same row with different timestamps, Phoenix can only commit the last timestamp you set, so doing this is meaningless. I simply forbid this scenario.
   * Phoenix encodes the KVs into one Cell on the RS (region server) side, but if we want to return multiple versions for different columns, especially different timestamps for different columns, we must not do this encoding. So we must modify the internals of Phoenix to support a brand-new read path.
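
The selection rules behind the proposed "timestamps min, max" and "version number" clauses, together with the Map<Long, T> shape returned by getAllT, can be sketched in plain Java as follows. This is only a model of the semantics, not code from the branch: VersionSelector and select are hypothetical names, and treating the time range as half-open (min inclusive, max exclusive, as HBase's own time ranges are) is an assumption, since the thread does not say.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical model of multi-version selection for one column; not Phoenix code.
final class VersionSelector {
    /**
     * Returns at most maxVersions versions whose timestamp lies in
     * [minTs, maxTs), newest first, keyed by timestamp.
     * The input map holds all stored versions in ascending timestamp order.
     * Requires minTs <= maxTs (subMap rejects an inverted range).
     */
    static <T> Map<Long, T> select(NavigableMap<Long, T> versions,
                                   long minTs, long maxTs, int maxVersions) {
        // LinkedHashMap keeps the newest-first iteration order for the caller.
        Map<Long, T> selected = new LinkedHashMap<>();
        NavigableMap<Long, T> inRange =
                versions.subMap(minTs, true, maxTs, false).descendingMap();
        for (Map.Entry<Long, T> e : inRange.entrySet()) {
            if (selected.size() >= maxVersions) {
                break;
            }
            selected.put(e.getKey(), e.getValue());
        }
        return selected;
    }
}
```

Calling select once per column models how a caller would use getAllT on different columns of the same row, with each column free to return a different set of timestamps.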


   Besides the huge implementation effort, IMHO the primary problem is that it's not easy to implement this feature properly, as each user may have different requirements. You can implement this feature in your own branch, but I don't know the best way to support it in an official Phoenix release. What do you think? Any suggested design?


  Thanks.
  William.

