You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by "Clay Baenziger (BLOOMBERG/ 731 LEX)" <cb...@bloomberg.net> on 2018/12/21 20:18:26 UTC

Query All Dynamic Columns

Hello,

A user of mine brought up a question around dynamic columns in Phoenix today. The quantity of columns should become asymptotic to a few tends of thousands of columns as their data fills in.

The user want to query all columns in a table and they are today thinking of using views to do this -- but it is ugly management. They have an unbounded number of views -- which will pollute the global catalog and fail relatively quickly.

Has anyone thought about the potentially wasteful[1] approach of scanning all rows in a query to determine columns and then re-running the query for the rows once we know what columns the SQL result will contain. Maybe something cleaner like persisting the set of columns in the statistics table and a SELECT * may return columns with nothing but nulls. Or, even better is there an overall better way to model such a wide schema in Phoenix?

-Clay

[1]: Perhaps some heuristics could allow for not needing to do 2n reads in all cases?

Re: Query All Dynamic Columns

Posted by Thomas D'Silva <td...@salesforce.com>.

With splittable system catalog you should be able to create views without
seeing performance issues.
Chinmay is working on enabling  running a select query to return the
dynamic column values without specifying the dynamic column names and types
ahead of times.
(see https://issues.apache.org/jira/browse/PHOENIX-374). Please take a look
at the JIRA and see if it meets you use case.


On Wed, Dec 26, 2018 at 1:55 PM James Taylor <ja...@apache.org> wrote:

> Persisting dynamic column names+types in Phoenix is exactly what views are
> for.
>
>
> On Wed, Dec 26, 2018 at 12:05 PM Vincent Poon <vi...@apache.org>
> wrote:
>
>> A lot of work is currently going into handling large numbers of views -
>> splittable syscat, view management, etc... but agree that it's not ideal.
>>
>> There's currently no built-in way to do what you want AFAIK, but you can
>> manage the columns yourself in a separate table:
>> - store them all in a single column value, and read that value before
>> doing your query.  HBase checkAndMutate for locking.
>> or
>> - store each column as separate rows.  Then you can do things like filter
>> by column name efficiently.
>> You could 'soft delete' by removing the entries.
>>
>> Would be a nice improvement to have an option to persist dynamic column
>> names+types in Phoenix.
>>
>> On Fri, Dec 21, 2018 at 12:18 PM Clay Baenziger (BLOOMBERG/ 731 LEX) <
>> cbaenziger@bloomberg.net> wrote:
>>
>>> Hello,
>>>
>>> A user of mine brought up a question around dynamic columns in Phoenix
>>> today. The quantity of columns should become asymptotic to a few tends of
>>> thousands of columns as their data fills in.
>>>
>>> The user want to query all columns in a table and they are today
>>> thinking of using views to do this -- but it is ugly management. They have
>>> an unbounded number of views -- which will pollute the global catalog and
>>> fail relatively quickly.
>>>
>>> Has anyone thought about the potentially wasteful[1] approach of
>>> scanning all rows in a query to determine columns and then re-running the
>>> query for the rows once we know what columns the SQL result will contain.
>>> Maybe something cleaner like persisting the set of columns in the
>>> statistics table and a SELECT * may return columns with nothing but nulls.
>>> Or, even better is there an overall better way to model such a wide schema
>>> in Phoenix?
>>>
>>> -Clay
>>>
>>> [1]: Perhaps some heuristics could allow for not needing to do 2n reads
>>> in all cases?
>>>
>>

Re: Query All Dynamic Columns

Posted by James Taylor <ja...@apache.org>.

Persisting dynamic column names+types in Phoenix is exactly what views are
for.


On Wed, Dec 26, 2018 at 12:05 PM Vincent Poon <vi...@apache.org>
wrote:

> A lot of work is currently going into handling large numbers of views -
> splittable syscat, view management, etc... but agree that it's not ideal.
>
> There's currently no built-in way to do what you want AFAIK, but you can
> manage the columns yourself in a separate table:
> - store them all in a single column value, and read that value before
> doing your query.  HBase checkAndMutate for locking.
> or
> - store each column as separate rows.  Then you can do things like filter
> by column name efficiently.
> You could 'soft delete' by removing the entries.
>
> Would be a nice improvement to have an option to persist dynamic column
> names+types in Phoenix.
>
> On Fri, Dec 21, 2018 at 12:18 PM Clay Baenziger (BLOOMBERG/ 731 LEX) <
> cbaenziger@bloomberg.net> wrote:
>
>> Hello,
>>
>> A user of mine brought up a question around dynamic columns in Phoenix
>> today. The quantity of columns should become asymptotic to a few tends of
>> thousands of columns as their data fills in.
>>
>> The user want to query all columns in a table and they are today thinking
>> of using views to do this -- but it is ugly management. They have an
>> unbounded number of views -- which will pollute the global catalog and fail
>> relatively quickly.
>>
>> Has anyone thought about the potentially wasteful[1] approach of scanning
>> all rows in a query to determine columns and then re-running the query for
>> the rows once we know what columns the SQL result will contain. Maybe
>> something cleaner like persisting the set of columns in the statistics
>> table and a SELECT * may return columns with nothing but nulls. Or, even
>> better is there an overall better way to model such a wide schema in
>> Phoenix?
>>
>> -Clay
>>
>> [1]: Perhaps some heuristics could allow for not needing to do 2n reads
>> in all cases?
>>
>

Re: Query All Dynamic Columns

Posted by Vincent Poon <vi...@apache.org>.

A lot of work is currently going into handling large numbers of views -
splittable syscat, view management, etc... but agree that it's not ideal.

There's currently no built-in way to do what you want AFAIK, but you can
manage the columns yourself in a separate table:
- store them all in a single column value, and read that value before doing
your query.  HBase checkAndMutate for locking.
or
- store each column as separate rows.  Then you can do things like filter
by column name efficiently.
You could 'soft delete' by removing the entries.

Would be a nice improvement to have an option to persist dynamic column
names+types in Phoenix.

On Fri, Dec 21, 2018 at 12:18 PM Clay Baenziger (BLOOMBERG/ 731 LEX) <
cbaenziger@bloomberg.net> wrote:

> Hello,
>
> A user of mine brought up a question around dynamic columns in Phoenix
> today. The quantity of columns should become asymptotic to a few tends of
> thousands of columns as their data fills in.
>
> The user want to query all columns in a table and they are today thinking
> of using views to do this -- but it is ugly management. They have an
> unbounded number of views -- which will pollute the global catalog and fail
> relatively quickly.
>
> Has anyone thought about the potentially wasteful[1] approach of scanning
> all rows in a query to determine columns and then re-running the query for
> the rows once we know what columns the SQL result will contain. Maybe
> something cleaner like persisting the set of columns in the statistics
> table and a SELECT * may return columns with nothing but nulls. Or, even
> better is there an overall better way to model such a wide schema in
> Phoenix?
>
> -Clay
>
> [1]: Perhaps some heuristics could allow for not needing to do 2n reads in
> all cases?
>