You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by Kumar Palaniappan <kp...@marinsoftware.com> on 2016/01/05 23:51:47 UTC

array of BIGINT index

We have a table with a data type BIGINT[], Since phoenix doesnt support to
index this data type, our queries are doing a full table scan when we have
to do filtering on this field.

What are the alternate approaches? Tried looking into Views, but nope.

Appreciate your time.

Kumar

Re: array of BIGINT index

Posted by James Taylor <ja...@apache.org>.

Sounds good, Kumar. I commented here[1] on PHOENIX-544 to help get you
started.

Thanks,
James

[1]
https://issues.apache.org/jira/browse/PHOENIX-1544?focusedCommentId=15086973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15086973

On Wed, Jan 6, 2016 at 9:57 PM, Kumar Palaniappan <
kpalaniappan@marinsoftware.com> wrote:

> Thanks James. We will look into it. we need to find a way to overcome the
> full scan.
>
> On Wed, Jan 6, 2016 at 9:26 AM, James Taylor <ja...@apache.org>
> wrote:
>
>> In that case, you'd need PHOENIX-1544 to be implemented.
>>
>> On Wed, Jan 6, 2016 at 8:52 AM, Kumar Palaniappan <
>> kpalaniappan@marinsoftware.com> wrote:
>>
>>> Unfortunately changing the table is not an option for us at this time.
>>>
>>> On Tue, Jan 5, 2016 at 6:27 PM, James Taylor <ja...@apache.org>
>>> wrote:
>>>
>>>> If the "finding customers that have a particular account" is a common
>>>> query, you might consider modifying your schema by pulling the account into
>>>> an optional/nullable row key column, like this:
>>>>
>>>> CREATE TABLE T (CID VARCHAR NOT NULL, AID BIGINT, V1 VARCHAR, V2 VARCHAR
>>>>     CONSTRAINT pk PRIMARY KEY (CID,AID));
>>>>
>>>> Your non PK columns (V1 and V2 in this example) would only be set on
>>>> the row where AID is null, but you'd have new rows for all accounts for a
>>>> given customer, and these rows wouldn't have any other column values.
>>>>
>>>> Then you could create a secondary index on AID:
>>>> CREATE INDEX IDX ON T(AID);
>>>>
>>>> and you'd be able to find all customers for a given account quickly.
>>>>
>>>> You could still efficiently iterate over the account of a given
>>>> customer too:
>>>>     SELECT * FROM T WHERE CID=?
>>>> but your application would need to know that the first row would be the
>>>> customer row and the next rows would contain only the account IDs for that
>>>> customer.
>>>>
>>>> On Tue, Jan 5, 2016 at 3:46 PM, Kumar Palaniappan <
>>>> kpalaniappan@marinsoftware.com> wrote:
>>>>
>>>>> Thanks James for the response. Our use case is, that array holds all
>>>>> the accounts for a particular customer. so the  table and query is
>>>>>
>>>>> CREATE TABLE T ( ID VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>>>
>>>>> find by account is a use case -
>>>>>
>>>>> select  ID from table T where ANY (A);
>>>>>
>>>>> On Tue, Jan 5, 2016 at 3:34 PM, James Taylor <ja...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> There is some limited indexing you can do on an array by creating a
>>>>>> functional index for a particular array element. For example:
>>>>>>     CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>>>>     CREATE INDEX IDX ON T (A[3]);
>>>>>>
>>>>>> In this case, the following query would use the index:
>>>>>>     SELECT K FROM T WHERE A[3] = 5;
>>>>>>
>>>>>> Does this help for your usage?
>>>>>>
>>>>>> Thanks,
>>>>>> James
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
>>>>>> kpalaniappan@marinsoftware.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We have a table with a data type BIGINT[], Since phoenix doesnt
>>>>>>> support to index this data type, our queries are doing a full table scan
>>>>>>> when we have to do filtering on this field.
>>>>>>>
>>>>>>> What are the alternate approaches? Tried looking into Views, but
>>>>>>> nope.
>>>>>>>
>>>>>>> Appreciate your time.
>>>>>>>
>>>>>>> Kumar
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: array of BIGINT index

Posted by Kumar Palaniappan <kp...@marinsoftware.com>.

Thanks James. We will look into it. we need to find a way to overcome the
full scan.

On Wed, Jan 6, 2016 at 9:26 AM, James Taylor <ja...@apache.org> wrote:

> In that case, you'd need PHOENIX-1544 to be implemented.
>
> On Wed, Jan 6, 2016 at 8:52 AM, Kumar Palaniappan <
> kpalaniappan@marinsoftware.com> wrote:
>
>> Unfortunately changing the table is not an option for us at this time.
>>
>> On Tue, Jan 5, 2016 at 6:27 PM, James Taylor <ja...@apache.org>
>> wrote:
>>
>>> If the "finding customers that have a particular account" is a common
>>> query, you might consider modifying your schema by pulling the account into
>>> an optional/nullable row key column, like this:
>>>
>>> CREATE TABLE T (CID VARCHAR NOT NULL, AID BIGINT, V1 VARCHAR, V2 VARCHAR
>>>     CONSTRAINT pk PRIMARY KEY (CID,AID));
>>>
>>> Your non PK columns (V1 and V2 in this example) would only be set on the
>>> row where AID is null, but you'd have new rows for all accounts for a given
>>> customer, and these rows wouldn't have any other column values.
>>>
>>> Then you could create a secondary index on AID:
>>> CREATE INDEX IDX ON T(AID);
>>>
>>> and you'd be able to find all customers for a given account quickly.
>>>
>>> You could still efficiently iterate over the account of a given customer
>>> too:
>>>     SELECT * FROM T WHERE CID=?
>>> but your application would need to know that the first row would be the
>>> customer row and the next rows would contain only the account IDs for that
>>> customer.
>>>
>>> On Tue, Jan 5, 2016 at 3:46 PM, Kumar Palaniappan <
>>> kpalaniappan@marinsoftware.com> wrote:
>>>
>>>> Thanks James for the response. Our use case is, that array holds all
>>>> the accounts for a particular customer. so the  table and query is
>>>>
>>>> CREATE TABLE T ( ID VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>>
>>>> find by account is a use case -
>>>>
>>>> select  ID from table T where ANY (A);
>>>>
>>>> On Tue, Jan 5, 2016 at 3:34 PM, James Taylor <ja...@apache.org>
>>>> wrote:
>>>>
>>>>> There is some limited indexing you can do on an array by creating a
>>>>> functional index for a particular array element. For example:
>>>>>     CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>>>     CREATE INDEX IDX ON T (A[3]);
>>>>>
>>>>> In this case, the following query would use the index:
>>>>>     SELECT K FROM T WHERE A[3] = 5;
>>>>>
>>>>> Does this help for your usage?
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
>>>>> kpalaniappan@marinsoftware.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> We have a table with a data type BIGINT[], Since phoenix doesnt
>>>>>> support to index this data type, our queries are doing a full table scan
>>>>>> when we have to do filtering on this field.
>>>>>>
>>>>>> What are the alternate approaches? Tried looking into Views, but nope.
>>>>>>
>>>>>> Appreciate your time.
>>>>>>
>>>>>> Kumar
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: array of BIGINT index

Posted by James Taylor <ja...@apache.org>.

In that case, you'd need PHOENIX-1544 to be implemented.

On Wed, Jan 6, 2016 at 8:52 AM, Kumar Palaniappan <
kpalaniappan@marinsoftware.com> wrote:

> Unfortunately changing the table is not an option for us at this time.
>
> On Tue, Jan 5, 2016 at 6:27 PM, James Taylor <ja...@apache.org>
> wrote:
>
>> If the "finding customers that have a particular account" is a common
>> query, you might consider modifying your schema by pulling the account into
>> an optional/nullable row key column, like this:
>>
>> CREATE TABLE T (CID VARCHAR NOT NULL, AID BIGINT, V1 VARCHAR, V2 VARCHAR
>>     CONSTRAINT pk PRIMARY KEY (CID,AID));
>>
>> Your non PK columns (V1 and V2 in this example) would only be set on the
>> row where AID is null, but you'd have new rows for all accounts for a given
>> customer, and these rows wouldn't have any other column values.
>>
>> Then you could create a secondary index on AID:
>> CREATE INDEX IDX ON T(AID);
>>
>> and you'd be able to find all customers for a given account quickly.
>>
>> You could still efficiently iterate over the account of a given customer
>> too:
>>     SELECT * FROM T WHERE CID=?
>> but your application would need to know that the first row would be the
>> customer row and the next rows would contain only the account IDs for that
>> customer.
>>
>> On Tue, Jan 5, 2016 at 3:46 PM, Kumar Palaniappan <
>> kpalaniappan@marinsoftware.com> wrote:
>>
>>> Thanks James for the response. Our use case is, that array holds all the
>>> accounts for a particular customer. so the  table and query is
>>>
>>> CREATE TABLE T ( ID VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>
>>> find by account is a use case -
>>>
>>> select  ID from table T where ANY (A);
>>>
>>> On Tue, Jan 5, 2016 at 3:34 PM, James Taylor <ja...@apache.org>
>>> wrote:
>>>
>>>> There is some limited indexing you can do on an array by creating a
>>>> functional index for a particular array element. For example:
>>>>     CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>>     CREATE INDEX IDX ON T (A[3]);
>>>>
>>>> In this case, the following query would use the index:
>>>>     SELECT K FROM T WHERE A[3] = 5;
>>>>
>>>> Does this help for your usage?
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>> On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
>>>> kpalaniappan@marinsoftware.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> We have a table with a data type BIGINT[], Since phoenix doesnt
>>>>> support to index this data type, our queries are doing a full table scan
>>>>> when we have to do filtering on this field.
>>>>>
>>>>> What are the alternate approaches? Tried looking into Views, but nope.
>>>>>
>>>>> Appreciate your time.
>>>>>
>>>>> Kumar
>>>>>
>>>>
>>>>
>>>
>>
>

Re: array of BIGINT index

Posted by Kumar Palaniappan <kp...@marinsoftware.com>.

Unfortunately changing the table is not an option for us at this time.

On Tue, Jan 5, 2016 at 6:27 PM, James Taylor <ja...@apache.org> wrote:

> If the "finding customers that have a particular account" is a common
> query, you might consider modifying your schema by pulling the account into
> an optional/nullable row key column, like this:
>
> CREATE TABLE T (CID VARCHAR NOT NULL, AID BIGINT, V1 VARCHAR, V2 VARCHAR
>     CONSTRAINT pk PRIMARY KEY (CID,AID));
>
> Your non PK columns (V1 and V2 in this example) would only be set on the
> row where AID is null, but you'd have new rows for all accounts for a given
> customer, and these rows wouldn't have any other column values.
>
> Then you could create a secondary index on AID:
> CREATE INDEX IDX ON T(AID);
>
> and you'd be able to find all customers for a given account quickly.
>
> You could still efficiently iterate over the account of a given customer
> too:
>     SELECT * FROM T WHERE CID=?
> but your application would need to know that the first row would be the
> customer row and the next rows would contain only the account IDs for that
> customer.
>
> On Tue, Jan 5, 2016 at 3:46 PM, Kumar Palaniappan <
> kpalaniappan@marinsoftware.com> wrote:
>
>> Thanks James for the response. Our use case is, that array holds all the
>> accounts for a particular customer. so the  table and query is
>>
>> CREATE TABLE T ( ID VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>
>> find by account is a use case -
>>
>> select  ID from table T where ANY (A);
>>
>> On Tue, Jan 5, 2016 at 3:34 PM, James Taylor <ja...@apache.org>
>> wrote:
>>
>>> There is some limited indexing you can do on an array by creating a
>>> functional index for a particular array element. For example:
>>>     CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>>     CREATE INDEX IDX ON T (A[3]);
>>>
>>> In this case, the following query would use the index:
>>>     SELECT K FROM T WHERE A[3] = 5;
>>>
>>> Does this help for your usage?
>>>
>>> Thanks,
>>> James
>>>
>>> On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
>>> kpalaniappan@marinsoftware.com> wrote:
>>>
>>>>
>>>>
>>>> We have a table with a data type BIGINT[], Since phoenix doesnt support
>>>> to index this data type, our queries are doing a full table scan when we
>>>> have to do filtering on this field.
>>>>
>>>> What are the alternate approaches? Tried looking into Views, but nope.
>>>>
>>>> Appreciate your time.
>>>>
>>>> Kumar
>>>>
>>>
>>>
>>
>

Re: array of BIGINT index

Posted by James Taylor <ja...@apache.org>.

If the "finding customers that have a particular account" is a common
query, you might consider modifying your schema by pulling the account into
an optional/nullable row key column, like this:

CREATE TABLE T (CID VARCHAR NOT NULL, AID BIGINT, V1 VARCHAR, V2 VARCHAR
    CONSTRAINT pk PRIMARY KEY (CID,AID));

Your non PK columns (V1 and V2 in this example) would only be set on the
row where AID is null, but you'd have new rows for all accounts for a given
customer, and these rows wouldn't have any other column values.

Then you could create a secondary index on AID:
CREATE INDEX IDX ON T(AID);

and you'd be able to find all customers for a given account quickly.

You could still efficiently iterate over the account of a given customer
too:
    SELECT * FROM T WHERE CID=?
but your application would need to know that the first row would be the
customer row and the next rows would contain only the account IDs for that
customer.

On Tue, Jan 5, 2016 at 3:46 PM, Kumar Palaniappan <
kpalaniappan@marinsoftware.com> wrote:

> Thanks James for the response. Our use case is, that array holds all the
> accounts for a particular customer. so the  table and query is
>
> CREATE TABLE T ( ID VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>
> find by account is a use case -
>
> select  ID from table T where ANY (A);
>
> On Tue, Jan 5, 2016 at 3:34 PM, James Taylor <ja...@apache.org>
> wrote:
>
>> There is some limited indexing you can do on an array by creating a
>> functional index for a particular array element. For example:
>>     CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>>     CREATE INDEX IDX ON T (A[3]);
>>
>> In this case, the following query would use the index:
>>     SELECT K FROM T WHERE A[3] = 5;
>>
>> Does this help for your usage?
>>
>> Thanks,
>> James
>>
>> On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
>> kpalaniappan@marinsoftware.com> wrote:
>>
>>>
>>>
>>> We have a table with a data type BIGINT[], Since phoenix doesnt support
>>> to index this data type, our queries are doing a full table scan when we
>>> have to do filtering on this field.
>>>
>>> What are the alternate approaches? Tried looking into Views, but nope.
>>>
>>> Appreciate your time.
>>>
>>> Kumar
>>>
>>
>>
>

Re: array of BIGINT index

Posted by Kumar Palaniappan <kp...@marinsoftware.com>.

Thanks James for the response. Our use case is, that array holds all the
accounts for a particular customer. so the  table and query is

CREATE TABLE T ( ID VARCHAR PRIMARY KEY, A BIGINT ARRAY);

find by account is a use case -

select  ID from table T where ANY (A);

On Tue, Jan 5, 2016 at 3:34 PM, James Taylor <ja...@apache.org> wrote:

> There is some limited indexing you can do on an array by creating a
> functional index for a particular array element. For example:
>     CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
>     CREATE INDEX IDX ON T (A[3]);
>
> In this case, the following query would use the index:
>     SELECT K FROM T WHERE A[3] = 5;
>
> Does this help for your usage?
>
> Thanks,
> James
>
> On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
> kpalaniappan@marinsoftware.com> wrote:
>
>>
>>
>> We have a table with a data type BIGINT[], Since phoenix doesnt support
>> to index this data type, our queries are doing a full table scan when we
>> have to do filtering on this field.
>>
>> What are the alternate approaches? Tried looking into Views, but nope.
>>
>> Appreciate your time.
>>
>> Kumar
>>
>
>

Re: array of BIGINT index

Posted by James Taylor <ja...@apache.org>.

There is some limited indexing you can do on an array by creating a
functional index for a particular array element. For example:
    CREATE TABLE T (K VARCHAR PRIMARY KEY, A BIGINT ARRAY);
    CREATE INDEX IDX ON T (A[3]);

In this case, the following query would use the index:
    SELECT K FROM T WHERE A[3] = 5;

Does this help for your usage?

Thanks,
James

On Tue, Jan 5, 2016 at 2:51 PM, Kumar Palaniappan <
kpalaniappan@marinsoftware.com> wrote:

>
>
> We have a table with a data type BIGINT[], Since phoenix doesnt support to
> index this data type, our queries are doing a full table scan when we have
> to do filtering on this field.
>
> What are the alternate approaches? Tried looking into Views, but nope.
>
> Appreciate your time.
>
> Kumar
>