Posted to user@phoenix.apache.org by Nicolas Paris <ni...@gmail.com> on 2018/05/13 10:50:07 UTC

Binary fields and compression

Hi,

My use case is storing PDF files on one side and their content as text on
the other. There would be two kinds of queries, fetching either the text or
the PDF by primary key. Rarely both.

So I guess the option is to create two column families: one containing a
VARBINARY column to store the binary PDF, and the other containing the text
and other metadata in further columns.

Since text compresses well, I guess it would be a good idea to compress the
table.

CREATE TABLE DOCUMENTS (HOST VARCHAR NOT NULL PRIMARY KEY, A.CONTENT
VARBINARY, B.TEXT VARCHAR, B.LABEL VARCHAR, B.DATE_CREATE TIMESTAMP)
COMPRESSION='GZ'

The problem is that the compression here applies to both column families.
As a result, I will end up compressing the PDFs, which is wasted effort.

What do you suggest?

Thanks

Re: Binary fields and compression

Posted by Nicolas Paris <ni...@gmail.com>.
James,

That makes sense.

Thanks for your answer.

2018-05-13 18:17 GMT+02:00 James Taylor <ja...@apache.org>:

> You can have a property only apply to a single column family by prefixing
> it with the family name:
>
> CREATE TABLE DOCUMENTS (HOST VARCHAR NOT NULL PRIMARY KEY, A.CONTENT
> VARBINARY, B.TEXT VARCHAR, B.LABEL VARCHAR, B.DATE_CREATE TIMESTAMP)
> B.COMPRESSION='GZ'

Re: Binary fields and compression

Posted by James Taylor <ja...@apache.org>.
You can have a property only apply to a single column family by prefixing
it with the family name:

CREATE TABLE DOCUMENTS (HOST VARCHAR NOT NULL PRIMARY KEY, A.CONTENT
VARBINARY, B.TEXT VARCHAR, B.LABEL VARCHAR, B.DATE_CREATE TIMESTAMP)
B.COMPRESSION='GZ'
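
To illustrate why limiting compression to family B is reasonable, here is a
minimal Python sketch. It is not Phoenix or HBase code: it only runs DEFLATE
(the algorithm behind gzip) on two made-up samples. The repeated sentence
stands in for the B.TEXT column, and random bytes stand in for PDF content,
on the assumption that most PDF streams are already compressed. HBase
compresses whole blocks rather than single values, so real ratios will
differ.

```python
import os
import zlib


def gz_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 6)) / len(data)


# Hypothetical stand-in for B.TEXT: repetitive natural-language text.
text = ("The quick brown fox jumps over the lazy dog. " * 2000).encode("utf-8")

# Hypothetical stand-in for A.CONTENT: high-entropy bytes, which behave like
# data that has already been compressed (an assumption about typical PDFs).
pdf_like = os.urandom(100_000)

text_ratio = gz_ratio(text)
pdf_ratio = gz_ratio(pdf_like)

print(f"text:     {text_ratio:.3f}")
print(f"pdf-like: {pdf_ratio:.3f}")
```

The text sample shrinks to a small fraction of its size, while the
high-entropy sample stays at roughly its original size, so the CPU time
spent compressing it buys nothing.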
