Posted to commits@cassandra.apache.org by "Jay Patel (JIRA)" <ji...@apache.org> on 2014/04/22 23:24:17 UTC

[jira] [Created] (CASSANDRA-7070) Virtual column name aliasing

Jay Patel created CASSANDRA-7070:
------------------------------------

             Summary: Virtual column name aliasing
                 Key: CASSANDRA-7070
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7070
             Project: Cassandra
          Issue Type: Improvement
          Components: API, Core
            Reporter: Jay Patel
             Fix For: 3.0


Hi folks!

Currently, shortening column names saves significant storage space (sometimes terabytes for static tables!) because the names are repeated in each row; however, these short column names can be very unreadable. So far, I've seen tens of tables with hundreds of convoluted names. That makes it hard to debug issues and to work with the tables on a day-to-day basis. This can make smart engineers quit the project or even the company :)

Another reason: most of the time, folks are not even aware that column names are repeated in each row, and they end up with really descriptive column names. Then they realize the waste of disk/RAM/network, and spend time on re-implementation and/or a crazy migration to a new table.

Yet another reason: a table might be shared by multiple systems, use cases, and people in an organization, e.g. primary/analytics use cases, Ops/Developers, etc. It then becomes a problem to reliably maintain the mapping from convoluted names to descriptive names. Usually, these mappings are kept in Java enums; I've seen them kept in a DB as well, just so Ops folks don't have to interpret Java code :)

It would be great if Cassandra internally could virtually alias the column names to a more efficient representation in storage. 

I can take a shot at this feature if there are no major concerns. Ideally, we want users to work with the descriptive alias everywhere and not even be aware of the column's internal storage name. Also, I think the name/alias mapping needs to be cached at all times to avoid any performance hit. Any thoughts? How difficult would this be to accommodate? BTW, I think this may not directly apply to dynamic tables, as we rely on the column name for proper ordering of columns in wide rows. However, we may have some room there if it's not a clustering column.
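
To make the idea concrete, here is a minimal sketch of the kind of always-cached, two-way alias mapping described above. It is only an illustration under stated assumptions: the class ColumnAliasMap, its methods, and the example names "last_login_timestamp" / "llt" are all hypothetical and are not part of Cassandra or of any existing patch. How such a mapping would be persisted in the schema and kept in sync across nodes is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical per-table alias cache (illustration only, not Cassandra code).
 * Users see the descriptive alias; storage sees the short name.
 */
final class ColumnAliasMap {
    // Both directions are kept in memory so that lookups on the
    // read and write paths never have to touch disk.
    private final Map<String, String> aliasToStorage = new ConcurrentHashMap<>();
    private final Map<String, String> storageToAlias = new ConcurrentHashMap<>();

    /** Registers a one-to-one mapping; rejects duplicates in either direction. */
    synchronized void register(String alias, String storageName) {
        if (aliasToStorage.containsKey(alias) || storageToAlias.containsKey(storageName)) {
            throw new IllegalArgumentException(
                "alias or storage name already mapped: " + alias + " / " + storageName);
        }
        aliasToStorage.put(alias, storageName);
        storageToAlias.put(storageName, alias);
    }

    /** Write/query path: translate the user-facing alias to the stored name. */
    String toStorageName(String alias) {
        String storageName = aliasToStorage.get(alias);
        if (storageName == null) {
            throw new IllegalArgumentException("unknown column alias: " + alias);
        }
        return storageName;
    }

    /** Result path: translate a stored name back; unmapped names pass through. */
    String toAlias(String storageName) {
        return storageToAlias.getOrDefault(storageName, storageName);
    }

    public static void main(String[] args) {
        ColumnAliasMap aliases = new ColumnAliasMap();
        aliases.register("last_login_timestamp", "llt");
        System.out.println(aliases.toStorageName("last_login_timestamp"));
        System.out.println(aliases.toAlias("llt"));
    }
}
```

In this sketch, unmapped storage names pass through toAlias unchanged, so columns without an alias would keep working as they do today. That pass-through is a design choice of the example, not something the proposal specifies.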

Thanks,
Jay



--
This message was sent by Atlassian JIRA
(v6.2#6252)