
Posted to commits@cassandra.apache.org by "Alex Liu (JIRA)" <ji...@apache.org> on 2013/12/04 00:07:38 UTC

[jira] [Commented] (CASSANDRA-6432) Calculate estimated Cql row count per token range

    [ https://issues.apache.org/jira/browse/CASSANDRA-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838301#comment-13838301 ] 

Alex Liu commented on CASSANDRA-6432:
-------------------------------------

SSTableMetadata.estimatedColumnCount collects column counts per SSTable, but there are no column counts per key, so we can't use the current statistics to calculate the number of columns per token range.

The same column can be spread across multiple SSTables, so counting unique columns would require merging them first, which is not practical here.
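The overcounting problem can be shown with a small, hypothetical model. This is not Cassandra's SSTable format or API: each "SSTable" below is just a map from partition key to the column names it holds for that key, and the class and method names are made up for illustration. Summing per-SSTable counts double-counts a column that was written to more than one SSTable, while the correct count needs a merge across all of them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model only: an "SSTable" is a map of partition key -> column names.
public class ColumnCountOvercount {

    // Naive estimate: add up the column counts reported by each SSTable independently.
    static int summedPerSSTable(List<Map<String, Set<String>>> sstables) {
        int total = 0;
        for (Map<String, Set<String>> sstable : sstables) {
            for (Set<String> columns : sstable.values()) {
                total += columns.size();
            }
        }
        return total;
    }

    // Correct count: merge columns per key across all SSTables, then count unique names.
    static int mergedUnique(List<Map<String, Set<String>>> sstables) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> sstable : sstables) {
            for (Map.Entry<String, Set<String>> e : sstable.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
            }
        }
        int total = 0;
        for (Set<String> columns : merged.values()) {
            total += columns.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Key "k1" has column "a" in both SSTables (e.g. written again after a flush).
        Map<String, Set<String>> sstable1 = Map.of("k1", Set.of("a", "b"));
        Map<String, Set<String>> sstable2 = Map.of("k1", Set.of("a", "c"));
        List<Map<String, Set<String>>> sstables = List.of(sstable1, sstable2);

        System.out.println(summedPerSSTable(sstables)); // 4
        System.out.println(mergedUnique(sstables));     // 3
    }
}
```

The summed estimate reports 4 columns for k1, but only 3 are unique. Getting the unique count means reading and merging every SSTable that holds the key, which is close to the cost of a full scan.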

SELECT count(*) FROM cf scans all the rows, so it isn't useful for large data sets.

> Calculate estimated Cql row count per token range
> -------------------------------------------------
>
>                 Key: CASSANDRA-6432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6432
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>
> CASSANDRA-6311 uses the client side to calculate the actual CF row count for Hadoop jobs. We need to fix it by using the CQL row count, which requires an estimated CQL row count per token range.



--
This message was sent by Atlassian JIRA
(v6.1#6144)