Posted to dev@hive.apache.org by "Lefty Leverenz (JIRA)" <ji...@apache.org> on 2014/07/21 03:20:39 UTC

[jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong

    [ https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068095#comment-14068095 ] 

Lefty Leverenz commented on HIVE-4590:
--------------------------------------

[~eugene.koifman], it's past time to fix this, but first I have a couple of questions:

#  Why does the equivalent SELECT statement say "col1" while the description says "an integer in the second column"?  Does this assume column numbers start with zero?
#*  "select col1, count(*) from $table group by col1;"
I tried to figure it out from the MR program, but strained my brain.
#  Is there a typo in the output for your sample dataset (1,1,1,3,3,3,5)?  I see three 3s, not 2.  
#*  1, 3
3, 2,
5, 1
... and presumably the comma after the 2 (or 3) can be removed.
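
To double-check the counts for question 2, here is a minimal sketch in plain Python. It is not the MR program from the doc; the function name count_per_value is made up for illustration, and it only assumes the query means "one row per distinct value, with the number of times it occurs":

```python
from collections import Counter


def count_per_value(values):
    """Return (value, count) pairs sorted by value.

    Mimics "select col1, count(*) from $table group by col1;" on an
    in-memory list (hypothetical helper, not part of HCatalog).
    """
    return sorted(Counter(values).items())


# Sample dataset from the issue description.
sample = [1, 1, 1, 3, 3, 3, 5]

for value, count in count_per_value(sample):
    print(f"{value}, {count}")
```

For this dataset the value 3 comes out with a count of 3, which is what makes me think the "3, 2," line is a typo.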

The doc has a new location, by the way:

* [HCat Input and Output -- Read Example | https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ReadExample]

> HCatalog documentation example is wrong
> ---------------------------------------
>
>                 Key: HIVE-4590
>                 URL: https://issues.apache.org/jira/browse/HIVE-4590
>             Project: Hive
>          Issue Type: Bug
>          Components: Documentation, HCatalog
>    Affects Versions: 0.10.0
>            Reporter: Eugene Koifman
>            Assignee: Lefty Leverenz
>            Priority: Minor
>
> http://hive.apache.org/docs/hcat_r0.5.0/inputoutput.html#Read+Example
> reads
> The following very simple MapReduce program reads data from one table which it assumes to have an integer in the second column, and counts how many different values it sees. That is, it does the equivalent of "select col1, count(*) from $table group by col1;".
> The description of the query is wrong.  It actually counts how many instances of each distinct value it finds.  For example, if the values of col1 are {1,1,1,3,3,3,5}, it will produce
> 1, 3
> 3, 2,
> 5, 1
>  


