You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Keith Wiley <kw...@keithwiley.com> on 2013/04/02 00:45:42 UTC
Need rank(), can't build m6d's version
I need rank() in Hive. I have't had much luck with Edward Capriolo's on git and it comes with no documentation. It depends on hive-test (also by Edward) and I can't get maven to build the tests for hive-test which prevents me from building the hive-test package (the jar)...which is all pretty tangential to the real goal, which is rank of course.
Is there any place to download the hive-rank jar pre-built instead of having to build it out of git? If not, has anyone built it recently, or namely has anyone built the hive-test project it depends on? The following is just one example of the sort of error I see. Remember this error is on hive-test, which frankly I don't even care about. :-)
Feeling a little desperate...thanks for any help.
-------------------------------------------------------------------------------
Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec <<< FAILURE!
testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest) Time elapsed: 7.823 sec <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<9>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at $Proxy0.invoke(Unknown Source)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
-- Galileo Galilei
________________________________________________________________________________
Re: Need rank()
Posted by Keith Wiley <kw...@keithwiley.com>.
I'm pretty sure there are at least a few bugs in that article. The problem described below goes away if "rank" is added to the outer select statement. If there's one error I suppose there may be others. Although I have gotten the query to run, I'm not getting properly ranked results yet. I'm still working on it.
On Apr 2, 2013, at 10:03 , Keith Wiley wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
>> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> user string
> category string
> value int
> Time taken: 0.082 seconds
> ==================================================
> hive> select * from test;
> OK
> user1 cat1 1
> user1 cat1 2
> user1 cat1 3
> user1 cat2 10
> user1 cat2 20
> user1 cat2 30
> user2 cat1 11
> user2 cat1 21
> user2 cat1 31
> user2 cat2 5
> user2 cat2 6
> user2 cat2 7
> Time taken: 0.202 seconds
> ==================================================
> hive> SELECT user, category, value
>> FROM (
>> SELECT user, category, rank(user) as rank, value
>> FROM test
>> DISTRIBUTE BY user
>> SORT BY user, value desc
>> ) a
>> WHERE rank > 1
>> ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio than
> when I entered."
> -- Keith Wiley
> ________________________________________________________________________________
>
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
-- Edwin A. Abbott, Flatland
________________________________________________________________________________
Re: Need rank()
Posted by Keith Wiley <kw...@keithwiley.com>.
I suppose. Like I said, I was just following the example on that website and it didn't seem to have any comments on it remarking on errors in the article. Thus my confusion about the whole thing. It is presented as being accurate and is old enough that any errors would have been caught and labeled in the comments long ago. I also thought it was a little odd to name the function the same thing as the alias assigned to the result. I'll try disambiguating that.
On Apr 2, 2013, at 10:22 , Nitin Pawar wrote:
> I am not sure about this
> but you defined a rank function and then aliased the column as rank itself. Can this cause the issue?
>
> From the error it looks like it could not figure out which column is rank in the where clause "WHERE rank > 1"
>
> from Edward's blog the correct query looks similar but his function is named as p_rank
>
> SELECT
> category,country,product,sales,rank
> FROM (
> SELECT
> category,country,product,sales,
> p_rank(category, country) rank
> FROM (
> SELECT
> category,country,product,
> sales
> FROM p_rank_demo
> DISTRIBUTE BY
> category,country
> SORT BY
> category,country,sales desc) t1) t2
> WHERE rank <= 3
>
>
> On Tue, Apr 2, 2013 at 10:45 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> I did all that, I just didn't paste it into the email. That isn't the problem. Sorry for the confusion.
>
> hive> add jar Rank.jar;
> hive> create temporary function rank as 'com.example.hive.udf.Rank';
>
> BTW, small typo, the condition at the end uses less-then, not greater-then...obviously...but that isn't the problem either.
>
> On Apr 2, 2013, at 10:06 , Nitin Pawar wrote:
>
> > I dont see you added rank in the classpath and defined rank function
> >
> > can you try add jar and define the function and try again
> >
> >
> >
> > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > user string
> > category string
> > value int
> > Time taken: 0.082 seconds
> > ==================================================
> > hive> select * from test;
> > OK
> > user1 cat1 1
> > user1 cat1 2
> > user1 cat1 3
> > user1 cat2 10
> > user1 cat2 20
> > user1 cat2 30
> > user2 cat1 11
> > user2 cat1 21
> > user2 cat1 31
> > user2 cat2 5
> > user2 cat2 6
> > user2 cat2 7
> > Time taken: 0.202 seconds
> > ==================================================
> > hive> SELECT user, category, value
> > > FROM (
> > > SELECT user, category, rank(user) as rank, value
> > > FROM test
> > > DISTRIBUTE BY user
> > > SORT BY user, value desc
> > > ) a
> > > WHERE rank > 1
> > > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> > ________________________________________________________________________________
> > Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge ratio than
> > when I entered."
> > -- Keith Wiley
> > ________________________________________________________________________________
> >
> >
> >
> >
> > --
> > Nitin Pawar
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
> -- Keith Wiley
> ________________________________________________________________________________
>
>
>
>
> --
> Nitin Pawar
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
-- Keith Wiley
________________________________________________________________________________
Re: Need rank()
Posted by Nitin Pawar <ni...@gmail.com>.
I am not sure about this
but you defined a rank function and then aliased the column as rank itself.
Can this cause the issue?
>From the error it looks like it could not figure out which column is rank
in the where clause "WHERE rank > 1"
from Edward's blog the correct query looks similar but his function is
named as p_rank
SELECT
category,country,product,sales,rank
FROM (
SELECT
category,country,product,sales,
p_rank(category, country) rank
FROM (
SELECT
category,country,product,
sales
FROM p_rank_demo
DISTRIBUTE BY
category,country
SORT BY
category,country,sales desc) t1) t2
WHERE rank <= 3
On Tue, Apr 2, 2013 at 10:45 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> I did all that, I just didn't paste it into the email. That isn't the
> problem. Sorry for the confusion.
>
> hive> add jar Rank.jar;
> hive> create temporary function rank as 'com.example.hive.udf.Rank';
>
> BTW, small typo, the condition at the end uses less-then, not
> greater-then...obviously...but that isn't the problem either.
>
> On Apr 2, 2013, at 10:06 , Nitin Pawar wrote:
>
> > I dont see you added rank in the classpath and defined rank function
> >
> > can you try add jar and define the function and try again
> >
> >
> >
> > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley <kw...@keithwiley.com>
> wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article? I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way). Here's
> what I get when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > user string
> > category string
> > value int
> > Time taken: 0.082 seconds
> > ==================================================
> > hive> select * from test;
> > OK
> > user1 cat1 1
> > user1 cat1 2
> > user1 cat1 3
> > user1 cat2 10
> > user1 cat2 20
> > user1 cat2 30
> > user2 cat1 11
> > user2 cat1 21
> > user2 cat1 31
> > user2 cat2 5
> > user2 cat2 6
> > user2 cat2 7
> > Time taken: 0.202 seconds
> > ==================================================
> > hive> SELECT user, category, value
> > > FROM (
> > > SELECT user, category, rank(user) as rank, value
> > > FROM test
> > > DISTRIBUTE BY user
> > > SORT BY user, value desc
> > > ) a
> > > WHERE rank > 1
> > > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> >
> ________________________________________________________________________________
> > Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> > when I entered."
> > -- Keith Wiley
> >
> ________________________________________________________________________________
> >
> >
> >
> >
> > --
> > Nitin Pawar
>
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a
> slippery
> rope between obsessive-compulsive and debilitatingly slow."
> -- Keith Wiley
>
> ________________________________________________________________________________
>
>
--
Nitin Pawar
Re: Need rank()
Posted by Keith Wiley <kw...@keithwiley.com>.
I did all that, I just didn't paste it into the email. That isn't the problem. Sorry for the confusion.
hive> add jar Rank.jar;
hive> create temporary function rank as 'com.example.hive.udf.Rank';
BTW, small typo, the condition at the end uses less-then, not greater-then...obviously...but that isn't the problem either.
On Apr 2, 2013, at 10:06 , Nitin Pawar wrote:
> I dont see you added rank in the classpath and defined rank function
>
> can you try add jar and define the function and try again
>
>
>
> On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
> > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> user string
> category string
> value int
> Time taken: 0.082 seconds
> ==================================================
> hive> select * from test;
> OK
> user1 cat1 1
> user1 cat1 2
> user1 cat1 3
> user1 cat2 10
> user1 cat2 20
> user1 cat2 30
> user2 cat1 11
> user2 cat1 21
> user2 cat1 31
> user2 cat2 5
> user2 cat2 6
> user2 cat2 7
> Time taken: 0.202 seconds
> ==================================================
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio than
> when I entered."
> -- Keith Wiley
> ________________________________________________________________________________
>
>
>
>
> --
> Nitin Pawar
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
-- Keith Wiley
________________________________________________________________________________
Re: Need rank()
Posted by Nitin Pawar <ni...@gmail.com>.
I dont see you added rank in the classpath and defined rank function
can you try add jar and define the function and try again
On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
> >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article? I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way). Here's
> what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> user string
> category string
> value int
> Time taken: 0.082 seconds
> ==================================================
> hive> select * from test;
> OK
> user1 cat1 1
> user1 cat1 2
> user1 cat1 3
> user1 cat2 10
> user1 cat2 20
> user1 cat2 30
> user2 cat1 11
> user2 cat1 21
> user2 cat1 31
> user2 cat2 5
> user2 cat2 6
> user2 cat2 7
> Time taken: 0.202 seconds
> ==================================================
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> when I entered."
> -- Keith Wiley
>
> ________________________________________________________________________________
>
>
--
Nitin Pawar
Re: Need rank()
Posted by Keith Wiley <kw...@keithwiley.com>.
On Apr 2, 2013, at 12:59 , Edward Capriolo wrote:
> I feel your pain I do not have time to look at the project ATM. However trying to do Java development these days without understanding git, or maven is fairly impossible, most projects depend on N others that depend on M others. So trying to even compile the code outside of maven requires a lot of manually fetching jar etc. Your better off taking a day and getting good with maven, IMHO. You will be dealing with it for the foreseeable future.
I agree. I'll cozy up with it first chance I get...but sometimes you're on a deadline and have to go for the low-hanging fruit. I fully understand the motivation behind your approach. I'll circle back around it when I get some free time.
Cheers!
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
-- Galileo Galilei
________________________________________________________________________________
Re: Need rank()
Posted by Edward Capriolo <ed...@gmail.com>.
I feel your pain I do not have time to look at the project ATM. However
trying to do Java development these days without understanding git, or
maven is fairly impossible, most projects depend on N others that depend on
M others. So trying to even compile the code outside of maven requires a
lot of manually fetching jar etc. Your better off taking a day and getting
good with maven, IMHO. You will be dealing with it for the foreseeable
future.
Re: Need rank()
Posted by Edward Capriolo <ed...@gmail.com>.
git clone https://github.com/edwardcapriolo/hive_test.git
cd hive_test/
mvn -Dmaven.test.skip=true install
cd ..
[edward@jackintosh java]$ git clone
https://github.com/edwardcapriolo/hive-rank.git
Cloning into 'hive-rank'...
remote: Counting objects: 74, done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 74 (delta 12), reused 70 (delta 8)
Unpacking objects: 100% (74/74), done.
[edward@jackintosh java]$ cd hive-rank/
[edward@jackintosh hive-rank]$ mvn install -Dmaven.test.skip=true
/usr/java/jdk1.7.0_13
...
[INFO] Installing
/home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar
to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar
[INFO] Installing /home/edward/Documents/java/hive-rank/pom.xml to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.pom
[INFO]
------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 2.986s
[INFO] Finished at: Tue Apr 02 16:11:41 EDT 2013
[INFO] Final Memory: 17M/210M
[INFO]
------------------------------------------------------------------------
Then copy:
cp
/home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar
to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar
to your hadoop lib.
add jar <name of jar file>
..... etc etc
On Tue, Apr 2, 2013 at 3:51 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> Yep, the original article is definitely erroneous in this regard. I
> figured out that eventually. I'm not sure how much I can trust that
> resource now. I may have to look elsewhere. I agree that Edward's
> description is pretty good, but as I said earlier, I can't actually use his
> code, so I'm trying to cobble a workable solution together from the various
> resources available. Ritesh's article, despite the error in the Hive
> syntax, is still useful in that it enables one to quickly compile a simple
> rank jar without relying on git, maven, or other project dependencies --
> problems which have plagued me with Edward's approach. So, if I can use
> Ritesh's method to write a simple rank function, and Edward's accurate
> description of how to construct the query, then I can put all the pieces
> together into a workable solution.
>
> I'll let you know if I get it.
>
> On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote:
>
> > You are getting the error because you are ORDERing BY rank but rank is
> not in the top SELECT
> >
> > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a
> subquery:
> > SELECT ..., rank(user)
> > FROM (SELECT ... DISTRIBUTE BY ... SORT BY)
> >
> > igor
> > decide.com
> >
> >
> > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley <kw...@keithwiley.com>
> wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article? I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way). Here's
> what I get when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > user string
> > category string
> > value int
> > Time taken: 0.082 seconds
> > ==================================================
> > hive> select * from test;
> > OK
> > user1 cat1 1
> > user1 cat1 2
> > user1 cat1 3
> > user1 cat2 10
> > user1 cat2 20
> > user1 cat2 30
> > user2 cat1 11
> > user2 cat1 21
> > user2 cat1 31
> > user2 cat2 5
> > user2 cat2 6
> > user2 cat2 7
> > Time taken: 0.202 seconds
> > ==================================================
> > hive> SELECT user, category, value
> > > FROM (
> > > SELECT user, category, rank(user) as rank, value
> > > FROM test
> > > DISTRIBUTE BY user
> > > SORT BY user, value desc
> > > ) a
> > > WHERE rank > 1
> > > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> >
> ________________________________________________________________________________
> > Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> > when I entered."
> > -- Keith Wiley
> >
> ________________________________________________________________________________
> >
> >
>
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "Luminous beings are we, not this crude matter."
> -- Yoda
>
> ________________________________________________________________________________
>
>
Re: Need rank()
Posted by Keith Wiley <kw...@keithwiley.com>.
Yep, the original article is definitely erroneous in this regard. I figured out that eventually. I'm not sure how much I can trust that resource now. I may have to look elsewhere. I agree that Edward's description is pretty good, but as I said earlier, I can't actually use his code, so I'm trying to cobble a workable solution together from the various resources available. Ritesh's article, despite the error in the Hive syntax, is still useful in that it enables one to quickly compile a simple rank jar without relying on git, maven, or other project dependencies -- problems which have plagued me with Edward's approach. So, if I can use Ritesh's method to write a simple rank function, and Edward's accurate description of how to construct the query, then I can put all the pieces together into a workable solution.
I'll let you know if I get it.
On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote:
> You are getting the error because you are ORDERing BY rank but rank is not in the top SELECT
>
> Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a subquery:
> SELECT ..., rank(user)
> FROM (SELECT ... DISTRIBUTE BY ... SORT BY)
>
> igor
> decide.com
>
>
> On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley <kw...@keithwiley.com> wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
> > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> user string
> category string
> value int
> Time taken: 0.082 seconds
> ==================================================
> hive> select * from test;
> OK
> user1 cat1 1
> user1 cat1 2
> user1 cat1 3
> user1 cat2 10
> user1 cat2 20
> user1 cat2 30
> user2 cat1 11
> user2 cat1 21
> user2 cat1 31
> user2 cat2 5
> user2 cat2 6
> user2 cat2 7
> Time taken: 0.202 seconds
> ==================================================
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio than
> when I entered."
> -- Keith Wiley
> ________________________________________________________________________________
>
>
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"Luminous beings are we, not this crude matter."
-- Yoda
________________________________________________________________________________
Re: Need rank()
Posted by Igor Tatarinov <ig...@decide.com>.
You are getting the error because you are ORDERing BY rank but rank is not
in the top SELECT
Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a
subquery:
SELECT ..., rank(user)
FROM (SELECT ... DISTRIBUTE BY ... SORT BY)
igor
decide.com
On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley <kw...@keithwiley.com> wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
> >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article? I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way). Here's
> what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> user string
> category string
> value int
> Time taken: 0.082 seconds
> ==================================================
> hive> select * from test;
> OK
> user1 cat1 1
> user1 cat1 2
> user1 cat1 3
> user1 cat2 10
> user1 cat2 20
> user1 cat2 30
> user2 cat1 11
> user2 cat1 21
> user2 cat1 31
> user2 cat2 5
> user2 cat2 6
> user2 cat2 7
> Time taken: 0.202 seconds
> ==================================================
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> when I entered."
> -- Keith Wiley
>
> ________________________________________________________________________________
>
>
Re: Need rank()
Posted by Keith Wiley <kw...@keithwiley.com>.
On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example:
hive> describe test;
OK
user string
category string
value int
Time taken: 0.082 seconds
==================================================
hive> select * from test;
OK
user1 cat1 1
user1 cat1 2
user1 cat1 3
user1 cat2 10
user1 cat2 20
user1 cat2 30
user2 cat1 11
user2 cat1 21
user2 cat1 31
user2 cat2 5
user2 cat2 6
user2 cat2 7
Time taken: 0.202 seconds
==================================================
hive> SELECT user, category, value
> FROM (
> SELECT user, category, rank(user) as rank, value
> FROM test
> DISTRIBUTE BY user
> SORT BY user, value desc
> ) a
> WHERE rank > 1
> ORDER BY user, rank;
FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2)
hive>
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
-- Keith Wiley
________________________________________________________________________________
Re: Need rank(), can't build m6d's version
Posted by Keith Wiley <kw...@keithwiley.com>.
Thanks! That looks like it has serious potential. I'll study up on it.
On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"And what if we picked the wrong religion? Every week, we're just making God
madder and madder!"
-- Homer Simpson
________________________________________________________________________________
Re: Need rank(), can't build m6d's version
Posted by Alexander Pivovarov <ap...@gmail.com>.
http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
On Mon, Apr 1, 2013 at 3:45 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> I need rank() in Hive. I have't had much luck with Edward Capriolo's on
> git and it comes with no documentation. It depends on hive-test (also by
> Edward) and I can't get maven to build the tests for hive-test which
> prevents me from building the hive-test package (the jar)...which is all
> pretty tangential to the real goal, which is rank of course.
>
> Is there any place to download the hive-rank jar pre-built instead of
> having to build it out of git? If not, has anyone built it recently, or
> namely has anyone built the hive-test project it depends on? The following
> is just one example of the sort of error I see. Remember this error is on
> hive-test, which frankly I don't even care about. :-)
>
> Feeling a little desperate...thanks for any help.
>
>
> -------------------------------------------------------------------------------
> Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest
>
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec
> <<< FAILURE!
> testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest) Time elapsed:
> 7.823 sec <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<9>
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.failNotEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:126)
> at org.junit.Assert.assertEquals(Assert.java:470)
> at org.junit.Assert.assertEquals(Assert.java:454)
> at
> com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
> at $Proxy0.invoke(Unknown Source)
> at
> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
> at
> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
> at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "I do not feel obliged to believe that the same God who has endowed us with
> sense, reason, and intellect has intended us to forgo their use."
> -- Galileo Galilei
>
> ________________________________________________________________________________
>
>
Re: Need rank(), can't build m6d's version
Posted by Keith Wiley <kw...@keithwiley.com>.
Thank you Edward. I appreciate you trying to help out. I'll see if I can get that to work.
On Apr 1, 2013, at 17:09 , Edward Capriolo wrote:
> approx steps
> git clone hive-test
> mvn install
> git clone hive-rang
> mvn install
>
> or just
> git clone hive-test
> mvn -Dmaven.test.skip=true install
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
-- Mark Twain
________________________________________________________________________________
Re: Need rank(), can't build m6d's version
Posted by Edward Capriolo <ed...@gmail.com>.
approx steps
git clone hive-test
mvn install
git clone hive-rang
mvn install
or just
git clone hive-test
mvn -Dmaven.test.skip=true install
On Mon, Apr 1, 2013 at 6:45 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> I need rank() in Hive. I have't had much luck with Edward Capriolo's on
> git and it comes with no documentation. It depends on hive-test (also by
> Edward) and I can't get maven to build the tests for hive-test which
> prevents me from building the hive-test package (the jar)...which is all
> pretty tangential to the real goal, which is rank of course.
>
> Is there any place to download the hive-rank jar pre-built instead of
> having to build it out of git? If not, has anyone built it recently, or
> namely has anyone built the hive-test project it depends on? The following
> is just one example of the sort of error I see. Remember this error is on
> hive-test, which frankly I don't even care about. :-)
>
> Feeling a little desperate...thanks for any help.
>
>
> -------------------------------------------------------------------------------
> Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest
>
> -------------------------------------------------------------------------------
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec
> <<< FAILURE!
> testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest) Time elapsed:
> 7.823 sec <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<9>
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.failNotEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:126)
> at org.junit.Assert.assertEquals(Assert.java:470)
> at org.junit.Assert.assertEquals(Assert.java:454)
> at
> com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
> at $Proxy0.invoke(Unknown Source)
> at
> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
> at
> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
> at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
>
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "I do not feel obliged to believe that the same God who has endowed us with
> sense, reason, and intellect has intended us to forgo their use."
> -- Galileo Galilei
>
> ________________________________________________________________________________
>
>