Posted to user@hive.apache.org by Babe Ruth <gt...@hotmail.com> on 2013/04/30 21:03:50 UTC

Can a bucket be added to a partition?

Hello,

I have a table that is already created and is partitioned dynamically by day. I would like all future partitions to be bucketed on two columns.
Can I add bucketing to the partitions of an already existing table?

Thanks,
George

Re: Variable resolution Fails

Posted by sumit ghosh <su...@yahoo.com>.

Thanks for the solution & the tip. :)

Re: Variable resolution Fails

Posted by Ted Yu <yu...@gmail.com>.

Naidu:
Please don't hijack an existing thread. Your questions are not directly related to Hive.

Cheers

Re: Variable resolution Fails

Posted by Nitin Pawar <ni...@gmail.com>.

"Could not synchronize with target": jps prints this when it could not
connect to an application that is waiting for a direct attach.
This is fixed in Sun JDK 1.6 update 7, so you may want to check which
version of the JDK you are using.

There is no need to format the namenode again and again. When you format
the namenode it loses all of its metadata, which in the end means losing
all the data.
If you still want to reformat, look at the prompt:
Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y
It is asking for a capital letter Y; anything else is treated as a No.
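
To illustrate the prompt rule described above, here is a small stand-in
written in shell. It is not Hadoop's code: confirm_reformat is a
hypothetical function that only mimics the behaviour reported in this
thread, where a capital Y confirms and any other answer aborts.

```shell
# Hypothetical stand-in for the "(Y or N)" prompt; not part of Hadoop.
confirm_reformat() {
    # $1 = the answer typed at the prompt
    case "$1" in
        Y) echo "Format confirmed" ;;   # only an exact capital Y is accepted
        *) echo "Format aborted" ;;     # lowercase y, N, or anything else
    esac
}

confirm_reformat y
confirm_reformat Y
```

The first call mirrors what happened in the log quoted in this thread,
where a lowercase y led to "Format aborted".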


-- 
Nitin Pawar

Re: Variable resolution Fails

Posted by Naidu MS <sa...@gmail.com>.

Hi, I have two questions regarding HDFS and the jps utility.

I am new to Hadoop and started learning it this past week.

1. Whenever I run start-all.sh and then jps in the console, it shows the
processes that were started:

naidu@naidu:~/work/hadoop-1.0.4/bin$ jps
22283 NameNode
23516 TaskTracker
26711 Jps
22541 DataNode
23255 JobTracker
22813 SecondaryNameNode
Could not synchronize with target

But along with the list of started processes, the jps output always shows
"Could not synchronize with target". What does "Could not synchronize with
target" mean? Can someone explain why this is happening?


2. Is it possible to format the namenode multiple times? When I enter the
namenode -format command, it does not format the namenode and shows the
following output:

naidu@naidu:~/work/hadoop-1.0.4/bin$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

13/05/01 12:08:04 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = naidu/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
Re-format filesystem in /home/naidu/dfs/namenode ? (Y or N) y
Format aborted in /home/naidu/dfs/namenode
13/05/01 12:08:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at naidu/127.0.0.1

************************************************************/

Can someone help me understand this? Why is it not possible to format the
namenode multiple times?


Re: Variable resolution Fails

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.

+1, agreed.

Also, as a general scripting practice, I check that the variables I am going to use are non-empty before using them. This is nothing specific to Hive scripts.

if [ -z "${freq}" ]
then
   echo "variable freq is empty…exiting"
   exit 1
fi
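
The same check can be wrapped in a small reusable function. This is only
a sketch: require_nonempty is a hypothetical helper name, not something
from this thread. It quotes the value and uses -z, because an unquoted
test such as [ $freq == "" ] collapses to [ == "" ] when freq is empty,
and the shell then reports a syntax error instead of running the check.

```shell
# Hypothetical helper: $1 is the variable's name (used in the message),
# $2 is its current value.
require_nonempty() {
    if [ -z "$2" ]; then
        echo "variable $1 is empty...exiting" >&2
        return 1
    fi
    return 0
}

freq=MNTH
# Only build the query when freq actually holds a value.
if require_nonempty freq "$freq"; then
    echo "freq is set to $freq"
fi
```

In a script, the call would typically be followed by "|| exit 1" so that
nothing runs with an empty variable.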



CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Variable resolution Fails

Posted by Anthony Urso <an...@cs.ucla.edu>.

Your shell is expanding ${env:freq} before hive ever sees it. No such
variable exists in the shell's environment, so hive gets the empty string
in that place. If you always intend to run your query like this, just use
${freq}, which bash will expand as expected and then pass to hive.
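
A minimal sketch of the quoting difference, using printf in place of hive
so it runs without a Hive install. The names single_quoted and
double_quoted are only for illustration. Inside double quotes the shell
performs its own ${...} expansion before hive starts. Inside single
quotes the text reaches hive untouched, and Hive's own variable
substitution (assuming it is enabled) can then resolve ${env:freq}.

```shell
export freq=MNTH

# Single quotes: the shell passes ${env:freq} through literally,
# leaving it for hive to resolve from the environment.
single_quoted='select "${env:freq}" as dr from dual'

# Double quotes with a plain shell variable: the shell substitutes
# the value itself, so hive receives a finished query string.
double_quoted="select '${freq}' as dr from dual"

printf '%s\n' "$single_quoted"
printf '%s\n' "$double_quoted"

# With Hive installed, the corresponding calls would be:
#   hive -S -e "$single_quoted"
#   hive -S -e "$double_quoted"
```

Printing the query string first is also a quick way to see exactly what
hive will be handed.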

Cheers,
Anthony

Variable resolution Fails

Posted by sumit ghosh <su...@yahoo.com>.

Hi,
 
The following variable freq fails to resolve:
 
bash-4.1$ export freq=MNTH
bash-4.1$ echo $freq
MNTH
bash-4.1$ hive -e "select ${env:freq} as dr  from dual"
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/hadoop1/hive_querylog/sumighos/hive_job_log_sumighos_201304302321_1867815625.txt
FAILED: ParseException line 1:8 cannot recognize input near 'as' 'dr' 'from' in select clause 
bash-4.1$

Here dual is a table with 1 row.
What am I doing wrong? When I try to resolve freq, it is empty!
 
  $ hive -S -e "select '${env:freq}' as dr  from dual"
 
  $
 
Thanks,
Sumit

Re: Can a bucket be added to a partition?

Posted by Nitin Pawar <ni...@gmail.com>.

You can add buckets to the partitions; there is no problem with that.

But to get a bucketed map join, both tables need to be bucketed, and
their bucket counts need to be multiples of each other. For example, if
table A has X buckets, then table B needs N*X buckets, where N >= 1.

There is no condition on partition keys for the join condition. Hive only
supports equi-joins, so it is always a good idea to have the tables
partitioned on the same column. That way you don't have to scan the entire
table to match the column values, and you can restrict the data read from
the table in the WHERE condition.
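
On the first point, adding bucketing to an existing partitioned table, a
minimal sketch of the DDL is shown below, built as a shell string so it
can be inspected before it is run. The table name events, the columns
user_id and session_id, and the bucket count of 32 are all hypothetical;
only the ALTER TABLE ... CLUSTERED BY ... INTO n BUCKETS form comes from
Hive's DDL.

```shell
# Hypothetical names: table "events", columns "user_id" and "session_id",
# and a bucket count of 32.
ddl=$(cat <<'EOF'
ALTER TABLE events CLUSTERED BY (user_id, session_id) INTO 32 BUCKETS;
EOF
)

printf '%s\n' "$ddl"

# With Hive installed, the statement could be applied with:
#   hive -e "$ddl"
```

As far as I know, this changes table metadata only: existing partitions
and their files are left as they were, and only partitions written
afterwards pick up the bucketing. On Hive versions of this era, inserts
also need hive.enforce.bucketing set to true for the data to actually be
written into buckets.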


-- 
Nitin Pawar

Re: Can a bucket be added to a partition?

Posted by Jie Li <ji...@cs.duke.edu>.

I tried this interesting idea but also found it a little confusing.

I guess you'll need to change the table schema so that it has both buckets
and partitions.

And to take advantage of the buckets inside the partitions, for example
with a bucket map join, you'll need to specify one particular partition
of the table. It seems HIVE-3171 has fixed this problem, but I'm still not
very clear on how two partitioned tables can be joined using a bucket map
join. Do they need the same partition keys and bucket keys, so that Hive
will do a partition-wise join as well as a bucket-wise join?

Jie
