Posted to user@hive.apache.org by Divya Gehlot <di...@gmail.com> on 2015/12/30 11:43:41 UTC

How does Hive partitioning work?

Hi,
I am new to Hive and am trying to understand Hive partitioning.
My files are in CSV format.
These are the steps I followed:
CREATE EXTERNAL TABLE IF NOT EXISTS loan_depo_part (
  COLUMN1  String, COLUMN2  String, COLUMN3  String,
  COLUMN4  String, COLUMN5  String, COLUMN6  String,
  COLUMN7  Int,    COLUMN8  Int,    COLUMN9  String,
  COLUMN10 String, COLUMN11 String, COLUMN12 String,
  COLUMN13 String, COLUMN14 String, COLUMN15 String,
  COLUMN16 String, COLUMN17 String, COLUMN18 String,
  COLUMN19 String, COLUMN20 String, COLUMN21 String,
  COLUMN22 String
)
COMMENT 'testing Partition'
PARTITIONED BY (Year String, Month String, Day String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");



ALTER TABLE loan_depo_part ADD IF NOT EXISTS
  PARTITION (Year=2015, Month=01, Day=01);

ALTER TABLE loan_depo_part PARTITION (Year=2015, Month=01, Day=01)
  SET LOCATION
  'hdfs://namenode:8020/tmp/TestDivya/HiveInput/year=2015/month=01/day=01/';

Whereas my HDFS data location is
/TestDivya/HiveInput/year=2015/month=01/day=01/
I have a few questions regarding the above partitioning:
1. The table gets created, but when I run the second step and then a SELECT,
it doesn't display any data.
2. Do I need to create a normal external table first and the partitioned one
next, and then do an INSERT OVERWRITE (something like the sketch just below)?
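
(Just to show the shape of what I mean in question 2 -- the table and column
names below are made up for illustration, with only two data columns shown:)

-- Hypothetical non-partitioned staging table over the raw CSV files
CREATE EXTERNAL TABLE IF NOT EXISTS loan_depo_raw (
  col1 String,
  col2 String
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/TestDivya/HiveInput/'
TBLPROPERTIES ("skip.header.line.count"="1");

-- Hypothetical partitioned table with the same data columns
-- (managed here just to keep the sketch short)
CREATE TABLE IF NOT EXISTS loan_depo_by_day (
  col1 String,
  col2 String
)
PARTITIONED BY (year String, month String, day String)
STORED AS TEXTFILE;

-- Copy one day's data from the staging table into one partition
INSERT OVERWRITE TABLE loan_depo_by_day PARTITION (year='2015', month='01', day='01')
SELECT col1, col2
FROM   loan_depo_raw;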

Basically, I am not able to understand the partitioning steps mentioned above.
I followed this link:
<http://deanwampler.github.io/polyglotprogramming/papers/Hive-SQLforHadoop.pdf>

I would really appreciate any help/pointers.

Thanks,
Divya

RE: How does Hive partitioning work?

Posted by Divya Gehlot <di...@gmail.com>.
My apologies for the confusion.
The steps which I followed are:
1. I created the external table with partitioning and gave the HDFS location
of the data (the data files definitely reside there).
My external table location (input data directory structure) looks like
below:
/tmp/TestDivya/HiveInput/"many CSV files"
2. As I want to implement partitioning on a daily basis, I have used
ALTER TABLE ... ADD PARTITION (year=2015, month=01, day=01) SET LOCATION
'/tmp/Hiveinput/2015/01/01'

Now my confusion arises when I execute the ALTER statement:
1. As I already gave the location while creating the external table, how
does the above ALTER TABLE statement work?
Which data will get loaded here, as I have so many CSV files inside my given
input directory?
Sorry for the naive question,
but I am trying to understand how the ALTER TABLE statement works,
as I will be executing only the two steps above:
1. Creating the external table with a location and partitioning.
2. Altering the table by adding a partition and giving a location (a
one-statement sketch of this is just below).
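
(For reference, I believe the ADD PARTITION and SET LOCATION pair from my
first mail can also be written as one statement -- a sketch using the same
paths:)

ALTER TABLE loan_depo_part ADD IF NOT EXISTS
  PARTITION (Year=2015, Month=01, Day=01)
  LOCATION 'hdfs://namenode:8020/tmp/TestDivya/HiveInput/year=2015/month=01/day=01/';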

I do understand loading the input data from a particular location into the
partition.

But I am still quite confused by the ALTER TABLE statement.

Once again, thanks a lot for your help and guidance.
Thanks,
Divya

Re: How does Hive partitioning work?

Posted by Divya Gehlot <di...@gmail.com>.
Thanks, Mich, for such a good explanation.
I have saved your email in my notes.


Thanks,
Divya


RE: How does Hive partitioning work?

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Hi,

 

I gave a generic reply on partitioning and bucketing in Hive back in April for a similar question; have a look at it below. Here we go. Hope it helps.

 

 

As you may already know, in an RDBMS, partitioning (conceptually dividing a very large table into sub-tables) is deployed to address three areas:

 

1.     Availability -- each partition can reside on a different tablespace/device, so a problem with a tablespace/device will take out only a slice of the table's data instead of the whole thing. This does not really apply to Hive, which has 3-way block replication as standard.

2.     Manageability -- partitioning provides a mechanism for splitting whole-table jobs into clear batches. Partition exchange can make it easier to bulk load data. Defragging, moving older partitions to lower-tier storage, updating stats, etc. Most of these benefits apply to Hive as well; please check the docs.

3.     Performance -- partition elimination 

 

In its simplest form (excluding composite partitioning), Hive partitioning is similar to “range partitioning” in an RDBMS. One can partition a table (say partitioned_table as shown below, which is batch loaded from non_partitioned_table) by country, year, month, etc. Each partition will be stored by Hive under its own sub-directory (here country/year/month), like below:

 

/user/hive/warehouse/scratchpad.db/partitioned_table/country=Italy/year=2014/month=Feb
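
As a minimal sketch of such a set-up (the data columns and their types below are only illustrative, and I am assuming non_partitioned_table carries the same columns plus country/year/month), the partitioned table and a batch load of one slice might look like this:

CREATE TABLE partitioned_table (
  id     INT,
  amount DOUBLE
)
PARTITIONED BY (country STRING, year STRING, month STRING)
STORED AS TEXTFILE;

-- Each distinct (country, year, month) combination lands in its own
-- sub-directory under the table's warehouse directory, as shown above.
INSERT OVERWRITE TABLE partitioned_table
PARTITION (country='Italy', year='2014', month='Feb')
SELECT id, amount
FROM   non_partitioned_table
WHERE  country = 'Italy' AND year = '2014' AND month = 'Feb';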

 

Hive does not yet have the concept of indexes, local or global. So without partitioning, a simple query in Hive has to read the entire table even if it is filtering for a smaller result set (via the WHERE clause). This becomes a bottleneck for running multiple MapReduce jobs over a large table. Partitioning helps localise the query by hitting only the relevant sub-directory or sub-directories. There is another important aspect with Hive as well: the locking granularity is determined by the lowest slice in the file system (the sub-directory). So entering data into the above partition/file will take an exclusive lock on that partition/file, but crucially the rest of the partitions will remain available (assuming concurrency in Hive is enabled).
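
To make the partition elimination point concrete: against the sketch above, a query such as the one below would only need to read the files under country=Italy/year=2014/month=Feb rather than scanning the whole table.

SELECT count(*)
FROM   partitioned_table
WHERE  country = 'Italy'
AND    year    = '2014'
AND    month   = 'Feb';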

 

+----------+-------------+------------------------+------------------------------------+-------------+--------------+-----------------+-----------------+----------------+---------+-----------+--+

|  lockid  |  database   |         table          |             partition              | lock_state  |  lock_type   | transaction_id  | last_heartbeat  |  acquired_at   |  user   | hostname  |

+----------+-------------+------------------------+------------------------------------+-------------+--------------+-----------------+-----------------+----------------+---------+-----------+--+

| Lock ID  | Database    | Table                  | Partition                          | State       | Type         | Transaction ID  | Last Hearbeat   | Acquired At    | User    | Hostname  |

| 1711     | scratchpad  | non_partitioned_table  | NULL                               | ACQUIRED    | SHARED_READ  | NULL            | 1428862154670   | 1428862151904  | hduser  | rhes564   |

| 1711     | scratchpad  | partitioned_table      | country=Italy/year=2014/month=Feb  | ACQUIRED    | EXCLUSIVE    | NULL            | 1428862154670   | 1428862151905  | hduser  | rhes564   |

+----------+-------------+------------------------+------------------------------------+-------------+--------------+-----------------+-----------------+----------------+---------+-----------+--+

 

Now to your point 2: bucketing in Hive refers to hash partitioning, where a hashing function is applied. Like an RDBMS, Hive will apply a linear hashing algorithm to prevent data from clustering within specific partitions. Hashing is very effective if the column selected for bucketing has very high selectivity, like an ID column where selectivity (select count(distinct(column))/count(column)) = 1. In this case the created partitions/files will be as evenly sized as possible. In a nutshell, bucketing is a method to get data evenly distributed over many partitions/files. One should define the number of buckets as a power of two -- 2^n, like 2, 4, 8, 16, etc. -- to achieve best results. Again, bucketing will help concurrency in Hive. It may even allow a partition-wise join, i.e. a join between two tables that are bucketed on the same column with the same number of buckets (has anyone tried this?)
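
A rough sketch of a bucketed table along those lines (names are illustrative, and the SET below is what I would expect to need on older Hive releases so that inserts honour the bucket definition):

-- Check how selective the candidate column is (values close to 1 are good)
SELECT count(DISTINCT id) / count(id) FROM partitioned_table;

CREATE TABLE bucketed_table (
  id     INT,
  amount DOUBLE
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;

SET hive.enforce.bucketing = true;

INSERT OVERWRITE TABLE bucketed_table
SELECT id, amount
FROM   partitioned_table;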

 

One more thing: when one defines the number of buckets at table creation time in Hive, the number of partitions/files is fixed. In contrast, with partitioning you do not have this limitation (see the sketch below).
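
For instance, with dynamic partition inserts new partitions keep appearing as new key values arrive, while the bucket count above stays fixed at 8. A sketch, using the same illustrative names and the usual dynamic partition settings:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partition values are taken from the last three columns of the SELECT,
-- so any new (country, year, month) combination creates a new sub-directory.
INSERT OVERWRITE TABLE partitioned_table PARTITION (country, year, month)
SELECT id, amount, country, year, month
FROM   non_partitioned_table;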

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 
