Posted to common-user@hadoop.apache.org by Kumar Jayapal <kj...@gmail.com> on 2015/04/30 00:26:33 UTC

how to load data

Hello All,


I have this table


CREATE  TABLE DBCLOC(
   BLwhse int COMMENT 'DECIMAL(5,0) Whse',
   BLsdat string COMMENT 'DATE Sales Date',
   BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
   BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
   BLscnr string COMMENT 'CHAR(1) Scenario',
   BLareq string COMMENT 'CHAR(1) Act Requested',
   BLatak string COMMENT 'CHAR(1) Act Taken',
   BLmsgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
STORED AS PARQUET;

I have to load data from the HDFS location /extract/DBCLOC/DBCL0301P.csv.gz into
the table above.


Can anyone tell me the most efficient way of doing this?


Thanks
Jay

Re: how to load data

Posted by Alexander Pivovarov <ap...@gmail.com>.
Follow the links I already sent you.
 On Apr 30, 2015 11:52 AM, "Kumar Jayapal" <kj...@gmail.com> wrote:

> Hi Alex,
>
> How do I create an "external textfile hive table pointing to /extract/DBCLOC and
> specify CSVSerde"?
>
> Thanks
> Jay
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create an external textfile Hive table pointing to /extract/DBCLOC and
>> specify the CSVSerde.
>>
>> If you are using Hive 0.14 or newer, use this:
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> If you are on Hive 0.13 or older, use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzip the file. Hive automatically decompresses the
>> data on select.
>>
>> 2. Run a simple query to load the data:
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE  TABLE DBCLOC(
>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can any one tell me what is the most efficient way of doing it.
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
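
A minimal sketch of the two steps quoted above, in case it helps: the staging
table name and the partition values are illustrative, and because OpenCSVSerde
(Hive 0.14+) reads every column as a string, the insert casts the columns that
the Parquet table declares as numeric.

-- Step 1: external CSV staging table over the existing HDFS directory.
-- Hive reads the .gz file directly; no need to decompress it first.
CREATE EXTERNAL TABLE dbcloc_csv (
   BLwhse string,
   BLsdat string,
   BLreg_num string,
   BLtrn_num string,
   BLscnr string,
   BLareq string,
   BLatak string,
   BLmsgc string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- Step 2: copy into the partitioned Parquet table (partition values are only an example).
INSERT OVERWRITE TABLE DBCLOC PARTITION (FSCAL_YEAR = 2015, FSCAL_PERIOD = 4)
SELECT CAST(BLwhse AS int), BLsdat, CAST(BLreg_num AS smallint),
       CAST(BLtrn_num AS int), BLscnr, BLareq, BLatak, BLmsgc
FROM dbcloc_csv;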

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
Hi Alex,

How do I create an "external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde"?

Thanks
Jay

On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <ap...@gmail.com>
wrote:

> 1. Create external textfile hive table pointing to /extract/DBCLOC and
> specify CSVSerde
>
> if using hive-0.14 and newer use this
> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>
> You do not even need to unzgip the file. hive automatically unzgip data on
> select.
>
> 2. run simple query to load data
> insert overwrite table <orc_table>
> select * from <csv_table>
>
> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> Hello All,
>>
>>
>> I have this table
>>
>>
>> CREATE  TABLE DBCLOC(
>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>    BLsdat string COMMENT 'DATE Sales Date',
>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>> STORED AS PARQUET;
>>
>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to the
>> table above
>>
>>
>> Can any one tell me what is the most efficient way of doing it.
>>
>>
>> Thanks
>> Jay
>>
>
>

Re: how to load data

Posted by Alexander Pivovarov <ap...@gmail.com>.
If your file is a CSV file, then the CREATE TABLE statement should specify the
CSVSerde. Look at the examples under the links I sent you.
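
For comparison, a staging table created without a CSV-aware row format falls
back to Hive's default field delimiter (Ctrl-A), so comma-separated lines are
not split into columns and typed columns can come back NULL. A minimal
delimited variant (table name illustrative; this is the simpler approach Nitin
suggests later in the thread, adequate only when fields contain no embedded
commas) might be:

CREATE EXTERNAL TABLE dbcloc_stage (
   BLwhse string, BLsdat string, BLreg_num string, BLtrn_num string,
   BLscnr string, BLareq string, BLatak string, BLmsgc string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';
-- Note: unlike the CSVSerde, this keeps the surrounding double quotes in the
-- string fields, e.g. "2003-02-03" rather than 2003-02-03.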

On Thu, Apr 30, 2015 at 10:23 PM, Kumar Jayapal <kj...@gmail.com>
wrote:

> Alex,
>
>
> I followed the same steps as mentioned on the site, and loaded data into the
> table created below.
>
>
>
> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
>  smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> When I run select * from raw, it shows all NULL values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the Parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex.
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzgip the file. hive automatically unzgip data
>> on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE  TABLE DBCLOC(
>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can any one tell me what is the most efficient way of doing it.
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>

Re: how to load data

Posted by Drake민영근 <dr...@nexr.com>.
Hi, Jay

It seems there is no jar for openCSV. Check your "hive/lib/opencsv-x.y.jar".

Thanks.

Drake 민영근 Ph.D
kt NexR
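
If the jar turns out to be missing from the job's classpath, one way to make it
available for the insert is to add it in the Hive session; the path and version
below are only a guess, so adjust them to whatever opencsv jar your
distribution ships:

-- Run before the INSERT; ADD JAR also ships the jar to the MapReduce tasks.
ADD JAR /usr/lib/hive/lib/opencsv-2.3.jar;
-- Alternatively, point hive.aux.jars.path at the jar so it is picked up automatically.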

On Mon, May 4, 2015 at 11:03 AM, Kumar Jayapal <kj...@gmail.com> wrote:

> Hi
>
> I have created a table as you said,
>
> CREATE  TABLE Seq1 (
>    d5whse int COMMENT 'DECIMAL(5,0) Whse',
>    d5sdat string COMMENT 'DATE Sales Date',
>    d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>    d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
>    d5scnr string COMMENT 'CHAR(1) Scenario',
>    d5areq string COMMENT 'CHAR(1) Act Requested',
>    d5atak string COMMENT 'CHAR(1) Act Taken',
>    d5msgc string COMMENT 'CHAR(3) Msg Code')
> PARTITIONED BY (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
> STORED AS TEXTFILE
>
> and it got successfully and I was able to insert the values into it with
> our " , and "" " now I have another issue I am not able to insert the
> values from this table to parque Seq2
>
>
> INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
> SELECT* FROM SEQ
>
> I get this error
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> 	... 8 more
> Caused by: java.lang
>
>
>
>
> Thanks
> Jay
>
> On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>>
>>
>> Hi,
>>
>> I have created the table as you said
>>
>>
>>
>>
>> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
>> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
>> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
>> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
>> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
>> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
>> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
>> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
>> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
>> 	... 8 more
>> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> 	... 14 more
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>> Jay
>>
>> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>>
>>> as Akex suggested, Please use row format in your query like
>>> CREATE  TABLE DBCLOC(....) ROW FORMAT DELIMITED   FIELDS TERMINATED BY
>>> ',' and give it a try
>>>
>>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Jay
>>>>
>>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>>> wrote:
>>>>
>>>>> Jay can you give first 3 lines of your gz file
>>>>>
>>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Alex,
>>>>>>
>>>>>>
>>>>>> I followed the same steps as mentioned in the site. Once I load data
>>>>>> into table which is create below
>>>>>>
>>>>>>
>>>>>>
>>>>>> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>>> (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
>>>>>> STORED AS TEXTFILE;
>>>>>>
>>>>>> and loaded it with data.
>>>>>>
>>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>>> TABLE raw;
>>>>>>
>>>>>>
>>>>>>
>>>>>> when I say select * from raw it shows all null values.
>>>>>>
>>>>>>
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> Why is not show showing the actual data in file. will it show once I
>>>>>> load it to parque table?
>>>>>>
>>>>>> Please let me know if I am doing anything wrong.
>>>>>>
>>>>>> I appreciate your help.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> jay
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thank you very much for you help Alex,
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>>> apivovarov@gmail.com> wrote:
>>>>>>
>>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>>> and specify CSVSerde
>>>>>>>
>>>>>>> if using hive-0.14 and newer use this
>>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>>
>>>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>>>> data on select.
>>>>>>>
>>>>>>> 2. run simple query to load data
>>>>>>> insert overwrite table <orc_table>
>>>>>>> select * from <csv_table>
>>>>>>>
>>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kjayapal17@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>>
>>>>>>>> I have this table
>>>>>>>>
>>>>>>>>
>>>>>>>> CREATE  TABLE DBCLOC(
>>>>>>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>>>>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>>>>>>> STORED AS PARQUET;
>>>>>>>>
>>>>>>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>>> to the table above
>>>>>>>>
>>>>>>>>
>>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Jay
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
Hi

I have created a table as you said,

CREATE  TABLE Seq1 (
   d5whse int COMMENT 'DECIMAL(5,0) Whse',
   d5sdat string COMMENT 'DATE Sales Date',
   d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
   d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
   d5scnr string COMMENT 'CHAR(1) Scenario',
   d5areq string COMMENT 'CHAR(1) Act Requested',
   d5atak string COMMENT 'CHAR(1) Act Taken',
   d5msgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE

and it was created successfully. I was able to insert the values into it, with
the "," and "\"" characters handled. Now I have another issue: I am not able to
insert the values from this table into the Parquet table Seq2.


INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT * FROM SEQ

I get this error


2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
	... 8 more
Caused by: java.lang




Thanks
Jay
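
Once the opencsv jar is on the job's classpath (for example via ADD JAR, as
noted in Drake's reply), the insert might look roughly like the sketch below.
It assumes seq2 has the same eight data columns as Seq1; with a static
PARTITION clause the SELECT list names only the data columns rather than
SELECT *, since Seq1's own partition columns would otherwise be included as
extra columns, and the casts are there because OpenCSVSerde reads every column
back as a string.

INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 6)
SELECT CAST(d5whse AS int),
       d5sdat,
       CAST(d5reg_num AS smallint),
       CAST(d5trn_num AS int),
       d5scnr, d5areq, d5atak, d5msgc
FROM Seq1;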

On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com> wrote:

>
>
> Hi,
>
> I have created the table as you said
>
>
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> 	... 8 more
> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	... 14 more
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> As Alex suggested, please use a row format in your query, like
>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>> ',' and give it a try.
>>
>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>>
>>>> Jay can you give first 3 lines of your gz file
>>>>
>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex,
>>>>>
>>>>>
>>>>> I followed the same steps as mentioned in the site. Once I load data
>>>>> into table which is create below
>>>>>
>>>>>
>>>>>
>>>>> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>> (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>> and loaded it with data.
>>>>>
>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>> TABLE raw;
>>>>>
>>>>>
>>>>>
>>>>> when I say select * from raw it shows all null values.
>>>>>
>>>>>
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> Why is not show showing the actual data in file. will it show once I
>>>>> load it to parque table?
>>>>>
>>>>> Please let me know if I am doing anything wrong.
>>>>>
>>>>> I appreciate your help.
>>>>>
>>>>>
>>>>> Thanks
>>>>> jay
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for you help Alex,
>>>>>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>> apivovarov@gmail.com> wrote:
>>>>>
>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>> and specify CSVSerde
>>>>>>
>>>>>> if using hive-0.14 and newer use this
>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>
>>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>>> data on select.
>>>>>>
>>>>>> 2. run simple query to load data
>>>>>> insert overwrite table <orc_table>
>>>>>> select * from <csv_table>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I have this table
>>>>>>>
>>>>>>>
>>>>>>> CREATE  TABLE DBCLOC(
>>>>>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>>>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>>>>>> STORED AS PARQUET;
>>>>>>>
>>>>>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>> to the table above
>>>>>>>
>>>>>>>
>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jay
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
Hi

I have created a table as you said,

CREATE  TABLE Seq1 (
   d5whse int COMMENT 'DECIMAL(5,0) Whse',
   d5sdat string COMMENT 'DATE Sales Date',
   d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
   d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
   d5scnr string COMMENT 'CHAR(1) Scenario',
   d5areq string COMMENT 'CHAR(1) Act Requested',
   d5atak string COMMENT 'CHAR(1) Act Taken',
   d5msgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE

and it got successfully and I was able to insert the values into it with
our " , and "" " now I have another issue I am not able to insert the
values from this table to parque Seq2


INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT* FROM SEQ

I get this error


2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
	... 8 more
Caused by: java.lang
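
One common way past this NoClassDefFoundError is to make the opencsv jar
available in the same Hive session before re-running the insert. A minimal
sketch, assuming the jar sits at a local path such as /tmp/opencsv-2.3.jar
(path and version are placeholders, not confirmed in this thread), and naming
the eight data columns so the partition columns of Seq1 are not selected
along with them:

-- path and version are placeholders; use whatever opencsv jar matches your setup
ADD JAR /tmp/opencsv-2.3.jar;

-- re-run the load, listing the data columns explicitly
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT d5whse, d5sdat, d5reg_num, d5trn_num, d5scnr, d5areq, d5atak, d5msgc
FROM Seq1;

Note that OpenCSVSerde reads every column back as a string, so explicit CASTs
(for example CAST(d5whse AS INT)) may be needed if seq2 declares numeric
columns.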




Thanks
Jay

On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com> wrote:

>
>
> Hi,
>
> I have created the table as you said
>
>
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> 	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> 	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> 	... 8 more
> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	... 14 more
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> as Akex suggested, Please use row format in your query like
>> CREATE  TABLE DBCLOC(....) ROW FORMAT DELIMITED   FIELDS TERMINATED BY
>> ',' and give it a try
>>
>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>>
>>>> Jay can you give first 3 lines of your gz file
>>>>
>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex,
>>>>>
>>>>>
>>>>> I followed the same steps as mentioned in the site. Once I load data
>>>>> into table which is create below
>>>>>
>>>>>
>>>>>
>>>>> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>> (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>> and loaded it with data.
>>>>>
>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>> TABLE raw;
>>>>>
>>>>>
>>>>>
>>>>> when I say select * from raw it shows all null values.
>>>>>
>>>>>
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> Why is not show showing the actual data in file. will it show once I
>>>>> load it to parque table?
>>>>>
>>>>> Please let me know if I am doing anything wrong.
>>>>>
>>>>> I appreciate your help.
>>>>>
>>>>>
>>>>> Thanks
>>>>> jay
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for you help Alex,
>>>>>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>> apivovarov@gmail.com> wrote:
>>>>>
>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>> and specify CSVSerde
>>>>>>
>>>>>> if using hive-0.14 and newer use this
>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>
>>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>>> data on select.
>>>>>>
>>>>>> 2. run simple query to load data
>>>>>> insert overwrite table <orc_table>
>>>>>> select * from <csv_table>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I have this table
>>>>>>>
>>>>>>>
>>>>>>> CREATE  TABLE DBCLOC(
>>>>>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>>>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>>>>>> STORED AS PARQUET;
>>>>>>>
>>>>>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>> to the table above
>>>>>>>
>>>>>>>
>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jay
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
Hi,

I have created the table as you said, but I get the error below:




2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
	at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
	at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
	... 8 more
Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 14 more
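
If shipping the opencsv jar to the cluster is not an option, the quotes can
also be stripped in plain HiveQL: stage the file in a table declared with ROW
FORMAT DELIMITED (as Nitin suggests further down in the thread) and clean the
quoted fields with regexp_replace in the INSERT ... SELECT. A rough sketch,
assuming a staging table named dbcloc_csv with the same eight columns as
DBCLOC, and using 2003 / 06 only as example partition values:

-- quoted fields arrive with literal double quotes from a DELIMITED table,
-- so strip them while loading the Parquet table
INSERT INTO TABLE dbcloc PARTITION (FSCAL_YEAR = 2003, FSCAL_PERIOD = 06)
SELECT blwhse,
       regexp_replace(blsdat, '"', ''),
       blreg_num,
       bltrn_num,
       regexp_replace(blscnr, '"', ''),
       regexp_replace(blareq, '"', ''),
       regexp_replace(blatak, '"', ''),
       regexp_replace(blmsgc, '"', '')
FROM dbcloc_csv;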










Thanks
Jay

On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com> wrote:

> as Akex suggested, Please use row format in your query like
> CREATE  TABLE DBCLOC(....) ROW FORMAT DELIMITED   FIELDS TERMINATED BY ','
> and give it a try
>
> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> 106,"2003-02-03",20,2,"A","2","2","037"
>> 106,"2003-02-03",20,3,"A","2","2","037"
>> 106,"2003-02-03",8,2,"A","2","2","037"
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>> Jay
>>
>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>>
>>> Jay can you give first 3 lines of your gz file
>>>
>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> Alex,
>>>>
>>>>
>>>> I followed the same steps as mentioned in the site. Once I load data
>>>> into table which is create below
>>>>
>>>>
>>>>
>>>> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY
>>>> (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
>>>> STORED AS TEXTFILE;
>>>>
>>>> and loaded it with data.
>>>>
>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>> TABLE raw;
>>>>
>>>>
>>>>
>>>> when I say select * from raw it shows all null values.
>>>>
>>>>
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> Why is not show showing the actual data in file. will it show once I
>>>> load it to parque table?
>>>>
>>>> Please let me know if I am doing anything wrong.
>>>>
>>>> I appreciate your help.
>>>>
>>>>
>>>> Thanks
>>>> jay
>>>>
>>>>
>>>>
>>>> Thank you very much for you help Alex,
>>>>
>>>>
>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>> apivovarov@gmail.com> wrote:
>>>>
>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>> and specify CSVSerde
>>>>>
>>>>> if using hive-0.14 and newer use this
>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>
>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>> data on select.
>>>>>
>>>>> 2. run simple query to load data
>>>>> insert overwrite table <orc_table>
>>>>> select * from <csv_table>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>>
>>>>>> I have this table
>>>>>>
>>>>>>
>>>>>> CREATE  TABLE DBCLOC(
>>>>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>>>>> STORED AS PARQUET;
>>>>>>
>>>>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>>>>> the table above
>>>>>>
>>>>>>
>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Jay
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>

Re: how to load data

Posted by Nitin Pawar <ni...@gmail.com>.
As Alex suggested, please use a ROW FORMAT clause in your CREATE TABLE, for example
CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
and give it a try.
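
A minimal sketch of that suggestion spelled out for a staging table (EXTERNAL
and the LOCATION follow Alex's earlier advice to point the table at
/extract/DBCLOC; the name dbcloc_csv is just a placeholder):

CREATE EXTERNAL TABLE dbcloc_csv (
   blwhse int,
   blsdat string,
   blreg_num smallint,
   bltrn_num int,
   blscnr string,
   blareq string,
   blatak string,
   blmsgc string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

Keep in mind that ROW FORMAT DELIMITED does not strip the surrounding double
quotes from the quoted fields shown in the sample rows, so they either need to
be removed in the INSERT ... SELECT (for example with regexp_replace) or the
OpenCSVSerde route has to be made to work.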

On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com> wrote:

> 106,"2003-02-03",20,2,"A","2","2","037"
> 106,"2003-02-03",20,3,"A","2","2","037"
> 106,"2003-02-03",8,2,"A","2","2","037"
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Jay can you give first 3 lines of your gz file
>>
>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Alex,
>>>
>>>
>>> I followed the same steps as mentioned in the site. Once I load data
>>> into table which is create below
>>>
>>>
>>>
>>> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY
>>> (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
>>> STORED AS TEXTFILE;
>>>
>>> and loaded it with data.
>>>
>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
>>> raw;
>>>
>>>
>>>
>>> when I say select * from raw it shows all null values.
>>>
>>>
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> Why is not show showing the actual data in file. will it show once I
>>> load it to parque table?
>>>
>>> Please let me know if I am doing anything wrong.
>>>
>>> I appreciate your help.
>>>
>>>
>>> Thanks
>>> jay
>>>
>>>
>>>
>>> Thank you very much for you help Alex,
>>>
>>>
>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>> apivovarov@gmail.com> wrote:
>>>
>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>>>> specify CSVSerde
>>>>
>>>> if using hive-0.14 and newer use this
>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>
>>>> You do not even need to unzgip the file. hive automatically unzgip data
>>>> on select.
>>>>
>>>> 2. run simple query to load data
>>>> insert overwrite table <orc_table>
>>>> select * from <csv_table>
>>>>
>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>>
>>>>> I have this table
>>>>>
>>>>>
>>>>> CREATE  TABLE DBCLOC(
>>>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>>>> STORED AS PARQUET;
>>>>>
>>>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>>>> the table above
>>>>>
>>>>>
>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Jay
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
106,"2003-02-03",20,2,"A","2","2","037"
106,"2003-02-03",20,3,"A","2","2","037"
106,"2003-02-03",8,2,"A","2","2","037"










Thanks
Jay

On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
wrote:

> Jay can you give first 3 lines of your gz file
>
> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> Alex,
>>
>>
>> I followed the same steps as mentioned in the site. Once I load data into
>> table which is create below
>>
>>
>>
>> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY
>> (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
>> STORED AS TEXTFILE;
>>
>> and loaded it with data.
>>
>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
>> raw;
>>
>>
>>
>> when I say select * from raw it shows all null values.
>>
>>
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> Why is it not showing the actual data in the file? Will it show once I load
>> it into the parquet table?
>>
>> Please let me know if I am doing anything wrong.
>>
>> I appreciate your help.
>>
>>
>> Thanks
>> jay
>>
>>
>>
>> Thank you very much for your help, Alex,
>>
>>
>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>> apivovarov@gmail.com> wrote:
>>
>>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>>> specify CSVSerde
>>>
>>> if using hive-0.14 and newer use this
>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>
>>> You do not even need to gunzip the file. Hive automatically decompresses
>>> gzipped data on select.
>>>
>>> 2. run simple query to load data
>>> insert overwrite table <orc_table>
>>> select * from <csv_table>
>>>
>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>>
>>>> I have this table
>>>>
>>>>
>>>> CREATE  TABLE DBCLOC(
>>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>>> STORED AS PARQUET;
>>>>
>>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>>> the table above
>>>>
>>>>
>>>> Can anyone tell me what is the most efficient way of doing it?
>>>>
>>>>
>>>> Thanks
>>>> Jay
>>>>
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

Re: how to load data

Posted by Nitin Pawar <ni...@gmail.com>.
Jay can you give first 3 lines of your gz file

On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com> wrote:

> Alex,
>
>
> I followed the same steps as mentioned on the site. Once I load data into
> the table which is created below
>
>
>
> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
>  smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex,
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to gunzip the file. Hive automatically decompresses
>> gzipped data on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE  TABLE DBCLOC(
>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can anyone tell me what is the most efficient way of doing it?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>


-- 
Nitin Pawar

Re: how to load data

Posted by Alexander Pivovarov <ap...@gmail.com>.
If your file is a CSV file then the CREATE TABLE statement should specify
CSVSerde - look at the examples under the links I sent you.
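
For example, something along these lines (just a sketch - the dbcloc_csv name is
made up, it assumes the built-in OpenCSVSerde that ships with Hive 0.14+, and the
serde properties shown should match its defaults anyway):

-- external table over the existing csv.gz files; OpenCSVSerde reads every column as string
CREATE EXTERNAL TABLE dbcloc_csv (
  blwhse string, blsdat string, blreg_num string, bltrn_num string,
  blscnr string, blareq string, blatak string, blmsgc string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = '"')
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';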

On Thu, Apr 30, 2015 at 10:23 PM, Kumar Jayapal <kj...@gmail.com>
wrote:

> Alex,
>
>
> I followed the same steps as mentioned on the site. Once I load data into
> the table which is created below
>
>
>
> Created table  CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
>  smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex,
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to gunzip the file. Hive automatically decompresses
>> gzipped data on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE  TABLE DBCLOC(
>>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>    BLsdat string COMMENT 'DATE Sales Date',
>>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can anyone tell me what is the most efficient way of doing it?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
Alex,


I followed the same steps as mentioned on the site. Once I load data into
the table which is created below



Created table  CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
 smallint, FISCAL_PERIOD smallint)
STORED AS TEXTFILE;

and loaded it with data.

LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;



when I say select * from raw it shows all null values.


NULLNULLNULLNULLNULLNULLNULLNULL
NULLNULLNULLNULLNULLNULLNULLNULL
NULLNULLNULLNULLNULLNULLNULLNULL
NULLNULLNULLNULLNULLNULLNULLNULL
Why is it not showing the actual data in the file? Will it show once I load
it into the parquet table?

Please let me know if I am doing anything wrong.

I appreciate your help.


Thanks
jay



Thank you very much for your help, Alex,


On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <ap...@gmail.com>
wrote:

> 1. Create external textfile hive table pointing to /extract/DBCLOC and
> specify CSVSerde
>
> if using hive-0.14 and newer use this
> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>
> You do not even need to gunzip the file. Hive automatically decompresses
> gzipped data on select.
>
> 2. run simple query to load data
> insert overwrite table <orc_table>
> select * from <csv_table>
>
> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> Hello All,
>>
>>
>> I have this table
>>
>>
>> CREATE  TABLE DBCLOC(
>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>    BLsdat string COMMENT 'DATE Sales Date',
>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>> STORED AS PARQUET;
>>
>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to the
>> table above
>>
>>
>> Can anyone tell me what is the most efficient way of doing it?
>>
>>
>> Thanks
>> Jay
>>
>
>

Re: how to load data

Posted by Kumar Jayapal <kj...@gmail.com>.
Hi Alex,

How do I create an "external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde"?

Thanks
Jay

On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <ap...@gmail.com>
wrote:

> 1. Create external textfile hive table pointing to /extract/DBCLOC and
> specify CSVSerde
>
> if using hive-0.14 and newer use this
> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>
> You do not even need to gunzip the file. Hive automatically decompresses
> gzipped data on select.
>
> 2. run simple query to load data
> insert overwrite table <orc_table>
> select * from <csv_table>
>
> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> Hello All,
>>
>>
>> I have this table
>>
>>
>> CREATE  TABLE DBCLOC(
>>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>    BLsdat string COMMENT 'DATE Sales Date',
>>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>    BLscnr string COMMENT 'CHAR(1) Scenario',
>>    BLareq string COMMENT 'CHAR(1) Act Requested',
>>    BLatak string COMMENT 'CHAR(1) Act Taken',
>>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
>> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
>> STORED AS PARQUET;
>>
>> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to the
>> table above
>>
>>
>> Can anyone tell me what is the most efficient way of doing it?
>>
>>
>> Thanks
>> Jay
>>
>
>

Re: how to load data

Posted by Alexander Pivovarov <ap...@gmail.com>.
1. Create external textfile hive table pointing to /extract/DBCLOC and
specify CSVSerde

if using hive-0.14 and newer use this
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
if hive-0.13 and older use https://github.com/ogrodnek/csv-serde

You do not even need to gunzip the file. Hive automatically decompresses
gzipped data on select.

2. run simple query to load data
insert overwrite table <orc_table>
select * from <csv_table>
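
For this particular table it could look roughly like this (a sketch only - the
dbcloc_csv name and the 2015/4 partition values are placeholders, and it assumes
the built-in OpenCSVSerde of Hive 0.14+, which returns every column as string,
hence the casts):

-- step 1: external csv table over the files already in hdfs
CREATE EXTERNAL TABLE dbcloc_csv (
  blwhse string, blsdat string, blreg_num string, bltrn_num string,
  blscnr string, blareq string, blatak string, blmsgc string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- step 2: load one partition of the parquet table from it
INSERT OVERWRITE TABLE DBCLOC PARTITION (FSCAL_YEAR = 2015, FSCAL_PERIOD = 4)
SELECT cast(blwhse as int), blsdat, cast(blreg_num as smallint),
       cast(bltrn_num as int), blscnr, blareq, blatak, blmsgc
FROM dbcloc_csv;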

On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com> wrote:

> Hello All,
>
>
> I have this table
>
>
> CREATE  TABLE DBCLOC(
>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>    BLsdat string COMMENT 'DATE Sales Date',
>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>    BLscnr string COMMENT 'CHAR(1) Scenario',
>    BLareq string COMMENT 'CHAR(1) Act Requested',
>    BLatak string COMMENT 'CHAR(1) Act Taken',
>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
> STORED AS PARQUET;
>
> have to load from hdfs location  /extract/DBCLOC/DBCL0301P.csv.gz to the
> table above
>
>
> Can anyone tell me what is the most efficient way of doing it?
>
>
> Thanks
> Jay
>
