Posted to mapreduce-user@hadoop.apache.org by Kumar Jayapal <kj...@gmail.com> on 2015/05/01 07:23:42 UTC
Re: how to load data
Alex,
I followed the same steps as mentioned on the site. I loaded data into the
table created below:
CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
smallint, FISCAL_PERIOD smallint)
STORED AS TEXTFILE;
and loaded it with data.
LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
When I run select * from raw, it shows all NULL values:
NULLNULLNULLNULLNULLNULLNULLNULL
NULLNULLNULLNULLNULLNULLNULLNULL
NULLNULLNULLNULLNULLNULLNULLNULL
NULLNULLNULLNULLNULLNULLNULLNULL
Why is it not showing the actual data in the file? Will it show once I load
it into the Parquet table?
Please let me know if I am doing anything wrong.
I appreciate your help.
Thanks
jay
Thank you very much for your help, Alex.
On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <ap...@gmail.com>
wrote:
> 1. Create external textfile hive table pointing to /extract/DBCLOC and
> specify CSVSerde
>
> if using hive-0.14 and newer use this
> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>
> You do not even need to unzip the file. Hive automatically unzips the data
> on select.
>
> 2. run simple query to load data
> insert overwrite table <orc_table>
> select * from <csv_table>
>
> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> Hello All,
>>
>>
>> I have this table
>>
>>
>> CREATE TABLE DBCLOC(
>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>> BLsdat string COMMENT 'DATE Sales Date',
>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>> BLscnr string COMMENT 'CHAR(1) Scenario',
>> BLareq string COMMENT 'CHAR(1) Act Requested',
>> BLatak string COMMENT 'CHAR(1) Act Taken',
>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>> STORED AS PARQUET;
>>
>> I have to load data from the HDFS location /extract/DBCLOC/DBCL0301P.csv.gz
>> into the table above.
>>
>>
>> Can anyone tell me the most efficient way of doing it?
>>
>>
>> Thanks
>> Jay
>>
>
>
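Alexander's two steps can be sketched in HiveQL as follows. This is only a sketch: the column names come from the DBCLOC table in the thread, but the staging table name, the partition values, and the casts are illustrative assumptions. Note that the OpenCSVSerde exposes every column as a string, so the SELECT casts back to the target types.

```sql
-- Step 1: external text table over the existing CSV files in HDFS.
-- Hive reads the .csv.gz files transparently on select.
-- All columns are STRING because the OpenCSVSerde only produces strings.
CREATE EXTERNAL TABLE dbcloc_csv (
  blwhse STRING, blsdat STRING, blreg_num STRING, bltrn_num STRING,
  blscnr STRING, blareq STRING, blatak STRING, blmsgc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- Step 2: one query converts into the partitioned Parquet table,
-- casting the string columns to the declared types.
-- The partition values here are illustrative.
INSERT OVERWRITE TABLE dbcloc PARTITION (fscal_year = 2003, fscal_period = 6)
SELECT CAST(blwhse AS INT), blsdat, CAST(blreg_num AS SMALLINT),
       CAST(bltrn_num AS INT), blscnr, blareq, blatak, blmsgc
FROM dbcloc_csv;
```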
Re: how to load data
Posted by Alexander Pivovarov <ap...@gmail.com>.
If your file is a CSV file, the CREATE TABLE statement should specify the
CSVSerde; look at the examples under the links I sent you.
Re: how to load data
Posted by Drake민영근 <dr...@nexr.com>.
Hi, Jay
It seems the openCSV jar is missing. Check for "hive/lib/opencsv-x.y.jar".
Thanks.
Drake 민영근 Ph.D
kt NexR
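If the jar is indeed missing from hive/lib, one common workaround is to add it to the session explicitly so the MapReduce tasks can load it. This is a sketch: the jar path and version below are assumptions, so adjust them to wherever opencsv actually lives on the cluster.

```sql
-- Make the opencsv classes visible to this Hive session and its
-- MapReduce tasks; path and version are assumed, adjust as needed.
ADD JAR /usr/lib/hive/lib/opencsv-2.3.jar;

-- Then re-run the conversion query that failed.
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 6)
SELECT * FROM seq1;
```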
On Mon, May 4, 2015 at 11:03 AM, Kumar Jayapal <kj...@gmail.com> wrote:
> Hi
>
> I have created a table as you said,
>
> CREATE TABLE Seq1 (
> d5whse int COMMENT 'DECIMAL(5,0) Whse',
> d5sdat string COMMENT 'DATE Sales Date',
> d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
> d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
> d5scnr string COMMENT 'CHAR(1) Scenario',
> d5areq string COMMENT 'CHAR(1) Act Requested',
> d5atak string COMMENT 'CHAR(1) Act Taken',
> d5msgc string COMMENT 'CHAR(3) Msg Code')
> PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
> WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
> STORED AS TEXTFILE;
>
> It was created successfully, and I was able to load values into it with the
> "," separator and the quoted fields handled correctly. Now I have another
> issue: I am not able to insert the values from this table into the Parquet
> table Seq2.
>
>
> INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
> SELECT * FROM Seq1;
>
> I get this error
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: java.lang
>
>
>
>
> Thanks
> Jay
>
> On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>>
>>
>> Hi,
>>
>> I have created the table as you said, but I get this error:
>>
>>
>>
>>
>> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
>> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
>> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
>> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
>> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
>> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
>> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
>> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
>> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
>> ... 8 more
>> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> ... 14 more
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>> Jay
>>
>> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>>
>>> As Alex suggested, please use a row format in your query, like:
>>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>>> ',' and give it a try.
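For the sample rows Jay posted, Nitin's simpler variant would look like the sketch below (table and column names are illustrative). One caveat worth noting: plain FIELDS TERMINATED BY ',' does not strip the double quotes around quoted fields, so values like "2003-02-03" keep their quotes; the CSVSerde variant handles quoting.

```sql
-- Simpler delimited-text variant: splits on commas only and does
-- not strip the surrounding double quotes from quoted fields.
CREATE TABLE dbcloc_raw (
  blwhse INT,
  blsdat STRING,
  blreg_num SMALLINT,
  bltrn_num INT,
  blscnr STRING,
  blareq STRING,
  blatak STRING,
  blmsgc STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```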
>>>
>>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Jay
>>>>
>>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>>> wrote:
>>>>
>>>>> Jay, can you give the first 3 lines of your gz file?
>>>>>
>> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
>> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
>> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
>> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
>> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
>> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
>> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
>> ... 8 more
>> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> ... 14 more
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>> Jay
>>
>> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>>
>>> as Alex suggested, please use ROW FORMAT in your query, like
>>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>>> ',' and give it a try
>>>
>>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Jay
>>>>
>>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>>> wrote:
>>>>
>>>>> Jay can you give first 3 lines of your gz file
>>>>>
>>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Alex,
>>>>>>
>>>>>>
>>>>>> I followed the same steps as mentioned on the site. Once I load data
>>>>>> into the table, which is created below
>>>>>>
>>>>>>
>>>>>>
>>>>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>>>> STORED AS TEXTFILE;
>>>>>>
>>>>>> and loaded it with data.
>>>>>>
>>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>>> TABLE raw;
>>>>>>
>>>>>>
>>>>>>
>>>>>> when I say select * from raw it shows all null values.
>>>>>>
>>>>>>
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>>> Why is it not showing the actual data in the file? Will it show once
>>>>>> I load it into the Parquet table?
>>>>>>
>>>>>> Please let me know if I am doing anything wrong.
>>>>>>
>>>>>> I appreciate your help.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> jay
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thank you very much for your help Alex,
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>>> apivovarov@gmail.com> wrote:
>>>>>>
>>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>>> and specify CSVSerde
>>>>>>>
>>>>>>> if using hive-0.14 and newer use this
>>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>>
>>>>>>> You do not even need to gunzip the file. Hive automatically
>>>>>>> decompresses the data on select.
>>>>>>>
>>>>>>> 2. run simple query to load data
>>>>>>> insert overwrite table <orc_table>
>>>>>>> select * from <csv_table>
>>>>>>>
>>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kjayapal17@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>>
>>>>>>>> I have this table
>>>>>>>>
>>>>>>>>
>>>>>>>> CREATE TABLE DBCLOC(
>>>>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>>>> STORED AS PARQUET;
>>>>>>>>
>>>>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>>> to the table above
>>>>>>>>
>>>>>>>>
>>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Jay
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Nitin Pawar
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
Re: how to load data
Posted by Kumar Jayapal <kj...@gmail.com>.
Hi
I have created a table as you said,
CREATE TABLE Seq1 (
d5whse int COMMENT 'DECIMAL(5,0) Whse',
d5sdat string COMMENT 'DATE Sales Date',
d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
d5scnr string COMMENT 'CHAR(1) Scenario',
d5areq string COMMENT 'CHAR(1) Act Requested',
d5atak string COMMENT 'CHAR(1) Act Taken',
d5msgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE;
and it was created successfully; I was able to load values into it using
the "," separator and the escaped double quote. Now I have another issue:
I am not able to insert values from this table into the Parquet table Seq2.
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT * FROM SEQ;
I get this error
2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: java.lang
Thanks
Jay
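For reference, the end-to-end flow Alex described earlier in the thread looks roughly like this, using the table and HDFS path from the original question (a sketch under those assumptions, not tested against this cluster):

```sql
-- 1. External text table over the gzipped CSV; Hive decompresses .gz on read.
--    Note: OpenCSVSerde exposes every column as STRING, so cast in the SELECT
--    if the Parquet table expects numeric types.
CREATE EXTERNAL TABLE dbcloc_csv (
  blwhse STRING, blsdat STRING, blreg_num STRING, bltrn_num STRING,
  blscnr STRING, blareq STRING, blatak STRING, blmsgc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- 2. Copy into the Parquet table, naming the target partition and casting.
INSERT OVERWRITE TABLE dbcloc PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 6)
SELECT CAST(blwhse AS INT), blsdat, CAST(blreg_num AS SMALLINT),
       CAST(bltrn_num AS INT), blscnr, blareq, blatak, blmsgc
FROM dbcloc_csv;
```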
On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com> wrote:
>
>
> Hi,
>
> I have created the table as you said
>
>
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 14 more
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> as Alex suggested, please use ROW FORMAT in your query, like
>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>> ',' and give it a try
>>
>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>>
>>>> Jay can you give first 3 lines of your gz file
>>>>
>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex,
>>>>>
>>>>>
>>>>> I followed the same steps as mentioned on the site. Once I load data
>>>>> into the table, which is created below
>>>>>
>>>>>
>>>>>
>>>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>> and loaded it with data.
>>>>>
>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>> TABLE raw;
>>>>>
>>>>>
>>>>>
>>>>> when I say select * from raw it shows all null values.
>>>>>
>>>>>
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> Why is it not showing the actual data in the file? Will it show once
>>>>> I load it into the Parquet table?
>>>>>
>>>>> Please let me know if I am doing anything wrong.
>>>>>
>>>>> I appreciate your help.
>>>>>
>>>>>
>>>>> Thanks
>>>>> jay
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for your help Alex,
>>>>>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>> apivovarov@gmail.com> wrote:
>>>>>
>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>> and specify CSVSerde
>>>>>>
>>>>>> if using hive-0.14 and newer use this
>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>
>>>>>> You do not even need to gunzip the file. Hive automatically
>>>>>> decompresses the data on select.
>>>>>>
>>>>>> 2. run simple query to load data
>>>>>> insert overwrite table <orc_table>
>>>>>> select * from <csv_table>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I have this table
>>>>>>>
>>>>>>>
>>>>>>> CREATE TABLE DBCLOC(
>>>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>>> STORED AS PARQUET;
>>>>>>>
>>>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>> to the table above
>>>>>>>
>>>>>>>
>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jay
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
Re: how to load data
Posted by Kumar Jayapal <kj...@gmail.com>.
Hi
I have created a table as you said,
CREATE TABLE Seq1 (
d5whse int COMMENT 'DECIMAL(5,0) Whse',
d5sdat string COMMENT 'DATE Sales Date',
d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
d5scnr string COMMENT 'CHAR(1) Scenario',
d5areq string COMMENT 'CHAR(1) Act Requested',
d5atak string COMMENT 'CHAR(1) Act Taken',
d5msgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
and it got successfully and I was able to insert the values into it with
our " , and "" " now I have another issue I am not able to insert the
values from this table to parque Seq2
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT* FROM SEQ
I get this error
2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: java.lang
Thanks
Jay
On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com> wrote:
>
>
> Hi,
>
> I have created the table as you said
>
>
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 14 more
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> as Akex suggested, Please use row format in your query like
>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>> ',' and give it a try
>>
>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>>
>>>> Jay can you give first 3 lines of your gz file
>>>>
>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex,
>>>>>
>>>>>
>>>>> I followed the same steps as mentioned in the site. Once I load data
>>>>> into table which is create below
>>>>>
>>>>>
>>>>>
>>>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>> and loaded it with data.
>>>>>
>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>> TABLE raw;
>>>>>
>>>>>
>>>>>
>>>>> when I say select * from raw it shows all null values.
>>>>>
>>>>>
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> Why is not show showing the actual data in file. will it show once I
>>>>> load it to parque table?
>>>>>
>>>>> Please let me know if I am doing anything wrong.
>>>>>
>>>>> I appreciate your help.
>>>>>
>>>>>
>>>>> Thanks
>>>>> jay
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for you help Alex,
>>>>>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>> apivovarov@gmail.com> wrote:
>>>>>
>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>> and specify CSVSerde
>>>>>>
>>>>>> if using hive-0.14 and newer use this
>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>
>>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>>> data on select.
>>>>>>
>>>>>> 2. run simple query to load data
>>>>>> insert overwrite table <orc_table>
>>>>>> select * from <csv_table>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I have this table
>>>>>>>
>>>>>>>
>>>>>>> CREATE TABLE DBCLOC(
>>>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>>> STORED AS PARQUET;
>>>>>>>
>>>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>> to the table above
>>>>>>>
>>>>>>>
>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jay
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
Re: how to load data
Posted by Kumar Jayapal <kj...@gmail.com>.
Hi
I have created a table as you said,
CREATE TABLE Seq1 (
d5whse int COMMENT 'DECIMAL(5,0) Whse',
d5sdat string COMMENT 'DATE Sales Date',
d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
d5scnr string COMMENT 'CHAR(1) Scenario',
d5areq string COMMENT 'CHAR(1) Act Requested',
d5atak string COMMENT 'CHAR(1) Act Taken',
d5msgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
and it got successfully and I was able to insert the values into it with
our " , and "" " now I have another issue I am not able to insert the
values from this table to parque Seq2
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT* FROM SEQ
I get this error
2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: java.lang
Thanks
Jay
On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com> wrote:
>
>
> Hi,
>
> I have created the table as you said
>
>
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 14 more
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> as Akex suggested, Please use row format in your query like
>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>> ',' and give it a try
>>
>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>>
>>>> Jay can you give first 3 lines of your gz file
>>>>
>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex,
>>>>>
>>>>>
>>>>> I followed the same steps as mentioned in the site. Once I load data
>>>>> into table which is create below
>>>>>
>>>>>
>>>>>
>>>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>> and loaded it with data.
>>>>>
>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>> TABLE raw;
>>>>>
>>>>>
>>>>>
>>>>> when I say select * from raw it shows all null values.
>>>>>
>>>>>
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> Why is not show showing the actual data in file. will it show once I
>>>>> load it to parque table?
>>>>>
>>>>> Please let me know if I am doing anything wrong.
>>>>>
>>>>> I appreciate your help.
>>>>>
>>>>>
>>>>> Thanks
>>>>> jay
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for you help Alex,
>>>>>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>> apivovarov@gmail.com> wrote:
>>>>>
>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>> and specify CSVSerde
>>>>>>
>>>>>> if using hive-0.14 and newer use this
>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>
>>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>>> data on select.
>>>>>>
>>>>>> 2. run simple query to load data
>>>>>> insert overwrite table <orc_table>
>>>>>> select * from <csv_table>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I have this table
>>>>>>>
>>>>>>>
>>>>>>> CREATE TABLE DBCLOC(
>>>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>>> STORED AS PARQUET;
>>>>>>>
>>>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>> to the table above
>>>>>>>
>>>>>>>
>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jay
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
Re: how to load data
Posted by Kumar Jayapal <kj...@gmail.com>.
Hi
I have created a table as you said,
CREATE TABLE Seq1 (
d5whse int COMMENT 'DECIMAL(5,0) Whse',
d5sdat string COMMENT 'DATE Sales Date',
d5reg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
d5trn_num int COMMENT 'DECIMAL(5,0) Trn#',
d5scnr string COMMENT 'CHAR(1) Scenario',
d5areq string COMMENT 'CHAR(1) Act Requested',
d5atak string COMMENT 'CHAR(1) Act Taken',
d5msgc string COMMENT 'CHAR(3) Msg Code')
PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"")
STORED AS TEXTFILE
and it got successfully and I was able to insert the values into it with
our " , and "" " now I have another issue I am not able to insert the
values from this table to parque Seq2
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 06)
SELECT* FROM SEQ
I get this error
2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
Thanks
Jay
On Sun, May 3, 2015 at 6:57 PM, Kumar Jayapal <kj...@gmail.com> wrote:
>
>
> Hi,
>
> I have created the table as you said
>
>
>
>
> 2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1430691855979_0477_m_000000_1: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
> at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
> at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
> ... 8 more
> Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 14 more
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> as Akex suggested, Please use row format in your query like
>> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>> ',' and give it a try
>>
>> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> 106,"2003-02-03",20,2,"A","2","2","037"
>>> 106,"2003-02-03",20,3,"A","2","2","037"
>>> 106,"2003-02-03",8,2,"A","2","2","037"
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>>
>>>> Jay can you give first 3 lines of your gz file
>>>>
>>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex,
>>>>>
>>>>>
>>>>> I followed the same steps as mentioned in the site. Once I load data
>>>>> into table which is create below
>>>>>
>>>>>
>>>>>
>>>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>> and loaded it with data.
>>>>>
>>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>>> TABLE raw;
>>>>>
>>>>>
>>>>>
>>>>> when I say select * from raw it shows all null values.
>>>>>
>>>>>
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>>> Why is not show showing the actual data in file. will it show once I
>>>>> load it to parque table?
>>>>>
>>>>> Please let me know if I am doing anything wrong.
>>>>>
>>>>> I appreciate your help.
>>>>>
>>>>>
>>>>> Thanks
>>>>> jay
>>>>>
>>>>>
>>>>>
>>>>> Thank you very much for you help Alex,
>>>>>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>>> apivovarov@gmail.com> wrote:
>>>>>
>>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>>> and specify CSVSerde
>>>>>>
>>>>>> if using hive-0.14 and newer use this
>>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>>
>>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>>> data on select.
>>>>>>
>>>>>> 2. run simple query to load data
>>>>>> insert overwrite table <orc_table>
>>>>>> select * from <csv_table>
>>>>>>
>>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>>
>>>>>>> I have this table
>>>>>>>
>>>>>>>
>>>>>>> CREATE TABLE DBCLOC(
>>>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>>> STORED AS PARQUET;
>>>>>>>
>>>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>>>>>> to the table above
>>>>>>>
>>>>>>>
>>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jay
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
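[Editor's note: the NoClassDefFoundError above means the opencsv library that
backs OpenCSVSerde is not on the MapReduce task classpath. A minimal sketch of
the usual workaround, assuming a typical Hive install layout -- the jar path
and version below are placeholders, not taken from the thread:]

```sql
-- Make the opencsv jar visible to the map tasks before running the insert.
-- Adjust the path to wherever the jar lives on your cluster.
ADD JAR /usr/lib/hive/lib/opencsv-2.3.jar;

-- Static-partition load from the CSV-backed staging table into the Parquet table.
INSERT INTO TABLE seq2 PARTITION (FISCAL_YEAR = 2003, FISCAL_PERIOD = 6)
SELECT d5whse, d5sdat, d5reg_num, d5trn_num, d5scnr, d5areq, d5atak, d5msgc
FROM seq1;
```

[Alternatively, copying the jar into Hive's auxlib directory (or setting
hive.aux.jars.path) makes it available to every job without a per-session
ADD JAR.]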
Re: how to load data
Posted by Kumar Jayapal <kj...@gmail.com>.
Hi,
I have created the table as you said
2015-05-04 01:55:42,000 INFO [IPC Server handler 2 on 57009]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report
from attempt_1430691855979_0477_m_000000_1: Error:
java.lang.RuntimeException: java.lang.NoClassDefFoundError:
au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader
at org.apache.hadoop.hive.serde2.OpenCSVSerde.newReader(OpenCSVSerde.java:177)
at org.apache.hadoop.hive.serde2.OpenCSVSerde.deserialize(OpenCSVSerde.java:147)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVReader
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 14 more
Thanks
Jay
On Fri, May 1, 2015 at 6:08 AM, Nitin Pawar <ni...@gmail.com> wrote:
> as Akex suggested, Please use row format in your query like
> CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> and give it a try
>
> On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> 106,"2003-02-03",20,2,"A","2","2","037"
>> 106,"2003-02-03",20,3,"A","2","2","037"
>> 106,"2003-02-03",8,2,"A","2","2","037"
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>> Jay
>>
>> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>>
>>> Jay can you give first 3 lines of your gz file
>>>
>>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> Alex,
>>>>
>>>>
>>>> I followed the same steps as mentioned in the site. Once I load data
>>>> into table which is create below
>>>>
>>>>
>>>>
>>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>>> STORED AS TEXTFILE;
>>>>
>>>> and loaded it with data.
>>>>
>>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO
>>>> TABLE raw;
>>>>
>>>>
>>>>
>>>> when I say select * from raw it shows all null values.
>>>>
>>>>
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>>> Why is not show showing the actual data in file. will it show once I
>>>> load it to parque table?
>>>>
>>>> Please let me know if I am doing anything wrong.
>>>>
>>>> I appreciate your help.
>>>>
>>>>
>>>> Thanks
>>>> jay
>>>>
>>>>
>>>>
>>>> Thank you very much for you help Alex,
>>>>
>>>>
>>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>>> apivovarov@gmail.com> wrote:
>>>>
>>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC
>>>>> and specify CSVSerde
>>>>>
>>>>> if using hive-0.14 and newer use this
>>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>>
>>>>> You do not even need to unzgip the file. hive automatically unzgip
>>>>> data on select.
>>>>>
>>>>> 2. run simple query to load data
>>>>> insert overwrite table <orc_table>
>>>>> select * from <csv_table>
>>>>>
>>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>>
>>>>>> I have this table
>>>>>>
>>>>>>
>>>>>> CREATE TABLE DBCLOC(
>>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>>> STORED AS PARQUET;
>>>>>>
>>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz to
>>>>>> the table above
>>>>>>
>>>>>>
>>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Jay
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>
>
> --
> Nitin Pawar
>
Re: how to load data
Posted by Nitin Pawar <ni...@gmail.com>.
as Alex suggested, please use a row format in your query, e.g.
CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
and give it a try
On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com> wrote:
> 106,"2003-02-03",20,2,"A","2","2","037"
> 106,"2003-02-03",20,3,"A","2","2","037"
> 106,"2003-02-03",8,2,"A","2","2","037"
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Jay can you give first 3 lines of your gz file
>>
>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Alex,
>>>
>>>
>>> I followed the same steps as mentioned in the site. Once I load data
>>> into table which is create below
>>>
>>>
>>>
>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>> STORED AS TEXTFILE;
>>>
>>> and loaded it with data.
>>>
>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
>>> raw;
>>>
>>>
>>>
>>> when I say select * from raw it shows all null values.
>>>
>>>
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> Why is not show showing the actual data in file. will it show once I
>>> load it to parque table?
>>>
>>> Please let me know if I am doing anything wrong.
>>>
>>> I appreciate your help.
>>>
>>>
>>> Thanks
>>> jay
>>>
>>>
>>>
>>> Thank you very much for you help Alex,
>>>
>>>
>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>> apivovarov@gmail.com> wrote:
>>>
>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>>>> specify CSVSerde
>>>>
>>>> if using hive-0.14 and newer use this
>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>
>>>> You do not even need to unzgip the file. hive automatically unzgip data
>>>> on select.
>>>>
>>>> 2. run simple query to load data
>>>> insert overwrite table <orc_table>
>>>> select * from <csv_table>
>>>>
>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>>
>>>>> I have this table
>>>>>
>>>>>
>>>>> CREATE TABLE DBCLOC(
>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>> STORED AS PARQUET;
>>>>>
>>>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz to
>>>>> the table above
>>>>>
>>>>>
>>>>> Can any one tell me what is the most efficient way of doing it.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Jay
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
--
Nitin Pawar
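[Editor's note: a sketch of Nitin's suggestion spelled out against the sample
rows shown earlier in the thread; the column names come from the original DDL
and the LOCATION path is the one the poster mentioned -- both are reproduced
here only for illustration:]

```sql
-- External staging table over the gzipped CSV; Hive decompresses .gz files on read.
CREATE EXTERNAL TABLE dbcloc_csv (
  BLwhse int,
  BLsdat string,
  BLreg_num smallint,
  BLtrn_num int,
  BLscnr string,
  BLareq string,
  BLatak string,
  BLmsgc string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';
```

[One caveat: FIELDS TERMINATED BY ',' does not strip the double quotes, so
string fields would come through as e.g. "2003-02-03" including the quote
characters. That is why the CSV serde was suggested first; the plain delimited
format only works cleanly if the quotes are removed or tolerated downstream.]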
Re: how to load data
Posted by Nitin Pawar <ni...@gmail.com>.
as Akex suggested, Please use row format in your query like
CREATE TABLE DBCLOC(....) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
and give it a try
On Fri, May 1, 2015 at 6:33 PM, Kumar Jayapal <kj...@gmail.com> wrote:
> 106,"2003-02-03",20,2,"A","2","2","037"
> 106,"2003-02-03",20,3,"A","2","2","037"
> 106,"2003-02-03",8,2,"A","2","2","037"
>
>
>
>
>
>
>
>
>
>
> Thanks
> Jay
>
> On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Jay can you give first 3 lines of your gz file
>>
>> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Alex,
>>>
>>>
>>> I followed the same steps as mentioned in the site. Once I load data
>>> into table which is create below
>>>
>>>
>>>
>>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>> STORED AS TEXTFILE;
>>>
>>> and loaded it with data.
>>>
>>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
>>> raw;
>>>
>>>
>>>
>>> when I say select * from raw it shows all null values.
>>>
>>>
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> NULLNULLNULLNULLNULLNULLNULLNULL
>>> Why is it not showing the actual data in the file? Will it show once I
>>> load it into the parquet table?
>>>
>>> Please let me know if I am doing anything wrong.
>>>
>>> I appreciate your help.
>>>
>>>
>>> Thanks
>>> jay
>>>
>>>
>>>
>>> Thank you very much for your help, Alex.
>>>
>>>
>>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>>> apivovarov@gmail.com> wrote:
>>>
>>>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>>>> specify CSVSerde
>>>>
>>>> if using hive-0.14 and newer use this
>>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>>
>>>> You do not even need to ungzip the file. Hive automatically ungzips the
>>>> data on select.
>>>>
>>>> 2. run simple query to load data
>>>> insert overwrite table <orc_table>
>>>> select * from <csv_table>
>>>>
>>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>>
>>>>> I have this table
>>>>>
>>>>>
>>>>> CREATE TABLE DBCLOC(
>>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>>> STORED AS PARQUET;
>>>>>
>>>>> I have to load from the HDFS location /extract/DBCLOC/DBCL0301P.csv.gz
>>>>> into the table above.
>>>>>
>>>>>
>>>>> Can anyone tell me the most efficient way of doing it?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Jay
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
--
Nitin Pawar
Re: how to load data
Posted by Kumar Jayapal <kj...@gmail.com>.
106,"2003-02-03",20,2,"A","2","2","037"
106,"2003-02-03",20,3,"A","2","2","037"
106,"2003-02-03",8,2,"A","2","2","037"
Thanks
Jay
On Fri, May 1, 2015 at 12:10 AM, Nitin Pawar <ni...@gmail.com>
wrote:
> Jay can you give first 3 lines of your gz file
>
> On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com>
> wrote:
>
>> Alex,
>>
>>
>> I followed the same steps as mentioned on the site. Once I loaded data into
>> the table created below
>>
>>
>>
>> Created table CREATE TABLE raw (line STRING) PARTITIONED BY
>> (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>> STORED AS TEXTFILE;
>>
>> and loaded it with data.
>>
>> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
>> raw;
>>
>>
>>
>> when I say select * from raw it shows all null values.
>>
>>
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> NULLNULLNULLNULLNULLNULLNULLNULL
>> Why is it not showing the actual data in the file? Will it show once I load
>> it into the parquet table?
>>
>> Please let me know if I am doing anything wrong.
>>
>> I appreciate your help.
>>
>>
>> Thanks
>> jay
>>
>>
>>
>> Thank you very much for your help, Alex.
>>
>>
>> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <
>> apivovarov@gmail.com> wrote:
>>
>>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>>> specify CSVSerde
>>>
>>> if using hive-0.14 and newer use this
>>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>>
>>> You do not even need to ungzip the file. Hive automatically ungzips the
>>> data on select.
>>>
>>> 2. run simple query to load data
>>> insert overwrite table <orc_table>
>>> select * from <csv_table>
>>>
>>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>>
>>>> I have this table
>>>>
>>>>
>>>> CREATE TABLE DBCLOC(
>>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>>> BLsdat string COMMENT 'DATE Sales Date',
>>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>>> STORED AS PARQUET;
>>>>
>>>> I have to load from the HDFS location /extract/DBCLOC/DBCL0301P.csv.gz
>>>> into the table above.
>>>>
>>>>
>>>> Can anyone tell me the most efficient way of doing it?
>>>>
>>>>
>>>> Thanks
>>>> Jay
>>>>
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>
Re: how to load data
Posted by Nitin Pawar <ni...@gmail.com>.
Jay, can you give the first 3 lines of your gz file?
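One quick way to show those first lines without unpacking the archive (a sketch using a made-up sample file under /tmp, not the real log):

```shell
# Build a tiny gzipped CSV as a stand-in for the real file,
# then print its first three lines without extracting it.
printf '%s\n' \
  '106,"2003-02-03",20,2,"A","2","2","037"' \
  '106,"2003-02-03",20,3,"A","2","2","037"' \
  '106,"2003-02-03",8,2,"A","2","2","037"' | gzip > /tmp/sample.csv.gz
zcat /tmp/sample.csv.gz | head -n 3
```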
On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com> wrote:
> Alex,
>
>
> I followed the same steps as mentioned on the site. Once I loaded data into
> the table created below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex.
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to ungzip the file. Hive automatically ungzips the
>> data on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> I have to load from the HDFS location /extract/DBCLOC/DBCL0301P.csv.gz
>>> into the table above.
>>>
>>>
>>> Can anyone tell me the most efficient way of doing it?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
--
Nitin Pawar
Re: how to load data
Posted by Alexander Pivovarov <ap...@gmail.com>.
If your file is a CSV file, then the CREATE TABLE statement should specify the
CSVSerde. Look at the examples under the links I sent you.
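Putting the two steps together, a sketch might look like this (the staging table name and the literal partition values are illustrative; the serde class is the built-in one from Hive 0.14+, and it returns every column as a string, hence the casts):

```sql
-- 1. External staging table over the gzipped CSV, parsed by the
--    built-in CSV serde; quotes around fields are stripped.
CREATE EXTERNAL TABLE dbcloc_csv (
  BLwhse STRING, BLsdat STRING, BLreg_num STRING, BLtrn_num STRING,
  BLscnr STRING, BLareq STRING, BLatak STRING, BLmsgc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- 2. Load into the partitioned Parquet table, casting the string
--    columns to the target types and naming the partition explicitly.
INSERT OVERWRITE TABLE dbcloc PARTITION (fscal_year=2003, fscal_period=2)
SELECT CAST(BLwhse AS INT), BLsdat, CAST(BLreg_num AS SMALLINT),
       CAST(BLtrn_num AS INT), BLscnr, BLareq, BLatak, BLmsgc
FROM dbcloc_csv;
```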
On Thu, Apr 30, 2015 at 10:23 PM, Kumar Jayapal <kj...@gmail.com>
wrote:
> Alex,
>
>
> I followed the same steps as mentioned on the site. Once I loaded data into
> the table created below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex.
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to ungzip the file. Hive automatically ungzips the
>> data on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> I have to load from the HDFS location /extract/DBCLOC/DBCL0301P.csv.gz
>>> into the table above.
>>>
>>>
>>> Can anyone tell me the most efficient way of doing it?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
Re: how to load data
Posted by Nitin Pawar <ni...@gmail.com>.
Jay can you give first 3 lines of your gz file
On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com> wrote:
> Alex,
>
>
> I followed the same steps as mentioned in the site. Once I load data into
> table which is create below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is not show showing the actual data in file. will it show once I load
> it to parque table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for you help Alex,
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzgip the file. hive automatically unzgip data
>> on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can any one tell me what is the most efficient way of doing it.
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
--
Nitin Pawar
Re: how to load data
Posted by Nitin Pawar <ni...@gmail.com>.
Jay can you give first 3 lines of your gz file
On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com> wrote:
> Alex,
>
>
> I followed the same steps as mentioned in the site. Once I load data into
> table which is create below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is not show showing the actual data in file. will it show once I load
> it to parque table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for you help Alex,
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzgip the file. hive automatically unzgip data
>> on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can any one tell me what is the most efficient way of doing it.
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
--
Nitin Pawar
Re: how to load data
Posted by Alexander Pivovarov <ap...@gmail.com>.
if your file is csv file then create table statement should specify
CSVSerde - look at the examples under the links I sent you
On Thu, Apr 30, 2015 at 10:23 PM, Kumar Jayapal <kj...@gmail.com>
wrote:
> Alex,
>
>
> I followed the same steps as mentioned in the site. Once I load data into
> table which is create below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I say select * from raw it shows all null values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is not show showing the actual data in file. will it show once I load
> it to parque table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for you help Alex,
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzgip the file. hive automatically unzgip data
>> on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> have to load from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz to
>>> the table above
>>>
>>>
>>> Can any one tell me what is the most efficient way of doing it.
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
Re: how to load data
Posted by Alexander Pivovarov <ap...@gmail.com>.
if your file is csv file then create table statement should specify
CSVSerde - look at the examples under the links I sent you
On Thu, Apr 30, 2015 at 10:23 PM, Kumar Jayapal <kj...@gmail.com>
wrote:
> Alex,
>
>
> I followed the same steps as mentioned in the site. Once I load data into
> table which is create below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I run select * from raw, it shows all NULL values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the Parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex.
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzip the file. Hive automatically unzips the data
>> on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> I have to load data from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>> into the table above.
>>>
>>>
>>> Can anyone tell me the most efficient way of doing it?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
Re: how to load data
Posted by Nitin Pawar <ni...@gmail.com>.
Jay, can you give the first 3 lines of your gz file?
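
One quick way to get those lines without decompressing to disk (a sketch; the demo file below is made up just to show the command shape, and the real paths from the thread are in the comments):

```shell
# Build a small made-up gzipped CSV so the commands below have something to read.
printf '00001,2009-06-03,001,00100,A,Y,N,ABC\n00001,2009-06-03,001,00101,B,N,Y,DEF\n00001,2009-06-03,002,00102,A,Y,Y,GHI\n00001,2009-06-04,001,00103,C,N,N,JKL\n' \
  | gzip -c > /tmp/demo.csv.gz

# Stream-decompress and show the first 3 lines.
zcat /tmp/demo.csv.gz | head -3

# Local log file from the earlier message:
#   zcat /tmp/weblogs/20090603-access.log.gz | head -3
# File on HDFS:
#   hdfs dfs -cat /extract/DBCLOC/DBCL0301P.csv.gz | zcat | head -3
```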
On Fri, May 1, 2015 at 10:53 AM, Kumar Jayapal <kj...@gmail.com> wrote:
> Alex,
>
>
> I followed the same steps as mentioned on the site. Once I load data into
> the table which is created below
>
>
>
> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR
> smallint, FISCAL_PERIOD smallint)
> STORED AS TEXTFILE;
>
> and loaded it with data.
>
> LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE
> raw;
>
>
>
> when I run select * from raw, it shows all NULL values.
>
>
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> NULLNULLNULLNULLNULLNULLNULLNULL
> Why is it not showing the actual data in the file? Will it show once I load
> it into the Parquet table?
>
> Please let me know if I am doing anything wrong.
>
> I appreciate your help.
>
>
> Thanks
> jay
>
>
>
> Thank you very much for your help, Alex.
>
>
> On Wed, Apr 29, 2015 at 3:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> 1. Create external textfile hive table pointing to /extract/DBCLOC and
>> specify CSVSerde
>>
>> if using hive-0.14 and newer use this
>> https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
>> if hive-0.13 and older use https://github.com/ogrodnek/csv-serde
>>
>> You do not even need to unzip the file. Hive automatically unzips the data
>> on select.
>>
>> 2. run simple query to load data
>> insert overwrite table <orc_table>
>> select * from <csv_table>
>>
>> On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kj...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>> I have this table
>>>
>>>
>>> CREATE TABLE DBCLOC(
>>> BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>>> BLsdat string COMMENT 'DATE Sales Date',
>>> BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>>> BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>>> BLscnr string COMMENT 'CHAR(1) Scenario',
>>> BLareq string COMMENT 'CHAR(1) Act Requested',
>>> BLatak string COMMENT 'CHAR(1) Act Taken',
>>> BLmsgc string COMMENT 'CHAR(3) Msg Code')
>>> PARTITIONED BY (FSCAL_YEAR smallint, FSCAL_PERIOD smallint)
>>> STORED AS PARQUET;
>>>
>>> I have to load data from hdfs location /extract/DBCLOC/DBCL0301P.csv.gz
>>> into the table above.
>>>
>>>
>>> Can anyone tell me the most efficient way of doing it?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>
>>
>
--
Nitin Pawar