Posted to user@hive.apache.org by Ajay Chander <ha...@gmail.com> on 2016/03/09 18:43:36 UTC

Hive_CSV

Hi Everyone,

I am looking for a way to ignore everything up to the first occurrence of
the delimiter while loading data from a CSV file into a Hive external table.

CSV file:

Xyz, baseball, outdoor

The Hive table has two columns, sport_name and sport_type, and the fields
are separated by ','.

I want to load the data into the table so that the load skips everything up
to the first delimiter (that is, it ignores Xyz) and takes only the fields
that follow it.

In the end, my Hive table should contain the following data:

baseball, outdoor

Any inputs are appreciated. Thank you for your time.

Re: Hive_CSV

Posted by Ajay Chander <ha...@gmail.com>.
Jörn, thanks for your time. The reason I want to do this is that I don't want
to bring unnecessary data into the table. Each record carries an
unnecessary value.

On Wednesday, March 9, 2016, Jörn Franke <jo...@gmail.com> wrote:

>
> Why don't you load all the data and use just two columns for querying?
> Alternatively, use regular expressions.

Re: Hive_CSV

Posted by Jörn Franke <jo...@gmail.com>.
Why don't you load all the data and use just two columns for querying? Alternatively, use regular expressions.
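
As an illustration of the regular-expression route, here is a minimal sketch using Hive's built-in RegexSerDe. The table name sports and the LOCATION path are made up for the example, and the pattern assumes every line has exactly three comma-separated fields with optional spaces after the commas:

```sql
-- Hypothetical table name and path; adjust to the real file location.
-- The regex skips the first field and captures the second and third,
-- so only two columns ever reach the table.
CREATE EXTERNAL TABLE sports (
  sport_name STRING,
  sport_type STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- one capture group per column; the leading [^,]*, drops the first field
  'input.regex' = '[^,]*, *([^,]*), *(.*)'
)
STORED AS TEXTFILE
LOCATION '/data/sports_csv';

SELECT sport_name, sport_type FROM sports;
```

RegexSerDe expects one capture group per column and STRING columns only. As far as I know, lines that do not match the pattern are returned with NULL in every column, so it is worth checking for those after the first query.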




Re: Hive_CSV

Posted by Jörn Franke <jo...@gmail.com>.
The data is already in the CSV file, so the extra column does not matter for querying. It is recommended to convert the data to ORC or Parquet for querying.
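
A minimal sketch of such a conversion, assuming the csvtab table from Daniel's suggestion elsewhere in this thread already exists; sports_orc is a made-up name:

```sql
-- Copy only the two wanted columns into an ORC-backed table.
-- trim() removes the space that follows each comma in the sample line.
CREATE TABLE sports_orc
STORED AS ORC
AS
SELECT trim(sportName) AS sport_name,
       trim(sportType) AS sport_type
FROM csvtab;
```

Replacing ORC with PARQUET in the STORED AS clause should give the Parquet variant.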

> On 09 Mar 2016, at 19:09, Ajay Chander <ha...@gmail.com> wrote:
> 
> Daniel, thanks for your time. Is the idea to create two tables, one to hold all the data and another to hold only the required data selected from it? If so, I am concerned about redundant data. Please correct me if I am wrong. Thanks
> 

Re: Hive_CSV

Posted by Ajay Chander <ha...@gmail.com>.
Daniel, thanks for your time. Is the idea to create two tables, one to hold
all the data and another to hold only the required data selected from it?
If so, I am concerned about redundant data. Please correct
me if I am wrong. Thanks
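
On the redundancy question, one option that stores no second copy is a view over the three-column table. A minimal sketch, assuming csvtab from Daniel's example is defined as an external table over the CSV file; sports_v is a made-up name:

```sql
-- A view stores only its definition, not the rows, so the CSV file
-- remains the single copy of the data.
CREATE VIEW sports_v AS
SELECT sportName AS sport_name,
       sportType AS sport_type
FROM csvtab;

-- Queries see just the two wanted columns.
SELECT sport_name, sport_type FROM sports_v;
```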

On Wednesday, March 9, 2016, Daniel Haviv <da...@veracity-group.com>
wrote:

> Hi Ajay,
> Use the CSV serde to read your file, map all three columns but only select
> the relevant ones when you insert:
>
> Create table csvtab (
> irrelevant string,
> sportName string,
> sportType string) ...
>
> Insert into loaded_table select sportName, sportType from csvtab;
>
> Daniel
>

Re: Hive_CSV

Posted by Daniel Haviv <da...@veracity-group.com>.
Hi Ajay,
Use the CSV SerDe to read your file, map all three columns, but select only the relevant ones when you insert:

Create table csvtab (
irrelevant string,
sportName string,
sportType string) ...

Insert into loaded_table select sportName, sportType from csvtab;

Daniel
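
For reference, a fuller sketch of the approach above. The parts elided with "..." in the message are not known; the OpenCSVSerde settings, the LOCATION path, and the loaded_table definition below are assumptions chosen for illustration:

```sql
-- Assumed: external table over the existing CSV file (hypothetical path).
-- OpenCSVSerde treats every column as a string.
CREATE EXTERNAL TABLE csvtab (
  irrelevant STRING,
  sportName  STRING,
  sportType  STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',')
STORED AS TEXTFILE
LOCATION '/data/sports_csv';

-- Assumed target table holding only the two wanted columns.
CREATE TABLE loaded_table (
  sport_name STRING,
  sport_type STRING
);

-- trim() removes the space that follows each comma in the sample line.
INSERT INTO TABLE loaded_table
SELECT trim(sportName), trim(sportType)
FROM csvtab;
```

Because csvtab is external, it only points at the file already on disk; the insert is the step that writes a second, two-column copy.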
