Posted to user@hive.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2016/01/15 11:15:24 UTC

Converting date format and excel money format in Hive table

Hi,

I am importing an Excel sheet, saved as a comma-separated CSV file and compressed with bzip2, into Hive as an external table.

 

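The Excel sheet looks like this

+----------------+--------------+------------+----------+------------+
| Invoice Number | Payment date | Net        | VAT      | Total      |
+----------------+--------------+------------+----------+------------+
| 360            | 10/02/2014   | £10,000.00 | £2000.00 | £12,000.00 |
+----------------+--------------+------------+----------+------------+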

 

And the file (before bzip2) looks like this

Invoice Number,Payment date,Net,VAT,Total
360,10/02/2014,"▒10,000.00",▒2000.00,"▒12,000.00"

 

 

The external table is defined as

 

CREATE EXTERNAL TABLE stg_t2 (
INVOICENUMBER string
,PAYMENTDATE string
,NET string
,VAT string
,TOTAL string
)
COMMENT 'from csv file from excel sheet'
ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/xyz/'
TBLPROPERTIES ("skip.header.line.count"="1")
;

 

 

And the table itself

 

 

CREATE TABLE t2 (
INVOICENUMBER          INT
,PAYMENTDATE            string
,NET                    string
,VAT                    string
,TOTAL                  string
)
COMMENT 'from csv file from excel sheet'
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="ZLIB" )
;

INSERT INTO TABLE t2
SELECT
          INVOICENUMBER
        , PAYMENTDATE
        , NET
        , VAT
        , TOTAL
FROM
stg_t2;

 

 

Now the problem I have is that I do not seem to be able to convert PAYMENTDATE from string into timestamp: CAST (PAYMENTDATE AS TIMESTAMP) returns NULL. Also, I would like to store the currency properly, replacing “?” with “£”.

 

+-------------------+-----------------+--------------+-------------+--------------+--+
| t2.invoicenumber  | t2.paymentdate  |    t2.net    |   t2.vat    |   t2.total   |
+-------------------+-----------------+--------------+-------------+--------------+--+
| 360               | 10/02/2014      | ?10,000.00   | ?2000.00    | ?12,000.00   |
+-------------------+-----------------+--------------+-------------+--------------+--+

 

Thanks

 

 

Dr Mich Talebzadeh

 

LinkedIn   <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

 <http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

 <http://talebzadehmich.wordpress.com/> http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

 


Re: Converting date format and excel money format in Hive table

Posted by naga sharathrayapati <sh...@gmail.com>.
I think 'translate' might be useful in this scenario

example:
select translate(t2.net,'?','$') from t2;
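
One caveat with translate: the '?' in the query output may only be how the client renders a byte it cannot decode (Excel often saves £ in a single-byte Windows code page rather than UTF-8), in which case matching a literal '?' would find nothing. A rough sketch that sidesteps the symbol altogether, assuming the amounts are never negative and always use '.' as the decimal point: strip everything except digits and the dot, then cast to DECIMAL. The DECIMAL(20,2) precision and the column aliases are assumptions for illustration, not something from the thread.

```sql
-- Sketch only: keep digits and the decimal point, drop the currency
-- symbol (whatever byte it really is) and the thousands separators.
SELECT
          INVOICENUMBER
        , CAST(regexp_replace(NET,   '[^0-9.]', '') AS DECIMAL(20,2)) AS net_amount    -- hypothetical alias
        , CAST(regexp_replace(VAT,   '[^0-9.]', '') AS DECIMAL(20,2)) AS vat_amount    -- hypothetical alias
        , CAST(regexp_replace(TOTAL, '[^0-9.]', '') AS DECIMAL(20,2)) AS total_amount  -- hypothetical alias
FROM
stg_t2;
```

With the values stored as numbers, the £ can be put back at display time, for example with concat('£', ...), provided the client session handles UTF-8.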

On Fri, Jan 15, 2016 at 5:05 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:

> Thanks that solved data conversion.
>
>
>
> How does one replace  ?10,000.00 with £10,000.00 ?

RE: Converting date format and excel money format in Hive table

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Thanks that solved data conversion.

 

How does one replace  ?10,000.00 with £10,000.00 ?

 


 

From: matshyeq [mailto:matshyeq@gmail.com] 
Sent: 15 January 2016 10:31
To: user <us...@hive.apache.org>
Subject: Re: Converting date format and excel money format in Hive table

 

try:
select cast(unix_timestamp('02/10/2014', 'dd/MM/yyyy')*1000 as timestamp);

Kind Regards 

~Maciek



Re: Converting date format and excel money format in Hive table

Posted by matshyeq <ma...@gmail.com>.
try:
select cast(unix_timestamp('02/10/2014', 'dd/MM/yyyy')*1000 as timestamp);
Kind Regards
~Maciek
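
A minimal sketch of the same idea applied to the staging table from the original post, assuming PAYMENTDATE always holds dd/MM/yyyy strings. CAST on a string appears to expect the yyyy-MM-dd HH:mm:ss layout, which would explain the NULL. Going through from_unixtime avoids the multiplication by 1000; as far as I know, whether an integer cast to TIMESTAMP is read as seconds or milliseconds has differed between Hive versions and settings, so it is worth checking on your own installation. The column aliases are made up for illustration.

```sql
-- Sketch only: parse the dd/MM/yyyy text, then cast the resulting
-- 'yyyy-MM-dd HH:mm:ss' string, which CAST can read.
SELECT
          INVOICENUMBER
        , CAST(from_unixtime(unix_timestamp(PAYMENTDATE, 'dd/MM/yyyy')) AS TIMESTAMP) AS payment_ts   -- hypothetical alias
        , to_date(from_unixtime(unix_timestamp(PAYMENTDATE, 'dd/MM/yyyy')))           AS payment_day  -- hypothetical alias
FROM
stg_t2;
```

The same expressions could go into the SELECT list of the INSERT INTO TABLE t2 statement if the target column is declared as TIMESTAMP instead of string.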
On 15 January 2016 at 10:15, Mich Talebzadeh <mi...@peridale.co.uk> wrote:

> Now the problem I have is that I do not seem to be able to convert
> PAYMENTDATE from string into timestamp: CAST (PAYMENTDATE AS
> TIMESTAMP) returns NULL. Also, I would like to store the currency properly,
> replacing “?” with “£”.