Posted to user@hive.apache.org by Daniel Joanes <dj...@gmail.com> on 2010/02/25 17:44:33 UTC

Hive question.

If I have a line like the following:

<2010-02-09 18:00:16.123 UTC>:[48394803]:<MDS-CS_MDS1>:<DEBUG>:<LAYER = EP2P, EVENT = Receiving, DEVICEPIN = 2032acb14, GMETAG = -1966209606, TYPE = 22, METHOD = onInEp2p, DESTINATION = 24a69edf, CONFIRM = true, EXP_TIMEOUT(S) = 3600, SIZE = 7312>

What would be the best way to store it into a table like this:

ts            STRING    "2010-02-09 18:00:16.123 UTC"
epochtime     INT       "345093824"   // <-- I'm not sure how to do this column either
requestId     INT       48394803
component     STRING    "MDS-CS_MDS1"
log_level     STRING    "DEBUG"
properties    STRING    "LAYER = EP2P, EVENT = Receiving, DEVICEPIN = 2032acb14, GMETAG = -1966209606, TYPE = 22, METHOD = onInEp2p, DESTINATION = 24a69edf, CONFIRM = true, EXP_TIMEOUT(S) = 3600, SIZE = 7312"

Thanks,

Daniel

Re: Hive question.

Posted by Carl Steinbach <ca...@cloudera.com>.
You can do a type conversion using the CAST UDF (while streaming the data
from one table to another). See the documentation here:
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Type_Conversion_Functions

Carl
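
To make the conversion step concrete, here is a minimal sketch in Python of what the typed row would contain once the string fields have been cast. It only illustrates the arithmetic; in Hive itself the casts would happen in the query that copies rows from the all-string table into the typed one. The function names to_epoch_seconds and convert_row are hypothetical, and the sketch assumes the timestamp always ends in the literal "UTC", as in the sample line. For the epochtime column, Hive's unix_timestamp function may be worth a look, depending on the version in use.

```python
from datetime import datetime, timezone


def to_epoch_seconds(ts):
    """Convert 'YYYY-MM-DD HH:MM:SS.mmm UTC' to whole seconds since the Unix epoch."""
    # Assumption: the zone suffix is always the literal 'UTC', as in the sample line.
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f UTC")
    # Attach the UTC zone explicitly so the result does not depend on local time.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def convert_row(ts, request_id, component, log_level, properties):
    """Turn five string fields into the six typed columns of the target table."""
    return (
        ts,                    # ts         STRING
        to_epoch_seconds(ts),  # epochtime  INT (derived from ts)
        int(request_id),       # requestId  INT (the equivalent of a cast to INT)
        component,             # component  STRING
        log_level,             # log_level  STRING
        properties,            # properties STRING
    )


if __name__ == "__main__":
    print(convert_row(
        "2010-02-09 18:00:16.123 UTC",
        "48394803",
        "MDS-CS_MDS1",
        "DEBUG",
        "LAYER = EP2P, EVENT = Receiving",
    ))
```

The milliseconds are dropped when truncating to whole seconds, which keeps the value within a 32-bit INT for dates like the one in the sample.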

On Thu, Feb 25, 2010 at 11:02 AM, Daniel Joanes <dj...@gmail.com> wrote:

> Awesome, that worked. From what I can tell, the columns in my table have to
> be strings. How would I use other data types?

Re: Hive question.

Posted by Daniel Joanes <dj...@gmail.com>.
Awesome, that worked. From what I can tell, the columns in my table have to
be strings. How would I use other data types?

On Thu, Feb 25, 2010 at 1:49 PM, Carl Steinbach <ca...@cloudera.com> wrote:

> Hi Daniel,
>
> You can use the RegexSerDe to extract the fields embedded in the text. Try
> looking at the examples in
> contrib/src/test/queries/clientpositive/serde_regex.q
>
> Carl

Re: Hive question.

Posted by Carl Steinbach <ca...@cloudera.com>.
Hi Daniel,

You can use the RegexSerDe to extract the fields embedded in the text. Try
looking at the examples in
contrib/src/test/queries/clientpositive/serde_regex.q

Carl
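
As a rough illustration of the kind of pattern this approach needs for the sample line in the original question, here is a minimal sketch in Python. The pattern, LOG_PATTERN, and parse_line are assumptions made up for this example and are not taken from serde_regex.q. RegexSerDe uses Java regular expressions, and each capture group feeds one column in order; the simple constructs used here should behave the same way in both regex dialects, but that is worth verifying against real data.

```python
import re

# Hypothetical pattern: one capture group per target column
# (ts, requestId, component, log_level, properties).
LOG_PATTERN = re.compile(
    r"<([^>]*)>:\[(\d+)\]:<([^>]*)>:<([^>]*)>:<(.*)>"
)

# The sample log line from the original question, as a single line.
SAMPLE = (
    "<2010-02-09 18:00:16.123 UTC>:[48394803]:<MDS-CS_MDS1>:<DEBUG>:"
    "<LAYER = EP2P, EVENT = Receiving, DEVICEPIN = 2032acb14, "
    "GMETAG = -1966209606, TYPE = 22, METHOD = onInEp2p, "
    "DESTINATION = 24a69edf, CONFIRM = true, EXP_TIMEOUT(S) = 3600, "
    "SIZE = 7312>"
)


def parse_line(line):
    """Return the five fields as strings, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    return match.groups() if match else None


if __name__ == "__main__":
    for field in parse_line(SAMPLE):
        print(field)
```

Every field comes out as a string, including the request id, so any numeric columns still need a separate conversion step afterwards.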
