Posted to user@hive.apache.org by David Swearingen <ds...@42six.com> on 2012/09/01 13:52:11 UTC

Newbie - Hive Tutorial question, what format is the sample data file in?

I'm going through the tutorial at
https://cwiki.apache.org/Hive/tutorial.html .  It's not clear to me what
the exact format of the log file would be for the sample queries described,
e.g., at https://cwiki.apache.org/Hive/tutorial.html#Tutorial-LoadingData .
I can't find a link to download such a file, and while I'd be happy to
construct one myself, it's not clear to me what a viewTime of type INT would
look like exactly.  Perhaps the file conforms to a standard web server
logfile format; however, I believe there are a couple of variants of that
format.

Am I missing something?  Thanks.

Re: Newbie - Hive Tutorial question, what format is the sample data file in?

Posted by David Swearingen <ds...@42six.com>.
Thanks!  I found it odd that someone would go to all this trouble
creating a nice tutorial and then leave it to the user to write data
generation code. Strange. Thanks again

On Sep 1, 2012, at 11:33 AM, Edward Capriolo <ed...@gmail.com> wrote:

> It is up to the user to decide what that INT means in this case. This
> tutorial was created very early on. Since then, Hive has added support
> for a timestamp type, which has a clear meaning.

Re: Newbie - Hive Tutorial question, what format is the sample data file in?

Posted by Edward Capriolo <ed...@gmail.com>.
It is up to the user to decide what that INT means in this case. This
tutorial was created very early on. Since then, Hive has added support
for a timestamp type, which has a clear meaning.
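
One practical constraint may help with the choice: assuming Hive's INT is a 4-byte signed integer, a present-day value in seconds since the epoch fits, while the same instant in milliseconds does not. The sketch below is only an illustration of that arithmetic, not anything from the tutorial; the function names are made up here, and the 'yyyy-MM-dd HH:mm:ss' rendering in UTC is an assumption about how one might write the same value out as timestamp text.

```python
from datetime import datetime, timezone

# Assumption: INT is a 4-byte signed integer.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def fits_in_int(value):
    """Return True if value is representable as a 4-byte signed integer."""
    return INT_MIN <= value <= INT_MAX


def epoch_seconds_to_timestamp(seconds):
    """Render epoch seconds as 'yyyy-MM-dd HH:mm:ss' text, in UTC."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical viewTime value: midnight UTC on the day of this thread.
view_time = 1346457600

print(fits_in_int(view_time))                 # True: seconds fit
print(fits_in_int(view_time * 1000))          # False: milliseconds overflow
print(epoch_seconds_to_timestamp(view_time))  # 2012-09-01 00:00:00
```

So if you keep the column as INT, seconds since the epoch is one reading that actually fits; milliseconds would need a wider type.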

On Sat, Sep 1, 2012 at 10:33 AM, David Swearingen <ds...@42six.com> wrote:
> Thanks. Still not clear to me what a time field is as an INT:
> milliseconds since the epoch?  That was my question.

Re: Newbie - Hive Tutorial question, what format is the sample data file in?

Posted by David Swearingen <ds...@42six.com>.
Thanks. Still not clear to me what a time field is as an INT:
milliseconds since the epoch?  That was my question.

On Sep 1, 2012, at 9:37 AM, Edward Capriolo <ed...@gmail.com> wrote:

> I do not think there is a sample file. You can tell the format from the
> CREATE TABLE statement.
>
> COMMENT 'This is the staging page view table'
>    ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
>    STORED AS TEXTFILE
>    LOCATION '/user/data/staging/page_view';
>
>
> http://www.asciitable.com/
>
> 44 is a ',' (decimal ASCII) and 12 is meant as a '\n' (octal 012; decimal
> 12 is actually form feed). Since the format is TEXTFILE, integers are
> serialized as strings.

Re: Newbie - Hive Tutorial question, what format is the sample data file in?

Posted by Edward Capriolo <ed...@gmail.com>.
I do not think there is a sample file. You can tell the format from the
CREATE TABLE statement.

COMMENT 'This is the staging page view table'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '44' LINES TERMINATED BY '12'
    STORED AS TEXTFILE
    LOCATION '/user/data/staging/page_view';


http://www.asciitable.com/

44 is a ',' (decimal ASCII) and 12 is meant as a '\n' (octal 012; decimal
12 is actually form feed). Since the format is TEXTFILE, integers are
serialized as strings.
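
For anyone who wants to build a test file by hand, here is a minimal sketch of what such rows could look like: comma-separated fields, one record per line, integers written as digit strings. The column order (viewTime, userid, page_url, referrer_url, ip, country) is an assumption based on the tutorial's staging table and is not stated in this thread; the sample values, the function names, and the reading of viewTime as seconds since the epoch are all made up for illustration.

```python
import io

FIELD_DELIM = ","   # decimal ASCII 44
LINE_DELIM = "\n"   # newline


def format_row(view_time, userid, page_url, referrer_url, ip, country):
    """Serialize one page view as a single line of delimited text."""
    # TEXTFILE stores every value as text, so integers become digit strings.
    fields = [str(view_time), str(userid), page_url, referrer_url, ip, country]
    for field in fields:
        # Plain delimited text has no quoting, so a delimiter inside a
        # field would shift the columns; reject such values up front.
        if FIELD_DELIM in field or LINE_DELIM in field:
            raise ValueError(f"field contains a delimiter: {field!r}")
    return FIELD_DELIM.join(fields) + LINE_DELIM


def write_sample(out, rows):
    """Write each row to the file-like object out."""
    for row in rows:
        out.write(format_row(*row))


# Hypothetical sample data; viewTime is assumed to be epoch seconds.
SAMPLE_ROWS = [
    (1346457600, 1001, "http://example.com/home", "http://example.com/", "192.0.2.1", "US"),
    (1346457660, 1002, "http://example.com/about", "http://example.com/home", "192.0.2.2", "CA"),
]

buf = io.StringIO()
write_sample(buf, SAMPLE_ROWS)
print(buf.getvalue(), end="")
```

Writing the same text to a file and copying it into the directory named in the table's LOCATION clause should be enough for the staging table to read it, assuming the table is defined as in the tutorial.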



On Sat, Sep 1, 2012 at 7:52 AM, David Swearingen <ds...@42six.com> wrote:
> I'm going through the tutorial at
> https://cwiki.apache.org/Hive/tutorial.html .  It's not clear to me what the
> exact format of the log file would be for the sample queries described, e.g., at
> https://cwiki.apache.org/Hive/tutorial.html#Tutorial-LoadingData .  I can't
> find a link to download such a file, and while I'd be happy to construct one
> myself, it's not clear to me what a viewTime of type INT would look like
> exactly.  Perhaps the file conforms to a standard web server logfile format;
> however, I believe there are a couple of variants of that format.
>
> Am I missing something?  Thanks.