Posted to user@hive.apache.org by Edward Capriolo <ed...@gmail.com> on 2009/09/16 20:16:24 UTC

Feature request: WHERE filename='x' or filename='y'

I am dumping files into a Hive partition at five-minute intervals. I am
using LOAD DATA to load them into the partition.

weblogs
web1.00
web1.05
web1.10
...
web2.00
web2.05
web2.10
....

Things that would be useful:

Select files from the folder with a regex or an exact name:

select * FROM logs where FILENAME LIKE(WEB1*)

select * FROM LOGS WHERE FILENAME=web2.00

Also, it would be nice to be able to select offsets within a file; this
would make sense with appends:

select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=]

Do these make sense to anyone?

Edward
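
For readers coming to this thread later: the Hive language manual
documents two virtual columns, INPUT__FILE__NAME and
BLOCK__OFFSET__INSIDE__FILE (listed there as available from Hive 0.8.0),
which cover much of what is asked for above. A minimal sketch, assuming
such a Hive version and reusing the logs table and file names from this
message. INPUT__FILE__NAME holds the full path of the input file, so the
filters match on the end of the path, and the exact meaning of the offset
depends on the file format:

```sql
-- Rows from files whose name starts with "web1" (the regex / LIKE case).
SELECT *
FROM logs
WHERE INPUT__FILE__NAME LIKE '%/web1%';

-- Rows from one exact file, starting at a given offset in that file.
SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, l.*
FROM logs l
WHERE INPUT__FILE__NAME LIKE '%/web2.00'
  AND BLOCK__OFFSET__INSIDE__FILE >= 454644;
```

Note that a filter like this is applied to rows as they are read, so it
may not save the scan of the other files the way partition pruning does.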

Re: Feature request: WHERE filename='x' or filename='y'

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Sep 16, 2009 at 10:42 PM, 김영우 <wa...@gmail.com> wrote:
> Hi Edward,
>
> It would be nice and very useful. Sometimes I want to select my own
> 'partition' or 'datafile' explicitly, something like below:
>
> SELECT *  FROM weblogs PARTITION ('2009-09-17', '2009-09-18') WHERE
> col1='..' and col2= ...
>
> Or users can select data files from directory:
>
> SELECT *  FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and
> col2= ...
>
> Anyway, your idea is very cool!
>
> Youngwoo
>
I added your comments to
https://issues.apache.org/jira/browse/HIVE-837
Depending on how you are set up, you can already do this with a WHERE clause.

SELECT * FROM weblogs PARTITION ('2009-09-17'

For example, I partition by date and by hour:
partition (log_date_part string, log_hour_part string)

select * from table where log_date_part like ('2009%')

or

select * from table where log_date_part = '2009-05-05' OR
log_date_part = '2009-05-06'

So you should be able to do that already.
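
The setup described above can be sketched end to end as follows. This is
only an illustration under stated assumptions: the single line column,
the /incoming/web1.00 path, and the partition values are hypothetical,
while the partition column names are the ones given in this message.

```sql
-- Table partitioned by date and hour; "line" is a hypothetical column.
CREATE TABLE weblogs (line STRING)
PARTITIONED BY (log_date_part STRING, log_hour_part STRING);

-- Each five-minute file is loaded into the partition it belongs to.
-- The input path is a placeholder.
LOAD DATA INPATH '/incoming/web1.00'
INTO TABLE weblogs
PARTITION (log_date_part = '2009-09-17', log_hour_part = '00');

-- Filtering on the partition columns restricts the query to the
-- matching partitions, here two days of logs.
SELECT *
FROM weblogs
WHERE log_date_part = '2009-09-17'
   OR log_date_part = '2009-09-18';
```

This selects whole partitions, not individual files inside a partition,
so it only replaces the proposed PARTITION (...) syntax, not the
FILENAME or DATAFILE ideas.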

Re: Feature request: WHERE filename='x' or filename='y'

Posted by 김영우 <wa...@gmail.com>.
Hi Edward,

It would be nice and very useful. Sometimes I want to select my own
'partition' or 'datafile' explicitly, something like below:

SELECT *  FROM weblogs PARTITION ('2009-09-17', '2009-09-18') WHERE
col1='..' and col2= ...

Or users can select data files from directory:

SELECT *  FROM weblogs DATAFILE ('log1.txt', 'log2.txt') WHERE col1='..' and
col2= ...

Anyway, your idea is very cool!

Youngwoo
