Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/10/02 21:58:34 UTC

Accessing only particular folder using hadoop streaming

Hi,
    I have data in a single folder, laid out as follows:

data-------shard1---d1_1
            |          |_d2_1
            Lshard2---d1_1
            |          |_d2_2
            Lshard3---d1_1
            |          |_d2_3
            Lshard4---d1_1
                       |_d2_4


Now I want to search for something in the d1 folders only (excluding all the d2 folders).
How do I do that in Python?
Thanks

Re: Accessing only particular folder using hadoop streaming

Posted by Harsh J <ha...@cloudera.com>.
You need to use a glob when passing your input path, perhaps like this:

data/shard*/d1*
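
As a rough sketch of how that glob could be used from Python: the helper below checks locally which paths the pattern would select, and a second helper builds a streaming command line around it. The names matches_glob and streaming_command, the jar location, the mapper script name, and the output directory are all hypothetical placeholders, and the local check only approximates Hadoop's own glob expansion.

```python
import fnmatch

# The glob suggested above. Hadoop expands it when it resolves the job's input paths.
INPUT_GLOB = "data/shard*/d1*"


def matches_glob(path, pattern=INPUT_GLOB):
    """Approximate the glob locally, comparing one path component at a time.

    Hypothetical helper for sanity-checking a pattern. It is not Hadoop's
    implementation, and it only compares paths of the same depth as the pattern.
    """
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(
        fnmatch.fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pattern_parts)
    )


def streaming_command(streaming_jar, mapper, output_dir, input_glob=INPUT_GLOB):
    """Build the argument list for a streaming job that reads only the d1 folders.

    streaming_jar, mapper and output_dir are placeholders: the jar location
    depends on the installation, and the mapper is your own script.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_glob,    # the glob keeps the d2 folders out of the job
        "-output", output_dir,
        "-mapper", mapper,
        "-file", mapper,         # ships the script with the job (newer releases prefer -files)
    ]


if __name__ == "__main__":
    for candidate in ("data/shard1/d1_1", "data/shard1/d2_1", "data/shard4/d1_1"):
        print(candidate, matches_glob(candidate))
    print(" ".join(streaming_command("hadoop-streaming.jar", "mapper.py", "search_output")))
```

Passing the list to subprocess.call hands the pattern to Hadoop untouched. If you type the command in a shell instead, quoting the glob keeps the local shell from trying to expand it first.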

-- 
Harsh J
