Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/10/02 21:58:34 UTC
Accessing only particular folder using hadoop streaming
Hi,
I have data in one folder, laid out like this:
data
|-- shard1
|   |-- d1_1
|   `-- d2_1
|-- shard2
|   |-- d1_1
|   `-- d2_2
|-- shard3
|   |-- d1_1
|   `-- d2_3
`-- shard4
    |-- d1_1
    `-- d2_4
Now I want to search for something in the d1's only (excluding all the d2's).
How do I do that in Python?
Thanks
Re: Accessing only particular folder using hadoop streaming
Posted by Harsh J <ha...@cloudera.com>.
You need to use globs when passing your input path, like below perhaps:
data/shard*/d1*
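As a rough illustration of what that glob selects, here is a small local Python sketch. It only approximates glob matching one path component at a time (so `*` does not cross a `/`); it is not Hadoop's own matcher, and the helper name `glob_matches` and the path list are made up for the example, based on the layout in the question. With streaming, the pattern itself would normally go in the `-input` argument, quoted so the local shell does not expand it.

```python
from fnmatch import fnmatchcase


def glob_matches(pattern, path):
    """Approximate glob match, one path component at a time.

    Hypothetical helper for illustration only: '*' here never
    matches across a '/' separator.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pattern_parts)
    )


# Paths taken from the directory layout described in the question.
paths = [
    "data/shard1/d1_1", "data/shard1/d2_1",
    "data/shard2/d1_1", "data/shard2/d2_2",
    "data/shard3/d1_1", "data/shard3/d2_3",
    "data/shard4/d1_1", "data/shard4/d2_4",
]

# Keep only the d1 entries under every shard.
selected = [p for p in paths if glob_matches("data/shard*/d1*", p)]

for p in selected:
    print(p)
```

Only the four `d1_1` entries pass the filter; every `d2_*` entry is skipped, which is the effect the input glob is meant to have on the job's input.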
On Thu, Oct 3, 2013 at 1:28 AM, jamal sasha <ja...@gmail.com> wrote:
> Hi,
> I have data in this one folder like following:
>
> data
> |-- shard1
> |   |-- d1_1
> |   `-- d2_1
> |-- shard2
> |   |-- d1_1
> |   `-- d2_2
> |-- shard3
> |   |-- d1_1
> |   `-- d2_3
> `-- shard4
>     |-- d1_1
>     `-- d2_4
>
>
> Now, I want to search something in d1 (and excluding all the d2's) in it.
> So how do i do that in python?
> Thanks
>
--
Harsh J