Posted to user@beam.apache.org by Vilhelm von Ehrenheim <vo...@gmail.com> on 2017/05/22 11:44:10 UTC

Validation of glob patterns in 2.0.0 python SDK

Hi!
Has anyone else had problems with glob patterns in 2.0? My pipelines are
failing with the `'No files found based on the file pattern %s' % pattern`
error. Running `FileSystems.match()` on patterns seems to give me results
only if there is no star in the pattern.
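
A minimal sketch of that check, assuming the files sit on a filesystem Beam
can reach. The helper name `count_matches` is made up for illustration.
`FileSystems.match()` takes a list of patterns and returns one result per
pattern, each carrying a `metadata_list` of the files it matched.

```python
def count_matches(file_pattern):
    """Return how many files FileSystems.match() finds for one pattern."""
    # Imported inside the function so this module still loads on a machine
    # where apache-beam is not installed.
    from apache_beam.io.filesystems import FileSystems

    # match() accepts a list of patterns and returns one MatchResult per
    # pattern, in the same order.
    match_result = FileSystems.match([file_pattern])[0]
    return len(match_result.metadata_list)
```

Calling it once with an exact file name and once with a starred pattern
covering the same file makes the difference described above easy to see.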

Am I missing something obvious here? Do I need to update some dependency or
something to get this working?

Thanks,
Vilhelm von Ehrenheim

Re: Validation of glob patterns in 2.0.0 python SDK

Posted by Vilhelm von Ehrenheim <vo...@gmail.com>.
Awesome! Thanks!

On 23 May 2017 21:07, "Sourabh Bajaj" <so...@google.com> wrote:

> Also, before the release you can work around this issue by passing
> "validate=False" to the source, which would prevent the pipeline from
> hitting this bug.
>
> -Sourabh

Re: Validation of glob patterns in 2.0.0 python SDK

Posted by Sourabh Bajaj <so...@google.com>.
Also, before the release you can work around this issue by passing
"validate=False" to the source, which would prevent the pipeline from
hitting this bug.
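
A minimal sketch of that workaround, assuming the source is a text read built
with `ReadFromText`. The helper name `build_unvalidated_read` and the bucket
path in the note below are made up for illustration.

```python
def build_unvalidated_read(file_pattern):
    """Build a text read that skips the construction-time glob check."""
    # Imported inside the function so this module still loads on a machine
    # where apache-beam is not installed.
    import apache_beam as beam

    # validate=False skips the check, made while the pipeline is being
    # constructed, that the pattern matches at least one file.
    return beam.io.ReadFromText(file_pattern, validate=False)
```

In a pipeline this would be applied as
`p | build_unvalidated_read('gs://my-bucket/input/part-*.txt')`. With
validation off, a pattern that really matches nothing is no longer caught
when the pipeline is constructed.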

-Sourabh

On Tue, May 23, 2017 at 11:05 AM Sourabh Bajaj <so...@google.com>
wrote:

> Hi Vilhelm,
>
> Thank you for catching the issue; this is actually a problem in how the
> limit is implemented. I have created
> https://github.com/apache/beam/pull/3210 to address it, so it should be
> fixed in the upcoming release.
>
> -Sourabh

Re: Validation of glob patterns in 2.0.0 python SDK

Posted by Sourabh Bajaj <so...@google.com>.
Hi Vilhelm,

Thank you for catching the issue; this is actually a problem in how the
limit is implemented. I have created
https://github.com/apache/beam/pull/3210 to address it, so it should be
fixed in the upcoming release.

-Sourabh

On Tue, May 23, 2017 at 6:34 AM Peter Mueller <pm...@atso.com> wrote:

> Vilhelm,
> Did you forget (as I did, and ran into the same problem you describe...) to
> add the required Python package (pip install apache-beam[gcp]) to access
> platform-specific Read.IOs? It looks like 'apache-beam' is going to be the
> 'core' and we'll need apache-beam[xx] to run on specific runners...
>
> Good luck...
>
> Peter Mueller

Re: Validation of glob patterns in 2.0.0 python SDK

Posted by Peter Mueller <pm...@atso.com>.
Vilhelm,
Did you forget (as I did, and ran into the same problem you describe...) to
add the required Python package (pip install apache-beam[gcp]) to access
platform-specific Read.IOs? It looks like 'apache-beam' is going to be the
'core' and we'll need apache-beam[xx] to run on specific runners...

Good luck...

Peter Mueller



On Tue, May 23, 2017 at 3:45 AM, Vilhelm von Ehrenheim <
vonehrenheim@gmail.com> wrote:

> It definitely matches files. I successfully ran the same pattern and files
> in a batch job using v0.6.0. After digging deeper into this I created a bug
> as it seems to be broken in 2.0:
> https://issues.apache.org/jira/browse/BEAM-2338.

Re: Validation of glob patterns in 2.0.0 python SDK

Posted by Vilhelm von Ehrenheim <vo...@gmail.com>.
It definitely matches files. I successfully ran the same pattern and files
in a batch job using v0.6.0. After digging deeper into this I created a bug
as it seems to be broken in 2.0:
https://issues.apache.org/jira/browse/BEAM-2338.

On Mon, May 22, 2017 at 4:47 PM, Chamikara Jayalath <ch...@apache.org>
wrote:

> What are the glob pattern and the runner you are using? Please note that
> FileBasedSource fails for glob patterns that match no files, so make sure
> that your pattern matches at least one file.
>
> Thanks,
> Cham

Re: Validation of glob patterns in 2.0.0 python SDK

Posted by Chamikara Jayalath <ch...@apache.org>.
What are the glob pattern and the runner you are using? Please note that
FileBasedSource fails for glob patterns that match no files, so make sure
that your pattern matches at least one file.
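
One way to run that check for local files before launching a pipeline is
sketched below. It uses Python's standard `glob` module rather than Beam's own
matching, so it only approximates what FileBasedSource sees and does not apply
to remote paths such as gs://. The helper name `matching_files` is made up for
illustration.

```python
import glob
import os


def matching_files(file_pattern):
    """Return the local regular files that match file_pattern, sorted."""
    # glob.glob expands *, ? and [] wildcards against the local filesystem;
    # directories that happen to match are filtered out.
    return sorted(
        path for path in glob.glob(file_pattern) if os.path.isfile(path)
    )
```

An empty list from `matching_files` means the pattern matches nothing on the
local disk, which is the case FileBasedSource rejects.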

Thanks,
Cham

On Mon, May 22, 2017 at 4:44 AM Vilhelm von Ehrenheim <
vonehrenheim@gmail.com> wrote:

> Hi!
> Has anyone else had problems with glob patterns in 2.0? My pipelines are
> failing with the `'No files found based on the file pattern %s' % pattern`
> error. Running `FileSystems.match()` on patterns seems to give me results
> only if there is no star in the pattern.
>
> Am I missing something obvious here? Do I need to update some dependency
> or something to get this working?
>
> Thanks,
> Vilhelm von Ehrenheim
>