Posted to dev@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2014/10/17 11:25:33 UTC

[jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Robert Metzger created FLINK-1170:
-------------------------------------

             Summary: Localization of InputSplits is not working properly
                 Key: FLINK-1170
                 URL: https://issues.apache.org/jira/browse/FLINK-1170
             Project: Flink
          Issue Type: Bug
          Components: Distributed Runtime
            Reporter: Robert Metzger
            Assignee: Robert Metzger


While running some benchmarks, I found that Flink is not properly assigning the InputSplits.

On my testing cluster, ALL splits were assigned to remote HDFS DataNodes, which causes a lot of network I/O.
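
For readers unfamiliar with the term, locality-aware split assignment can be sketched roughly as follows. This is a minimal illustration, not Flink's actual assigner: the class LocalFirstSplitAssigner, its nested Split type, and the nextSplit method are hypothetical names chosen for this example. It hands a requesting host a split stored on that host when one exists and falls back to a remote split otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of local-first split assignment; not Flink's real API.
final class LocalFirstSplitAssigner {

    // A split plus the hosts that store its data (e.g. the HDFS DataNodes).
    static final class Split {
        final int id;
        final List<String> hosts;

        Split(int id, String... hosts) {
            this.id = id;
            this.hosts = Arrays.asList(hosts);
        }
    }

    private final List<Split> unassigned = new ArrayList<>();

    LocalFirstSplitAssigner(List<Split> splits) {
        unassigned.addAll(splits);
    }

    // Returns a split stored on requestingHost if one is left,
    // otherwise any remaining split, or null when all are assigned.
    synchronized Split nextSplit(String requestingHost) {
        Iterator<Split> it = unassigned.iterator();
        while (it.hasNext()) {
            Split candidate = it.next();
            if (candidate.hosts.contains(requestingHost)) {
                it.remove();
                return candidate; // local read
            }
        }
        // No local split left: fall back to a remote read.
        return unassigned.isEmpty() ? null : unassigned.remove(0);
    }
}
```

In this sketch, a requesting host that is missing (null) or spelled differently from the stored host names never matches, so every request takes the remote fallback, which is the symptom described above.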



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Kostas Tzoumas <kt...@apache.org>.

I agree with Fabian. We need to fix this issue, and releasing without the
fix would mean the extra overhead of releasing 0.7.1 ASAP, perhaps just for
this bug. I vote to cancel the incubator release thread and vote again here.

On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <fh...@apache.org> wrote:

> Yes, that was intentional.
>
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact performance and might cause a
> significant performance regression.

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Robert Metzger <rm...@apache.org>.

Okay. I see the point.

I'll write on general@incubator to cancel the vote.

On Fri, Oct 17, 2014 at 1:03 PM, Ufuk Celebi <uc...@apache.org> wrote:

> I agree with Fabian on this. Let's cancel the release and create a new RC.

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Ufuk Celebi <uc...@apache.org>.

I agree with Fabian on this. Let's cancel the release and create a new RC.

On 17 Oct 2014, at 12:11, Fabian Hueske <fh...@apache.org> wrote:

> Yes, that was intentional.
>
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact performance and might cause a
> significant performance regression.

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Stephan Ewen <se...@apache.org>.

I agree, we should cancel the release, fix this, and make a new release
candidate.

Stephan


On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <fh...@apache.org> wrote:

> Yes, that was intentional.
>
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact performance and might cause a
> significant performance regression.

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Fabian Hueske <fh...@apache.org>.

Yes, that was intentional.

The whole point of using a parallel engine is to process large datasets.
Otherwise you could do it in Python on a single box...
Remote reads will severely impact performance and might cause a
significant performance regression.

2014-10-17 12:04 GMT+02:00 Robert Metzger <rm...@apache.org>:

> Did you intentionally post to the mailing list?
>
> I'm investigating the issue.
> So far, I have found that the hostname is never passed to the input split
> assigner. I guess this was introduced by the recent JobManager
> changes.
> Secondly, Flink is using the fully qualified hostname, whereas HDFS is
> using the hostname only. This causes a string mismatch.
>
> I wouldn't cancel the release because we are at a point where it is faster
> to vote on a bugfix release.
> The issue is not a show stopper for using Flink. It's just slow on large
> datasets.

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Robert Metzger <rm...@apache.org>.

Did you intentionally post to the mailing list?

I'm investigating the issue.
So far, I have found that the hostname is never passed to the input split
assigner. I guess this was introduced by the recent JobManager
changes.
Secondly, Flink is using the fully qualified hostname, whereas HDFS is
using the hostname only. This causes a string mismatch.

I wouldn't cancel the release because we are at a point where it is faster
to vote on a bugfix release.
The issue is not a show stopper for using Flink. It's just slow on large
datasets.
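
To illustrate the hostname mismatch described above, here is a minimal sketch of normalizing both names before comparing them. It is not the actual fix that went into Flink: the class HostnameNormalizer and its methods normalize and sameHost are hypothetical names, and reducing a name to its lowercase first label is an assumption that only holds when short hostnames are unique within the cluster.

```java
import java.util.Locale;

// Hypothetical helper; not part of Flink or HDFS.
final class HostnameNormalizer {

    private HostnameNormalizer() {
    }

    // Reduces a hostname to its lowercase first label,
    // e.g. "Node1.example.com" becomes "node1".
    static String normalize(String hostname) {
        if (hostname == null) {
            return null;
        }
        String name = hostname.trim().toLowerCase(Locale.ROOT);
        // Leave IPv4 literals alone: their dots are not domain separators.
        if (name.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return name;
        }
        int firstDot = name.indexOf('.');
        return firstDot > 0 ? name.substring(0, firstDot) : name;
    }

    // Compares two hostnames after normalization; a missing name never matches.
    static boolean sameHost(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        return normalize(a).equals(normalize(b));
    }
}
```

With a comparison like this, a fully qualified name on one side and a short name on the other would be treated as the same host, whereas a plain String.equals on the raw values would not match.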

On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <fh...@apache.org> wrote:

> This is a critical issue and sounds a bit like a release blocker for 0.7
> to me.
>
> Other opinions?

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Posted by Fabian Hueske <fh...@apache.org>.

This is a critical issue and sounds a bit like a release blocker for 0.7
to me.

Other opinions?

2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <ji...@apache.org>:

> Robert Metzger created FLINK-1170:
> -------------------------------------
>
>              Summary: Localization of InputSplits is not working properly
>                  Key: FLINK-1170
>                  URL: https://issues.apache.org/jira/browse/FLINK-1170
>              Project: Flink
>           Issue Type: Bug
>           Components: Distributed Runtime
>             Reporter: Robert Metzger
>             Assignee: Robert Metzger
>
>
> While running some benchmarks, I found that Flink is not properly
> assigning the InputSplits.
>
> On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> which causes a lot of network I/O.