You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hadoop.apache.org by Fei Hu <hu...@gmail.com> on 2016/12/30 04:06:39 UTC
RDD Location
Dear all,
Is there any way to change the host location for a certain partition of RDD?
"protected def getPreferredLocations(split: Partition)" can be used to
initialize the location, but how to change it after the initialization?
Thanks,
Fei
Re: RDD Location
Posted by Fei Hu <hu...@gmail.com>.
It will be very appreciated if you can give more details about why runJob
function could not be called in getPreferredLocations()
In the NewHadoopRDD class and HadoopRDD class, they get the location
information from the inputSplit. But there may be an issue in NewHadoopRDD,
because it generates all of the inputSplits on the master node, which means
I can only use a single node to generate and filter the inputSplits even if
the number of inputSplits is huge. Will it be a performance bottleneck?
Thanks,
Fei
On Fri, Dec 30, 2016 at 10:41 PM, Sun Rui <su...@163.com> wrote:
> You can’t call runJob inside getPreferredLocations().
> You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations()
> appropriately.
>
> On Dec 31, 2016, at 09:48, Fei Hu <hu...@gmail.com> wrote:
>
> That is a good idea.
>
> I tried add the following code to get getPreferredLocations() function:
>
> val results: Array[Array[DataChunkPartition]] = context.runJob(
> partitionsRDD, (context: TaskContext, partIter:
> Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
>
> But it seems to be suspended when executing this function. But if I move
> the code to other places, like the main() function, it runs well.
>
> What is the reason for it?
>
> Thanks,
> Fei
>
> On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <su...@163.com> wrote:
>
>> Maybe you can create your own subclass of RDD and override the
>> getPreferredLocations() to implement the logic of dynamic changing of the
>> locations.
>> > On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > Is there any way to change the host location for a certain partition of
>> RDD?
>> >
>> > "protected def getPreferredLocations(split: Partition)" can be used to
>> initialize the location, but how to change it after the initialization?
>> >
>> >
>> > Thanks,
>> > Fei
>> >
>> >
>>
>>
>>
>
>
Re: RDD Location
Posted by Fei Hu <hu...@gmail.com>.
It will be very appreciated if you can give more details about why runJob
function could not be called in getPreferredLocations()
In the NewHadoopRDD class and HadoopRDD class, they get the location
information from the inputSplit. But there may be an issue in NewHadoopRDD,
because it generates all of the inputSplits on the master node, which means
I can only use a single node to generate and filter the inputSplits even if
the number of inputSplits is huge. Will it be a performance bottleneck?
Thanks,
Fei
On Fri, Dec 30, 2016 at 10:41 PM, Sun Rui <su...@163.com> wrote:
> You can’t call runJob inside getPreferredLocations().
> You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations()
> appropriately.
>
> On Dec 31, 2016, at 09:48, Fei Hu <hu...@gmail.com> wrote:
>
> That is a good idea.
>
> I tried add the following code to get getPreferredLocations() function:
>
> val results: Array[Array[DataChunkPartition]] = context.runJob(
> partitionsRDD, (context: TaskContext, partIter:
> Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
>
> But it seems to be suspended when executing this function. But if I move
> the code to other places, like the main() function, it runs well.
>
> What is the reason for it?
>
> Thanks,
> Fei
>
> On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <su...@163.com> wrote:
>
>> Maybe you can create your own subclass of RDD and override the
>> getPreferredLocations() to implement the logic of dynamic changing of the
>> locations.
>> > On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > Is there any way to change the host location for a certain partition of
>> RDD?
>> >
>> > "protected def getPreferredLocations(split: Partition)" can be used to
>> initialize the location, but how to change it after the initialization?
>> >
>> >
>> > Thanks,
>> > Fei
>> >
>> >
>>
>>
>>
>
>
Re: RDD Location
Posted by Fei Hu <hu...@gmail.com>.
It will be very appreciated if you can give more details about why runJob
function could not be called in getPreferredLocations()
In the NewHadoopRDD class and HadoopRDD class, they get the location
information from the inputSplit. But there may be an issue in NewHadoopRDD,
because it generates all of the inputSplits on the master node, which means
I can only use a single node to generate and filter the inputSplits even if
the number of inputSplits is huge. Will it be a performance bottleneck?
Thanks,
Fei
On Fri, Dec 30, 2016 at 10:41 PM, Sun Rui <su...@163.com> wrote:
> You can’t call runJob inside getPreferredLocations().
> You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations()
> appropriately.
>
> On Dec 31, 2016, at 09:48, Fei Hu <hu...@gmail.com> wrote:
>
> That is a good idea.
>
> I tried add the following code to get getPreferredLocations() function:
>
> val results: Array[Array[DataChunkPartition]] = context.runJob(
> partitionsRDD, (context: TaskContext, partIter:
> Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
>
> But it seems to be suspended when executing this function. But if I move
> the code to other places, like the main() function, it runs well.
>
> What is the reason for it?
>
> Thanks,
> Fei
>
> On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <su...@163.com> wrote:
>
>> Maybe you can create your own subclass of RDD and override the
>> getPreferredLocations() to implement the logic of dynamic changing of the
>> locations.
>> > On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
>> >
>> > Dear all,
>> >
>> > Is there any way to change the host location for a certain partition of
>> RDD?
>> >
>> > "protected def getPreferredLocations(split: Partition)" can be used to
>> initialize the location, but how to change it after the initialization?
>> >
>> >
>> > Thanks,
>> > Fei
>> >
>> >
>>
>>
>>
>
>
Re: RDD Location
Posted by Sun Rui <su...@163.com>.
You can’t call runJob inside getPreferredLocations().
You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations() appropriately.
> On Dec 31, 2016, at 09:48, Fei Hu <hu...@gmail.com> wrote:
>
> That is a good idea.
>
> I tried add the following code to get getPreferredLocations() function:
>
> val results: Array[Array[DataChunkPartition]] = context.runJob(
> partitionsRDD, (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
>
> But it seems to be suspended when executing this function. But if I move the code to other places, like the main() function, it runs well.
>
> What is the reason for it?
>
> Thanks,
> Fei
>
> On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <sunrise_win@163.com <ma...@163.com>> wrote:
> Maybe you can create your own subclass of RDD and override the getPreferredLocations() to implement the logic of dynamic changing of the locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hufei68@gmail.com <ma...@gmail.com>> wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of RDD?
> >
> > "protected def getPreferredLocations(split: Partition)" can be used to initialize the location, but how to change it after the initialization?
> >
> >
> > Thanks,
> > Fei
> >
> >
>
>
>
Re: RDD Location
Posted by Sun Rui <su...@163.com>.
You can’t call runJob inside getPreferredLocations().
You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations() appropriately.
> On Dec 31, 2016, at 09:48, Fei Hu <hu...@gmail.com> wrote:
>
> That is a good idea.
>
> I tried add the following code to get getPreferredLocations() function:
>
> val results: Array[Array[DataChunkPartition]] = context.runJob(
> partitionsRDD, (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
>
> But it seems to be suspended when executing this function. But if I move the code to other places, like the main() function, it runs well.
>
> What is the reason for it?
>
> Thanks,
> Fei
>
> On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <sunrise_win@163.com <ma...@163.com>> wrote:
> Maybe you can create your own subclass of RDD and override the getPreferredLocations() to implement the logic of dynamic changing of the locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hufei68@gmail.com <ma...@gmail.com>> wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of RDD?
> >
> > "protected def getPreferredLocations(split: Partition)" can be used to initialize the location, but how to change it after the initialization?
> >
> >
> > Thanks,
> > Fei
> >
> >
>
>
>
Re: RDD Location
Posted by Fei Hu <hu...@gmail.com>.
That is a good idea.
I tried add the following code to get getPreferredLocations() function:
val results: Array[Array[DataChunkPartition]] = context.runJob(
partitionsRDD, (context: TaskContext, partIter:
Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
But it seems to be suspended when executing this function. But if I move
the code to other places, like the main() function, it runs well.
What is the reason for it?
Thanks,
Fei
On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <su...@163.com> wrote:
> Maybe you can create your own subclass of RDD and override the
> getPreferredLocations() to implement the logic of dynamic changing of the
> locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of
> RDD?
> >
> > "protected def getPreferredLocations(split: Partition)" can be used to
> initialize the location, but how to change it after the initialization?
> >
> >
> > Thanks,
> > Fei
> >
> >
>
>
>
Re: RDD Location
Posted by Fei Hu <hu...@gmail.com>.
That is a good idea.
I tried add the following code to get getPreferredLocations() function:
val results: Array[Array[DataChunkPartition]] = context.runJob(
partitionsRDD, (context: TaskContext, partIter:
Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
But it seems to be suspended when executing this function. But if I move
the code to other places, like the main() function, it runs well.
What is the reason for it?
Thanks,
Fei
On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <su...@163.com> wrote:
> Maybe you can create your own subclass of RDD and override the
> getPreferredLocations() to implement the logic of dynamic changing of the
> locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of
> RDD?
> >
> > "protected def getPreferredLocations(split: Partition)" can be used to
> initialize the location, but how to change it after the initialization?
> >
> >
> > Thanks,
> > Fei
> >
> >
>
>
>
Re: RDD Location
Posted by Fei Hu <hu...@gmail.com>.
That is a good idea.
I tried add the following code to get getPreferredLocations() function:
val results: Array[Array[DataChunkPartition]] = context.runJob(
partitionsRDD, (context: TaskContext, partIter:
Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
But it seems to be suspended when executing this function. But if I move
the code to other places, like the main() function, it runs well.
What is the reason for it?
Thanks,
Fei
On Fri, Dec 30, 2016 at 2:38 AM, Sun Rui <su...@163.com> wrote:
> Maybe you can create your own subclass of RDD and override the
> getPreferredLocations() to implement the logic of dynamic changing of the
> locations.
> > On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of
> RDD?
> >
> > "protected def getPreferredLocations(split: Partition)" can be used to
> initialize the location, but how to change it after the initialization?
> >
> >
> > Thanks,
> > Fei
> >
> >
>
>
>
Re: RDD Location
Posted by Sun Rui <su...@163.com>.
Maybe you can create your own subclass of RDD and override the getPreferredLocations() to implement the logic of dynamic changing of the locations.
> On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
>
> Dear all,
>
> Is there any way to change the host location for a certain partition of RDD?
>
> "protected def getPreferredLocations(split: Partition)" can be used to initialize the location, but how to change it after the initialization?
>
>
> Thanks,
> Fei
>
>
---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
Re: RDD Location
Posted by Sun Rui <su...@163.com>.
Maybe you can create your own subclass of RDD and override the getPreferredLocations() to implement the logic of dynamic changing of the locations.
> On Dec 30, 2016, at 12:06, Fei Hu <hu...@gmail.com> wrote:
>
> Dear all,
>
> Is there any way to change the host location for a certain partition of RDD?
>
> "protected def getPreferredLocations(split: Partition)" can be used to initialize the location, but how to change it after the initialization?
>
>
> Thanks,
> Fei
>
>
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org