Posted to user@hive.apache.org by Malligarjunan S <ma...@gmail.com> on 2014/07/08 19:53:48 UTC

Hive UDF performance issue

Hello All,

Can anyone help answer my question posted on Stack Overflow?
http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
It is pretty urgent. Please help me.

Thanks and Regards,
Sankar S.

Re: Hive UDF performance issue

Posted by Navis류승우 <na...@nexr.com>.

The query is doing a cross product, so it is not surprising that it takes
so long, even with small tables.

Thanks,
Navis


2014-07-09 2:53 GMT+09:00 Malligarjunan S <ma...@gmail.com>:

> Hello All,
>
> Can anyone help answer my question posted on Stack Overflow?
> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
> It is pretty urgent. Please help me.
>
> Thanks and Regards,
> Sankar S.
>

Re: Hive UDF performance issue

Posted by Edward Capriolo <ed...@gmail.com>.

The "small" table can be any size. You want the smaller table to be the one
read inside the mapper (/path/to/table/b here). The map tasks are split over
table A, so keeping the larger table there results in more parallelism.
There is also a ticket on Hive theta joins that you might want to look at.
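
A rough way to see the parallelism argument is to estimate how many map tasks each choice of table A would get. This is only a sketch: it assumes roughly one map task per input split, and the 128 MB split size and the 10 GB / 1 GB table sizes are made-up figures, not numbers from this thread. The class and method names are hypothetical.

```java
final class SplitEstimate {
    /** Rough map task count: one task per split of at most splitBytes (rounded up). */
    static long mappers(long tableBytes, long splitBytes) {
        return (tableBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long split = 128 * mb;        // assumed split size
        long large = 10 * 1024 * mb;  // hypothetical 10 GB table
        long small = 1024 * mb;       // hypothetical 1 GB table
        System.out.println("large table as A: " + mappers(large, split) + " map tasks");
        System.out.println("small table as A: " + mappers(small, split) + " map tasks");
    }
}
```

Under these assumptions, driving the job from the larger table gives ten times as many map tasks, each of which streams the smaller table.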


On Thu, Jul 10, 2014 at 10:23 PM, Malligarjunan S <ma...@gmail.com>
wrote:

> Hello Edward,
>
> Thank you very much for the update.
> What size do you mean by a small table? In our case the small table will
> have a minimum of 1 million records.
> Can we use this UDTF? How much of a time improvement would there be?
>
> Appreciate your help!
> Thanks and Regards
> SankarS

Re: Hive UDF performance issue

Posted by Malligarjunan S <ma...@gmail.com>.

Hello Edward,

Thank you very much for the update.
What size do you mean by a small table? In our case the small table will
have a minimum of 1 million records.
Can we use this UDTF? How much of a time improvement would there be?

Appreciate your help!
Thanks and Regards
SankarS


On Thu, Jul 10, 2014 at 11:26 PM, Edward Capriolo <ed...@gmail.com>
wrote:

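> There is no magic. Hopefully one table is smaller than the other. You
> could make a UDTF that does something like what this MR job does.
>
> Make a mapper that runs over table A:
>
> FileInputFormat.addInputPath(job, new Path("/path/to/table/a"));
>
> Then inside the mapper (a sketch; CrossMapper is a placeholder name, and
> it assumes table B is a single text file):
>
> import java.io.*;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class CrossMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
>   @Override
>   protected void map(LongWritable key, Text value, Context ctx)
>       throws IOException, InterruptedException {
>     // Re-read all of table B for every row of table A.
>     FileSystem fs = FileSystem.get(ctx.getConfiguration());
>     try (BufferedReader f = new BufferedReader(
>         new InputStreamReader(fs.open(new Path("/path/to/table/b"))))) {
>       for (String line; (line = f.readLine()) != null; ) {
>         ctx.write(new Text(value + line), NullWritable.get());
>       }
>     }
>   }
> }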

Re: Hive UDF performance issue

Posted by Edward Capriolo <ed...@gmail.com>.

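There is no magic. Hopefully one table is smaller than the other. You could
make a UDTF that does something like what this MR job does.

Make a mapper that runs over table A:

FileInputFormat.addInputPath(job, new Path("/path/to/table/a"));

Then inside the mapper (a sketch; CrossMapper is a placeholder name, and it
assumes table B is a single text file):

import java.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class CrossMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context ctx)
      throws IOException, InterruptedException {
    // Re-read all of table B for every row of table A.
    FileSystem fs = FileSystem.get(ctx.getConfiguration());
    try (BufferedReader f = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/path/to/table/b"))))) {
      for (String line; (line = f.readLine()) != null; ) {
        ctx.write(new Text(value + line), NullWritable.get());
      }
    }
  }
}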
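
For anyone who wants to try the idea without a cluster, the same pattern can be sketched in plain Java with no Hadoop dependency. This is an illustration only, not the MR job mentioned in this thread: table A is cut into splits, each split plays the role of one mapper, and every "mapper" walks all of table B for each of its rows. The class name, method names, and sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MapSideCross {
    /** What one mapper does: pair every row of its split of A with every row of B. */
    static List<String> mapSplit(List<String> splitOfA, List<String> tableB) {
        List<String> out = new ArrayList<>();
        for (String a : splitOfA) {
            for (String b : tableB) {
                out.add(a + b); // same concatenation as value + line in the mapper above
            }
        }
        return out;
    }

    /** Cut table A into at most `mappers` contiguous splits (mappers >= 1) and process them in parallel. */
    static List<String> cross(List<String> tableA, List<String> tableB, int mappers) {
        int splitSize = Math.max(1, (tableA.size() + mappers - 1) / mappers);
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < tableA.size(); i += splitSize) {
            splits.add(tableA.subList(i, Math.min(i + splitSize, tableA.size())));
        }
        // The stream is ordered, so the collected result keeps the order of table A.
        return splits.parallelStream()
                     .flatMap(s -> mapSplit(s, tableB).stream())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2", "a3");
        List<String> b = Arrays.asList("b1", "b2");
        System.out.println(cross(a, b, 2));
    }
}
```

The total amount of work is still |A| x |B| pairs; the only gain is that it is spread over several workers instead of landing on one.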



On Thu, Jul 10, 2014 at 12:56 PM, Malligarjunan S <ma...@gmail.com>
wrote:

> Hello Edward,
>
> Thank you very much for helping me.
> I am new to Hive. Could you please provide a sample MapReduce job?
>
> Regards,
> Sankar S

Re: Hive UDF performance issue

Posted by Malligarjunan S <ma...@gmail.com>.

Hello Edward,

Thank you very much for helping me.
I am new to Hive. Could you please provide a sample MapReduce job?

Regards,
Sankar S




On Thu, Jul 10, 2014 at 8:19 AM, Edward Capriolo <ed...@gmail.com>
wrote:

> Hive's cross product stinks. I have a MapReduce job that will do it.

Re: Hive UDF performance issue

Posted by Edward Capriolo <ed...@gmail.com>.

Hive's cross product stinks. I have a MapReduce job that will do it.

On Wednesday, July 9, 2014, Navis류승우 <na...@nexr.com> wrote:

> Yes, 2M x 1M makes 2T pairings in a single reducer.
>
> Thanks,
> Navis

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.

Re: Hive UDF performance issue

Posted by Navis류승우 <na...@nexr.com>.

Yes, 2M x 1M makes 2T pairings in a single reducer.
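
To put a number on that, here is a minimal Java sketch of the arithmetic. The class name, the method names, and the figure of 100 workers are illustrative assumptions, not anything taken from Hive or from the query in question.

```java
final class CrossJoinSize {
    /** Number of row pairs a full cross product of two tables produces. */
    static long pairs(long rowsA, long rowsB) {
        return Math.multiplyExact(rowsA, rowsB); // throws ArithmeticException on overflow
    }

    /** Pairs per worker if the work is spread evenly over `workers` tasks (rounded up). */
    static long pairsPerWorker(long rowsA, long rowsB, int workers) {
        long total = pairs(rowsA, rowsB);
        return (total + workers - 1) / workers;
    }

    public static void main(String[] args) {
        long rowsA = 2_000_000L;
        long rowsB = 1_000_000L;
        System.out.println("single reducer: " + pairs(rowsA, rowsB) + " pairs");
        System.out.println("100 workers:    " + pairsPerWorker(rowsA, rowsB, 100) + " pairs each");
    }
}
```

Even split 100 ways, each worker would still have tens of billions of pairs to produce, which is why the job is slow regardless of what the UDF itself costs.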

Thanks,
Navis


2014-07-10 1:50 GMT+09:00 Malligarjunan S <ma...@gmail.com>:

> Hello All,
> Is it expected behavior for Hive to take so much time?
>
>
> Thanks and Regards,
> Sankar S

Re: Hive UDF performance issue

Posted by Malligarjunan S <ma...@gmail.com>.

Hello All,
Is it expected behavior for Hive to take so much time?

Thanks and Regards,
Sankar S

On Tue, Jul 8, 2014 at 11:23 PM, Malligarjunan S <ma...@gmail.com>
wrote:

> Hello All,
>
> Can anyone help answer my question posted on Stack Overflow?
> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
> It is pretty urgent. Please help me.
>
> Thanks and Regards,
> Sankar S.
>