Posted to common-user@hadoop.apache.org by Ravikant Dindokar <ra...@gmail.com> on 2015/06/25 19:09:21 UTC

how to assign unique ID (Long Value) in mapper

Hi Hadoop user,

I have a file containing one line for each edge in the graph with two
vertex ids (source & sink).
sample:
1    2 (here 1 is source and 2 is sink node for the edge)
1    5
2    3
4    2
4    3
I want to assign a unique ID (a long value) to each edge, i.e. to each line
of the file.

How can I ensure that unique values are assigned across distributed mapper
processes?

Note: The file is large, so using only one reducer is not feasible.

Thanks
Ravikant

Re: how to assign unique ID (Long Value) in mapper

Posted by Shahab Yunus <sh...@gmail.com>.

I see 2 issues here which go somewhat against the architecture and idea of
M/R (or distributed and parallel programming models in general).

1- The map and reduce tasks are supposed to be shared-nothing, independent
tasks. If you add functionality like this, where you need to make sure that
some data is unique across all the map or reduce tasks, then you are no
longer 'shared nothing' and you give up the advantages of that.

2- As a consequence of #1, if you add a common data need between map or
reduce tasks, you are adding a bottleneck which can incur performance
issues, and on top of that concurrency and race problems.

Having said that, perhaps ZooKeeper or a coordinating framework like that
could be used to achieve what you want, though I think the issues that I
highlighted above would still apply. It could be a very tricky design.
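
For comparison, here is a minimal sketch of one way to get unique long values
without any shared state between tasks. It is not something proposed in this
thread: it assumes each task can learn a small integer index of its own
(taskIndex is a hypothetical input here), and that no task emits more than
2^40 records. Each task then hands out IDs only from its own slice of the
long range, so the values are unique but not consecutive.

```java
/** Sketch: coordination-free unique IDs, one slice of the long range per task. */
final class TaskLocalIdGenerator {
    // Assumption: fewer than 2^40 records per task, fewer than 2^23 tasks.
    static final int COUNTER_BITS = 40;

    private final long base; // task index shifted into the high bits
    private long next = 0;   // per-task counter stored in the low bits

    TaskLocalIdGenerator(int taskIndex) {
        if (taskIndex < 0 || taskIndex >= (1 << (63 - COUNTER_BITS))) {
            throw new IllegalArgumentException("task index out of range: " + taskIndex);
        }
        this.base = (long) taskIndex << COUNTER_BITS;
    }

    /** Returns the next ID in this task's slice; never overlaps another task's slice. */
    long nextId() {
        if (next >= (1L << COUNTER_BITS)) {
            throw new IllegalStateException("per-task counter exhausted");
        }
        return base | next++;
    }
}
```

A task with index 3 would create one generator with new TaskLocalIdGenerator(3)
and call nextId() once per input line.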

Just my 2 cents.

Regards,
Shahab

On Fri, Jun 26, 2015 at 5:29 AM, Ravikant Dindokar <ra...@gmail.com>
wrote:

> The problem can be thought as assigning line number for each line. Is
> there any inbuilt functionality in hadoop which can do this?
>
> On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <
> ravikant.iisc@gmail.com> wrote:
>
>> yes , there can be loop in the graph
>>
>> On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <ma...@gmail.com>
>> wrote:
>>
>>> Are there loops in your graph?
>>>
>>>
>>> On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <
>>> ravikant.iisc@gmail.com> wrote:
>>>
>>>> Hi Hadoop user,
>>>>
>>>> I have a file containing one line for each edge in the graph with two
>>>> vertex ids (source & sink).
>>>> sample:
>>>> 1    2 (here 1 is source and 2 is sink node for the edge)
>>>> 1    5
>>>> 2    3
>>>> 4    2
>>>> 4    3
>>>> I want to assign a unique Id (Long value )to each edge i.e for each
>>>> line of the file.
>>>>
>>>> How to ensure assignment of unique value in distributed mapper process?
>>>>
>>>> Note : File size is large, so using only one reducer is not feasible.
>>>>
>>>> Thanks
>>>> Ravikant
>>>>
>>>
>>>
>>>
>>> --
>>> Harshit Mathur
>>>
>>
>>
>

Re: how to assign unique ID (Long Value) in mapper

Posted by Ravikant Dindokar <ra...@gmail.com>.

Thanks Gabriel.


On Tue, Jun 30, 2015 at 1:04 AM, gabriel balan <ga...@oracle.com>
wrote:

>  Hi
>
> Rather than trying to figure out the line number of the current line, you
> can use the byte offset of the current line.
> It's just as unique as the line number, and much easier to obtain:
> TextInputFormat (FileInputFormat) uses it as the key.
>
> Keys are the position in the file, and values are the line of text.
>
> If you have multiple files, you may want to combine the file offset with
> the file name (path) to get a unique id. See "How to get the input file
> name in the mapper".
>
> hth
> Gabriel Balan
>
>
> On 6/26/2015 5:29 AM, Ravikant Dindokar wrote:
>
> The problem can be thought as assigning line number for each line. Is
> there any inbuilt functionality in hadoop which can do this?
>
> On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <
> ravikant.iisc@gmail.com> wrote:
>
>> yes , there can be loop in the graph
>>
>> On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <ma...@gmail.com>
>> wrote:
>>
>>> Are there loops in your graph?
>>>
>>>
>>> On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <
>>> ravikant.iisc@gmail.com> wrote:
>>>
>>>>    Hi Hadoop user,
>>>>
>>>>  I have a file containing one line for each edge in the graph with two
>>>> vertex ids (source & sink).
>>>> sample:
>>>> 1    2 (here 1 is source and 2 is sink node for the edge)
>>>>  1    5
>>>> 2    3
>>>> 4    2
>>>> 4    3
>>>>  I want to assign a unique Id (Long value )to each edge i.e for each
>>>> line of the file.
>>>>
>>>>  How to ensure assignment of unique value in distributed mapper process?
>>>>
>>>>  Note : File size is large, so using only one reducer is not feasible.
>>>>
>>>>  Thanks
>>>>  Ravikant
>>>>
>>>
>>>
>>>
>>>  --
>>> Harshit Mathur
>>>
>>
>>
>
> --
> The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.
>
>

Re: how to assign unique ID (Long Value) in mapper

Posted by gabriel balan <ga...@oracle.com>.

Hi

Rather than trying to figure out the line number of the current line, you can use the byte offset of the current line.
It's just as unique as the line number, and much easier to obtain: TextInputFormat (FileInputFormat) uses it as the key.

    Keys are the position in the file, and values are the line of text.

If you have multiple files, you may want to combine the file offset with the file name (path) to get a unique id. See "How to get the input file name in the mapper".
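
A minimal sketch of the idea, in plain Java so it runs without a cluster. It
assumes each input file has been given a small integer index (fileIndex is
hypothetical; the thread only says to combine the offset with the file name)
and that no file exceeds 2^40 bytes. lineOffsets mimics the keys a mapper
would receive under TextInputFormat, where the key is the byte offset of the
line within its file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: derive a unique long per line from (file index, byte offset). */
final class EdgeIds {
    // Assumption: files smaller than 2^40 bytes, fewer than 2^23 files.
    static final int OFFSET_BITS = 40;

    /** Packs a file index (high bits) and a byte offset (low bits) into one long. */
    static long edgeId(int fileIndex, long byteOffset) {
        if (fileIndex < 0 || fileIndex >= (1 << (63 - OFFSET_BITS))) {
            throw new IllegalArgumentException("file index out of range: " + fileIndex);
        }
        if (byteOffset < 0 || byteOffset >= (1L << OFFSET_BITS)) {
            throw new IllegalArgumentException("offset out of range: " + byteOffset);
        }
        return ((long) fileIndex << OFFSET_BITS) | byteOffset;
    }

    /** Byte offset at which each line of the content starts. */
    static List<Long> lineOffsets(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                offsets.add((long) start);
                start = i + 1;
            }
        }
        if (start < bytes.length) {
            offsets.add((long) start); // last line without a trailing newline
        }
        return offsets;
    }
}
```

With a single input file the offset alone is already a usable ID. The IDs are
unique but not consecutive, so this does not produce line numbers.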

hth
Gabriel Balan

On 6/26/2015 5:29 AM, Ravikant Dindokar wrote:
> The problem can be thought as assigning line number for each line. Is there any inbuilt functionality in hadoop which can do this?
>
> On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <ravikant.iisc@gmail.com> wrote:
>
>     yes , there can be loop in the graph
>
>     On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <mathursharp@gmail.com> wrote:
>
>         Are there loops in your graph?
>
>
>         On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <ravikant.iisc@gmail.com> wrote:
>
>             Hi Hadoop user,
>
>             I have a file containing one line for each edge in the graph with two vertex ids (source & sink).
>             sample:
>             1    2 (here 1 is source and 2 is sink node for the edge)
>             1    5
>             2    3
>             4    2
>             4    3
>             I want to assign a unique Id (Long value )to each edge i.e for each line of the file.
>
>             How to ensure assignment of unique value in distributed mapper process?
>
>             Note : File size is large, so using only one reducer is not feasible.
>
>             Thanks
>             Ravikant
>
>
>
>
>         -- 
>         Harshit Mathur
>
>
>

-- 
The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.

Re: how to assign unique ID (Long Value) in mapper

Posted by Ravikant Dindokar <ra...@gmail.com>.

The problem can be thought of as assigning a line number to each line. Is
there any built-in functionality in Hadoop that can do this?

On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <ra...@gmail.com>
wrote:

> yes , there can be loop in the graph
>
> On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <ma...@gmail.com>
> wrote:
>
>> Are there loops in your graph?
>>
>>
>> On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <
>> ravikant.iisc@gmail.com> wrote:
>>
>>> Hi Hadoop user,
>>>
>>> I have a file containing one line for each edge in the graph with two
>>> vertex ids (source & sink).
>>> sample:
>>> 1    2 (here 1 is source and 2 is sink node for the edge)
>>> 1    5
>>> 2    3
>>> 4    2
>>> 4    3
>>> I want to assign a unique Id (Long value )to each edge i.e for each line
>>> of the file.
>>>
>>> How to ensure assignment of unique value in distributed mapper process?
>>>
>>> Note : File size is large, so using only one reducer is not feasible.
>>>
>>> Thanks
>>> Ravikant
>>>
>>
>>
>>
>> --
>> Harshit Mathur
>>
>
>

Re: how to assign unique ID (Long Value) in mapper

Posted by Ravikant Dindokar <ra...@gmail.com>.

Yes, there can be loops in the graph.

On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <ma...@gmail.com>
wrote:

> Are there loops in your graph?
>
>
> On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <
> ravikant.iisc@gmail.com> wrote:
>
>> Hi Hadoop user,
>>
>> I have a file containing one line for each edge in the graph with two
>> vertex ids (source & sink).
>> sample:
>> 1    2 (here 1 is source and 2 is sink node for the edge)
>> 1    5
>> 2    3
>> 4    2
>> 4    3
>> I want to assign a unique Id (Long value )to each edge i.e for each line
>> of the file.
>>
>> How to ensure assignment of unique value in distributed mapper process?
>>
>> Note : File size is large, so using only one reducer is not feasible.
>>
>> Thanks
>> Ravikant
>>
>
>
>
> --
> Harshit Mathur
>

Re: how to assign unique ID (Long Value) in mapper

Posted by Harshit Mathur <ma...@gmail.com>.

Are there loops in your graph?


On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar
<ravikant.iisc@gmail.com> wrote:

> Hi Hadoop user,
>
> I have a file containing one line for each edge in the graph with two
> vertex ids (source & sink).
> sample:
> 1    2 (here 1 is source and 2 is sink node for the edge)
> 1    5
> 2    3
> 4    2
> 4    3
> I want to assign a unique ID (Long value) to each edge, i.e. to each line
> of the file.
>
> How to ensure assignment of unique value in distributed mapper process?
>
> Note : File size is large, so using only one reducer is not feasible.
>
> Thanks
> Ravikant
>



-- 
Harshit Mathur
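
For the question asked in this thread, one commonly used approach that needs no coordination between mappers is to build each ID from the map task's index plus a counter local to that task. The following is only a minimal sketch under stated assumptions, not an answer given by anyone in the thread: the class name EdgeIdSketch, the method composeId, and the 23/40 bit split are all hypothetical, and the two in-memory arrays stand in for input splits. In a real job the task index would presumably come from something like context.getTaskAttemptID().getTaskID().getId().

```java
// Sketch: unique long IDs without shared state between map tasks.
// Assumption: each map task knows its own task index and keeps a local counter.
class EdgeIdSketch {
    // Hypothetical bit layout: 23 bits for the task index, 40 bits for the counter.
    // Together they use 63 bits, so the resulting ID is never negative.
    static final int COUNTER_BITS = 40;
    static final long MAX_COUNTER = (1L << COUNTER_BITS) - 1;
    static final int MAX_TASK_INDEX = (1 << 23) - 1;

    // Combine the task index (high bits) and the per-task counter (low bits).
    static long composeId(int taskIndex, long localCounter) {
        if (taskIndex < 0 || taskIndex > MAX_TASK_INDEX) {
            throw new IllegalArgumentException("task index out of range: " + taskIndex);
        }
        if (localCounter < 0 || localCounter > MAX_COUNTER) {
            throw new IllegalArgumentException("counter out of range: " + localCounter);
        }
        return ((long) taskIndex << COUNTER_BITS) | localCounter;
    }

    public static void main(String[] args) {
        // Two simulated input splits holding the sample edges from the question.
        String[][] splits = {
            {"1\t2", "1\t5", "2\t3"},
            {"4\t2", "4\t3"}
        };
        for (int task = 0; task < splits.length; task++) {
            long counter = 0; // local to the simulated map task
            for (String edge : splits[task]) {
                System.out.println(composeId(task, counter++) + "\t" + edge);
            }
        }
    }
}
```

IDs built this way are unique but not consecutive, so they are not line numbers. If the input is a single file read with TextInputFormat, the byte offset that the mapper receives as its input key is another per-line value that is unique within that file.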
