Posted to mapreduce-issues@hadoop.apache.org by "Gelesh (JIRA)" <ji...@apache.org> on 2012/08/03 17:58:03 UTC

[jira] [Created] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Gelesh created MAPREDUCE-4512:
---------------------------------

             Summary: TextInputFormat delimiter  bug:- Input Text portion ends with & Delimiter starts with same char/char sequence
                 Key: MAPREDUCE-4512
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/mumak, mr-am, mrv1, mrv2, task
    Affects Versions: 2.0.0-alpha
         Environment: Lynux
            Reporter: Gelesh
             Fix For: 0.20.204.0


TextInputFormat delimiter bug scenario: the input text contains a character sequence whose first character matches the first character of the delimiter, and whose remaining characters match the entire delimiter, starting from the delimiter's first character.

e.g. delimiter = "record";
and Text = record 1:- name = "Gelesh" e-mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name ....

Here the string "=Bangalorrecord 3: " satisfies two conditions:
1) it contains the delimiter "record";
2) the character (or character sequence) immediately before the delimiter ('r') matches the first character (or character sequence) of the delimiter, i.e. "=Bangalor" ends with, and the delimiter starts with, the same character 'r'.

In this case the delimiter is skipped.
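The failure mode described above can be reproduced with a naive delimiter scan that, when a partial match breaks, resets its match position to 0 without re-checking the character that broke it. This is an assumed reconstruction for illustration, not the actual LineReader source: the class NaiveDelimiterSplit and its split method are hypothetical, the scan works on a Java String instead of LineReader's byte buffer, and empty records are dropped for readability.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveDelimiterSplit {

    // Hypothetical naive scan: on a mismatch after a partial match the match
    // position is reset to 0, but the current character is never re-compared
    // with the first delimiter character. This reproduces the reported bug.
    static List<String> split(String text, String delim) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int delPosn = 0; // number of delimiter characters matched so far
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            if (c == delim.charAt(delPosn)) {
                delPosn++;
                if (delPosn == delim.length()) {
                    // Full delimiter seen: strip it and emit the record.
                    current.setLength(current.length() - delim.length());
                    if (current.length() > 0) {
                        records.add(current.toString());
                    }
                    current.setLength(0);
                    delPosn = 0;
                }
            } else {
                delPosn = 0; // BUG: 'c' itself may be the start of the delimiter
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // trailing record with no delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "record 1 name: Java Location:UAErecord 2 name:Gelesh"
                + " Location:Bangalorrecord 3 name Hadoop Location:Kerala";
        for (String r : split(text, "record")) {
            System.out.println("[" + r + "]");
        }
    }
}
```

Running main prints two records, "[ 1 name: Java Location:UAE]" and "[ 2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala]": the 'r' at the end of "Bangalor" starts a partial match, the next 'r' breaks it, and that second 'r' is never tried as the start of "record".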

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428214#comment-13428214 ] 

Hadoop QA commented on MAPREDUCE-4512:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12539059/MAPREDUCE-4512.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-common-project/hadoop-common.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2706//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2706//console

This message is automatically generated.
                
> TextInputFormat delimiter  bug:- Input Text portion ends with & Delimiter starts with same char/char sequence
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4512
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/mumak, mr-am, mrv1, mrv2, task
>    Affects Versions: 2.0.0-alpha
>         Environment: Lynux
>            Reporter: Gelesh
>              Labels: patch
>             Fix For: 0.20.204.0
>
>         Attachments: MAPREDUCE-4512.txt
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> TextInputFormat delimiter bug scenario: the input text contains a character sequence whose first character matches the first character of the delimiter, and whose remaining characters match the entire delimiter, starting from the delimiter's first character.
> e.g. delimiter = "record";
> and Text = record 1:- name = "Gelesh" e-mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name ....
> Here the string "=Bangalorrecord 3: " satisfies two conditions:
> 1) it contains the delimiter "record";
> 2) the character (or character sequence) immediately before the delimiter ('r') matches the first character (or character sequence) of the delimiter, i.e. "=Bangalor" ends with, and the delimiter starts with, the same character 'r'.
> In this case the delimiter is skipped.


        

[jira] [Commented] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Posted by "Bhallamudi Venkata Siva Kamesh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429142#comment-13429142 ] 

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-4512:
-----------------------------------------------------------

Please update the patch with a test case.
                
> TextInputFormat delimiter  bug:- Input Text portion ends with & Delimiter starts with same char/char sequence
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4512
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/mumak, mr-am, mrv1, mrv2, task
>    Affects Versions: 0.20.204.0, 0.21.0, 1.0.3, 2.0.0-alpha
>         Environment: Linux
>            Reporter: Gelesh
>              Labels: patch
>             Fix For: 0.20.204.0
>
>         Attachments: MAPREDUCE-4512.txt
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> TextInputFormat delimiter bug scenario: the input text contains a character sequence whose first character matches the first character of the delimiter, and whose remaining characters match the entire delimiter, starting from the delimiter's first character.
> e.g. delimiter = "record";
> and Text = " record 1:- name = Gelesh e-mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name .... "
> Here the string "=Bangalorrecord 3: " satisfies two conditions:
> 1) it contains the delimiter "record";
> 2) the character (or character sequence) immediately before the delimiter ('r') matches the first character (or character sequence) of the delimiter, i.e. "=Bangalor" ends with, and the delimiter starts with, the same character 'r'.
> Here the program does not detect the delimiter, so the value text passed to the map still contains the delimiter.


        

[jira] [Commented] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Posted by "Sonu Prathap (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428988#comment-13428988 ] 

Sonu Prathap commented on MAPREDUCE-4512:
-----------------------------------------

I am also facing a similar issue. Please help me re-create the fixed code using the patch.

                

        

[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Posted by "Gelesh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4512:
------------------------------

    Status: Patch Available  (was: Open)

Just a one-line code change in LineReader should do it. Tested.
If there are any issues, please let me know so I can help further:
gelesh.hadoop@gmail.com
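The attached patch is not quoted in this thread, so the sketch below does not show what it actually changes. As an assumption for illustration only, one way a single-line change can fix the reported case is to let the character that broke a partial match start a new match instead of discarding it. BackupOneSplit and split are hypothetical names; the scan works on a Java String rather than LineReader's byte buffer, drops empty records, and differs from a naive reset-on-mismatch scan only in the marked line.

```java
import java.util.ArrayList;
import java.util.List;

public class BackupOneSplit {

    // Delimiter scan with a one-line change to the mismatch branch: the
    // character that broke a partial match is re-checked as a possible
    // first character of the delimiter.
    static List<String> split(String text, String delim) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int delPosn = 0; // number of delimiter characters matched so far
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            if (c == delim.charAt(delPosn)) {
                delPosn++;
                if (delPosn == delim.length()) {
                    // Full delimiter seen: strip it and emit the record.
                    current.setLength(current.length() - delim.length());
                    if (current.length() > 0) {
                        records.add(current.toString());
                    }
                    current.setLength(0);
                    delPosn = 0;
                }
            } else {
                // The one-line change: 'c' may start a new delimiter match.
                // (If delPosn was already 0, c != delim[0], so this yields 0.)
                delPosn = (c == delim.charAt(0)) ? 1 : 0;
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // trailing record with no delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "record 1 name: Java Location:UAErecord 2 name:Gelesh"
                + " Location:Bangalorrecord 3 name Hadoop Location:Kerala";
        for (String r : split(text, "record")) {
            System.out.println("[" + r + "]");
        }
    }
}
```

This separates "Bangalor" from "record 3", because retrying a single character is enough when the broken partial match was one character long. It is not a general fix: with the delimiter "aab" and the text "xaaabY", the scan keeps only one 'a' of the already-matched "aa" prefix and misses the delimiter, returning the single record "xaaabY" instead of "xa" and "Y". Delimiters whose matched prefix can overlap itself need a proper fallback table, as in the Knuth-Morris-Pratt algorithm.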
                

        

[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Posted by "Gelesh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4512:
------------------------------

          Description: 
TextInputFormat delimiter bug scenario: the input text contains a character sequence whose first character matches the first character of the delimiter, and whose remaining characters match the entire delimiter, starting from the delimiter's first character.

e.g. delimiter = "record";
and Text = " record 1:- name = Gelesh e-mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name .... "

Here the string "=Bangalorrecord 3: " satisfies two conditions:
1) it contains the delimiter "record";
2) the character (or character sequence) immediately before the delimiter ('r') matches the first character (or character sequence) of the delimiter, i.e. "=Bangalor" ends with, and the delimiter starts with, the same character 'r'.

Here the program does not detect the delimiter, so the value text passed to the map still contains the delimiter.

  was:
TextInputFormat delimiter  bug scenario , a character sequence of the input text,  in which the first character matches with the first character of delimiter, and reaming input text character sequence  matches with the entire delimiter character sequence from the  starting position of the delimiter.

eg   delimiter ="record";
and Text = record 1:- name = "Gelesh" e mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf  ..  location =Bangalorrecord 3: name .... 

Here string "=Bangalorrecord 3: " satisfy two condition 
1) contains the delimiter "record"
2) The character / character sequence immediately b4 the delimiter (ie 'r') matches with first character (or character sequence ) of delimiter.  (ie "=Bangalor" ends with and Delimiter starts with same character/char sequence 'r' ),

Hear the delimiter is skipped

          Environment: Linux  (was: Lynux)
    Affects Version/s: 0.20.204.0
                       0.21.0
                       1.0.3

Test case:
Input file text:
record 1 name: Java Location:UAErecord 2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala

Delimiter = "record"

Expected values in the map:
 1 name: Java Location:UAE
 2 name:Gelesh Location:Bangalor
 3 name Hadoop Location:Kerala

Actual values received in the map:
 1 name: Java Location:UAE
 2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala
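To check the test case above mechanically, here is a sketch of a delimiter splitter that falls back correctly after any broken partial match, using the Knuth-Morris-Pratt failure table. This is a swapped-in technique chosen for illustration, not the attached patch: KmpDelimiterSplit, failureTable and split are hypothetical names, the scan works on a Java String rather than LineReader's byte buffer, and empty records are dropped so the output lines up with the expected values listed above.

```java
import java.util.ArrayList;
import java.util.List;

public class KmpDelimiterSplit {

    // f[k] = length of the longest proper prefix of delim[0..k]
    // that is also a suffix of delim[0..k].
    static int[] failureTable(String delim) {
        int[] f = new int[delim.length()];
        int k = 0;
        for (int i = 1; i < delim.length(); i++) {
            while (k > 0 && delim.charAt(i) != delim.charAt(k)) {
                k = f[k - 1];
            }
            if (delim.charAt(i) == delim.charAt(k)) {
                k++;
            }
            f[i] = k;
        }
        return f;
    }

    static List<String> split(String text, String delim) {
        int[] f = failureTable(delim);
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int delPosn = 0; // length of the delimiter prefix matched so far
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            // On mismatch, fall back to the longest delimiter prefix that
            // still ends at this character, instead of jumping straight to 0.
            while (delPosn > 0 && c != delim.charAt(delPosn)) {
                delPosn = f[delPosn - 1];
            }
            if (c == delim.charAt(delPosn)) {
                delPosn++;
            }
            if (delPosn == delim.length()) {
                // Full delimiter seen: strip it and emit the record.
                current.setLength(current.length() - delim.length());
                if (current.length() > 0) {
                    records.add(current.toString());
                }
                current.setLength(0);
                delPosn = 0;
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // trailing record with no delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "record 1 name: Java Location:UAErecord 2 name:Gelesh"
                + " Location:Bangalorrecord 3 name Hadoop Location:Kerala";
        for (String r : split(text, "record")) {
            System.out.println("[" + r + "]");
        }
    }
}
```

On the test input this yields the three expected values. It also handles self-overlapping matches such as the delimiter "aab" in "xaaabY", which splits into "xa" and "Y".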


                

        

[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence

Posted by "Gelesh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4512:
------------------------------

    Attachment: MAPREDUCE-4512.txt

Just a one-line code change in LineReader. Tested. In case of any issue, please mail me at gelesh.hadoop@gmail.com
                