Posted to common-issues@hadoop.apache.org by "Colin Patrick McCabe (JIRA)" <ji...@apache.org> on 2012/07/06 08:29:33 UTC

[jira] [Created] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Colin Patrick McCabe created HADOOP-8569:
--------------------------------------------

             Summary: CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE
                 Key: HADOOP-8569
                 URL: https://issues.apache.org/jira/browse/HADOOP-8569
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Colin Patrick McCabe
            Assignee: Colin Patrick McCabe
            Priority: Minor


In the native code, we should define _GNU_SOURCE and _LARGEFILE_SOURCE so that all of the functions glibc provides on Linux are visible.

_LARGEFILE_SOURCE enables fseeko and ftello; _GNU_SOURCE enables a variety of Linux-specific functions from glibc, including sync_file_range.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Colin Patrick McCabe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409644#comment-13409644 ] 

Colin Patrick McCabe commented on HADOOP-8569:
----------------------------------------------

bq. The disadvantage is that libhdfs currently compiles on non-GNU systems and this breaks that.

Defining _GNU_SOURCE doesn't break the compile on any system.  Nobody should be checking for this macro except on Linux.  If they are, then that's a bug on their part, which we can work around like this:

{code}
if(CMAKE_SYSTEM_NAME MATCHES "Linux")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_GNU_SOURCE")
endif()
{code}

Maybe we should do it that way, although it shouldn't matter.

bq. What functions are we using that are currently being hidden that are only declared via _GNU_SOURCE? If the above is true we should be able to use these via including standard headers.

man sync_file_range
{code}
NAME
       sync_file_range - sync a file segment with disk

SYNOPSIS
       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <fcntl.h>

       int sync_file_range(int fd, off64_t offset, off64_t nbytes,
                           unsigned int flags);
{code}

As you can see, the man page tells you to define _GNU_SOURCE in order to make this function visible.
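
For illustration, here is a minimal sketch of what that looks like in a single source file. The names flush_range and flush_demo are hypothetical and do not come from the patch, and the fsync fallback is an assumption added here so the file still builds where sync_file_range is hidden or missing. The define has to come before the first system header, otherwise it has no effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must precede the first system header */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push a byte range of fd to disk.
 * Returns 0 on success, -1 on error. */
int flush_range(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    /* The SYNC_FILE_RANGE_* flags are only visible with _GNU_SOURCE. */
    if (sync_file_range(fd, offset, nbytes,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) == 0)
        return 0;
    /* Kernel or filesystem refused: fall through to fsync. */
#else
    (void)offset;
    (void)nbytes;
#endif
    return fsync(fd);
}

/* Hypothetical demo: write a few bytes to a temp file and flush them. */
int flush_demo(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (fputs("hadoop", f) == EOF || fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    int rc = flush_range(fileno(f), 0, 6);
    fclose(f);
    return rc;
}
```

Calling flush_demo() should return 0 on a normal Linux box. On a platform without sync_file_range the same file compiles and simply uses fsync.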
                
> CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE
> --------------------------------------------------------
>
>                 Key: HADOOP-8569
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8569
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HADOOP-8569.001.patch
>
>
> In the native code, we should define _GNU_SOURCE and _LARGEFILE_SOURCE so that all of the functions on Linux are available.
> _LARGEFILE_SOURCE enables fseeko and ftello; _GNU_SOURCE enables a variety of Linux-specific functions from glibc, including sync_file_range.


        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Colin Patrick McCabe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409071#comment-13409071 ] 

Colin Patrick McCabe commented on HADOOP-8569:
----------------------------------------------

What I'm afraid of is issues where some function or feature is not detected as present because _GNU_SOURCE was not defined.  Unfortunately, when you artificially hide functions, these kinds of issues are all too common, and they can prove difficult to diagnose.

There's really no disadvantage to defining _GNU_SOURCE that I'm aware of, so most projects just define it everywhere.  I think we should too.
                

        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409020#comment-13409020 ] 

Eli Collins commented on HADOOP-8569:
-------------------------------------

How about doing this after HDFS-3537 and enabling _GNU_SOURCE for only the fuse-dfs build? I think it's the only native source that requires _GNU_SOURCE.
                

        

[jira] [Updated] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Colin Patrick McCabe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HADOOP-8569:
-----------------------------------------

    Attachment: HADOOP-8569.001.patch
    

        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Andy Isaacson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415584#comment-13415584 ] 

Andy Isaacson commented on HADOOP-8569:
---------------------------------------

The downside to defining {{_GNU_SOURCE}} everywhere -- both in code that is known to need it and in code that is intended to be portable -- is that it makes it very easy to accidentally break compilation on non-GNU platforms by unintentionally changing code from "portable" to "GNU-specific".  Suppose we have a project with {{foo-linux.c}} containing nonportable code, and {{generic.c}} containing portable POSIX code.  If I define {{_GNU_SOURCE}} in {{CFLAGS}}, then accidentally adding a call to {{sync_file_range}} to {{generic.c}} will silently work on Linux and won't break the build until someone tries building on Darwin or Solaris or whatever.

If instead the project puts {{#define _GNU_SOURCE}} at the top of files intended to be platform-specific, then such portability breakage will be noticed immediately.

The argument doesn't extend to LFS support -- it's entirely reasonable to use 64-bit-{{off_t}} everywhere including Linux-32.

{code}
+# note: can't enable -D_FILE_OFFSET_BITS=64: see MAPREDUCE-4258
+set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_REENTRANT -D_GNU_SOURCE")
+set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_LARGEFILE_SOURCE")
{code}
If we don't have {{_FILE_OFFSET_BITS=64}}, then shouldn't we also leave out {{_LARGEFILE_SOURCE}}?  (This is a serious question: I don't know how those two defines interact, I just vaguely remember that there's a complicated rule about how they should be used.)

In summary -- I'd slightly prefer to limit {{_GNU_SOURCE}} to a {{#define}} in the files that we intend to be Linux-specific.  The rest of this patch is good though: it cleans up the LFS defines and adds that comment about MAPREDUCE-4258.
                

        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409085#comment-13409085 ] 

Eli Collins commented on HADOOP-8569:
-------------------------------------

The disadvantage is that libhdfs currently compiles on non-GNU systems and this breaks that. What functions are we using that are currently being hidden that are only declared via _GNU_SOURCE? If the above is true we should be able to use these via including standard headers.
                

        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408016#comment-13408016 ] 

Hadoop QA commented on HADOOP-8569:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535323/HADOOP-8569.001.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

                  org.apache.hadoop.hdfs.TestDatanodeBlockScanner
                  org.apache.hadoop.hdfs.TestHDFSTrash

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1175//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1175//console

This message is automatically generated.
                

        

[jira] [Commented] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Colin Patrick McCabe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415619#comment-13415619 ] 

Colin Patrick McCabe commented on HADOOP-8569:
----------------------------------------------

_GNU_SOURCE was defined previously in most (if not all) of our native projects.  After I did the CMake conversion, the fact that it wasn't defined in the CMakeLists.txt was a bug, not a feature.  That's what I'm trying to fix here.

I realize that it's tempting to assume that code that you write without _GNU_SOURCE defined will automatically be portable.  However, this is *NOT TRUE*.  For example, even without _GNU_SOURCE defined, you still get the non-POSIX definition of strerror_r out of glibc.

The only valid way to make sure your code is portable is to build and test it on multiple platforms.  Any other strategy is just a waste of time.  Defining _GNU_SOURCE is similar to setting the correct DOCTYPE in your HTML file.  It tells the browser (or compiler in this case) to turn off "quirks mode" and give you the real deal.

I don't think the CheckFunctionExists stuff in the CMakeLists.txt will work consistently without _GNU_SOURCE defined.

There are better ways to improve our portability.  For example, we should probably have some OpenBSD Jenkins build slaves.  But let's not waste our time messing with macros.  It really adds nothing but inconvenience.

bq. If we don't have _FILE_OFFSET_BITS=64, then shouldn't we also leave out _LARGEFILE_SOURCE?

_LARGEFILE_SOURCE exposes fseeko and ftello.  _FILE_OFFSET_BITS changes the default off_t type to be 64 bits.  You normally define _LARGEFILE_SOURCE in addition to _FILE_OFFSET_BITS, but the two macros do different things, so it still makes sense to define _LARGEFILE_SOURCE on its own.
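
To make the distinction concrete, here is a minimal sketch that defines only _LARGEFILE_SOURCE, as the patch does, and uses fseeko and ftello with whatever width off_t has by default on the platform. The helper names stream_size, demo_stream_size and off_t_bits are hypothetical and only for illustration.

```c
#ifndef _LARGEFILE_SOURCE
#define _LARGEFILE_SOURCE   /* exposes fseeko and ftello */
#endif
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: size of an open stream in bytes, or -1 on
 * error. The current position is restored before returning. */
off_t stream_size(FILE *f)
{
    assert(f != NULL);
    off_t saved = ftello(f);
    if (saved < 0)
        return -1;
    if (fseeko(f, 0, SEEK_END) != 0)
        return -1;
    off_t size = ftello(f);
    if (fseeko(f, saved, SEEK_SET) != 0)
        return -1;
    return size;
}

/* Hypothetical demo: write n bytes to a temp file and measure it. */
off_t demo_stream_size(size_t n)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    off_t size = stream_size(f);
    fclose(f);
    return size;
}

/* Width of off_t in bits. Without _FILE_OFFSET_BITS=64 this is the
 * platform default, which can be 32 on 32-bit Linux. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

Here demo_stream_size(1000) returns 1000. The value of off_t_bits() is what _FILE_OFFSET_BITS=64 would change on a 32-bit build, which is the part the patch deliberately leaves alone because of MAPREDUCE-4258.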
                

        

[jira] [Updated] (HADOOP-8569) CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE

Posted by "Colin Patrick McCabe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HADOOP-8569:
-----------------------------------------

    Status: Patch Available  (was: Open)
    