You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/04/01 20:41:43 UTC

[GitHub] [trafficserver] pbchou opened a new issue #7669: Issue with memory mapped GeoIP DB files in Linux?

pbchou opened a new issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669


   Hi. I would appreciate any second opinions on this issue. I noticed that the current MaxMind enabled plugins (geoip_acl, header_rewrite, and maxmind_acl) all use the memory mapped method to open GeoIP DB files using the MaxMind libraries. The geoip_acl and header_rewrite plugins use the old geopi-api-c that allows either loading the DB into memory (GEOIP_MEMORY_CACHE) or using a memory map (GEOIP_MMAP_CACHE). The newer maxmind_acl plugin uses the newer libmaxminddb that only allows the memory map mode (MMDB_MODE_MMAP).
   
   The potential issue is whether over-writing the DB files on the disk is visible to the MaxMind libraries. My testing on Linux says that it is although the Linux man page says that the behavior is unspecified --
   
   <pre>
    MAP_PRIVATE
                 Create a private copy-on-write mapping.  Updates to the mapping are not visible to other processes  mapping  the
                 same  file,  and  are not carried through to the underlying file.  It is unspecified whether changes made to the
                 file after the mmap() call are visible in the mapped region.
   </pre>
   
   Since changes are visible -- this can interfere with any in-progress look-ups when the over-write happens. In addition, if the newer DB file is larger than the older DB file, then pointers to areas of the file extending beyond the original file size may be considered by the MaxMind libraries to be corrupted DB pointers. I think the pointer vs size checking would prevent any segmentation faults at least. So in general, you would need to close and re-open the DB files if they are over-written on disk.
   
   So should we --
   * consider a documentation update for this issue?
   * consider loading the DB files into memory?
   * is this an issue on other platforms?
   
   Test program I used on Linux --
   <pre>
   1. Create data file of "0" of 4096 x 5 size.
   
   while true ; do echo -n "0" ; done | dd if=/dev/stdin of=test-0-05.dat iflag=fullblock bs=4096 count=5
   
   2. Create date file of "1" of 4096 x 10 size.
   
   while true ; do echo -n "1" ; done | dd if=/dev/stdin of=test-1-10.dat iflag=fullblock bs=4096 count=10
   
   3. Test with mmap() with PROT_READ MAP_PRIVATE ( used in geoip-api-c ) --
   
   System Page Size: 4096
   Trial 00000:
   Reached page:  0 address:      0 first-byte: 0
   Reached page:  1 address:   4096 first-byte: 0
   Reached page:  2 address:   8192 first-byte: 0
   Reached page:  3 address:  12288 first-byte: 0
   Reached page:  4 address:  16384 first-byte: 0
   Length = 20480 last-byte-of-file = 0
   Trial 00001:
   Reached page:  0 address:      0 first-byte: 0
   Reached page:  1 address:   4096 first-byte: 0
   Reached page:  2 address:   8192 first-byte: 0
   Reached page:  3 address:  12288 first-byte: 0
   Reached page:  4 address:  16384 first-byte: 0
   Length = 20480 last-byte-of-file = 0
   --- replacing test.dat file : 0-file to 1-file ---
   Trial 00002:
   Reached page:  0 address:      0 first-byte: 1
   Reached page:  1 address:   4096 first-byte: 1
   Reached page:  2 address:   8192 first-byte: 1
   Reached page:  3 address:  12288 first-byte: 1
   Reached page:  4 address:  16384 first-byte: 1
   Reached page:  5 address:  20480 first-byte: 
   Reached page:  6 address:  24576 first-byte: h
   Reached page:  7 address:  28672 first-byte: �
   Segmentation fault (core dumped)
   
   4. Test with mmap() with PROT_READ MAP_SHARED ( used in libmaxminddb ) --
   
   System Page Size: 4096
   Trial 00000:
   Reached page:  0 address:      0 first-byte: 0
   Reached page:  1 address:   4096 first-byte: 0
   Reached page:  2 address:   8192 first-byte: 0
   Reached page:  3 address:  12288 first-byte: 0
   Reached page:  4 address:  16384 first-byte: 0
   Length = 20480 last-byte-of-file = 0
   Trial 00001:
   Reached page:  0 address:      0 first-byte: 0
   Reached page:  1 address:   4096 first-byte: 0
   Reached page:  2 address:   8192 first-byte: 0
   Reached page:  3 address:  12288 first-byte: 0
   Reached page:  4 address:  16384 first-byte: 0
   Length = 20480 last-byte-of-file = 0
   --- replacing test.dat file : 0-file to 1-file ---
   Trial 00002:
   Reached page:  0 address:      0 first-byte: 1
   Reached page:  1 address:   4096 first-byte: 1
   Reached page:  2 address:   8192 first-byte: 1
   Reached page:  3 address:  12288 first-byte: 1
   Reached page:  4 address:  16384 first-byte: 1
   Reached page:  5 address:  20480 first-byte: 
   Reached page:  6 address:  24576 first-byte: h
   Reached page:  7 address:  28672 first-byte: �
   Segmentation fault (core dumped)
   
   5. Test Program Listing --
   
   #include <fcntl.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/mman.h>
   #include <unistd.h>
   
   int main(int argc, char *argv[]) {
     char *addr;
     FILE *fd;
     int length;
     int x;
     int count = 0;
     int chksum = 0;
   
     system("cp -f test-0-05.dat test.dat");
     length = 4096 * 5;
   
     printf("System Page Size: %ld\n", sysconf(_SC_PAGE_SIZE));
   
     fd = fopen("test.dat", "rb");
     addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fileno(fd), 0);
   
     while (1) {
       if (count == 2) {
         system("cp -f test-1-10.dat test.dat");
         printf("--- replacing test.dat file : 0-file to 1-file ---\n");
         length = 4096 * 10;
         sleep(2);
       }
       printf("Trial %05d:\n", count);
       for (x = 0; x < length; x++) {
         if (x % 4096 == 0) {
           printf("Reached page: %2d address: %6d first-byte: %c\n",
                  (int)(x / 4096), x, addr[x]);
         }
       }
       printf("Length = %d last-byte-of-file = %c\n", length, addr[length - 1]);
       sleep(2);
       count++;
     }
   
     return 0;
   }
   </pre>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 edited a comment on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
ezelkow1 edited a comment on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-812227904


   Agreed, it does seem like an issue, I had not had a chance to test it with the new maxmind_acl one since we havent put it into production yet. That one will close existing opened mmdb's and reload but only on restart or on remap reload when the plugin reloading makes a new instance, so I can see corruption being an issue
   
   At least some warnings in docs should be added that if using a new mmdb/geoip file then the config should be changed and you should do some sort of swapping of names between the dbs in order to not touch one that is in use until if/when a solution can be done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-812227904


   Agreed, it does seem like an issue, I had not had a chance to test it with the new maxmind_acl one since we havent put it into production yet. That one will close existing opened mmdb's and reload but only on restart or on remap reload when the plugin reloading makes a new instance, so I can see corruption being an issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-818310757


   opened an issue with mmdb as well, since it just seems like good functionality to have, https://github.com/maxmind/libmaxminddb/issues/256


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall closed issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
bryancall closed issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 edited a comment on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
ezelkow1 edited a comment on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-818323914


   @pbchou can you try your test with a method as outlined here, https://github.com/maxmind/geoipupdate/blob/main/pkg/geoipupdate/database/local_file_writer.go#L134
   
   Its in golang but you should be able to do something similar in C, i.e. write new db to a temp file, rename it to the existing DB, and then issue a sync call to atomically write the changes? This is what is recommended from the mmdb devs and what their updater tool does


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] jrushford commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
jrushford commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-819885995


   Before the database is opened with MMDB_open(), why not first copy the database file  as possibly a hidden file and then MMDB_open() and get a file handle to the copy.  Any changes to the configured database file would not affect ATS but, the plugin could notice the change to the configured filename and make another copy, use MMDB_open() to get a 2nd file handle to the new copy.  When it is determined safe to do so, swap the the active file handle with the new file handle and MMDB_close the old file handle.  Maybe using a smart pointer to the active file handle or a mutex in order to swap file handles.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-885799828


   So some feedback from us now that we've started deploying this plugin. Yes doing a copy is an issue, but doing the `mv` should be atomic. In my testing while using the maxmind plugin, doing a move of a temporary file on top of the existing one allowed ATS to continue using its old copy until a reload is issued and the plugin is reloaded and picks up the new file from disk
   
   Thats at least to how it relates to the ats maxmind plugin


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] oschwald commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
oschwald commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-819550435


   On unix systems, when you atomically replace a file (or use `rm` for that matter), the file is merely unlinked. The space associated with the file is not freed until all open file handles are closed. There is no need to keep it with a dummy name. See `rename(2) and `unlink(2)`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] pbchou commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
pbchou commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-818961271


   The test resets the length in case the new DB is larger than the old one, and there are pointers to the regions of the file which are larger than the old file. It turns out the look-up function in the MaxMind libraries do have a pointer vs size check so it should result in a return of "corrupted database" rather than causing a segmentation fault. I assume this mv-vs-cp method works because the mmap is anchored to the disk contents rather than the file. Is there an issue with possible over-write of the disk contents after the disk sectors are freed when the original DB is replaced (perhaps old file needs to be kept with a dummy *.old file name)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-818323914


   @pbchou can you try your test with a method as outlined here, https://github.com/maxmind/geoipupdate/blob/main/pkg/geoipupdate/database/local_file_writer.go
   
   Its in golang but you should be able to do something similar in C, i.e. write new db to a temp file, rename it to the existing DB, and then issue a sync call to atomically write the changes? This is what is recommended from the mmdb devs and what their updater tool does


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] randall commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
randall commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-818841745


   If you change the test to not reset the length (when count == 2) and use a mv instead of a copy, things run along fine (still reading 0s). I'm not sure changing the length there is a valid state since mmap is with the original length.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] oschwald edited a comment on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
oschwald edited a comment on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-819550435


   On unix systems, when you atomically replace a file (or use `rm` for that matter), the file is merely unlinked. The space associated with the file is not freed until all open file handles are closed. There is no need to keep it with a dummy name. See `rename(2)` and `unlink(2)`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] bryancall commented on issue #7669: Issue with memory mapped GeoIP DB files in Linux?

Posted by GitBox <gi...@apache.org>.
bryancall commented on issue #7669:
URL: https://github.com/apache/trafficserver/issues/7669#issuecomment-1061292733


   If this is still an issue please reopen.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org