And here's yet another m0n0wall release announcement: pb7r310 is out! And I'm happy to report that the reason for the MPD/DHCPD/etc. crashes has finally been found and fixed! Yes, it's true - read on...

First of all, because of all the debugging involved, the only new features are:

- Diagnostics: Ping function in webGUI (contributed by Bob Zoller)
- WLAN channel auto-select in webGUI (contributed by Bob Zoller)

Now, concerning those crashes... I reported earlier that they were due to a SIGPROF being received by MPD. (Note: when I refer to MPD, I also mean any other daemon, e.g. DHCPD, that may have been crashing in earlier versions of m0n0wall.) I had been looking for the cause of the SIGPROF among some kernel stack corruption issues that were present in earlier versions of FreeBSD. Unsurprisingly, changing HZ, increasing UPAGES or removing CPU_ELAN did not help. The reason was much simpler!

I found that PHP internally calls setitimer(2) with ITIMER_PROF (profiling timer) to enforce the time limits that can be set with max_execution_time and max_input_time in php.ini. If the time limit is exceeded while executing a script, the system sends PHP a SIGPROF (profiling timer alarm). This signal is caught by PHP and tells it to stop executing the script.

Unfortunately, these interval timer values are inherited by all processes exec'd by PHP. This means that MPD and all other processes invoked by the m0n0wall boot-time scripts had that interval timer set to 30 seconds (the setting in php.ini). Since ITIMER_PROF decrements both in process virtual time and when the system is running on behalf of the process (NOT wallclock time!), the processes received a SIGPROF after having consumed 30 seconds of CPU time. Because those processes do not use signal(3) to catch or ignore SIGPROF, the default action was taken, which is to terminate the process. Yikes!

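The inheritance described above can be reproduced in a few lines. This is not m0n0wall or PHP code, just a minimal Python sketch, assuming a POSIX system where interval timers survive execve(2). The function name `run_with_prof_timer`, the `BURN_CPU` workload and the 0.1 s / 0.6 s values are hypothetical and chosen only to keep the demo short. Linux, for example, does not pass interval timers on to a fork(2) child, so the sketch arms the timer in the child between fork and exec, which mimics a process that sets the timer and then execs a daemon.

```python
import os
import signal
import sys

# Child workload: burn roughly 0.6 s of CPU time, then exit normally.
# (Hypothetical stand-in for a daemon such as MPD; not m0n0wall code.)
BURN_CPU = (
    "import time\n"
    "end = time.process_time() + 0.6\n"
    "while time.process_time() < end:\n"
    "    pass\n"
)


def run_with_prof_timer(prof_seconds):
    """Fork, arm ITIMER_PROF in the child, then exec a fresh interpreter.

    Returns ("signal", signo) if the exec'd program was killed by a signal,
    otherwise ("exit", status_code). prof_seconds=0 leaves the timer disarmed.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # Arm the profiling timer, as a CPU time limit would.
            signal.setitimer(signal.ITIMER_PROF, prof_seconds)
            # execv replaces the program image; the armed timer stays.
            os.execv(sys.executable, [sys.executable, "-c", BURN_CPU])
        finally:
            os._exit(127)  # only reached if setitimer/execv failed
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    return ("exit", os.WEXITSTATUS(status))


if __name__ == "__main__":
    print(run_with_prof_timer(0.1))  # timer armed before exec
    print(run_with_prof_timer(0))    # timer disarmed
```

With the timer armed, the exec'd program never installs a SIGPROF handler, so I would expect it to be terminated once it has used up its CPU budget. With the timer set to 0 it should run to completion.
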
Since MPD and DHCPD consume very little CPU time, it was quite possible for them to run for several days before they had used up their 30 seconds and got the deadly SIGPROF.

The fix was simply to set max_execution_time=0 in php.ini. thttpd kills CGIs that run for more than 60 seconds anyway (and no, it doesn't use the profiling timer :), so that's no problem.

I hope that this change has eliminated all sources of instability in m0n0wall - now that the bug is fixed, I can finally sleep well again and concentrate on implementing new features. ;)

Enjoy!

Manuel