Monitoring application and Linux system logs is a skill that every seasoned SysAdmin has down cold. Logs provide a window into understanding the health of your systems, and they’re the first place to look when things aren’t working. But no matter how familiar you are with Linux log monitoring, even gurus of the command line can learn new tricks. Whether you’re an old hand or a relative newcomer, here are seven tips on how to monitor log files in Linux that you may have overlooked.
1. Check Your Log File Permissions
It’s common practice to run daemons as non-root users to prevent the class of security issue known as privilege escalation. Occasionally, however, this practice causes problems when the daemon’s user ID can’t access its own logs. To fix this misstep, make sure the user running the daemon has permission to read and write all of its logs. And it’s not just access to individual files that logging tools need: they often also need write permission on the directories containing those logs so they can create new files.
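As a quick check, a sequence like the following confirms and fixes ownership. The daemon user app01 and the path /var/log/app01 are placeholders for your own setup:

# Who owns the log directory and the files inside it?
$ ls -ld /var/log/app01 /var/log/app01/*.log
# Hand ownership to the daemon's user and group
$ sudo chown -R app01:app01 /var/log/app01
# The directory needs write and execute permission so new files can be created
$ sudo chmod 750 /var/log/app01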
2. Log Rotation Can Cause Zero-filled Files
Log rotation prevents log files from growing too large and keeps their size manageable. Tools such as logrotate can automatically make copies of logs once they reach a specific size, and either create a new log file or truncate the existing one. However, some applications do not handle truncation properly.
If you configure logrotate to truncate log files with the copytruncate directive, applications writing to that file can end up filling it with zeros. This happens when the application hasn’t opened the log file in append mode: after truncation, it keeps writing at its old file offset rather than at the new end of the file, and the gap between the start of the file and that offset is filled with zeros.
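If you’re not sure whether a log is rotated this way, look for the copytruncate directive in the logrotate configuration. A stanza using it looks roughly like this (the path and schedule are placeholders):

/var/log/app01/debug.log {
    daily
    rotate 7
    compress
    copytruncate
}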
Thankfully, you don’t need to modify the application to work around this. If log messages are written to stdout, you can use I/O redirection to append to the log file with the “>>” operator, like so:
$ app01 -debug >> debug.log
3. Increase inotify Watches for Linux Logs
The inotify API is used for real-time log monitoring in Linux, and allows applications to register watches that report when specific events occur on the files and directories being monitored. A number of familiar tools, like tail with its -f option, use this API to continuously output the contents of files as they grow. As a result, if you’re monitoring lots of files on Linux, you may run into the following message:
tail: cannot watch ‘/var/log/messages’: No space left on device
The inotify API has an adjustable upper limit on the maximum number of watches per user. To stop seeing the above message, you can increase the maximum number of watches allowed for each user by writing a new value to /proc/sys/fs/inotify/max_user_watches.
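For example, you can raise the limit immediately with sysctl and persist it across reboots with a drop-in file. The value 524288 below is just a common choice, not a requirement:

# Raise the limit for the running kernel
$ sudo sysctl fs.inotify.max_user_watches=524288
# Make the change permanent
$ echo 'fs.inotify.max_user_watches=524288' | sudo tee /etc/sysctl.d/99-inotify.conf
$ sudo sysctl --system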
4. Direct Cron Job Output to the System Logger
Cron jobs are used heavily for running administrative tasks in the background. Because they don’t execute interactively, their output is mailed to the user running the job rather than recorded in the usual Linux system logs. That means when a job fails, it can be difficult to work out what happened unless you’re using a feature such as Papertrail inactivity alerts, which trigger when an expected event never arrives.
Instead of losing helpful debugging logs, you can pipe each job’s stdout to the logger program, which writes to the system log:
*/5 * * * * /usr/local/bin/nightly-job | /usr/bin/logger -t nightly-job
This entry sends the job’s standard output to the standard system log location, tagged with the job’s name, and gives you one less place to dig for messages.
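Note that the pipe above only captures standard output. If you also want error messages from the job, redirect stderr into the same pipe (this is ordinary /bin/sh redirection, which cron uses by default):

*/5 * * * * /usr/local/bin/nightly-job 2>&1 | /usr/bin/logger -t nightly-job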
5. Rate Limit Logging
While it might seem that more is better, it’s possible to log too much information. Applications that write heavily to log files create a mess of uninteresting data at best, and at worst amount to a self-inflicted denial of service, exhausting disk space and consuming system resources. Even if you don’t have access to the errant application, there are ways to protect your systems.
Rate limiting your log messages places a strict cap on the number of messages delivered within a time period. Once that cap is reached, any further messages arriving before the period elapses are dropped. This feature is supported by systemd and rsyslog, as well as by log management tools such as Papertrail, which supports syslog rate limits.
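On systems running systemd, for instance, journald’s rate limit lives in /etc/systemd/journald.conf. The values below are illustrative, not recommendations:

[Journal]
# Allow each service at most 1000 messages per 30-second window; the rest are dropped
RateLimitIntervalSec=30s
RateLimitBurst=1000

Restart the journal daemon (sudo systemctl restart systemd-journald) for the new limits to take effect.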
On the other hand, if you want to throttle the output of one or more applications instead of limiting the entire system, you can use the setlogmask(3) library function to set the log priority mask and control which calls to syslog(3) are actually logged. Any message written with syslog(3) at a priority not included in the mask is simply ignored.
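Because setlogmask(3) is a C library call, the filter is applied inside the application itself. A minimal sketch (the program name and messages are made up) looks like this:

#include <syslog.h>

int main(void)
{
    openlog("app01", LOG_PID, LOG_DAEMON);

    /* Log only WARNING and more severe; INFO and DEBUG are silently dropped */
    setlogmask(LOG_UPTO(LOG_WARNING));

    syslog(LOG_DEBUG, "verbose detail");          /* ignored */
    syslog(LOG_WARNING, "disk is almost full");   /* logged  */

    closelog();
    return 0;
}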
6. Don’t Forget About Hardware Issues
Sometimes the issue you encounter isn’t caused by a misbehaving application or runtime oddity—sometimes the real problem occurred immediately after you turned your machine on. Boot-time issues are recorded in /var/log/boot.log. By searching this file, you can often find clues that explain seemingly unrelated problems that manifested much later.
This is particularly true of hardware malfunctions such as memory corruption or disk controller failures. As the Linux kernel boots, it prints hardware status information, which will tell you if it failed to properly initialize a device.
Feb 3 00:16:46 prodhost kernel: [107260.385338] ata1.00: configured for UDMA/100
Feb 3 00:16:46 prodhost kernel: [107260.385363] sd 0:0:0:0: [sda] Unhandled sense code
Feb 3 00:16:46 prodhost kernel: [107260.385367] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Feb 3 00:16:46 prodhost kernel: [107260.385373] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
These kinds of boot messages are usually the first warning sign that you’re going to see issues in production. If you’re unable to find the root cause of an issue, make sure you validate the hardware by checking the system boot log.
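Two quick ways to scan for this kind of trouble, assuming util-linux dmesg and a systemd-based machine respectively:

# Kernel warnings and errors only
$ dmesg --level=err,warn
# Kernel messages at priority err and above from the current boot
$ journalctl -k -p err -b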
7. Tune Syslog Performance
By default, syslog ensures that messages are written to log files on disk as soon as they arrive: the daemon syncs each log file after every message. This is known as synchronous logging, and it can noticeably slow things down when an application writes a lot of messages.
You can optimize the way syslog writes log messages by using asynchronous logging and prepending a “-” before your log file in the syslogd configuration file. The major caveat is that if your system crashes before any outstanding writes are flushed to disk, you can lose important log messages, making debugging more difficult.
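In a traditional syslogd or rsyslog configuration, the rule looks like this; the facility and path are examples:

# The leading "-" tells syslogd not to sync the file after every write
mail.*    -/var/log/mail.log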
Use Linux Server Logs Creatively
While accurate and informative logs are the foundation of a strong monitoring strategy, knowing how to use them to get results is the real secret. It’s easy to get stuck using the same old techniques for monitoring Linux server logs to keep applications and systems running smoothly. But debugging complex issues requires thinking outside the box and using logs creatively. By paying attention to the seven items covered in this post, you’ll have plenty of ideas to analyze your logs and home in on the underlying problem the next time something goes wrong.