As users know, Papertrail has offered painless log search alerts for some time. Notify a Campfire chat room, email a team, or kick off a PagerDuty escalation process when something important happens. Get periodic summaries about less-critical issues.
Notifications are inherently binary, though: either an alert fires or it doesn’t. You want to know or you don’t. Monitoring often has a gray area, however, where two of something is fine but 20 is not.
Papertrail now lets you define that gray area. Search alerts can now include a minimum number of events that must occur (during the alert interval) for the alert to be invoked. The default, which matches the existing behavior, is a minimum of 1.
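As a rough illustration of that rule, here is a minimal Python sketch. It is not Papertrail’s implementation or API: the function names `count_matches` and `should_fire`, the plain substring matching, and the sample log lines are all assumptions made up for this example.

```python
from typing import Iterable


def count_matches(lines: Iterable[str], query: str) -> int:
    """Count log lines in one alert interval that contain the query text.

    Plain substring matching is an assumption for this sketch; it stands in
    for whatever the saved search actually matches.
    """
    return sum(1 for line in lines if query in line)


def should_fire(match_count: int, minimum_events: int = 1) -> bool:
    """Fire only when the interval holds at least `minimum_events` matches."""
    return match_count >= minimum_events


# Hypothetical log lines collected during a single alert interval.
interval_logs = [
    "GET /missing-page 404",
    "GET /index.html 200",
    "GET /old-link 404",
]

matches = count_matches(interval_logs, " 404")
fires_by_default = should_fire(matches)                     # minimum of 1
fires_at_twenty = should_fire(matches, minimum_events=20)   # raised threshold
```

With the default minimum of 1, the two 404s in this interval invoke the alert; with a minimum of 20, they are treated as noise and nothing fires.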
What’s possible with a minimum threshold?
- Allow for endemic problems. Some issues have a constant undercurrent. Internet-facing problems like 404s, brute-force attacks, and site scrapers are the obvious examples. I don’t care that they happen, but I do care when they rise above what I consider noise.
- Allow for known issues. If you’re short on RAM, the fact that Linux’s “oom_killer” killed a process once to save RAM probably isn’t severe. If it kills 5, it is. This is particularly handy with code exceptions (known bugs) and services where log output can’t easily be modified.
- Detect when less-critical issues become trends. For example, I’d like to know when more than 10 slow queries occur across all of my database servers. That’s not a slow query, that’s a trend.
- Combine these. Take velocity into account when deciding what the alert should do. When 50 of something happen in a 10-minute period, it’s urgent; tell me in HipChat. Separately, email a summary to my team every day, regardless of volume.
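The “combine these” case can be sketched as two alerts on the same search, each with its own interval and minimum. This is a hypothetical Python model, not Papertrail’s configuration format: the `AlertRule` class, its field names, the destination labels, and the `destinations_to_notify` helper are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlertRule:
    """One alert attached to a saved search (hypothetical model)."""
    name: str
    interval_minutes: int
    minimum_events: int
    destination: str


def destinations_to_notify(rules: List[AlertRule],
                           counts: Dict[str, int]) -> List[str]:
    """Return destinations whose rule met its minimum in its latest interval.

    `counts` maps each rule name to the number of matching events seen
    during that rule's most recent interval.
    """
    return [
        rule.destination
        for rule in rules
        if counts.get(rule.name, 0) >= rule.minimum_events
    ]


rules = [
    # Urgent: 50 or more matches within a 10-minute interval.
    AlertRule("urgent", interval_minutes=10, minimum_events=50,
              destination="hipchat"),
    # Daily summary: the default minimum of 1, so any matches are reported.
    AlertRule("daily-summary", interval_minutes=24 * 60, minimum_events=1,
              destination="team-email"),
]

quiet_day = destinations_to_notify(rules, {"urgent": 12, "daily-summary": 340})
busy_burst = destinations_to_notify(rules, {"urgent": 50, "daily-summary": 340})
```

In this sketch, a quiet interval only produces the daily email, while a burst of 50 events in 10 minutes also reaches the chat room. Keeping the two rules separate means the urgent threshold can be tuned without affecting the summary.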
How to use it
Use this in combination with log filtering and “Show related logs” links to ignore the events that don’t matter, react to those that appear to be important, and link directly from your admin dashboard to those that you know are relevant.
Perhaps the best part of Papertrail’s search alert threshold is that it comes without a ton more time or effort on your part. There aren’t a zillion controls because there don’t need to be; there’s the one that matters.
As product designers, we always try to find the knobs that solve the most problems (and expose the most power) with the least, and least visible, complexity. There aren’t many better examples of it than this.