Let’s paint a picture. You’re a developer working on a system broken down into multiple services. These services do their one thing well and all communicate. However, it’s 3 a.m., and you get a call from your boss telling you something’s wrong. People can’t complete the tasks they’re supposed to complete, and there are errors on the webpages they’re using. You’re the first responder and need to figure out what’s going on.
One of the first things you do is look at various data sources to try to get the lay of the land and see what’s going on. One of these data sources will be the logs each service creates. In this post, let’s explore what makes good logs and how best to get every bit of data you need—the moment you need it—out of these logs.
What Are the Goals of Logging in a Microservice?
When you’re designing your microservice architecture, it’s important to consider how you’re going to get information about how the service is working. One of the easiest and most common ways of doing so is ensuring your microservice is logging information about what’s going on.
The logs should answer a few questions first: the day and time the log message was created, what happened at what point in the process, information about the current state of what was being processed, and a trace ID to help you correlate different log messages.
Logs can be some of the easiest ways to capture information about what the system is doing and why it’s doing it. You can use logs to provide the data to operators to allow them to quickly learn what’s going on inside the system.
Another type of data that can also come out of logs is performance metrics. As certain parts of your system get more complex, you can start adding in metrics detailing how long certain things take. This will help ensure you have data points when trying to track down performance bottlenecks and if systems start to misbehave in ways outside of their normal operating profile.
Having this information gives your logs the data needed to answer questions about what happened when operators need it most. And if your logs have this data, most of the heavy lifting required during an emergency is done because you have the necessary visibility into your system.
When all these things are put together, an operator should be able to see across the entire system. Usually, operators set up dashboards so they can see across the entire system in one view and have all the important information at their fingertips. Dashboards are critical to ensuring visibility is available across the system, so when problems arise, the appropriate action can be taken.
Tracing Through Services
A great way to get visibility across your system is ensuring you can trace messages throughout each piece of the system. This is usually achieved by having some kind of trace ID designed to go through your system and show how each piece of the system interacted with a request.
This is especially important in microservices because there are usually different pieces of the system touching every request. Ensuring you can trace your way through the system as quickly and efficiently as possible is vital to the success of the system. And being able to view this type of data in a centralized system also ensures you can respond when needed.
Another benefit of being able to trace your way through the system is you can add metrics to the system. If you want to know how well certain things are performing, you can start looking at single traces and observing which parts of your system are taking the longest to respond. This can help you tune your system so everything operates at peak performance.
Another piece to help you extract information quickly is structuring your logs. Structuring logs is a way to ensure the message and any form of metadata about the message are tracked inside a single unit. Usually, these are JSON-like blobs of information, but they can be XML if desired.
Structured logs are also far easier to work with in a centralized system. They give meaning to your logs and the data around them. Each field can have a purpose and a definition of what’s expected inside.
Together, tracing messages and structuring logs can help you make a standard that will work across all your microservices and give you visibility into your system when you need it most. Establishing standard formats, libraries, and structures for your logs will go a long way.
Logging Aggregation Services
Now that you have all this information, you need a way to look at it in a single, easy-to-use place. Log aggregation services are designed to take logs from multiple places and put them together so you can view them in one place.
Log aggregators can save a system when operators need to figure out what’s going on in an emergency. These systems help put your logs together in an easily searchable manner. This means you can access information from multiple logs quickly and get the information needed when there’s an issue.
When looking at a log aggregator, you want to make sure it can read the logs you’re producing. Log aggregators are usually focused on one type of cloud provider, a technology stack, or even a process. This means when looking at log aggregators, depending on your technology stack, the number of log aggregators can be limited rather quickly. Another thing to consider is how the log data is being hosted inside the aggregator. Depending on your area of business, you may have to retain logs for compliance reasons. Log aggregators are a great option for log archiving, so ensuring your log aggregator can do this is important.
Even better, most centralized systems have other features as well. They can usually collect other types of data and metrics than just log information. This enables you to tie together these different data sources to give you an even broader view across your system.
An application performance management (APM) solution is usually where the system operators will be spending much of their time. Dashboards, alerting, and other metrics will be used on a daily basis, so ensuring your log aggregation service integrates into an APM system will be critical to the success of your microservice solution.
As you look into APM solutions, verify you can get these kinds of features from the system. An APM solution can make a complex system easier because it gives visibility into what’s going on. It helps lay out which activity is happening where and can point you to things as they go wrong. They can be the guides to whatever part of the system is misbehaving. When looking into these types of systems, make sure they help you get visibility into the system instead of making the system more complex.
Conclusion
Together, these techniques and services can help ensure your system is performing as expected. And in the event of an emergency or problem, determining what happened and how you can fix the issue should easily be at your fingertips. There are many difficult aspects in a microservices environment, but logging doesn’t need to be one of them.
This post was written by Erik Lindblom. Erik has been a full-stack developer for the last 13 years. During this time, he’s tried to understand everything required to deliver high-quality, valuable software. Today, this means using cloud services, microservices techniques, and container technologies. Tomorrow? Well, he’s ready to find out.