The Problem With Logging

It’s already been mentioned but its work repeating… You usually don’t need to log synchronously. As long as they are still logged in order and you can handle the possible memory load introduced by a potential backup on the queue, why not log asynchronously?

log4j has a rather handy AsynAppender which you can wire up multiple appenders for it to dispatch to:

http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/AsyncAppender.html

I’m sure log4net has something similar. If not, it should be fairly simple to roll your own.

i don’t understand how you could get a deadlock. if you are only holding the log-lock temporarily then you can’t get deadlock. maybe dotnet is doing some unsafe lock coarsening. ie converting:

lock log
unlock log
lock db
lock log
unlock log
unlock db

to

lock log
lock db
unlock log
unlock db

or more likely log4net code has buggy internal locking and is deadlocking by itself without the assistance of the db locks.

I log info, warnings, and exceptions.

Info is just for various ‘break points’ in a program.

Warnings is when something happened that is accounted for, but that users should be doing (like typing in random query strings)

Exceptions are, exceptions.

I agree with most of the article, however there are critical parts of the code (i.e.: banking transactions or a part of the code known for its instability when dealing with external systems, like CORBA interfaces) that must be logged to the finest detail to allow later debugging.

Something never mentioned is the fact that some of the debugging can be disabled (the all-too-famous isDebugEnabled) to prevent cluttered code.

However, at least some of us use log4j, log4net and such logging frameworks: I’ve witnessed some of the worst no-nos in the industry: using System.out.println() to debug Java code. Of course, it means such a burden to the disk (especially for high-traffic applications) that just by changing those lines by a logger.debug() we can cut the app response times by half (sometimes even more).

It all depends on the application and scenarios to me. If I have a middleware app like some kind of rules engine as opposed to a GUI frontend, logging becomes critical (we found MQSeries logging to be cheap and flexible). Even in a GUI situation, some use cases and problems can be extremely tough to reproduce and logging can be a real lifesaver as to what really happened when user X produced strange behavior Y. Everything in moderation and the right tool for the right job.

I use tracepoints instead of Trace statements in my massive Windows client project. Tracepoints do not affect your code and can be enabled or disabled all at once. Mostly I use them to trace the program execution using the message: Function: $FUNCTION, Caller: $CALLER. But I still use logging for remote web applications.

Logs are not necessarily useful to show in the UI. That depends on who your users are. If Stack Overflow has an error parsing a config file, not a single one of the users will care about that. If Stack Overflow has a problem applying reputation because something is inconsistent, no user wants to see it.

The admins, however, do want to know this stuff, and they have access to the logs. That IS the UI for the logs. Also, because you have only one consumer of your program, you can get away with minimal logging because you can access the production servers and get memory dumps and whatnot. But when you have dozens of clients, and you’re working with a system that requires a certain amount of reliability, logging lots of stuff could be invaluable. For example, we log responses from our ecom providers. This proved to be critical when something went wrong with our database and the responses weren’t recorded properly. We were able to reconstruct the responses from the logs and fix the data. Saved our butts.

Also, it can be extremely helpful to log certain kinds of things under certain circumstances. Our framework allows us to log SQL statements or Command invocations; we turn this off most of the time but it is extremely useful for development. So what if your logging can sometimes deadlock? Turn off that log level, or stop logging for that class (in log4j this is easy) and you’re good to go… developers can keep their logs and production still runs.

To your first point, use AOP to reduce to code clutter, like so: http://ayende.com/Blog/archive/2008/07/31/Logging–the-AOP-way.aspx

Any major project without a Logging system is infinitely more complex to debug.

Saying or teaching anything else is complete bullshit.

Many projects would not have shipped without the logging support.

Perhaps the logging system has bugs, and if it changes behavior, your doing it wrong.

Clearly there’s no shortage of opinions on this topic, but I’d like to chime in.

First, what’s your batch size set at? If you set it to a reasonable level, then it shouldn’t be hitting the database that often. Of course the higher your batch size, the higher your risk of losing log statements (at those would be the ones closest to the error) but I still think they’re valuable.

As for the extra lines of code you can lean on IoC containers to weave in logging aspects to your interface based components… you get that for free. Seriously… there’s no code to log entry/exit of methods. Of course the value of these statements isn’t very high, I’d prefer more granular what is the user doing messages than line level ones. But this is why you have log levels.

I’m not sure that constantly logging everything is a good idea for the reasons you mentioned. However, let’s ignore the code bloat for a second and focus on performance. In many cases you need the detailed log only once you already have a recurring hard to place problem when you would be willing to inspect through all of it.

So it makes sense in those situations to have the code in place, and just have the logging mechanism ignore the calls, and give you the option of throwing a switch to turn it on.

Existing log calls are way better than having to rely on ad-hoc printfs that people would add on a per-need basis.

You can’t always show a meaningful error to the user - like if your program is a server, with no UI.

Having spent many hours dealing with issues of multi-server, large-scale systems, I can state that unless you are the sole developer, you will need to have debugging traces in your code to determine what happened at the customer site, because, as others have noted, the customer will not be able to tell you what was going on, either because of lack of knowledge (server:server issues) or lack of desire (network group has turf wars with ops and changes routing without prior notice)

Unusually, I disagree. When you’re dealing with complex software that parses a huge amount of data, your logs can be the major indicator of problems- sure, you log the exceptions, but if you aren’t logging a fair amount of everything else then you don’t know the precursor to the exception so you don’t get much handle on where the problem comes from and if the software takes hours to run, you really need to find whereabouts the problem comes from.

A little logging overhead and a lot of unnecessary data when it does work is a very low price to pay for the benefits of being able to identify the whereabouts of a problem on a slow run.

I still don’t understand how the logging caused a deadlock here, unless log4net was configured to use a special database appender that in turn used locks that other parts of the application used. In that scenario, just changing the configuration of the logging should have resolved the problem.

I believe that a logging framework that provides the ability to turn turn logging on and off at runtime with some kind of fine granularity (like log4net allows), you can make your log show the information you need without all of the fluff and without impacting performace all of the time. Additionally using AOP for the mundane entry/exit/parameter logging (which I rarely need) that can also be configured at runtime makes for a great performing application that allows thorough troubleshooting if necessary.

Exception logging is all I ever use. Really, do you care about all the successful opperations on your site? Or do you care more about the one or two that fail? Ussually an exception is more than enough information for me to recreate the error and fix it.

Greets Jeff,

I appreciate the nod/reference (ahh shaft, my web page is all weird with the CSS template now…). Here I was getting ready to call in to the SO podcast to get your opinion on just this matter!!! 8^D

I don’t know if it helps or hinders the discussion, but the only thing about the initial post was that I failed to complete the explanation of things. Somebody else had followed up on the issue (http://stackoverflow.com/questions/231903/how-much-to-log-within-an-application-how-much-is-too-much) and I had added the following comment


In the sample I provided, that method logs on a daily basis in WARN mode. Which means that the only thing that gets logged by default is if an exception occurs. If I get a call from one of my clients about having an error in the application, they don’t have to read me some cryptic message on the screen, I jump in the log and can see what’s going on. Most of the time, the answer is right there.

What happens if the answer isn’t readily available? Log4net allows me to update my configuration file (no re-compilation necessary, no need to get access to some special system file on the web server with the sysadmin’s approval) and go into INFO mode. Now you start seeing a second layer of logging. Maybe the code never made it to a certain loop. Maybe the data retrieval had an empty record set. This second level of debugging is helpful, and the log only gets slightly larger. Once this is done, I can change the config again and go back to the light logging.

Naturally if things are REALLY crazy, then I go to the full debug level, and I want to know what each variable is reporting, what DataRows I’m dealing with, and what is going on in the application. At my current place of work, we don’t have the ability to do remote debugging into our web applications, and we can’t always tap into the production database without potentially augmenting data, so having this full debug is the next best thing.

I agree with the majority of folks out there that excessive logging can really bring down an application and cause more problems than it is worth. If wouldn’t recommend this kind of verbose logging in an application either unless the application warranted it for security reasons. However, being able to leverage verbose logging when needed and without having to recompile my code IS a HUGE benefit in my opinion and if you have a framework that can allow for it easily (such as log4net), then I say get nice and verbose and it is easy enough to mentally filter out the log code references if you’re having to go back into the code itself.


I’m not trying to back peddle or anything, I guess my point was that if the logging architecture can leverage things properly so that you can get everything or nothing at the flick of a switch (or update of a config file without a recompile), then it would be beneficial to do so.

I’m going to pile on log4net here: if your logging system uses locks in your threads and has the potential to help you deadlock your application like you described, it’s a piece of crap. Staying out of the way of your application is one of the most important things a logging framework can do. Multiple producers / one consumer is a textbook multiprocessing problem (literally!), it has lockless solutions, and failing to do this in a way that it cannot lead to deadlocks is unforgivable. Isn’t the point of a third party package for this sort of thing, that you expect them to get just this sort of detail right?

Yes, you should blame log4net!

While I agree that logging can be abused, I think your description of what is necessary has swung the pendulum too far the other direction and is too absolute. Saying that you are not anti-logging, just anti-abusive logging is kind of a no sh** statement.

The real problem here, one you did state, is in determining exactly what should be logged and that is subjective and squishy. Your ultimate advice to start small and with the most heinous of problems is good, but a bit abstract.

  1. If it’s worth saving to a logfile, it’s worth showing in the user interface.

I firmly disagree with this. I can think of numerous examples where this isn’t desirable.

  • It is possible that issues are occurring that are undesirable but recoverable that you may want to log so that your development staff can investigate but are not issues the end user needs to be concerned about. For example, if you are creating SQL deadlocks that are causing some of your queries to be killed by the SQL engine and you are able to recover by re-committing the transaction, then you may want to log the issue for developers but the end user doesn’t need to know.

  • Another even bigger issue is that the level of technical granularity that is appropriate at the user-interface level is often insufficient for developers/admins. As you are well aware (at least your book list suggests) you should provide meaningful errors messages to the user in terms that are relevant to their problem domain and that they can understand. However, abstracting the issue to their level can often strip out the information necessary or at least useful to the developers/admins.

Honestly, I don’t miss it at all. I can’t even think of a single time since then that I’d wished I’d had a giant verbose logfile to help me diagnose a problem.

This statement provides a false sense of security to the reader (and even you). StackOverflow is a fresh codebase that is fairly small, written by a very small team of highly-skilled developers and most if not all of the original developers are still around. This means that most of your team knows the intricate details of most of the codebase. If you play with any of those five variables - a larger team, a larger codebase, an older codebase, a loss of original team members/addition of new ones, or less-skilled developers - then you may quickly find that the logs are indispensable. This is especially true in software that is deployed on the desktop or in the customer environment where you can’t just access the running installation. Logging is like a fire extinguisher - you don’t need one until you need one and obtaining one during the problem takes too long. Sure you can add logging in as you need it but as your code base grows and ages and your team changes, this becomes more difficult and time consuming to do. It also can lead to false positives as you see things being logged for the first time that are issues but have really always been there, don’t have to do with the bug at hand, and you are just now noticing.

My $0.02

Another very important thing to remember with logging is to always CHECK THE LOGGING LEVEL YOURSELF before you do any work.

Let’s say you log some message for debug mode. Doing this may seem ok:

log.debug(Some message + someThing.toString());

Sure, if the environment isn’t configured for debug logging, this won’t be logged.

However, all the work of converting the object to a string and combining it with your message and passing it along happens BEFORE the logging function is called! This means that even in your production environment,

You can incur serious performance penalties through excessive logging this way.

The fix is pretty simple:

if (log.isDebugEnabled()) {
log.debug(Some message + someThing.toString());
}

Now, all you are doing is checking a boolean each time, not building a (potentially very large complex) string from an object and THEN checking a boolean.

Quick note TM…the beauty about log4net is that it does the level checking for you. No need to add to extra code 8^D