The Problem With Logging

Everything’s already been said, really – you might not have access to the machine where the code runs, and you might not have the sources for the stuff that broke. The stuff that broke might not even know it’s broken… I once had to troubleshoot a SAML (single sign-on) problem where users were getting sporadic errors while trying to SSO to our partner site. Everything looked fine on our end, and only by carefully comparing our logs with the partner’s were we able to figure out that the server’s clock was out of sync far enough for some of the assertions to become invalid. That troubleshooting would not have been possible if the application did not generate a couple of gigs’ worth of really detailed logs every day.

Basically, I agree with people who are saying that too much logging is better than too little. (And yes, you can obviously overdo it and kill your performance, or write tons of useless stuff, or screw up and introduce deadlocks, etc.)

Logging is hard, let’s go shopping!

If it’s worth saving to a logfile, it’s worth showing in the user interface

I’m not sure you want to expose your mistakes or potential security holes to an experienced user (stackoverflow has a lot of them), or show technical information to ordinary users; it’s always better to show a friendly message.

A logfile has a purpose: to help you clean your systems of undesirable bugs and/or potential ones… what do you think?

Logging is used for so many different things that it has lost its meaning over time.

First of all, don’t use log levels. Murphy’s law states that you always need that DEBUG output when it’s disabled, and the levels are always wrong anyway. It’s best to use different methods for different purposes: trace, should_not_happen, user_notify are all different things.
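
A minimal sketch of that idea in C# (the three method names come from the comment above; everything else here is an assumption):

using System;

public static class Log
{
    // Developer-facing breadcrumbs: cheap to grep, safe to throw away.
    public static void Trace(string msg) =>
        Console.WriteLine($"TRACE {DateTime.UtcNow:O} {msg}");

    // Broken invariants: always recorded, never filtered by a level switch.
    public static void ShouldNotHappen(string msg) =>
        Console.Error.WriteLine($"SHOULD NOT HAPPEN {msg}");

    // Something the end user must actually see, not just the logfile.
    public static void UserNotify(string msg) =>
        Console.WriteLine($"USER {msg}");
}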

Also, don’t marshal your data to an unstructured text representation like syslog. You’re never sure whether you missed something when reading it back takes three lines of regexps.
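
One way to picture the alternative, assuming .NET’s built-in System.Text.Json: emit one self-describing JSON object per line, so a consumer parses named fields instead of reverse-engineering the text with regexps.

using System;
using System.Text.Json;

public static class StructuredLog
{
    public static void Write(string evt, object fields)
    {
        // One JSON object per line; no regexps needed to read it back.
        var entry = new { ts = DateTime.UtcNow, @event = evt, data = fields };
        Console.WriteLine(JsonSerializer.Serialize(entry));
    }
}

// Usage (hypothetical event and fields):
// StructuredLog.Write("login_failed", new { user = "bob", attempts = 3 });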

Logging is not mainly for tracing where your process goes, but for where it was when something went wrong.

Trace beginning
Trace exceptions
Trace the first time a process began
Trace when it failed
Trace when it recovered, or when you know it will not be recovered
Trace end
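
Here is a minimal sketch of that lifecycle, reusing the hypothetical Log.Trace helper from the sketch above:

using System;

public class Process
{
    public void Run()
    {
        Log.Trace("process begun");
        try
        {
            DoWork();
        }
        catch (Exception ex)
        {
            Log.Trace($"process failed: {ex.Message}");
            if (!TryRecover())
            {
                Log.Trace("process will not be recovered");
                throw;
            }
            Log.Trace("process recovered");
        }
        finally
        {
            Log.Trace("process ended");
        }
    }

    void DoWork() { /* the actual work goes here */ }
    bool TryRecover() => false; // placeholder recovery logic
}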

Treat logging like this familiar phrase: Take only what you need to survive.

Don’t be the Druish princess that says It’s my industrial-strength hair dryer… and I can’t live without it. :slight_smile:

There is a third way, as well. It is possible to set up your logging system so that you can turn certain log levels off through config. Thus, you can log as much at the debug level as you like, since you can simply flip a switch and turn it off in production. If you need to diagnose a specific prod error, flip it back on for a time and search the output for the statements you need.
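
With log4net (quoted further down this thread), for instance, that switch is a one-line config change rather than a recompile; the element names below are standard log4net, the values are just an illustration:

<log4net>
  <root>
    <!-- ERROR in production; flip to DEBUG to diagnose, then flip back -->
    <level value="ERROR" />
    <appender-ref ref="RollingFile" />
  </root>
</log4net>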

To avoid code clutter, you can build a declarative logging model for method entry / exit / params / return values using annotations / attributes (depending on your language), coupled with the previous ability to turn those messages on and off via config.
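
A sketch of what that can look like in C# using DispatchProxy; the attribute name, the interface, and the config flag are all made up for illustration:

using System;
using System.Reflection;

// Hypothetical marker attribute: only methods tagged with it get traced.
[AttributeUsage(AttributeTargets.Method)]
public class LogCallsAttribute : Attribute { }

public interface IOrderService
{
    [LogCalls]
    decimal PriceOrder(int orderId, int quantity);
}

public class OrderService : IOrderService
{
    public decimal PriceOrder(int orderId, int quantity) => quantity * 9.99m;
}

// DispatchProxy intercepts every call through the interface; entry/exit
// lines are emitted only for [LogCalls] methods and only when the
// (assumed) config switch is on.
public class LoggingProxy : DispatchProxy
{
    public object Target;
    public static bool TraceEnabled = true; // imagine this read from config

    protected override object Invoke(MethodInfo method, object[] args)
    {
        bool trace = TraceEnabled &&
                     method.GetCustomAttribute<LogCallsAttribute>() != null;
        if (trace)
            Console.WriteLine($"-> {method.Name}({string.Join(", ", args)})");
        object result = method.Invoke(Target, args);
        if (trace)
            Console.WriteLine($"<- {method.Name} = {result}");
        return result;
    }

    public static T Wrap<T>(T target) where T : class
    {
        var proxy = Create<T, LoggingProxy>();
        ((LoggingProxy)(object)proxy).Target = target;
        return proxy;
    }
}

// Usage:
// IOrderService svc = LoggingProxy.Wrap<IOrderService>(new OrderService());
// svc.PriceOrder(42, 3); // logs entry, params, and return value

The business class stays free of logging calls; the proxy decides at run time whether to emit anything.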

Having developed many an app for custom hardware, often fielded in industrial settings with no UI to speak of, I can say that logging is the only way to get state information for problems during an actual production run.

That said, I agree that developers usually get way too deep into logging for their own good. Spartan logging of critical pieces is useful, but you don’t need to log everything. Typically we add it for exceptions; when there is a problem, we add more incrementally as we search for the issue, then remove it once the problem is identified and resolved.

@Sean Patterson

It does the logging check for you… after the method is invoked. All the parameters to the method are built BEFORE that check.

So the string building is still done.

@TM

True true, it does construct the string, so I guess it ultimately depends on how efficiently you need the entire process to run. From the log4net FAQ:


What is the fastest way of (not) logging?

For some logger log, writing,

log.Debug("Entry number: " + i + " is " + entry[i]);

incurs the cost of constructing the message parameter, that is converting both integer i and entry[i] to a string, and concatenating intermediate strings. This, regardless of whether the message will be logged or not.

If you are worried about speed, then write

if (log.IsDebugEnabled)
{
    log.Debug("Entry number: " + i + " is " + entry[i]);
}

This way you will not incur the cost of parameter construction if debugging is disabled for logger log. On the other hand, if the logger is debug enabled, you will incur the cost of evaluating whether the logger is enabled or not, twice: once in IsDebugEnabled and once in Debug. This is an insignificant overhead since evaluating a logger takes less than 1% of the time it takes to actually log a statement.


We spent days troubleshooting these deadlocks by … wait for it …
adding more logging! Which naturally made the problem worse and even
harder to figure out. We eventually were forced to take memory dumps
and use dump analysis tools. With the generous assistance of Greg
Varveris, we were finally able to identify the culprit: our logging
strategy. How ironic. And I mean real irony, not the fake Alanis
Morissette kind.

GAAAAAAAAAAAAAAAAAAAGH!

For the love of god… THAT IS NOT IRONIC! Don’t you even watch Futurama?!? That is a COINCIDENCE, not IRONY.

What would be ironic is if adding more logging FIXED your problem, not CAUSED your problem.

Logging isn’t free, but it’s pretty cheap if designed correctly.

I have a hard time believing you’d actually write a diagnostic logging program that would hit the disk for EVERY line. There’s this new concept called caching to memory you ought to try.

I write all my diagnostic logs to memory and only dump them to disk when I explicitly call the diagnostics.logdump([FileName]) function. All lines are counted, of course, so I can also force a dump at X number of lines or on certain fail conditions.
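
Something along those lines, with the dump function name borrowed from the comment above (the rest is assumption):

using System;
using System.Collections.Generic;
using System.IO;

public static class Diagnostics
{
    private static readonly List<string> buffer = new List<string>();
    private const int MaxLines = 10_000; // force a dump past this many lines

    public static void Log(string message)
    {
        lock (buffer)
        {
            buffer.Add($"{DateTime.UtcNow:O} {message}");
            if (buffer.Count >= MaxLines) // fail-safe: don't grow forever
                LogDump("diagnostics-overflow.log");
        }
    }

    // Only this call ever touches the disk.
    public static void LogDump(string fileName)
    {
        lock (buffer)
        {
            File.AppendAllLines(fileName, buffer);
            buffer.Clear();
        }
    }
}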

Jeff, you might actually want to consider designing your event logs before implementing them. Diagnostics are no more an ad-hoc programming problem than anything else.

I used to log a lot, but have since moved to using Elmah (thanks for turning me on to it, Jeff). Since then, I get all the info I need, and none that I don’t. Unhandled exceptions are logged straight to a SQL Server table, and can be viewed in a nicely formatted page which includes stack trace info.

Thank you very much for that post. It was the right thing at the right time. It’s not that logging really was my problem, but this article put me on the right track. Thanks!!!

Not to be too nitpicky, but your irony is exactly the fake Alanis Morissette kind. To quote Bender: The use of words expressing something other than their literal intention. Now that IS irony!

Bottom line is it’s not that simple, Jeff. Write in C and use macros if you can’t find a good logging class :slight_smile:

As others have said there’s a difference between trace logging for development (which is what you have with INFO / DEBUG) and the sorts of errors you may want on customer systems such as your ERROR / FATAL. Trace logging really shouldn’t be exposed via the UI, for example.

Writing logging code is easy and quick, too. I don’t know any good programmers who are limited by their typing speed rather than their brain speed in producing good code – if you have rules for when INFO / DEBUG logging needs to be done, then it happens as second nature when your code calls for it.

I’d be interested to hear Joel’s view on the next podcast…

What about applications with no user interface?

I write a lot of Windows Services. Logging is critical. You don’t have to fill up gigabytes of storage. We have a config setting that allows you to set the level (debug, info, warning, error, fatal). Normally only Error and Fatal are logged. However, for particularly nasty bugs you can turn on Debug and see everything.

Also, there are numerous strategies for controlling the size of log files: create new daily, create new hourly, create new by size, etc.
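
log4net’s RollingFileAppender, for one, supports all of those strategies; here is a sketch of a date-plus-size configuration (standard log4net elements, illustrative values):

<appender name="RollingFile" type="log4net.Appender.RollingFileAppender">
  <file value="service.log" />
  <appendToFile value="true" />
  <rollingStyle value="Composite" /> <!-- roll by date AND by size -->
  <datePattern value="'.'yyyyMMdd" />
  <maximumFileSize value="10MB" />
  <maxSizeRollBackups value="14" /> <!-- keep two weeks of history -->
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date [%thread] %-5level %logger - %message%newline" />
  </layout>
</appender>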

If writing a log row creates a potentially contentious lock, your logging API sucks.
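
One common way to get there, sketched under the assumption that a single background thread owns the file: producers pay only a lock-free enqueue, and the writer thread does all the contended I/O.

using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public static class AsyncLog
{
    private static readonly BlockingCollection<string> queue =
        new BlockingCollection<string>(new ConcurrentQueue<string>());

    static AsyncLog()
    {
        var writer = new Thread(() =>
        {
            using var file = new StreamWriter("app.log", append: true);
            foreach (var line in queue.GetConsumingEnumerable())
            {
                file.WriteLine(line);
                file.Flush();
            }
        });
        writer.IsBackground = true; // don't keep the process alive for logging
        writer.Start();
    }

    // Callers never block on the disk; this is a single lock-free enqueue.
    public static void Write(string message) => queue.Add(message);
}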

I completely agree about the confusion excessive logging causes. I have walked through log files so huge that I had to use a file splitter just to load them in an editor. Programmers following different definitions of the logging levels is another source of havoc.

But the amount of logging at every level, and specifically in log files, depends on the kind of application. I have written data migration tools that transferred 70 million records. In such scenarios it is extremely important for the logs to indicate which data was used and at what stage it failed. Without those details in the logs, we would have spent a lot of time rerunning the scenarios.

Also, though I agree that logging pollutes business code, AOP makes it easier to decouple most of the logging code from the business code, though not all of it.

Hey Jeff, enough phoning it in, get serious. This is a post with exactly zero content. How do I justify that? The snippet below would be equally correct for just about anything:

So is [insert feature here] a giant waste of time? I’m sure some people will read about this far and draw that conclusion, no matter what else I write. I am not anti-[insert feature here]. I am anti-abusive-[insert feature here]. Like any other tool in your toolkit, when used properly and appropriately, it can help you create better programs. The problem with [insert feature here] isn’t [insert feature here], per se – it’s the seductive OCD [insert what feature does] trap that programmers fall into when implementing [insert feature here]. [insert feature here] gets a bad name because it’s so often abused.