Whether you work in maintenance, testing or support, as an IT consultant you will often need to analyse errors. Here are a few things to keep in mind about error analysis that I’ve learned from my own experience:
I) What is the priority?
When an error occurs, the first thing you have to establish is its priority. There are usually many errors waiting to be analysed and you can’t handle them all at once, so you have to decide the order in which to solve them. When making this decision, take the following factors into consideration:
The environment – If the error occurs in the Production environment, for example, it must be dealt with before an error in one of the testing environments. That is because the enterprise loses money whenever something goes wrong in Production. In theory, every problem should already have been caught and fixed before it got past the testing environments.
The severity – When you find an error, you must weigh its impact on the system as a whole. If, for instance, every other part of the system depends on that one error being solved, it must be treated as soon as possible, and whatever you were already working on will have to wait.
The complexity – If solving an error thoroughly would take too much time, it is sometimes better to find a temporary workaround and deal with the underlying problem later, once the urgent issues are out of the way.
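To make the three factors concrete, here is a minimal Python sketch of how an error backlog could be ordered. The `ErrorReport` record, the environment ranking (production before acceptance before test) and the 8-hour threshold for preferring a workaround are all hypothetical, illustrative choices, not a fixed rule: environment is compared first, severity (how many other parts of the system wait on the fix) breaks ties, and complexity decides whether a workaround comes first.

```python
from dataclasses import dataclass

# Hypothetical ranking: a lower number means the error is handled earlier.
ENVIRONMENT_RANK = {"production": 0, "acceptance": 1, "test": 2}


@dataclass
class ErrorReport:
    name: str
    environment: str        # e.g. "production" or "test"
    blocked_parts: int      # how many other parts of the system wait on this fix
    estimated_hours: float  # rough estimate for a thorough fix


def triage(errors, workaround_threshold_hours=8.0):
    """Return (name, suggested action) pairs in the order they should be handled."""
    ordered = sorted(
        errors,
        # Environment first; unknown environments go last. Then more blocked parts first.
        key=lambda e: (
            ENVIRONMENT_RANK.get(e.environment, len(ENVIRONMENT_RANK)),
            -e.blocked_parts,
        ),
    )
    plan = []
    for e in ordered:
        # Complexity: a fix that takes too long gets a temporary workaround first.
        if e.estimated_hours > workaround_threshold_hours:
            action = "workaround now, full fix later"
        else:
            action = "fix now"
        plan.append((e.name, action))
    return plan


backlog = [
    ErrorReport("report layout broken", "test", 0, 2),
    ErrorReport("nightly batch fails", "production", 5, 12),
    ErrorReport("login timeout", "production", 1, 3),
    ErrorReport("import job crash", "test", 3, 4),
]

for name, action in triage(backlog):
    print(f"{name}: {action}")
```

Both production errors come before the test ones, and within each environment the error blocking more of the system comes first. Real triage involves judgement that no sort key captures; the sketch only shows the order in which the factors are weighed.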
II) What is the overall situation?
When you encounter an error, first make sure you are aware of the overall situation, so that you don’t waste precious time on something that could have been obvious from the start. Sometimes a problem cannot be fixed simply because the error that caused it is still being fixed. For example, no matter how hard you try, you can’t run a process if the server is down. Taking a little time to list all the open problems and their consequences may save you a lot of time in the long run.
III) How far do you have to analyse?
Another thing to keep in mind when you start analysing an error is how far you are actually supposed to take the analysis yourself, and when it is time to hand it over to someone more specialised. Suppose you have found the exact cause of the problem, but providing a solution requires understanding the code: if you have programmed in that language before, you might be tempted to read the code yourself. However, if your job is only to identify the cause of the error, it is not a good idea to start reading the code. This is not because you wouldn’t eventually find the solution, but because the people who actually worked on those scripts will reach a conclusion much faster.
IV) Will the problem repeat itself?
When solving an error, we are often tempted to apply an immediate fix and not investigate further. Even if finding the actual cause is usually lowest on the priority list once a temporary fix is in place, remember that the problem may well arise again unless you identify its root cause.
V) What were the steps of the investigation?
After hours of investigation, the cause of the error and its solution may seem easy to understand to you, but you will most likely have to report your results to people who have not spent any time analysing that specific error. It is therefore very important to have every step that led to your conclusion clearly in your head, so that you can explain an hours-long analysis in a matter of minutes.