Good troubleshooting is a process that combines knowledge, experience, and intuition. As you practice service and support in a work environment, you will add to your experience and develop intuition that will help you to quickly solve a variety of problems.
Regardless of your current troubleshooting abilities, you will benefit from following a systematic approach to problem solving. The following process has proven effective in a variety of situations:
- Identify the problem. Resist the urge to start fixing things at this point.
- Ask the user to describe the problem, check for error messages, or recreate the problem.
- Establish what has changed. Most often, problems are caused by new hardware or software or by changes to the configuration. If necessary, question users carefully to discover what might have changed that could have caused the problem.
- Before making changes to the system, back up user and system data (or make sure a recent backup exists). While some changes can be made without affecting user data, you should back up data to protect against unintentional data loss caused by making changes.
- Identify possible causes and establish a theory of probable cause. Check for simple, obvious, and common problems first. For example, check power cords, connectors, and common user errors.
- Test your theory to verify the cause of the problem.
- If your theory is not correct, go back to identifying possible causes and establish a new theory.
- At this point, if the cause is something simple, such as an unplugged system, you can safely take action to resolve the problem.
- If the cause is not a simple one, identify the necessary steps to correct the problem.
- If you cannot identify the cause of the problem, or if the problem is beyond your ability or responsibility to fix, escalate it. Escalation means turning the problem over to someone better equipped to handle it. When escalating, be sure to detail the actions you took and the information you have discovered up to this point.
- Create an action plan that addresses the most likely cause and accounts for the side effects of the proposed fix. For example, will the fix result in significant system downtime? Is the resolution best left for another time of day? Is there a temporary solution that should be implemented immediately? When side effects have been weighed against the fix and all concerns have been addressed, fix the problem.
- Test the result.
- Ensure that the problem is fully resolved and that implementation did not cause any new problems.
- If necessary, take additional actions to prevent the problem from happening again.
- After the problem is fixed, ensure the customer's satisfaction and explain what you did to fix the problem. If possible, have the user perform the task to make sure that they understand and accept that the problem has been resolved.
- Document the solution and process. In the future, you can check your documentation to see what has changed or to help you remember the solution to common problems.
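The theory-testing steps above (test a theory, move to the next possible cause if it fails, escalate when none is confirmed) can be sketched as a small loop. This is only an illustration: the function name `find_probable_cause`, the sample theories, and the lambda checks standing in for real tests are all hypothetical and not part of any tool.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: a theory is a (description, check) pair.
# check() returns True when the test confirms that theory as the cause.
Theory = Tuple[str, Callable[[], bool]]


def find_probable_cause(
    theories: Sequence[Theory],
) -> Tuple[Optional[str], List[str]]:
    """Test theories in order and stop at the first one that is confirmed.

    Returns (cause, ruled_out). cause is None when every theory was ruled
    out, which is the signal to escalate. ruled_out lists what was tested
    and rejected, so it can be included in the escalation notes.
    """
    ruled_out: List[str] = []
    for description, confirms_cause in theories:
        if confirms_cause():
            return description, ruled_out
        ruled_out.append(description)
    return None, ruled_out


# Simple, obvious, and common causes are listed first.
# The lambdas are placeholders for real checks.
theories = [
    ("power cord unplugged", lambda: False),
    ("loose monitor cable", lambda: True),
    ("failed video card", lambda: False),
]

cause, ruled_out = find_probable_cause(theories)
if cause is None:
    print("Escalate. Theories already ruled out:", ruled_out)
else:
    print("Confirmed cause:", cause, "| ruled out:", ruled_out)
```

Ordering the list from simplest to most involved mirrors the advice to check obvious problems first, and keeping the list of rejected theories gives you the detail you need if you have to escalate.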
Remember that troubleshooting is a process of both deduction and induction. Experience will show you when deviating from this process can save both time and effort.
Keep in mind the following tips when troubleshooting systems:
- Often the hardest part of troubleshooting is to reproduce the problem. You might need to ask the end user questions to identify exactly how the problem occurred, or you might need to watch them perform the task again to reproduce the problem.
- If a hardware device or a software program causes a specific error, check the manufacturer's Web site for additional help in troubleshooting the error.
- To help diagnose issues, you can run special software tools supplied by the hardware manufacturer.
- In addition to a basic toolkit, you can keep a few spare parts on hand that you know to be in working order. If you suspect that a component has failed, swap in the known good spare. If that solves the problem, the original component is faulty and should be replaced permanently.
- Intermittent problems are particularly difficult to troubleshoot. Check for environmental conditions such as kinked cables or overheated components.
- If you have problems identifying a hardware error, you can simplify the system by removing all but the necessary components (processor, memory, and hard disk). Then add devices back one at a time, restarting the system after each one. If an error occurs, remove the newly added device and troubleshoot that device. Another strategy is to remove a single device and restart the system to see whether removing that device corrects the problem.
- Some problems might be caused by software errors, not hardware failures. You might need to begin by updating the drivers or unloading software.
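The strategy of stripping the system down and adding devices back one at a time can be sketched as follows. This is a minimal illustration under stated assumptions: `isolate_faulty_devices` and `boots_cleanly` are hypothetical names, and `boots_cleanly` stands in for the manual step of restarting the system and watching for the error.

```python
from typing import Callable, Iterable, List


def isolate_faulty_devices(
    core: Iterable[str],
    optional_devices: Iterable[str],
    boots_cleanly: Callable[[List[str]], bool],
) -> List[str]:
    """Add optional devices one at a time and collect those that cause errors.

    boots_cleanly(installed) is a hypothetical callback that represents
    restarting the system with the given components installed and
    reporting whether it came up without the error.
    """
    installed = list(core)
    if not boots_cleanly(installed):
        # The error appears even with the minimal system, so the fault
        # lies in a core component, not in an add-on device.
        raise RuntimeError("error occurs with only the core components installed")

    suspects: List[str] = []
    for device in optional_devices:
        installed.append(device)
        if not boots_cleanly(installed):
            # Remove the newly added device and set it aside for
            # troubleshooting, then continue with the remaining devices.
            installed.remove(device)
            suspects.append(device)
    return suspects


# Example with a simulated fault: the system fails whenever the
# (hypothetical) sound card is installed.
core = ["processor", "memory", "hard disk"]
optional = ["network card", "sound card", "optical drive"]


def simulated_boot(installed: List[str]) -> bool:
    return "sound card" not in installed


suspects = isolate_faulty_devices(core, optional, simulated_boot)
print("Devices to troubleshoot:", suspects)
```

Removing a failing device before continuing keeps one bad component from masking the results for the devices tested after it.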