09-07-10, 07:30 AM   #1482
tvt_addict

Just in case anyone else hits problems with screen freezes and blackouts, there is a reason why this happens. The following was extracted from the Nvidia forum, but as you can see this applies to ATI cards as well. Having a low fps seems to be a liability!

Rob G

***************

Just as a 'disclaimer' here, this thread is not for everyone to post their problems on. Its purpose is to try and help you, but also to prevent multiple topics on the same subject. Lots of people have seen these errors, so hopefully this should help you understand exactly what the error is before you post. I have seen and responded to a lot of TDR-related topics now where people have not made the effort to do any prior searching (I class myself as guilty here also!). The generic error people visit this forum for is:

'Display driver nvlddmkm stopped responding and was recovered.'

Also seen as:
'Display driver atikmdag stopped responding and was recovered.' (ATI cards)
'Display driver xxxxxxxx stopped responding and was recovered.' (others)

Also noted as nvlddmkm.sys, atikmdag.sys, and xxxxxxxx.sys bug-check/BSOD.

First off, this is not an nVidia issue, and it is not an ATI issue either. These errors are triggered by a Windows feature called 'Timeout Detection and Recovery' (TDR). You will only see this error on Windows Vista and Windows 7, as TDR is part of the new WDDM driver model (first implemented in Vista). It's meant to help prevent BSODs by resetting the GPU and/or driver when there is an issue. If the problem happens multiple times in a row, it can still produce a BSOD.

The problem is normally perfectly solvable, but it can take some troubleshooting to do so. I personally have seen this issue on two separate nVidia builds, and an Intel onboard GPU.

How does TDR work?
Timeout Detection and Recovery
Windows Vista attempts to detect (these) problematic hang situations and recover a responsive desktop dynamically. In this process, the Windows Display Driver Model (WDDM) driver is reinitialized and the GPU is reset. No reboot is necessary, which greatly enhances the user experience. The only visible artifact from the hang detection to the recovery is a screen flicker, which results from resetting some portions of the graphics stack, causing a screen redraw. Some older Microsoft DirectX applications may render to a black screen at the end of this recovery. The end user would have to restart these applications.

The following is a brief overview of the TDR process:

1. Timeout detection:
The Video Scheduler component of the Windows Vista graphics stack detects that the GPU is taking more than the permitted quantum time to execute the particular task and tries to preempt this particular task. The preempt operation has a "wait" timeout—the actual "TDR timeout." This step is thus the "timeout detection" phase of the process. The default timeout period in Windows Vista is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout, then the GPU is diagnosed as hung.

2. Preparation for recovery:
The operating system informs the WDDM driver that a timeout has been detected and it must reset the GPU. The driver is told to stop accessing memory and should not access hardware after this time. The operating system and the WDDM driver collect hardware and other state information that could be useful for post-mortem diagnosis.

3. Desktop recovery:
The operating system resets the appropriate state of the graphics stack. The Video Memory Manager component of the graphics stack purges all allocations from video memory. The WDDM driver resets the GPU hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As mentioned earlier, some older DirectX applications may now render just black, and the user may be required to restart these applications. Well-written DirectX 9Ex and DirectX 10 applications that handle "Device Remove" continue to work correctly. The application must release and then recreate its Microsoft Direct3D device and all of its objects. DirectX application programmers can find more information in the Windows SDK.

What our hardware/software providers should do to avoid it:
Graphics hardware vendors:
• Ensure that graphics operations (that is, DMA buffer completion) take no more than 2 seconds in end-user scenarios such as productivity and gameplay.

Graphics software vendors:
• Ensure that the DirectX graphics application does not run at a low frames per second (FPS) rate. As the FPS decreases, the likelihood of the GPU getting reset increases. If the application is running at 10 FPS or lower and a complex graphics operation is about to start, then a flush can be inserted.
• For running benchmark tests on low-end GPUs, use the aforementioned registry keys that control the TDR timeout. Remember that they should not be used in production systems because it would affect overall system stability and robustness. Use these keys only as a final solution.
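
To put rough numbers on the FPS point above, here is a minimal sketch in Python. It assumes, as a simplification that is not in the Microsoft text, that one frame corresponds to one GPU task; the function names are made up for illustration. It only shows how much slower than an average frame a single operation can be before it hits the default 2 second timeout.

```python
TDR_TIMEOUT_SECONDS = 2.0  # default TDR timeout quoted above for Windows Vista


def frame_time_seconds(fps: float) -> float:
    """Average time spent on one frame at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps


def slowdown_headroom(fps: float, timeout: float = TDR_TIMEOUT_SECONDS) -> float:
    """How many times slower than an average frame a single GPU task can be
    before it reaches the timeout (assumption: one GPU task per frame)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return timeout * fps  # equivalent to timeout / frame_time


def would_time_out(fps: float, cost_multiplier: float,
                   timeout: float = TDR_TIMEOUT_SECONDS) -> bool:
    """True if a task costing `cost_multiplier` average frames exceeds the timeout."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return cost_multiplier / fps > timeout


if __name__ == "__main__":
    for fps in (60, 30, 10, 5):
        print(f"{fps:>3} FPS: one task may be up to "
              f"{slowdown_headroom(fps):.0f}x an average frame")
```

Under that assumption, a game running at 60 FPS has far more headroom for one heavy operation than the same game crawling along at 10 FPS, which is the reason low frame rates make a reset more likely.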

System manufacturers:
• Work with the graphics hardware vendor to diagnose the TDR debug reports.
• Remember that any system that uses the aforementioned TDR registry keys to change the default values is a Windows Logo Program violation.
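
The quoted text mentions the TDR registry keys without naming them. As far as I know from Microsoft's own TDR documentation, the values are TdrDelay and TdrLevel under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers, with defaults of 2 seconds and 3 (recover on timeout); treat those names and defaults as my assumption rather than something stated in the quote. The following Python sketch only reads the values to check whether anything has been changed from the defaults. It does not modify them, in line with the warnings above. The helper names are made up for illustration.

```python
import sys

# Assumed location and value names, taken from Microsoft's TDR registry docs.
GRAPHICS_DRIVERS_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_DEFAULTS = {"TdrDelay": 2, "TdrLevel": 3}


def read_tdr_overrides() -> dict:
    """Return TDR values explicitly set in the registry (empty if none, or not Windows)."""
    if sys.platform != "win32":
        return {}
    import winreg  # standard library, Windows only

    overrides = {}
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_DRIVERS_KEY) as key:
            for name in TDR_DEFAULTS:
                try:
                    value, _value_type = winreg.QueryValueEx(key, name)
                except FileNotFoundError:
                    continue  # value not present, so the default applies
                overrides[name] = value
    except OSError:
        pass  # key missing or not readable
    return overrides


def effective_tdr_settings(overrides: dict) -> dict:
    """Merge registry overrides over the assumed defaults, ignoring unknown names."""
    settings = dict(TDR_DEFAULTS)
    settings.update({k: v for k, v in overrides.items() if k in TDR_DEFAULTS})
    return settings


if __name__ == "__main__":
    found = read_tdr_overrides()
    print("Overrides found:", found or "none (defaults in effect)")
    print("Effective settings:", effective_tdr_settings(found))
```

If this reports overrides on a machine you did not tweak yourself, some tool or installer may have changed the timeout, which is worth knowing before you start troubleshooting.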

Windows Vista SP1 improved the accuracy and reduced the frequency of these events:
Windows Vista SP1 Update
Minor changes were made in Windows Vista SP1 to improve the user experience in cases of frequent and rapidly occurring GPU hangs. Repetitive GPU hangs indicate that the graphics hardware has not recovered successfully. In these instances, the system must be shut down and restarted to fully reset the graphics hardware. If the operating system detects that six or more GPU hangs and subsequent recoveries occur within 1 minute, then the following GPU hang is treated as a system bug check.
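
The SP1 rule above can be sketched as a sliding-window counter. This is only an illustration in Python of the rule as worded in the quote, not how Windows implements it; the class name and the exact window handling (recoveries older than 60 seconds are forgotten) are my assumptions.

```python
from collections import deque

WINDOW_SECONDS = 60.0          # "within 1 minute" from the quoted text
MAX_RECOVERIES_IN_WINDOW = 6   # "six or more GPU hangs and subsequent recoveries"


class HangTracker:
    """Tracks recent GPU recoveries and decides the outcome of each new hang."""

    def __init__(self) -> None:
        self._recoveries = deque()  # timestamps (seconds) of recent recoveries

    def on_gpu_hang(self, now: float) -> str:
        # Forget recoveries that fell out of the one-minute window.
        while self._recoveries and now - self._recoveries[0] > WINDOW_SECONDS:
            self._recoveries.popleft()
        # Six or more recent recoveries: the following hang becomes a bug check.
        if len(self._recoveries) >= MAX_RECOVERIES_IN_WINDOW:
            return "bugcheck"
        self._recoveries.append(now)
        return "recovered"


if __name__ == "__main__":
    tracker = HangTracker()
    for t in (0, 5, 10, 15, 20, 25, 30):
        print(f"hang at t={t:>2}s -> {tracker.on_gpu_hang(t)}")
```

With hangs every 5 seconds, the first six are recovered and the seventh is treated as a bug check; with hangs spread 20 seconds apart the counter never fills up and every hang is recovered.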

It is possible to disable TDR, but a TDR normally happens for a reason, so disabling it is neither recommended nor supported. For more of the quoted jargon, see HERE.

Now that you know exactly what the error is, you probably want to stop it happening. I wish there was a one-stop fix that I could recommend, but unfortunately, TDR events can be caused by many different problems.

Common issues that can cause a TDR:
- Bad memory
- Insufficient/problematic PSU
- Corrupt driver install
- Overheating
- Unstable overclocks (GPU or CPU)
- Incorrect MB voltages (generally NB/SB)
- Faulty graphics card
- You're asking too much of your graphics card. Not one that many people like to hear, but as the blurb from Microsoft states, if your game falls below a certain FPS and something graphically complex occurs, it could trigger a TDR.
- The issue can potentially be caused by a badly written driver or piece of software, but this is an unlikely cause in most cases.

Things to check first:
(Ideally, before you post a topic on a TDR problem, it would be useful to have completed the majority of these to ensure certain things can be ruled out.)
- Run memtest (memtest.org). This should complete with NO errors.
- Check your PSU ratings. Is it providing enough power, and most importantly, enough amps on the 12V rail?
- Check temperatures. It's important you check these at load, which is generally when a TDR event will occur. Everest Ultimate Edition is a good tool for this. If things are too hot, you can use tools such as EVGA Precision to increase GPU fan speeds on graphics cards. Cleaning your system of dust can help temperatures significantly. Common sense will normally tell you if something is too hot, but if you aren't sure, the information is generally available online.
- Test with stock clocks. This includes memory, CPU and GPU (even factory OC'd cards). Best to try each separately so you can be sure if one solves the issue.
- If you are using SLI, try each card separately to see if the fault lies with one.
- Try graphics card/cards in another computer if you can.
- Check for a newer driver version.
- Check for patches to the specific game with issues.
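
For the PSU check in the list above, the arithmetic is simply watts = amps x volts on the 12V rail. Here is a minimal Python sketch; the 20% safety margin and the example figures in the usage are hypothetical, so take the real numbers from your PSU label and your card's specification. It also treats the 12V supply as a single rail, which is a simplification for multi-rail units.

```python
RAIL_VOLTAGE = 12.0  # volts on the rail that feeds the graphics card


def rail_watts(amps: float, volts: float = RAIL_VOLTAGE) -> float:
    """Power available on a rail rated at `amps` (P = V * I)."""
    return amps * volts


def rail_has_headroom(rail_amps: float, required_watts: float,
                      margin: float = 0.2) -> bool:
    """True if the rail covers the requirement plus a safety margin.

    The 20% default margin is an assumed rule of thumb, not a specification.
    """
    return rail_watts(rail_amps) >= required_watts * (1 + margin)


if __name__ == "__main__":
    # Hypothetical example: a 30 A rail against a 250 W load on 12V.
    print(rail_watts(30))              # watts available on the rail
    print(rail_has_headroom(30, 250))  # enough headroom under the assumed margin?
```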

Programs to use for stress testing CPU:
- Prime95 (would advise running for at least a few hours).
- IntelBurnTest (warning, can cause significant heat!)

Programs to use for stress testing GPU:
- FurMark
- 3DMark Vantage
- Crysis!