Understanding User and Kernel Mode

Vista still uses a two-ring user/supervisor split, and so will operating systems yet to come. Virtualization software like VMware has used the other rings because virtualization is tricky, but with the VT-x/AMD-V (Vanderpool/Pacifica) instruction sets, my bet is rings 1 and 2 will be used even less.

Btw, I sent the vcblog team a mail asking about the exception handling stuff, and they said my questions might be answered here (haven’t had time to watch yet, though): http://channel9.msdn.com/Showpost.aspx?postid=343189

Jeff, I’ve been learning about the NT kernel, which Microsoft uses for almost all its OSes, and found out that the kernel was originally designed for a two-ring system (the kernel ring and the application ring), so the other two rings were practically left idle. I would like to know if this two-ring system is still being used, i.e. with Vista. If so, how would the other two rings, when implemented, improve system performance (thinking of a well-structured preemptive kernel)? Please fill me in. :slight_smile:

@f0dder
Thanks for the info, checking out the link.

Hello, I was wondering if someone could help point me in the right direction… (if there is one). I need to test how many clock cycles it takes to execute a function. (I am using C# but can use anything else.) … are any of the below possible…

  1. Disable interrupts on the system. (This does not seem possible with XP/Vista, and I understand that it is not safe.)
  2. If I could get the OS to give me a guaranteed full time slice. If the function is small it can be timed without interruption, or if the function were larger, different parts of it could be run on different time slices and the times added together.
  3. If I could somehow catch the CPU counter when the thread is preempted. I would then have different segments that I could add up.

Ryan: forget about getting anything really exact when running under traditional operating systems (that includes Linux, too). Forget about turning off interrupts. Forget about any “guarantees”.

Basically, the best you can do is boost your thread priority to your OS’s equivalent of “realtime” (requires admin/root), give away the remainder of your current timeslice, and then run your code… usually, for routines that take less than a second, you’re going to run the routine N times and either take an average, or record min/max/avg times… or even record all of the times and report the mean as well.

If you want anything more accurate than that, get a profiler (AMD CodeAnalyst or Intel VTune).

Oh, and remember to “burn some rubber” before doing the test, so you’re sure the CPU is not in reduced-speed power saving mode.
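
Something along those lines, in C# since that’s what you’re using (just a rough sketch with made-up names; “realtime” priority needs admin rights, and Stopwatch ticks only approximate raw clock cycles):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class MicroBench
    {
        static void Main()
        {
            // Boost priorities as far as the OS allows (realtime needs admin;
            // Windows silently falls back to High otherwise).
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
            Thread.CurrentThread.Priority = ThreadPriority.Highest;

            // "Burn some rubber" so the CPU leaves power-saving mode and the
            // code/data are warm in the caches.
            for (int i = 0; i < 1000; i++) RoutineUnderTest();

            const int N = 10000;
            double min = double.MaxValue, max = 0, total = 0;
            var sw = new Stopwatch();

            for (int i = 0; i < N; i++)
            {
                Thread.Sleep(0);   // give away the remainder of the current timeslice
                sw.Restart();
                RoutineUnderTest();
                sw.Stop();

                double us = sw.ElapsedTicks * 1e6 / Stopwatch.Frequency;
                min = Math.Min(min, us);
                max = Math.Max(max, us);
                total += us;
            }

            Console.WriteLine($"min {min:F3} us, avg {total / N:F3} us, max {max:F3} us");
        }

        // Placeholder for whatever routine you actually want to time.
        static void RoutineUnderTest()
        {
            double x = 1.0;
            for (int i = 0; i < 100; i++) x = Math.Sqrt(x + i);
        }
    }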

Hi F0dder, thank you for your post. That is what I have come up with so far: I run the code multiple times, take the most often occurring tick counts (the fastest ones), and then average those out. 90% of the time it is within a few clock cycles (that is all I need). The only drawback is that it needs to be run several times.
BTW, here is what I do in a nutshell…

Call the code 4 or 5 times (to make sure it’s in cache)
Loop X times
call Sleep(0) (to get to the beginning of a timeslice)
start timer
call the code
end timer
Find the most often occurring (and fastest) clock counts

I also set the system to favor “background processes” so that the OS uses longer timeslices. (Helps a little bit.)
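
In case it’s useful, here is roughly what that loop looks like written out in C# (a sketch only; Stopwatch ticks stand in for raw CPU cycles, the names are made up, and the “outlier” cutoff is just a heuristic):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading;

    static class TickMode
    {
        // Times `code` repeatedly and returns the most frequently occurring
        // tick count among the fastest runs (the mode of the fast samples).
        public static long MostCommonFastTicks(Action code, int iterations)
        {
            for (int i = 0; i < 5; i++) code();   // warm the caches first

            var counts = new Dictionary<long, int>();
            var sw = new Stopwatch();

            for (int i = 0; i < iterations; i++)
            {
                Thread.Sleep(0);                  // start near a fresh timeslice
                sw.Restart();
                code();
                sw.Stop();

                long t = sw.ElapsedTicks;
                counts[t] = counts.TryGetValue(t, out int c) ? c + 1 : 1;
            }

            // Throw away samples that were clearly preempted, then pick the
            // tick count that shows up most often (ties go to the fastest).
            long fastest = counts.Keys.Min();
            return counts.Where(kv => kv.Key <= fastest * 2)
                         .OrderByDescending(kv => kv.Value)
                         .ThenBy(kv => kv.Key)
                         .First().Key;
        }
    }

Calling it is just something like: long ticks = TickMode.MostCommonFastTicks(MyRoutine, 10000);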

Thanks for the VTune tip… I’ll check it out to see if there is anything that can be used without re-inventing the wheel.

And what about DirectX applications? Could transitions between user mode and kernel mode be the reason for the performance penalties incurred by pipeline state changes? Has anybody done these tests?

“… crashes in user mode are always recoverable.”

Well, that depends on what you mean by recoverable. A crash in a critical process like winlogon.exe or csrss.exe halts the PC, even though they’re user-mode processes.

When did your ads move? They don’t show up in my RSS reader (which I like), but they’re subtly on the left-hand side now.

GCC on Windows does not support any kind of SEH at all, local or otherwise.

Romulo: As for csrss.exe, if you inspect this “user mode” application, you’ll see that the cdd.dll thread is running in a special mode that does not allow you to view its stack.

Also, it is good to note that csrss.exe is effectively the kernel of the Win32 subsystem, thus a hang in it WILL hang all other applications that are not using native code (and I hope everyone here knows that native code in Windows uses only kernel-mode services).

This leads me to say that both csrss.exe and winlogon.exe are in fact kernel-mode programs, which execute lots of code in kernel mode, but they also have a major user-mode part in the same process.

Besides, csrss.exe “cannot be run in Win32 mode”, while winlogon.exe can. :slight_smile:

Besides, a hang of winlogon.exe doesn’t cause anything bad (try suspending the process, which I did often).

Conclusion: winlogon.exe is a hybrid user-mode process (major part in user mode, only some calls into kernel mode); you can even kill it, and smss.exe or csrss.exe (not sure which one) will simply terminate all processes in your session and reconnect you to a new session, destroying the previous one.

csrss.exe is a hybrid kernel-mode process (major part in kernel mode, mainly the “Canonical Display Driver” running inside csrss.exe; not to mention that csrss.exe itself is just a kernel-mode loader for the DLLs that actually run, and a csrss.exe thread as such does not exist inside csrss.exe).

The fact that you see these processes outside of the SYSTEM process does not mean they are user mode.

Of course, I am not sure of everything I have written here, but I am pretty much sure that csrss.exe, smss.exe, wininit.exe (Vista), and the optional subsystem programs all run in KERNEL MODE, even though they are not in the SYSTEM (4) process.

Meanwhile, csrss.exe and winlogon.exe sit exactly at the “border” between kernel mode and user mode. (I don’t know about services.exe; it starts early, but I think it’s hybrid too, mostly in user mode though.)

hi,
you said that the transition from user mode to kernel mode is very expensive; my question is whether this time could be greater than the context-switch time from one process to another. Please clarify, and thanks in advance.

The first figure implies that device drivers run in rings 1 and 2; that is not the case. The x86 architecture defines four rings. Windows uses ring 0 for kernel mode and ring 3 for user mode. The reason Windows uses only two levels is that some architectures, such as ARM and MIPS/Alpha, implemented only two privilege levels. Settling on this lowest common denominator allowed for a more efficient and portable architecture, especially as the other x86 ring levels do not provide the same guarantees as the ring 0/ring 3 divide.

Source: “Windows Internals, Part 1”
