23 December 2006

An enlightenment about shadow page tables

Lately, I had trouble understanding why virtual machine monitors (hypervisors, such as Xen) implement shadow page tables. So I joined #osdev on Freenode and got these explanations from some generous gentlemen. Enjoy! (Note: the_hydra is me.)


<the_hydra>hi
<the_hydra> could somebody help me understanding what shadow page
table really is?
<the_hydra> from what I read, seems like we do shadow because CPU in
vmx root mode doesn't care with guest mode PTEs
<the_hydra> while guest only "sees" the guest mode PMD/PGD/PTEs, is
this correct?
<geist> I assume you're talking about intel's VT stuff?
<the_hydra> geist: yes
<the_hydra> sorry was afk
<the_hydra> vt-i and vt-x if I might add
<the_hydra> geist: care to explain?
<geist> dont know enough about the intel variant
<geist> i know enough to know they screwed it up
<the_hydra> oh :|
<the_hydra> ok maybe you can explain in general how shadow page table
works?
<geist> i dont know enough details to give you a reasonable explanation
<geist> i read the spec on the amd design, but only have heard about
the intel one
<geist> and the intel one is a lot more crappy, from what I hear
<the_hydra> hm ok np
<geist> the amd design completely virtualizes it, so the guest doesn't
have to care about the higher level page tables
<the_hydra> sounds great!
<geist> the intel one doesn't completely hide the physical pages
<geist> so it's very hard to make a perfectly secure system
<the_hydra> so in AMD's, VMM just need initially tell where to store
real and "fake" pgd pointer and the rest will be taken care by CPU?
<geist> that's what i understand, yeah
<mwk> geist: my support. intel VMX sucks.
<the_hydra> mwk: you think so too?
<mwk> the_hydra: AMD SVM system may or may not support Nested Paging,
according to the specification [i don't have any idea if it's
actually supported in RL processors or not, though]
<mwk> if CPU supports nested paging, you have host CR3 and guest CR3
<the_hydra> mwk: oh so the official name for this hardware based MMU
virtualization is called nested paging?
<mwk> guest CR3 is just the virtualised machine's linear-to-physical
translation, so it can be taken directly from guest's virtualised CR3
<mwk> yeah
<mwk> host CR3 provides guest-physical-to-host-physical translation
<mwk> so, you manage host CR3 and let virtualised guest manage guest
CR3
<the_hydra> hm
<mwk> but, if CPU supports SVM and not nested paging, you need shadow
paging tables
<mwk> which provide guest-linear-to-host-physical translation directly
<the_hydra> with this "double" mapping (guest virt to guest phys,
guest phys to host phys) ...do you think it will have impact in
virtualization? performance... latency and so on
<mwk> not much
<mwk> but it'll help
<mwk> so, if you need shadow tables, you do the following:
<mwk> 1. create empty page table
<mwk> 2. run the guest
<mwk> 3. make CR3 read/write, invlpg, and page fault intercepted events
<the_hydra> sorry what is invplg?
<mwk> invlpg.
<the_hydra> *lpg*
<mwk> leave now.
<the_hydra> sorry what is invlpg??
<mwk> uhm... did you read the manual about paging?
<mwk> anyway:
<mwk> INVLPG Invalidate TLB Entry
<mwk> so
<mwk> 4. when VM exits due to nonexistent-page fault, check CR2 and
walk guest and host page tables to see if it actually has some
translation. if so, insert it to shadow page table and restart VM.
otherwise, inject real page fault into VM
<the_hydra> ok got it...wasn't familiar with that invlpg, but I do
understand what invalidate TLB entry is
<mwk> 5. when VM exits due to invlpg, just zero out that entry in
shadow tables
<mwk> 6. when VM exits due to CR3 write [or CR0 or CR4, in fact],
delete all shadow tables and replace with an empty one
<mwk> that should be it
<the_hydra> very detailed...thanks a lot
<mwk> oh, also, when you intercept invlpg and/or CR3 write, you need
to flush real CPU's TLBs... so you need invlpga, or ASID change.
details are in AMD spec.
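
To make mwk's steps easier to follow, here is a minimal Python sketch of the bookkeeping a VMM would do. It is a toy model under loud assumptions, not real hypervisor code: page tables are flat dicts keyed by page number instead of multi-level structures, permission bits are ignored, and the class and method names (ShadowMMU, on_page_fault, on_invlpg, on_cr3_write) are made up for illustration. The real TLB flush mwk mentions at the end (invlpga or an ASID change) is not modelled at all.

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class GuestPageFault(Exception):
    """The fault is genuine, so the VMM would inject it into the guest."""


class ShadowMMU:
    """Hypothetical toy model of shadow paging; all names are illustrative."""

    def __init__(self, guest_pt, host_pt):
        self.guest_pt = guest_pt     # guest-virtual page -> guest-physical page (owned by the guest)
        self.host_pt = host_pt       # guest-physical page -> host-physical page (owned by the VMM)
        self.shadow = {}             # step 1: the shadow table starts empty

    def translate(self, gva):
        """Stand-in for the hardware, which only ever walks the shadow table."""
        vpn, offset = gva >> PAGE_SHIFT, gva & OFFSET_MASK
        if vpn not in self.shadow:
            self.on_page_fault(gva)  # a miss behaves like a VM exit (step 4)
        return (self.shadow[vpn] << PAGE_SHIFT) | offset

    def on_page_fault(self, fault_addr):
        """Step 4: fault_addr plays the role of CR2."""
        vpn = fault_addr >> PAGE_SHIFT
        gpn = self.guest_pt.get(vpn)          # walk the guest's table
        hpn = self.host_pt.get(gpn)           # then the host's table
        if gpn is None or hpn is None:
            raise GuestPageFault(hex(fault_addr))  # no translation: inject a real fault
        self.shadow[vpn] = hpn                # cache guest-virtual -> host-physical

    def on_invlpg(self, gva):
        """Step 5: drop the single shadow entry the guest invalidated."""
        self.shadow.pop(gva >> PAGE_SHIFT, None)

    def on_cr3_write(self, new_guest_pt):
        """Step 6: the guest switched address space, so discard every shadow entry."""
        self.guest_pt = new_guest_pt
        self.shadow = {}


# Usage: guest maps virtual page 0x10 to guest-physical page 0x2,
# and the VMM backs guest-physical page 0x2 with host-physical page 0x80.
mmu = ShadowMMU(guest_pt={0x10: 0x2}, host_pt={0x2: 0x80})
host_addr = mmu.translate(0x10ABC)   # first access faults, then the shadow entry is filled
```

The point of the sketch is that the shadow table is only a cache of the combined guest-virtual to host-physical mapping. It is filled lazily on faults and thrown away whenever the guest does something (invlpg, CR3 write) that could make it stale, which is why those events have to be intercepted.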

2 comments:

opt1lc said...

Assalamualaikum,

*sob, sob, sob*
It's been ages since we last chatted, man!

How are you doing?

Wow, so that was pasted straight from IRC, boss!

Wassalam.

opt1lc said...

Assalamualaikum,

How are you, man?
It's been a long time since we chatted!
*sob, sob, sob*

So that was pasted straight from IRC, right?

Anyway, got to go.
Wishing you success, bro.

Wassalam.