Mon, 08 Jun 2009
ZFS + NFS = Crash :(
I started to experience a crash recently that was triggered when building something in my tinderbox setup. This particular tinderbox is running on ZFS and uses NFS mounts on localhost. The panic and backtrace look like this:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x6dc
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80572e7f
stack pointer = 0x28:0xffffff803e722530
frame pointer = 0x28:0xffffff803e722550
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1030 (nfsd: service)
[thread pid 1030 tid 100140 ]
Stopped at prison_priv_check+0xff: movl 0x6dc(%rsi),%eax
db> bt
Tracing pid 1030 tid 100140 td 0xffffff00029ea000
prison_priv_check() at prison_priv_check+0xff
priv_check_cred() at priv_check_cred+0x4c
secpolicy_vnode_access() at secpolicy_vnode_access+0x28
zfs_zaccess() at zfs_zaccess+0x1d5
zfs_freebsd_access() at zfs_freebsd_access+0xd0
VOP_ACCESS_APV() at VOP_ACCESS_APV+0x44
nfsrv_access() at nfsrv_access+0xf3
nfsrv3_access() at nfsrv3_access+0x386
nfssvc_program() at nfssvc_program+0x1fb
svc_run_internal() at svc_run_internal+0x6d2
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a0e1c, rsp = 0x7fffffffe6d8, rbp = 0x5 ---
db>
Here are the relevant pieces of prison_priv_check():
/*
 * Check with permission for a specific privilege is granted within jail. We
 * have a specific list of accepted privileges; the rest are denied.
 */
int
prison_priv_check(struct ucred *cred, int priv)
{

	if (!jailed(cred))
		return (0);

	switch (priv) {
	...
		/*
		 * Depending on the global setting, allow privilege of
		 * mounting/unmounting file systems.
		 */
	case PRIV_VFS_MOUNT:
	case PRIV_VFS_UNMOUNT:
	case PRIV_VFS_MOUNT_NONUSER:
	case PRIV_VFS_MOUNT_OWNER:
		if (cred->cr_prison->pr_allow & PR_ALLOW_MOUNT)
			return (0);
		else
			return (EPERM);
	...
Loading up the core in kgdb for analysis, it becomes very clear what is going on:
(kgdb) frame 12
#12 0xffffffff80572e7f in prison_priv_check (cred=0xffffff00168f5900, priv=334) at /usr/home/wxs/freebsd/src/head/sys/kern/kern_jail.c:3315
3315 switch (priv) {
(kgdb) p/x *cred
$7 = {cr_ref = 0x1, cr_uid = 0x0, cr_ruid = 0x0, cr_svuid = 0x0,
cr_ngroups = 0x1, cr_groups = {0x0 }, cr_rgid = 0x0,
cr_svgid = 0x0, cr_uidinfo = 0x0, cr_ruidinfo = 0x0, cr_prison = 0x0,
cr_vimage = 0x0, cr_flags = 0x0, cr_pspare = {0x0, 0x0}, cr_label = 0x0,
cr_audit = {ai_auid = 0x0, ai_mask = {am_success = 0x0, am_failure = 0x0},
ai_termid = {at_port = 0x0, at_type = 0x0, at_addr = {0x0, 0x0, 0x0,
0x0}}, ai_asid = 0x0, ai_flags = 0x0}}
(kgdb)
It's clear that cred->cr_prison is bad. But this isn't the real meat of the problem. The first check in prison_priv_check() is to see if we are jailed, and that looks something like this:
/*
 * Return 1 if the passed credential is in a jail, otherwise 0.
 */
int
jailed(struct ucred *cred)
{

	return (cred->cr_prison != &prison0);
}
Up until fairly recently this function used to contain:
return (cred->cr_prison != NULL);
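So, because cred->cr_prison is NULL in our case, jailed() now returns 1 (NULL is not &prison0), and the !jailed(cred) check at the top of prison_priv_check() evaluates to false when it should evaluate to true. Instead of returning early we fall into the switch and dereference cred->cr_prison. That matches the panic: the fault address 0x6dc is exactly the displacement in the faulting instruction, movl 0x6dc(%rsi),%eax, so the base register was 0. So now we know why we are crashing (a NULL pointer dereference), but we still don't know what is causing cr_prison to be NULL.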
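The difference between the two versions of the check is easy to reproduce outside the kernel. What follows is a minimal userland sketch, not kernel code: the structs are heavily trimmed stand-ins for the real ones, and jailed_old(), jailed_new() and reaches_deref() are hypothetical names I made up for illustration.

```c
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures (assumption: real ones are far larger). */
struct prison {
	unsigned int pr_allow;
};

struct ucred {
	struct prison *cr_prison;
};

/* Stand-in for the kernel's global prison0, the prison of unjailed processes. */
static struct prison prison0;

/* Older check: a NULL cr_prison means "not jailed". */
static int
jailed_old(const struct ucred *cred)
{

	return (cred->cr_prison != NULL);
}

/* Newer check: only a pointer to prison0 means "not jailed". */
static int
jailed_new(const struct ucred *cred)
{

	return (cred->cr_prison != &prison0);
}

/*
 * Mirrors the early return at the top of prison_priv_check(): returns 0 if
 * we would bail out early, 1 if we would carry on into the switch and
 * eventually dereference cred->cr_prison.
 */
static int
reaches_deref(const struct ucred *cred, int (*jailed_fn)(const struct ucred *))
{

	if (!jailed_fn(cred))
		return (0);
	return (1);
}
```

For a credential whose cr_prison is NULL, jailed_old() returns 0 and jailed_new() returns 1, so reaches_deref() with the new check reports that we would go on to touch cr_prison. A credential pointing at prison0 still takes the early return under the new check.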
Credentials like this are derived from a very small number of places. The reason this one is wrong is that the RPC code in the kernel doesn't know which credentials to assign when it handles the request, so cr_prison is never filled in. Luckily a workaround has been put in place while a proper fix is being worked on.
posted at: 12:48 | tags: freebsd, zfs, nfs