
Re: FreeBSD kernel debugging (CPS210 course project)



In article <3528F401.46B6@cs.duke.edu>, Jeff Chase  <chase@cs.duke.edu> wrote:
>Sorry for the delay on this.  I forgot to click "send" on this last
>week.  I've edited a bit.
>
>Haifeng Yu wrote:
>> 
>> I have compiled the FreeBSD kernel and I get 3 executables (aicasm*,
>> kernel*, genassym*) in my "sys/compile/CLUSTERBOX" directory. What are
>> the executables for? How can I trace the kernel execution? I have not
>> modified anything, so no rebooting is needed. What I want to do is to
>> trace the kernel to see how it works.
>>
>
>Kernel is your bootable kernel image.  The others are used in generating
>the kernel.  Basically, you install a kernel by placing it in the
>appropriate place (preserving the old one), logging into the machine
>with rconsole, and rebooting the machine.

More specifically, genassym is used to create assym.s.  This file
declares some #defines for use in the assembly language portions of
the kernel.  Things like the offset of certain fields in the proc
struct, for example.  Aicasm generates code which I believe is
downloaded by the kernel into the Adaptec SCSI host adapters.

Read the Makefile in your kernel build directory for more information
on how all this fits together.

>There are detailed directions on the ari web (ari/local) for Digital
>Unix systems, but we don't have local directions in place....though the
>basic process (rconsole usage, etiquette, administrative commands) is
>the same.  There is detailed information on the freebsd web site (please
>don't print it out), but they need to be augmented a bit with some local
>rules.  Drew is trying to gather this information in one place for us. 
>More on that later.

I've created http://www.cs.duke.edu/ari/local/freebsd.html which
should be enough to get people started.  It has links to
local build instructions, as well as official sources of FreeBSD
information.

>Tracing kernel execution is a little difficult.  FreeBSD has an OK
>kernel debugger, but it's really intended for postmortem debugging. 
>There's a remote debugger (teledebugger) but I don't think it will let
>you do watchpoints, which is what you need to trace execution.  Drew
>knows more about this than anyone and he is back now, so he may be able
>to add some information.
>
>JC

A little background: FreeBSD has two kernel debuggers.  The first is
called ddb.  This debugger is derived from Mach and is very primitive
and hard to use.  It can be used to obtain a stack trace, list the
current process, view the contents of registers, etc.  You can get
into DDB three ways: automatically at panic time, at boot time by booting
with the -d flag to the kernel, or at any time by sending a break on
the system console.  Read the ddb(4) manual page on a FreeBSD machine
for details.  DDB must be compiled into the kernel; our example
CLUSTERBOX configuration file has the required options set to get DDB
into your kernels.

The more useful kernel debugger is kgdb.  You can read/write values in
a running kernel by doing 'sudo gdb -k -wcore /kernel /dev/mem'.
Postmortem debugging can be done in a similar fashion -- 
'sudo gdb -k /var/crash/{kernel,vmcore}.NN'.  Note that in order to
get any useful information, the kernel image must contain a full
symbol table.  You will find it useful to be in your kernel build
directory when running kgdb.  For the most part, it is the same gdb
that I'm sure many of you are familiar with.

In terms of tracing the execution of a running kernel -- it is
possible to boot directly into ddb and do remote debugging via gdb
over a serial line from a second machine.  This is described in detail
in the "On-line Kernel Debugging Using Remote GDB" section of the
FreeBSD Handbook.  I've never done this on FreeBSD, only on Digital
UNIX.  In my experience, this type of debugging is painfully slow and
should be used only as a last resort.  It is typically useful when you
panic before a dump device has been added -- for example, in a device
driver's initialization code.

You may also find it useful to examine a process using the kernel
debugger.  Using kgdb, you can look at the execution stack of a
process in the kernel after it has made a system call.  For example,
here's how to figure out what a process is sleeping on in FreeBSD.
Assume we're interested in process 220, which appears to be stuck:

<4:10pm>rain/gallatin:TPZ>sudo gdb -k kernel /dev/mem
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (i386-unknown-freebsd), 
Copyright 1996 Free Software Foundation, Inc...
IdlePTD 295000
current pcb at 7257000
#0  mi_switch () at ../../kern/kern_synch.c:628
628             microtime(&runtime);
(kgdb) proc pidhashtbl[220]->lh_first
current pcb at f5a41000
(kgdb) where
#0  mi_switch () at ../../kern/kern_synch.c:628
#1  0xf011f3b5 in tsleep (ident=0xf041cb50, priority=4, 
    wmesg=0xf01b98d4 "vmopar", timo=0) at ../../kern/kern_synch.c:391
#2  0xf01b9a9c in vm_object_page_remove (object=0xf1903680, start=0, end=1540, 
    clean_only=1) at ../../vm/vm_object.c:1261
#3  0xf013a090 in vinvalbuf (vp=0xf1903700, flags=1, cred=0xf18f6d00, 
    p=0xf18b2200, slpflag=0, slptimeo=0) at ../../kern/vfs_subr.c:540
#4  0xf015e278 in nfs_vinvalbuf (vp=0xf1903700, flags=1, cred=0xf18f6d00, 
    p=0xf18b2200, intrflg=1) at ../../nfs/nfs_bio.c:799
#5  0xf015cd60 in nfs_bioread (vp=0xf1903700, uio=0xefbffe48, ioflag=8, 
    cred=0xf18f6d00, getpages=1) at ../../nfs/nfs_bio.c:213
#6  0xf015ca98 in nfs_getpages (ap=0xefbffe84) at ../../nfs/nfs_bio.c:130
#7  0xf01beaa8 in vnode_pager_getpages (object=0xf1903680, m=0xefbfff3c, 
    count=2, reqpage=0) at vnode_if.h:1063
#8  0xf01bd657 in vm_pager_get_pages (object=0xf1903680, m=0xefbfff3c, 
    count=2, reqpage=0) at ../../vm/vm_pager.c:188
#9  0xf01b32f6 in vm_fault (map=0xf18fe900, vaddr=6303744, 
    fault_type=3 '\003', change_wiring=0) at ../../vm/vm_fault.c:426
#10 0xf01ccdcc in trap_pfault (frame=0xefbfffbc, usermode=1)
    at ../../i386/i386/trap.c:633
#11 0xf01cc95b in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = 0, 
      tf_esi = -272640436, tf_ebp = -272640440, tf_isp = -272629788, 
      tf_ebx = -272640432, tf_edx = -272640424, tf_ecx = 0, tf_eax = 0, 
      tf_trapno = 12, tf_err = 6, tf_eip = 4168, tf_cs = 31, 
      tf_eflags = 66054, tf_esp = -272640452, tf_ss = 39})
    at ../../i386/i386/trap.c:239
#12 0x1048 in ?? ()


However, the most effective way to figure out what's going on is to
sit down with the source code for a few hours and manually trace out
the code path you are interested in.  Once you think you have an idea
of what's going on, you can confirm your suspicions via console
printf()s, counters that you can read via kgdb, or even panic().

I hope all this helps,

Drew

-- 
------------------------------------------------------------------------------
Andrew Gallatin, Sr Systems Programmer	http://www.cs.duke.edu/~gallatin
Duke University				Email: gallatin@cs.duke.edu
Department of Computer Science		Phone: (919) 660-6590
