Discussion:
xen+vimage kernel panic
Nathan Friess
2018-08-19 18:50:55 UTC
Hi,

While testing out the new PVH support in a domU (which is running
great!), I discovered a kernel panic related to xen and vimage support
when trying to add an xn interface into a bridge.

I'm running r337024 from svn. Removing vimage (which seems to be turned
on in 12-CURRENT now) allows using the bridge with no panics. As part
of attempting to debug this I enabled vimage in my 11.2 domU and that
also panics in the same code.

I'm not sure if the problem is a xen issue or a vimage issue so I
haven't submitted a PR yet. The kernel output is listed below.

It looks like netfront_backend_changed() calls netfront_send_fake_arp(),
which calls arp_ifinit() on the interface. The first line of the call
stack with arprequest+0x454 corresponds to a call to
ARPSTAT_INC(txrequests) at the end of arprequest, which expands to
VNET_PCPUSTAT_ADD(). I tried to debug further and I got a little lost,
but that's where I figured out that vimage is involved somehow.

Are there any thoughts on why the xn interface would cause a panic there?

Thanks,

Nathan




=======

Steps to reproduce:

# ifconfig bridge create
bridge0
# ifconfig bridge0 addm xn0
(panic...)


======

Kernel output:

xn0: performing interface reset due to feature change
(... lock reversal)
xn0: backend features: feature-sg feature-gso-tcp4


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d15db4
stack pointer = 0x0:0xfffffe0000483840
frame pointer = 0x0:0xfffffe0000483940
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 14 (xenwatch)
[ thread pid 14 tid 100033 ]
Stopped at arprequest+0x454: movq ll+0x7(%rax),%rax

db> bt
Tracing pid 14 tid 100033 td 0xfffff800032f5000
arprequest() at arprequest+0x454/frame 0xfffffe0000483940
arp_ifinit() at arp_ifinit+0x58/frame 0xfffffe0000483980
netfront_backend_changed() at netfront_backend_changed+0x144/frame 0xfffffe0000483a40
xenwatch_thread() at xenwatch_thread+0x182/frame 0xfffffe0000483a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0000483ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000483ab0

======
Marko Zec
2018-08-19 20:48:52 UTC
On Sun, 19 Aug 2018 12:50:55 -0600
Post by Nathan Friess
Are there any thoughts on why the xn interface would cause a panic there?
The xn driver calls arp_ifinit() without setting the vnet context
first. Perhaps the attached patch could help (not even compile
tested...)

Marko
_______________________________________________
https://lists.freebsd.org/mailman/listinfo/freebsd-xen
Roger Pau Monné
2018-08-20 09:49:04 UTC
Post by Marko Zec
The xn driver calls arp_ifinit() without setting the vnet context
first. Perhaps the attached patch could help (not even compile
tested...)
I know nothing about VNET, so is this initialization required now that
VNET is enabled? Is this an existing bug in netfront that was harmless
before VNET was activated?

Can you please file a bug report and attach the patch?

Thanks, Roger.
Nathan Friess
2018-08-27 00:45:00 UTC
Post by Roger Pau Monné
Post by Marko Zec
Post by Nathan Friess
Are there any thoughts on why the xn interface would cause a panic there?
The xn driver calls arp_ifinit() without setting the vnet context
first. Perhaps the attached patch could help (not even compile
tested...)
I know nothing about VNET, so is this initialization required now that
VNET is enabled? Is this an existing bug in netfront that was harmless
before VNET was activated?
Can you please file a bug report and attach the patch?
Hi Roger and everyone,

My apologies for not opening the bug report earlier this week. I tested
the patch this weekend and it indeed does fix the panic in my domUs.

Cheers,

Nathan
