Debian Bug report logs - #892105
linux-image-4.9.0-6-amd64: i40e driver still unstable
Reported by: Harald Wilhelmi <harald.wilhelmi@tngtech.com>
Date: Mon, 5 Mar 2018 16:18:02 UTC
Severity: normal
Found in versions linux/4.9.258-1, linux/4.9.272-1, linux/4.9.82-1+deb9u3
Fixed in version linux/4.10~rc6-1~exp1
Done: Salvatore Bonaccorso <carnil@debian.org>
Bug is archived. No further changes may be made.
Report forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Mon, 05 Mar 2018 16:18:04 GMT).
Acknowledgement sent
to Harald Wilhelmi <harald.wilhelmi@tngtech.com>:
New Bug report received and forwarded. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Mon, 05 Mar 2018 16:18:04 GMT).
Message #5 received at submit@bugs.debian.org:
Package: src:linux
Version: 4.9.82-1+deb9u3
Severity: normal
Dear Maintainer,
After rebooting the system with a new kernel we see various problems
with the i40e driver. The problems usually start to occur within a day
or so under load. The last time, we got the error messages shown below
while the NIC seemed to stop processing some(?) packets. tcpdump
suggested that the machine received packets but failed either to
forward them or to send out the responses. With older kernels we have
also seen panics.
Our usual solution is to install an i40e driver from Intel (version
1.6.42 works nicely for us). Please note that this is the only driver
tainting our kernel - as a workaround.
-- Package-specific info:
** Version:
Linux version 4.9.0-6-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.82-1+deb9u2 (2018-02-21)
** Command line:
BOOT_IMAGE=/vmlinuz-4.9.0-6-amd64 root=/dev/mapper/vg--data-root ro quiet
** Tainted: O (4096)
* Out-of-tree module has been loaded.
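For reference, the taint value 4096 above is bit 12 of the kernel's taint bitmask, which the kernel prints as the letter O. The following is a minimal sketch of decoding such a value. The table covers only the lower taint bits and is not exhaustive, and the function name decode_taint is made up for illustration.

```python
# Partial table of kernel taint bits: bit index -> (letter, meaning).
# Assumption: only the lower bits are listed; newer kernels define more.
TAINT_FLAGS = {
    0: ("P", "proprietary module has been loaded"),
    1: ("F", "module has been forcibly loaded"),
    3: ("R", "module has been forcibly unloaded"),
    4: ("M", "machine check exception has occurred"),
    5: ("B", "bad page reference or unexpected page flags"),
    7: ("D", "kernel has oopsed before"),
    9: ("W", "kernel has issued a warning"),
    10: ("C", "staging driver has been loaded"),
    12: ("O", "out-of-tree module has been loaded"),
    13: ("E", "unsigned module has been loaded"),
}


def decode_taint(value):
    """Return the (letter, meaning) pairs for every listed bit set in value."""
    return [flag for bit, flag in sorted(TAINT_FLAGS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # 4096 is the value shown in the "** Tainted:" line of this report.
    for letter, meaning in decode_taint(4096):
        print(f"{letter}: {meaning}")
```

On a running system the same number can be read from /proc/sys/kernel/tainted.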
** Kernel log:
2018-02-28T18:10:23.090521+01:00 fire13a1 kernel: [ 2.951283] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.6.16-k
2018-02-28T18:10:23.090522+01:00 fire13a1 kernel: [ 2.951283] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
2018-02-28T18:10:23.090544+01:00 fire13a1 kernel: [ 2.967616] i40e 0000:01:00.0: fw 4.40.35115 api 1.4 nvm 4.53 0x8000206e 0.0.0
2018-02-28T18:10:23.090635+01:00 fire13a1 kernel: [ 3.227930] i40e 0000:01:00.0: MAC address: 3c:fd:fe:9e:51:80
2018-02-28T18:10:23.090635+01:00 fire13a1 kernel: [ 3.231934] i40e 0000:01:00.0: SAN MAC: 3c:fd:fe:9e:51:82
2018-02-28T18:10:23.090646+01:00 fire13a1 kernel: [ 3.402043] i40e 0000:01:00.0: Added LAN device PF0 bus=0x00 func=0x00
2018-02-28T18:10:23.090647+01:00 fire13a1 kernel: [ 3.402053] i40e 0000:01:00.0: PCI-Express: Speed 8.0GT/s Width x8
2018-02-28T18:10:23.090652+01:00 fire13a1 kernel: [ 3.431852] i40e 0000:01:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 4 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
2018-02-28T18:10:23.090654+01:00 fire13a1 kernel: [ 3.443530] i40e 0000:01:00.1: fw 4.40.35115 api 1.4 nvm 4.53 0x8000206e 0.0.0
2018-02-28T18:10:23.090669+01:00 fire13a1 kernel: [ 3.720312] i40e 0000:01:00.1: MAC address: 3c:fd:fe:9e:51:81
2018-02-28T18:10:23.090669+01:00 fire13a1 kernel: [ 3.724518] i40e 0000:01:00.1: SAN MAC: 3c:fd:fe:9e:51:83
2018-02-28T18:10:23.090693+01:00 fire13a1 kernel: [ 3.887825] i40e 0000:01:00.1: Added LAN device PF1 bus=0x00 func=0x01
2018-02-28T18:10:23.090693+01:00 fire13a1 kernel: [ 3.887833] i40e 0000:01:00.1: PCI-Express: Speed 8.0GT/s Width x8
2018-02-28T18:10:23.090696+01:00 fire13a1 kernel: [ 3.917891] i40e 0000:01:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 4 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
2018-02-28T18:10:23.090697+01:00 fire13a1 kernel: [ 3.918420] i40e 0000:01:00.1 eth3: renamed from eth2
2018-02-28T18:10:23.090698+01:00 fire13a1 kernel: [ 3.940100] i40e 0000:01:00.0 eth2: renamed from eth1
2018-02-28T18:10:23.411181+01:00 fire13a1 kernel: [ 10.794451] i40e 0000:01:00.0 eth2: already using mac address 3c:fd:fe:9e:51:80
2018-02-28T18:10:23.415575+01:00 fire13a1 kernel: [ 10.797168] i40e 0000:01:00.0 eth2: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
2018-02-28T18:10:23.415583+01:00 fire13a1 kernel: [ 10.799073] i40e 0000:01:00.1 eth3: set new mac address 3c:fd:fe:9e:51:80
2018-02-28T18:10:23.421269+01:00 fire13a1 kernel: [ 10.803806] i40e 0000:01:00.1 eth3: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
...
2018-03-01T11:23:49.484086+01:00 fire13a1 kernel: [62013.858296] i40e 0000:01:00.0: TX driver issue detected, PF reset issued
2018-03-01T11:23:50.100101+01:00 fire13a1 kernel: [62014.475520] i40e 0000:01:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
2018-03-01T11:23:52.020101+01:00 fire13a1 kernel: [62016.393751] i40e 0000:01:00.0: TX driver issue detected, PF reset issued
2018-03-01T11:23:53.088122+01:00 fire13a1 kernel: [62017.461657] i40e 0000:01:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
2018-03-01T11:23:54.624095+01:00 fire13a1 kernel: [62018.999104] i40e 0000:01:00.0: TX driver issue detected, PF reset issued
2018-03-01T11:23:55.100100+01:00 fire13a1 kernel: [62019.473449] i40e 0000:01:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
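The repeating "PF reset issued" messages in the log above can be tallied per PCI device to see how often a given port resets. This is a minimal sketch that only matches the exact message text seen in this report. The function name count_pf_resets and the SAMPLE list are illustrative, and other driver versions may word the message differently.

```python
import re
from collections import Counter

# Matches the reset message exactly as it appears in this report's kernel log.
RESET_RE = re.compile(r"i40e (\S+): TX driver issue detected, PF reset issued")


def count_pf_resets(lines):
    """Count PF reset messages per PCI device address in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


# Log excerpts copied from the report (timestamps and host name trimmed).
SAMPLE = [
    "[62013.858296] i40e 0000:01:00.0: TX driver issue detected, PF reset issued",
    "[62014.475520] i40e 0000:01:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on",
    "[62016.393751] i40e 0000:01:00.0: TX driver issue detected, PF reset issued",
    "[62018.999104] i40e 0000:01:00.0: TX driver issue detected, PF reset issued",
]

if __name__ == "__main__":
    for device, resets in sorted(count_pf_resets(SAMPLE).items()):
        print(f"{device}: {resets} PF reset(s)")
```

In practice the lines would come from the system log (for example, an open file object passed as lines) rather than from a hard-coded list.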
** Model information
sys_vendor: Thomas-Krenn.AG
product_name: X10SLH-F/X10SLM+-F
product_version: 0123456789
chassis_vendor: Supermicro
chassis_version: 0123456789
bios_vendor: American Megatrends Inc.
bios_version: 3.0
board_vendor: Supermicro
board_name: X10SLH-F/X10SLM+-F
board_version: 1.01
** Loaded modules:
fuse
btrfs
ufs
qnx4
hfsplus
hfs
minix
ntfs
vfat
msdos
fat
jfs
xfs
drbg
ansi_cprng
authenc
echainiv
xfrm6_mode_tunnel
xfrm4_mode_tunnel
tun
xt_nat
cls_u32
cls_fw
sch_sfq
sch_htb
ip6table_filter
ip6_tables
xt_recent
ip_set_hash_ip
xt_comment
xt_tcpmss
iptable_nat
nf_nat_ipv4
ipt_REJECT
nf_reject_ipv4
xt_addrtype
bridge
xt_policy
xt_mark
iptable_mangle
xt_TCPMSS
xt_tcpudp
xt_CT
iptable_raw
xt_multiport
nf_conntrack_ipv4
nf_defrag_ipv4
twofish_generic
twofish_avx_x86_64
twofish_x86_64_3way
twofish_x86_64
xt_conntrack
xt_NFLOG
nfnetlink_log
twofish_common
xt_LOG
serpent_avx2
serpent_avx_x86_64
serpent_sse2_x86_64
serpent_generic
nf_log_ipv4
nf_log_common
nf_nat_tftp
blowfish_generic
blowfish_x86_64
nf_nat_snmp_basic
nf_conntrack_snmp
nf_nat_sip
nf_nat_pptp
nf_nat_proto_gre
blowfish_common
nf_nat_irc
cast5_avx_x86_64
cast5_generic
cast_common
nf_nat_h323
nf_nat_ftp
nf_nat_amanda
ts_kmp
ctr
nf_conntrack_amanda
nf_nat
nf_conntrack_sane
des_generic
nf_conntrack_tftp
cbc
nf_conntrack_sip
algif_skcipher
nf_conntrack_proto_udplite
camellia_generic
nf_conntrack_proto_sctp
camellia_aesni_avx2
nf_conntrack_pptp
nf_conntrack_proto_gre
nf_conntrack_netbios_ns
nf_conntrack_broadcast
camellia_aesni_avx_x86_64
camellia_x86_64
xts
xcbc
nf_conntrack_irc
sha512_ssse3
sha512_generic
nf_conntrack_h323
md4
algif_hash
nf_conntrack_ftp
af_alg
xfrm_user
xfrm4_tunnel
tunnel4
ipcomp
xfrm_ipcomp
esp4
xt_set
ah4
ip_set
af_key
xfrm_algo
iptable_filter
8021q
garp
mrp
stp
llc
nf_conntrack_netlink
nf_conntrack
nfnetlink
bonding
intel_rapl
x86_pkg_temp_thermal
intel_powerclamp
coretemp
kvm_intel
kvm
irqbypass
crct10dif_pclmul
crc32_pclmul
ghash_clmulni_intel
iTCO_wdt
iTCO_vendor_support
intel_cstate
evdev
intel_uncore
joydev
sg
pcspkr
intel_rapl_perf
ast
ttm
drm_kms_helper
drm
button
intel_pch_thermal
ie31200_edac
mei_me
mei
lpc_ich
mfd_core
video
shpchp
edac_core
acpi_pad
ipmi_si
ipmi_devintf
ipmi_msghandler
ip_tables
x_tables
autofs4
ext4
crc16
jbd2
fscrypto
ecb
mbcache
raid10
raid456
async_raid6_recov
async_memcpy
async_pq
async_xor
async_tx
xor
raid6_pq
libcrc32c
crc32c_generic
raid0
multipath
linear
dm_mod
raid1
md_mod
hid_generic
usbhid
hid
sd_mod
ahci
libahci
libata
scsi_mod
crc32c_intel
i40e(O)
aesni_intel
aes_x86_64
glue_helper
lrw
gf128mul
ablk_helper
cryptd
i2c_i801
i2c_smbus
xhci_pci
ehci_pci
igb
xhci_hcd
ehci_hcd
i2c_algo_bit
dca
ptp
usbcore
usb_common
pps_core
fan
thermal
** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller [8086:0c08] (rev 06)
Subsystem: Super Micro Computer Inc Xeon E3-1200 v3 Processor DRAM Controller [15d9:0803]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: <access denied>
Kernel driver in use: ie31200_edac
Kernel modules: ie31200_edac
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 25
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: f7300000-f73fffff
Prefetchable memory behind bridge: 00000000e0000000-00000000e1bfffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05) (prog-if 30 [XHCI])
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family USB xHCI [15d9:0803]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 26
Region 0: Memory at f7400000 (64-bit, non-prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family MEI Controller [15d9:0803]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 0: Memory at f7417000 (64-bit, non-prefetchable) [size=16]
Capabilities: <access denied>
Kernel modules: mei_me
00:16.1 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #2 [8086:8c3b] (rev 04)
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family MEI Controller [15d9:0803]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 10
Region 0: Memory at f7416000 (64-bit, non-prefetchable) [size=16]
Capabilities: <access denied>
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05) (prog-if 20 [EHCI])
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family USB EHCI [15d9:0803]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f7414000 (32-bit, non-prefetchable) [size=1K]
Capabilities: <access denied>
Kernel driver in use: ehci-pci
Kernel modules: ehci_pci
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8086:8c10] (rev d5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: f6000000-f70fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp
00:1c.2 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 [8086:8c14] (rev d5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin C routed to IRQ 18
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: f7200000-f72fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp
00:1c.3 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 [8086:8c16] (rev d5) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin D routed to IRQ 19
Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: f7100000-f71fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel driver in use: pcieport
Kernel modules: shpchp
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05) (prog-if 20 [EHCI])
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family USB EHCI [15d9:0803]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 22
Region 0: Memory at f7413000 (32-bit, non-prefetchable) [size=1K]
Capabilities: <access denied>
Kernel driver in use: ehci-pci
Kernel modules: ehci_pci
00:1f.0 ISA bridge [0601]: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller [8086:8c56] (rev 05)
Subsystem: Super Micro Computer Inc C226 Series Chipset Family Server Advanced SKU LPC Controller [15d9:0803]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: <access denied>
Kernel driver in use: lpc_ich
Kernel modules: lpc_ich
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c02] (rev 05) (prog-if 01 [AHCI 1.0])
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [15d9:0803]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 27
Region 0: I/O ports at f050 [size=8]
Region 1: I/O ports at f040 [size=4]
Region 2: I/O ports at f030 [size=8]
Region 3: I/O ports at f020 [size=4]
Region 4: I/O ports at f000 [size=32]
Region 5: Memory at f7412000 (32-bit, non-prefetchable) [size=2K]
Capabilities: <access denied>
Kernel driver in use: ahci
Kernel modules: ahci
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 05)
Subsystem: Super Micro Computer Inc 8 Series/C220 Series Chipset Family SMBus Controller [15d9:0803]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin C routed to IRQ 18
Region 0: Memory at f7411000 (64-bit, non-prefetchable) [size=256]
Region 4: I/O ports at 0580 [size=32]
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
00:1f.6 Signal processing controller [1180]: Intel Corporation 8 Series Chipset Family Thermal Management Controller [8086:8c24] (rev 05)
Subsystem: Super Micro Computer Inc 8 Series Chipset Family Thermal Management Controller [15d9:0803]
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin C routed to IRQ 18
Region 0: Memory at f7410000 (64-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
Kernel modules: intel_pch_thermal
01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)
Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-2 [8086:0007]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at e0000000 (64-bit, prefetchable) [size=8M]
Region 3: Memory at e1800000 (64-bit, prefetchable) [size=32K]
Expansion ROM at f7380000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: i40e
Kernel modules: i40e
01:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 01)
Subsystem: Intel Corporation Ethernet Converged Network Adapter X710 [8086:0000]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at e0800000 (64-bit, prefetchable) [size=8M]
Region 3: Memory at e1808000 (64-bit, prefetchable) [size=32K]
Expansion ROM at f7300000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: i40e
Kernel modules: i40e
02:00.0 PCI bridge [0604]: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge [1a03:1150] (rev 03) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: f6000000-f70fffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: <access denied>
Kernel modules: shpchp
03:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 30) (prog-if 00 [VGA controller])
Subsystem: Super Micro Computer Inc ASPEED Graphics Family [15d9:0803]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at f7000000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at e000 [size=128]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: ast
Kernel modules: ast
04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection [15d9:1533]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at f7200000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at d000 [size=32]
Region 3: Memory at f7280000 (32-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: igb
Kernel modules: igb
05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
Subsystem: Super Micro Computer Inc I210 Gigabit Network Connection [15d9:1533]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at f7100000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at c000 [size=32]
Region 3: Memory at f7180000 (32-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: igb
Kernel modules: igb
** USB devices:
Bus 002 Device 002: ID 8087:8000 Intel Corp.
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:8008 Intel Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 004: ID 0557:2419 ATEN International Co., Ltd
Bus 003 Device 002: ID 0557:7000 ATEN International Co., Ltd Hub
Bus 003 Device 003: ID 10d5:55d2 Uni Class Technology Co., Ltd
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
-- System Information:
Debian Release: 9.3
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 4.9.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages linux-image-4.9.0-6-amd64 depends on:
ii initramfs-tools [linux-initramfs-tool] 0.130
ii kmod 23-2
ii linux-base 4.5
Versions of packages linux-image-4.9.0-6-amd64 recommends:
pn firmware-linux-free <none>
pn irqbalance <none>
Versions of packages linux-image-4.9.0-6-amd64 suggests:
pn debian-kernel-handbook <none>
ii grub-pc 2.02~beta3-5
pn linux-doc-4.9 <none>
Versions of packages linux-image-4.9.0-6-amd64 is related to:
pn firmware-amd-graphics <none>
pn firmware-atheros <none>
pn firmware-bnx2 <none>
pn firmware-bnx2x <none>
pn firmware-brcm80211 <none>
pn firmware-cavium <none>
pn firmware-intel-sound <none>
pn firmware-intelwimax <none>
pn firmware-ipw2x00 <none>
pn firmware-ivtv <none>
pn firmware-iwlwifi <none>
pn firmware-libertas <none>
pn firmware-linux-nonfree <none>
pn firmware-misc-nonfree <none>
pn firmware-myricom <none>
pn firmware-netxen <none>
pn firmware-qlogic <none>
pn firmware-realtek <none>
pn firmware-samsung <none>
pn firmware-siano <none>
pn firmware-ti-connectivity <none>
pn xen-hypervisor <none>
-- no debconf information
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Mon, 05 Mar 2018 17:39:02 GMT).
Acknowledgement sent
to "Raymond Burkholder" <ray@oneunified.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Mon, 05 Mar 2018 17:39:02 GMT).
Message #10 received at 892105@bugs.debian.org:
> Our usual solution is to install an i40e driver from Intel (version
> 1.6.42 works nicely for us). Please note that this is the only driver
> tainting our kernel - as a workaround.
I am in a similar circumstance. After a few days of moderate operation, it
appears packets can be transmitted, but tcpdump does not see the ingress
packet. The IRQs may have issues?
I did some troubleshooting with a network appliance vendor in whose devices
these cards are installed.
Their advice is to use a more current kernel, and to use Intel's drivers
from their e1000 SourceForge site. The i40e driver in a more current kernel
may operate better. Debian Stretch has 4.14 in stretch-backports.
I see many, many commits to the i40e module between the 4.9 and 4.14 kernel
versions. Maybe the issue has been solved in a more recent kernel/module
incarnation. And/or use the Intel (tainted) module/driver. I am pursuing
both: install the stretch-backports kernel (which provides additional
iproute2 functions as a bonus), plus install the separate Intel i40e driver.
I am testing my auto-build scripts to suit the new requirements.
What the real problem is with the driver, I do not know. The above is my
version of a workaround.
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Fri, 29 Jun 2018 07:39:06 GMT).
Acknowledgement sent
to "Jörg Kost" <jk@ip-clear.de>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Fri, 29 Jun 2018 07:39:06 GMT).
Message #15 received at 892105@bugs.debian.org:
I ran into a similar issue with i40e. I found that the i40e driver
included in linux-image-4.9.0-6-amd64 randomly drops ARP requests unless the
interface is in promiscuous mode (e.g. while tcpdump is running).
Therefore, if the switch / router expires or invalidates its
ARP cache, the system may not be reachable until the next ARP request
happens to be accepted and answered.
Workaround:
- Set up a static ARP entry on the switch / router
Solution:
- Compile the current i40e driver from the Intel website, where the
issue does not occur; in my case: 2.4.10
I did not test stretch-backports so far.
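Because the symptom described above depends on whether the interface is in promiscuous mode, it can help to check that state before concluding that ARP is being handled correctly. The following is a minimal sketch, assuming the usual Linux sysfs layout in which /sys/class/net/<interface>/flags holds the interface flags as a hexadecimal number. The function names and the interface name eth2 (taken from the original report) are illustrative.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # promiscuous-mode flag bit from <linux/if.h>


def flags_indicate_promisc(flags_text):
    """Return True if a hex flags string such as '0x1103' has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def interface_is_promisc(ifname):
    """Read the flags of a network interface from sysfs and test IFF_PROMISC."""
    # Assumption: sysfs is mounted at /sys and the interface exists.
    return flags_indicate_promisc(Path(f"/sys/class/net/{ifname}/flags").read_text())


if __name__ == "__main__":
    name = "eth2"  # interface name from the original report; adjust as needed
    try:
        state = "on" if interface_is_promisc(name) else "off"
        print(f"{name}: promiscuous mode is {state}")
    except OSError as exc:
        print(f"{name}: could not read flags ({exc})")
```

If the problem disappears only while this reports "on", that is consistent with the behaviour described in this message, though it does not prove the same root cause.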
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Wed, 09 Jan 2019 21:42:11 GMT).
Acknowledgement sent
to Paul Szabo <paul.szabo@sydney.edu.au>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Wed, 09 Jan 2019 21:42:11 GMT).
Message #20 received at 892105@bugs.debian.org:
I use kernel 4.9.130 (my own build from current "stretch" sources,
package linux-source-4.9 version 4.9.130-2), and on my new machines
with i40e devices, I observe similar, occasional issues:
Jan 9 07:30:06 viale kernel: [428469.260531] i40e 0000:19:00.1: cleared PE_CRITERR
Jan 9 07:30:06 viale kernel: [428469.260639] i40e 0000:19:00.1: TX driver issue detected, PF reset issued
Jan 9 08:47:06 siv kernel: [422993.009196] i40e 0000:19:00.1: cleared PE_CRITERR
Jan 9 08:47:06 siv kernel: [422993.013535] i40e 0000:19:00.1 eth1: NIC Link is Down
Jan 9 08:47:16 siv kernel: [423002.131389] i40e 0000:19:00.1 eth1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
Curiously, each of those machines only ever shows one type of error
(never the error seen on the other machine), and both complain only
about eth1, never about eth0 (though eth0 is also connected, with similar
traffic volumes).
Following the hints in this bug report, I will try the Intel i40e
driver, from (either)
https://downloadcenter.intel.com/download/24411/
https://sourceforge.net/projects/e1000/files/i40e%20stable/
Cheers, Paul
--
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Fri, 03 May 2019 10:42:06 GMT).
Acknowledgement sent
to Dominik Dausch <dausch@megaspace.de>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Fri, 03 May 2019 10:42:06 GMT).
Message #25 received at 892105@bugs.debian.org:
On Fri, 29 Jun 2018 09:33:56 +0200 "Jörg Kost" <jk@ip-clear.de> wrote:
> I ran into a similar issue with i40e. I found that the i40e driver
> included in linux-image-4.9.0-6-amd64 randomly drops ARP requests unless the
> interface is in promiscuous mode (e.g. while tcpdump is running).
> Therefore, if the switch / router expires or invalidates its
> ARP cache, the system may not be reachable until the next ARP request
> happens to be accepted and answered.
We have the same problem here with the newest Stretch 4.9 kernel. When using a secondary IP address the i40e driver starts to randomly ignore ARP requests and makes the system unreachable for the rest of the network.
Upgrading to a backport kernel (4.19) with newer 4.19 driver fixes the problem.
Is there a chance to backport a fix into 4.9 stable? i40e is a very widely used driver for Intel NICs.
Greetings,
Dominik
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Sun, 05 May 2019 08:18:03 GMT).
Acknowledgement sent
to Salvatore Bonaccorso <carnil@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Sun, 05 May 2019 08:18:03 GMT).
Message #30 received at 892105@bugs.debian.org:
Hi,
On Fri, May 03, 2019 at 10:38:16AM +0000, Dominik Dausch wrote:
> On Fri, 29 Jun 2018 09:33:56 +0200 "Jörg Kost" <jk@ip-clear.de> wrote:
> > I ran into a similar issue with i40e. I found that the i40e driver
> > included in linux-image-4.9.0-6-amd64 randomly drops ARP requests unless the
> > interface is in promiscuous mode (e.g. while tcpdump is running).
> > Therefore, if the switch / router expires or invalidates its
> > ARP cache, the system may not be reachable until the next ARP request
> > happens to be accepted and answered.
>
> We have the same problem here with the newest Stretch 4.9 kernel.
> When using a secondary IP address the i40e driver starts to randomly
> ignore ARP requests and makes the system unreachable for the rest of
> the network.
> Upgrading to a backport kernel (4.19) with newer 4.19 driver fixes
> the problem.
>
> Is there a chance to backport a fix into 4.9 stable? i40e is a very
> widely used driver for Intel NICs.
Could any of the affected users, if feasible, help to
isolate the issue?
https://kernel-team.pages.debian.net/kernel-handbook/ch-bugs.html#s9.1.5
and
https://kernel-team.pages.debian.net/kernel-handbook/ch-bugs.html#s9.2.1
may help in isolating the fixing change.
Regards,
Salvatore
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Fri, 18 Sep 2020 13:51:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Mikhail Krylatykh <mikhail.krylatykh@skyeng.ru>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Fri, 18 Sep 2020 13:51:03 GMT) (full text, mbox, link).
Message #35 received at 892105@bugs.debian.org (full text, mbox, reply):
> We have the same problem here with the newest Stretch 4.9 kernel. When using a secondary IP address the i40e driver starts to randomly ignore ARP requests and makes the system unreachable for the rest of the network.
> Upgrading to a backport kernel (4.19) with newer 4.19 driver fixes the problem.
Hi,
We've faced the same issue with Debian Stretch. The i40e driver, version 1.6.16, silently dropped ARP packets unless `promisc on` was set on the target interface. Updating to linux-image-4.19 from the backports repository (where the i40e driver is version 2.3.2) fixed the issue.
Thanks for comments and suggested workaround.
---
WBR,
Mikhail
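The workaround mentioned in this thread (keeping the interface in promiscuous mode) could be sketched roughly as follows. This is a minimal sketch and not a fix for the underlying driver problem: the helper names `driver_version` and `promisc_cmd` are hypothetical, and the interface name in the usage comment is only an example. The second helper prints the command instead of running it, so it can be reviewed before being executed as root.

```sh
#!/bin/sh
# Hypothetical helper: extract the "version:" field from the output of
# `ethtool -i IFACE`, read on stdin.
driver_version() {
    awk -F': *' '$1 == "version" { print $2 }'
}

# Hypothetical helper: print (do not run) the command that puts an
# interface into promiscuous mode.
promisc_cmd() {
    printf 'ip link set dev %s promisc on\n' "$1"
}

# Example usage as root (the interface name is an assumption):
#   ethtool -i enp96s0f0 | driver_version
#   eval "$(promisc_cmd enp96s0f0)"
```

According to the reports above, driver version 1.6.16 (kernel 4.9) was affected and version 2.3.2 (kernel 4.19 from backports) was not; other versions are not covered by this thread.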
Reply sent
to carnil@debian.org:
You have taken responsibility.
(Sun, 23 May 2021 12:30:12 GMT) (full text, mbox, link).
Notification sent
to Harald Wilhelmi <harald.wilhelmi@tngtech.com>:
Bug acknowledged by developer.
(Sun, 23 May 2021 12:30:12 GMT) (full text, mbox, link).
Message #40 received at 892105-done@bugs.debian.org (full text, mbox, reply):
Hi
This bug was filed for a very old kernel or the bug is old itself
without resolution.
If you can reproduce it with
- the current version in unstable/testing
- the latest kernel from backports
please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.
Regards,
Salvatore
Message sent on
to Harald Wilhelmi <harald.wilhelmi@tngtech.com>:
Bug#892105.
(Sun, 23 May 2021 12:30:13 GMT) (full text, mbox, link).
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Mon, 21 Jun 2021 07:25:11 GMT) (full text, mbox, link).
Bug unarchived.
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:51:08 GMT) (full text, mbox, link).
Bug reopened
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:03 GMT) (full text, mbox, link).
Marked as found in versions linux/4.9.258-1.
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:05 GMT) (full text, mbox, link).
Marked as found in versions linux/4.9.272-1.
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:06 GMT) (full text, mbox, link).
Marked as fixed in versions linux/4.10~rc6-1~exp1.
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:07 GMT) (full text, mbox, link).
Marked Bug as done
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:07 GMT) (full text, mbox, link).
Notification sent
to Harald Wilhelmi <harald.wilhelmi@tngtech.com>:
Bug acknowledged by developer.
(Tue, 22 Jun 2021 18:54:08 GMT) (full text, mbox, link).
Bug reopened
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:09 GMT) (full text, mbox, link).
No longer marked as fixed in versions linux/4.10~rc6-1~exp1.
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:09 GMT) (full text, mbox, link).
Marked as fixed in versions linux/4.10~rc6-1~exp1.
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:12 GMT) (full text, mbox, link).
Marked Bug as done
Request was from Salvatore Bonaccorso <carnil@debian.org>
to control@bugs.debian.org.
(Tue, 22 Jun 2021 18:54:12 GMT) (full text, mbox, link).
Notification sent
to Harald Wilhelmi <harald.wilhelmi@tngtech.com>:
Bug acknowledged by developer.
(Tue, 22 Jun 2021 18:54:12 GMT) (full text, mbox, link).
Message sent on
to Harald Wilhelmi <harald.wilhelmi@tngtech.com>:
Bug#892105.
(Tue, 22 Jun 2021 18:54:14 GMT) (full text, mbox, link).
Message #72 received at 892105-submitter@bugs.debian.org (full text, mbox, reply):
reopen 892105
found 892105 4.9.258-1
found 892105 4.9.272-1
close 892105 4.10~rc6-1~exp1
thanks
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Tue, 22 Jun 2021 18:54:16 GMT) (full text, mbox, link).
Acknowledgement sent
to Philipp Hahn <salvatore.bonaccorso@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Tue, 22 Jun 2021 18:54:16 GMT) (full text, mbox, link).
Message #77 received at 892105@bugs.debian.org (full text, mbox, reply):
Hello,
I request the following patch from v4.10-rc1 to get cherry-picked into
"stable/linux-4.9.y":
> commit f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> Author: Alexander Duyck <alexander.h.duyck@intel.com>
> Date: Tue Oct 25 16:08:46 2016 -0700
>
> i40e: Be much more verbose about what we can and cannot offload
>
> This change makes it so that we are much more robust about defining what we
> can and cannot offload. Previously we were just checking for the L4 tunnel
> header length, however there are other fields we should be verifying as
> there are multiple scenarios in which we cannot perform hardware offloads.
>
> In addition the device only supports GSO as long as the MSS is 64 or
> greater. We were not checking this so an MSS less than that was resulting
> in Tx hangs.
>
> Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
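The MSS condition stated in the commit message can be illustrated with a tiny shell model. It only restates the rule "GSO is supported as long as the MSS is 64 or greater"; it is not the driver's code, which is C inside the kernel. The names `MIN_GSO_MSS` and `gso_offload_ok` are made up for this illustration.

```sh
#!/bin/sh
# Illustrative model only: the smallest MSS for which, per the commit
# message, the device supports GSO.
MIN_GSO_MSS=64

# gso_offload_ok MSS: succeeds if segmentation with this MSS may be
# offloaded to the device, fails if it must be handled elsewhere.
gso_offload_ok() {
    [ "$1" -ge "$MIN_GSO_MSS" ]
}

# Example:
#   gso_offload_ok 1448 && echo "offload allowed"
#   gso_offload_ok 48   || echo "offload not allowed"
```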
Debian has this old bug
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892105> reported
against 4.9.82. It still exists in the current kernel 4.9.258 of Debian's
oldstable 9 "Stretch", and also in the latest stable 4.9.273.
Our environment
===============
- KVM server
- dual port i40e
- classic bridge with enp96s0f0
- VM attached to bridge via veth
- no VLANs
- no MacVLan
> # ethtool -i enp96s0f0
> driver: i40e
> version: 1.6.16-k
> firmware-version: 3.33 0x80000e48 1.1876.0
> expansion-rom-version:
> bus-info: 0000:60:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> # lspci -s 0000:60:00.0
> 60:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
Analysis
========
As soon as we start one of our "Ubuntu" images the bridge stops
receiving unicast packets for *all* VMs on that bridge.
- we still see outgoing traffic leaving the host, e.g. ARP requests
- "tcpdump -i enp96s0f0" shows no incoming unicast traffic, e.g. no ARP
response
- broadcast traffic passes the bridge
- VMs on the same bridge can communicate with each other
Most often I see the following error message after doing `dmesg -n 8`:
> [ +9,376367] i40e 0000:60:00.0: cleared PE_CRITERR
> [ +0,000252] i40e 0000:60:00.0: TX driver issue detected, PF reset issued
> [ +0,859912] i40e 0000:60:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
In one case I've seen this also (don't know if it is relevant):
> [ 218.921466] i40e 0000:60:00.0 enp96s0f0: VSI_seid 390, Hung TX queue 43, tx_pending_hw: 1, NTC:0xa6, HWB: 0xa6, NTU: 0xa7, TAIL: 0xa7
> [ 218.921470] i40e 0000:60:00.0 enp96s0f0: VSI_seid 390, Issuing force_wb for TX queue 43, Interrupt Reg: 0x0
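To check whether a host shows the same symptoms, the kernel log can be searched for the two messages quoted above. The following is a minimal sketch; the function name `i40e_tx_hang_lines` is hypothetical, and the pattern only covers the message texts seen in this report, so other i40e failures would not be counted.

```sh
#!/bin/sh
# Hypothetical helper: count kernel log lines (read on stdin) that match
# the i40e failure messages quoted in this report.
i40e_tx_hang_lines() {
    grep -c -E 'i40e .*(TX driver issue detected|Hung TX queue)'
}

# Example usage:
#   dmesg | i40e_tx_hang_lines
```

A count greater than zero suggests the host hit the same Tx problem; it does not identify the packet that triggered it.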
After that error the only way to reset this broken state is to reboot
the host. I've been unable to tear down the bridge and/or remove the
`i40e` driver; attempting that most often crashes the Linux kernel (some
other bug on `ip link set enp96s0f0 nomaster`).
If you need more data I have a PCAP file, but I still don't know which
packet exactly triggers the bug.
The bug seems to be fixed in 4.10.0 and I bisected it down to
> git bisect start '--' 'drivers/net/ethernet/intel/i40e'
> # new: [c470abd4fde40ea6a0846a2beab642a578c0b8cd] Linux 4.10
> git bisect new c470abd4fde40ea6a0846a2beab642a578c0b8cd
> # old: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
> git bisect old 69973b830859bc6529a7a0468ba0d80ee5117826
> # old: [13fd3f9cc3def8b276c7913ae4acbfa2653cb198] i40e: clear mac filter count on reset
> git bisect old 13fd3f9cc3def8b276c7913ae4acbfa2653cb198
> # new: [7ec9ba11b046b4b7fd768c366870ada60d409295] i40e: Driver prints log message on link speed change
> git bisect new 7ec9ba11b046b4b7fd768c366870ada60d409295
> # new: [0b7c8b5d5436317a5f4509e2a150c6cec017f348] i40e: fix trivial typo in naming of i40e_sync_filters_subtask
> git bisect new 0b7c8b5d5436317a5f4509e2a150c6cec017f348
> # new: [f114dca2533ca770aebebffb5ed56e5e7d1fb3fb] i40e: Be much more verbose about what we can and cannot offload
> git bisect new f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> # old: [81fa7c97bebd6e1a52c4e059eeffe18df5b3f11f] i40e: Implementation of ERROR state for NVM update state machine
> git bisect old 81fa7c97bebd6e1a52c4e059eeffe18df5b3f11f
> # old: [3aa7b74dbeedfb32406fec70cfd76d797209e8c9] i40e: removed unreachable code
> git bisect old 3aa7b74dbeedfb32406fec70cfd76d797209e8c9
> # first new commit: [f114dca2533ca770aebebffb5ed56e5e7d1fb3fb] i40e: Be much more verbose about what we can and cannot offload
I used v4.10 as the basis and only bisected everything in
drivers/net/ethernet/intel/i40e/ as vanilla v4.9 and several other
versions between that and v4.10 crashed my host, so basically
git checkout v4.10
git checkout $hash -- drivers/net/ethernet/intel/i40e/
make all modules_install install
git checkout v4.10 -- drivers/net/ethernet/intel/i40e/
git bisect (old|new) $hash
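The per-commit procedure above could be wrapped into two small helpers, sketched below. This is an assumption about how the steps might be scripted, not the script actually used for the bisect: the names `run`, `build_candidate` and `mark_candidate` and the `DRY_RUN` switch are hypothetical. With `DRY_RUN=1` the commands are only printed; booting the new kernel and testing it happens between the two calls.

```sh
#!/bin/sh
# Driver directory that is swapped in and out of the v4.10 tree.
I40E_DIR=drivers/net/ethernet/intel/i40e/

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_candidate HASH: take v4.10, replace only the i40e driver with the
# version from HASH, then build and install kernel and modules.
build_candidate() {
    run git checkout v4.10 &&
    run git checkout "$1" -- "$I40E_DIR" &&
    run make all modules_install install
}

# mark_candidate HASH old|new: restore the v4.10 driver and record the
# test result for HASH in the running bisect.
mark_candidate() {
    run git checkout v4.10 -- "$I40E_DIR" &&
    run git bisect "$2" "$1"
}

# Example usage (reboot and test between the two steps):
#   build_candidate "$hash"
#   mark_candidate "$hash" old
```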
I verified that cherry-picking f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
on top of v4.9.273 fixes the problem and that reverting it brings the
problem back.
Philipp
--
Philipp Hahn
Open Source Software Engineer
Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
📞 +49-421-22232-57
🖶 +49-421-22232-99
✉️ hahn@univention.de
🌐 https://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Wed, 23 Jun 2021 15:00:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Greg KH <gregkh@linuxfoundation.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Wed, 23 Jun 2021 15:00:03 GMT) (full text, mbox, link).
Message #82 received at 892105@bugs.debian.org (full text, mbox, reply):
On Tue, Jun 22, 2021 at 08:18:53PM +0200, Philipp Hahn wrote:
> Hello,
>
> I request the following patch from v4.10-rc1 to get cherry-picked into
> "stable/linux-4.9.y":
>
> > commit f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> > Author: Alexander Duyck <alexander.h.duyck@intel.com>
> > Date: Tue Oct 25 16:08:46 2016 -0700
> >
> > i40e: Be much more verbose about what we can and cannot offload
> > This change makes it so that we are much more robust about defining what we
> > can and cannot offload. Previously we were just checking for the L4 tunnel
> > header length, however there are other fields we should be verifying as
> > there are multiple scenarios in which we cannot perform hardware offloads.
> > In addition the device only supports GSO as long as the MSS is 64 or
> > greater. We were not checking this so an MSS less than that was resulting
> > in Tx hangs.
> > Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> > Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
>
> Debian has this old bug
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892105> reported against
> 4.9.82. It still exists in the current kernel 4.9.258 of Debian's oldstable 9
> "Stretch", and also in the latest stable 4.9.273.
Now queued up, thanks.
greg k-h
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Tue, 29 Jun 2021 18:24:02 GMT) (full text, mbox, link).
Acknowledgement sent
to "Fujinaka, Todd" <todd.fujinaka@intel.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Tue, 29 Jun 2021 18:24:02 GMT) (full text, mbox, link).
Message #87 received at 892105@bugs.debian.org (full text, mbox, reply):
I think I accidentally deleted the forward from the intel-wired-lan spam filter. Re-forwarding and adding Alex's gmail address.
Also,
Todd Fujinaka
Software Application Engineer
Data Center Group
Intel Corporation
todd.fujinaka@intel.com
-----Original Message-----
From: Philipp Hahn <hahn@univention.de>
Sent: Tuesday, June 22, 2021 11:19 AM
To: stable@vger.kernel.org; 892105@bugs.debian.org; Ben Hutchings <benh@debian.org>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>; Andrew Bowers <andrewx.bowers@intel.com>; Bonaccorso, Salvatore <carnil@debian.org>
Subject: Cherry-pick "i40e: Be much more verbose about what we can and cannot offload"
Hello,
I request the following patch from v4.10-rc1 to get cherry-picked into
"stable/linux-4.9.y":
> commit f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> Author: Alexander Duyck <alexander.h.duyck@intel.com>
> Date: Tue Oct 25 16:08:46 2016 -0700
>
> i40e: Be much more verbose about what we can and cannot offload
>
> This change makes it so that we are much more robust about defining what we
> can and cannot offload. Previously we were just checking for the L4 tunnel
> header length, however there are other fields we should be verifying as
> there are multiple scenarios in which we cannot perform hardware offloads.
>
> In addition the device only supports GSO as long as the MSS is 64 or
> greater. We were not checking this so an MSS less than that was resulting
> in Tx hangs.
>
> Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Debian has this old bug
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892105> reported against 4.9.82. It still exists in the current kernel 4.9.258 of Debian's oldstable 9 "Stretch", and also in the latest stable 4.9.273.
Our environment
===============
- KVM server
- dual port i40e
- classic bridge with enp96s0f0
- VM attached to bridge via veth
- no VLANs
- no MacVLan
> # ethtool -i enp96s0f0
> driver: i40e
> version: 1.6.16-k
> firmware-version: 3.33 0x80000e48 1.1876.0
> expansion-rom-version:
> bus-info: 0000:60:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> # lspci -s 0000:60:00.0
> 60:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
Analysis
========
As soon as we start one of our "Ubuntu" images the bridge stops receiving unicast packets for *all* VMs on that bridge.
- we still see outgoing traffic leaving the host, e.g. ARP requests
- "tcpdump -i enp96s0f0" shows no incoming unicast traffic, e.g. no ARP response
- broadcast traffic passes the bridge
- VMs on the same bridge can communicate with each other
Most often I see the following error message after doing `dmesg -n 8`:
> [ +9,376367] i40e 0000:60:00.0: cleared PE_CRITERR
> [ +0,000252] i40e 0000:60:00.0: TX driver issue detected, PF reset issued
> [ +0,859912] i40e 0000:60:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
In one case I've seen this also (don't know if it is relevant):
> [ 218.921466] i40e 0000:60:00.0 enp96s0f0: VSI_seid 390, Hung TX queue 43, tx_pending_hw: 1, NTC:0xa6, HWB: 0xa6, NTU: 0xa7, TAIL: 0xa7
> [ 218.921470] i40e 0000:60:00.0 enp96s0f0: VSI_seid 390, Issuing force_wb for TX queue 43, Interrupt Reg: 0x0
After that error the only way to reset this broken state is to reboot the host. I've been unable to tear down the bridge and/or remove the `i40e` driver; attempting that most often crashes the Linux kernel (some other bug on `ip link set enp96s0f0 nomaster`).
If you need more data I have a PCAP file, but I still don't know which packet exactly triggers the bug.
The bug seems to be fixed in 4.10.0 and I bisected it down to
> git bisect start '--' 'drivers/net/ethernet/intel/i40e'
> # new: [c470abd4fde40ea6a0846a2beab642a578c0b8cd] Linux 4.10
> git bisect new c470abd4fde40ea6a0846a2beab642a578c0b8cd
> # old: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
> git bisect old 69973b830859bc6529a7a0468ba0d80ee5117826
> # old: [13fd3f9cc3def8b276c7913ae4acbfa2653cb198] i40e: clear mac filter count on reset
> git bisect old 13fd3f9cc3def8b276c7913ae4acbfa2653cb198
> # new: [7ec9ba11b046b4b7fd768c366870ada60d409295] i40e: Driver prints log message on link speed change
> git bisect new 7ec9ba11b046b4b7fd768c366870ada60d409295
> # new: [0b7c8b5d5436317a5f4509e2a150c6cec017f348] i40e: fix trivial typo in naming of i40e_sync_filters_subtask
> git bisect new 0b7c8b5d5436317a5f4509e2a150c6cec017f348
> # new: [f114dca2533ca770aebebffb5ed56e5e7d1fb3fb] i40e: Be much more verbose about what we can and cannot offload
> git bisect new f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> # old: [81fa7c97bebd6e1a52c4e059eeffe18df5b3f11f] i40e: Implementation of ERROR state for NVM update state machine
> git bisect old 81fa7c97bebd6e1a52c4e059eeffe18df5b3f11f
> # old: [3aa7b74dbeedfb32406fec70cfd76d797209e8c9] i40e: removed unreachable code
> git bisect old 3aa7b74dbeedfb32406fec70cfd76d797209e8c9
> # first new commit: [f114dca2533ca770aebebffb5ed56e5e7d1fb3fb] i40e: Be much more verbose about what we can and cannot offload
I used v4.10 as the basis and only bisected everything in
drivers/net/ethernet/intel/i40e/ as vanilla v4.9 and several other
versions between that and v4.10 crashed my host, so basically
git checkout v4.10
git checkout $hash -- drivers/net/ethernet/intel/i40e/
make all modules_install install
git checkout v4.10 -- drivers/net/ethernet/intel/i40e/
git bisect (old|new) $hash
I verified that cherry-picking f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
on top of v4.9.273 fixes the problem and that reverting it brings the
problem back.
Philipp
--
Philipp Hahn
Open Source Software Engineer
Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
📞 +49-421-22232-57
🖶 +49-421-22232-99
✉️ hahn@univention.de
🌐 https://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#892105; Package src:linux.
(Mon, 05 Jul 2021 07:15:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Greg KH <gregkh@linuxfoundation.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>.
(Mon, 05 Jul 2021 07:15:03 GMT) (full text, mbox, link).
Message #92 received at 892105@bugs.debian.org (full text, mbox, reply):
On Tue, Jun 29, 2021 at 06:20:30PM +0000, Fujinaka, Todd wrote:
> I think I accidentally deleted the forward from the intel-wired-lan spam filter. Re-forwarding and adding Alex's gmail address.
>
> Also,
>
> Todd Fujinaka
> Software Application Engineer
> Data Center Group
> Intel Corporation
> todd.fujinaka@intel.com
>
> -----Original Message-----
> From: Philipp Hahn <hahn@univention.de>
> Sent: Tuesday, June 22, 2021 11:19 AM
> To: stable@vger.kernel.org; 892105@bugs.debian.org; Ben Hutchings <benh@debian.org>
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>; Andrew Bowers <andrewx.bowers@intel.com>; Bonaccorso, Salvatore <carnil@debian.org>
> Subject: Cherry-pick "i40e: Be much more verbose about what we can and cannot offload"
>
> Hello,
>
> I request the following patch from v4.10-rc1 to get cherry-picked into
> "stable/linux-4.9.y":
>
> > commit f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
Please provide a working backport, that you have tested works properly,
as it does not apply cleanly.
thanks,
greg k-h
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Mon, 02 Aug 2021 07:24:41 GMT) (full text, mbox, link).