Thursday, 2016-07-07

sjenningsanyone have an issue with the NIC on the minnowboard turbot resetting about 60 seconds after boot?18:56
sjenningsi boot, the NIC autonegs at gigabit, then at about 67 seconds dmesg time after boot, the link goes down and comes back up at 100M18:58
sjenningsi have 3 boards and they all do it.  i have the latest firmware.18:59
wmatsjennings: i will test my turbot19:14
m_wwhat driver is it using?19:16
sjenningsm_w, r816919:16
sjenningsi've tried the out of tree r8168 from realtek too and it did the same thing19:16
m_wwhat kind of switch are you connecting to?19:18
sjenningsmy only theory is that the EFI is resetting it for some reason. might be interesting to see if i delay the boot in grub and see if that offsets the time that it occurs.19:18
sjenningsit's a 5-port netgear19:18
sjenningsm_w, ^19:19
m_wyes delay the boot and watch the link19:19
wmatsjennings: sorry, i forgot i loaned out my turbot, so I can't test19:21
sjenningswmat, ha, ok. thanks anyway :)19:22
sjenningsm_w, delaying the boot has no effect. link still goes down at 67 seconds dmesg time.19:24
sjenningsso it seems like the driver is doing it19:24
sjenningsm_w, hmmm
sjenningsmight try patching my kernel with that19:34
m_wlooks like a good idea19:35
m_wsjennings: what kernel version are you using?19:59
sjenningsm_w, 4.5.519:59
m_wguessing fedora 2420:09
sjenningsm_w, yes20:14
m_wwell you are gonna have to do it manually, this patch never went to stable20:17
sjenningsm_w, yeah :(  i'm about to pull down the kernel srpm and add the patch20:23
m_wyou can try mainline instead, might be easier20:29
sjenningsyeah, i like using the packaging.  makes cleanup easier.  plus i'm not going to build the kernel on the turbot :)20:30
sjenningsm_w, didn't fix it20:44
sjenningsi'm now on kernel 4.6.3 which has the fix20:45
m_wany kernel messages that could give a clue as to what is happening?20:46
sjenningsm_w, unfortunately no, just the random "link down" at around 60 seconds20:48
m_wWhat does ethtool report? ethtool eth020:48
m_wdo you have time to run ethtool before the connection flips to 100?20:52
m_wsomething is causing the advertised link mode to be dropped20:59
sjenningsm_w, yeah, it's weird for sure21:00
m_wdoes forcing the link mode with ethtool work?21:03
m_wyou have another switch to try?21:11
sjenningsm_w, interesting. let me try.21:16
sjenningsm_w, all my switches are netgate.  i moved from my 5-port basic switch to an 8-port smart switch, but it still drops the link21:18
sjennings*netgear that is21:19
m_wdoes the link go back to 1000 for a while when you plug and unplug the cable?21:19
sjenningsno, but if i unload/reload the driver it does go back to 1000, at least for a while21:20
sjenningseventually it drops back down21:20
sjenningsalthough not in a fixed amount of time like the first 60 seconds21:21
m_wdid you try to force the link with ethtool yet?21:22
sjenningsyes, it doesn't allow it21:22
m_wwhat does it say?21:22
sjenningsbecause the spec _requires_ autoneg for 100021:22
m_wmaybe try forcing the advertised link21:25
sjenningsactually, wrong machine :-/
sjenningshow do i do that?21:26
m_w--advertise option devname21:27
sjenningsah, ok so "ethtool -s enp2s0 advertise 0x020" worked21:28
sjenningsthe link jumped up to 100021:28
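[Note on the 0x020 value above: ethtool's advertise argument is a bitmask of link modes, and 0x020 is the bit for 1000baseT Full in the ethtool(8) man page. A minimal sketch for decoding such a mask follows. The table covers only the six legacy 10/100/1000 bits, and the function name decode_advertise is made up for illustration.]

```python
# Legacy link-mode bits accepted by `ethtool -s DEV advertise N`,
# as listed in ethtool(8). Only the 10/100/1000 modes are covered here.
ADVERTISE_BITS = {
    0x001: "10baseT/Half",
    0x002: "10baseT/Full",
    0x004: "100baseT/Half",
    0x008: "100baseT/Full",
    0x010: "1000baseT/Half",
    0x020: "1000baseT/Full",
}


def decode_advertise(mask):
    """Return the names of the link modes set in an advertise bitmask."""
    return [name for bit, name in sorted(ADVERTISE_BITS.items()) if mask & bit]


print(decode_advertise(0x020))  # the mask used in the chat
print(decode_advertise(0x00F))  # a mask with only the 10/100 modes set
```

[A mask of 0x00F would correspond to an advertise list holding only the 10/100 modes.]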
m_wgive it a minute and see if it reverts back21:29
sjenningsit dropped back21:33
m_whave you tried different cables?21:33
sjenningsonly 10/100 modes are in the advertise list now21:33
sjenningscables, yes21:34
sjenningsif i download a large file over my LAN while it is at 1000, it come it at about 38MB/s (limited by packet processing or i/o speed I assume), but 38MB/s > 100Mbps so it is going at 1000 when negotiated at 100021:37
sjennings*it comes in21:37
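[Note on the throughput argument above: the comparison mixes megabytes and megabits, so it helps to put both in the same unit. A small sketch of the arithmetic, assuming 1 byte = 8 bits and ignoring protocol overhead; the helper name is made up for illustration.]

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert a rate in megabytes per second to megabits per second."""
    return mbytes_per_s * 8


observed_mbits = mbytes_to_mbits(38)  # the 38 MB/s transfer rate from the chat
fast_ethernet_mbits = 100             # ceiling of a link negotiated at 100M

print(observed_mbits)                        # 304
print(observed_mbits > fast_ethernet_mbits)  # True
```

[At roughly 304 Mbit/s the transfer is about three times what a 100M link could carry, so the link really was running at gigabit during the download.]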
m_wso the performance sucks21:39
m_weither way21:39
sjenningswell, i don't think the realtek nic with its very limited offload abilities can keep the cpu from pegging21:40
sjenningsseems like there is still a 60ish second interval between events21:40
m_wethtool -S21:40
sjenningstx_errors is interesting21:41
sjenningsthere is a TX_TIMEOUT watchdog in the driver code21:41
m_wdoes the tx_error count go up after every link reset?21:50
sjenningsm_w so after the first link drop after boot, tx_errors is still 021:54
sjenningsit seems to stay at 1000 after a force advertise, until i start loading the network.  then the link drops.  this time still no tx_errors.  so i don't think that is it.21:57
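[Note on checking counters across a link drop: one way to see whether any counter moves with each drop is to save `ethtool -S` output before and after the event and diff the two. A rough sketch, assuming the usual "name: value" layout of that output. The sample text and its numbers are invented for illustration, and both function names are hypothetical.]

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` style output into a dict of counter name -> int."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        # Skip the "NIC statistics:" header and anything without a numeric value.
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value.strip())
    return stats


def changed_counters(before, after):
    """Return counters whose value differs, mapped to the size of the change."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Invented sample captures, NOT real output from the board in the chat.
BEFORE = """NIC statistics:
     tx_packets: 1200
     rx_packets: 3400
     tx_errors: 0
"""
AFTER = """NIC statistics:
     tx_packets: 1900
     rx_packets: 5100
     tx_errors: 0
"""

print(changed_counters(parse_ethtool_stats(BEFORE), parse_ethtool_stats(AFTER)))
```

[The captures could come from something like `ethtool -S enp2s0 > before.txt` ahead of the drop and again afterwards. If tx_errors never shows up in the diff, that fits the observation above that the TX watchdog is probably not the trigger.]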
m_wcan you pastebin dmesg?22:03
sjenningsm_w, thanks for the help, i gotta run for now.  i'll let you know if i get anything figured out.  thanks again!22:28
m_wno problem22:28
