Damn,
I can't force OpenManage to set the timer below 60s :(
# omconfig system recovery timer=10
Error! Recovery reset time must be between 60 and 720 seconds.
I'll try to see if we can disable it.
----- Original Message -----
From: "aderumier" <***@odiso.com>
To: "pve-devel" <pve-***@pve.proxmox.com>
Sent: Thursday, 3 December 2015 18:24:40
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?
I just found a strange bug with ipmi_watchdog, related to Dell OpenManage.
At boot, the timeout is correctly set to 10s:
***@kvmtest1 ~ # ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x44)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 10 sec
Present Countdown: 9 sec
But after a few minutes (5-10 min),
I'm seeing it at 480s:
# ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0xc4)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 480 sec
Present Countdown: 479 sec
In Dell OpenManage, I see a reset configuration option set to 480s.
(I think it's the OpenManage service that overwrites the value.)
I'll add a note in the wiki about this too.
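A minimal sketch of how the drift shown above could be detected automatically. It only parses text in the format of the two `ipmitool mc watchdog get` outputs quoted in this message. The helper names (`parse_watchdog`, `initial_countdown_seconds`, `check`) and the expected value of 10s are assumptions for illustration, not part of ipmitool or Proxmox.

```python
import re

# Sample outputs copied from the two ipmitool runs quoted above.
AT_BOOT = """\
Watchdog Timer Use: SMS/OS (0x44)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 10 sec
Present Countdown: 9 sec
"""

AFTER_SOME_MINUTES = """\
Watchdog Timer Use: SMS/OS (0xc4)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x10
Initial Countdown: 480 sec
Present Countdown: 479 sec
"""


def parse_watchdog(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def initial_countdown_seconds(fields):
    """Extract the integer number of seconds from 'Initial Countdown'."""
    match = re.match(r"(\d+)\s*sec", fields["Initial Countdown"])
    if match is None:
        raise ValueError("unexpected Initial Countdown format")
    return int(match.group(1))


def check(text, expected_seconds=10):
    """Return a list of problems; an empty list means the state looks as expected."""
    fields = parse_watchdog(text)
    problems = []
    seconds = initial_countdown_seconds(fields)
    if seconds != expected_seconds:
        problems.append(
            f"initial countdown is {seconds}s, expected {expected_seconds}s"
        )
    if fields["Watchdog Timer Actions"].startswith("No action"):
        problems.append("watchdog action is 'No action', node would not be reset")
    return problems


if __name__ == "__main__":
    for label, sample in (("at boot", AT_BOOT), ("later", AFTER_SOME_MINUTES)):
        print(label, check(sample))
```

On a real node the text would come from running `ipmitool mc watchdog get` rather than from the embedded samples. The second sample is flagged twice: once for the 480s countdown and once for the action changing from "Hard Reset" to "No action".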
----- Original Message -----
From: "aderumier" <***@odiso.com>
To: "dietmar" <***@proxmox.com>
Cc: "pve-devel" <pve-***@pve.proxmox.com>
Sent: Thursday, 3 December 2015 17:48:14
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?
Post by Alexandre DERUMIER
Post by Dietmar Maurer
The timeout must be 60 seconds!! Never change that.
We set the timeout to 60s when we start watchdog-mux.
Ah, OK. I thought we needed to define it manually.
What is the difference between these two timeouts?
+ int watchdog_timeout = 10;
+ int client_watchdog_timeout = 60;
ipmitool gives me 10s, so it seems to work fine :)
# ipmitool mc watchdog get
Initial Countdown: 10 sec
Post by Alexandre DERUMIER
Another question: I did some tests two weeks ago with a customer,
and I think I hit a problem when the node reboots too fast
(pve-ha-manager sees the node down, but it comes back up before the VM has been migrated).
Is it a known bug?
I don't remember exactly, but the LRM or CRM was stuck because the node (and its VMs) had rebooted too fast.
I don't have access to the customer's logs, sorry.
----- Original Message -----
From: "dietmar" <***@proxmox.com>
To: "aderumier" <***@odiso.com>
Cc: "pve-devel" <pve-***@pve.proxmox.com>
Sent: Thursday, 3 December 2015 17:28:55
Subject: Re: [pve-devel] Blacklisting HP hardware watchdog timer module ?
Post by Alexandre DERUMIER
BTW, what is the best timeout for the watchdog?
I think pve-ha-manager waits around 1 minute before migrating VMs?
If so, should the watchdog timeout be lower?
The timeout must be 60 seconds!! Never change that.
We set the timeout to 60s when we start watchdog-mux.
Post by Alexandre DERUMIER
Another question: I did some tests two weeks ago with a customer,
and I think I hit a problem when the node reboots too fast
(pve-ha-manager sees the node down, but it comes back up before the VM has been migrated).
Is it a known bug?
What bug exactly?
_______________________________________________
pve-devel mailing list
pve-***@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel