[pve-devel] [PATCH pve-ha-manager] handle node deletion in the HA stack
Thomas Lamprecht
2015-09-28 09:34:51 UTC
When a node was deleted from the cluster with pvecm delnode, the dead node was not removed from the HA manager status.
This has no real effect on the function of the HA stack, especially if no services ran on the node before the deletion, which should be the case.
But for the user it is naturally confusing to still see the node in the interface.

This patch proposes an automated removal process: an hour after a node has vanished from the cluster member list, it gets deleted from the manager status.

An alternative approach would be a manual command in the ha-manager binary.

This patch does not cover some side effects, such as removing the node from defined groups.

The commit message has some more details.

Thomas Lamprecht (1):
delete node from HA stack when deleted from cluster

src/PVE/HA/NodeStatus.pm | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
--
2.1.4
Thomas Lamprecht
2015-09-28 09:34:52 UTC
When a node gets deleted from the cluster with pvecm delnode,
we set its node state in the manager status to 'gone'.
When set to 'gone', the manager waits an hour after the node was last
seen online and only then deletes it from the manager status.

When some HA services were forgotten on the node (shouldn't happen
at all!!) the node will be fenced, the service migrated and then its
state reset to 'gone'. After an hour the node will be deleted,
unless it joined the cluster again in the meantime.

Deleting a node from the HA manager status is by no means a final
act; the ha-manager could live without deleting it, but for the user
it is confusing to see dead nodes in the interface.

Signed-off-by: Thomas Lamprecht <***@proxmox.com>
---
src/PVE/HA/NodeStatus.pm | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/NodeStatus.pm b/src/PVE/HA/NodeStatus.pm
index fe8c0ef..eb174cb 100644
--- a/src/PVE/HA/NodeStatus.pm
+++ b/src/PVE/HA/NodeStatus.pm
@@ -24,6 +24,7 @@ my $valid_node_states = {
     online => "node online and member of quorate partition",
     unknown => "not member of quorate partition, but possibly still running",
     fence => "node needs to be fenced",
+    gone => "node vanished from cluster members list, possibly deleted"
 };

 sub get_node_state {
@@ -79,6 +80,20 @@ sub list_online_nodes {
     return $res;
 }

+my $delete_node = sub {
+    my ($self, $node) = @_;
+
+    return undef if $self->get_node_state($node) ne 'gone';
+
+    my $haenv = $self->{haenv};
+
+    delete $self->{last_online}->{$node};
+    delete $self->{status}->{$node};
+
+    $haenv->log('notice', "deleting gone node '$node', not a cluster member".
+                " anymore.");
+};
+
 my $set_node_state = sub {
     my ($self, $node, $state) = @_;

@@ -113,7 +128,7 @@ sub update {

         if ($state eq 'online') {
             # &$set_node_state($self, $node, 'online');
-        } elsif ($state eq 'unknown') {
+        } elsif ($state eq 'unknown' || $state eq 'gone') {
             &$set_node_state($self, $node, 'online');
         } elsif ($state eq 'fence') {
             # do nothing, wait until fenced
@@ -133,9 +148,16 @@ sub update {
         if ($state eq 'online') {
             &$set_node_state($self, $node, 'unknown');
         } elsif ($state eq 'unknown') {
-            # &$set_node_state($self, $node, 'unknown');
+
+            # node isn't in the member list anymore, deleted from the cluster?
+            &$set_node_state($self, $node, 'gone') if(!defined($d));
+
         } elsif ($state eq 'fence') {
             # do nothing, wait until fenced
+        } elsif($state eq 'gone') {
+            if($self->node_is_offline_delayed($node, 3600)) {
+                &$delete_node($self, $node);
+            }
         } else {
             die "detected unknown node state '$state";
         }
--
2.1.4
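
Read together, the hunks above describe a small state machine: online -> unknown -> gone -> deleted, with a way back to online if the node rejoins. The following is a minimal stand-alone Perl sketch of those transitions, not the code from NodeStatus.pm. The names update_states and $GONE_DELAY and the plain hash arguments are hypothetical and exist only for illustration. The delay check assumes that node_is_offline_delayed compares the current time against the time the node was last seen online, as the commit message suggests. Fencing is left out.

```perl
use strict;
use warnings;

my $GONE_DELAY = 3600;    # seconds; the delay used in the patch

# Hypothetical helper: one update pass over a plain state table.
#   $status      - { node => state }
#   $last_online - { node => time the node was last seen online }
#   $members     - { node => { online => 0|1 } }, the current member list
#   $now         - current time in seconds
sub update_states {
    my ($status, $last_online, $members, $now) = @_;

    # Pass 1: members that are currently online.
    foreach my $node (sort keys %$members) {
        next if !$members->{$node}->{online};
        my $state = $status->{$node} // 'unknown';
        # an 'unknown' or 'gone' node that shows up again is online
        $state = 'online' if $state eq 'unknown' || $state eq 'gone';
        $status->{$node} = $state;
        $last_online->{$node} = $now if $state eq 'online';
    }

    # Pass 2: known nodes that are offline or no longer members.
    foreach my $node (sort keys %$status) {
        my $d = $members->{$node};
        next if $d && $d->{online};
        my $state = $status->{$node};

        if ($state eq 'online') {
            $status->{$node} = 'unknown';
        } elsif ($state eq 'unknown') {
            # not in the member list anymore: probably deleted
            $status->{$node} = 'gone' if !defined($d);
        } elsif ($state eq 'gone') {
            # forget the node once it has been offline long enough
            if ($now - ($last_online->{$node} // 0) >= $GONE_DELAY) {
                delete $status->{$node};
                delete $last_online->{$node};
            }
        }
        # 'fence': do nothing, wait until fenced
    }
}

# Usage: node 'b' is removed from the member list after the first pass.
my (%status, %last_online);
my $show = sub { join(',', map { "$_=$status{$_}" } sort keys %status) };

update_states(\%status, \%last_online, { a => { online => 1 }, b => { online => 1 } }, 1000);
update_states(\%status, \%last_online, { a => { online => 1 } }, 1010);    # b: online -> unknown
update_states(\%status, \%last_online, { a => { online => 1 } }, 1020);    # b: unknown -> gone
print $show->(), "\n";    # prints "a=online,b=gone"
update_states(\%status, \%last_online, { a => { online => 1 } }, 4620);    # b last online 3620s ago
print $show->(), "\n";    # prints "a=online"
```

Note that in this sketch a node needs one pass to go from online to unknown and another to go from unknown to gone, which mirrors the separate branches in the patch.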
Dietmar Maurer
2015-09-29 05:37:22 UTC
applied, thanks!
Post by Thomas Lamprecht
When a node gets deleted from the cluster with pvecm delnode
we set its node state in the manager status to 'gone'.
When set to gone the manager waits an hour after the node was last
seen online and only then deletes it from the manager status.
When some HA services were forgotten on the node (shouldn't happen
at all!!) the node will be fenced, the service migrated and then its
state reset to 'gone'. After an hour the node will be deleted,
unless it joined the cluster again in the meantime.
Deleting a node from the HA manager status is by no means a final
act, the ha-manager could live without deleting it, but for the user
it is confusing to see dead nodes in the interface.