Discussion:
[pve-devel] proxmox3->4 : online upgrade howto
Alexandre DERUMIER
2015-10-13 12:20:37 UTC
Hi,

I have finished migrating a small 5-node cluster from proxmox3 to proxmox4,
using qemu live migration.



Here is the howto:

requirements:
-------------
external storage (nfs, ceph).
I haven't tested with clvm + iscsi, or local ceph (which should work).


1) Upgrade a first node to proxmox 4.0 and recreate the cluster
------------------------------------------------------------
Have an empty node,
then upgrade it to proxmox 4.0, following the current wiki


# apt-get update && apt-get dist-upgrade
# apt-get remove proxmox-ve-2.6.32 pve-manager corosync-pve openais-pve redhat-cluster-pve pve-cluster pve-firmware
# sed -i 's/wheezy/jessie/g' /etc/apt/sources.list
# sed -i 's/wheezy/jessie/g' /etc/apt/sources.list.d/pve-enterprise.list
# apt-get update
# apt-get install pve-kernel-4.2.2-1-pve
# apt-get dist-upgrade
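As a side note, the wheezy->jessie sed rewrite can be dry-run on a scratch copy first; a small sketch (the repository line is just an example):

```shell
# Dry-run the wheezy->jessie rewrite on a scratch copy of sources.list
TMP=$(mktemp)
printf 'deb http://ftp.debian.org/debian wheezy main contrib\n' > "$TMP"
sed -i 's/wheezy/jessie/g' "$TMP"
grep jessie "$TMP"    # the entry now points at jessie
rm -f "$TMP"
```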

reboot

# apt-get install proxmox-ve
# apt-get remove pve-kernel-2.6.32-41-pve

# pvecm create <clustername>


2) upgrade second node
----------------------
# apt-get update && apt-get dist-upgrade
# apt-get remove proxmox-ve-2.6.32 pve-manager corosync-pve openais-pve redhat-cluster-pve pve-cluster pve-firmware
# sed -i 's/wheezy/jessie/g' /etc/apt/sources.list
# sed -i 's/wheezy/jessie/g' /etc/apt/sources.list.d/pve-enterprise.list
# apt-get update
# apt-get install pve-kernel-4.2.2-1-pve
# apt-get dist-upgrade


---> do NOT reboot here

# apt-get install proxmox-ve

if apt returns an error like:

"Setting up pve-manager (4.0-48) ...
Failed to get D-Bus connection: Unknown error -1
Failed to get D-Bus connection: Unknown error -1
dpkg: error processing package pve-manager (--configure):
subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of proxmox-ve:
proxmox-ve depends on pve-manager; however:
Package pve-manager is not configured yet.

dpkg: error processing package proxmox-ve (--configure):
dependency problems - leaving unconfigured
Errors were encountered while processing:"


then,

# touch /proxmox_install_mode
# apt-get install proxmox-ve
# rm /proxmox_install_mode

now the tricky part

mount /etc/pve

# /usr/bin/pmxcfs -l

add node to cluster

# pvecm add ipofnode1 -force

close old corosync and delete old config
# killall -9 corosync
# /etc/init.d/pve-cluster stop
# rm /var/lib/pve-cluster/config.db*

start new corosync and pve-cluster

# corosync
# /etc/init.d/pve-cluster start

verify that you can write in /etc/pve/ and that it is correctly replicated to the other proxmox4 nodes
# touch /etc/pve/test.txt
# rm /etc/pve/test.txt

migrate the VMs (repeat for each vmid)

# qm migrate <vmid> <target_proxmox4_server> -online

(migration must be done with the CLI, because pvestatd can't start without systemd, so the GUI is not working)
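To empty the node, the per-vmid command can be wrapped in a small loop; a sketch, assuming the default `qm list` column layout (STATUS in column 3) and with `TARGET` as a placeholder for your upgraded node's name:

```shell
# Placeholder target node name -- replace with a real proxmox4 node.
TARGET=node1
# qm list columns: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
qm list 2>/dev/null | awk 'NR > 1 && $3 == "running" { print $1 }' |
while read -r vmid; do
    echo "migrating VM $vmid to $TARGET"
    qm migrate "$vmid" "$TARGET" -online
done
```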


# reboot node

3) do the same thing for the next node(s)

4) when all nodes are migrated, remove the old cluster config:

# rm /etc/pve/cluster.conf
lyt_yudi
2015-10-13 14:15:37 UTC
Post by Alexandre DERUMIER
I have finished migrating a small 5-node cluster from proxmox3 to proxmox4,
using qemu live migration.
Great! Thank you for sharing!
Alexandre DERUMIER
2015-10-13 14:23:57 UTC
Post by lyt_yudi
Great! Thank you for sharing!
Feel free to test and comment on the howto :)




_______________________________________________
pve-devel mailing list
pve-***@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
lyt_yudi
2015-11-04 09:34:17 UTC
hi, Alexandre
Post by Alexandre DERUMIER
then,
# touch /proxmox_install_mode
# apt-get install proxmox-ve
# rm /proxmox_install_mode
now the tricky part
when I got here, I got this error:

# pvecm add 10.0.0.3 -force
cluster not ready - no quorum?
unable to add node: command failed (ssh 10.0.0.3 -o BatchMode=yes pvecm addnode t2 --force 1)

10.0.0.3 is the empty node.

How do I fix this error?

thanks

# pveversion -v
proxmox-ve: 4.0-19 (running kernel: 2.6.32-42-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-2.6.32-42-pve: 2.6.32-165
pve-kernel-2.6.32-43-pve: 2.6.32-166
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-40-pve: 2.6.32-160
pve-kernel-4.2.3-2-pve: 4.2.3-19
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-20
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve4~jessie
openvswitch-switch: 2.3.2-1

lyt_yudi
***@icloud.com
lyt_yudi
2015-11-04 09:38:29 UTC
Post by lyt_yudi
when I got here, I got this error:
# pvecm add 10.0.0.3 -force
cluster not ready - no quorum?
unable to add node: command failed (ssh 10.0.0.3 -o BatchMode=yes pvecm addnode t2 --force 1)
10.0.0.3 is the empty node.
How do I fix this error?
on t3 (10.0.0.3), I had created the cluster with '# pvecm create <clustername>'

# pvecm n

Membership information
----------------------
Nodeid Votes Name
1 1 t3 (local)
lyt_yudi
2015-11-04 09:44:52 UTC
sorry, I have fixed it.

just ignore it and continue with the following steps…

thanks again.
lyt_yudi
2015-11-04 10:01:44 UTC
hi, Alexandre
Post by Alexandre DERUMIER
migrate the VMs (repeat for each vmid)
# qm migrate <vmid> <target_proxmox4_server> -online
(migration must be done with the CLI, because pvestatd can't start without systemd, so the GUI is not working)
Very unfortunate! (:

I got a new error.

# qm migrate 104 t3 -online
Nov 04 17:58:42 starting migration of VM 104 to node 't3' (10.0.0.3)
Nov 04 17:58:42 copying disk images
Nov 04 17:58:42 starting VM 104 on remote node 't3'
Nov 04 17:58:43 starting ssh migration tunnel
Nov 04 17:58:44 starting online/live migration on localhost:60000
Nov 04 17:58:44 migrate_set_speed: 8589934592
Nov 04 17:58:44 migrate_set_downtime: 0.1
Nov 04 17:58:44 migrate uri => tcp:[localhost]:60000 failed: VM 104 qmp command 'migrate' failed - address resolution failed for localhost:60000: Name or service not known
Nov 04 17:58:46 ERROR: online migrate failure - aborting
Nov 04 17:58:46 aborting phase 2 - cleanup resources
Nov 04 17:58:46 migrate_cancel
Nov 04 17:58:47 ERROR: migration finished with problems (duration 00:00:05)
migration problems

How do I fix this error?

thanks
Alexandre DERUMIER
2015-11-05 12:31:42 UTC
Post by lyt_yudi
how fix this error?
mmm, maybe an ssh tunnel problem.

Myself, I always disable migration through the ssh tunnel; maybe that's why it works for me.

try to add

migration_unsecure: 1

in /etc/pve/datacenter.cfg

this will do live migration directly between both qemu processes, without ssh encryption.
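For reference, a sketch of adding the option idempotently (demonstrated on a temporary file here; on a real cluster you would point CFG at /etc/pve/datacenter.cfg instead):

```shell
# Dry run on a temp file; set CFG=/etc/pve/datacenter.cfg on a real node.
CFG=$(mktemp)
# Append the option only if it is not already present.
grep -q '^migration_unsecure' "$CFG" || echo 'migration_unsecure: 1' >> "$CFG"
cat "$CFG"
rm -f "$CFG"
```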

lyt_yudi
2015-11-05 12:34:30 UTC
Post by Alexandre DERUMIER
mmm, maybe an ssh tunnel problem.
Myself, I always disable migration through the ssh tunnel; maybe that's why it works for me.
try to add
migration_unsecure: 1
in /etc/pve/datacenter.cfg
this will do live migration directly between both qemu processes, without ssh encryption.
yes, it's fixed. Wolfgang told me that.

so thanks again.
Florent B
2015-12-01 09:10:05 UTC
Hi Alexandre,

Thank you very much for your how-to, but I don't understand something.

When you do:

pvecm add ipofnode1 -force

On a non-rebooted node, this will fail because the node wasn't rebooted
since the Jessie upgrade, and the pvecm script calls systemctl (not present in
Wheezy, and unable to run without systemd):

pvecm add 192.168.0.203 -force
node test2 already defined <--- not a problem because of 'force' flag
copy corosync auth key
stopping pve-cluster service <----- systemctl call (see source)
Failed to get D-Bus connection: Unknown error -1
can't stop pve-cluster service


"Failed to get D-Bus connection" is systemctl related, and expected
without system reboot.

How did you make it work ? The only thing I see is to modify pvecm
script temporary on the node...
