[pve-devel] iscsi, some hang : pvestatd always do iscsi_session_rescan
Alexandre DERUMIER
2012-04-26 14:57:55 UTC
Hi Dietmar,
I had a problem with pvestatd today.

My iSCSI SAN had some read queues (not that many),
but the Proxmox interface was hanging and blocking.

I found that the problem was pvestatd.



in update_storage_status()
    -> my $info = PVE::Storage::storage_info($cfg);

storage_info()
    -> eval { activate_storage_list ($cfg, $slist, $session); };

activate_storage_list()
    -> __activate_storage_full

__activate_storage_full()
    ->

    } elsif ($type eq 'iscsi') {

        return if !check_iscsi_support(1);

        $session->{iscsi_sessions} = iscsi_session_list()
            if !$session->{iscsi_sessions};

        my $iscsi_sess = $session->{iscsi_sessions}->{$scfg->{target}};
        if (!defined ($iscsi_sess)) {
            eval { iscsi_login ($scfg->{target}, $scfg->{portal}); };
            warn $@ if $@;
        } else {
            # make sure we get all devices
            iscsi_session_rescan ($iscsi_sess);
        }



Then I can see

/usr/bin/iscsiadm --mode session -r 1 -R

followed by

/usr/bin/iscsiadm --mode session -r 2 -R

running every minute.

As my SAN was a little slow, iscsiadm was hanging for 40 seconds; the normal time is around 6 seconds.


Do you know why it always rescans the iSCSI sessions?
Alexandre DERUMIER
2012-04-26 16:39:35 UTC
Also, volume stats for iSCSI volumes are always zero, so maybe we can bypass them in pvestatd?


_______________________________________________
pve-devel mailing list
pve-***@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
--
Alexandre Derumier
Systems Engineer
Phone: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France
Alexandre DERUMIER
2012-04-27 06:18:36 UTC
I looked into this more deeply;
the main problem seems to be that iscsiadm has a long timeout.
(It is in fact the iSCSI timeout, so it can be huge in some configurations with iSCSI failover, or if a path fails.)

Maybe we could implement some kind of "kill the iscsiadm process" if it takes too much time?


I see that a "timelimit" package exists in Debian:
"timelimit /usr/bin/iscsiadm ...."

Maybe this could add some protection to the iscsiadm command?


Dietmar Maurer
2012-04-27 08:12:46 UTC
Post by Alexandre DERUMIER
I looked into this more deeply;
the main problem seems to be that iscsiadm has a long timeout.
Well, I think the problem is that your iSCSI server is slow.
Post by Alexandre DERUMIER
(It is in fact the iSCSI timeout, so it can be huge in some configurations with iSCSI failover,
or if a path fails.)
Maybe we could implement some kind of "kill the iscsiadm process" if it takes too much time?
I see that a "timelimit" package exists in Debian: "timelimit /usr/bin/iscsiadm ...."
Maybe this could add some protection to the iscsiadm command?
Forcing a timeout is easy (we have a timeout parameter for run_command()).

The question is what timeout do you want?

- Dietmar
Alexandre DERUMIER
2012-04-27 08:38:34 UTC
Post by Dietmar Maurer
Well, I think the problem is that your iSCSI server is slow.
Yes indeed, the SAN was overloaded yesterday.
Also, I have 2 SCSI controllers (active/passive), and failover can take 2 minutes. (The VMs can handle this.)


But the problem is that Proxmox becomes unresponsive while iscsiadm is being called during this time.
pvestatd also hangs, so the RRDs for VM stats are not updated during this time.
And with an HA cluster, maybe fencing could be triggered? (I have not tested it.)
Post by Dietmar Maurer
Forcing a timeout is easy (we have a timeout parameter for run_command()).
Oh great!
Post by Dietmar Maurer
The question is what timeout do you want?
I think a 5-second timeout should be enough for a session rescan.

***@kvm6:~# time /usr/bin/iscsiadm --mode session -r 1 -R

real 0m0.616s
user 0m0.003s
sys 0m0.084s
***@kvm6:~# time /usr/bin/iscsiadm --mode session -r 1 -R
Rescanning session [sid: 1, target: iqn.1986-03.com.sun:02:316dd6a9-76bc-62ea-93fa-d0140e876a4b, portal: 10.6.0.18,3260]

real 0m0.764s
user 0m0.006s
sys 0m0.114s
***@kvm6:~# time /usr/bin/iscsiadm --mode session -r 1 -R
Rescanning session [sid: 1, target: iqn.1986-03.com.sun:02:316dd6a9-76bc-62ea-93fa-d0140e876a4b, portal: 10.6.0.18,3260]

real 0m0.753s
user 0m0.005s
sys 0m0.077s
***@kvm6:~# time /usr/bin/iscsiadm --mode session -r 1 -R
Rescanning session [sid: 1, target: iqn.1986-03.com.sun:02:316dd6a9-76bc-62ea-93fa-d0140e876a4b, portal: 10.6.0.18,3260]

real 0m0.734s
user 0m0.005s
sys 0m0.094s


Thanks !
Alexandre
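
The timeout idea discussed in this message can be sketched as follows. This is only an illustration in Python, not the Perl run_command() from PVE; the function name run_with_timeout and its return value are assumptions made for the example. The 5-second default is the value proposed above.

```python
import subprocess


def run_with_timeout(cmd, timeout=5.0):
    """Run cmd and return (finished, returncode).

    Hypothetical helper: if the command does not finish within
    `timeout` seconds, subprocess.run() kills the child and raises
    TimeoutExpired, which is reported here as (False, None).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return (False, None)
    return (True, proc.returncode)


if __name__ == "__main__":
    # The rescan command quoted in this thread, limited to 5 seconds.
    print(run_with_timeout(
        ["/usr/bin/iscsiadm", "--mode", "session", "-r", "1", "-R"],
        timeout=5.0,
    ))
```

One caveat with any such wrapper: a process blocked in uninterruptible I/O may not exit immediately when killed, so the caller can still wait longer than the nominal limit.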



Dietmar Maurer
2012-05-14 05:00:26 UTC
Do you want me to implement that, or will you send a patch?
Alexandre DERUMIER
2012-05-15 07:03:23 UTC
Hi Dietmar,

I have made some progress on this: we can use sysfs (echo "- - -" > /sys/class/scsi_host/hostX/scan) to do the same thing asynchronously, and udevadm settle to check whether the resulting disk changes have been made.

What do you think about it?
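
A minimal sketch of that sysfs approach, written in Python for illustration rather than in PVE's Perl. The function names and the sysfs_root parameter (which allows trying the code against a scratch directory) are assumptions; the scan path and the "- - -" wildcard string come from the message above, and `udevadm settle --timeout=` is the standard udev option. The message describes the scan as asynchronous; the sketch does not rely on that either way.

```python
import subprocess
from pathlib import Path


def request_host_scan(host_number, sysfs_root="/sys"):
    """Ask the kernel to scan every channel, target and LUN on one SCSI host.

    Equivalent to: echo "- - -" > /sys/class/scsi_host/hostX/scan
    Returns the path that was written to.
    """
    scan_file = (
        Path(sysfs_root) / "class" / "scsi_host" / f"host{host_number}" / "scan"
    )
    scan_file.write_text("- - -\n")
    return scan_file


def wait_for_udev(timeout=5):
    """Wait until the udev event queue is empty, or until `timeout` seconds.

    Returns True if udevadm reports that the queue settled in time.
    """
    result = subprocess.run(["udevadm", "settle", f"--timeout={timeout}"])
    return result.returncode == 0
```

On a real host this would be used as `request_host_scan(3)` followed by `wait_for_udev()`, where the host number 3 is only an example; finding the host that belongs to a given iSCSI session is left out here.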






I'm also working on adding/removing iSCSI disks at VM start/stop.

Basically:

activate storage: log in to the target, run udevadm settle to wait for udev to add the devices, delete all devices from the target (echo to sysfs), run udevadm settle again
(there is no other way: at login, device scanning is hardcoded in iscsid and udev adds the devices)

activate volume: a simple echo to sysfs to add the disk (on each path)

deactivate volume: a simple echo to sysfs to delete the disk (on each path)

deactivate storage: log out of the target

These things work fine.
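
The message does not show the exact sysfs writes for the activate/deactivate volume steps. The following Python sketch assumes the usual Linux SCSI sysfs files: writing "channel target lun" to a host's scan file to add one device, and writing 1 to a device's delete file to remove it. The function names, the sysfs_root parameter and the example numbers are hypothetical, and the calls would have to be repeated for each path.

```python
from pathlib import Path


def add_lun(host, channel, target, lun, sysfs_root="/sys"):
    """Scan exactly one channel/target/LUN triple on the given SCSI host.

    Equivalent to: echo "C T L" > /sys/class/scsi_host/hostH/scan
    """
    scan_file = Path(sysfs_root) / "class" / "scsi_host" / f"host{host}" / "scan"
    scan_file.write_text(f"{channel} {target} {lun}\n")


def remove_lun(host, channel, target, lun, sysfs_root="/sys"):
    """Detach one SCSI device, addressed as host:channel:target:lun.

    Equivalent to: echo 1 > /sys/class/scsi_device/H:C:T:L/device/delete
    """
    hctl = f"{host}:{channel}:{target}:{lun}"
    delete_file = (
        Path(sysfs_root) / "class" / "scsi_device" / hctl / "device" / "delete"
    )
    delete_file.write_text("1\n")
```

For a volume on LUN 7 reachable through hosts 4 and 5, activation would call `add_lun(4, 0, 0, 7)` and `add_lun(5, 0, 0, 7)`, and deactivation the matching `remove_lun` calls. Flushing I/O and removing the multipath map before deleting a device are not covered by this sketch.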

The only annoying thing is iSCSI LUN listing, as we need to rescan the target, which re-adds all devices.

So I was thinking of adding this in a separate new storage plugin,
maybe by simply specifying the LUN ID for a volume, without needing to rescan.

I also use this method with my Nexenta SAN, and it works fine too (no need for 1 target per LUN).




The main advantage with many LUNs is multipath: it is a lot faster. For example, with 300 LUNs and 2 paths, I get 600 devices to monitor on each host, and multipathd uses a lot of CPU.
Also, the system can become unresponsive in case of iSCSI problems with so many devices hanging.
And finally, removing SCSI LUNs is really a pain at the moment: if you remove a LUN on your SAN, you need to remove it manually on each host, on each path. With this method, the device is removed at VM stop :)


What do you think about it?

Dietmar Maurer
2012-05-16 04:59:29 UTC
Hi Alexandre,
Post by Alexandre DERUMIER
The main advantage with many LUNs is multipath: it is a lot faster. For
example, with 300 LUNs and 2 paths, I get 600 devices to monitor on each host, and
multipathd uses a lot of CPU.
Also, the system can become unresponsive in case of iSCSI problems with so many devices hanging.
And finally, removing SCSI LUNs is really a pain at the moment: if you remove
a LUN on your SAN, you need to remove it manually on each host, on each
path. With this method, the device is removed at VM stop :)
What do you think about it?
This is a bit difficult to understand, because I do not have that problem. The suggested way
is to use LVM on top of iSCSI, and that avoids the whole problem AFAIK.

Anyway, the whole thing looks like a workaround, and it is maybe easier to fix the cause of the problem (iscsid)?
From what I see you want:

- do not create any devices at iSCSI login
- do not create devices when scanning for LUNs (any other way to list LUNs?)
- activate/deactivate LUNs selectively

Because we want to activate/deactivate devices selectively. Have you already asked the iSCSI developers about their opinion?

- Dietmar
Alexandre DERUMIER
2012-05-16 10:05:39 UTC
Hi Dietmar,
Post by Dietmar Maurer
This is a bit difficult to understand, because I do not have that problem. The suggested way
is to use LVM on top of iSCSI, and that avoids the whole problem AFAIK.
I would like to use LVM, but it is not possible for my use case.
I use clones/snapshots/remote replication/LUN sharing on my SAN (each clone = 1 LUN).
Post by Dietmar Maurer
Anyway, the whole thing looks like a workaround, and it is maybe easier to fix the cause of the problem (iscsid)?
- do not create any devices at iSCSI login
- do not create devices when scanning for LUNs (any other way to list LUNs?)
- activate/deactivate LUNs selectively
Exactly!

(Because it's a big change, I wanted to do it in a new separate plugin so as not to break the current model, especially for users with a large number of LUNs.)
Post by Dietmar Maurer
Because we want to activate/deactivate devices selectively. Have you already asked the iSCSI developers about their opinion?
Not yet. But adding/removing devices is supported by Red Hat in a clean way.

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/removing_devices.html
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/removing_path-to-storage-device.html
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/adding_storage-device-or-path.html

-Alexandre

Dietmar Maurer
2012-05-16 10:18:24 UTC
Post by Alexandre DERUMIER
I would like to use LVM, but it is not possible for my use case.
I use clones/snapshots/remote replication/LUN sharing on my SAN (each clone = 1 LUN).
Sure, I know.
Post by Dietmar Maurer
Anyway, the whole thing looks like a workaround, and it is maybe easier
to fix the cause of the problem (iscsid)?
- do not create any devices at iSCSI login
- do not create devices when scanning for LUNs (any other way to list LUNs?)
- activate/deactivate LUNs selectively
Post by Alexandre DERUMIER
Exactly!
Basically, we do exactly that with storage type LVM (only activate LVs when needed).
Post by Alexandre DERUMIER
(Because it's a big change, I wanted to do it in a new separate plugin so as not to
break the current model, especially for users with a large number of LUNs.)
I am still working on tha
