ES1640DC v2 Failover Speed

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

ES1640DC v2 Failover Speed

Post by smccloud » Wed Dec 04, 2019 7:48 am

So I'm running an ES1640DC v2 as storage for a vSphere cluster, and while I'm sure there are faster options out there, it's fast enough for us in most respects. However, when I pitched it to management, the premise was that thanks to its dual controllers we'd be able to seamlessly fail over to the inactive controller for maintenance and in an emergency. In practice, however, a failover of the controllers for any reason takes our entire production environment down for 15 minutes or so, as the hosts don't recognize the NFS exports once the failover is complete. Is this a known issue, or do I just have something configured wrong?

It is set up with 16 3TB drives in RAID 10, currently with a single NFS export as well as some iSCSI exports for a planned SQL Active/Passive HA migration.

We are still in better shape than our old setup with local storage only, but I feel like I'm missing something. Does anyone have any suggestions, or is this just the way it is?
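(For anyone hitting the same symptom: one thing worth checking after a failover is whether the ESXi hosts still consider the NFS datastore mounted and accessible, and remounting it by hand if not. A rough sketch; the hostname, share path, and datastore label below are placeholders for your own values, not anything from this setup:)

```shell
# List currently mounted NFS datastores and whether ESXi thinks they are accessible
esxcli storage nfs list

# If the datastore shows as inaccessible after the failover, remove and re-add it
# (label, host, and share below are placeholders for your environment)
esxcli storage nfs remove -v my_nfs_datastore
esxcli storage nfs add -H 10.10.10.50 -s /share/nfs_export -v my_nfs_datastore
```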

zmaho
Starting out
Posts: 40
Joined: Wed Mar 13, 2019 2:43 am

Re: ES1640DC v2 Failover Speed

Post by zmaho » Wed Dec 04, 2019 8:58 am

How did you connect your setup to vSphere?
Is iSCSI set up with multipath?

Could someone correct me if I'm wrong, but NFS cannot do multipath...

If you have two controllers and they both have IP addresses, let's say
controller A has IP 10.10.10.50/24 and controller B has IP 10.10.20.50/24,
and you have TWO IPs set on your NICs in vSphere, like 10.10.10.10 and 10.10.20.10,
and you set up multipath in iSCSI, one path should stay operational if the other goes down?

Or did I misunderstand the DC series from QNAP? They cannot go from one controller to the other in zero time...
It is not FAULT TOLERANCE like in vSphere :") ...
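(For reference, the two-subnet iSCSI layout described above is typically wired up on the ESXi side by pointing the software iSCSI adapter at a portal on each controller, so each shows up as a separate path. A rough sketch; the adapter name and portal addresses are placeholders matching the example IPs above. Note that VMware's iSCSI port-binding feature is generally only recommended when initiator and target share a subnet, so with two subnets you may not use it at all:)

```shell
# Add a send-target discovery entry for a portal on each controller
# (vmhba64 is a placeholder for the software iSCSI adapter name)
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 10.10.10.50:3260
esxcli iscsi adapter discovery sendtarget add -A vmhba64 -a 10.10.20.50:3260

# Rescan so ESXi discovers the targets and builds one path per portal
esxcli storage core adapter rescan -A vmhba64
```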

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Wed Dec 04, 2019 9:11 pm

zmaho wrote:
Wed Dec 04, 2019 8:58 am
How did you connect your setup to vSphere?
Is iSCSI set up with multipath?

Could someone correct me if I'm wrong, but NFS cannot do multipath...

If you have two controllers and they both have IP addresses, let's say
controller A has IP 10.10.10.50/24 and controller B has IP 10.10.20.50/24,
and you have TWO IPs set on your NICs in vSphere, like 10.10.10.10 and 10.10.20.10,
and you set up multipath in iSCSI, one path should stay operational if the other goes down?

Or did I misunderstand the DC series from QNAP? They cannot go from one controller to the other in zero time...
It is not FAULT TOLERANCE like in vSphere :") ...
NFSv4 supports multipath. And it is working, to the currently active controller.

Per the product page:
The ES1640dc v2 is powered by two Intel® Xeon® E5-2400 v2 processors and features dual active-active controller architecture, ensuring businesses with nearly zero downtime high availability as the standby controller can quickly take over if one controller breaks down. The ES1640dc v2 connects to the JBOD enclosure (EJ1600 v2) via the dual path mini-SAS design to sustain continuous operations even if an external JBOD cable is disconnected. Designed around redundancy, the ES1640dc v2 is the best realization of reliable enterprise storage for uninterrupted mission-critical enterprise tasks and productivity.
I don't consider 15 minutes to be quick or nearly zero downtime. As it stands, if I need to upgrade the firmware on it, I need to do it off hours (and really on a weekend) after shutting down all the production VMs.

Do I expect every VM to keep running without issues during a controller failover? No, but I expect only minor hiccups, as that is what the product literature suggests.

If I need to reconfigure our setup to use iSCSI instead of NFS, I will. It's not what I'd like to do, but if that is the final solution, so be it.

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Wed Dec 04, 2019 10:13 pm

OK, I just created an iSCSI target & LUN, configured vSphere to connect to all 4 IPs configured on the storage networks of our ES1640dc v2, and it appears to be using both controllers for traffic. Although NFSv4 supports multipath, it appears the ES1640dc v2 doesn't support it the same way it supports multipath for iSCSI. I'll still have to try a failover outside of normal working hours, but for now I have some VM migrations to get done (and of course our license doesn't support live migration of running VMs unless it's both host & storage).
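(A quick way to check what ESXi itself thinks about those paths, rather than inferring from traffic graphs. A rough sketch; the device identifier below is a placeholder, not the actual LUN from this setup:)

```shell
# Show, per LUN, the multipathing plugin, path selection policy, and working paths
esxcli storage nmp device list

# List every path to one specific LUN; on an ALUA array, the "Group State"
# field distinguishes the optimized paths from the unoptimized ones
# (naa.6001405xxxxxxxxx is a placeholder device ID)
esxcli storage nmp path list -d naa.6001405xxxxxxxxx
```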

storageman
Ask me anything
Posts: 5511
Joined: Thu Sep 22, 2011 10:57 pm

Re: ES1640DC v2 Failover Speed

Post by storageman » Wed Dec 04, 2019 11:04 pm

smccloud wrote:
Wed Dec 04, 2019 10:13 pm
OK, I just created an iSCSI target & LUN, configured vSphere to connect to all 4 IPs configured on the storage networks of our ES1640dc v2, and it appears to be using both controllers for traffic. Although NFSv4 supports multipath, it appears the ES1640dc v2 doesn't support it the same way it supports multipath for iSCSI. I'll still have to try a failover outside of normal working hours, but for now I have some VM migrations to get done (and of course our license doesn't support live migration of running VMs unless it's both host & storage).
The ES1640DC is active/active ALUA, meaning a given LUN only runs on one controller at a time (bear with me).
Multipathing only works across one controller, not both controllers (assuming you connect more than 1 NIC on that single controller).
The reason you connect IPs from the other controller is to ensure failover.
If you want to use both controllers full time, you have to assign a separate storage pool to each controller.

Very few companies offer an active/active symmetrical NAS (meaning IO multipathing down both controllers).
Whereas in the SAN world, active/active symmetrical is quite common.

How have you proved IO is running down both controllers?
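(The ALUA behaviour described above is visible from the ESXi side: the SATP list shows which Storage Array Type Plugin claims the array, and with ALUA the two path groups report different states. A rough sketch:)

```shell
# Show the loaded Storage Array Type Plugins and their default path selection policies;
# ALUA arrays are typically claimed by VMW_SATP_ALUA
esxcli storage nmp satp list

# Paths through the owning controller report Group State "active", while paths
# through the partner controller report "active unoptimized" - ESXi only sends
# normal IO down the optimized group
esxcli storage nmp path list
```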

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Wed Dec 04, 2019 11:20 pm

storageman wrote:
Wed Dec 04, 2019 11:04 pm
smccloud wrote:
Wed Dec 04, 2019 10:13 pm
Ok, I just created an iSCSI Target & LUN, configured vSphere to connect to all 4 IPs configured on our storage networks of our ES1640dc v2 and it appears to be using both controllers for traffic. Although NFS4 supports multipath, it appears the ES1640dc v2 doesn't support it in the same way that it does for multipath on iSCSI. I'll still have to try a fail over outside of normal working hours but now I have some VM migrations to get done (and of course our license doesn't support live migration of running VMs unless its host & storage).
The ES1640DC is active/active ALUA, meaning a given LUN only runs on one controller at a time (bear with me).
Multipathing only works across one controller, not both controllers (assuming you connect more than 1 NIC on that single controller).
The reason you connect IPs from the other controller is to ensure failover.
If you want to use both controllers full time, you have to assign a separate storage pool to each controller.

Very few companies offer an active/active symmetrical NAS (meaning IO multipathing down both controllers).
Whereas in the SAN world, active/active symmetrical is quite common.

How have you proved IO is running down both controllers?
Opening System Status -> Resource Monitor -> Network Usage, I can see traffic on the following interfaces: Ethernet 1 (SCA), Ethernet 3 (SCA), Ethernet 1 (SCB) & Ethernet 3 (SCB). Also, logging into the GUI for my switches (MikroTik CRS309-1G-8S+IN), I can see traffic on the ports for both controllers. Although one controller is preferred over the other, they are both active.
Switch 1.png
Switch 2.png

storageman
Ask me anything
Posts: 5511
Joined: Thu Sep 22, 2011 10:57 pm

Re: ES1640DC v2 Failover Speed

Post by storageman » Wed Dec 04, 2019 11:32 pm

Yes, they may be active, but that doesn't mean much; SCB is only active for failover, not for read/write traffic.
Which switch ports should I look at for the SCB ports?
If you are achieving this I need to speak to Qnap because this would be news to me!

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Wed Dec 04, 2019 11:39 pm

They aren't named well in the switch (I should change them), but NASA is SCA and NASB is SCB. It appears to be sending more traffic to SCB even though SCA is the active controller.

storageman
Ask me anything
Posts: 5511
Joined: Thu Sep 22, 2011 10:57 pm

Re: ES1640DC v2 Failover Speed

Post by storageman » Wed Dec 04, 2019 11:44 pm

And do you have only one pool?

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Wed Dec 04, 2019 11:47 pm

storageman wrote:
Wed Dec 04, 2019 11:44 pm
And have you only one pool?
Yep, only one pool. Since we're using spinning rust, that was the decision made to get the most IOPS possible (it's also why we went with RAID 10).

storageman
Ask me anything
Posts: 5511
Joined: Thu Sep 22, 2011 10:57 pm

Re: ES1640DC v2 Failover Speed

Post by storageman » Wed Dec 04, 2019 11:56 pm

Hit it with a lot of traffic and report back.
If it were multipathing correctly, the traffic would be fairly equal across both controllers' ports, and it isn't.
I think one side will only be carrying "are you there" traffic.
What does Storage Space say for the controller assignment, e.g. "Pool1 SCA"?

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Thu Dec 05, 2019 12:01 am

Pool tank is managed by SCA (the name tank is a holdover from when I used FreeNAS & NAS4Free). Nothing on SCB.

So far our main two servers are connected via iSCSI, and I'm starting to migrate VMs. The third server isn't playing nice, but it still only has a single DAC in place, so it's not a huge deal (it stays on internal storage for now).

storageman
Ask me anything
Posts: 5511
Joined: Thu Sep 22, 2011 10:57 pm

Re: ES1640DC v2 Failover Speed

Post by storageman » Thu Dec 05, 2019 12:05 am

Then your naming must be wrong, surely? Most traffic should be on SCA.
If you've got it multipathing across both controllers, I'll eat my (virtual) hat.

The technical differences between the approaches are below:
ActiveActive ALUA.jpg
The QNAP ES1640DC/1686 are ALUA.
Last edited by storageman on Thu Dec 05, 2019 12:10 am, edited 1 time in total.

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Thu Dec 05, 2019 12:07 am

It's possible. But the native interface shows the same thing.

smccloud
Starting out
Posts: 21
Joined: Sat Aug 24, 2019 2:37 am

Re: ES1640DC v2 Failover Speed

Post by smccloud » Thu Dec 05, 2019 12:48 am

Well, I just figured out what I had wrong with our third server. It's connected to an IP on SCB, not SCA, but it's still working fine. From what I've read it shouldn't be working, but it is.
