Tiering: Scale Up? Scale Out? Do Both


By Mark Ferelli

Editor’s Note: It’s hard to go anywhere in the trade press without running into the challenge of providing more resources per server, in both virtual and non-virtual enterprise environments. Computer Technology Review had the good fortune to connect with thought leader and storage visionary Hu Yoshida, CTO for Hitachi Data Systems. Our discussion follows.

MF/CTR: We’ll look at tiering issues in the data center, if you don’t mind. When we’re actually implementing tiering, there seem to be two sensible ways to go. You can scale out to multiple nodes in the tier, or you can scale up by adding more elements and functionality to a single node within the tier. What makes more sense in the data center, from both an operational and a budgetary point of view?

Hu/HDS: You can do both in one system. That’s what we do in our Universal Storage Platform. Normally when we talk about scaling up or scaling out, we’re talking about the ability to provide more resources to one server’s demand. In other words, scale-up storage hardware is primarily what we call monolithic storage. It has a global cache, and you can add more cache modules to it, you can add more core processors and you can add back-end processors, but they can all be accessed as one pool of resources.

So as a VMware cluster, for instance, starts to spin up more and more virtual machines, they all come to one file system, but that file system can access more and more resources in the scale-up architecture.

In the scale-out architecture you generally have two-controller-node modular systems that you loosely couple over some sort of switch. In the case of EMC it’s a RapidIO switch; with IBM’s XIV it’s an Ethernet switch, and with the V-Series it’s an Ethernet switch.

The problem with scale-out is that as you add more load, you’re coming into one node. You can’t share the resources in the other nodes. So one node can be 10 percent busy and another node can be 100 percent busy, and you can’t cross them, because those file systems come through one cache and just get the resources behind that cache module. So that’s the difference between a scale-out and a scale-up architecture in storage.
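
Editor’s aside: the contrast Yoshida draws can be pictured with a toy model. The class names and capacities below are hypothetical, not HDS code; the only point is that a scale-up pool lets any workload draw on all of the cache, while a scale-out cluster pins each workload to the one node that owns its file system.

    # Toy model of scale-up vs. scale-out resource pooling (illustrative only).
    class ScaleUpArray:
        """One tightly coupled system: all cache modules form a single global pool."""
        def __init__(self, cache_modules_gb):
            self.capacity = sum(cache_modules_gb)
            self.used = 0
        def serve(self, load_gb):
            # Any workload can consume any free resource in the shared pool.
            granted = min(load_gb, self.capacity - self.used)
            self.used += granted
            return granted

    class ScaleOutCluster:
        """Loosely coupled nodes: a workload sees only the node that owns its file system."""
        def __init__(self, cache_modules_gb):
            self.nodes = [{"capacity": c, "used": 0} for c in cache_modules_gb]
        def serve(self, load_gb, node_id):
            node = self.nodes[node_id]          # pinned to one node's cache
            granted = min(load_gb, node["capacity"] - node["used"])
            node["used"] += granted
            return granted

    scale_up = ScaleUpArray([256, 256, 256, 256])
    scale_out = ScaleOutCluster([256, 256, 256, 256])
    # One hot VMware datastore with a 700 GB active working set:
    print(scale_up.serve(700))       # 700 -> the whole global pool absorbs it
    print(scale_out.serve(700, 0))   # 256 -> node 0 saturates while nodes 1-3 sit idle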

Tiering is another element, where you can have different cost tiers of storage. They can be in the same storage frame, so in a sense the processing power can access them, but you can also distribute the tiering behind a virtualization front end. So if you have a virtualization front end like the USP, you can have the primary storage in the USP and then the lower-cost storage on an external asset that you attach behind it. But with this approach, with the USP, you still have the very high-performance consolidation on the front end. Is that clear?
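
Editor’s aside: a rough sketch of the arrangement Yoshida describes, with hypothetical names rather than actual HDS interfaces. The virtualization front end presents one pool to the hosts and decides whether each virtual volume is backed by its internal, high-performance storage or by lower-cost external storage attached behind it.

    # Sketch of a virtualization front end mapping volumes to tiers (illustrative only).
    class VirtualizedFrontEnd:
        def __init__(self, internal_gb, external_gb):
            self.free = {"internal": internal_gb, "external": external_gb}
            self.backing = {}                        # virtual volume -> backing pool
        def create_volume(self, name, size_gb, tier="internal"):
            if self.free[tier] < size_gb:
                raise RuntimeError(f"no space left in the {tier} pool")
            self.free[tier] -= size_gb
            self.backing[name] = tier                # host addressing never changes

    fe = VirtualizedFrontEnd(internal_gb=50_000, external_gb=200_000)
    fe.create_volume("vmfs-datastore-01", 8_000)                     # primary, internal
    fe.create_volume("backup-archive-01", 40_000, tier="external")   # lower-cost, external
    print(fe.backing)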

MF/CTR: Yes, it certainly is. I’m just thinking that when you’re setting up the tiering with both scale-up and scale-out, you’ve got an unequal distribution of workloads. You’re consolidating your server workload, but you need to consolidate it somewhere so that the 10 percent utilization rises once you get past the choke point of the cache. What are the best ways to optimize those server workloads? In your system you’re pooling data across all the tiers, right?

Hu/HDS: Yes.

MF/CTR: So that addresses the scale-up immediately… is it also going to meet the workload consolidation for the servers you’re going to need?

Hu/HDS: Oh, yes. Right now the biggest consolidation play is with the virtual servers, with the VMware clustering. All those physical servers that support maybe 100 virtual machines are all coming down to one file system, one VM file system. And all the virtual disks from those machines are just files within that file system. So that one file system right now comes through some storage boards to one cache image, so it’s in one storage system. You may have another storage node sitting next to it, but if it is not tightly coupled through a global cache, you can’t access it.

In other words, if you’re loosely coupling across some external switch, that doesn’t help you because that cache is where the choke point is. The only way to break that choke point is to have a monolithic architecture where you can tightly couple through that same cache and access all the resources attached behind those caches.

MF/CTR: So a global cache is required to implement that common pool you’re talking about, going across all the tiers.

Hu/HDS: Right. That is what’s needed to get to all the resources. Now with our system, we can also attach external storage behind it for capacity, so that as the data ages out or becomes less active we can move it to the external storage. That external storage would certainly have less performance capability, but you don’t need it for the stale data that we’re aging down. Right now we’re the only vendor that can provide this type of tiering.
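
Editor’s aside: the age-out Yoshida mentions might look roughly like the policy below. The 90-day threshold and the function name are hypothetical, not the actual HDS mechanism; the idea is simply that volumes that have gone cold are demoted to the external tier while hosts keep addressing them the same way.

    # Hypothetical age-out policy: demote volumes untouched for 90 days (illustrative only).
    import time

    STALE_AFTER = 90 * 24 * 3600                  # seconds of inactivity before demotion

    def age_out(volumes, now=None):
        """volumes: dict of name -> {'tier': str, 'last_access': epoch seconds}."""
        now = now or time.time()
        for meta in volumes.values():
            if meta["tier"] == "internal" and now - meta["last_access"] > STALE_AFTER:
                meta["tier"] = "external"         # stale data no longer needs tier-1 speed
        return volumes

    vols = {
        "oltp-db":  {"tier": "internal", "last_access": time.time()},
        "q1-close": {"tier": "internal", "last_access": time.time() - 120 * 24 * 3600},
    }
    print(age_out(vols)["q1-close"]["tier"])      # external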

If you look at other people who do virtualization, like IBM with the SVC, the SVC has no storage within it; it’s an appliance sitting in the data path, so if you attach any storage behind it, there is some performance degradation because you have this appliance sitting in front. That appliance is also very limited in cache and very limited in the number of storage boards on it. It cannot really provide you more performance than what is attached behind it. And in fact, it will always degrade what is attached behind it because it’s not storage, whereas our USP is storage: it has a global cache, it has thousands of port connections, load balancing and all that. So our front end can enhance existing storage that sits behind it.

MF/CTR: Which common file system makes sense?

Hu/HDS: The biggest driver of consolidation is the virtual machine file system, VMFS, that VMware uses, because it supports all these clusters. On top of that, it’s doing things like moving applications across these physical servers. It’s now supporting virtual desktops, where you get these boot storms at eight in the morning when everybody signs in. It’s doing a lot of copies and moves, and when you do all that, that file system becomes a choke point. If one virtual machine out of 100 needs to do some sort of write to that file system, it has to reserve it, and all the other virtual machines are locked out.
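
Editor’s aside: the contention Yoshida describes can be modeled crudely as a single lock guarding the whole shared file system. The sketch below is a simplification; VMFS actually uses SCSI reservations on the underlying LUN rather than an in-memory lock, but the serialization effect is the same.

    # Crude model of whole-datastore locking during metadata updates (illustrative only).
    import threading, time

    datastore_lock = threading.Lock()     # one lock guards the entire shared file system

    def vm_metadata_update(vm_name):
        # e.g. growing a virtual disk, taking a snapshot, or powering a VM on
        with datastore_lock:              # every other VM on the datastore waits here
            time.sleep(0.001)             # stand-in for the metadata update itself

    # A "boot storm": 100 virtual machines all need metadata writes at once,
    # but they serialize behind the single file-system lock.
    vms = [threading.Thread(target=vm_metadata_update, args=(f"vm-{i}",)) for i in range(100)]
    for t in vms: t.start()
    for t in vms: t.join()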

MF/CTR: No, I think you’re right. For writing operations you need to have at least one of those servers in contact with reality rather than being out in that virtual cloud.

Hu/HDS: One of the things VMware is doing is recognizing that’s a problem… so they’re providing APIs to the storage vendors. Instead of locking the whole file system, we only get an extent that they identify for us. So all those virtual machines can work together.
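
Editor’s aside: with the extent-granular locking those APIs enable, only virtual machines touching the same extent contend, which is why, as Yoshida notes next, the storage system behind the file system suddenly sees far more concurrent I/O. The sketch below is a hypothetical simplification, not VMware’s actual interface.

    # Sketch of extent-level locking: only the extent being modified is reserved.
    import threading, time
    from collections import defaultdict

    extent_locks = defaultdict(threading.Lock)   # one lock per extent, not per datastore

    def vm_metadata_update(vm_name, extent_id):
        with extent_locks[extent_id]:            # only VMs touching the same extent wait
            time.sleep(0.001)                    # stand-in for the metadata update

    # 100 VMs updating 100 different extents proceed in parallel.
    vms = [threading.Thread(target=vm_metadata_update, args=(f"vm-{i}", i)) for i in range(100)]
    for t in vms: t.start()
    for t in vms: t.join()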

What that means, though, is that it opens up the floodgates, and all of a sudden we have much more I/O coming to the storage system, and that storage system needs to be able to scale up or else it’ll just run out of gas at the choke point.

Click here to read Part 2 of this feature.

 