Tuesday January 06, 2009

The Look of Next-Generation Decision Support Technology in High-Performance Database Appliances

For IT pros in financial services, telecommunications, corporate data services, and retail, keeping up with "the speed of business" is becoming tougher and more expensive. The information demands of marketing, risk management, fraud prevention, and financial modeling groups are often not being met by the existing computing infrastructure. Performance is falling short while IT budgets are being constrained.

At the same time, data complexity is rising rapidly and fueling wider, deeper analysis, which is essential to developing predictive models that enable preemptive business strategies. What next-generation decision support computing needs is standards-compliant systems that provide significantly more computing power per dollar. An attractive system would perhaps offer performance in the range of 1TB/min of sustained SQL processing, while using substantially less energy and offering a roadmap to next-generation technology.
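
To put the 1TB/min figure in perspective, the short Python sketch below converts it to a per-second rate and estimates the time for one full pass over a large table. The decimal terabyte (10^12 bytes) and the 100TB table size are assumptions chosen for illustration, not figures from any particular system.

```python
TB = 10**12  # bytes in a decimal terabyte (assumption; vendors vary)
GB = 10**9   # bytes in a decimal gigabyte


def sustained_rate_gb_per_s(tb_per_min: float) -> float:
    """Convert a sustained scan rate from TB/min to GB/s."""
    return tb_per_min * TB / 60 / GB


def full_scan_minutes(table_tb: float, tb_per_min: float) -> float:
    """Minutes needed to read a table of table_tb terabytes once."""
    return table_tb / tb_per_min


print(f"1 TB/min is about {sustained_rate_gb_per_s(1.0):.1f} GB/s")
# Hypothetical 100 TB table, scanned once at 1 TB/min.
print(f"Full scan of 100 TB: {full_scan_minutes(100, 1.0):.0f} minutes")
```

In other words, a system in this range would have to sustain well over ten gigabytes per second of SQL processing, end to end, for the length of the query.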

For IT pros to effectively catch up with business intelligence processing requirements in the near future, they will require an entirely new category of standardized high-performance database appliances, running full-table scans and ad-hoc queries up to two orders of magnitude faster than anything currently available. This article explains why new system capabilities like these are needed to provide what's becoming essential in business information and what they might look like.

What is the definition of a high-performance database appliance?

First, a definition of what such an appliance is not. It is not stand-alone software repackaged onto a hardware configuration that one can also buy separately. In short, if I can buy and build it myself, it is not an appliance. Second, “appliance” implies ease of use and plug-and-play. It can have a “tuning” knob or two, but the true “appliance experience” should be “ready to use out of the box”. You shouldn’t need an army of consultants to get it up and running.

Why full-table scans and ad-hoc features in a high-performance appliance?

Every day, thousands of business analysis and marketing executives go to work, get their coffee, and sit down to a wall of dashboards, reports, and pre-canned business intelligence results that give them the vision they need to run their businesses. These reports act as a rear-view mirror on business operations and are essential to keeping things running smoothly. However, they provide little value for non-operational, long-term strategic planning. To define a corporation’s competitive strategy, analysts need to turn to ad-hoc custom queries that help them discover new data points leading to actionable strategies.

“If your task is to find all the needles in a haystack -- you cannot do it by taking small samples of the haystack -- you have to sift through the entire stack!” Quote courtesy of Data Mining & BI researcher, Prof. Alok Choudhary, Electrical Engineering and Kellogg School of Management, Northwestern University.

Without the proper infrastructure, this task can be very expensive and slow, limiting your results and, therefore, your strategy going forward. This is unacceptable in tomorrow’s competitive business climate.

Why build a very large database (VLDB) analytics appliance?

There are two simple answers. First, data processing loads are severely straining data centers. Second, it is “time for a new architecture”. The past fifteen years show evidence of evolution, not revolution, and the resulting solutions have failed to solve these two important business challenges.

“Burning” issues in the data center:

“… IBM Fellow Bernard Meyerson told the crowd at the Hot Chips conference yesterday that he expects a power crisis of sorts to occur in the server market come 2007. That's when the overall cost of powering and cooling all the servers in the US will outpace the amount of money spent on new server sales.…,” The Register, 23 Aug 2006

“ … more than 80% of data centers are already constrained by electrical power, physical space, or cooling capacity.  Simply adding more of the same kinds of systems is clearly no solution, …,” Sun Whitepaper on Throughput Computing, Nov 2005.

Moore’s Law vs. “Greg’s Law”:

There is a class of emerging “Redshift” applications where the growth rate of data processing volumes outpaces the growth rate of CPU computing power (paraphrase of “Redshift” theory from Greg Papadopoulos, CTO of Sun Microsystems, 2007).

The rate of data volume growth is evident in metrics such as disk drive sales. IDC forecasts an annual growth rate of 60 percent in the number of terabytes of external disk capacity shipped. This is faster than the growth rate in CPU power (if measured in terms of transistor density) predicted by Moore’s Law, roughly 42 percent annually. 
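
The gap between these two growth rates compounds year over year. The Python sketch below uses only the 60 percent and 42 percent figures quoted above to show how quickly data volume pulls ahead of CPU power; the function names are illustrative, and constant growth rates are an assumption.

```python
import math

DATA_GROWTH = 0.60  # annual growth in external disk terabytes shipped (IDC figure above)
CPU_GROWTH = 0.42   # annual growth in CPU power implied by Moore's Law (figure above)


def gap_after(years: float) -> float:
    """Ratio of data volume to CPU power after `years`, both starting at 1."""
    return ((1 + DATA_GROWTH) / (1 + CPU_GROWTH)) ** years


def years_to_double_gap() -> float:
    """Years until data volume has grown twice as much as CPU power."""
    return math.log(2) / math.log((1 + DATA_GROWTH) / (1 + CPU_GROWTH))


for n in (1, 5, 10):
    print(f"after {n:2d} years: data/CPU ratio = {gap_after(n):.2f}x")
print(f"gap doubles every {years_to_double_gap():.1f} years")
```

Under these figures, data volume outgrows CPU power by roughly 13 percent a year, so the shortfall doubles in a little under six years if nothing else changes.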

Michael Stonebraker, a well-known expert in the database field, recently published several papers on the state of the art.  The title of his September 2007 paper at the VLDB ’07 conference is itself telling: “The End of an Architectural Era (It’s Time for a Complete Rewrite)” – see below.

“…These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text and scientific database markets,” Stonebraker, et al, VLDB ’07, September 23-28 2007, Vienna, Austria.  Copyright 2007 VLDB Endowment, ACM 978-1-59593-649-3/07/09.

How do I lower power?

It is generally accepted in the High Performance Computing (HPC) market that horizontal scaling is no longer the answer to increasing performance, mainly because of the power and cooling demands of thousands of CPUs working in parallel. Today, many of the largest supercomputers take advantage of custom accelerators, such as FPGAs, GPGPUs, and the Cell processor, in one way or another. Can the IT and decision support systems (DSS) markets learn any lessons from the HPC market? The answer is a resounding YES, but only if accelerators can be abstracted away from the IT user, who neither knows nor cares to know how to use them. If a ready-to-use system can provide acceleration and lower power “under the hood”, then the benefits can be extraordinary. Lower power, superior performance, and true scalability can all become real by finding the appliance that includes the proper mix of storage, CPU, software, and acceleration.

Approaching system re-architecture?

The time is ripe for re-thinking system architectures. The key question is how to approach the problem and converge on the “right” solution. That solution should be a flexible, modularly-scalable system architecture that can ride the technology roadmap curves and provide a sustainable price/performance advantage over legacy solutions well into the future.

One of today’s trends is a move away from “big iron”, meaning tightly-coupled, proprietary Symmetric Multiprocessing (SMP) Unix boxes, towards “commodity” x86, loosely-coupled Massively Parallel Processing (MPP) Linux clusters. Another major industry trend is the “burning” importance of power consumption in data centers. With the constraints of power supply and cooling severely limiting scale-up, it is the right time to explore co-processors as CPU accelerators in markets outside of high performance computing.

The best computing choices available today are x86 CPUs coupled to Field Programmable Gate Array (FPGA)-based accelerators. FPGAs have several key benefits. They are very power efficient. They outperform CPUs on the performance/watt metric by two or three orders of magnitude. They follow the CPU’s semiconductor process technology very closely, typically lagging by no more than 6-12 months, so all the “Moore’s Law” benefits accrue to FPGAs. They are in-system re-configurable. Unlike fixed architecture accelerators, FPGAs can be instantly reloaded to optimally match application requirements.

It is generally accepted today that scaling up in a huge way can only be achieved by the loosely-coupled MPP approach. This scale-up philosophy, with the future insertion of FPGA “acceleration”, will be leveraged in all next-generation data warehouse architectures. The benefits of price/performance, performance/watt, “under the hood” acceleration, and ease of use will dominate the minds of database and appliance makers for years to come.

Target applications for the accelerated appliance

VLDB Analytics Appliances need to be designed with a sharp focus on accelerating the large, time-consuming tasks in the decision support systems (DSS) marketplace. Attempting broad acceleration of “all SQL” is futile: such a solution tends to converge on the lowest common denominator, a generic, broad, but lower-performance design, as seen in the so-called appliances currently available.

In the future, additional value will come solely from queries that include full-table scans with large multi-table joins, aggregations qualified with GROUP BY / ORDER BY clauses, and large sorts. These constructs, applied to tens to hundreds of terabytes of data, are needed to extract additional value and find the secrets that were in the haystack the whole time but inaccessible.
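
As an illustration of the query shape described above, the Python sketch below runs a join, an aggregation, a GROUP BY, and an ORDER BY over every row of a fact table using the standard-library sqlite3 module. The table names, columns, and sample rows are hypothetical, and SQLite stands in only to show the SQL; it says nothing about how an appliance would execute it.

```python
import sqlite3


def spend_by_region(conn: sqlite3.Connection) -> list:
    """Touch every transaction row: join, aggregate, group, and sort."""
    query = """
        SELECT c.region, COUNT(*) AS n_txn, SUM(t.amount) AS total
        FROM transactions AS t
        JOIN customers AS c ON c.id = t.customer_id
        GROUP BY c.region
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


# Hypothetical toy schema and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE transactions (customer_id INTEGER, amount REAL);
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "east")],
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 10.0), (2, 50.0), (3, 5.0), (1, 2.5)],
)

rows = spend_by_region(conn)
for region, n_txn, total in rows:
    print(region, n_txn, total)
```

Because the query has no selective WHERE clause, an index cannot narrow the work: every row of the fact table has to be read, which is exactly the kind of load that grows with the size of the data.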

Systems like these will add maximum value to applications involving large database tables where substantially all of the data needs to be touched. Many commercial examples of such applications can be readily identified and include:

  • Predictive modeling for database marketing campaigns
  • Fraud detection in financial services industries
  • Customer risk profiling and profitability analysis in financial and telecom services industries
  • Logistics and inventory management in retail industries
  • Graph and page rank algorithms

It is an exciting time in the data warehousing market. As more and more solutions become available that leverage acceleration technologies “under the hood”, it will get even more exciting. Lower power, increased performance, and greater ease of use will be the keys that lead the Accelerated VLDB Analytics Appliance revolution. Their greatly increased ability to extract value for businesses can no longer be overlooked. So set your coffee down and get ready to “Discover the Undiscovered” – the future is closer than you may think.

Geno Valente is vice president at XtremeData Inc. in Schaumburg, IL. He has spent over 13 years helping support, sell, and market computing technology into markets including the financial services, bioinformatics, high performance computing, and WiMax/LTE sectors.

 

 
