"Data warehouses, and the data centers that house them, require an enormous amount of power, both to run legions of servers and to cool them... In 2007, the Environmental Protection Agency revealed that data centers account for 61 billion kilowatt-hours of electricity annually and cost $4.5 billion a year."
Not only is there an opportunity here to save money in data mining, there is also an opportunity to get on board with the recent "going green" trend that many consumers now expect businesses to follow. Data warehousing has changed little in recent years, and now seems to be the time for that change. With the rise of e-commerce and e-retailing, businesses have a greater demand than ever for data collection and analysis.
Open Source Software to the rescue!
"How does data warehousing address the challenges involved in 'going green'? The answer is a combination of new database technology designed for analysis of massive quantities of data, with open source software that leverages commodity low-cost, energy-efficient software and hardware. Together they reduce the need for expensive hardware infrastructure and the energy required to power it."
The new database technology in question is the column-oriented database. These databases enable efficient data compression because each column stores a single data type (as opposed to rows, which typically contain several data types). This allows compression to be optimized for each data type, significantly reducing the amount of storage the database needs. Column orientation also greatly accelerates query processing, which significantly increases the number of data warehouse transactions a server can process.
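To make the per-column idea concrete, here is a minimal sketch in Python. It is only an illustration: the table, the column names, and the choice of run-length and delta encoding are assumptions for the example, not a description of how any particular column-oriented product works.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def delta_encode(values):
    """Keep the first value, then only the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


# Hypothetical table of 12 orders, stored column by column.
country = ["DE"] * 4 + ["JP"] * 3 + ["US"] * 5   # repetitive text column
order_id = list(range(1000, 1012))               # steadily increasing integers

# Each column holds one data type, so each gets the encoding that suits it.
encoded_country = run_length_encode(country)     # 12 strings become 3 pairs
encoded_order_id = delta_encode(order_id)        # large ids become small deltas
```

In a row layout the same values would be interleaved (id, country, id, country, and so on), so neither encoding would find the long runs or the small differences it relies on.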
There are a variety of column-oriented solutions on the market. Some explode and duplicate the data and require as large a hardware footprint as traditional row-based systems. Others, however, combine the column basis with other technologies, eliminating the need for data duplication and massive hardware footprints. What this ultimately means is that users don't need as many servers or as much storage to analyze the same volume of data.
In fact, these column-oriented databases can achieve compression ranging from 10:1 (a 10 TB database becomes a 1 TB database) to more than 40:1, depending on the data. With this level of compression, a distributed server environment can be reduced by a factor of 20 to 50 and brought down to a single box, slashing heat, power consumption, and carbon emissions.
Open source products, specifically designed to serve a broad community of users, take this a step further because they do not require proprietary hardware or specialized appliances. This lets open source users rely on simple, lower-cost commodity servers and reduce their hardware footprint. Open source software such as Linux can also extend the life of hardware components by allowing older servers to be seamlessly integrated into a single virtual machine. This keeps older servers out of landfills and reduces the demand for new machines to be built.
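The compression ratios quoted above are easy to check with a little arithmetic. The sketch below assumes nothing beyond the 10 TB example and the 10:1 and 40:1 ratios from the text; the function name is made up for illustration.

```python
def compressed_size_tb(raw_tb, ratio):
    """Return the size in TB after compressing raw_tb at ratio:1."""
    return raw_tb / ratio


# The 10 TB database from the example, at the low and high ends of the range.
size_at_10_to_1 = compressed_size_tb(10, 10)   # 1.0 TB
size_at_40_to_1 = compressed_size_tb(10, 40)   # 0.25 TB
```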
Cost, image, and regulatory concerns are compelling more businesses to explore how they can make their operations and their IT infrastructure greener. At the same time, many of these same organizations are struggling to keep up with overwhelming data access and management requirements that are burning through energy and resources.
The combination of new database technology with open source applications solves both of these challenges.