Our research group conducts experimental and theoretical studies of data layout organizations
and algorithms that can provide suitable run-time performance and reliability for specific data
storage applications and technology environments. The laboratory is directed by Professor
Walter A. Burkhard.
Computation often entails massive heterogeneous data sets involving text, audio, and video
streams. Storage, access, and analysis of such large data sets will be an ongoing challenge due to
continuing increases in user demands as well as continuing changes in technology. Information
production is accelerating at a breakneck speed; by the year 2000, information will double every three
years. Information storage facilities are becoming less expensive and more varied; larger primary,
secondary, and now tertiary storage domains are feasible. Currently approximately 95% of stored
information is recorded on paper, another 4% on microfiche, and only 1% is on-line. Cost versus
performance tradeoff studies will continue to be of interest as a larger fraction of the stored
information is placed on-line.
Traditional approaches to data storage will not suffice in this expanding arena; new storage
technologies, new access methods, new operating systems, and new data models will be necessary.
Application-specific approaches are driving both research and development.
Beyond the usual performance issues, the preservation of data in spite of eventual component
failures is another key issue. As the size of the data sets increases, data loss can become an
unavoidable reality. Data replication provides the traditional remedy, but it stores complete extra
copies, so less expensive data storage organizations are required. The information dispersal
algorithm is one such approach. These data reliability approaches have negative run-time
performance consequences, giving rise to additional tradeoff studies.
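To make the storage tradeoff concrete, the following is a minimal sketch of a dispersal-style scheme in which data is split into n fragments and any m of them suffice to rebuild it. It is a Reed-Solomon-style illustration in the spirit of information dispersal, not necessarily the formulation used in our work. The names `P`, `_interpolate`, `disperse`, and `recover` are hypothetical, and the sketch assumes n is below 257 and stores fragment symbols as plain integers, since a parity symbol can take the value 256.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through the
    points (xs[i], ys[i]), with all arithmetic done modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def disperse(data, m, n):
    """Split data into n fragments (keyed 1..n); any m of them recover it.

    Fragments 1..m hold the data bytes themselves; fragments m+1..n hold
    redundant symbols computed by polynomial interpolation.
    """
    padded = list(data) + [0] * (-len(data) % m)  # zero-pad to a multiple of m
    xs = list(range(1, m + 1))
    fragments = {x: [] for x in range(1, n + 1)}
    for k in range(0, len(padded), m):
        chunk = padded[k:k + m]
        for x in range(1, n + 1):
            symbol = chunk[x - 1] if x <= m else _interpolate(xs, chunk, x)
            fragments[x].append(symbol)
    return fragments


def recover(fragments, m, length):
    """Rebuild the original bytes from any m surviving fragments."""
    xs = sorted(fragments)[:m]
    out = []
    for k in range(len(fragments[xs[0]])):
        ys = [fragments[x][k] for x in xs]
        out.extend(_interpolate(xs, ys, x) for x in range(1, m + 1))
    return bytes(out[:length])
```

With m = 3 and n = 5, the fragments together occupy 5/3 of the original size and tolerate the loss of any two fragments; replication would need three full copies for the same tolerance. The interpolation work on every degraded read is one example of the run-time cost mentioned above.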
One storage application of current interest is MPEG-encoded multimedia data. We have created
data layouts that exploit the regular, real-time nature of these streams and thereby
use disk storage efficiently; the resulting servers are efficient under both fault-free and
degraded modes of operation.
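As a purely illustrative sketch of what fault-free and degraded operation mean for a striped stream, the code below places consecutive stream blocks round-robin across disks with one rotating parity block per stripe. This is a generic textbook layout assumed for illustration, not the layouts described above; `place_block` and `disks_to_read` are hypothetical names.

```python
def place_block(k, num_disks):
    """Map logical stream block k to (stripe, disk, parity_disk).

    Each stripe holds num_disks - 1 data blocks plus one parity block,
    and the parity block rotates to a different disk on every stripe.
    """
    data_per_stripe = num_disks - 1
    stripe, offset = divmod(k, data_per_stripe)
    parity_disk = stripe % num_disks
    # Data blocks fill the stripe in order, skipping over the parity disk.
    disk = offset if offset < parity_disk else offset + 1
    return stripe, disk, parity_disk


def disks_to_read(k, num_disks, failed=None):
    """Return the disks that must be read to deliver block k.

    Fault-free: only the disk holding the block. Degraded (the block's
    disk has failed): every surviving disk in the stripe, so the block
    can be rebuilt from the remaining data and parity.
    """
    _, disk, _ = place_block(k, num_disks)
    if disk != failed:
        return [disk]
    return [d for d in range(num_disks) if d != failed]
```

In this simple layout a degraded read touches every surviving disk, which is the kind of extra load a real-time server has to budget for when a stream must keep playing through a disk failure.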