OCC Announces Tukey – An Integrated Set of Cloud Services for Working with Big Data
April 4, 2012
Today, the Open Cloud Consortium (OCC) announced the availability of Tukey, which is an innovative integrated set of cloud services designed specifically to enable scientific researchers to manage, analyze and make discoveries with big data.
Several public cloud service providers offer resources for individual scientists and small research groups, and large research groups can build their own dedicated infrastructure for big data. Currently, however, no cloud service provider focuses on serving projects that must work with big data but are not large enough to build their own dedicated clouds.
Tukey is the first set of integrated cloud services to fill this niche.
Tukey was developed by the Open Cloud Consortium, a not-for-profit multi-organizational partnership. Many scientific projects are more comfortable hosting their data with a not-for-profit organization than with a commercial cloud service provider.
Cloud Service Providers (CSPs) that are focused on meeting the needs of the research community are beginning to be called Science Cloud Service Providers or Sci CSPs (pronounced psi-sip). Cloud Service Providers serving the scientific community must support the long-term archiving of data, large data flows so that large datasets can be easily imported and exported, parallel processing frameworks for analyzing large datasets, and high-end computing.
“The Open Cloud Consortium is one of the first examples of an innovative resource that is being called a Science Cloud Service Provider or Sci CSP,” says Robert Grossman, Director of the Open Cloud Consortium. “Tukey makes it easy for scientific research projects to manage, analyze and share big data, something that is quite difficult to do with the services from commercial Cloud Service Providers.”
The beta version of Tukey is being used by several research projects, including: the Matsu Project, which hosts over two years of data from NASA’s EO-1 satellite; Bionimbus, which is a system for managing, analyzing, and sharing large genomic datasets; and bookworm, which is an application that extracts patterns from large collections of books.
The services include: hosting large public scientific datasets; standard installations of the open source OpenStack and Eucalyptus systems, which provide instant, on-demand computing infrastructure; standard installations of the open source Hadoop system, which is the most popular platform for processing big data; standard installations of UDT, which is a protocol for transporting large datasets; and a variety of domain-specific applications.
Tukey has a direct 10 Gbps connection to StarLight, an advanced national and international communications exchange facility, which in turn connects to dozens of high performance research networks around the nation and the globe. “Tukey enables scientists to share their big datasets with researchers around the country and the world,” says Joe Mambretti, Director, International Center for Advanced Internet Research (iCAIR) at Northwestern University.
About the Open Cloud Consortium
The Open Cloud Consortium (OCC) is a not-for-profit organization that manages and operates cloud computing infrastructure to support scientific, medical, health care, and environmental research. The Open Cloud Consortium is managed by the Center for Computational Science Research, Inc., which is an Illinois-based 501(c)(3) not-for-profit corporation. (http://www.opencloudconsortium.org)
Tukey is named after the American scientist John Wilder Tukey (1915 - 2000), who made a number of fundamental contributions to statistics. He helped popularize exploratory data analysis, which is an important technique when working with big data. He also introduced the term “bit.”
StarLight is the world's most advanced national and international communications exchange facility. StarLight provides advanced networking services and technologies that are optimized for high-performance, large-scale metro, regional, national and global applications, especially for data-intensive research science communities. (http://www.startap.net/starlight)