The White House Office of Science and Technology Policy (OSTP), in concert with several Federal departments and agencies, launched a $200-million “Big Data Research and Development Initiative” to greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.
The Big Data initiative is intended to: (1) advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; (2) harness these technologies to accelerate the pace of discovery in science and engineering; and (3) expand the workforce needed to develop and use Big Data technologies.
“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.” —Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy
The initiative responds to recommendations by the President’s Council of Advisors on Science and Technology, which last year concluded that the Federal Government is under-investing in technologies related to Big Data. In response, OSTP launched a Senior Steering Group on Big Data to coordinate and expand the Government’s investments in this critical area.
The first wave of agency commitments to support this initiative includes:
Department of Energy – Scientific Discovery Through Advanced Computing. The Department of Energy (DOE) will provide $25 million in funding to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute. Led by the Energy Department’s Lawrence Berkeley National Laboratory, the SDAV Institute will deploy, and assist scientists in using, technical solutions addressing challenges in three areas:
Data Management: infrastructure that captures the data models used in science codes, efficiently moves, indexes, and compresses this data, enables query of scientific datasets, and provides the underpinnings of in situ data analysis.
Data Analysis: application-driven, architecture-aware techniques for performing in situ data analysis, filtering, and reduction to optimize downstream I/O and prepare for in-depth post-processing analysis and visualization.
Data Visualization: exploratory visualization techniques that support understanding ensembles of results, methods of quantifying uncertainty, and identifying and understanding features in multi-scale, multi-physics datasets.
SDAV is a collaboration tapping the expertise of researchers at six national laboratories (Argonne, Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Sandia) and seven universities (Georgia Tech, North Carolina State, Northwestern, Ohio State, Rutgers, the University of California at Davis, and the University of Utah). Kitware, a company that develops and supports specialized visualization software, is also a partner in the project. The team will build on its successes from the SciDAC Scientific Data Management (SDM) Center for Enabling Technologies, the Visualization and Analytics Center for Enabling Technologies (VACET), and the Institute for Ultra-Scale Visualization (UltraVis) to provide the tools and knowledge required to achieve breakthrough science in this data-rich era.
US Geological Survey – Big Data for Earth System Science. USGS is announcing the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The Center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects will improve our understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.
National Science Foundation and the National Institutes of Health – Core Techniques and Technologies for Advancing Big Data Science & Engineering. “Big Data” is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible.
NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease.
National Science Foundation. In addition to funding the Big Data solicitation, and in keeping with its focus on basic research, NSF is implementing a comprehensive, long-term strategy that includes new methods to derive knowledge from data; infrastructure to manage, curate, and serve data to communities; and new approaches to education and workforce development.
Specifically, NSF is: (1) encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers; (2) funding a $100-million Expeditions in Computing project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information: machine learning, cloud computing, and crowdsourcing; (3) providing the first round of grants to support “EarthCube”, a system that will allow geoscientists to access, analyze, and share information about our planet; (4) issuing a $2-million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data; (5) providing $1.4 million in support for a focused research group of statisticians and biologists to determine protein structures and biological pathways; and (6) convening researchers across disciplines to determine how Big Data can transform teaching and learning.
Department of Defense – Data to Decisions. The Department of Defense (DoD) is “placing a big bet on big data,” investing approximately $250 million annually (with $60 million available for new research projects) across the Military Departments in a series of programs intended to: (1) harness and utilize massive data in new ways, bringing together sensing, perception, and decision support to create truly autonomous systems that can maneuver and make decisions on their own; and (2) improve situational awareness to help warfighters and analysts and provide increased support to operations. The Department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.
To accelerate innovation in Big Data that meets these and other requirements, DoD will announce a series of open prize competitions over the next several months. In addition, the Defense Advanced Research Projects Agency (DARPA) is beginning the XDATA program, which intends to invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include: (1) developing scalable algorithms for processing imperfect data in distributed data stores; and (2) creating effective human-computer interaction tools that facilitate rapidly customizable visual reasoning for diverse missions. The XDATA program will support open-source software toolkits to enable flexible software development, so that users can process large volumes of data on timelines commensurate with the mission workflows of targeted defense applications.
National Institutes of Health – 1000 Genomes Project Data Available on Cloud. The National Institutes of Health is announcing that the world’s largest set of data on human genetic variation—produced by the international 1000 Genomes Project—is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes, the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project data as a publicly available data set for free; researchers will pay only for the computing services that they use.
OSTP was created by Congress in 1976 to serve as a source of scientific and technological analysis and judgment for the President with respect to major policies, plans, and programs of the Federal Government.