<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="part2stratml.xsl"?>
<PerformancePlanOrReport xmlns="urn:ISO:std:iso:17469:tech:xsd:PerformancePlanOrReport" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

 xsi:schemaLocation="urn:ISO:std:iso:17469:tech:xsd:PerformancePlanOrReport http://stratml.us/references/PerformancePlanOrReport20160216.xsd" Type="Strategic_Plan"><Name>NIH STRATEGIC PLAN FOR DATA SCIENCE</Name><Description>This document, the NIH Strategic Plan for Data Science, describes NIH's Overarching Goals, Strategic
Objectives, and Implementation Tactics for modernizing the NIH-funded biomedical data-resource
ecosystem.</Description><OtherInformation>In establishing this plan, NIH addresses storing data efficiently and securely; making data
usable to as many people as possible (including researchers, institutions, and the public); developing a
research workforce poised to capitalize on advances in data science and information technology; and
setting policies for productive, efficient, secure, and ethical data use. As articulated herein, this strategic
plan commits to ensuring that all data-science activities and products supported by the agency adhere
to the FAIR principles, meaning that data be Findable, Accessible, Interoperable, and Reusable ...</OtherInformation><StrategicPlanCore><Organization><Name>National Institutes of Health</Name><Acronym>NIH</Acronym><Identifier>_8c9e7852-57cc-11df-8407-a5537a64ea2a</Identifier><Description/><Stakeholder StakeholderTypeType="Organization"><Name>NIH Science Data Council</Name><Description>The NIH Strategic Plan for Data Science was conceived by the NIH Science Data Council, with input from the NIH Data Science Policy Council, HHS, scientists, policymakers, scientific and professional societies, the general public, and IC and NIH leadership and staff.</Description></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>NIH Data Science Policy Council</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Organization"><Name>HHS</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Scientists</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Policymakers</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Scientific Societies</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Professional Societies</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>The General Public</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>NIH Leaders</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>NIH Staff</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>RFI Respondents</Name><Description>This process was augmented by analysis of comments from a public Request for Information about the draft Strategic Plan. 
Results of that analysis revealed that 822 unique comments were contributed from the following responder types:</Description></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Academic Institutions</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Advocacy Groups</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Government Agencies</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Health Professionals</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Members of the Public</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Patient Community</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Private Sector</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Professional Societies</Name><Description/></Stakeholder><Stakeholder StakeholderTypeType="Generic_Group"><Name>Scientific Research Organizations</Name><Description/></Stakeholder></Organization><Vision><Description>All data-science activities and products supported by the agency adhere to the FAIR principles</Description><Identifier>_1f7e95e2-2c1e-11ee-9e43-679b1183ea00</Identifier></Vision><Mission><Description>To modernize the NIH-funded biomedical data-resource ecosystem</Description><Identifier>_1f7e9808-2c1e-11ee-9e43-679b1183ea00</Identifier></Mission><Value><Name>Findability</Name><Description>To be Findable, data must have unique identifiers, effectively labeling it within searchable resources.</Description></Value><Value><Name>Accessibility</Name><Description>To be Accessible, data must be easily retrievable via open systems and effective and secure authentication and authorization procedures.</Description></Value><Value><Name>Interoperability</Name><Description>To be Interoperable, data should "use and speak the same 
language" via use of standardized vocabularies.</Description></Value><Value><Name>Reusability</Name><Description>To be Reusable, data must be adequately described to a new user, have clear information about data-usage licenses, and have a traceable "owner's manual," or provenance.</Description></Value><Value><Name>Adaptability</Name><Description>Recognizing the rapid course of evolution of data science and technology, this plan maps a general path for the next five years but is intended to be as nimble as possible to adjust to undiscovered concepts and products derived from current investments from NIH and elsewhere in the public and private sectors. Frequent course corrections are likely based upon the needs of NIH and its stakeholders and on new opportunities that arise because of the development of new technologies and platforms. </Description></Value><Goal><Name>Infrastructure</Name><Description>Support a Highly Efficient and Effective Biomedical Research Data Infrastructure</Description><Identifier>_1f7e995c-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>NIH ICs routinely support intramural and extramural research projects that generate tremendous amounts of biomedical data. Regardless of format, all types of data require hardware, architecture, and platforms to capture, organize, store, allow access to, and perform computations. As projects mature, data have traditionally been stored and made available to the broader community via public repositories or at data generators' or data aggregators' local institutions. This model has become strained as the number of data-intensive projects -- and the amount of data generated for each project -- continues to grow rapidly. 
</OtherInformation><Objective><Name>Data Storage &amp; Security</Name><Description>Optimize Data Storage and Security</Description><Identifier>_1f7e9b0a-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>1.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Large-scale cloud-computing platforms are shared environments for data storage, access, and computing. They rely on using distributed data-storage resources for accessibility and economy of scale -- similar conceptually to storage and distribution of utilities like electricity and water. Cloud environments thus have the potential to streamline NIH data use by allowing rapid and seamless access, as well as to improve efficiencies by minimizing infrastructure and maintenance costs. NIH will leverage what is available in the private sector, either through strategic partnerships or procurement, to create a workable Platform as a Service (PaaS) environment. Using unique approaches enabled by the 21st Century Cures Act (such as "Other Transactions Authority"), NIH will partner with cloud-service providers for cloud storage, computational, and related infrastructure services needed to facilitate the deposit, storage, and access to large, high-value NIH datasets ... NIH will evaluate which of these approaches are useful as we enter the implementation phase of this strategic plan.
^^
These negotiations may result in partnership agreements with top infrastructure providers from U.S.-based companies whose focus includes support for research. Suitable cloud environments will house diverse data types and high-value datasets created with public funds. NIH will ensure that they are
stable and adhere to stringent security requirements and applicable law, to protect against data compromise or loss. NIH will also comply with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Security Rule and National Institute of Standards and Technology (NIST) health information security standards. NIH's cloud-marketplace initiative will be the first step in a phased
operational framework that establishes a Software as a Service (SaaS) paradigm for NIH and its stakeholders. 
^^
Implementation Tactics:
• Leverage existing federal, academic, and commercial computer systems for data storage and analysis.
• Adopt and adapt emerging and specialized technologies ...
• Support technical and infrastructure needs for data security, authorization of use, and unique identifiers to index and locate data. 
^^
More than 3,000 different groups and individuals submit data via NCBI systems daily. Among these are genome sequences from humans and research organisms; gene-expression data; chemical structures and properties, including safety and toxicity data; information about clinical trials and their results; data on genotype-phenotype correlations; and others. Beyond NIH-funded scientists and research centers, many other individuals and groups contribute data to the biomedical research data ecosystem, including other federal agencies, publishers, state public-health laboratories, genetic-testing laboratories, and biotech and pharmaceutical companies. NIH will develop strategies to link high-value NIH data systems, building a framework to ensure they can be used together rather than existing as isolated data silos ... A key goal is to promote expanded data sharing to benefit not only biomedical researchers but also policymakers, funding agencies, professional organizations, and the public. 
^^
Implementation Tactics:
• Link the NIH Data Commons (see text box, above) and existing, widely used NIH databases/data repositories using NCBI as a coordinating hub.
• Ensure that new NIH data resources are connected to other NIH systems upon implementation.
• When appropriate, develop connections to non-NIH data resources.</OtherInformation></Objective></Goal><Goal><Name>Modernization</Name><Description>Promote Modernization of the Data-Resources Ecosystem</Description><Identifier>_1f7ea10e-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>The current biomedical data-resource ecosystem is challenged by a number of organizational problems that create significant inefficiencies for researchers, their institutions, funders, and the public. For example, from 2007 to 2016, NIH ICs used dozens of different funding strategies to support data resources, most of them linked to research-grant mechanisms that prioritized innovation and hypothesis testing over user service, utility, access, or efficiency. In addition, although the need for open and efficient data sharing is clear, where to store and access datasets generated by individual laboratories -- and how to make them compliant with FAIR principles -- is not yet straightforward. Overall, it is critical that the data-resource ecosystem become seamlessly integrated such that different data types and information about different organisms or diseases can be used easily together rather than existing in separate data "silos" with only local utility. Wherever possible, NIH will coordinate and collaborate with
other federal, private, and international funding agencies and organizations to promote economies of scale and synergies and prevent unnecessary duplication.</OtherInformation><Objective><Name>Data Repository Ecosystem</Name><Description>Modernize the Data Repository Ecosystem</Description><Identifier>_1f7ea280-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>2.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>To promote modernization of the data-repository ecosystem, NIH will refocus its funding priorities on the utility, user service, accessibility, and efficiency of operation of repositories ... Wherever possible, data repositories should be integrated and contain harmonized data for all related organisms, systems, or conditions, allowing for seamless comparison. To improve evaluation of data-repository utility, and allow those who run them to focus on the particular
goals they need to achieve to best support the research community and operate as efficiently as possible, NIH will distinguish between databases and knowledgebases ... and will support each separately from one another as well as from the development and dissemination of tools used to analyze data (see Goal 3 for NIH's proposed new strategies for tool development). Although a grey area does exist between databases and knowledgebases, and some data types currently appropriate for a knowledgebase may eventually harden and become core data more appropriate for a database, this distinction will allow improved focus and coherence in the support and operation of modern data resources. It should be noted that the same NIH-funded group could be supported to perform multiple functions -- for example, to run a knowledgebase (or database) and also to develop tools to be used on the information in that knowledgebase or database -- but these proposed activities should be evaluated and funded separately to ensure that each is providing the best possible value and service to the research community and that the existence of essential resources is not tied to functions that may not be as useful. Funding approaches used for databases and knowledgebases will be appropriate for resources and focus on user service, utility, interoperability, and operational efficiency rather than on research-project goals. As such, NIH will establish procedures and metrics to monitor data usage and impact -- including usage patterns within individual datasets. The above principles and approaches are well aligned with those
being implemented by the European inter-governmental data-resource coordinating organization ELIXIR.
^^
Implementation Tactics:
• Separate the support of databases and knowledgebases (see text box, above).
• Use appropriate and separate funding strategies, review criteria, and management for each repository type.
• Dynamically measure data use, utility, and modification.
• Ensure privacy and security.
• Create unified, efficient, and secure authorization of access to sensitive data.
• Employ explicit evaluation, lifecycle, sustainability, and sunsetting expectations (where appropriate) for data resources. </OtherInformation></Objective><Objective><Name>Storage &amp; Sharing</Name><Description>Support the Storage and Sharing of Individual Datasets</Description><Identifier>_1f7ea3fc-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>2.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Currently, most datasets generated by biomedical researchers are small-scale datasets produced by individual laboratories. In contrast, large, organized consortia, including various programs specific to NIH ICs and the NIH Common Fund, generate large, high-value datasets that are relatively small in numbers but used by thousands of researchers. Whereas these large datasets generally reside in dedicated data resources, a current dilemma for NIH is determining how to store and make accessible all of the smaller datasets from individual laboratories. NIH will create an environment in which individual laboratories can link datasets, through intuitive and user-friendly interfaces, to publications in the NCBI's PubMed Central publication database. Part of that effort includes development of the NIH Data Commons Pilot, the first trans-NIH effort to create a shared, cloud-based environment for data storage, access, and computation.
^^
Implementation Tactics:
• Link datasets to publications via PubMed Central and NCBI.
• Longer-term: Expand NIH Data Commons to allow submission, open sharing, and indexing of individual, FAIR datasets. </OtherInformation></Objective><Objective><Name>Clinical &amp; Observational Data</Name><Description>Leverage Ongoing Initiatives to Better Integrate Clinical and Observational Data
into Biomedical Data Science</Description><Identifier>_1f7ea62c-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>2.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>NIH has several large-scale, ongoing efforts that are building high-value resources that include clinical and observational data from individual volunteers and patients ... These efforts also include the All of Us Research Program and Cancer Moonshot℠, the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program, the Environmental Influences on Child Health Outcomes (ECHO) program, and various datasets generated by scientists conducting research in the NIH Clinical Center ... Other federal datasets, such as the Million Veteran Program and other data from the Veterans Health Administration, the nation’s largest integrated health system, are likely to be invaluable for research discovery. NIH will leverage these and other related initiatives to integrate the patient health data they contain into the biomedical data-science ecosystem in ways that maintain security and confidentiality and are consistent with informed consent, applicable laws, and high standards for ethical conduct of research ... NIH will collaborate with the Office of the National Coordinator for Health Information Technology (ONC), within the Department of Health and Human Services, which leads national health information technology efforts ... The NIH Clinical Center has in place
the Biomedical Translational Research Information System (BTRIS), which is a resource available to the NIH intramural community that brings together clinical research data from the Clinical Center and other NIH ICs. BTRIS provides clinical investigators and translational informaticists with access to identifiable data for subjects on their own active protocols, while providing all NIH investigators with access to data without personal identifiers across all protocols. Additionally, NIH encourages researchers to use common data elements, or CDEs, which help improve accuracy, consistency, and interoperability among datasets within various areas of health and disease research.
^^
Implementation Tactics:
• Create efficient linkages among NIH data resources that contain clinical and observational information.
• Develop and implement universal credentialing protocols and user-authorization systems to enforce a broad range of access and patient-consent policies across NIH data resources and platforms.
• Promote use of the NIH Common Data Elements Repository. </OtherInformation></Objective></Goal><Goal><Name>Tools</Name><Description>Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools</Description><Identifier>_1f7ea7bc-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Extracting understanding from large-scale or complex biomedical research data requires algorithms, software, models, statistics, visualization tools, and other advanced approaches such as machine learning, deep learning, and artificial intelligence ... Accomplishing NIH's goal of optimizing the biomedical data-science ecosystem requires prioritizing development and dissemination of accessible and efficient methods for advanced data management, analysis, and visualization. To make the best tools available to the research community, NIH will leverage existing, vibrant tool-sharing systems to help establish a more competitive "marketplace" for tool developers and providers than currently exists, striving for wide availability and limited or no costs for users. By separating the evaluation and funding for tool development and dissemination from support for databases and knowledgebases, innovative new tools and methods should rapidly overtake and supplant older, obsolete ones. The goal of creating a more competitive marketplace, in which open-source programs, workflows, and other applications can be provided directly to users, could also allow direct linkages to key data resources for real-time data analysis. 
</OtherInformation><Objective><Name>Tools &amp; Workflows</Name><Description>Support Useful, Generalizable, and Accessible Tools and Workflows</Description><Identifier>_1f7ea974-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>3.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Historically, because data resources have generally been funded through NIH research grants, applicants have emphasized development of new tools in order to meet innovation expectations associated with conducting research. This strategy can shift the focus of data resources away from their core function of providing reliable and efficient access to high-quality data. In addition, coupling review and funding of data resources to tool development can inhibit the type of open competition among developers that allows support of the most innovative and useful tools ... To address these concerns, NIH will evaluate and fund tool development and access separately from support of databases and knowledgebases, although tools that are necessary for data intake, integration, management, access, or QA/QC could still be incorporated into database and
knowledgebase funding. NIH will also promote the establishment of environments in which high-quality, open-source data management, analytics, and visualization tools can be obtained and used directly with data in the NIH Data Commons and/or other cloud environments. A key step will be leveraging, through partnerships, grants, or procurement, expertise in systems integration/engineering to refine and harden tools from academia to improve software design, usability, performance, security, and efficiency. The use of Small Business Innovation Research or Small Business Technology Transfer (SBIR/STTR) grants might provide a useful mechanism for bringing expertise from systems integrators and software engineers into the biomedical data-science ecosystem.
^^
Implementation Tactics:
• Separate support for tool development from support for databases and knowledgebases.
• Use appropriate funding mechanisms, scientific review, and management for tool development.
• Establish programs to allow systems integrators/engineers from the private sector to refine and optimize prototype tools and algorithms developed in academia to make them efficient, cost-effective, and widely useful for biomedical research.
• Employ a range of incentives to promote data-science and tool innovation, including "code-athons," challenges, public-private partnerships, and other approaches.</OtherInformation></Objective><Objective><Name>Specialized Tools</Name><Description>Broaden Utility, Usability, and Accessibility of Specialized Tools</Description><Identifier>_1f7eab04-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>3.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>An important opportunity and challenge is to adopt, for use in biomedical research, tools that have been developed in fields outside of the biomedical sciences. For example, the same software used by NASA scientists to determine the depths of lakes from space is being tested for use in medical-image analysis for mammography, X-rays, computerized tomography (CT) and magnetic resonance imaging (MRI) scans, as well as for ultrasound measurements. Specialized tools developed for one subfield of biomedical research might also be adopted for different purposes by researchers in other areas.
^^
It will also be important to develop and adopt better tools, and standards, for collecting and efficiently assimilating data from disparate and dynamic sources that combine to inform us about the health of individuals and populations. Novel data-science algorithms will likely create new knowledge and innovative solutions relevant to health disparities and disease prevention. New approaches and tools have the potential to transform data combined from various structured and unstructured sources into actionable information that can be used to identify needs, provide services, and predict and prevent poor outcomes in vulnerable populations. Metadata, which is information about data such as its content, context, location, and format, affects the ability of data to be found and used/re-used, and thus will be important in these efforts. Data provenance, or "version control" of data, is also an important consideration for both data generators and data users. Toward identifying and stratifying risk, employing standardized data formats and vocabularies should advance our understanding of relationships between demographic information, social determinants of health, and health outcomes.
^^
One especially ripe opportunity for use in community-based research is broader use of rapidly evolving mobile-device technologies and information-sharing platforms. These resources, such as wearable devices, can capture a wide variety of health and lifestyle-related information including geospatial and biometric data and patient-reported outcomes from individual volunteers that could help transform our understanding of both normal human biology and disease states.
^^
Finally, there is a critical need for better methods to mine the wealth of data available in electronic health records ... These records present great opportunities for advancing medical research and improving human health -- particularly in the area of precision medicine -- but they also pose tremendous challenges ... For example, patient confidentiality must be assured, and the level of access granted by each individual to researchers has to be obtained, recorded, obeyed, and enforced, in accordance with HIPAA and NIST standards. Equally challenging is the fact that electronic health records are controlled by thousands of different hospitals and other organizations using dozens of different commercial computer platforms that do not always share a uniform language or data standards. Because of these challenges, NIH will support additional research to find better ways to allow clinical data to be used securely, ethically, and legally, to advance medicine.  NIH will also work with other federal and state agencies, private healthcare and insurance providers, and patient advocacy groups to find more efficient paths to realize the promise of electronic health records and other clinical data for medical research. 
^^
Implementation Tactics:
• Adopt and adapt emerging and specialized methods, algorithms, tools, software, and workflows.
• Promote innovative contributions to biomedical data science from allied fields such as mathematics, statistics, computer science, engineering, and physics ...
• Promote development and adoption of better mobile-device and data-interface tools through APIs that integrate with certified health information technology to pull data and support data analysis.
• Support research to develop improved methods for clinical informaticists and other scientists to use certified electronic health records and other clinical data securely and ethically for medical research.</OtherInformation></Objective><Objective><Name>Discovery &amp; Cataloging</Name><Description>Improve Discovery and Cataloging Resources</Description><Identifier>_1f7eaca8-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>3.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Data that is not easily located is likely to be underused and of little value to the broader research community. NIH has invested in developing resources, such as the Data Discovery Index that is part of the NIH Data Commons pilot, to enable data reuse. Such a resource will exceed a mere cataloging function and will also contain platforms and tools by which biomedical researchers can find and reuse data and that will support data-citation metrics. NIH will continue to invest in development of improved approaches for making data findable and accessible. For example, the NIH Data Commons pilot is creating search and analysis workspaces that support a broad range of authenticated users, and where users with all levels of expertise can access and interact with data and tools. Collaboration will be integral to this approach, which, in addition to continued research and development, must involve a community-driven process for identifying and implementing optimal standards to improve indexing, understandability, reuse, and citation of datasets. NIH will also leverage the U.S. Core Data for Interoperability effort, which addresses data standards and data provenance as part of its aim to enhance interoperability of health information data.
^^
Implementation Tactics:
• Promote community development and adoption of uniform standards for data indexing, citation,
and modification-tracking (data provenance). </OtherInformation></Objective></Goal><Goal><Name>Workforce</Name><Description>Enhance Workforce Development for Biomedical Data Science</Description><Identifier>_1f7eae74-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>4</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>NIH also considers it essential to equip the biomedical research workforce with tools to enhance data-science understanding and expertise. Innovative contributions to biology from computer science, mathematics, statistics, and other quantitative fields have facilitated the shift in biomedicine described throughout this document. NSF has been at the forefront of funding disciplines that contribute to data science, and thus NIH will collaborate on joint initiatives of mutual interest related to training and education of researchers at various career stages. NIH recognizes that data scientists perform far more than a support function, as data science has evolved to be an investigative domain in its own right.
^^
While NIH has supported quantitative training at various levels along the biomedical career path, more needs to be done to facilitate familiarity and expertise with data-science approaches and effective and
secure use of various types of biomedical research data. There is also a need to grow and diversify the pipeline of researchers developing new tools and analytic methods for broad use by the biomedical research community. Finally, data-science approaches will be essential for NIH to achieve the stewardship goals outlined in the NIH-wide strategic plan and are likely to facilitate the agency's ability to monitor demographic trends among its workforce and thus address diversity gaps ...</OtherInformation><Objective><Name>Enhancement</Name><Description>Enhance the NIH Data-Science Workforce</Description><Identifier>_1f7eb16c-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>4.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Given the importance of data science for biomedical research, NIH needs an internal workforce that is increasingly skilled in this area. This includes ensuring that NIH program and review staff who administer and manage grants and coordinate the evaluation of applications have sufficient experience with and knowledge of data science. To begin to address this need, NIH will develop training programs for its staff to improve their knowledge and skills in areas related to data science. In addition, NIH will recruit a cohort of data scientists and others with expertise in areas such as project management, systems engineering, and computer science from the private sector and academia for short-term (1- to 3-year) national service sabbaticals.
^^
These "NIH Data Fellows" will be embedded within a range of high-profile, transformative NIH projects, such as All of Us, the Cancer Moonshot℠, and the BRAIN Initiative, and will provide innovation and expertise not readily available within the federal government.
^^
Implementation Tactics:
• Develop data-science training programs for NIH staff.
• Launch the NIH Data Fellows program.</OtherInformation></Objective><Objective><Name>Expansion</Name><Description>Expand the National Research Workforce</Description><Identifier>_1f7eb31a-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>4.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Modern biomedical research is becoming increasingly quantitative, and it is essential that the next generation of researchers be equipped with the skills needed to take advantage of the growing promise of data science for advancing human health. NIH will work to ensure that NIH-funded training and fellowship programs emphasize teaching of quantitative and computational skills and integrate training in data-science approaches throughout their curricula and during mentored research. In keeping with the National Library of Medicine's (NLM) strategic plan, "A Platform for Biomedical Discovery and Data-Powered Health," NIH will partner with institutions to engage librarians and information specialists in finding new paths in areas such as library science that have the potential to enrich the data-science ecosystem for biomedical research. The NLM Institutional Training Grants for Research Training in Biomedical Informatics and Data Science (T15) program offers one funding vehicle.
^^
Implementation Tactics:
• Enhance quantitative and computational training for undergraduates, graduate students, and postdoctoral fellows.
• Enable the development of curricula and other resources toward enhancing rigor and reproducibility of data science-based approaches.
• Promote training of data scientists in biomedical research areas.
• Improve the education of students on NIH training grants by enriching content in Responsible Conduct of Research requirements with information about secure and ethical data use.
• Build on diversity-enhancing efforts in data science, such as the NIH BD2K Diversity Initiative.
• Engage librarians and information specialists in developing data-science solutions and programs.
• Employ data-driven methods to monitor workforce diversity.</OtherInformation></Objective><Objective><Name>Engagement</Name><Description>Engage a Broader Community</Description><Identifier>_1f7eb4fa-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>4.3</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>As a field, data science crosses boundaries between research and practice, as well as between science and policy. NIH will promote knowledge exchange and development of best practices for the collection, organization, preservation, and dissemination of information resources across communities. Part of this effort is nurturing cultural change, emphasizing the role of data science in discovery and health, and giving citizen scientists access to data without compromising its privacy or security. NIH recognizes its role in the larger data-science ecosystem and that NIH-generated biomedical data are often used by the private sector, clinicians, and other public groups. As part of the BD2K effort, NIH encouraged development of new or significantly adapted interactive digital media that engage the public, experts or non-experts, in performing some aspect of biomedical research via crowdsourcing. NIH will work to find additional ways to engage the public and healthcare providers in making use of biomedical data and data-science tools. Doing so will help to expand the biomedical “sandbox” to researchers without access to large-scale computational resources, such as non-research academic organizations, community colleges, and citizen scientists. 
^^
NIH will also consider new engagement models for enhancing data security such as "bug bounty programs," in which individuals can receive recognition and compensation for reporting bugs, especially those pertaining to data exploits and vulnerabilities. Such programs have been successful in the federal government -- one example is "Hack the Pentagon," which enabled the discovery of critical vulnerabilities within minutes. Such efforts require close collaboration with federal partners such as the Department of Justice, the Department of Homeland Security, and the National Institute of Standards and Technology, as well as with private industry.
^^
Implementation Tactics:
• Give citizen scientists access to appropriate data, tools, and educational resources ...
• Find innovative solutions to data-science and data-resource challenges using community engagement models such as code-athons, contests, and crowdsourcing.
• Develop materials to train healthcare providers in data science-related clinical applications. </OtherInformation></Objective></Goal><Goal><Name>Stewardship &amp; Sustainability</Name><Description>Enact Appropriate Policies to Promote Stewardship and Sustainability</Description><Identifier>_1f7eb6c6-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>5</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Creating and maintaining an efficient and effective biomedical data-science ecosystem requires policies and practices appropriate for optimal governance, financial management, evaluation, and sustainable stewardship of resources. Because cultural issues are central to implementing policies, appropriate reward, review, and expectation systems are central to making data FAIR and for incentivizing researchers to share their data and analysis tools widely for reuse by others. To ensure researchers collecting data understand and comply with data-security and confidentiality standards and applicable law, it will be important for NIH to collaborate with the research community on strategies to guide general practice in data security and privacy matters and to collaborate with industry leaders who set standards in the information-security arena. </OtherInformation><Objective><Name>Data Ecosystem</Name><Description>Develop Policies for a FAIR Data Ecosystem</Description><Identifier>_1f7eb946-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>5.1</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>Currently, most biomedical data do not adhere to FAIR principles and thus are difficult to find and access. Moreover, complex or integrated analysis requires that data are interoperable and reusable across multiple domains with high fidelity. 
Thus, through appropriate policies and practices and as a core data-management activity, NIH will strive to ensure that all data in NIH-supported data resources are FAIR. The NIH Data Commons Pilot will be a starting point toward accomplishing this objective.
^^
While freely sharing high-value data is a critical goal for advancing research, NIH must ensure that its policies are achievable and sustainable and do not impose unnecessary burdens or untenable expectations on grantee institutions. Therefore, policies must reflect the data use and evaluation metrics and methods that will be established in Objective 2-1 to guide what data need to be made accessible and when they should be moved to less-accessible but less-expensive archive storage or retired altogether. NIH will also promote community-guided development of model open data-use licenses that will facilitate data sharing while simultaneously allowing protection of confidentiality and intellectual property. In addition, the NIH Data Commons Pilot is establishing ways to use controlled-access data through appropriate authentication and protocols.
^^
Implementation Tactics:
• Create rational and supportable data-sharing and data-management policies that ensure the security and confidentiality of patient and participant data and comply with applicable law.
• Promote development of community standards that support FAIR principles for data storage.
• Develop model open data-use licenses to enable broad access to datasets.
• Optimize security management and access policies.
• Ensure appropriate standards and mechanisms are in place to grant trusted-partner status for efficient data-access management. </OtherInformation></Objective><Objective><Name>Stewardship</Name><Description>Enhance Stewardship</Description><Identifier>_1f7ebb58-2c1e-11ee-9e43-679b1183ea00</Identifier><SequenceIndicator>5.2</SequenceIndicator><Stakeholder><Name/><Description/></Stakeholder><OtherInformation>The rapidly growing amount of data generated by the biomedical research enterprise creates an urgent need for developing clear guidelines for what data must be stored and shared, where and in what form they must be stored, as well as practical solutions for sustaining valuable data resources and determining priorities for data-resource funding.
^^
In addition, to produce the most scientific value for taxpayers' investments and to provide researchers with the best access to data resources possible, NIH must work with the community to improve the efficiency of operation of these resources and, wherever possible, create synergies and economies of scale.
^^
Toward achieving these goals, NIH will collaborate with its stakeholders -- including academia, other U.S. and international funding agencies, journals, and the private sector -- to establish a wide range of metrics to dynamically measure data use, utility, and modification, as well as measures of the operational efficiency of the resources themselves. Creating incentives and expectations for depositing FAIR-compliant data in NIH-funded repositories, data commons, or other NIH data systems will enhance data sharing and reuse and allow NIH to accurately assess data usage and lifecycles. This information will be essential for making informed decisions about priorities for data-resource support.
^^
In addition, NIH will engage the broader data-science community in testing the utility of the NIH Data Commons as it is developed. As it refines these systems, NIH will seek input from entities with expertise in research ethics, privacy regulations and statutes, and data security to ensure NIH-supported data resources maintain research-participant confidentiality.
^^
Implementation Tactics:
• Develop standard use, utility, and efficiency metrics and review expectations for data resources and tools.
• Establish sustainability models for data resources.
• Develop a reward and expectation system for investigators to make data FAIR and for ensuring open-source data-analysis tools are available.</OtherInformation></Objective></Goal></StrategicPlanCore><AdministrativeInformation><StartDate/><EndDate/><PublicationDate>2023-07-26</PublicationDate><Source>https://datascience.nih.gov/sites/default/files/NIH_Strategic_Plan_for_Data_Science_Final_508.pdf</Source><Submitter><GivenName>Owen</GivenName><Surname>Ambur</Surname><PhoneNumber/><EmailAddress>Owen.Ambur@verizon.net</EmailAddress></Submitter></AdministrativeInformation></PerformancePlanOrReport>