Ensuring the security of the United States’ intellectual investments in science and engineering has been a central focus of U.S. innovation policy for the last several years. Yet national security faces a different threat: the consolidation of vast amounts of data about American research and American researchers in the hands of a small number of for-profit publishers increasingly tied to foreign data brokers. There is now a real and present danger of the sale of sensitive information to third parties and use of that data without the knowledge or consent of the U.S. government.
The danger has surfaced because publishers have steadily acquired library services and the tools that academic institutions use for annual evaluations and for hiring and promotion decisions. Universities are providing detailed information about all the scientific activities of their researchers to for-profit publishers. Those publishers can move that information off American soil, use it to build a full picture of current and future American science, and repackage and resell it regardless of the potential threat to national security.
Take the example of the 2022 purchase by Dutch-based Elsevier of Interfolio, a faculty information tool that was owned by an American company. This tool is used by over 400 American academic institutions and over 700,000 researchers to collect and process detailed confidential data about scientific hiring, funding and promotion processes. It connects to university payroll and human resource records without the consent of the academic researchers. The resulting vast data repository thus reflects the often unpublished intellectual property and scientific collaborations of tens of thousands of American researchers and often their students. The data about researchers also includes information about which businesses, federal science or defense agencies funded their research and what additional positions they held. This connected and verified information can be used to characterize the emerging science and technology portfolio of the United States, identify its thought leaders and potentially be resold for profit.
Importantly, there is no control over the reuse and resale of the data. In the Elsevier example, its updated (2023) privacy policy makes it clear that the data on research and researchers can be reused and resold by their Anglo-Dutch owners, RELX and their related companies that provide technology, customer service and other shared services functions … [as well as] sponsors, joint venture partners and other third parties”. The information and analysis are beyond the reach of U.S. law. The “personal information may be stored and processed in your region or another country where Elsevier companies and their service providers maintain servers and facilities, including Australia, China, France, Germany, India, Ireland, the Netherlands, the Philippines, Singapore, the United Kingdom and the United States.”
The solution is to balance the scales. There is little coordination across units within a university and even less across universities, so the implications of using foreign-owned tools may be obscured unless one considers the comprehensive national view. Each university or university unit negotiates individual contracts that have non-disclosure agreements and hence cannot share information about terms and conditions with other university entities. By contrast, publishing companies are consolidating their products into “single systems to monetize the full research lifecycle” and concentrating market power.
A national strategy, initiated as protection of national interests, should balance the scales in three ways.
First, balance the scales on contract negotiations. Define a set of required terms that apply at a macro level to all universities and provide federal support for implementing those terms with any vendor. Require that universities negotiate contracts with foreign publishers as a domestic consortium rather than as individual units. Require transparency in contracts that explicitly limit the potential for for-profit companies to repackage, resell, or reuse data on scientific activity at the expense of U.S. taxpayers. And finally, ensure that the reuse of data on researchers be limited to the explicit purpose of the tools. In the case of the Interfolio tool, for example, the use of data should cover only a limited time and not go beyond the explicit hiring, tenure and promotion management of the university.
Second, balance the scales on the security of the data on researchers and the work that they do. Only people authorized by the university with a need to know should access the data. The data should be hosted on secure domestic servers. No personally identifiable information should be made available and no individual level information should be released. Research data should not be transmitted to places beyond the control of U.S. law. They should be hosted either in existing, secure, university-based centers or Federally certified and tested FedRAMP secure environments with Federal Authorizations to Operate.
Third, require openness and transparency. Require private publishing companies to make public all existing code, methodology and analysis that they have developed and applied to any research data or grey data connected to federal funding in an open source repository that can be accessed, assessed and repurposed by U.S. institutions. Scientists, engineers and university administrators should be provided with full information about how their data are being used with an explicit opt-in/opt-out option. All research content managed by private companies but connected to federal funding should be regularly audited for compliance with security and privacy frameworks.
In the longer run, the U.S. government should recognize the value of these data for advancing the interests of national security and global competitiveness. Our national leaders should identify other options that do not cede control to foreign publishers. This could consist of investing in a U.S.-based national research infrastructure centered around open source research reporting tools and research data management tools with high standards for research security and under the control of U.S. law. Standardized reporting systems are already being built by American research libraries, including the California Digital Library. Investing in such U.S. organizations would result in cheaper, American-owned and more transparent tools in the long run.
There’s no doubt that the management of R&D requires better tools to securely manage decisions in an increasingly complex and competitive ecosystem. Alarmingly, foreign for-profit companies have been quick to identify and fill the need. If the for-profit foreign data broker industry continues to go unmonitored and unchecked, the United States risks weakening the security of critical research. Immediate action by policymakers and research leaders is needed to enable a new wave of innovation in open data management that is protected by U.S. law, preserves national security and guards the people doing the work.
Nick Hart, President, Data Foundation; Suzette Kent, former Federal CIO; Julia Lane, Professor, NYU; Nancy Potok, former U.S. Chief Statistician