GAO Reports on Data Mining at Federal Agencies

May 27, 2004. The General Accounting Office (GAO) released a report [71 pages in PDF] titled "Data Mining: Federal Efforts Cover a Wide Range of Uses".

The report finds that "Driven by advances in computing and data storage capabilities and by growth in the volumes and availability of information collected by the public and private sectors, data mining enables government agencies to analyze massive volumes of data. Our survey shows that data mining is increasingly being used by government for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities."

The report defines data mining as "the application of database technology and techniques -- such as statistical analysis and modeling -- to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results". The GAO surveyed chief information officers at 128 federal departments and agencies regarding whether their entities had operational and planned data mining systems or activities.

The report finds that "52 agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 were planned and 131 were operational." It also found that "122 used personal information".

The report finds that "Agencies also identified efforts to mine data from the private sector and data from other federal agencies, both of which could include personal information. Of 54 efforts to mine data from the private sector (such as credit reports or credit card transactions), 36 involve personal information. Of 77 efforts to mine data from other federal agencies, 46 involve personal information (including student loan application data, bank account numbers, credit card information, and taxpayer identification numbers)." (Parentheses in original.)
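The counts quoted above can be cross-checked and turned into proportions with a few lines of arithmetic. The sketch below uses only the figures quoted from the report; the variable names and the `share` helper are illustrative and do not come from the GAO.

```python
# Figures as quoted from the GAO report in the paragraphs above.
total_efforts = 199
planned, operational = 68, 131
personal = 122
private_sector, private_sector_personal = 54, 36
other_agency, other_agency_personal = 77, 46


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The planned and operational counts should add up to the reported total.
assert planned + operational == total_efforts

shares = {
    "operational efforts": share(operational, total_efforts),
    "efforts using personal information": share(personal, total_efforts),
    "private sector efforts with personal information": share(
        private_sector_personal, private_sector
    ),
    "other agency efforts with personal information": share(
        other_agency_personal, other_agency
    ),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On these figures, roughly three in five of all reported efforts use personal information, and about two thirds of the efforts that mine private sector data do.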

The report states that agencies are using data mining for "improving service or performance", "detecting fraud, waste, and abuse", "analyzing scientific and research information", "managing human resources", "detecting criminal activities or patterns", and "analyzing intelligence and detecting terrorist activities".

Privacy. The report discusses the privacy implications of data mining. It states that "Since the terrorist attacks of September 11, 2001, data mining has been seen increasingly as a useful tool to help detect terrorist threats by improving the collection and analysis of public and private sector data", and that "Such use of data mining by federal agencies has raised public and congressional concerns regarding privacy".

It continues that "Privacy concerns about mined or analyzed personal data also include concerns about the quality and accuracy of the mined data; the use of the data for other than the original purpose for which the data were collected without the consent of the individual; the protection of the data against unauthorized access, modification, or disclosure; and the right of individuals to know about the collection of personal information, how to access that information, and how to request a correction of inaccurate information."

It concludes that "more work is needed to shed light on the privacy implications of these efforts".

Examples of Government Data Mining. The GAO report contains summary tables that list each data mining activity reported to the GAO by the survey respondents.

The Department of Homeland Security (DHS) operates, or has planned, numerous data mining projects that use both personal data and data acquired from the private sector for purposes related to detecting crime and terrorist activities. One example is its forthcoming "Incident Data Mart", which "Will look through incident logs for patterns of events. An incident is an event involving a law enforcement or government agency for which a log was created (e.g., traffic ticket, drug arrest, or firearm possession). The system may look at crimes in a particular geographic location, particular types of arrests, or any type of unusual activity." (Parentheses in original.)

The Department of Defense (DOD) also operates numerous data mining projects. One example is its "Verity K2 Enterprise", which "Mines data from the intelligence community and Internet searches to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities". It uses both personal data and private sector data.

There is also a DOD project named "Pathfinder" that "Is a data mining tool developed for analysts that provides the ability to analyze government and private sector databases rapidly. It can compare and search multiple large databases quickly". It uses personal data. Both Pathfinder and Verity K2 Enterprise are used for detecting terrorist activity.

The Federal Bureau of Investigation (FBI) has an operational data mining project that "Supports the Foreign Terrorist Tracking Task Force that seeks to prevent foreign terrorists from gaining access to the United States. Data from the Department of Homeland Security, Federal Bureau of Investigation, and public data sources are put into a data mart and mined to determine unlawful entry and to support deportations and prosecutions." It uses both private sector and personal data.

The Department of the Treasury has numerous data mining projects that use personal information for the purpose of "increasing tax compliance". Several planned projects will also use private sector data, and one would also use data obtained from other government agencies. Some projects are intended to identify noncompliance, while others would predict abuse, or "evaluate and rate potentially fraudulent individual tax returns".

The Department of the Treasury's Secret Service has a project that "Mines data in suspicious activity reports received from banks to find commonalities in data to assist in strategically allocating resources." It uses personal data.

The Department of the Treasury also has a project that "Attempts to identify and stop fraudulent activity involving stolen credit cards to order products over the Internet or via telephone. Fraud rating identifiers are used to identify areas where fraud has occurred and to determine the likelihood of fraud. Allows for orders to be stopped or for orders over a certain dollar limit to be stopped." It uses personal data, private sector data, and data from other government agencies.

Several government agencies use data mining to investigate misuse of government-provided credit cards.

Several government agencies use data mining to detect fraud in government pension and assistance programs.

The Department of Education uses data mining in its Pell grants program.

NASA operates several data mining projects to analyze scientific and research information.

The U.S. Patent and Trademark Office (USPTO) has a data mining project that "Generates and makes available compensation projection data, both salary and benefits, on current employees and on planned hires. It also accounts for planned attritions."

Agencies Not Using Data Mining. The report lists the agencies that the GAO surveyed but that reported no data mining activities. This list includes the Department of Defense's (DOD) Defense Advanced Research Projects Agency (DARPA).

DARPA previously operated a program known as Total Information Awareness (TIA). The program was cancelled at DARPA following criticism from Congress.

The Department of Defense's (DOD) Office of Inspector General (OIG) released a report [42 pages in PDF], titled "Information Technology Management: Terrorism Information Awareness Program", and dated December 12, 2003. See also, story titled "DOD Releases Report on DARPA's Total Information Awareness Program" in TLJ Daily E-Mail Alert No. 809, January 5, 2004.

Also, on May 17, 2004, the DOD's Technology and Privacy Advisory Committee (TAPAC) released a report [140 pages in PDF] titled "Safeguarding Privacy in the Fight Against Terrorism". It addresses data mining by the DOD and the other federal agencies, the DARPA's TIA program, and individual privacy. See, story titled "DOD Advisory Committee Backs Data Mining, with Attention to Privacy" in TLJ Daily E-Mail Alert No. 900, May 18, 2004.

The GAO surveyed the Securities and Exchange Commission (SEC), but it reported no data mining activities. The SEC investigates securities fraud, including insider trading, and brings civil enforcement actions.

Many federal entities were not surveyed, including the Federal Communications Commission (FCC) and Federal Trade Commission (FTC).

The report was prepared for Sen. Daniel Akaka (D-HI), the ranking Democrat on the Senate Governmental Affairs Committee's Subcommittee on Financial Management, the Budget, and International Security.

The Electronic Privacy Information Center (EPIC), the Center for Democracy & Technology (CDT), and the ACLU wrote a letter to Sen. Akaka in which they stated that "This report shows just how widespread the embrace of such powerful techniques is becoming within government, and how little has been done to update our oversight mechanisms to compensate."

The three groups also wrote that "the report documents the widespread reliance on private-sector sources of information. This is significant because computers and computer chips are working their way into our daily lives to an amazing extent. While this is improving our lives in many ways, it is also creating a situation where Americans' every action, movement, and communication is likely to be recorded and stored in the memory of some computer database. And because the bulk of citizens' daily transactions occur within the private sector - which often has strong economic incentives to gather and store information -- government access to such databases creates the potential for a dramatic increase in government monitoring of individuals."

Also, on May 26, James Dempsey and Paul Rosenzweig released a paper [15 pages in PDF] titled "Technologies That Can Protect Privacy As Information Is Shared to Combat Terrorism". Dempsey is the Executive Director of the CDT. Rosenzweig is a research fellow at the Heritage Foundation and an adjunct professor at George Mason University School of Law.

This paper states that "The same technology that permits the accumulation, sharing, and analysis of huge databases also allows for the incorporation into information sharing systems of features that protect information from abuse or misuse".

After discussing the nature of private sector and government databases, the privacy interests at stake, and the threats to privacy, the paper identifies, explains, and discusses three such technologies: "anonymization of data, permissioning rules built into the data and search engines to regulate access, and immutable audit trails that can identify abuse (while also assisting in linking people into ad hoc collaborative teams)". (Parentheses in original.)
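To make the three technologies concrete, here is a minimal sketch of how they might fit together in a toy system. It is not taken from the Dempsey and Rosenzweig paper. Every name in it (`SECRET_KEY`, `PERMISSIONS`, `AuditLog`, `query`) is hypothetical. It also makes two simplifying assumptions: keyed hashing stands in for anonymization (strictly, this is pseudonymization), and a hash-chained log stands in for an immutable audit trail (strictly, it is tamper-evident rather than immutable).

```python
import hashlib
import hmac
import json

# Hypothetical secret; a real system would keep this in a protected key store.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical permission rules: role -> fields that role may read.
PERMISSIONS = {
    "analyst": {"pseudonym", "event"},
    "investigator": {"pseudonym", "event", "location"},
}


def anonymize(identifier: str) -> str:
    """Replace an identifier with a keyed hash, so records can be linked
    without exposing the underlying name."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def _entry_hash(actor: str, action: str, prev: str) -> str:
    """Hash one log entry together with the hash of the entry before it."""
    body = {"actor": actor, "action": action, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log in which each entry's hash covers the previous hash,
    so later edits to earlier entries can be detected."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries = []

    def append(self, actor: str, action: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append(
            {"actor": actor, "action": action, "prev": prev,
             "hash": _entry_hash(actor, action, prev)}
        )

    def verify(self) -> bool:
        """Recompute the chain and report whether it is intact."""
        prev = self.GENESIS
        for entry in self.entries:
            expected = _entry_hash(entry["actor"], entry["action"], entry["prev"])
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def query(record: dict, role: str, log: AuditLog) -> dict:
    """Log the access, then return only the fields the role may see."""
    log.append(role, "read " + record["pseudonym"])
    allowed = PERMISSIONS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


# Usage: an analyst sees the pseudonym and the event, but not the location.
record = {
    "pseudonym": anonymize("Jane Doe"),
    "event": "traffic ticket",
    "location": "Springfield",
}
log = AuditLog()
analyst_view = query(record, "analyst", log)
```

In this sketch every read is logged before any data is returned, so even a query from an unrecognized role, which receives no fields, still leaves a trace that `verify` can later check.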