[Congressional Record: January 10, 2007 (Senate)]
[Page S359-S394]

STATEMENTS ON INTRODUCED BILLS AND JOINT RESOLUTIONS

By Mr. FEINGOLD (for himself, Mr. Sununu, Mr. Leahy, and Mr. Akaka):

S. 236. A bill to require reports to Congress on Federal agency use of data mining; to the Committee on the Judiciary.

Mr. FEINGOLD. Mr. President, I am pleased today to introduce the Federal Agency Data Mining Reporting Act of 2007. I want to thank Senator Sununu for once again cosponsoring this bill, which we also introduced in the last Congress. Senator Sununu has consistently been a leader on privacy issues, and I am pleased to work with him on this effort. I also want to thank Senators Leahy, Akaka, and Wyden for their continuing support of the bill.

The controversial data analysis technology known as data mining is capable of reviewing millions of both public and private records on each and every American. The possibility of government law enforcement or intelligence agencies fishing for patterns of criminal or terrorist activity in these vast quantities of digital data raises serious privacy and civil liberties issues--not to mention serious questions about the effectiveness of these types of searches.

But four years after Congress first learned about and defunded the Defense Department's program called Total Information Awareness, there is still much Congress does not know about the Federal Government's work on data mining.

We have made some progress. We know from reviews conducted by the Government Accountability Office that as of May 2004 there were nearly 200 Federal data mining programs, more than one hundred of which relied on personal information and 29 of which were for the purpose of investigating terrorists or criminals. And we have learned a few more details on five of those programs from a follow-up report that GAO issued in August 2005.
We also have a brief report from the DHS Inspector General published in August 2006, and, as a result of my amendment to the DHS appropriations bill, we have a July 2006 report from the Privacy Office at the Department of Homeland Security that provides some interesting policy suggestions relating to data mining.

But this information has come to us haphazardly and lacks detail about the precise nature of the data mining programs being utilized or developed, their efficacy, and the consequences Americans could face as a result. Furthermore, much of the reporting thus far has focused on the Department of Homeland Security. It also appears there has been little if any government-wide consideration of privacy policies for these types of programs. Indeed, public debate on government data mining has been generated more by press stories than by congressional oversight.

My bill would require all Federal agencies to report to Congress within 180 days, and every year thereafter, on data mining programs developed or used to find a pattern or anomaly indicating terrorist or other criminal activity on the part of individuals, and on how these programs implicate the civil liberties and privacy of all Americans. If necessary, specific information in the various reports could be classified.

This is information we need to have. Congress should not be learning the details about data mining programs after millions of dollars have been spent testing or using data mining against unsuspecting Americans. The possibility of unchecked, secret use of data mining technology threatens one of the most important values that we are fighting for in the war against terrorism--freedom. Data mining could rely on a combination of intelligence data and personal information like individuals' traffic violations, credit card purchases, travel records, medical records, and virtually any information contained in commercial or public databases.
Congress must conduct oversight to make sure that all government agencies engaged in fighting terrorism and other criminal enterprises--not just the Department of Homeland Security, but also the Department of Justice, the Department of Defense and others--use these types of sensitive personal information effectively and appropriately.

Let me clarify what this bill does not do. It does not have any effect on the government's use of commercial data to conduct individualized searches on people who are already suspects, nor does it require that the government report on these types of searches. It does not end funding for any program, determine the rules for use of data mining technology, or threaten any ongoing investigation that might use data mining technology. My bill would simply provide Congress with information about the nature of the technology and the data that will be used.

The Federal Agency Data Mining Reporting Act would require all government agencies to assess [[Page S360]] the efficacy of the data mining technology they are using or developing--that is, whether the technology can deliver on the promises of each program. In addition, my bill would make sure that Congress knows whether the Federal agencies using data mining technology have considered and developed policies or guidelines to protect the privacy and due process rights of individuals, such as privacy technologies and redress procedures.

With complete information about the current data mining plans and practices of the Federal Government, Congress will be able to conduct a thorough review of the costs and benefits of the practice of data mining on a program-by-program basis and make considered judgments about whether programs should go forward. Congress will also be able to evaluate whether new privacy rules are necessary.

In addition, Congress must look closely at the government's activities because data mining is unproven in this area.
Some argue that data mining can help locate potential terrorists before they strike. But we do not, today, have evidence that pattern-based data mining will prevent terrorism. In fact, some technology experts have warned that this type of data mining is not the right approach for the terrorism problem. Just last month, the Cato Institute released a report--coauthored by a scientist specializing in data analytics and an information privacy expert--concluding that ``[t]he only thing predictable about predictive data mining for terrorism is that it would be consistently wrong.''

Some commercial uses of data mining have been successful, but have arisen in a very different context than counterterrorism efforts. For example, the financial world has successfully used data mining to identify people committing fraud because it has data on literally millions, if not billions, of historical financial transactions. And the banks and credit card companies know, in large part, which of those past transactions have turned out to be fraudulent. So when they apply sophisticated statistical algorithms to that massive amount of historical data, they are able to make a pretty good guess about what a fraudulent transaction might look like in the future.

We do not have that kind of historical data about terrorists and sleeper cells. We have just a handful of individuals whose past actions can be analyzed, which makes it virtually impossible to apply the kind of advanced statistical analysis required to use data mining in this way. That raises serious questions about whether data mining will ever be able to locate an actual terrorist. Before the government starts reviewing personal information about every man, woman and child in this country, we should learn what data mining can and can't do--and what limits and protections are needed if data mining programs do go forward.

We must also bear in mind that there will inevitably be errors in the underlying data.
Everyone knows people who have had errors on their credit reports--and that is the one area of commercial data where the law already imposes strict accuracy requirements. Other types of commercial data are likely to be even more inaccurate. Even if the technology itself were effective, I am very concerned that innocent people could be ensnared because of mistakes in the data that make them look suspicious. The recent rise in identity theft, which creates still more data accuracy problems, makes it all the more important that we address this issue.

I also want to touch on one issue that has proved difficult in many debates about data mining: how to define the term. What is data mining? From policy debates to government reports, many people have wrestled with this question. While it can be defined more broadly, for the purpose of this reporting requirement, data mining is limited to the process of attempting to predict future events or actions by discovering or locating patterns or anomalies in data. Because this bill seeks information on those data mining programs most likely to threaten the privacy and civil liberties of Americans, I have also limited the definition in a couple of other ways.

First, the bill's core definition of data mining is to conduct a query, search or other analysis of one or more electronic databases to ``discover a predictive pattern or an anomaly indicative of terrorist or criminal activity on the part of any individual or individuals.'' Data mining has a number of applications at various government agencies outside the context of terrorism and other criminal investigations, but I have limited the definition for purposes of this legislation in order to get reports on the programs most likely to raise privacy concerns.
For example, the May 2004 GAO report identified a number of government data mining programs whose goals are managing resources efficiently or identifying fraud, waste and abuse in government programs, and that do not rely on personally identifiable information. I am not seeking reports on programs like these.

Second, as I alluded to earlier, the definition explicitly excludes queries that use information associated with a particular individual or individuals--such as an address, passport number or license plate number--to retrieve information from a database. This type of query is a traditional investigative technique. Although government agencies must be careful in their use of commercial databases, simply querying a ChoicePoint database for information about someone who is already a suspect is not data mining.

Most Americans believe that their private lives should remain private. Data mining programs run the risk of intruding into the lives of individuals who have nothing to do with terrorism or other criminal activity and who understandably do not want their credit reports, shopping habits and doctor visits to become a part of a gigantic computerized search engine operating without any controls or oversight, and without much promise of locating terrorists. As the Cato report put it, ``[t]he possible benefits of predictive data mining for finding planning or preparation for terrorism are minimal. The financial costs, wasted effort, and threats to privacy and civil liberties are potentially vast.''

At a minimum, the administration should be required to report to Congress about the various data mining programs now underway or being studied, and the impact those programs may have on our privacy and civil liberties, so that Congress can determine whether any benefits of this practice come at too high a price to our privacy and personal liberties.
As Senator Wyden and I have told the Director of National Intelligence, we must have a public discussion about the efficacy and privacy implications of data mining. We wrote a letter to him on November 15, 2006, that included the following:

[W]e believe there needs to be a public discussion before the implementation of any government data mining program that would rely on domestic commercial data and other information about Americans. There are serious questions about whether pattern analysis of such data can effectively identify terrorists, given the relative lack of historical data about terrorist activities. And as the furor over the Total Information Awareness program demonstrated, the American public has serious--and legitimate--concerns about the privacy ramifications of programs designed to fish for patterns of criminal or terrorist activity in vast quantities of digital data, collected by other entities for entirely different reasons.

Pattern analysis runs the risk of generating a large number of false positives, meaning that innocent Americans could become the subject of investigation. Before we go down that path, it is critical that we have a public discussion about the efficacy and privacy implications of this technology. And, if we decide that data mining is effective enough to warrant spending taxpayer dollars on it, we should establish strong privacy protections to protect innocent people from being the subject of government suspicion.

Of course, the Intelligence Community should be taking advantage of new technologies in its critical responsibility to protect our country from terrorists, and much of its work must remain classified to protect national security. But we can have a public debate about what privacy rules should constrain data mining programs deployed domestically, without revealing sensitive information like the precise algorithms that the government has developed.
This bill is the first step in this process--a way for Congress and, to the degree appropriate, the public to finally understand what is going on behind the closed doors of the executive branch so that we can start to have a policy discussion about data mining that is long overdue. I urge my colleagues to support this bill. All it asks for is information to which Congress and the American people are entitled.

[[Page S361]]

Mr. President, I ask unanimous consent that the text of this bill be printed in the Record.

There being no objection, the text of the bill was ordered to be printed in the Record, as follows:

S. 236

Be it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,

SECTION 1. SHORT TITLE.

This Act may be cited as the ``Federal Agency Data Mining Reporting Act of 2007''.

SEC. 2. DEFINITIONS.

In this Act:

(1) Data mining.--The term ``data mining'' means a query, search, or other analysis of 1 or more electronic databases, where--

(A) a department or agency of the Federal Government, or a non-Federal entity acting on behalf of the Federal Government, is conducting the query, search, or other analysis to discover or locate a predictive pattern or anomaly indicative of terrorist or criminal activity on the part of any individual or individuals; and

(B) the query, search, or other analysis does not use personal identifiers of a specific individual, or inputs associated with a specific individual or group of individuals, to retrieve information from the database or databases.

(2) Database.--The term ``database'' does not include telephone directories, news reporting, information publicly available to any member of the public without payment of a fee, or databases of judicial and administrative opinions.

SEC. 3. REPORTS ON DATA MINING ACTIVITIES BY FEDERAL AGENCIES.
(a) Requirement for Report.--The head of each department or agency of the Federal Government that is engaged in any activity to use or develop data mining shall submit a report to Congress on all such activities of the department or agency under the jurisdiction of that official. The report shall be made available to the public, except for a classified annex described in subsection (b)(8).

(b) Content of Report.--Each report submitted under subsection (a) shall include, for each activity to use or develop data mining, the following information:

(1) A thorough description of the data mining activity, its goals, and, where appropriate, the target dates for the deployment of the data mining activity.

(2) A thorough description of the data mining technology that is being used or will be used, including the basis for determining whether a particular pattern or anomaly is indicative of terrorist or criminal activity.

(3) A thorough description of the data sources that are being or will be used.

(4) An assessment of the efficacy or likely efficacy of the data mining activity in providing accurate information consistent with and valuable to the stated goals and plans for the use or development of the data mining activity.

(5) An assessment of the impact or likely impact of the implementation of the data mining activity on the privacy and civil liberties of individuals, including a thorough description of the actions that are being taken or will be taken with regard to the property, privacy, or other rights or privileges of any individual or individuals as a result of the implementation of the data mining activity.

(6) A list and analysis of the laws and regulations that govern the information being or to be collected, reviewed, gathered, analyzed, or used with the data mining activity.
(7) A thorough discussion of the policies, procedures, and guidelines that are in place or that are to be developed and applied in the use of such technology for data mining in order to--

(A) protect the privacy and due process rights of individuals, such as redress procedures; and

(B) ensure that only accurate information is collected, reviewed, gathered, analyzed, or used.

(8) Any necessary classified information in an annex that shall be available, as appropriate, to the Committee on Homeland Security and Governmental Affairs, the Committee on the Judiciary, the Select Committee on Intelligence, and the Committee on Appropriations of the Senate and the Committee on Homeland Security, the Committee on the Judiciary, the Permanent Select Committee on Intelligence, and the Committee on Appropriations of the House of Representatives.

(c) Time for Report.--Each report required under subsection (a) shall be--

(1) submitted not later than 180 days after the date of enactment of this Act; and

(2) updated not less frequently than annually thereafter, to include any activity to use or develop data mining engaged in after the date of the prior report submitted under subsection (a).

Mr. LEAHY. Mr. President, I am pleased today to join with Senators Feingold, Sununu and others to introduce the Federal Agency Data Mining Reporting Act of 2007. This important privacy legislation would begin to restore key checks and balances by requiring Federal agencies to report to Congress on their data mining programs and activities.

We joined together to introduce a similar bill last Congress. Regrettably, it received no attention. This year, I intend to make sure that we do a better job of considering Americans' privacy, restoring checks and balances, and striking the proper balance between protecting Americans' privacy rights and fighting smarter and more effectively against security threats.

In recent years, the Federal Government's use of data mining technology has exploded.
According to a May 2004 report by the General Accounting Office, there are at least 199 different government data mining programs operating or planned throughout the Federal Government, with at least 52 different Federal agencies currently using data mining technology. And, more and more, these data mining programs are being used with little or no notice to ordinary citizens, or to Congress.

Advances in technologies make data banks and data mining more powerful and more useful than at any other time in our history. These can be useful tools in our national security arsenal, but we should use them appropriately so that they can be most effective. A mistake can cost Americans their jobs and wreak havoc in their lives and reputations that can take years to repair. Without adequate safeguards, oversight and checks and balances, these powerful technologies also become an invitation to government abuse.

The government must take steps to ensure that it is properly using this technology. Too often, government data mining programs lack adequate safeguards to protect the privacy rights and civil liberties of ordinary Americans, whose data is collected and analyzed by these programs. Without these safeguards, government data mining programs are prone to produce inaccurate results and are ripe for abuse, error and unintended consequences.

This legislation takes an important first step in addressing these concerns by pulling back the curtain on how this Administration is using this technology. It does not by its terms prohibit the use of this technology, but rather provides an oversight mechanism to begin to ensure it is being used appropriately and effectively. This bill would require Federal agencies to report to Congress about their data mining programs. The legislation provides a much-needed check on Federal agencies by requiring them to disclose the steps that they are taking to protect the privacy and due process rights of American citizens when they use these programs.
We need checks and balances to keep government databases from being misused against the American people. That is what the Constitution and our laws should provide. We in Congress must make sure that when our government uses technology to detect and deter illegal activity, it does so in a manner that also protects our most basic rights and liberties. This bill advances this important goal, and I urge all Senators to support this important privacy legislation.

______