[Congressional Record: January 10, 2007 (Senate)]
[Page S359-S394]



          STATEMENTS ON INTRODUCED BILLS AND JOINT RESOLUTIONS

      By Mr. FEINGOLD (for himself, Mr. Sununu, Mr. Leahy, and Mr.
        Akaka):
  S. 236. A bill to require reports to Congress on Federal agency use
of data mining; to the Committee on the Judiciary.
  Mr. FEINGOLD. Mr. President, I am pleased today to introduce the
Federal Agency Data Mining Reporting Act of 2007. I want to thank
Senator Sununu for once again cosponsoring this bill, which we also
introduced in the last Congress. Senator Sununu has consistently been a
leader on privacy issues, and I am pleased to work with him on this
effort. I also want to thank Senators Leahy, Akaka, and Wyden for
their continuing support of the bill.
  The controversial data analysis technology known as data mining is
capable of reviewing millions of both public and private records on
each and every American. The possibility of government law enforcement
or intelligence agencies fishing for patterns of criminal or terrorist
activity in these vast quantities of digital data raises serious
privacy and civil liberties issues--not to mention serious questions
about the effectiveness of these types of searches. But four years
after Congress first learned about and defunded the Defense
Department's program called Total Information Awareness, there is still
much Congress does not know about the Federal Government's work on data
mining.
  We have made some progress. We know from reviews conducted by the
Government Accountability Office that as of May 2004 there were nearly
200 Federal data mining programs, more than one hundred of which relied
on personal information and 29 of which were for the purpose of
investigating terrorists or criminals. And we have learned a few more
details on five of those programs from a follow-up report that GAO
issued in August 2005. We also have a brief report from the DHS
Inspector General published in August 2006, and as a result of my
amendment to the DHS appropriations bill we have a July 2006 report
from the Privacy Office at the Department of Homeland Security that
provides some interesting policy suggestions relating to data mining.
  But this information has come to us haphazardly, and lacks detail
about the precise nature of the data mining programs being used or
developed, their efficacy, and the consequences Americans could face as
a result. Furthermore, much of the reporting thus far has focused on
the Department of Homeland Security. It also appears there has been
little if any government-wide consideration of privacy policies for
these types of programs. Indeed, public debate on government data
mining has been generated more by press stories than by congressional
oversight.
  My bill would require all Federal agencies to report to Congress
within 180 days and every year thereafter on data mining programs
developed or used to find a pattern or anomaly indicating terrorist or
other criminal activity on the part of individuals, and how these
programs implicate the civil liberties and privacy of all Americans. If
necessary, specific information in the various reports could be
classified.
  This is information we need to have. Congress should not be learning
the details about data mining programs after millions of dollars are
spent testing or using data mining against unsuspecting Americans. The
possibility of unchecked, secret use of data mining technology
threatens one of the most important values that we are fighting for in
the war against terrorism--freedom.
  Data mining could rely on a combination of intelligence data and
personal information like individuals' traffic violations, credit card
purchases, travel records, medical records, and virtually any
information contained in commercial or public databases. Congress must
conduct oversight to make sure that all government agencies engaged in
fighting terrorism and other criminal enterprises--not just the
Department of Homeland Security, but also the Department of Justice,
the Department of Defense and others--use these types of sensitive
personal information effectively and appropriately.
  Let me clarify what this bill does not do. It does not have any
effect on the government's use of commercial data to conduct
individualized searches on people who are already suspects, nor does it
require that the government report on these types of searches. It does
not end funding for any program, determine the rules for use of data
mining technology, or threaten any ongoing investigation that might use
data mining technology.
  My bill would simply provide Congress with information about the
nature of the technology and the data that will be used. The Federal
Agency Data Mining Reporting Act would require all government agencies
to assess

[[Page S360]]

the efficacy of the data mining technology they are using or
developing--that is, whether the technology can deliver on the promises
of each program. In addition, my bill would make sure that Congress
knows whether the Federal agencies using data mining technology have
considered and developed policies or guidelines to protect the privacy
and due process rights of individuals, such as privacy technologies and
redress procedures. With complete information about the current data
mining plans and practices of the Federal Government, Congress will be
able to conduct a thorough review of the costs and benefits of the
practice of data mining on a program-by-program basis and make
considered judgments about whether programs should go forward. Congress
will also be able to evaluate whether new privacy rules are necessary.
  In addition, Congress must look closely at the government's
activities because data mining is unproven in this area. Some argue
that data mining can help locate potential terrorists before they
strike. But we do not, today, have evidence that pattern-based data
mining will prevent terrorism. In fact, some technology experts have
warned that this type of data mining is not the right approach for the
terrorism problem. Just last month, the Cato Institute released a
report--coauthored by a scientist specializing in data analytics and an
information privacy expert--concluding that ``[t]he only thing
predictable about predictive data mining for terrorism is that it would
be consistently wrong.''
  Some commercial uses of data mining have been successful, but have
arisen in a very different context than counterterrorism efforts. For
example, the financial world has successfully used data mining to
identify people committing fraud because it has data on literally
millions, if not billions, of historical financial transactions. And
the banks and credit card companies know, in large part, which of those
past transactions have turned out to be fraudulent. So when they apply
sophisticated statistical algorithms to that massive amount of
historical data, they are able to make a pretty good guess about what a
fraudulent transaction might look like in the future.
  We do not have that kind of historical data about terrorists and
sleeper cells. We have just a handful of individuals whose past actions
can be analyzed, which makes it virtually impossible to apply the kind
of advanced statistical analysis required to use data mining in this
way. That raises serious questions about whether data mining will ever
be able to locate an actual terrorist. Before the government starts
reviewing personal information about every man, woman and child in this
country, we should learn what data mining can and can't do--and what
limits and protections are needed if data mining programs do go
forward.
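  The false-positive problem can be made concrete with simple
arithmetic. The following is a hypothetical sketch only: the function
name screening_outcomes, the population of 300 million records, the
1,000 actual targets, and the 99 percent and 1 percent accuracy rates
are illustrative assumptions, not figures drawn from GAO, the Cato
report, or any government program.

```python
def screening_outcomes(population, true_targets, sensitivity, false_positive_rate):
    """Return expected (true_positives, false_positives) for a pattern screen.

    Hypothetical illustration: every input is an assumed value, not data
    from any actual data mining program.
    """
    innocents = population - true_targets
    # Actual targets the screen correctly flags.
    true_positives = true_targets * sensitivity
    # Innocent people the screen wrongly flags.
    false_positives = innocents * false_positive_rate
    return true_positives, false_positives


# Assumed figures: 300 million records, 1,000 actual targets, a screen
# that flags 99% of targets and wrongly flags 1% of everyone else.
tp, fp = screening_outcomes(300_000_000, 1_000, 0.99, 0.01)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"share of flagged people who are targets: {tp / (tp + fp):.4%}")
```

  Under these assumed figures, a screen that is right 99 percent of the
time would still flag roughly 3 million innocent people alongside 990
genuine targets, so fewer than one in a thousand flagged individuals
would actually be a target.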
  We must also bear in mind that there will inevitably be errors in the
underlying data. Everyone knows people who have had errors on their
credit reports--and that is the one area of commercial data where the
law already imposes strict accuracy requirements. Other types of
commercial data are likely to be even more inaccurate. Even if the
technology itself were effective, I am very concerned that innocent
people could be ensnared because of mistakes in the data that make them
look suspicious. The recent rise in identity theft, which creates even
more data accuracy problems, makes it all the more important that we
address this issue.
  I also want to touch on one issue that has proved difficult in many
debates about data mining: how to define the term. What is data mining?
From policy debates to government reports, many people have wrestled
with this question. While it can be defined more broadly, data mining
generally refers to the process of attempting to predict future events
or actions by discovering or locating patterns or anomalies in data.
However, for purposes of the reporting requirement in this bill, which
seeks information on those data mining programs most likely to threaten
the privacy and civil liberties of Americans, I have limited the
definition in a couple of other ways. First, the bill defines data
mining as a query, search, or other analysis of one or more electronic
databases conducted to ``discover a predictive pattern or an anomaly
indicative of terrorist or criminal activity on the part of any
individual or individuals.'' Data mining has a number of applications
at various government agencies outside the context of terrorism and
other criminal investigations, but I have limited the definition for
purposes of this legislation in order to get reports on the programs
most likely to raise privacy concerns. For example, the May 2004 GAO
report identified a number of government data mining programs whose
goals are managing resources efficiently or identifying fraud, waste
and abuse in government programs, and that do not rely on personally
identifiable information. I am not seeking reports on programs like
these.
  Second, as I alluded to earlier, the definition explicitly excludes
queries that retrieve information from a database using information--
such as an address, passport number or license plate number--that is
associated with a particular individual or individuals. This type of
query is a traditional investigative technique. Although government
agencies must be careful in their use of commercial databases, simply
querying a ChoicePoint database for information about someone who is
already a suspect is not data mining.
  Most Americans believe that their private lives should remain
private. Data mining programs run the risk of intruding into the lives
of individuals who have nothing to do with terrorism or other criminal
activity and understandably do not want their credit reports, shopping
habits and doctor visits to become a part of a gigantic computerized
search engine operating without any controls or oversight, and without
much promise of locating terrorists. As the Cato report put it, ``[t]he
possible benefits of predictive data mining for finding planning or
preparation for terrorism are minimal. The financial costs, wasted
effort, and threats to privacy and civil liberties are potentially
vast.''
  At a minimum, the administration should be required to report to
Congress about the various data mining programs now underway or being
studied, and the impact those programs may have on our privacy and
civil liberties, so that Congress can determine whether any benefits of
this practice come at too high a price to our privacy and personal
liberties. As Senator Wyden and I have told the Director of National
Intelligence, we must have a public discussion about the efficacy and
privacy implications of data mining. We wrote a letter to him on
November 15, 2006, that included the following:

       [W]e believe there needs to be a public discussion before
     the implementation of any government data mining program that
     would rely on domestic commercial data and other information
     about Americans. There are serious questions about whether
     pattern analysis of such data can effectively identify
     terrorists, given the relative lack of historical data about
     terrorist activities. And as the furor over the Total
     Information Awareness program demonstrated, the American
     public has serious--and legitimate--concerns about the
     privacy ramifications of programs designed to fish for
     patterns of criminal or terrorist activity in vast quantities
     of digital data, collected by other entities for entirely
     different reasons. Pattern analysis runs the risk of
     generating a large number of false positives, meaning that
     innocent Americans could become the subject of investigation.
     Before we go down that path, it is critical that we have a
     public discussion about the efficacy and privacy implications
     of this technology. And, if we decide that data mining is
     effective enough to warrant spending taxpayer dollars on it,
     we should establish strong privacy protections to protect
     innocent people from being the subject of government
     suspicion.
       Of course, the Intelligence Community should be taking
     advantage of new technologies in its critical responsibility
     to protect our country from terrorists, and much of its work
     must remain classified to protect national security. But we
     can have a public debate about what privacy rules should
     constrain data mining programs deployed domestically, without
     revealing sensitive information like the precise algorithms
     that the government has developed.

  This bill is the first step in this process--a way for Congress and,
to the degree appropriate, the public to finally understand what is
going on behind the closed doors of the executive branch so that we can
start to have a policy discussion about data mining that is long
overdue. I urge my colleagues to support this bill. All it asks for is
information to which Congress and the American people are entitled.

[[Page S361]]

  Mr. President, I ask unanimous consent that the text of this bill be
printed in the Record.
  There being no objection, the text of the bill was ordered to be
printed in the Record, as follows:

                                 S. 236

       Be it enacted by the Senate and House of Representatives of
     the United States of America in Congress assembled,

     SECTION 1. SHORT TITLE.

       This Act may be cited as the ``Federal Agency Data Mining
     Reporting Act of 2007''.

     SEC. 2. DEFINITIONS.

       In this Act:
       (1) Data mining.--The term ``data mining'' means a query,
     search, or other analysis of 1 or more electronic databases,
     where--
       (A) a department or agency of the Federal Government, or a
     non-Federal entity acting on behalf of the Federal
     Government, is conducting the query, search, or other
     analysis to discover or locate a predictive pattern or
     anomaly indicative of terrorist or criminal activity on the
     part of any individual or individuals; and
       (B) the query, search, or other analysis does not use
     personal identifiers of a specific individual, or inputs
     associated with a specific individual or group of
     individuals, to retrieve information from the database or
     databases.
       (2) Database.--The term ``database'' does not include
     telephone directories, news reporting, information publicly
     available to any member of the public without payment of a
     fee, or databases of judicial and administrative opinions.

     SEC. 3. REPORTS ON DATA MINING ACTIVITIES BY FEDERAL
                   AGENCIES.

       (a) Requirement for Report.--The head of each department or
     agency of the Federal Government that is engaged in any
     activity to use or develop data mining shall submit a report
     to Congress on all such activities of the department or
     agency under the jurisdiction of that official. The report
     shall be made available to the public, except for a
     classified annex described in subsection (b)(8).
       (b) Content of Report.--Each report submitted under
     subsection (a) shall include, for each activity to use or
     develop data mining, the following information:
       (1) A thorough description of the data mining activity, its
     goals, and, where appropriate, the target dates for the
     deployment of the data mining activity.
       (2) A thorough description of the data mining technology
     that is being used or will be used, including the basis for
     determining whether a particular pattern or anomaly is
     indicative of terrorist or criminal activity.
       (3) A thorough description of the data sources that are
     being or will be used.
       (4) An assessment of the efficacy or likely efficacy of the
     data mining activity in providing accurate information
     consistent with and valuable to the stated goals and plans
     for the use or development of the data mining activity.
       (5) An assessment of the impact or likely impact of the
     implementation of the data mining activity on the privacy and
     civil liberties of individuals, including a thorough
     description of the actions that are being taken or will be
     taken with regard to the property, privacy, or other rights
     or privileges of any individual or individuals as a result of
     the implementation of the data mining activity.
       (6) A list and analysis of the laws and regulations that
     govern the information being or to be collected, reviewed,
     gathered, analyzed, or used with the data mining activity.
       (7) A thorough discussion of the policies, procedures, and
     guidelines that are in place or that are to be developed and
     applied in the use of such technology for data mining in
     order to--
       (A) protect the privacy and due process rights of
     individuals, such as redress procedures; and
       (B) ensure that only accurate information is collected,
     reviewed, gathered, analyzed, or used.
       (8) Any necessary classified information in an annex that
     shall be available, as appropriate, to the Committee on
     Homeland Security and Governmental Affairs, the Committee on
     the Judiciary, the Select Committee on Intelligence, and the
     Committee on Appropriations of the Senate and the Committee
     on Homeland Security, the Committee on the Judiciary, the
     Permanent Select Committee on Intelligence, and the Committee
     on Appropriations of the House of Representatives.
       (c) Time for Report.--Each report required under subsection
     (a) shall be--
       (1) submitted not later than 180 days after the date of
     enactment of this Act; and
       (2) updated not less frequently than annually thereafter,
     to include any activity to use or develop data mining engaged
     in after the date of the prior report submitted under
     subsection (a).

  Mr. LEAHY. Mr. President, I am pleased today to join with Senators
Feingold, Sununu and others to introduce the Federal Agency Data Mining
Reporting Act of 2007. This important privacy legislation would begin
to restore key checks and balances by requiring Federal agencies to
report to Congress on their data mining programs and activities. We
joined together to introduce a similar bill last Congress. Regrettably,
it received no attention. This year, I intend to make sure that we do a
better job of considering Americans' privacy, restoring checks and
balances, and striking the proper balance between protecting Americans'
privacy rights and fighting smarter and more effectively against
security threats.
  In recent years, the Federal Government's use of data mining
technology has exploded. According to a May 2004 report by the General
Accounting Office, there were at least 199 different government data
mining programs operating or planned throughout the Federal Government,
with at least 52 different Federal agencies using data mining
technology. And, more and more, these data mining programs are being
used with little or no notice to ordinary citizens, or to Congress.
  Advances in technologies make data banks and data mining more
powerful and more useful than at any other time in our history. These
can be useful tools in our national security arsenal, but we should use
them appropriately so that they can be most effective. A mistake can
cost Americans their jobs and wreak havoc in their lives and
reputations that can take years to repair. Without adequate safeguards,
oversight and checks and balances, these powerful technologies also
become an invitation to government abuse. The government must take
steps to ensure that it is properly using this technology. Too often,
government data mining programs lack adequate safeguards to protect the
privacy rights and civil liberties of ordinary Americans, whose data is
collected and analyzed by these programs. Without these safeguards,
government data mining programs are prone to produce inaccurate results
and are ripe for abuse, error and unintended consequences.
  This legislation takes an important first step in addressing these
concerns by pulling back the curtain on how this Administration is
using this technology. It does not by its terms prohibit the use of
this technology, but rather provides an oversight mechanism to begin to
ensure it is being used appropriately and effectively. This bill would
require Federal agencies to report to Congress about their data mining
programs. The legislation provides a much-needed check by requiring
Federal agencies to disclose the steps that they are taking to protect
the privacy and due process rights of American citizens when they use
these programs.
  We need checks and balances to keep government databases from being
misused against the American people. That is what the Constitution and
our laws should provide. We in Congress must make sure that when our
government uses technology to detect and deter illegal activity, it
does so in a manner that also protects our most basic rights and
liberties. This bill advances this important goal, and I urge all
Senators to support this important privacy legislation.
                                 ______