In Chapter One of this book we pointed out that information is not intelligence. Information is the source of intelligence. In extracting intelligence from information it is necessary to go through a process of application and activation. We also pointed out that the target of gathering is information and not intelligence, and that the source of intelligence is not identical to the source of information. So what really is information? What are its categories, attributes and functions? This is the main content of this chapter.
Chapter 3: Overview of Information
Section One -- Explanation of Symbols
Before introducing what information is, we will briefly explain the question of symbols. This will deepen our understanding and knowledge of information.
I. The Symbolic Expression of Knowledge
In order to achieve the transmission of knowledge, it is necessary to turn the knowledge in people's brains into matter. As humans desire to express knowledge, they must use the help of various classes of symbols. In their essence symbols are matter. In order for there to be communication and exchange between humans and humans, between humans and machines, and between machines and machines, symbols are essential. When people need to express more complex concepts and content of knowledge, they must use systems of symbols, strings of characters, and strings of digits.
Certain systems of symbols express certain significance. Distinct systems of symbols can express distinct knowledge content, and can express the same knowledge content. Different symbols may be interchanged, as language can be converted to writing, and writing can be converted into code.
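The interchange of symbols described above can be illustrated with a minimal sketch. It assumes UTF-8 as the code into which writing is converted; the function names and the sample word are hypothetical and serve only to show that the same knowledge content survives a change of symbol system.

```python
def to_digit_string(text: str) -> str:
    """Re-express written symbols as a string of binary digits (UTF-8 is an assumed code)."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_digit_string(digits: str) -> str:
    """Convert the binary digit string back into written symbols."""
    return bytes(int(chunk, 2) for chunk in digits.split()).decode("utf-8")


message = "ziliao"                      # illustrative sample word
encoded = to_digit_string(message)      # writing converted into code
decoded = from_digit_string(encoded)    # code converted back into writing
```

Both forms carry the same content: converting to the digit string and back returns the original writing unchanged.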
II. Categories of Symbols
In knowledge-transmission activities, people have created a great variety of symbolic systems to meet the needs of many classes of information.
Symbolic systems can be divided into two kinds. One kind is natural symbols, like natural language and writing. Humans can directly perceive and distinguish this class of symbols. Another kind is symbols created by humans. Humans create such symbols as various kinds of codes, digit strings, and character strings for specific purposes. There are also retrieval languages. Some of these can still be directly perceived and distinguished by humans, but the majority must first be converted before humans can perceive and distinguish them. Similarly, machines do not understand human language. The only way to carry on interchange with a machine is to convert natural language into machine language.
Symbolic systems are a human convention. They are a standard for both partners in any communication.
Section Two -- What Information Is
There have always been many understandings and explanations of what information is. Overall, the human concept of information has grown gradually deeper and more complete as the social functions of technical information have developed.
I. Information Is Documents
This attitude is quite widespread in China. If we say that the archives of a technical intelligence organization hold such and such a quantity of information, the information that we speak of here refers to documents. Information in this sense is largely the same as the English word "document".
In order to distinguish between information and documents, people have suggested that "documents are information that is expressed in writing". There is some truth to this explanation, and the theory is easy for people to grasp. In practical retrieval work, however, the definition is not universally accepted, and some people find it intuitively or conceptually difficult.
There are two reasons why the concept that information is documents is universal and deep-rooted. One reason is that technical information work has evolved from or is an offshoot of library research. This appears even more to be the case with the work of collecting information. The fact that traditional library work uses publications as the main focus of study has had a far-reaching influence on people's understanding of information. A second reason is that written information, such as printed items, enjoys the most pervasive application in the information activities of today's society. This has led people to consider information to be documents.
Today information is becoming more diversified all the time. Machine-readable data and information in forms other than books and paper have appeared in great quantity. In order to solve the new problems of theory and guidance that intelligence gathering has met in practical work, the people who consider information to be documents have extended the traditional concept of documents. Now they regard "machine-readable data", "audio data", "object information", and "verbal information" as "machine-readable documents", "audio documents", "object documents", and "verbal documents". All of these, along with printed documents, are considered to be "documents". In the same vein, the people who hold this view, aside from mentioning "first-degree documents", "second-degree documents", and "third-degree documents", also talk about "zero-degree documents". With this kind of understanding, "information collection work" is "documents collection work", and "data processing work" becomes "documents processing work". There is nothing strictly wrong with this explanation, but applying it to collection work raises a concern: talking about document collection rather than data collection does not fit the customs and mindset of people in the information age, and it always risks introducing a bias.
II. Information Is Intelligence
This understanding is quite universal in foreign countries. What they mean by science and technology [S&T] intelligence [qingbao] work is what China calls S&T information [ziliao] work. What they call S&T intelligence gathering is what China calls S&T information gathering. In this sense, the Chinese word "ziliao" is close in meaning to the English word "information".
NTIS (National Technical Information Service) and ISI (Institute for Scientific Information) in the US, VINITI (the Soviet Science and Technology Information Institute) in the USSR, and JICST (Japan Information Center of Science and Technology) in Japan all consider information to be the focus of their research.
In China, people already know intellectually and intuitively that intelligence and information are not the same thing. In expression, however, there is still a considerable amount of confusion, for instance the "intelligence information work" that we often see and hear about. Regardless of whether "intelligence" is an attribute or an appositive of "information," in either case this expression is an unclear logical concept from both a semantic and connotative perspective. At the very least it is insufficiently rigorous. Getting to the point, what is called "intelligence information work" is in fact information work. When applied to collection, it is not necessary to say "intelligence information collection"; "information collection" is altogether sufficient. This spares people the tedium of a redundant phrase that merely gilds the lily.
III. Information Is Intellectual Material that Serves the People's Scientific Research or Practice
In this sense of the word, "information" is very similar to the English word "material". People with this understanding believe that information is experience generalized from practical work; it is intellectual information. This new viewpoint attempts to summarize information from the perspective of information science and explain information from the angle of the human store of knowledge.
IV. Concept of Documents
In discussing the concept of documents, some people consider documents to be "the material form for recording, preserving and transmitting knowledge". Some say documents are "the medium or carrier on which knowledge content is recorded, stored and transmitted using technical methods". These interpretations tell us that in reality information is not identical to printed documents. The scope of information is much broader. For example, information includes object information. This understanding attempts to use principles from information science to treat documents from the angle of transmitting knowledge.
V. Expanding the Concept of Documents and Publications
With the rapid advance of knowledge storage technology, some information scientists and library scientists feel that rigidly adhering to the traditional library science concepts of documents and publications has led to a conflict between the progress of their work and the information needs of the actual users. They therefore have included the new information science content that has appeared with the new computer, storage and communications technology in their theories. Aside from the new concept of information, the concept of "information carrier" has come to the fore. They call the medium for recording and transmitting information content the "information carrier." Information carriers can be divided into "carriers of written information", "carriers of visual information", "carriers of sound information" and "new carriers". These "new carriers" refer to "carriers of written, graphic, and sound information that are transmitted using computers and long-range communication networks." This explanation makes a clear distinction between intelligence and information in its investigation of problems.
VI. Information Is Knowledge that Has Been Turned into Matter
We believe that information is knowledge that has been turned into matter. Some people say that information is solidified knowledge. In this sense, our word "ziliao" is close to the English word "data". This is a broad view of information.
From the perspective of the theory of knowledge, information is the material expression of human knowledge. Only through information can knowledge be expressed.
Formal logic considers information to be subsumed under "knowledge". The specific difference is that information is materialized.
In a narrow sense, information is knowledge that has been turned into symbols. Here we should understand that the use of symbols is matter or a material phenomenon. This definition fits quite well with customary usage. The symbols of which we speak here include writing and pictures, as well as various codes, character strings, and numerical strings. It also includes sound, light, and electromagnetic signals.
Overall, information is a material expression of humans' knowledge of the objective world; it is the material manifestation of the human store of knowledge. Seeing information like this in the theory of knowledge can help retrieval work become more specialized and scientific. In addition, it becomes easier to define the chosen target of research in information science and retrieval science, which will in turn accelerate the development of these sciences.
Section Three -- Categories of Information
At present there is no unified standard and method for making distinctions between classes of information. Classified according to the transmission characteristics, there are verbal, object and document information. Some are classified according to the level of processing of the information being transmitted. These are zero-degree, first-degree, second-degree and third-degree information. There is classification between disciplines, such as chemical- or electronics-related information, etc. Some information is grouped by industry, such as information used specifically in a given industry, information for commercial purposes, or scientific research data. The most universal method of grouping is by the transmission medium. Here, information is considered as printed, miniaturized, machine-readable or audiovisual data. Some people classify information according to the nature of the needs of the user.
In gathering together the human store of knowledge, our goals are to advance the formation of information science and retrieval science and to inspire retrieval work to advance to a new level. We want to quickly transform the traditional understanding and methods that restrict the target of practical retrieval work to documents. When we make distinctions between classes of information, we should move away from the classical concept of documents, and classify information by the standard of whether or not humans can directly perceive and distinguish it.
I. Information that Humans Are Able to Directly Perceive and Distinguish
Humans can directly perceive, distinguish and utilize this category of information. Examples are printed data, miniaturized data, verbal data, and real object data.
By "perceive" we are not at all limited to the sense of sight. Linguistic data uses the sense of hearing. Braille information requires the sense of touch. In order to distinguish real object data, it is necessary to use the senses of smell and taste.
Retrieval workers have always favored printed information. Whether in the present day or far into the future, printed information is going to be the main target of retrieval.
In recent years retrieval workers have shown a strong liking for miniaturized data. Miniaturized data have now become an important target of retrieval.
Even though people customarily acknowledge that verbal and real object data are intelligence resources, there is considerable difference of opinion as to whether collecting this information is truly intelligence work. The actual situation in China at present is that intelligence departments to a greater or lesser degree have all undertaken the work of gathering and transmitting verbal and real object data, such as in academic exchanges inside and outside China and in technical exchange work. The intelligence departments, however, do not carry out the main and fundamental part of this work. It is directed and implemented by technology management departments, foreign affairs departments, trade and economics departments, and scholarly associations. Unfortunately, many of these departments work in isolation as they obtain information. They lack mutual contact, so the overall effectiveness of retrieval work is diminished and its social utility is not fully realized. Nevertheless, with the arrival of the information age, people put higher and higher demands on the timeliness of the knowledge that is transmitted. The collection of verbal and real object data will receive more and more emphasis and the coordination of collection work will gradually be improved. Of course it is neither necessary nor possible to assign all the work of retrieving verbal and real object data to S&T intelligence departments.
Printed, miniaturized, verbal and real object data will be discussed in more detail in Section Two of Chapter Four. We will not present them further here. We would like to emphasize one point here, however. That is how to convert the information that humans can perceive and distinguish into data that machines can perceive and distinguish. At present there are still considerable technical difficulties. We will need artificial intelligence technology and the fifth generation of computers to accomplish the task.
II. Data that Only Machines Can Perceive and Distinguish
Humans cannot directly use the knowledge expressed by this kind of information. People can utilize it only when machines convert the knowledge into a form that humans can perceive and distinguish. Examples of this class include modulated light waves, electromagnetic waves, tapes, floppy disks, optical disks, and phonograph records. The appearance of this category of information shows that the information work of human beings may advance to an all-new depth and with unprecedented speed. People still do not pay enough attention to the collection of this class of information and the work is carried on in a haphazard manner. The level of application of this information to society is not yet sufficient.
At present there is still no unified standard for distinguishing between the classes of this kind of information. For example they can be divided according to the form of the carrier, as radio wave or magnetic medium data, etc. Another way of distinguishing is by the class of signals that are received, as in graphics, writing, language, natural language, or artificial language information.
A retrieval worker must certainly give full attention to high-density storage data, such as floppy disks and even optical disks. Optical disks in particular are able to gather text, images, and sound data in one. In addition they have amazingly high storage density. If optical disk data ever become more widely used, they will surely bring great change to retrieval work.
With simplicity of nomenclature in mind, some people call this class of information that can only be perceived and distinguished by machines "electronic data" or "electronic publications".
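The two-way classification by direct perceivability can be written down as a small lookup, offered here only as a sketch. The example forms are those named in the text; the names `Perceivability`, `FORMS` and `needs_machine_conversion` are hypothetical.

```python
from enum import Enum


class Perceivability(Enum):
    HUMAN_DIRECT = "humans can perceive and distinguish it directly"
    MACHINE_ONLY = "a machine must first convert it"


# Example forms taken from the text; the grouping is the one the section describes.
FORMS = {
    "printed": Perceivability.HUMAN_DIRECT,
    "miniaturized": Perceivability.HUMAN_DIRECT,
    "verbal": Perceivability.HUMAN_DIRECT,
    "real object": Perceivability.HUMAN_DIRECT,
    "modulated light wave": Perceivability.MACHINE_ONLY,
    "electromagnetic wave": Perceivability.MACHINE_ONLY,
    "tape": Perceivability.MACHINE_ONLY,
    "floppy disk": Perceivability.MACHINE_ONLY,
    "optical disk": Perceivability.MACHINE_ONLY,
    "phonograph record": Perceivability.MACHINE_ONLY,
}


def needs_machine_conversion(form: str) -> bool:
    """Return True when the form must be converted by a machine before humans can use it."""
    return FORMS[form] is Perceivability.MACHINE_ONLY
```

A gatherer could use such a lookup to decide in advance which conversion equipment a given collection task requires.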
III. Classifying the Information According to the Characteristics of the Users' Demands
Whether distinguishing information by the transmission characteristics, the level of processing of the knowledge that is transmitted, or by whether or not humans can directly perceive and distinguish the information, these methods of classifying information all consider the nature of the information itself in the classification. Now, however, it is necessary to emphasize what intelligence work addresses. Distinguishing between information by the characteristics of the needs of the users will add focus to the work of retrieval, and help overcome the trend of stressing collecting information more than using it.
Shannon, the founder of information theory, has developed five classes of information depending on where it is used. They are directional information, program information, concrete activity information, product information, and revised (feedback) information. Referring to his method of classification, information may also be grouped into the following five kinds according to their function and use.
1. Directional information that is needed for a certain purpose.
2. Information needed for plans or programs that are synthesized from directional data that are needed for various purposes.
3. Information that is needed for decisions on concrete action.
4. Information that is needed by products.
5. Constantly fed back and revised information that is needed for the goals, measures, and items of extensive programs.
IV. Features and Categories of Information that are Needed for Macro-management of National Defense Technology
Different levels of leaders and leadership organs need information of different natures. The higher the level of leadership, the higher the level of synthesis of information they need, and the harder it is to predict. Generally speaking, leadership no longer urgently needs information on problems that have already been decided. They do not urgently need information on problems that they have not even considered either. For the problems they have considered but have yet to set policy on, however, they do urgently need information. What are the characteristics for classifying this kind of information? This is a concept that the gatherer of information must have in the work of gathering information for macro-management.
Of course, a large quantity of data alone will not be able to meet the needs for information in leadership policy-making. The information needed at the various stages of raising the questions, answering the questions, supervision and implementation are not the same. The classes and characteristics of information needed by the users who manage national defense technology are as follows:
1. Making distinctions by time, information can be divided into historical and predictive. The quantity of historical data is greater. These data are most helpful for leaders in the process of setting problems and in management and implementation. Predictive data are most suitable for selecting the direction, finalizing programs, and in adopting action. Statistical data, documents, ordinances, regulations and laws related to national defense technology all have a strong time element.
2. Making distinctions by level of expectation, information can be divided into predictable and unpredictable data. Predictable data are data whose appearance from a certain data source can be foreseen. This class of information is very useful to leaders as they solve problems and supervise implementation. Information retrieval personnel should practice monitoring and tracing of this class of information. Unpredictable data are data whose occurrence is not easily foreseen. This kind of data often helps leaders discover information and will influence the selection of a direction, the setting of programs and revisions of plans. The retrieval personnel must be sensitive and flexible toward this class of information.
3. Information can be divided into internal and external according to the source from which it comes. Internal may mean within China and it may mean within the work unit. External can mean foreign and it can mean outside the work unit. The higher the level of management, the more pressing the need for foreign information and information from outside the work unit. At lower levels of management, more often the information needed is from within the work unit or within China. No matter what level, however, there is always a need for internal and external information. At present the work of gathering information within China that is concerned with national defense technology management often goes beyond the responsibility of the intelligence departments. Administrative pathways are also needed to complete the work.
4. Information can be divided into specialized information and synthesized information, according to the content. If it is a manager that needs the information, it will certainly not be single, specialized information. Rather it will be synthesized information that comprises political, economic, scientific, technical and military information. The higher the level of management, the more synthesized the information that is needed. Due to historical reasons and the quality of the information collection personnel, the collection of synthesized information for now is still a difficult point in the retrieval work.
5. According to the level of organization, information can be divided into highly organized information and diffuse information. The information that has been ordered, processed and perhaps even activated by the information worker is considered highly organized information. Information that has not been processed or has been only slightly processed by the information worker is diffuse information. Generally speaking, the higher the level of policy making, the more diffuse is the information that is needed. It is very difficult to gather all of the information that relates to the issue. It is also difficult to gather a lot of relevant information in time through selecting topics and looking them up. At the lower levels of management, it is easier to obtain the information that is needed, and most of this information has already been put in order. Here we would like especially to point out that the information used for raising questions, determining a direction, and setting a plan is usually not directly available in large quantity from libraries or data banks. For example there was very little information that had already been organized and made available for China's year 2000 national defense technology strategy.
6. According to the level of compression or processing, information can be divided into detailed information and summary information. Generally the policy-making and program development information that is needed by managers is summarized data or intelligence data. Very rarely are they detailed or original data or source language information. Moreover, at the higher levels of management, there is greater need for highly condensed summary information. Gathering volumes and volumes of original data, therefore, very often will not satisfy the demands of the high-level user who manages national defense technology. Likewise large sets of series data will not meet the needs of this level of leader.
7. Classifying information according to the possibility of its occurrence. Generally speaking, not only are there few information sources related to national defense technology management and national defense development strategy on a national level, the quantity of information from those sources is very small. For this reason the information is difficult to gather. Information that deals with a concrete item of technology, however, is relatively likely to be produced. The quantity is greater, and the information is relatively easy to gather.
8. Dividing information according to the accuracy of the content. Overall, the information used by a policy maker at any level needs to be accurate. In general, when tactical policy is being made, the information needs to be highly accurate. When making policy on strategic problems, fairly accurate information is sufficient. There was never a requirement that the information needed for the year 2000 national defense technology plan be 100 percent accurate in describing reality and phenomena. 80 or 90 percent accuracy was required, though.
The following table explains the trends of the relationships between the categories of policy making and the categories of information.

Table 3.1
Trends in the Relationships between the Categories of Policy Making and the Categories of Information

Information           Category of Policy Making
Characteristic        Tactical            Strategic
--------------------  ------------------  --------------------
Time                  Historical          Predictive
Expectation           Predictable         Unpredictable
Source                Internal            External
Content               Specialized         Synthesized
Organization          Highly Organized    Diffuse
Compression           Detailed            Summarized
Rate of Production    High                Low
Accuracy              High Accuracy       Fairly High Accuracy
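As a sketch only, the trends in Table 3.1 can be held in a lookup that a collection planner might consult. The names `TABLE_3_1` and `typical_profile` are hypothetical, and the values simply restate the table. The table describes tendencies, not fixed rules.

```python
# Each characteristic maps to (tactical tendency, strategic tendency), as in Table 3.1.
TABLE_3_1 = {
    "time": ("historical", "predictive"),
    "expectation": ("predictable", "unpredictable"),
    "source": ("internal", "external"),
    "content": ("specialized", "synthesized"),
    "organization": ("highly organized", "diffuse"),
    "compression": ("detailed", "summarized"),
    "rate of production": ("high", "low"),
    "accuracy": ("high", "fairly high"),
}


def typical_profile(policy_kind: str) -> dict:
    """Return the tendency of each characteristic for 'tactical' or 'strategic' policy making."""
    column = {"tactical": 0, "strategic": 1}[policy_kind]
    return {name: values[column] for name, values in TABLE_3_1.items()}


strategic = typical_profile("strategic")
```

For example, `typical_profile("strategic")["source"]` gives `"external"`, which matches the observation that higher levels of management have a more pressing need for outside information.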
Section Four -- The Essential Elements, Attributes and Functions of Information
In this section, we focus on the nature of information, which includes the main elements that constitute information, and the attributes and functions of information.
I. The Elements of Which Information Is Composed
We have pointed out before that in a broad sense, information is materialized knowledge. In a narrower sense, information is symbolized knowledge. Information is a form of material expression of humans' understanding of the objective world. From a physical perspective, the following elements are needed to constitute information.
1. A Certain Quantity of Knowledge. Knowledge is the fruit of humans' understanding of the objective world and is the intellectual wealth of human society. Without a certain quantity of knowledge content, it is impossible to form information. A blank piece of paper, an empty tape, or an electromagnetic wave that has not undergone modulation does not constitute information. A set of symbols that represent matter or a material phenomenon depicts a certain quantity of knowledge that has a specific meaning. In their practical work, gatherers of information must investigate the density and the level of processing of the knowledge in the information they will collect or have already collected. They must also consider the appropriateness of the knowledge content for the user as well as its originality and usefulness.
2. A Specifically Chosen Physical Quantity. If the knowledge inside people's brains is to be turned into matter, it must be represented by the variation of a specifically selected physical quantity. Braille information is expressed through the variation of magnitude and direction of mechanical force. Written and graphic information are manifested through the variation of light intensity, color, frequency and energy density. Electronic data, data in databases, radio wave data and verbal information are expressed through variations of the electric intensity of electromagnetic waves, magnetic field strength, or frequency. In gathering information, the retrieval worker must choose a specific gathering method depending on the physical quantity that was used to materialize the knowledge. For example, when we receive wireless signals, we need the appropriate receiver and signal conversion equipment. Sometimes it is necessary to use code-cracking techniques or system identification.
3. An Appropriate Carrier. The carrier is a material entity that matches the selected physical quantity. Through modulation, the carrier can express the variations of the physical quantity. Knowledge must be materialized on the carrier. Paper, magnetic media, electronic media, film, electromagnetic waves and sound waves are all carriers. Systems of human knowledge are expressed through systems of carriers that comprise various carriers. The transmission of knowledge is achieved through the movement of the carrier through space and through time. When collecting information, we can never depart from the carrier. We must therefore investigate the system's structure, physical and chemical characteristics, as well as the distribution qualities and activity characteristics of the carrier that has been modulated. There are three main categories of carriers. The first kind does not easily store knowledge, but transmits knowledge very quickly, including light, electromagnetic and sound waves. The second kind of carrier including various magnetic media, paper, compact disks and film, can both store and transmit knowledge. The third kind as exemplified by various real objects generally does not materialize knowledge in order to produce information, yet it fulfills the purpose of a carrier of knowledge.
4. Finally we would point out that energy is an element of information. It takes consumption and conversion of energy to turn knowledge into matter or symbols and to modulate that physical quantity.
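The four elements listed above can be modeled, purely for illustration, as a record whose fields must all be present before an item counts as information. The class and field names are hypothetical, and treating energy as a simple yes/no flag is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class InformationItem:
    knowledge: str                 # knowledge content; empty means nothing was recorded
    physical_quantity: str         # e.g. "light intensity", "magnetic field strength"
    carrier: str                   # e.g. "paper", "magnetic tape"
    energy_expended: bool = True   # assumed flag: energy was spent modulating the carrier

    def is_information(self) -> bool:
        """All four elements must be present for the item to constitute information."""
        return (
            bool(self.knowledge.strip())
            and bool(self.physical_quantity)
            and bool(self.carrier)
            and self.energy_expended
        )


# An empty tape carries no knowledge, so it does not constitute information.
blank_tape = InformationItem("", "magnetic field strength", "magnetic tape", energy_expended=False)
recorded_tape = InformationItem("test results", "magnetic field strength", "magnetic tape")
```

Here `blank_tape.is_information()` is false and `recorded_tape.is_information()` is true, mirroring the point that a blank carrier alone is not information.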
II. The Attributes of Information
As the material record of the human store of knowledge, information has three basic attributes:
1. Objectivity. The objectivity of information can be understood from two aspects. First, information and its carriers are real objects that exist in nature. Once information is formed it remains and preserves its original appearance, unless it is deleted or destroyed. Second, information content is the expression of knowledge that exists objectively. Unless the variations of the physical quantity that are expressed in the carrier are erased, the knowledge that has been materialized will continue to exist and preserve its original significance.
2. Transmission. Information can be transmitted in time and space. If information could not be transmitted, it would lose its purpose. The transmission of knowledge is achieved through the movement of information. The movement of information from the source to the user constitutes data flow. Modern data transmission frequently is assisted by the organization and adjustment of information workers and retrieval workers. The retrieval worker usually exerts some control over the transmission process. This is not the case, however, with exceptional classes of information or transmission processes.
3. Activation. Information can be activated. The knowledge that is stored in information can be directly known and distinguished by people, or it may require machines to make knowledge perceptible. The process by which humans activate information is actually the process of demodulating and re-modulating the variations of a physical quantity. It is a process of decoding and re-coding symbolic knowledge. Activation is logical processing of the knowledge content (not the external characteristics) in order to extract and produce new knowledge.
III. The Functions of Information
Information can have five functions:
1. A measure of the level of humans' knowledge of the natural world. Information is the material expression of humans' knowledge of the objective world. The deeper the knowledge that humans have of the objective world, the more accurate will be information's expression of the objective world, and the more valuable the information will be. For this reason it is said that information is a measure of humans' knowledge of the objective world.
2. A form for storing knowledge. Information is the material result of humans' understanding of the objective world. Created by humans, knowledge expresses people's understanding of the objective world. Knowledge is something that all humans possess. It is objectively stored and kept in data storage, in archives, and in databases. Information is the only form in which humans can store knowledge.
3. A method for transmitting knowledge. If the knowledge in people's brains is not made into material information then it has no use for the collective and cannot be transmitted to succeeding generations nor exchanged in any way. Only through the movement of information can knowledge be transmitted and utilized. Information is the only way that humans can transmit knowledge.
4. A tool for understanding the objective world. If people desire to know the objective world, not only do they need to have contact with that world as the object of knowing, they must also have contact from the start with the materialized store of knowledge, or information. Advancing society requires the timely and complete use of the information that stores the sum of human knowledge, and the conscious use of information as a tool for advancing knowledge of the objective world and expanding the reproduction of knowledge.
5. A fountainhead of intelligence. Intelligence is knowledge that is needed to solve a specified problem. It is a special kind of knowledge that is extracted from information. Information is not intelligence. Information is the material from which intelligence is extracted, a source material for processing knowledge. Dead information is not intelligence. Intelligence is enlivened knowledge. Information is the fountainhead from which intelligence is extracted.
We have explained above that in their natures and functions, intelligence and information are not the same. When we study retrieval work and further the study of retrieval science we must therefore first make clear that the target of retrieval is information and not intelligence. Retrieval work consists of information retrieval, not intelligence retrieval. Intelligence is special knowledge that is extracted from information. Information is the fountainhead of intelligence, the foundational medium for activating knowledge. The specified research target of retrieval science is information, and it is nothing else.
Section Five -- Data Banks and Databases
After information is collected together, it is first put in order, and then stored. The information is further sorted into various kinds of storage, like data banks and databases.
I. Data Banks
Generally speaking, any information that humans can directly perceive and distinguish can be accumulated in a data bank. The basic function of data banks is to store the information that humans can directly perceive and distinguish.
Information that has existed for a long time may be included directly in data banks, immediately improving the holdings. Printed information can form "the stacks", and audiovisual information forms the "tape library". Gathered together, the information in the form of samples makes up a "display item collection."
Information that is short-lived cannot be directly included in the data bank. Data such as light wave signals, electromagnetic signals and sound wave signals cannot be stored in a data bank unless they are converted.
The structure and organization of data banks is a discipline in itself and is an important component of information science. The way a data bank is set up is closely connected with data retrieval. From the aspect of input, setting up a data bank requires attention to the scope, quantity, quality and speed of the specified materials. As the specified user of the output of a data bank, the data retrieval department will collect the material it needs from the respective data bank according to its own collection policies, financial situation, and level of technology.
In recent years, the status and function of databases have grown day by day. Databases have already taken the place of some data banks, and have achieved certain functions that data banks cannot perform. Data banks and databases each have their own strengths, however, and will continue their mutually beneficial existence for a long time.
Since the sixties, many kinds of databases have been set up. This has not only advanced the development of intelligence work, intelligence technology and information science, it has also given a powerful impulse to retrieval practices and retrieval work. At present the gathering of written publications remains the main focus of collecting practice for the worker in technology, the individual, or the departments that specialize in retrieval work. Everyone who does information retrieval work, however, has consciously or unconsciously become connected with databases, whether in the target or content of retrieval, in the methods and techniques of retrieval, or especially in the research of sources of intelligence and information. This connection will grow continually closer along with the development of retrieval science and technology.
1. What databases are. As to what databases are, at present there is not a consistently acknowledged definition. Some people believe that "databases are new sets of data documents that are produced and supplied by computers and that are stored and organized on magnetic media (tape and disk)." Other people believe that "databases are the set of machine-readable data or information that have a certain access method in common." More simply stated, databases are computerized sets of documents, abstracts, almanacs, handbooks, dictionaries, encyclopedias, etc. From the angle of information science, databases are also no more than compendia of data. People simply have the custom of calling sets of bound volumes or miniaturized materials data banks, and of calling sets of data that are perceived and distinguished by machines databases. Without being overly rigorous, we may therefore say that databases are sets of electronic data or electronically published materials. In fact, the individual documents that are formed into databases are not necessarily all data documents.
Of course, the study of the structure and organization of databases forms an academic discipline. The main characteristics of databases are their high flexibility and the ease of expanding and revising the data that is stored, as well as the versatility of applications. Furthermore, databases are easier to share as the source for extracting information. The quantity of databases has therefore become one of the criteria by which the level of modernization in technology and intelligence work is assessed. Because database technology is closely coordinated with modern communications and computer technology, in application it is easier for networked computers to achieve real-time processing. From here it is not hard to realize that when data retrieval workers or technical personnel are collecting information, they cannot afford to neglect gathering information from databases.
2. Categories of databases. Databases are in the process of formation as an industry that develops the national economy. Each database has a unique use. They are small and large, have all kinds of professional content, and are recorded on different media. It is therefore difficult to make distinctions of category using a single viewpoint. The most frequently seen method of categorizing databases of technological data is to divide them into databases of technology documents, databases of facts and numerical values, and management databases.
To help improve the effectiveness of information collection work, we will make the following finer distinctions of databases that are related to military technology.
(1) Databases that are directed toward leadership and leadership organs. This class includes planning and programming databases, databases of research projects, weapons and equipment databases, databases of the real industrial strength of national defense technology, databases for managing national defense technology results, databases for managing national defense patents and databases covering trade of military products. The above databases are all in the management category of databases.
(2) Databases that are directed toward technology personnel. These include Chinese and foreign discipline-related databases of technical documents, subject catalog databases, databases of abstracts, and databases of complete texts. There are also Chinese and foreign discipline-related databases of master data, numerical values, and computer software.
(3) Databases that are directed toward industries and trades in national defense technology. These include military technical databases made suitable for civilian use, databases of national defense technology results, databases of national defense patents. There are also databases that list names of work units in national defense technology and industry, databases of foreign companies supplying the military as well as databases of the technical market, business trends, product samples and military standards.
(4) Databases that are directed toward foundational work. These include databases of such materials as almanacs, handbooks, encyclopedias, dictionaries, and dictionary-like reference works.
3. Compiling and Utilizing Databases
When an information retrieval worker desires to compile a database or use a database for search purposes, he or she must consider the following factors.
(1) The relationship between the producer of the database and the host computer. The relationship between the producer of the database and the host computer is very close. An intelligence unit is at times both the producer of the database and the host computer. Some intelligence units are either the producer of the database or the host computer. Of course some basic level intelligence organs are often neither the producer of the database nor the host computer. From the standpoint of the producer of the database, the more host computers there are, the more opportunities there are for utilizing the database that it produces. The more databases that a host computer has, the more users it will be able to attract.
If the intelligence unit is both the database producer and a host computer, then the relationship between the computer and the database is fairly simple. When the intelligence unit is not both the producer of the database and the host computer at the same time, then the following relationships are possible.
The first kind. The intelligence unit is a computer host. They buy database tapes from the database producers, and make their own databases.
The second kind. The intelligence unit is a host computer. They rent database tapes from the producers and set up their own databases.
The third kind. The intelligence unit produces databases. They rent computer time from the host computers.
The fourth kind. These are databases that are set up through coordinating the efforts of the database producers and the host computers.
When collecting database tapes or utilizing databases, intelligence units must decide which class to use. This is settled by the status, nature and function of the work unit, and by their human, material and financial resources. It is also necessary to do unified programming and coordinated development. When importing foreign databases, it is important to avoid buying the same tape twice.
(2) Utilizing the Channels for Transmission of Databases
As they undertake the work of retrieving information, the intelligence units may use three channels for transmission of the information from the database producers or host computers.
The first kind: The intermediate medium is a tape or a disk. For example, buying or renting a GRA database tape from NTIS (US) in order to use the GRA database to look up an AD report.
The second kind: Setting up a communications network with the host computer and installing terminals when necessary. It is possible to network computers and do searches for information related to national defense science and technology on systems such as Lockheed's Dialog system, System Development Corporation's Orbit system, Bibliographic Retrieval Services' BRS system, or Defense Marketing Services' DMS system.
The third kind: Using printed materials that reflect the content of the database. For example, buying the printed version of the GRA contents journal from NTIS to look up information.
The expense of the first kind is rather high for intelligence units. It will be necessary to analyze the capacity for economic support as well as the level of equipment and the frequency of use of the databases that have been installed. Though the expense of the third kind is low, it is hard to fully exploit the benefits of a database. The expense of the second kind is in the middle and it appears to fit with the trend of networking computers. The key is the frequency of use. It is also necessary to consider the communications conditions.
(3) Databases in Chinese and Databases in Western Languages. The present situation in China is that the Western language databases, both those available through networks and those that have been purchased, have a good foundation and are being utilized. However, there has not been enough emphasis on utilizing the large quantities of foreign economic and management databases of facts and numerical values for which there is a need.
Databases of documents in Chinese have still not been given positive support as an endeavor of foundational construction in the nation and industry or for social benefit. Though the Chinese factual and numerical databases have made a good start, there is still a need to coordinate work, reduce duplication, uphold sharing, and to gradually increase coverage and the continuity and stability of production. Overall, utilizing databases in Chinese is still difficult. China's database service is still backward.
4. The Influence of Databases on Information Retrieval Work
Modern electronic and communications technology and high-density storage technology are the technical foundation for the rapid development of the database industry. The daily progress of database technology, the ever-increasing production of databases and the continuous progress of retrieval technology on networks have had a great influence on information retrieval work. This progress will bring about a gradual reform in retrieval thinking, the steady improvement of retrieval ability, and a gradual reform of styles and methods of retrieval.
(1) The target of retrieval has been adjusted. The target of traditional retrieval is "hard" printed materials and miniaturized items that are stored and replenished in data banks. The appearance of databases requires use of both "hard" and "soft". Besides gathering "hard" materials, it is necessary to understand information and clues. According to the needs of the user, one must use communications networks to request information from outside sources and even from very distant databases.
(2) Challenging the idea of "Center of all titles". An intelligence unit traditionally has wanted to make their data bank as complete as possible and become the "Center of all titles". With the use of databases to search for information, however, it is not possible or necessary to require that all related databases be gathered together.
(3) The construction of retrieval networks has progressed. The launching and utilization of database resources is closely connected to communications networks and to the transmission networks of databases on media such as tape and disk. The need of society to gather database materials fully and effectively has driven the development of retrieval networks and especially of communications networks.
(4) The transmission speed of intelligence and information has increased. Because all of the intelligence and information output from databases is transmitted through electronic communications networks at a very high speed, it can meet the user's needs for immediate use very well.
(5) It has promoted renewal of the knowledge of retrieval personnel and their retrieval skills. Doing a good job of putting together and utilizing databases requires the growth and maturity of a large group of new cadres who are involved with retrieval work. The modernization of retrieval work has promoted a restructuring of the knowledge of retrieval personnel and the improvement of retrieval techniques.
Section Six -- Evaluation of Information
The research target of information science is information, as it is the target of retrieval work. In order to gather information, it is necessary to make selections of external characteristics or knowledge content of the data. From there the assessment of the value, the content, and the overall evaluation naturally come forth.
At present, people have not yet found a scientific and practical way to evaluate the content and worth of information. Customarily, the user of the information provides the evaluation. In evaluating information, users consider whether and how far the information is useful for reference purposes. Some scholars consider the circulation links of the information. They use such indices as the circulation rate to evaluate information. Granted, these assessment methods are good for practical use during the gathering stages. From the perspective of information science or retrieval science, however, such evaluation seems insufficient at the science and technology stages of retrieval. There is an urgent need to give theoretical guidance and bring completeness to the practical work. The authors will attempt to develop some new ways of thinking from the new angles of the content and value of information. Finally, some comprehensive methods of evaluating information from the perspective of practical use will be introduced.
I. Assessing the Value of Information
As stated in Section Two of this Chapter, information in a broad sense is materialized knowledge. In a narrower sense, information is symbolized knowledge. In summary, information is a material expression of the knowledge that humans have of the objective world. It is a material manifestation of humans' store of knowledge.
1. The Value of Information. The value of information is expressed in:
(1) Information that is needed by society or intelligence users to solve a particular problem.
(2) In the process of scientific labor the intelligence users may activate information and extract the useful knowledge, which is intelligence.
(3) This intelligence can promote the progress of science and technology. Turned into productive forces, this progress benefits the society and the economy.
(4) Production of information expends the human intellect, requires labor time, and expends a certain quantity of energy and materials.
2. Obstacles to Assessing the Value of Information. Though the value of information may be seen in the above ways, there still remain many obstacles to carrying out evaluation of information.
(1) The Obstacle of Demand. The users have a definite need for intelligence. The various social environments and quality of the users, however, lead to extremely complex specific needs. This poses the most fundamental obstacle for evaluating information.
(2) The Obstacle of Understanding. In principle, the advance and development of science and technology are the true desire in achieving China's economic vitality. In reality, this has not been universally accepted by society. There are many important people in society who consider intelligence and information work to have a "supplemental" status and function in China's economic construction. This means that it is difficult to accurately assess the value of information.
(3) The Obstacle of Indirect Benefit. Information is only able to benefit the society and the economy after the user has activated it and extracted the useful knowledge for use in their scientific and technical activities. This makes the benefits of technical information less direct, which increases the difficulty of evaluating information.
(4) The Obstacle of the Vagueness of the Value. In all of the above expressions of the value of information, it has been difficult to find clear quantities. The value was expressed with vague terms such as "extremely", "very", "average", or "not". This produces a challenge to accurately assessing the value of information.
3. Assessing the Value of Information. The evaluation of information is an objective judgement involving many indices. They comprise the degree of need for the knowledge product in the information, the ease in activation of the information, the amount of benefit, how much mental, energy and material resources are expended, and the amount of required labor time expended. Because many obstacles exist in evaluating information, it is almost too hard to begin a complete assessment of the value of information. The following ways of thinking can be useful in the assessment, however.
(1) If information is seen as a product of complex labor, then it is possible to turn its value into a complex function of simple product value.
(2) If information is seen as a product, it is possible to disregard a few factors, or over-emphasize the weight of other factors and focus on assessing the exchange value of information in the economy. In this aspect, retrieval workers have accumulated considerable experience and have found that often it is feasible to make assessments and judgments this way.
(3) Using a fuzzy evaluation method. With the help of fuzzy comprehensive evaluation of fuzzy mathematics, it is possible to analyze the value of information into a series of measurable and directly related indices. After comprehensive optimization, the value of information may be assessed. This evaluation method is both scientific and relatively easy to use. It has reliable results and has potential practical value when the retrieval department decides on policy in light of a certain class of information or certain specialized information.
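The fuzzy evaluation idea in item (3) can be sketched roughly as follows. This is only an illustration under stated assumptions: the four indices are taken from the list at the start of this subsection, the grade names reuse the vague terms quoted earlier, and every weight and membership degree is a hypothetical value, not data from the text. The weighted-average operator and the maximum-membership rule are one common choice among several.

```python
# Fuzzy comprehensive evaluation sketch. All indices, weights and
# membership degrees below are hypothetical illustration values.

GRADES = ["extremely", "very", "average", "not"]  # how valuable

# Hypothetical weights for the indices named in the text (sum to 1).
WEIGHTS = {
    "degree_of_need": 0.35,
    "ease_of_activation": 0.20,
    "benefit": 0.30,
    "resources_and_labor_expended": 0.15,
}

# Membership matrix R: for each index, the share of (hypothetical)
# evaluators who placed the item in each grade. Each row sums to 1.
MEMBERSHIP = {
    "degree_of_need": [0.5, 0.3, 0.2, 0.0],
    "ease_of_activation": [0.1, 0.4, 0.4, 0.1],
    "benefit": [0.3, 0.4, 0.2, 0.1],
    "resources_and_labor_expended": [0.2, 0.3, 0.4, 0.1],
}


def fuzzy_evaluate(weights, membership):
    """Return the grade vector B = W . R using the weighted-average operator."""
    n_grades = len(next(iter(membership.values())))
    result = [0.0] * n_grades
    for index, weight in weights.items():
        for j, degree in enumerate(membership[index]):
            result[j] += weight * degree
    return result


def best_grade(grade_vector, grades=GRADES):
    """Pick the grade with the largest membership (maximum-membership rule)."""
    position = max(range(len(grade_vector)), key=grade_vector.__getitem__)
    return grades[position]


scores = fuzzy_evaluate(WEIGHTS, MEMBERSHIP)
verdict = best_grade(scores)
```

With these made-up inputs the grade vector is approximately (0.315, 0.35, 0.27, 0.065), so the item would be judged "very" valuable. Changing the weights is the place where a retrieval department's own policy would enter.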
II. Evaluating the Content of Information
The content of information is a specified quantity of knowledge. Intelligence is knowledge that is needed to solve a specific problem, and information is the source from which intelligence is extracted. In evaluating the content of information, we may meet with obstacles similar to those met in assessing the worth of information. The obstacle of the specificity of the needs of the user is even harder to overcome. From a research perspective we may do principled evaluation of the content of information from the angle of increasing knowledge or from the angle of problem solving.
1. Evaluating information content from the angle of increasing knowledge. The British scholar Brooks believes that knowledge is a comprehensive structure of concepts that are connected by relationships. He considers intelligence to be a small part of this structure. He has suggested a basic formula to be used to describe the relationship between intelligence and knowledge:
$$K(S) + \Delta I = K(S + \Delta S)$$
where $K(S)$ is the original knowledge structure, $\Delta I$ is the increment of intelligence, $K(S + \Delta S)$ is the improved knowledge structure obtained from this increase of intelligence, and $\Delta S$ is the result of the improvement.
He also notes that he has not actually assigned a definite meaning to each symbol in the equation. It would also be correct to replace $\Delta I$ with $\Delta K$. Using $\Delta I$, however, it is possible to express that distinct knowledge structures may have distinct results.
Brooks also has pointed out that the increase of knowledge is not simply piling up knowledge. After intelligence has been included in the knowledge structure, what it adds is not simply more; it actually performs a certain adjustment of the knowledge structure.
Brooks' equation provides a mental avenue as we evaluate the content of information. When evaluating the content of knowledge, it is necessary to evaluate the extent to which the knowledge (intelligence) contained in the information can improve the knowledge structure that the user needs in solving a specific problem.
If people encounter a problem, in the first place they ought to have an understanding of the problem. Once a certain amount of knowledge is attained, then this knowledge can be materialized into a series of symbols of matter or material phenomena. People utilize and activate information in order to solve problems. From information they extract useful knowledge and obtain intelligence, and can thereby gain a new understanding of the problem. With increased new knowledge, the problem may obtain a partial or complete solution. This new knowledge can be materialized into a series of new symbols. Comparing this new set of symbols with the original set reveals that the relationships between the individual symbols have been adjusted, and have been ordered and organized anew. We may therefore consider the difference between new and old structures of knowledge to be a measure for evaluating the content of the information. So that people are able to perceive this measure, it must be expressed through the exchange of information or of symbols.
From the angle of increasing knowledge, it is also possible to apply fuzzy mathematics to the problem of quantifying the evaluation of information content.
2. Evaluating Information Content from the Angle of Problem Solving. When interacting with information, what people care most about is the content of the information. They are most interested in how much intelligence they can draw from the information. As such, evaluating the content of information is identical to evaluating the quantity of intelligence in each class of information. We know that information is an entity that can be seen and touched, and that intelligence is knowledge that is needed for solving problems. There is no way to observe or directly measure this intelligence. It is possible only to observe and evaluate information with the help of some accompanying phenomena that can be measured.
According to the basic tenets of Shannon's theory of information, we may see a problem as an event, as a system that we must understand. If we are to solve a problem, we must understand the situation of the system. If we have complete knowledge about it, then we will be able to affirm completely the situation in which the system resides, and the problem will receive a complete solution. On the other hand, if we have no knowledge about it at all, then we know nothing about the situation of the system, and the problem will not be solved to any extent. If we have partial knowledge of a problem, then we only know the situation that the system might be in. The problem may achieve a partial solution. Every time we obtain new intelligence (new useful knowledge) from information, the situation of this system becomes more certain to us. The possibility that we can solve the problem becomes greater. This shows that there is a close relationship between intelligence and the uncertainty of things. Part of the process of knowing is activating information and obtaining intelligence from information. The uncertainty of the situations of things is reduced through this process. For this reason we can consider this uncertainty to be a measurement of how much knowledge we have of this thing or this system. The degree to which uncertainty is reduced can be seen as a measurement of the quantity of intelligence, and as a standard by which to evaluate the content of information.
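The link between knowledge and uncertainty described above can be illustrated with Shannon's entropy. The following is a minimal sketch, assuming a system with four possible situations; the three probability distributions are hypothetical and stand for no knowledge, partial knowledge and complete knowledge respectively.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy of a discrete distribution, in bits.

    Computed as sum(p * log2(1 / p)), which equals -sum(p * log2(p)).
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


# Hypothetical example: a system that may be in one of four situations.
no_knowledge = [0.25, 0.25, 0.25, 0.25]    # every situation equally likely
partial_knowledge = [0.5, 0.5, 0.0, 0.0]   # two situations ruled out
complete_knowledge = [1.0, 0.0, 0.0, 0.0]  # situation fully determined

h_before = entropy_bits(no_knowledge)        # 2.0 bits of uncertainty
h_partial = entropy_bits(partial_knowledge)  # 1.0 bit
h_after = entropy_bits(complete_knowledge)   # 0.0 bits

# Reduction in uncertainty, read here as the quantity of intelligence gained.
gain_partial = h_before - h_partial    # 1.0 bit
gain_complete = h_before - h_after     # 2.0 bits
```

In this reading, each piece of intelligence that rules out some situations lowers the entropy, and the amount by which it falls is the measure of what was learned.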
Suppose we know in advance that the probability of an event (a solution to a problem) occurring is P1. After obtaining a certain amount of intelligence from information we know that the probability of this event occurring is P2. (P2 is greater than or equal to P1.) Then the quantity of intelligence that is obtained from the information is:
$$I = -\log_2 \frac{P_1}{P_2}$$
The unit of calculating the quantity of intelligence in this way is the bit. We may use the size of the "I" value to evaluate the content of the information.
Using "I" to evaluate the quantity of intelligence matches our general understanding. If the specific problem of the user has already been completely solved and the probability is 1, then I is 0 and any information that is collected has no purpose for the user. When the initial probability P1 is small, there is much further to go before a solution to the user's problem is found. It is then easier for the user to find the necessary intelligence in the information that is collected, and the requirements of the user are more easily met.
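A short numerical sketch may make the formula concrete. The function name and the probabilities below are hypothetical, chosen only so that the results are easy to check by hand; the only assumption carried over from the text is that P2 is greater than or equal to P1.

```python
import math


def intelligence_bits(p1, p2):
    """Quantity of intelligence I = -log2(P1 / P2), in bits.

    p1: probability of the event (a solution) before using the information.
    p2: probability after using it; the text assumes p1 <= p2 <= 1.
    """
    if not 0 < p1 <= p2 <= 1:
        raise ValueError("expected 0 < p1 <= p2 <= 1")
    # log2(p2 / p1) is algebraically the same as -log2(p1 / p2).
    return math.log2(p2 / p1)


# Hypothetical probabilities chosen only to illustrate the formula.
doubled = intelligence_bits(0.25, 0.5)          # chance of a solution doubled
solved_already = intelligence_bits(1.0, 1.0)    # problem was already solved
from_long_odds = intelligence_bits(0.125, 1.0)  # unlikely solution made certain
```

Here `doubled` is 1 bit, `solved_already` is 0 bits, and `from_long_odds` is 3 bits: the lower the starting probability, the more intelligence the same final certainty represents.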
Finally we would like to point out that the quantity of intelligence obtained from a class of information is greatly affected by the individual intelligence level and knowledge background. The quantity of intelligence obtained may vary according to the user or researcher.
III. Comprehensive Evaluations of Information
In doing retrieval work and in selecting information, retrieval workers consider not only their own evaluations of the worth and content of the information. They must often consider some factors like the category of the information, its external characteristics and its circulation utility, and make a comprehensive evaluation of the information. Here we introduce some principles and methods for comprehensively evaluating information from the perspective of practical use.
1. Principles for Selecting the Evaluation Method. There are many methods we can use to evaluate information comprehensively. In order to decide which method to use, it is important to abide by the following principles.
(1) Ease of use. The method chosen must be made as simple as possible so that it can be used fully or used for the most part. If the method is too complicated to use, then it will have little meaning for daily retrieval work.
(2) Suitability for many classes of information. There are many classes of information, such as documents, verbal information, real object data, machine-readable data, audiovisual data and radio wave data. The evaluation method chosen should be suitable for evaluating various classes of information, so that the evaluation results may be compared.
(3) The Principle of Quantifying. The method that is chosen should employ a certain degree of quantitative analysis. These quantities may be used to assess the value of information.
(4) The Principle of Qualitative Adjustment. Because of the vagueness and relativity of the value of information, human subjectivity may have an excessively large role in the assessment. It is therefore necessary when choosing an evaluation method to use qualitative measures to adjust the quantification, and revise the results of the quantified assessment.
2. Index Systems for Evaluation of Information
(1) Reliability. Reliability is how closely the knowledge contained in the data matches actual practice and real results. Only when information is reliable can it have considerable value for activation. If information is not reliable, it will be less valuable to the user. In studying the characteristics and conditions of the intelligence sources, an understanding of the patterns of the external features may make it easier to judge the reliability of the information content.
(2) Suitability. Suitability is how useful the knowledge contained is to the user. It refers to the social and economic benefit after the information has been activated and scientifically processed by the user. The content of information must match the intelligence needs of the user. Specific users need information of specific content. One kind of information will be more useful to one user than it will to another. Investigating the suitability of the information must be carried out with the research of the user's needs, the information source and the information circulation in mind.
(3) Timeliness. Timeliness is both the originality of the content of the information and the time it takes for the information to be transmitted from the information source to the retrieval department. The time lag for national defense technology information is generally quite large.
(4) Availability. Availability means how easy or difficult the information is to obtain. The organ or individual that produces the information must protect their own political, technical and economic interests. They must also often enact security measures in the movement of some information. Hence there is public information, internal information, confidential information, secret information, and top secret information. The higher the level of secrecy, the less available the information is. Market conditions may also affect the availability of information. Generally speaking, once the reliability and suitability conditions are met, classified information has the most intelligence. By contrast, national defense technology information is rather difficult to obtain.
(5) Ease of Decoding. Ease of decoding is how easy the information is for people to understand. Data are encoded systems of symbols. There are all kinds of symbols, and a great variety of encoding methods. Some symbolic systems are easy for people to understand and others difficult; some are easy to decode, and some harder. Language and writing, for example, frequently pose such problems. People have no way of understanding machine-readable data unless they are converted.
(6) The Network Element. Has information been adequately distributed in macro-intelligence and mid-intelligence systems? Though certain information may have considerable reliability, suitability and timeliness, if it already exists at a network point, there is less need to obtain that information.
(7) Economics, or the price of the information. People are recognizing the commercial attributes of information more and more. In fact, the intelligence contained in information is not always in direct proportion to its price. At times, an inverse relationship may exist. Nowadays, funds for information retrieval are short everywhere and the cost of information is on the rise. People therefore pay more attention to the economic factor in evaluating information.
At present it is not possible to find one best scientific and practical method or standard for comprehensively evaluating information. The experience method of evaluation, however, has been in use for many years and the theoretical quantification method is being studied.
3. The Experience Evaluation Method. This method relies mainly on the practical experience of the retrieval worker. It is the method used most frequently and most extensively in actual retrieval work. When a retrieval worker selects information, the choice is not always made according to a qualitative or quantitative evaluation of the information. Frequently the decision is made according to experience.
The experience evaluation method is simple to use, but because the experience, history, and knowledge background of the retrieval worker restrict it, it is not easy to evaluate information completely and accurately. It is possible that individual preference will affect the decision.
4. The Individual Scoring Method. First a system of indices for evaluating information is set up. Each index is assigned a set of standards, and then the retrieval workers assign a point value to each index. Then the scores of each index are added up for each of the various classes of information to give the overall score for a class (or category) of information. Finally qualitative revision is done according to the total point value, and ordering and selection of the information can be completed.
Because the users of some specified information have expectations and demands for each index that is evaluated, it is possible to weight the various indices according to their importance. The score of each index is multiplied by the index weight, and the products are then summed to give the total. This evaluation result is likely to be closer to the actual situation.
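The weighted scoring just described can be sketched in a few lines of Python. This is only an illustration: the index names are taken from the index system above, but the weights, the 0 to 10 scoring scale, and the two sample items are hypothetical values, not ones given in this book.

```python
# Hypothetical weights for five of the indices; the text prescribes no values.
WEIGHTS = {
    "reliability": 0.30,
    "suitability": 0.30,
    "timeliness": 0.20,
    "availability": 0.10,
    "economics": 0.10,
}


def weighted_total(scores, weights=WEIGHTS):
    """Multiply each index score by its weight and sum the products."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Two made-up information items, each scored 0-10 on every index.
report_a = {"reliability": 8, "suitability": 6, "timeliness": 9,
            "availability": 4, "economics": 5}
report_b = {"reliability": 6, "suitability": 9, "timeliness": 5,
            "availability": 8, "economics": 7}

# Order the items by weighted total, highest first.
ranked = sorted(
    {"report_a": report_a, "report_b": report_b}.items(),
    key=lambda item: weighted_total(item[1]),
    reverse=True,
)
for name, scores in ranked:
    print(name, round(weighted_total(scores), 2))
```

With these assumed weights, report_b ranks slightly ahead of report_a even though report_a is the more reliable and more timely item. A small change in the weights would reverse the order, which illustrates how strongly the result depends on the subjective choice of weights.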
In its essence, this evaluation method combines both qualitative and quantitative methods. Though it reflects a comprehensive evaluation of information to a certain degree, the method is still very rudimentary, and not very accurate. The main reason is that the scoring is still very subjective. The individual evaluator's work history, knowledge background and understanding of the standards for the index are all unique. The evaluation results may therefore have low validity, or even lead to an opposite conclusion. The method of direct evaluation by the individual however is convenient to blend with the experience evaluation method, and will certainly provide some practical reference verification when selecting information.
Organizing evaluation committees of retrieval workers would in theory make the evaluation results more accurate. Each committee member would separately score each data set according to his or her own evaluation program. All of the data would be statistically analyzed and then put in order by the final point total. Granted, this method of evaluation to a certain extent eliminates the deviation due to individual differences. However, since this method is rather troublesome to use, it does not have very much real significance for retrieval work.
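As a rough illustration of the committee procedure, the sketch below averages the scores given by several members and orders the data sets by the result. The scores are invented, and the choice of the mean and the population standard deviation as the "statistical analysis" is an assumption; the text does not say which statistics would be used.

```python
from statistics import mean, pstdev

# Hypothetical total scores given by three committee members to three data sets.
committee_scores = {
    "data_set_1": [72, 80, 76],
    "data_set_2": [90, 60, 75],
    "data_set_3": [85, 88, 82],
}


def summarize(scores_by_item):
    """Return (name, mean score, spread) tuples ordered by mean, highest first."""
    rows = [(name, mean(vals), pstdev(vals))
            for name, vals in scores_by_item.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, avg, spread in summarize(committee_scores):
    print(f"{name}: mean={avg:.1f} spread={spread:.1f}")
```

The spread column shows where the members disagree most (data_set_2 in this made-up example), which is the kind of individual deviation that averaging is meant to dampen.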
5. Fuzzy Evaluation Methods. Fuzzy mathematics studies and processes vague phenomena. The value of information has such vagueness, so the evaluation of information is also vague. The use of fuzzy methods to evaluate information has drawn much attention.
(1) Basic Assumptions:
- Acknowledging that evaluation of information is vague, and believing that methods of fuzzy mathematics may turn the vague evaluation into precise measurement.
- With a focus on the comprehensive result of factors, emphasis on overall optimization.
- Belief that the evaluator can use vague information and his or her own knowledge and experience to make the correct decision.
- Use of computers to process information quickly, accurately and reliably.
(2) Arranging the evaluation committee. An intermediate link is needed to quantify the fuzzy evaluation. The judgments and opinions of the members on the committee form the basis of the quantification. Since the information is evaluated for retrieval purposes, the committee is composed mainly of retrieval workers. The committee members should be appropriate for the task. The committee should have authority, be representative, and be just.
(3) Setting up the index system and its standards. Due to the many classes of information, there are many indices that need to be assessed. The goal is to address the key conflicts, and make the system of indices detailed and complete, so that it can fully reflect the objective value of the data. The system must not excessively increase the amount of data processing, yet the evaluation indices must also be set up to reflect the objectivity, transmission ability, and capacity for activation of the information. The indices must not be overloaded and should be easy for the evaluators to understand. These are the difficult points in selection. For example, the following eight indices might be chosen for evaluation: reliability, appropriateness, originality, timeliness, availability, ease of decoding, network attributes, and economy. When appropriate some of the less important indices may be disregarded.
After the evaluation indices have been decided, each index must be divided into levels, such as "extremely", "very", "generally" and "not". Then each level of each index is given content of principle. This gives the reference standards table for evaluating the indices. With this table, the evaluators can evaluate the information by making a check mark for a certain level of a particular index of the information that is evaluated. The single-factor fuzzy matrix can be obtained according to the judgment results from the committee.
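\[
R = \begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & & \vdots \\
r_{m1} & r_{m2} & \cdots & r_{mn}
\end{pmatrix}
\]
The term rij in the above fuzzy matrix represents the membership grade of level j of index i. rij is less than or equal to one.
If eight indices are being evaluated, and there are four levels, then R is a fuzzy matrix with eight rows and four columns.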
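One common way to obtain the membership grades from the committee's check marks is to take, for each index, the fraction of members who checked each level. The text does not state the exact rule, so this fraction-of-votes rule is an assumption, as are the five-member committee and the votes shown in the sketch below.

```python
LEVELS = ["extremely", "very", "generally", "not"]


def membership_row(checks, levels=LEVELS):
    """Turn one index's check marks into membership grades.

    Assumed rule: the grade of a level is the fraction of committee
    members who checked that level.
    """
    if not checks:
        raise ValueError("at least one check mark is required")
    unknown = set(checks) - set(levels)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return [checks.count(level) / len(checks) for level in levels]


# Hypothetical check marks from a five-member committee for two indices.
votes = {
    "reliability": ["very", "very", "extremely", "generally", "very"],
    "timeliness": ["generally", "not", "generally", "very", "generally"],
}

# One row of R per index, one column per level.
R = [membership_row(votes[index]) for index in votes]
for index, row in zip(votes, R):
    print(index, row)
```

Under this rule every grade lies between 0 and 1 and each row sums to 1, which is consistent with the requirement that rij be less than or equal to one.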
(4) Determining the Weight Coefficient. Considering only the factors in the matrix is not enough. Because the importance of each index and its influence on the value of the information vary from one index to another, it is necessary to assign a weight coefficient ak to each index.
Various methods may be used in determining the weight coefficient ak, such as the experience method, the Delphi Approach and the voting-statistical method.
If there are m evaluation indices, then
A = ( a1, a2, ... am)
If there are eight evaluation indices, A is a fuzzy matrix with one row and eight columns.
(5) Setting the mathematical model. Once we have the evaluation index matrix R and the weight coefficient matrix A, then we may obtain the fuzzy evaluation matrix B per the fuzzy comprehensive evaluation method.
\[
B = A \cdot R = (a_1, a_2, \ldots, a_m)
\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
r_{21} & r_{22} & \cdots & r_{2n} \\
\vdots & \vdots & & \vdots \\
r_{m1} & r_{m2} & \cdots & r_{mn}
\end{pmatrix}
= (b_1, b_2, \ldots, b_n)
\]
where b1 is the sum of the first level for all the indices, and b2, b3, ... bn for the subsequent levels. bj is less than or equal to 1. The comprehensive evaluation matrix uses a value between 0 and 1 to express the overall evaluation result given to the particular information category by the evaluators.
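The product B = A · R can be worked through in a short sketch. It treats the product as an ordinary weighted sum, as the description of b1 suggests; other composition rules, such as max-min, are also used in fuzzy comprehensive evaluation, and the text does not say which was intended. The three indices, their weights, and the entries of R are hypothetical.

```python
def evaluate(A, R):
    """Return B = A . R, where b_j is the weighted sum of column j of R."""
    if len(A) != len(R):
        raise ValueError("A needs one weight per row (index) of R")
    n_levels = len(R[0])
    return [sum(a * row[j] for a, row in zip(A, R)) for j in range(n_levels)]


# Hypothetical weights (summing to 1) for three indices.
A = [0.5, 0.3, 0.2]

# Hypothetical 3 x 4 single-factor matrix: one row per index, one column per level.
R = [
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.6, 0.2],
    [0.4, 0.4, 0.2, 0.0],
]

B = evaluate(A, R)
print([round(b, 2) for b in B])
```

Because the assumed weights sum to 1 and every rij is at most 1, each bj here also stays at or below 1, in line with the condition stated above.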
(6) Determining the Evaluation Standards. In order to sequence and compare all the information that is being evaluated, it is necessary to do another weighting. This second weighting matrix is:
\[
F = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix}
\]
where fj is the weighting of the jth level. Multiplying matrices B and F gives
\[
G = B \cdot F = (b_1, b_2, \ldots, b_n)
\begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix}
= b_1 f_1 + b_2 f_2 + \cdots + b_n f_n
\]
This is the comprehensive score for the evaluated information.
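The final step, G = B · F, reduces each evaluation matrix B to a single number so that items can be put in order. In the sketch below the level weightings in F and the two evaluation vectors are hypothetical values chosen only for illustration.

```python
def comprehensive_score(B, F):
    """Return G = B . F = b1*f1 + b2*f2 + ... + bn*fn."""
    if len(B) != len(F):
        raise ValueError("B and F must have one entry per level")
    return sum(b * f for b, f in zip(B, F))


# Hypothetical weightings for the levels "extremely", "very", "generally", "not".
F = [1.0, 0.75, 0.5, 0.25]

# Hypothetical evaluation vectors B for two information items.
items = {
    "item_x": [0.18, 0.44, 0.32, 0.06],
    "item_y": [0.40, 0.30, 0.20, 0.10],
}

# Sequence the items by comprehensive score, highest first.
ranking = sorted(items, key=lambda name: comprehensive_score(items[name], F),
                 reverse=True)
for name in ranking:
    print(name, round(comprehensive_score(items[name], F), 3))
```

Giving the higher levels larger weightings, as assumed here, means that an item whose judgments cluster at "extremely" and "very" receives the higher comprehensive score.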
The fuzzy evaluation method is simple and easy to use because all the evaluators need to do is make a check mark. This evaluation method may be used on many classes of information and the results are reliable. It may also be used in coordination with qualitative evaluation methods. This method of evaluation has potential application value in the actual work of evaluating information.
Section Seven -- The Present State of Technological Information and Trends for Development
Many scholars have done analysis and prediction related to the present state of technological information and trends for development using "documents" as an indicator. They have pointed out the exponential growth of the quantity of documents, the overlapping of document content, the scatteredness of documents, the decrease of time before documents are no longer useful, the diversity of carriers, the continuing increase of languages, the rapid increase of translated documents, the increasing trend toward industrialization, and the increasing seriousness of the "time lag" problem. We would like to set forth some ideas on the present state and trends for development of technological information from the broad perspective that information is materialized knowledge.
I. The Rapid Increase in the Production of Information
At present, the human store of knowledge is very abundant and is becoming more plentiful all the time. Information is the material sign of human knowledge, and its rate of production has continued to increase rapidly. This has led to the great variety and quantity of information today and the consequent challenges of finding the right information in this new sea of data.
The rate of increase of information is not the same for the various fields. The rate of increase of information in science and technology is higher than that of basic sciences information. For a long time, high technology information has been produced more rapidly than general science information.
II. The Proportion of Machine-Readable Data Increases Daily
Nowadays people most frequently use vision and printed materials to access information. There is a clear trend toward an increasing proportion of machine-readable data. There will come a day when the main way in which people utilize information is on the foundation of machine-readable data.
A technological revolution is occurring in the world today. The central content of this revolution constitutes the nation's and even the world's information systems. In the future people will no longer rely solely on their brain memories in doing information work. Instead, humans will join forces with information systems and computers. The memory and search tasks will be allotted to information systems. The only requirement is that the speed at which knowledge is turned into matter be rapid, that the data can be transmitted quickly from the information source to the user and that information can be found rapidly in the data banks and databases. All of these tasks are easily achieved with database information. Machine-readable data fits the information society. It will therefore develop rapidly, and gain special support and the protection of state policy.
The quantity of databases produced is ever increasing and new electronic books, magazines and newspapers appear all the time. They may become the main way in which people access information. These electronic media, however, will never completely replace visual materials. The information that humans can examine with their eyes will always coexist with information that is "examined" by machines. The two will complement each other and have areas in common.
III. The Extensive Future of Miniaturized Materials
Miniaturized data are printed materials that have been reproduced in miniature on a photosensitive medium. The main kinds of miniaturized data are microfilm, microfiche, and micro-cards. In recent years, laser holograms have also appeared.
Miniaturized data have small volume, are easy to transmit, and are inexpensive. They are growing in popularity among retrieval workers.
The following two factors contribute to the broad future of development for miniaturized data:
1. In contrast with printed items, miniaturized data may be easily integrated with computer search systems. Miniaturized data may be the input film and the output film for computers, thus significantly increasing the speed at which the data is processed.
2. Holographic data that are produced using laser hologram technology will make high-density storage possible. The development of fiber optics transmission technology will increase the value of miniaturized data tenfold.
IV. The Status and Role of Verbal Information Increases Daily
Verbal information is an important component of information that has always been valued. When people have a problem, they first hope to solve it through direct verbal communication. The specificity of verbal information is strong, transmission is fast, and feedback is immediate. These assets are acknowledged the world over.
The following reasons account for the high status of verbal information in the information age:
1. Telephone technology, satellite communications technology, and other such modern communications methods have made it possible for people to have direct verbal communication though they may be 10,000 miles apart. The advance of modernized transportation--in particular the aviation industry--has shrunk the distances between people and increased the opportunity for face to face direct verbal exchange.
2. The progress of artificial intelligence technology and the launching of the fifth generation of computers have made direct dialog between humans and machines possible. One day it may become possible for machines to directly understand and process human natural language. That will lead to a great increase in the status of verbal information.
3. Security is very important for knowledge concerning national defense science and technology. The private ownership of know-how is more and more pronounced. When information of this nature is desired, verbal information is often the most helpful. There are frequent occurrences of "laying bare a secret with a single remark".
These days, verbal exchange activities are more frequent. Every year there are thousands of international conferences on technology scholarship and technical exchange. The range and frequency of such activity is high for scientists. In the past, China's technical personnel, for various reasons, have had very limited opportunities for international interaction. As the state policy of reform and opening-up has been put into practice, however, the situation has improved immensely. At the same time, scholarly and technical exchange within China faces many new problems due to the trend toward commercializing technology.