
DATA MINING
Data mining is the process of extracting hidden patterns from data. As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming this data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.

While data mining can be used to uncover patterns in data samples, it is
important to be aware that non-representative samples may produce results that
are not indicative of the domain. Similarly, data mining will not find patterns
that are present in the domain if those patterns are absent from the sample
being "mined".
There is a tendency for insufficiently knowledgeable "consumers" of the results
to attribute "magical abilities" to data mining, treating the technique as a
sort of all-seeing crystal ball. Like any other tool, it only functions in
conjunction with the appropriate raw material: in this case, indicative and
representative data that the user must first collect. Further, the discovery of
a particular pattern in a particular set of data does not necessarily mean that
pattern is representative of the whole population from which that data was
drawn. Hence, an important part of the process is the verification and
validation of patterns on other samples of data.
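
The need for validation on other samples can be illustrated with a minimal
Python sketch. It assumes a purely synthetic setting that is not part of the
text above: random binary features and random labels, so no real pattern
exists in the domain. The function names (rule_accuracy, make_noise_sample,
mine_and_validate), the seed and the sample sizes are hypothetical choices made
only for illustration.

```python
import random


def rule_accuracy(rows, feature_index):
    """Share of rows where the chosen binary feature equals the label."""
    hits = sum(1 for features, label in rows if features[feature_index] == label)
    return hits / len(rows)


def make_noise_sample(rng, n_rows, n_features):
    """Assumed synthetic data: random binary features and a random binary label."""
    return [
        ([rng.randint(0, 1) for _ in range(n_features)], rng.randint(0, 1))
        for _ in range(n_rows)
    ]


def mine_and_validate(seed=0, n_rows=40, n_features=200):
    """Find the best-looking feature on one sample, then re-check it on another."""
    rng = random.Random(seed)
    mined = make_noise_sample(rng, n_rows, n_features)
    fresh = make_noise_sample(rng, n_rows * 25, n_features)
    # "Mining": keep the feature that matches the label most often.
    best = max(range(n_features), key=lambda j: rule_accuracy(mined, j))
    # Validation: measure the same rule on data not used to find it.
    return best, rule_accuracy(mined, best), rule_accuracy(fresh, best)


if __name__ == "__main__":
    best, mined_acc, fresh_acc = mine_and_validate()
    print(f"feature {best}: {mined_acc:.2f} on mined sample, "
          f"{fresh_acc:.2f} on fresh sample")
```

Because the data are noise, the selected feature typically matches the label
well on the mined sample but only about half the time on the fresh sample. That
gap is the sign that the pattern is not representative of the population.
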

The term data mining has also been used in a related but negative sense, to
mean the deliberate searching for apparent but not necessarily representative
patterns in large amounts of data. To avoid confusion with the other sense, the
terms data dredging and data snooping are often used. Note, however, that
dredging and snooping can be (and sometimes are) used as exploratory tools when
developing and clarifying hypotheses.
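
One way to see why dredging tends to turn up apparent patterns is to count how
often chance alone would produce one. The short Python sketch below is an
illustration under stated assumptions, not a description of any particular
method: it assumes each candidate pattern is checked by an independent test
that wrongly passes on noise with probability alpha. The function name
chance_of_spurious_pattern and the default alpha of 0.05 are hypothetical.

```python
def chance_of_spurious_pattern(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent checks on pure
    noise passes by chance, given a per-check false-positive rate alpha
    (both the independence and the value of alpha are assumptions)."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(chance_of_spurious_pattern(n), 3))
```

Under these assumptions the chance of at least one spurious pattern is 5% for a
single check, about 40% for ten checks and over 99% for a hundred, which is why
patterns found by searching widely are better treated as hypotheses to be
tested on other data.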