A few days ago, while asking about projection models, I was faced with the question of whether these models are purely data mining or simply derived from a statistical model. The interesting thing is that questions like this about mathematical modeling matter, because such models support the physical and chemical sciences, which in turn support data mining. If we want a precise definition, data mining is the branch of the KDD (Knowledge Discovery in Databases) process that studies the behavior of data, proposing mathematical and statistical models and iterative algorithms that explain the information and give it meaning for decision-making.
Today data mining is used in many industries to solve problems of prediction, classification and segmentation. It is very useful for understanding the area under study, yet it remains for the exclusive use of certain departments. Due to the high costs involved in building models, these techniques and tools have become unreachable for some organizations, especially when the world economy is in recession.
Because of this, data mining might eventually come to be seen as non-essential to the organization. But because the technique helps us understand where we stand and gives us a vision of where we want to go or what we want to achieve, managers should treat it as a tool that illuminates the organization's path.
The horizon of data mining is a light at the end of the tunnel for the whole organization, large or small: data mining (DM) provides the guidelines on where the organization should head to avoid getting lost in the maelstrom of economic ups and downs. The KDD process is also directly related to Artificial Intelligence (AI), as follows:
Relationship between KDD and Artificial Intelligence
1. Natural language presents significant opportunities for mining free-form text, specifically for automated annotation and indexing prior to text classification. Even limited analysis capabilities can help substantially in deciding what a given item is about. The whole spectrum, from simple natural language processing all the way to language understanding, can therefore help. Natural language processing can also contribute significantly as an effective interface for giving guidance to mining algorithms and for visualizing and explaining the knowledge derived by a KDD system.
2. Planning addresses complicated data analysis processes. It involves carrying out complex operations of data access and transformation, applying preprocessing routines and, in some cases, paying attention to resource and data-access constraints. Typically, data processing steps are expressed in terms of the desired preconditions and postconditions for applying certain routines, which lends itself easily to representation as a planning problem. In addition, planning ability can play an important role in automated agents that collect data samples or search for needed data sets.
3. Intelligent agents can be launched to collect the necessary information from a variety of sources. Information agents can also be activated remotely over the network, or be triggered by the occurrence of a certain event to start an analysis operation. Finally, agents can help navigate and model the World Wide Web, another area of growing importance.
4. Uncertainty in Artificial Intelligence covers the problems of coping with uncertainty, appropriate inference mechanisms in the presence of uncertainty, and reasoning about chance, all fundamental to the theory and practice of KDD.
5. Knowledge representation includes ontologies, a newer way to represent, organize and access knowledge. It also includes schemes for representing knowledge and for letting the KDD system use prior human knowledge about the underlying process.
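To make the planning point in item 2 a little more concrete, here is a minimal sketch in Python of how processing steps with preconditions and postconditions could be treated as a planning problem. Everything in it is an assumption made for illustration: the step names (`load_table`, `impute_missing`, and so on), the fact labels, and the choice of a simple breadth-first search are hypothetical and do not come from any particular KDD system.

```python
from collections import deque
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    pre: frozenset   # facts that must hold before the step can run
    post: frozenset  # facts that hold once the step has run


# Hypothetical preprocessing routines; names and conditions are illustrative only.
STEPS = [
    Step("load_table", frozenset(), frozenset({"raw"})),
    Step("impute_missing", frozenset({"raw"}), frozenset({"complete"})),
    Step("normalize", frozenset({"complete"}), frozenset({"scaled"})),
    Step("train_model", frozenset({"scaled"}), frozenset({"model"})),
    Step("draw_sample", frozenset({"raw"}), frozenset({"sample"})),
]


def plan(initial, goal, steps=STEPS):
    """Breadth-first search for a shortest sequence of steps that reaches the goal facts."""
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, path = queue.popleft()
        if goal <= facts:          # every goal fact already holds
            return path
        for step in steps:
            if step.pre <= facts:  # preconditions satisfied, step is applicable
                nxt = facts | step.post
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [step.name]))
    return None                    # no sequence of steps reaches the goal


if __name__ == "__main__":
    print(plan(set(), {"model"}))
```

Starting from no facts and asking for `model`, the search chains the steps whose postconditions satisfy the next step's preconditions and leaves out `draw_sample`, which the goal does not need. A real planner would also have to handle costs, resource constraints and steps that invalidate earlier facts, which this sketch ignores.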
Since data mining is so closely related to mathematics, statistics and physics, I think its real horizon today is to be established as an exact science, so that it can help the world.

SandroSaitta said:
Dear Josue,
Thanks for your post. Very interesting, although I don't agree with the following part: "Due to the high costs involved in building models, these techniques and tools have become unreachable for some organizations [...]".
I think that a combination of MySQL and R, for example, is free of charge in terms of tools and techniques. Of course, you still need the hardware (does it still cost?) and, most importantly, the data miners. However, for organizations that gather enough data to be used by data mining techniques, investing money in analytics to support production processes and/or decision making is a must-have.
Kind regards
Tue, 2012-01-03 12:35 - SandroSaitta
josueoteiza said:
Sandro:
I agree with your observation, and thanks for your comments, but I meant licensed tools, because that is where the costs increase, and the cost of implementation and the cost of calibrating the models have to be added on top. But if you need an analysis at minimal or zero cost, there is a variety of tools such as R and MySQL, among others, which are very useful for data analysis.
thanks
Josue Oteiza
Tue, 2012-01-03 14:48 - josueoteiza