"Learning Has Just Started" - an interview with Prof. Vladimir Vapnik
Written by Ran Gilad-Bachrach   
Sunday, 02 March 2008 00:00

As a part of the renovation of the learningTheory.org web site, we are launching a series of interviews with leading researchers in learning theory and related fields. We are proud that Prof. Vladimir Vapnik accepted our invitation to be the first to be interviewed.

    Prof. Vapnik has been working on problems related to learning theory for more than four decades. Together with Alexey Chervonenkis he studied the problem of uniform convergence of empirical means and developed the VC theory. He also developed the large margin principles and the Support Vector Machines algorithm.

R-GB: Thank you for accepting our invitation to be the first one to be interviewed for learningtheory.org. Can you tell us what your current research directions are?

V-V: My current research interest is to develop advanced models of empirical inference. I think that the problem of machine learning is not just a technical problem. It is a general problem of the philosophy of empirical inference. One of the ways of inference is induction. The main philosophy of inference developed in the past strongly connected empirical inference to the inductive learning process.

    I believe that induction is a rather restrictive model of learning and I am trying to develop more advanced models. First, I am trying to develop non-inductive methods of inference, such as transductive inference, selective inference, and many other options. Second, I am trying to introduce non-classical ways of inference. Here is an example of such an inference. In the classical scheme, given a set of admissible indicator functions {f(x)} and a set of training data, pairs (x_i, y_i) ∈ X × {±1}, one tries to find the best classification function in this set. In the new setting, called master-class learning, we are also given a set of admissible functions {f(x)} and our goal is also to find the best classification function in this set. However, for the training data we are given additional information: we are given triplets (x_i, x*_i, y_i), where the vectors x*_i belong to a space X* (generally speaking, different from the space X). These vectors are carriers of "hidden information" about the vectors x (they will not be available during testing). These vectors can be a special type of holistic description of the data.

    For example, when you have a technical description x of the object and some impression x* about this object, you have two forms of description: a formal description and a holistic, or Gestalt, description. Using both descriptions during training can help to find a better decision function. This technique resembles the way musicians train in master classes. The teacher does not show exactly how to play. He talks to students and gives some images transmitting some hidden information, and this helps. So, the challenge is to create an algorithm which, using additional information, will generalize better than classical algorithms.
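
    The asymmetry in this setting, triplets at training time and only x at test time, can be sketched in a few lines of Python. This is an illustrative toy, not the method discussed in the interview: the names `Triplet` and `ToyPrivilegedCentroid` are hypothetical, and the weighting rule (training points whose x* is typical of their class count more) is an assumption chosen only to show hidden information shaping a rule that later sees x alone.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Triplet:
    x: Sequence[float]       # technical description, available at training and test time
    x_star: Sequence[float]  # hidden information, available during training only
    y: int                   # label in {-1, +1}


def _weighted_mean(vectors: List[Sequence[float]], weights: List[float]) -> List[float]:
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class ToyPrivilegedCentroid:
    """Hypothetical toy learner: a nearest-centroid rule in X whose centroids
    are weighted using x*. An assumption for illustration only."""

    def fit(self, data: List[Triplet]) -> "ToyPrivilegedCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in (-1, 1):
            group = [t for t in data if t.y == label]
            # Mean of the hidden descriptions for this class.
            star_mean = _weighted_mean([t.x_star for t in group], [1.0] * len(group))
            # Assumed rule: points with a typical x* get a larger weight.
            weights = [1.0 / (1.0 + _dist(t.x_star, star_mean)) for t in group]
            self.centroids[label] = _weighted_mean([t.x for t in group], weights)
        return self

    def predict(self, x: Sequence[float]) -> int:
        # Only x is used here: x* is not available at test time.
        return min(self.centroids, key=lambda label: _dist(x, self.centroids[label]))


train = [
    Triplet((0.0, 0.0), (1.0,), -1),
    Triplet((0.2, 0.1), (1.1,), -1),
    Triplet((1.0, 1.0), (5.0,), 1),
    Triplet((0.9, 1.2), (5.2,), 1),
]
model = ToyPrivilegedCentroid().fit(train)
print(model.predict((0.1, 0.0)), model.predict((1.0, 1.1)))
```

    The point of the sketch is the interface: `fit` consumes (x, x*, y), while `predict` takes x alone.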

    I believe that learning has just started, because whatever we did before, it was some sort of a classical setting known to classical statistics as well. Now we come to the moment where we are trying to develop a new philosophy which goes beyond classical models.

R-GB: You gave the example of master classes, where you see this additional information. Can you give another example?

V-V: Consider, for example, a figure skating coach. The coach cannot skate as well as a good young skater; nevertheless, he can explain how to skate. The explanation is not in technical details but something like what you should focus on more, or some images you should think of. You can look at it as if it is just blah-blah-blah, but it is not.

    This type of description contains hidden information that affects your choice of a good rule. We tested this idea on the digit recognition task. We developed metaphoric descriptions of all digits of the training set, and used these descriptions to improve performance, and it works. This is what real learning is about: it uses the technical description x and the hidden information provided by the teacher in a completely different language to create a good technical decision rule.

R-GB: Do you think this setting can be formalized in the same sense that uniform convergence was formalized?

V-V: It is easy to formalize it, and you can use it with well-known algorithms like support vector machines. In support vector machines one uses many independent slack parameters in the optimization process. Hidden information leads to a restricted set of admissible slack functions which has a smaller capacity than the set of all possible slack functions used in classical SVMs. Many of these ideas were discussed in the afterword in the second edition of my 1982 book "Estimation of Dependencies Based on Empirical Data".
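
    One way to write down the contrast between independent slack parameters and a restricted family of slack functions is sketched below. It assumes a linear family of slack functions of x*; the symbols w*, b*, C and γ are illustrative assumptions and do not come from the interview.

```latex
% Classical soft-margin SVM: one free slack variable \xi_i per training example.
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
y_i\,(w\cdot x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0 .

% With hidden information: the slack is restricted to a function of x_i^*,
% here assumed linear, \xi_i = w^{*}\cdot x_i^{*} + b^{*}.
\min_{w,\,b,\,w^{*},\,b^{*}}\ \tfrac{1}{2}\lVert w\rVert^{2}
  + \tfrac{\gamma}{2}\lVert w^{*}\rVert^{2}
  + C\sum_{i=1}^{n}\bigl(w^{*}\cdot x_i^{*} + b^{*}\bigr)
\quad\text{s.t.}\quad
y_i\,(w\cdot x_i + b)\ \ge\ 1-\bigl(w^{*}\cdot x_i^{*} + b^{*}\bigr),\qquad
w^{*}\cdot x_i^{*} + b^{*}\ \ge\ 0 .
```

    In the first problem the n slacks can take any non-negative values; in the second they are tied together through the few parameters (w*, b*), which is the sense in which the admissible slack functions have smaller capacity.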

    I believe that something drastic has happened in computer science and machine learning. Until recently, philosophy was based on the very simple idea that the world is simple. In machine learning, for the first time, we have examples where the world is not simple. For example, when we solve the "forest" problem (which is a low-dimensional problem) and use 15,000 training examples we get 85%-87% accuracy. However, when we use 500,000 training examples we achieve 98% correct answers. This means that a good decision rule is not a simple one; it cannot be described by very few parameters. This is actually a crucial point in the approach to empirical inference.

    This point was very well described by Einstein, who said "when the solution is simple, God is answering". That is, if a law is simple we can find it. He also said "when the number of factors coming into play is too large, scientific methods in most cases fail". In machine learning we are dealing with a large number of factors. So the question is: what is the real world? Is it simple or complex? Machine learning shows that there are examples of complex worlds. We should approach complex worlds from a completely different position than simple worlds. For example, in a complex world one should give up explainability (the main goal in classical science) to gain better predictability.



Last Updated ( Wednesday, 09 April 2008 04:37 )