The  Weizmann  Institute  of  Science
                  Faculty of Mathematics and Computer Science


                            Computer Science Seminar

                                Hava Siegelmann
                                Technion - Haifa


                                 will speak on


                      Learning Algorithms for Data Mining
                       and Search in Information Systems

Abstract:
Given the exponential flowering of data in information systems and especially
the internet, it has become crucial to develop tools which can extract
information with high precision and accurate ranking.  Current internet search
engines require domain knowledge of the subject at hand (in terms of
keywords or an example site) or alternatively use a long interactive search. A
system that learns the user's preferences and guides him through the search
could be superior in speed of use and accuracy of results.  Concentration on
relevant documents and sites is facilitated by the grouping of individual
documents into clusters of related documents.  Techniques for automatic
clustering are therefore useful both for gaining general information about the
database and for expediting searches.


This talk will propose three learning algorithms:

1. An automatic interactive method that guides the user to reach the most
   relevant documents. This algorithm is individualized to provide documents
   according to the user's personal wishes by learning while providing the
   service. The technique is based on supervised learning with active
   queries.  This algorithm can be applied on top of any search engine (I'll
   show how to do it on HITS and Google).

  (Joint work with Oren Schnizer, CS Technion).


2. Two new clustering algorithms for points in metric spaces
   that allow for highly irregular shapes. Both methods are particularly
   applicable for data mining in the sense that they report the geometric
   characteristics of the clusters.  Applications are shown for document
   clustering after preprocessing with Latent Semantic Indexing.

   A. The first algorithm is based on tensor multiplication and the
      Hebb rule of unsupervised learning.  It allows for very close, non-convex
      shapes (like "worms", or shapes with holes) and overlapping clusters, and
      reports geometric features of high-order statistics.

     (Joint work with Hod Lipson, MIT).

   B. The second is a hierarchical method based on kernel functions;
      it reports the support vectors of the cluster boundaries.

     (Joint work with David Horn of TAU, Asa Ben-Hur of Technion and
      Vladimir Vapnik of AT&T).


                      The lecture will take place in the 
                    Seminar Room, Room 261, Ziskind Building
                            on Monday, May 15, 2000
                                    at 14:30