The Weizmann Institute of Science
Faculty of Mathematics and Computer Science

Computer Science Seminar

Hava Siegelmann
Technion - Haifa

will speak on

Learning Algorithms for Data Mining and Search in Information Systems

Abstract:

Given the exponential flowering of data in information systems and especially the internet, it has become crucial to develop tools which can extract information with high precision and accurate ranking. Current internet search engines require domain knowledge of the subject at hand (in terms of keywords or an example site) or alternatively use a long interactive search. A system that learns the user's preferences and guides him through the search could be superior in speed of use and accuracy of results.

Concentration on relevant documents and sites is facilitated by the grouping of individual documents into clusters of related documents. Techniques for automatic clustering are therefore useful both for gaining general information about the database and for expediting searches.

This talk will propose three learning algorithms:

1. An automatic interactive method that guides the user to reach the most relevant documents. This algorithm is individualized to provide documents according to the user's personal wishes by learning while providing the service. The technique is based on supervised learning with active queries. This algorithm can be applied on top of any search engine (I'll show how to do it on HITS and Google). (Joint work with Oren Schnizer, CS Technion).

2. Two new clustering algorithms for points in metric spaces that allow for highly irregular shapes. Both methods are particularly applicable for data mining in the sense that they report the geometric characteristics of the clusters. Applications are shown for document clustering after preprocessing with Latent Semantic Indexing.

A. The first algorithm is based on tensor multiplication and the Hebb rule of unsupervised learning.
It allows for very close, non-convex shapes (like "worms", or shapes with holes) and overlapping clusters, and reports geometric features of high-order statistics. (Joint work with Hod Lipson, MIT).

B. The second is a hierarchical method based on kernel functions; it reports the support vectors of the cluster boundaries. (Joint work with David Horn of TAU, Asa Ben-Hur of Technion and Vladimir Vapnik of AT&T).

The lecture will take place in the Seminar Room, Room 261, Ziskind Building
on Monday, May 15, 2000 at 14:30