Humans naturally perceive the world in terms of distinct objects belonging to different categories: we see people, dogs, cats, cars, buildings, and many other kinds of objects. Computer scientists have long strived to build systems with similar capacities: systems that can analyze an image and decide on their own which objects appear in it, and where. This problem of automatic object recognition has turned out to be remarkably difficult, because objects of the same general type, or category, can vary widely in appearance. There are many breeds of dog with very different shapes, and the same dog can look very different when it is running or sitting, or when it is seen from the front or from the side. Our brains nevertheless apply rules that let us recognize all of these as images of dogs, and reliably separate them from images of other categories, such as cats.
WIS scientists found that object recognition can be approached effectively by describing each object category in terms of its characteristic fragments, or patches. Dog images, for example, share characteristic components such as ears, snout, tail, and legs. Objects in a category can then be described by image patches depicting these shared components, together with the typical spatial arrangement of those components. Automatically extracting the most informative category-specific fragments, and using them to recognize objects, has become an important part of computer-based object recognition. Brain-imaging studies of the human visual system support the view that the brain employs similar fragment-based representations for object recognition.
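The matching step of this fragment-based idea can be sketched in a few lines. The toy code below slides a stored fragment over an image, scores each position with normalized cross-correlation, and counts how many of a category's fragments are detected anywhere in the image. This is a minimal illustration only: the function names `ncc_map` and `classify`, the grayscale NumPy-array inputs, and the detection threshold are all assumptions, and the sketch omits the fragment-selection stage and the modeling of spatial arrangement that the actual research methods include.

```python
import numpy as np

def ncc_map(image, fragment):
    # Slide the fragment over the image; at each offset compute the
    # normalized cross-correlation (1.0 means a perfect match).
    ih, iw = image.shape
    fh, fw = fragment.shape
    f = fragment - fragment.mean()
    fnorm = np.linalg.norm(f)
    out = np.zeros((ih - fh + 1, iw - fw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + fh, x:x + fw]
            wc = window - window.mean()
            denom = np.linalg.norm(wc) * fnorm
            out[y, x] = (wc * f).sum() / denom if denom > 0 else 0.0
    return out

def classify(image, fragments, threshold=0.9):
    # Count how many of the category's fragments appear somewhere
    # in the image; a real system would also check their layout.
    return sum(ncc_map(image, frag).max() >= threshold
               for frag in fragments)

# Toy usage: a patch cut from an image is found at its original location.
rng = np.random.default_rng(0)
img = rng.random((20, 20))
frag = img[5:9, 5:9].copy()
scores = ncc_map(img, frag)
print(scores.max())                 # close to 1.0 at the true position
print(classify(img, [frag]))        # one fragment detected
```

In practice such exhaustive patch matching is done with optimized library routines, and the fragments themselves are chosen to be maximally informative for the category rather than cut out by hand.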