The evaluation setup is as follows. The human segmentations are used to construct a ground-truth segmentation by thresholding the number of human subjects who voted for each pixel as foreground; this produces a binary segmentation that is used as ground truth. To assess the consistency of a given segmentation with the ground truth we use the F-measure, which is the harmonic mean of precision and recall, calculated on the foreground pixels. For each image we report its F-measure score, and the average F-measure score over the entire database serves as the final score. The evaluation process consists of two tests:
- Single segment test - In each run, this test seeks the single segment that best fits the foreground, according to the F-measure score.
- Fragmentation test - This test permits a union of segments that largely overlap the foreground object. As in the single segment test, the union of these segments is then tested for consistency with the ground truth, according to the F-measure score.
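
The procedure above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the benchmark's actual scoring code: each segmentation is represented as a set of `(row, col)` foreground pixels, all function names are hypothetical, and the two thresholds (`min_votes` for building the ground truth, `min_overlap` for deciding that a segment "largely overlaps" the foreground) are placeholders, since the text does not give their values. "Largely overlap" is interpreted here as the fraction of a segment's pixels that fall inside the ground-truth foreground.

```python
from collections import Counter


def ground_truth_from_votes(human_masks, min_votes):
    """Keep pixels marked as foreground by at least `min_votes` subjects.

    `human_masks` is a list of sets of (row, col) pixels, one set per subject.
    `min_votes` is an assumed parameter; the text does not state its value.
    """
    votes = Counter(pixel for mask in human_masks for pixel in mask)
    return {pixel for pixel, count in votes.items() if count >= min_votes}


def f_measure(segment, truth):
    """Harmonic mean of precision and recall on the foreground pixels."""
    overlap = len(segment & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(segment)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def single_segment_score(segments, truth):
    """Single segment test: F-measure of the best-fitting single segment."""
    return max((f_measure(s, truth) for s in segments), default=0.0)


def fragmentation_score(segments, truth, min_overlap=0.5):
    """Fragmentation test: F-measure of the union of overlapping segments.

    A segment is included when at least `min_overlap` of its pixels lie in
    the ground-truth foreground (an assumed reading of "largely overlap").
    """
    chosen = [s for s in segments if s and len(s & truth) / len(s) >= min_overlap]
    union = set().union(*chosen)
    return f_measure(union, truth)


def database_score(per_image_scores):
    """Final score: average of the per-image F-measure scores."""
    return sum(per_image_scores) / len(per_image_scores)
```

For example, if the foreground is split between two segments that each cover half of it, the single segment test can only credit one half (recall of about 0.5), while the fragmentation test scores their union and so rewards the algorithm for covering the whole object.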