The Weizmann Institute of Science
Faculty of Mathematics and Computer Science
Computer Vision Lab


Drop The GAN: In Defense of Patch Nearest Neighbors as Single Image Generative Models
Accepted to CVPR 2022

Niv Granot, Ben Feinstein, Assaf Shocher, Shai Bagon, Michal Irani

[arXiv] [Code]


Abstract

Image manipulation dates back long before the deep learning era. The prevailing classical approaches were based on maximizing patch similarity between the input and the generated output. Recently, single-image GANs were introduced as a superior and more sophisticated solution to image manipulation tasks. Moreover, they offered the opportunity not only to manipulate a given image, but also to generate a large and diverse set of different outputs from a single natural image. This gave rise to new tasks, which are considered "GAN-only". However, despite their impressive results, single-image GANs require a long training time (usually hours) for each image and each task, and often suffer from visual artifacts. In this paper we revisit the classical patch-based methods and show that, contrary to what was previously believed, classical methods can be adapted to tackle these novel "GAN-only" tasks. Moreover, they do so better and faster than single-image GAN-based methods. More specifically, we show that: (i) with slight modifications, classical patch-based methods can unconditionally generate diverse images from a single natural image; (ii) the visual quality of their generated output exceeds that of single-image GANs by a large margin (confirmed both quantitatively and qualitatively); (iii) they are orders of magnitude faster (runtime reduced from hours to seconds).
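
To make the patch nearest-neighbor mechanism concrete, here is a minimal single-scale sketch in Python/NumPy. It is an illustration under our own assumptions, not the paper's implementation: the names extract_patches and pnn_generate and the parameters p, noise_std, seed are ours; the search is an exhaustive L2 search rather than a fast approximate one; images are assumed grayscale; and the coarse-to-fine pyramid and normalized patch-similarity score used in the paper are omitted.

import numpy as np

def extract_patches(img, p):
    # Collect all overlapping p x p patches of a 2-D (grayscale) image,
    # one flattened patch per row, in row-major scan order.
    H, W = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(H - p + 1)
                     for j in range(W - p + 1)])

def pnn_generate(source, p=7, noise_std=0.35, seed=0):
    # One patch nearest-neighbor pass at a single scale:
    # 1) perturb the source with noise to obtain a random query image,
    # 2) replace every query patch by its nearest source patch (L2),
    # 3) paste the chosen patches back, averaging where they overlap.
    rng = np.random.default_rng(seed)
    query_img = source + rng.normal(0.0, noise_std, source.shape)

    keys = extract_patches(source, p)       # patch database
    queries = extract_patches(query_img, p)

    # Squared L2 distances via the expansion |q|^2 - 2 q.k + |k|^2
    # (exhaustive search; fine for small images, slow for large ones).
    d2 = ((queries ** 2).sum(1)[:, None]
          - 2.0 * queries @ keys.T
          + (keys ** 2).sum(1)[None, :])
    nn = d2.argmin(axis=1)

    # "Vote": paste each matched source patch at its query location,
    # accumulating sums and counts so that overlaps are averaged.
    H, W = source.shape
    out = np.zeros((H, W))
    cnt = np.zeros((H, W))
    k = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            out[i:i + p, j:j + p] += keys[nn[k]].reshape(p, p)
            cnt[i:i + p, j:j + p] += 1.0
            k += 1
    return out / cnt

# Example: generate one random variant of a small grayscale image
# (values in [0, 1]); different seeds give different outputs.
# img = np.random.default_rng(1).random((48, 48))  # stand-in image
# sample = pnn_generate(img, p=7, seed=3)

The final averaging of overlapping patches ("voting") is in the spirit of classical patch-based synthesis such as bidirectional similarity [26].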

Supplementary Material

1. Diverse image generation based on a single image
2. Conditional Inpainting
3. Structural Analogies
4. Retargeting
5. Collage
6. Editing
7. User-Study Images
8. Runtime and Memory
9. Implementation Details



1. Diverse Image Generation Based on a Single Image
[Figure: each example shows a single source image ("Input image"), followed by images randomly generated from it by GPNN (Ours) and by SinGAN [23].]

2. Conditional Inpainting
[Figure: conditional inpainting examples; each shows a source image and the corresponding inpainted output.]

3. Structural Analogies
[Figure: structural analogy examples; each pair shows Input A and Input B, together with the generated analogies A to B and B to A.]

4. Retargeting
[Figure: retargeting examples; each shows a source image and its retargeted outputs by GPNN (Ours).]

5. Collage
[Figure: collage examples; each shows several source images (Source Image 1, 2, 3) and the collage generated by GPNN (Ours).]

6. Editing
[Figure: editing examples; each shows a source image, the edit, and the resulting output.]

7. User-Study Images

8. Runtime and Memory

9. Implementation Details


Relevant references:
[2] Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Trans. Graph., 28(3):24, 2009.
[3] Sagie Benaim, Ron Mokady, Amit Bermano, and Lior Wolf. Structural Analogy from a Single Image Pair. In Computer Graphics Forum. Wiley Online Library, 2020.
[23] Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. SinGAN: Learning a Generative Model from a Single Natural Image. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4570-4580, 2019.
[24] Assaf Shocher, Shai Bagon, Phillip Isola, and Michal Irani. InGAN: Capturing and Remapping the "DNA" of a Natural Image. arXiv preprint, 2019.
[26] Denis Simakov, Yaron Caspi, Eli Shechtman, and Michal Irani. Summarizing Visual Data Using Bidirectional Similarity. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8. IEEE, 2008.