The Weizmann Institute of Science
Faculty of Mathematics and Computer Science
Computer Vision Lab

"Zero-Shot" Super-Resolution using Deep Internal Learning
Assaf Shocher, Nadav Cohen, Michal Irani
This webpage presents the paper "Zero-Shot Super-Resolution using Deep Internal Learning" (CVPR 2018).
[Paper PDF] [bibtex] [Code]

Abstract
Deep Learning has led to a dramatic leap in Super-Resolution (SR) performance in the past few years. However, being supervised, these SR methods are restricted to specific training data, where the acquisition of the low-resolution (LR) images from their high-resolution (HR) counterparts is predetermined (e.g., bicubic downscaling), without any distracting artifacts (e.g., sensor noise, image compression, non-ideal PSF, etc.). Real LR images, however, rarely obey these restrictions, resulting in poor SR results by SotA (State of the Art) methods. In this paper we introduce "Zero-Shot" SR, which exploits the power of Deep Learning, but does not rely on prior training. We exploit the internal recurrence of information inside a single image, and train a small image-specific CNN at test time, on examples extracted solely from the input image itself. As such, it can adapt itself to different settings per image. This allows performing SR of real old photos, noisy images, biological data, and other images where the acquisition process is unknown or non-ideal. On such images, our method outperforms SotA CNN-based SR methods, as well as previous unsupervised SR methods. To the best of our knowledge, this is the first unsupervised CNN-based SR method.
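To make the idea of "examples extracted solely from the input image itself" concrete, here is a minimal sketch of how LR-HR training pairs could be built from a single test image. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`downscale_box`, `internal_pairs`), the patch size, and the use of block averaging as the downscaling operator are all hypothetical choices.

```python
import numpy as np

def downscale_box(img: np.ndarray, s: int) -> np.ndarray:
    """Downscale a 2-D grayscale image by an integer factor s using s-by-s
    block averaging (an assumed stand-in for the real downscaling kernel)."""
    h, w = img.shape
    h, w = h - h % s, w - w % s  # crop so both sides are divisible by s
    blocks = img[:h, :w].reshape(h // s, s, w // s, s)
    return blocks.mean(axis=(1, 3))

def internal_pairs(img: np.ndarray, s: int = 2, patch: int = 16,
                   n: int = 8, seed: int = 0):
    """Sample n (LR patch, HR patch) pairs using only the test image."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    pairs = []
    for _ in range(n):
        # HR target: a random crop of the test image itself.
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        hr = img[y:y + patch, x:x + patch]
        # LR input: the same crop, downscaled by the desired SR factor.
        lr = downscale_box(hr, s)
        pairs.append((lr, hr))
    return pairs

# Hypothetical usage with a random array standing in for the test image.
test_image = np.random.default_rng(1).random((64, 64))
pairs = internal_pairs(test_image)
```

A small network trained to map each LR patch to its HR patch would then be applied to the test image itself to produce the SR output.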

Supplementary Material

This file contains:
1. SR of Real photos experiment.
2. SR under "non-ideal" downscaling kernels (The random kernel experiment)
3. SR of poor-quality LR images (The random degradation experiment)
4. Remaining images from the paper figures


Relevant references:
[12] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
[14] T. Michaeli and M. Irani. Nonparametric blind super-resolution. In International Conference on Computer Vision (ICCV), 2013.

1. Real Low-res images (No ground truth)

In REAL images taken under "non-ideal" conditions, SotA SR (e.g., EDSR [12]) is often only marginally better than bicubic interpolation.

ZSSR can handle such non-ideal cases.

Images in this experiment (each shown with its Input LR Image and its Bi-Cubic Interpolation):

Historic image: Checkpoint Charlie (end of World War II) - SR x2
Historic image: JFK funeral - SR x2
Historic image: Abraham Lincoln photograph - SR x2
iPhone image - SR x3
Outdoor image downloaded from the Internet - SR x2

2. SR under "non-ideal" downscaling kernels (The random kernel experiment)

LR images were generated using "non-ideal" (non-bicubic), but realistic, downscaling kernels (see Sec. 4.2 in the paper for more details). Each image was downscaled by a different random kernel (which is unknown to the SR algorithm).
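As a rough illustration of what downscaling with a non-bicubic kernel means, the sketch below blurs an HR image with an arbitrary kernel and then subsamples it. The anisotropic Gaussian is only an assumed example of a "non-ideal" kernel; the kernels actually used in the experiment are described in Sec. 4.2 of the paper, and the function names here are hypothetical.

```python
import numpy as np

def gaussian_kernel(size: int = 7, sigma_x: float = 1.5,
                    sigma_y: float = 0.8) -> np.ndarray:
    """Anisotropic Gaussian kernel, normalized to sum to 1 (illustrative)."""
    r = np.arange(size) - (size - 1) / 2
    k = (np.exp(-0.5 * (r[:, None] / sigma_y) ** 2)
         * np.exp(-0.5 * (r[None, :] / sigma_x) ** 2))
    return k / k.sum()

def downscale_with_kernel(hr: np.ndarray, kernel: np.ndarray,
                          s: int = 2) -> np.ndarray:
    """Convolve a 2-D image with an odd-sized kernel, then keep every s-th pixel."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # flip so the loop below is a true convolution
    padded = np.pad(hr, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    blurred = np.zeros(hr.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            blurred += flipped[i, j] * padded[i:i + hr.shape[0], j:j + hr.shape[1]]
    return blurred[::s, ::s]

# Hypothetical usage: a random array stands in for the HR image.
hr = np.random.default_rng(0).random((32, 32))
lr = downscale_with_kernel(hr, gaussian_kernel(), s=2)

# Sanity check: a delta kernel reduces to plain subsampling (no blur).
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
lr_delta = downscale_with_kernel(hr, delta, s=2)
```

An image-specific method can apply the same operator, with an estimated kernel, when it generates its own training examples; a network trained on a fixed bicubic kernel has no equivalent step at test time.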

We used the method of [14] to estimate the downscaling kernel directly from the test image, and fed it to our image-specific network ZSSR.

Note that SotA SR methods cannot benefit from knowing the downscaling kernel of a specific LR image at test time, since they were trained and optimized for a different, fixed kernel (usually bicubic).

In such "non-ideal" (but realistic) cases, SotA SR (e.g., EDSR [12]) is often only marginally better than bicubic interpolation.

ZSSR can handle such non-ideal cases.

Below we show a few sample images. Extensive empirical evaluation on the full dataset of 100 images can be found in the paper.

For each sample image, three downscaling kernels are shown:
1. The "ideal" (bicubic) downscaling kernel used to generate the training sets for SotA SR CNNs (e.g., EDSR [12]).
2. The true (unknown) downscaling kernel used to generate the LR Input Image from the HR image.
3. The downscaling kernel estimated directly from the LR test image (using [14]). This is fed to our image-specific CNN (ZSSR).

Each sample is shown with its Input LR Image and its Bi-Cubic Interpolation. Bi-Cubic Interpolation (PSNR / SSIM) per sample image, in order of appearance:
Image 1: 26.74 / 0.7817
Image 2: 22.88 / 0.6608
Image 3: 25.19 / 0.7222
Image 4: 26.62 / 0.8333
Image 5: 21.36 / 0.4582
Image 6: 25.04 / 0.7874
Image 7: 28.72 / 0.7805
Image 8: 22.50 / 0.6036
Image 9: 22.51 / 0.7026
Image 10: 28.54 / 0.7626
Image 11: 25.91 / 0.7685
Image 12: 24.04 / 0.6759
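For readers unfamiliar with the PSNR figures quoted for the bicubic baseline, the sketch below implements the standard PSNR definition. This page does not state the exact evaluation protocol (color channel, border cropping, or peak value), so the `peak=255.0` default and the function name are assumptions.

```python
import numpy as np

def psnr(reference, estimate, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)  # mean squared error over all pixels
    if mse == 0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# Hypothetical usage: a constant error of 16 gray levels on every pixel.
ground_truth = np.zeros((8, 8))
upscaled = np.full((8, 8), 16.0)
score = psnr(ground_truth, upscaled)
```

Higher values mean the upscaled image is closer to the ground-truth HR image; SSIM is a separate, structure-based measure and is not sketched here.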

3. SR of poor-quality LR images (The random degradation experiment)

LR images were randomly contaminated using one of three types of image degradation (e.g., noise or compression artifacts; see Sec. 4.2 in the paper for more details). The type of degradation is unknown to the SR algorithm.

SR methods tend to magnify the noise together with the image details. They increase all high frequencies in the image (even subtle noise is magnified).

Image-specific CNN (ZSSR) is relatively robust to unknown degradation. It magnifies details, while eliminating the noise. We attribute this phenomenon to the fact that image-specific details tend to recur across image scales, whereas noise artifacts do not (see Sec. 3.2 in the paper for more details).
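The claim that noise does not recur across scales can be illustrated with a toy experiment: independent pixel noise shrinks when an image is downscaled, while a large-scale structure such as an edge keeps its contrast. This is only an illustrative sketch, not the analysis from the paper; the noise level, the image size, and the use of 2x2 block averaging as the downscaling operator are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0

# Independent Gaussian noise at the fine scale.
noise = rng.normal(0.0, sigma, size=(512, 512))

# A simple structure: a vertical step edge of height 100.
edge = np.zeros((512, 512))
edge[:, 256:] = 100.0

def downscale2(img: np.ndarray) -> np.ndarray:
    """Downscale by 2 with 2x2 block averaging (assumed operator)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fine_std = noise.std()                # close to sigma
coarse_std = downscale2(noise).std()  # close to sigma / 2
coarse_edge = downscale2(edge)
edge_contrast = coarse_edge.max() - coarse_edge.min()  # edge height is preserved
```

Averaging four independent samples halves the noise standard deviation, so LR-HR pairs built across scales present the network with consistent structure but inconsistent noise.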

Below we show a few sample images. Extensive empirical evaluation on the full dataset of 100 images can be found in the paper.

Each sample is shown with its Input LR Image and its Bi-Cubic Interpolation. Bi-Cubic Interpolation (PSNR / SSIM) per sample image, in order of appearance:
Image 1: 36.44 / 0.9555
Image 2: 29.73 / 0.9044
Image 3: 25.38 / 0.6948
Image 4: 25.79 / 0.7097
Image 5: 28.27 / 0.8974
Image 6: 28.94 / 0.7711
Image 7: 29.44 / 0.6763

4. Remaining images from the paper figures. (All other images from the paper figures were included above).

Ideally (bicubic) downscaled image with strong internal repetitive structures - SR x3

Bi-Cubic Interpolation
(PSNR / SSIM) = 17.20 / 0.7872
Input LR Image

SR under Aliasing (LR was sampled by a Delta-function) - SR x2

Bi-Cubic Interpolation
(PSNR / SSIM) = 29.48 / 0.8204
Input LR Image