1. Capsule networks by Geoffrey Hinton

There is considerable hype around what seems to be a new approach to deep learning, even though Hinton invented the basic idea long ago. Recently he demonstrated state-of-the-art results using a relatively shallow network built from capsules. His talks also cover the philosophy behind the mechanism that makes this possible.
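
As a first taste of the mechanism, here is a minimal NumPy sketch of the "squashing" nonlinearity from the dynamic-routing paper, which shrinks short capsule vectors toward zero and long ones toward unit length (the function name and shapes are mine, for illustration):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|): the norm encodes confidence."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

capsules = np.random.randn(3, 8)                  # 3 capsules, 8-dim pose vectors
print(np.linalg.norm(squash(capsules), axis=-1))  # all norms fall in (0, 1)
```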

Papers:

http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf 

https://arxiv.org/pdf/1710.09829.pdf 

https://openreview.net/pdf?id=HJWLfGWRb 

Blogs and video tutorials:

https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b 

https://jhui.github.io/2017/11/03/Dynamic-Routing-Between-Capsules/

https://jhui.github.io/2017/11/14/Matrix-Capsules-with-EM-routing-Capsule-Network/

https://www.youtube.com/watch?v=rTawFwUvnLE&t=3589s (Hinton explaining)

https://www.youtube.com/watch?v=YqazfBLLV4U&t=2353s

Comments:

2. VQA to Visual Dialog by Dhruv Batra’s group

A few months ago, the press reported that Facebook had to shut down an experiment after its bots invented a secret language. That description is, of course, not accurate. It refers to work by Dhruv Batra’s group: a remarkable set of papers dealing with vision, natural language, and deep reinforcement learning, starting from visual question answering and moving on to bots that chat with each other about images.

Papers:

https://arxiv.org/abs/1505.00468

https://arxiv.org/abs/1612.00837

https://arxiv.org/abs/1611.08669

https://arxiv.org/abs/1706.08502

https://arxiv.org/abs/1610.02391

https://arxiv.org/abs/1703.06585

Blogs and video tutorials:

https://www.youtube.com/watch?v=7cGbl_muKIY&t=1545s

https://www.youtube.com/watch?v=Xbl-rQls77U&t=733s

http://videolectures.net/site/normal_dl/tag=1137915/deeplearning2017_parikh_batra_deep_rl.pdf

Comments:

3. Deep image enhancement and restoration

CNNs are very effective at low-level image processing tasks; the papers below apply them to deblurring, recolorization, super-resolution, and denoising.
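
One recurring design in restoration networks is residual learning: the network predicts only a correction to the degraded input. Below is a minimal PyTorch sketch of that idea (the layer widths are illustrative, not taken from any specific paper):

```python
import torch
import torch.nn as nn

class ResidualRestorer(nn.Module):
    """Predicts a correction and adds it back to the degraded input."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # global skip: only the residual is learned

degraded = torch.randn(1, 3, 32, 32)
print(ResidualRestorer()(degraded).shape)  # torch.Size([1, 3, 32, 32])
```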

Papers:

https://cv.snu.ac.kr/publication/conf/2017/DeepDeblur.pdf (Deblurring)

https://cv.snu.ac.kr/publication/conf/2017/thkim_iccv2017_online.pdf (Video Deblurring)

https://cv.snu.ac.kr/publication/conf/2017/PaletteNet.pdf (Recolorization)

https://cv.snu.ac.kr/publication/conf/2017/EDSR.pdf (EDSR super-resolution)

https://arxiv.org/abs/1701.01698 (Class-aware denoising)

https://arxiv.org/abs/1803.02735 (Back-projection networks for super-resolution)

https://dmitryulyanov.github.io/deep_image_prior (deep image prior)

Blogs and video tutorials:

https://www.youtube.com/watch?v=nSugL7HsKmg 

Comments:


4. Domain transfer

A recent challenge in AI is translating images from one domain to another. An even harder version of the problem is doing so without example pairs of images, given only two large unpaired sets. In the words of Yaniv Taigman: “True AI needs no explicit supervision”.
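
The trick that makes unpaired transfer possible in CycleGAN-style methods is a cycle-consistency loss: mapping an image to the other domain and back should recover the original. Here is a hedged sketch of that term, with trivial invertible functions standing in for the two generators:

```python
import torch
import torch.nn.functional as nnf

# Stand-in "generators"; in a real model these are CNNs mapping A->B and B->A.
g_ab = lambda x: x * 0.9 + 0.1
g_ba = lambda y: (y - 0.1) / 0.9

def cycle_consistency(x_a, y_b):
    """L1 cycle loss: g_ba(g_ab(x)) should recover x, and vice versa."""
    return (nnf.l1_loss(g_ba(g_ab(x_a)), x_a)
            + nnf.l1_loss(g_ab(g_ba(y_b)), y_b))

x, y = torch.rand(4, 3, 16, 16), torch.rand(4, 3, 16, 16)
print(cycle_consistency(x, y).item())  # 0.0 here: the stand-ins are exact inverses
```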

Papers:

https://arxiv.org/abs/1611.07004 (Pix2Pix - Efros)

https://arxiv.org/abs/1703.10593 (CycleGAN - Efros)

https://papers.nips.cc/paper/6650-toward-multimodal-image-to-image-translation.pdf (BicycleGAN - Efros)

https://openreview.net/pdf?id=BkN_r2lR- (Analogies across domains - Wolf)

https://arxiv.org/pdf/1706.00826.pdf (DistanceGAN - Wolf)

Blogs and video tutorials:

https://affinelayer.com/pixsrv/

https://affinelayer.com/pix2pix/

https://www.youtube.com/watch?v=AxrKVfjSBiA

https://hardikbansal.github.io/CycleGANBlog/ 

https://www.youtube.com/watch?v=JvGysD2EFhw 

Comments:

5. Special architectures for non-local CNNs

This is a set of papers on irregular ConvNet architectures that exploit non-local information in the data.
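
The central operation in the first paper computes every output position as a weighted sum over all positions, y_i = (1/C(x)) * sum_j f(x_i, x_j) g(x_j). A NumPy sketch of the embedded-Gaussian variant on a flattened feature map (the projection matrices are random stand-ins):

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """x: (N, C) features at N positions; returns the non-local response."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g  # embeddings
    logits = theta @ phi.T                           # f(x_i, x_j) for all pairs
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over positions j
    return attn @ g                                  # y_i = sum_j attn_ij * g(x_j)

rng = np.random.default_rng(0)
x = rng.standard_normal((49, 32))                    # e.g. a flattened 7x7 map, 32 channels
w = [rng.standard_normal((32, 16)) * 0.1 for _ in range(3)]
print(nonlocal_block(x, *w).shape)                   # (49, 16)
```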

Papers:

https://arxiv.org/abs/1711.07971 (Non local neural networks)

http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf (deformable convolutions)

https://arxiv.org/abs/1611.06757 (Non-local denoising: CNNs + nearest neighbors)

Blogs and video tutorials:

https://www.youtube.com/watch?v=HRLMSrxw2To

https://medium.com/@phelixlau/notes-on-deformable-convolutional-networks-baaabbc11cf3 

6. Advanced Recurrent Neural Networks

A set of recent RNN approaches, with an emphasis on images and video.
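
As a reference while reading, here is one GRU step in minimal NumPy form: two gates decide how much of the old state to reset and how much to overwrite (the weights are random stand-ins):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(x @ Wz + h @ Uz)                 # update gate
    r = sigmoid(x @ Wr + h @ Ur)                 # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)     # candidate state
    return (1 - z) * h + z * h_tilde             # blend old state and candidate

rng = np.random.default_rng(0)
x, h = rng.standard_normal(10), np.zeros(20)
W = [rng.standard_normal(s) * 0.1 for s in [(10, 20), (20, 20)] * 3]
print(gru_cell(x, h, *W).shape)                  # (20,)
```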

Papers:

https://arxiv.org/pdf/1502.02367.pdf (GRU)

https://arxiv.org/pdf/1502.03240.pdf (CRF as RNN)

https://arxiv.org/abs/1801.10308 (Nested LSTMs)

https://arxiv.org/abs/1511.06432 (A convolutional GRU for video)

Blogs and video tutorials:

https://jhui.github.io/2017/03/15/RNN-LSTM-GRU/ 

http://karpathy.github.io/2015/05/21/rnn-effectiveness/ 

Comments:

7. Improving time and memory efficiency

Most of these papers are from Song Han’s lab at MIT, which specializes in the efficiency of DNNs: making deep learning feasible on mobile devices and bringing the computation time of neural networks down to practical levels. Their famous SqueezeNet gained a lot of popularity, and they have impressive recent work to be presented at ICLR’18.
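
As one concrete example of the compression ideas below, the trained-ternary-quantization paper keeps only three values per weight layer. A hedged NumPy sketch of the quantization step (the threshold heuristic follows the paper; there, the two scales are learned during training):

```python
import numpy as np

def ternarize(w, w_pos, w_neg, t=0.05):
    """Quantize weights to {+w_pos, 0, -w_neg} by a magnitude threshold."""
    delta = t * np.abs(w).max()      # layer-wise threshold
    q = np.zeros_like(w)
    q[w > delta] = w_pos             # positive scale (learned in the paper)
    q[w < -delta] = -w_neg           # negative scale (learned in the paper)
    return q

w = np.random.randn(4, 4) * 0.3
print(ternarize(w, w_pos=0.8, w_neg=0.5))
```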

Papers:

https://arxiv.org/pdf/1602.07360.pdf (SqueezeNet 2016)

https://arxiv.org/pdf/1607.04381.pdf (dense-sparse-dense training 2017)

https://arxiv.org/pdf/1612.01064.pdf (trained ternary quantization 2017)

https://openreview.net/pdf?id=HJzgZ3JCW (sparse-Winograd convolution 2018)

https://arxiv.org/pdf/1712.01887.pdf (deep gradient compression 2018)

https://arxiv.org/pdf/1710.07739.pdf (Learning discrete weights using the reparameterization trick - Fetaya)

Blogs and video tutorials:

https://www.youtube.com/watch?v=eZdOkDtYMoo 

8. Relations and compositions of objects

These are some recent novel works that pose more advanced learning tasks and examine reasoning and understanding in machine learning.
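
The last paper below reduces relational reasoning to one formula, RN(O) = f(sum over pairs (i, j) of g(o_i, o_j)): score every pair of objects with a small network g, sum the scores, and read out an answer with f. A NumPy sketch with tiny stand-in networks:

```python
import numpy as np

def relation_network(objects, g_w, f_w):
    """Sum a learned pair function over all object pairs, then read out."""
    pair_sum = 0.0
    for o_i in objects:
        for o_j in objects:
            pair = np.concatenate([o_i, o_j])
            pair_sum = pair_sum + np.maximum(pair @ g_w, 0.0)  # g: one ReLU layer
    return pair_sum @ f_w                                      # f: linear readout

rng = np.random.default_rng(0)
objs = rng.standard_normal((5, 4))        # five 4-dim "object" embeddings
g_w = rng.standard_normal((8, 16)) * 0.1  # pair (8-dim) -> hidden (16-dim)
f_w = rng.standard_normal((16, 3)) * 0.1  # hidden -> 3 answer logits
print(relation_network(objs, g_w, f_w))
```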

Papers:

https://www.cs.cmu.edu/~imisra/data/composing_cvpr17.pdf (red wine to red tomato)

https://arxiv.org/pdf/1707.03389.pdf (DeepMind - Learning Compositional Visual Concepts)

https://arxiv.org/pdf/1705.03633.pdf (Fei-Fei Li, visual reasoning)

https://arxiv.org/pdf/1706.01427.pdf (DeepMind - a simple module for relational reasoning)

Blogs and video tutorials:

https://www.youtube.com/watch?v=57KKh2BIFuc 

9. Analyzing and visualizing the loss of neural nets

This is a set of papers that analyze the weight space, loss surfaces, and minima of neural networks, trying to understand why optimization works on such a highly non-convex surface and how different architectures influence the optimization.
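
The visualizations in the first paper boil down to evaluating the loss along random directions around a trained minimizer, f(a, b) = L(theta* + a*delta + b*eta), with the directions rescaled to match the weights' norms. A 1-D NumPy sketch on a toy loss (both the loss and the scales are illustrative):

```python
import numpy as np

def loss(theta):
    """Toy stand-in; in the paper this is the network's full training loss."""
    return np.sum((theta - 1.0) ** 2) + 0.1 * np.sum(np.sin(5.0 * theta))

theta_star = np.ones(100)                  # pretend trained minimizer
delta = np.random.randn(100)
delta *= np.linalg.norm(theta_star) / np.linalg.norm(delta)  # scale-matched direction

alphas = np.linspace(-1.0, 1.0, 21)
profile = [loss(theta_star + a * delta) for a in alphas]     # 1-D slice of the landscape
print(["%.1f" % v for v in profile])
```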

Papers:

https://arxiv.org/abs/1712.09913 (Visualizing the Loss Landscape of Neural Nets)

https://arxiv.org/pdf/1702.08591.pdf (If resnets are the answer, then what is the question?)

https://arxiv.org/pdf/1702.05777.pdf  (Exponentially vanishing sub-optimal local minima)

https://arxiv.org/pdf/1707.04926.pdf (optimization landscape of over-parameterized shallow neural networks)

Comments:

10. Analyzing and improving GANs

Once the basics of GANs are understood, the next step is to analyze them and to find ways to train them better.
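
For instance, the improved-WGAN paper (fourth link below) stabilizes training by penalizing the critic's gradient norm at points interpolated between real and fake samples. A hedged PyTorch sketch of just that penalty term, with a linear stand-in critic:

```python
import torch

critic = torch.nn.Linear(16, 1)                   # stand-in for a real critic network
real, fake = torch.randn(8, 16), torch.randn(8, 16)

eps = torch.rand(8, 1)                            # per-sample mixing coefficients
x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
grad = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
gradient_penalty = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()  # push norm toward 1
print(gradient_penalty.item())
```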

Papers:

https://arxiv.org/pdf/1606.03498.pdf (Improved techniques for training GANs by Goodfellow)

https://arxiv.org/pdf/1706.08224.pdf (Do GANs learn the distribution?)

http://proceedings.mlr.press/v70/arjovsky17a/arjovsky17a.pdf (Wasserstein GAN)

https://arxiv.org/pdf/1704.00028.pdf (Improved training of Wasserstein GAN)

https://openreview.net/pdf?id=SJx9GQb0- (Improving the Improved Wasserstein GAN)

https://arxiv.org/pdf/1609.03126.pdf (Energy based GANs by LeCun)

11. Detection and Segmentation

Papers:

https://arxiv.org/abs/1311.2524 (R-CNN)

https://arxiv.org/abs/1504.08083 (Fast R-CNN)

https://arxiv.org/abs/1506.01497 (Faster R-CNN)

https://arxiv.org/abs/1703.06870 (Mask R-CNN)

https://arxiv.org/pdf/1506.02640v5.pdf (YOLO)

https://arxiv.org/abs/1612.08242 (YOLO 9000)

https://arxiv.org/abs/1512.02325 (SSD)

Blogs and video tutorials:

https://www.youtube.com/watch?v=nDPWywWRIRo&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv 

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf 

https://www.youtube.com/watch?v=GBu2jofRJtk

https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4 

12. Gradient-based optimization - advanced algorithms

Since the introduction of plain stochastic gradient descent, many advanced techniques have appeared. The most popular is Adam, but there are more recent approaches as well. These papers analyze the existing optimization algorithms and suggest new ones.
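
For reference, here is the Adam update from the first paper in minimal NumPy form (default hyperparameters as in the paper):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update from bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * g          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)          # bias corrections for the zero init
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.array([2.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):                # minimize (w - 1)^2
    w, m, v = adam_step(w, 2.0 * (w - 1.0), m, v, t, lr=0.05)
print(w)                               # approaches [1.0]
```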

Papers:

https://arxiv.org/pdf/1412.6980.pdf (Adam & AdaMax)

https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ (Nadam)

https://openreview.net/pdf?id=ryQu7f-RZ (AMSGrad)

https://openreview.net/pdf?id=rJTutzbA- (Insufficiency of momentum schemes for optimization)

https://openreview.net/pdf?id=B1YfAfcgl (Entropy-SGD)