<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
  <title>SPADE Project Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

<meta property="og:image" content="images/teaser_fb.jpg"/>
<meta property="og:title" content="Semantic Image Synthesis with Spatially-Adaptive Normalization"/>

<script src="lib.js" type="text/javascript"></script>
<script src="popup.js" type="text/javascript"></script>

<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-136330885-1"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'UA-136330885-1');
</script>

<script type="text/javascript">
// redefining default features
var _POPUP_FEATURES = 'width=500,height=300,resizable=1,scrollbars=1,titlebar=1,status=1';
</script>
<link media="all" href="glab.css" type="text/css" rel="StyleSheet">
<style type="text/css" media="all">
IMG {
	PADDING-RIGHT: 0px;
	PADDING-LEFT: 0px;
	FLOAT: right;
	PADDING-BOTTOM: 0px;
	PADDING-TOP: 0px
}
#primarycontent {
	MARGIN-LEFT: auto; ; WIDTH: expression(document.body.clientWidth >
1000? "1000px": "auto" ); MARGIN-RIGHT: auto; TEXT-ALIGN: left; max-width:
1000px }
BODY {
	TEXT-ALIGN: center
}
</style>

<meta content="MSHTML 6.00.2800.1400" name="GENERATOR"></head>

<body>

<div id="primarycontent">
<center><h1>Semantic Image Synthesis with Spatially-Adaptive Normalization</h1></center>
<center><h2>
	<a href="http://taesung.me/">Taesung Park</a>&nbsp;&nbsp;&nbsp;
	<a href="http://mingyuliu.net/">Ming-Yu Liu</a>&nbsp;&nbsp;&nbsp;
	<a href="https://tcwang0509.github.io/">Ting-Chun Wang</a>&nbsp;&nbsp;&nbsp;
	<a href="http://people.csail.mit.edu/junyanz/">Jun-Yan Zhu</a>&nbsp;&nbsp;&nbsp;
	</h2></center>
	<center><h2>
		<a href="http://bair.berkeley.edu/">UC Berkeley</a>&nbsp;&nbsp;&nbsp;
		<a href="https://www.nvidia.com/en-us/">NVIDIA</a>&nbsp;&nbsp;&nbsp;
		<a href="https://www.csail.mit.edu/">MIT</a>&nbsp;&nbsp;&nbsp;
	</h2></center>
<center><h2>in CVPR 2019 (Oral)</h2></center>
<center><h2><strong><a href="https://arxiv.org/abs/1903.07291">Paper</a> | <a href="https://github.com/nvlabs/spade/">Code</a> </strong> </h2></center>
<center><a href="images/teaser_high_res_uncompressed.png">
<img src="images/teaser_high_res_uncompressed.png" width="97%"> </a></center>
<p></p>


<p>

<table width="100%" border="0" cellspacing="0" cellpadding="10" >
	<tr>
		<td width="50%" class="full">
			<img src="images/treepond.gif" style="width:100%;" align="middle">
		</td >
		<td width="50%" class="full">
			<img src="images/ocean.gif" style="width:100%;" align="middle">
		</td>
	</tr>
</table>

<h2 align="center">Abstract</h2>

<div style="font-size:14px"><p align="justify">We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal because the normalization layers tend to wash away semantic information. To address the issue, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method compared to existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows users to easily control the style and content of synthesis results as well as create multi-modal results.</p></div>


<a href="https://arxiv.org/abs/1903.07291"><img style="float: left; padding: 10px; PADDING-RIGHT: 30px;" alt="paper thumbnail" src="images/paper_thumbnail.jpg" width=170></a>


<h2>Paper</h2>
<p><a href="https://arxiv.org/abs/1903.07291">arXiv</a>, 2019.</p>


<h2>Citation</h2>
<p>Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu.<br>"Semantic Image Synthesis with Spatially-Adaptive Normalization", in CVPR, 2019.
<a href="SPADE.txt">BibTeX</a>

</p>


<h2>Code </h2> <p><a href='https://github.com/NVLabs/SPADE'> PyTorch </a></p>


<br>

<table border="0" cellspacing="0" cellpadding="10" width="100%">
	<tr>
	<td align="center" valign="middle" width="50%" class="full">
		<h2>  Video of Interactive Demo App (GauGAN) </h2>
		<p><iframe width="100%" height="300px" src="https://www.youtube.com/embed/MXWm6w4E5q0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
	</td>

	<td align="center" valign="middle" width="50%" class="full">
		<h2> Introduction of SPADE at GTC 2019 </h2>
		<p><iframe width="100%" height="300px" src="https://www.youtube.com/embed/p5U4NgVGAwg" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
	</td>

	</tr>
</table>

<br>
<h1 align='center'> Brief Description of the Method </h1>
<center><img src="images/method.png" width="1000"></center>
<br>
<p align="justify"> Many common normalization techniques, such as Batch Normalization (<a href="https://arxiv.org/abs/1502.03167"><span style="font-weight:normal">Ioffe and Szegedy, 2015</span></a>), apply a learned affine transformation (as in <a href="https://pytorch.org/docs/stable/nn.html?highlight=batchnorm2d#torch.nn.BatchNorm2d"><span style="font-weight:normal">PyTorch</span></a> and <a href="https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization"><span style="font-weight:normal">TensorFlow</span></a>) after the actual normalization step. In SPADE, the affine transformation is <i>learned from the semantic segmentation map</i>. This is similar to Conditional Normalization (<a href="https://arxiv.org/abs/1707.00683"><span style="font-weight:normal">De Vries et al., 2017</span></a> and <a href="https://arxiv.org/abs/1610.07629"><span style="font-weight:normal">Dumoulin et al., 2016</span></a>), except that the learned affine parameters now need to be spatially adaptive, which means that a different scale and bias is used for each semantic label. With this simple method, the semantic signal can act on all layer outputs, unaffected by the normalization process, which might otherwise lose such information. Moreover, because the semantic information is provided through the SPADE layers, a random latent vector can be used as the input to the network, and this vector can be used to manipulate the style of the generated images.
</p>
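<p align="justify"> The modulation described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and makes several simplifying assumptions: the function names <code>normalize_channel</code> and <code>spade_like</code> are hypothetical, the scale and bias are read from a per-label lookup table instead of being predicted by learned layers from the segmentation map, and the statistics are computed per channel over a single feature map rather than over a batch.
</p>
<pre>
```python
import math


def normalize_channel(channel, eps=1e-5):
    """Normalize one 2-D channel (a list of rows) to zero mean and unit variance."""
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in channel]


def spade_like(features, seg_map, gamma, beta):
    """Apply a per-pixel scale and bias chosen by the semantic label.

    features: nested list indexed as [channel][y][x]
    seg_map:  nested list indexed as [y][x], holding integer labels
    gamma:    dict mapping each label to a list of per-channel scales
    beta:     dict mapping each label to a list of per-channel biases
    (gamma and beta stand in for learned parameters; this is an assumption.)
    """
    out = []
    for c, channel in enumerate(features):
        normed = normalize_channel(channel)
        modulated = []
        for y, row in enumerate(normed):
            new_row = []
            for x, v in enumerate(row):
                label = seg_map[y][x]
                # The affine step depends on the pixel's label, so it varies spatially.
                new_row.append(gamma[label][c] * v + beta[label][c])
            modulated.append(new_row)
        out.append(modulated)
    return out


# Toy example: one channel, a 2x2 feature map, two semantic labels.
features = [[[1.0, 3.0], [1.0, 3.0]]]
seg_map = [[0, 1], [0, 1]]
gamma = {0: [1.0], 1: [2.0]}
beta = {0: [0.0], 1: [10.0]}
print(spade_like(features, seg_map, gamma, beta))
```
</pre>
<p align="justify"> In this toy example, the pixels labeled 0 keep their normalized values, while the pixels labeled 1 are scaled and shifted, so the label layout remains visible in the output even though the features were normalized first.
</p>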


<br>
<h1 align='center'> Comparison to Existing Methods </h1>
<center><img src="images/coco_comparison.jpg" width="1000"></center>
<p align="justify">SPADE outperforms existing methods on the <a href="https://github.com/nightrome/cocostuff"><span style="font-weight:normal">COCO-Stuff dataset</span></a>, which is more challenging than <a href="https://www.cityscapes-dataset.com/"><span style="font-weight:normal">the Cityscapes dataset</span></a> because its scenes and labels are more diverse. The images above are ones the authors liked.
</p>
<br>

<br>
<h1 align='center'> Applying SPADE to Flickr Images </h1>
<center><img src="images/flickr.jpg" width="1000"></center>
<p align="justify"> Since SPADE works on diverse labels, it can be trained with <a href="https://github.com/kazuto1011/deeplab-pytorch"><span style="font-weight:normal">an existing semantic segmentation network</span></a> to learn the reverse mapping from semantic maps to photos. These images were generated by a SPADE model trained on 40k images scraped from <a href="https://www.flickr.com/"><span style="font-weight:normal">Flickr</span></a>.
</p>
<br>

<h1 align='center'> Code and Trained Models</h1>
	<p align="justify"> Please visit our <a href="https://github.com/NVlabs/SPADE">GitHub repo</a>. </p>

<br>
<h1>Acknowledgement</h1>
<p align="justify">We thank Alyosha Efros and Jan Kautz for insightful advice. Taesung Park contributed to the work during his internship at NVIDIA. His Ph.D. is supported by a Samsung Scholarship. </p>

<br>
<h1>Related Work</h1>

<ul id='relatedwork'>
<div align="left">
<li style="font-size: 15px"> V. Dumoulin, J. Shlens, and M. Kudlur. <a href="https://arxiv.org/abs/1610.07629"><strong>"A Learned Representation for Artistic Style"</strong></a>, in ICLR 2017.
</li>
<li style="font-size: 15px"> H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, and A. C. Courville. <a href="https://arxiv.org/abs/1707.00683"><strong>"Modulating Early Visual Processing by Language"</strong></a>, in NeurIPS 2017.
</li>
<li style="font-size: 15px"> T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. <a href="https://tcwang0509.github.io/pix2pixHD/"><strong>"High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs"</strong></a>, in CVPR 2018. (pix2pixHD)
</li>
<li style="font-size: 15px"> P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. <a href="https://phillipi.github.io/pix2pix/"><strong>"Image-to-Image Translation with Conditional Adversarial Networks"</strong></a>, in CVPR 2017. (pix2pix)
</li>
<li style="font-size: 15px"> Q. Chen and V. Koltun. <a href="https://cqf.io/ImageSynthesis/"><strong>"Photographic Image Synthesis with Cascaded Refinement Networks"</strong></a>, in ICCV 2017. (CRN)
</li>
</div>
</ul>


<div style="display:none">
<script type="text/javascript" src="http://gostats.com/js/counter.js"></script>
<script type="text/javascript">_gos='c3.gostats.com';_goa=390583;
_got=4;_goi=1;_goz=0;_god='hits';_gol='web page statistics from GoStats';_GoStatsRun();</script>
<noscript><a target="_blank" title="web page statistics from GoStats"
href="http://gostats.com"><img alt="web page statistics from GoStats"
src="http://c3.gostats.com/bin/count/a_390583/t_4/i_1/z_0/show_hits/counter.png"
style="border-width:0" /></a></noscript>
</div>
</body></html>