This repository contains the official implementation of the paper "Fine-Granularity Face Sketch Synthesis", submitted to ICASSP 2024. The complete code will be released upon publication.
- Analysis 1
- Case-1 is difficult in two respects. The first is the generation of the glasses: the first four methods all generate white spots on the glasses near the nose, while CAGAN and our method generate clear glasses. The second is the broken beard, where CAGAN fails to generate the correct beard. The person in the photo has no beard at the bottom of his chin, but CAGAN "completes" it. The other methods generate the correct results.
- The reflection on the glasses is the focus of Case-2. The results of CycleGAN, DualGAN and PS2MAN suffer from blurring near the reflection area. Pix2Pix is misled into generating a double top border on the glasses. CAGAN and our method succeed in handling this challenge, which suggests that introducing face parsing into face sketch synthesis can effectively address glass reflections.
- Case-3 focuses on the left side of the picture, where there is an isolated strand of hair. Pix2Pix fails to generate this "outlier", whereas CycleGAN and DualGAN handle it successfully. PS2MAN generates a blurred sketch that loses many details. CAGAN and our method also generate the strand, although CAGAN introduces some artifacts near it. In addition, there are some gaps in the hair: CycleGAN, DualGAN and our method reproduce them, while the remaining methods fail to show them faithfully.
- Case-4 is selected from CUHK. Although Case-3 and Case-4 come from different sub-datasets, each method handles the small strand of hair consistently across the two cases. For the scattered hair, CycleGAN and DualGAN perform best. Our method also produces clear hair, with far less noise than CycleGAN and DualGAN. Pix2Pix produces broken lines for the bangs.
- In Case-5, the main difference among the methods lies in the teeth. Pix2Pix, PS2MAN and CAGAN fail to generate the sketch of the teeth, while CycleGAN, DualGAN and our method succeed. However, CycleGAN and DualGAN produce much more noise on the face than the other methods. This may be caused by their generation strategies: both methods tend to reproduce all details, including changes in lighting on the face, which is also why they are able to generate the teeth. Their strategies resemble style transfer. In contrast, our method learns local information in Stage-1, and the URN-GAN in Stage-2 then suppresses the noise on the face while preserving the local fine-grained information.
- All the methods except PS2MAN generate relatively good results for Case-6. Our method generates the best hair, with faithful direction and texture. Although the hair direction in the results of CycleGAN and DualGAN is faithful as well, they produce too many white regions, which could be mistaken for white hair.
- On XM2VTS, PS2MAN fails to generate good sketches in any of the cases, and CycleGAN produces some abnormal white spots on the forehead. In Case-7, the slight reflection on the glasses does not affect the generation. CAGAN suffers from blurring around the collar. The shadows of the spectacle frames are the main source of noise in this case: Pix2Pix, CycleGAN and DualGAN are misled into sketching the shadows, while CAGAN and our method recognize this noise and generate correct sketches.
- In Case-8, CycleGAN and DualGAN also fail to handle the reflection, although they generate a good hair texture. As discussed in Case-5, such details are related to their style-transfer-like strategy. CAGAN generates a sketch with a noticeably light tone and some noise on the chin, which may be caused by the difference in skin color. Our method remains robust despite such changes in skin color and lighting conditions.
- CAGAN's result shows the same problem as in Case-8. Across all the XM2VTS cases, DualGAN gives clear and detailed sketch-style results. However, our method produces realistic sketches that are closer to the ground truth drawn by artists, rather than outputs resembling style transfer.
- Analysis 2
- CycleGAN faithfully transfers the original photo, including its incompleteness, and the white spot appears again in Case-1. The other methods succeed in generating a complete face. Pix2Pix generates asymmetrical eyes that are inconsistent with the photo. Comparing CAGAN and our method, ours generates a more realistic right ear and a clearer facial contour. This is due to the learning of local fine-grained information: our model learns that the ear is one of the key facial components and how to sketch it correctly.
- In Case-2, all the methods except CycleGAN fail to generate the slightly open mouth, and CycleGAN itself generates an unnatural mouth. In the result of our method, the mouth shows a tendency toward a grin, and a clear neck is "imagined".
- In Case-3, CycleGAN gives the best teeth generation. CAGAN and our method also generate an open mouth. Comparing these two methods, our result is more realistic because the individual teeth are distinguishable, whereas CAGAN renders them as a single white patch.
- There is an extreme reflection in Case-4. Pix2Pix generates a blurred right eye due to this reflection, and CycleGAN only produces a white patch. The right eye in the result of CAGAN is "flattened". In contrast, our method generates a right eye with clarity and shape similar to the left eye. This is due to the knowledge learned in Stage-1. With the help of the face parsing result, CAGAN knows where the right eye is, but not how to generate a correct eye when the corresponding area is almost completely obscured by the reflection.
- In Case-5, the face parsing guidance loses its effectiveness, because the beard disturbs the parsing of the face region. As a result, CAGAN and our method fail to generate the correct beard.
- Case-6 shows two representative features of CUFSF: reflection and incompleteness. For the reflection, CycleGAN again gives only white spots, as in Case-4. Pix2Pix and CycleGAN fail to complete the face. Comparing the results of CAGAN and our method, ours gives a sketch with clearer edges. It even generates a small part of the left ear, which is entirely missing from the photo.

