Introduction for ViSha
ViSha is a short name for “Video shadow detection dataset”. ViSha includes 120 videos with diverse content, varying length, and object-level annotations. More than half videos are from 5 widely-used video tracking benchmarks (i.e., OTB[1], VOT[2], LaSOT[3], TC-128[4], and NfS[5]). The remaining 59 videos are self-captured with different hand-held cameras, over different scenes, at varying times. The frame rate is adjusted to 30 fps for all video sequences.
Eventually, ViSha contains 120 video sequences, with a totally of 11,685 frames and 390 seconds duration. The longest video contains 103 frames and the shortest contains 11 frames.
Quickly know ViSha
Form of ViSha
To provide guidelines for future works, we randomly split the dataset into training and testing sets with a ratio of 5:7. The 50 training set and 70 testing set can be downloaded above this page.
If you download ViSha and unzip each file, you can find the dataset structure as follows:
▾ <train>/
▾ images/
▾ baby_cat/
00000001.jpg
...
▾ baby_wave1/
00000001.jpg
...
...
▾ labels/
▾ baby_cat/
00000001.png
...
▾ baby_wave1/
00000001.png
...
...
▾ <test>/
...
Statistics of ViSha
ViSha include many shadow attributes:
Visualization of the statistics of ViSha. (a) Shadow categories. (b) Ratio distribution of the shadows. (c) Mutual dependencies among shadow categories in (a).
Citation
If you utilize ViSha in your work, please cite our paper as follows
@inproceedings{chen21TVSD,
author = {Chen, Zhihao and Wan, Liang and Zhu, Lei and Shen, Jia and Fu, Huazhu and Liu, Wennan and Qin, Jing},
title = {Triple-cooperative Video Shadow Detection},
booktitle = {CVPR},
year = {2021}
}
Paper arXiv link: https://arxiv.org/abs/2103.06533