Challenge 2 (Semi-Supervised Action Recognition in the Dark) Datasets

Track 2: (Semi-Supervised) Action Recognition from Dark Videos

About the Dataset

ARID[1] is a dataset dedicated to action recognition in dark videos captured without additional sensors such as IR sensors. To our knowledge, it is the first of its kind. In this track, we intend to leverage current public datasets in conjunction with unlabeled dark videos to build action recognition models that are robust to poor illumination. Because such methods avoid costly video annotation, they would be more eco-friendly and more efficient in computational resources.

  • Dataset and baseline report: arXiv, Springer
  • Note that for this challenge track we have updated the dataset described in the report. The updated dataset contains more scenarios but the same number of classes.

Training & Evaluation

In this challenge, participating teams may use external training data not mentioned in the Description, including self-synthesized or self-collected data, but they must declare this in their submissions ("Method description" section in Codalab). Submissions will be ranked by Top-1 Accuracy and Cross-Entropy Loss on the testing set.
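As a rough illustration of the two ranking metrics, the following is a minimal sketch in plain Python. It assumes that a model outputs one probability per class for each test video; the function names, the toy data, and the clipping constant `eps` are illustrative assumptions, not the organizers' evaluation script.

```python
import math

def top1_accuracy(probs, labels):
    """Fraction of videos whose highest-probability class equals the true label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=lambda k: p[k])
        correct += int(predicted == y)
    return correct / len(labels)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    eps (an assumption of this sketch) avoids log(0) for zero probabilities.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))
    return total / len(labels)

# Toy example with 3 classes and 3 videos (hypothetical numbers).
probs = [
    [0.7, 0.2, 0.1],  # predicted class 0, true class 0 -> correct
    [0.1, 0.3, 0.6],  # predicted class 2, true class 2 -> correct
    [0.5, 0.4, 0.1],  # predicted class 0, true class 1 -> wrong
]
labels = [0, 2, 1]

acc = top1_accuracy(probs, labels)
loss = cross_entropy(probs, labels)
print(f"top-1 accuracy: {acc:.3f}, cross-entropy: {loss:.3f}")
```

Higher Top-1 accuracy and lower cross-entropy are better. Cross-entropy also penalizes confident wrong predictions, so two models with equal accuracy can still differ on it.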

Semi-supervised Action Recognition in the Dark

Participants are expected to perform action recognition in dark videos in a semi-supervised manner, utilizing labeled videos collected from common public video datasets together with unlabeled dark videos. We provide a curated subset of labeled clear videos from HMDB51[2], UCF101[3], Kinetics-600[4], and Moments in Time[5], comprising a total of 2,625 videos from 11 classes: drink, jump, pick, pour, push, run, sit, stand, turn, walk, and wave. To boost the effectiveness of the approaches on real dark videos, we also provide an unlabeled set of 3,088 videos covering the same 11 classes, collected from the ARID dataset, which may be used at the participants' discretion.
Note that manually labeling these 3,088 dark videos is strictly PROHIBITED. The final action recognition test will be performed on a hold-out test set of 3,103 real dark videos from the ARID dataset. The hold-out test set contains the same classes as the provided training set.
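One common way to use unlabeled videos without any manual labeling is confidence-thresholded pseudo-labeling, where a model trained on the labeled clear videos assigns labels to the dark videos it is most certain about. The sketch below illustrates only the selection step. It is not the organizers' baseline; the function name, the threshold of 0.9, the video identifiers, and the alphabetical class-to-index order are all assumptions made for illustration.

```python
# The 11 classes named in the track description.
# The index order here is an assumption of this sketch.
CLASSES = ["drink", "jump", "pick", "pour", "push", "run",
           "sit", "stand", "turn", "walk", "wave"]

def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only unlabeled videos whose top class probability reaches the threshold.

    predictions: dict mapping a video id to a list of 11 class probabilities
                 produced by a model (no human labeling involved).
    Returns a dict mapping video id to the predicted class name.
    """
    selected = {}
    for video_id, probs in predictions.items():
        if len(probs) != len(CLASSES):
            raise ValueError(f"{video_id}: expected {len(CLASSES)} probabilities")
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected[video_id] = CLASSES[best]
    return selected

# Hypothetical model outputs for two unlabeled dark videos.
toy_predictions = {
    "vid_001": [0.95] + [0.005] * 10,  # confident: kept as "drink"
    "vid_002": [1 / 11] * 11,          # uncertain: discarded
}

pseudo_labels = select_pseudo_labels(toy_predictions)
print(pseudo_labels)
```

The selected videos could then be added to the training set with their predicted labels. A higher threshold yields fewer but cleaner pseudo-labels.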

If you have any questions about this challenge track please feel free to email

[1] Xu, Y., Yang, J., Cao, H., Mao, K., Yin, J. and See, S., 2020. ARID: A New Dataset for Recognizing Action in the Dark. arXiv preprint arXiv:2006.03876.
[2] Kuehne, H., Jhuang, H., Garrote, E., Poggio, T. and Serre, T., 2011, November. HMDB: A large video database for human motion recognition. In Proc. of IEEE International Conference on Computer Vision (pp. 2556-2563).
[3] Soomro, K., Zamir, A.R. and Shah, M., 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
[4] Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., Natsev, P. and Suleyman, M., 2017. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950.
[5] Monfort, M., Andonian, A., Zhou, B., Ramakrishnan, K., Bargal, S.A., Yan, T., Brown, L., Fan, Q., Gutfreund, D., Vondrick, C. and Oliva, A., 2019. Moments in Time dataset: One million videos for event understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2), pp.502-508.