The UG2 Prize Challenge evaluates algorithms for the enhancement of images and videos at scale. The most successful and innovative teams will be invited to present at the CVPR 2018 workshop. We provide two challenge categories:

  1. Image Enhancement to Facilitate Manual Inspection
  2. Image Enhancement to Improve Automatic Object Recognition

Each challenge will have:

  • $25K awarded to the top scoring entry (including travel money to attend CVPR 2018)
  • $12.5K awarded to the runner up (including travel money to attend CVPR 2018)

Participants may submit algorithms to both challenges, with a limit of 3 algorithms per challenge. The registration form must be completed by one of the team's contributors, indicating the team's affiliation, contact information, and the challenges in which the team will participate.


1. Image Enhancement to Facilitate Manual Inspection

The first challenge will be an evaluation of the qualitative enhancement of images (super-resolution, de-noising, and deblurring are all within scope) from a sequestered test dataset. At least one test in the challenge will include calibration-chart ground truth: charts with vertical lines and color calibration data.

Scoring will make use of human raters voting on perceived improvement. To do this, we will use Amazon's Mechanical Turk service to crowdsource the judgments over large populations of raters. A Likert-scale-based task will be deployed, in which a reasonably large number of raters determine which image, the original or the restored/enhanced version, is of higher quality.

Further, inspired by visual psychophysics, rater reaction time will be used to gauge the difficulty of presented image pairs. This will allow us to assign weights to each result (i.e., if a rater takes a long time to decide that an enhanced image is better, it may not be as good a result as one where the rater reached the same conclusion immediately). This will be factored into the final score.
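The reaction-time weighting described above can be sketched as follows. Note that this is an illustrative scheme only: the function name, the inverse-reaction-time weighting, and the cap on implausibly fast responses are assumptions, not the official UG2 scoring formula.

```python
def weighted_preference_score(votes):
    """Aggregate rater votes, weighting fast judgments more heavily.

    votes: list of (prefers_enhanced, reaction_time_s) pairs, where
    prefers_enhanced is True if the rater picked the enhanced image.

    Each vote is weighted by the inverse of its reaction time, so an
    immediate preference counts more than a slow, hesitant one. Returns
    the weighted fraction of votes favoring the enhanced image.

    NOTE: inverse-time weighting is a hypothetical choice for this
    sketch; the challenge only states that reaction time is factored in.
    """
    total = 0.0
    favor = 0.0
    for prefers_enhanced, reaction_time in votes:
        weight = 1.0 / max(reaction_time, 0.1)  # cap weight for implausibly fast clicks
        total += weight
        if prefers_enhanced:
            favor += weight
    return favor / total if total else 0.0
```

With this weighting, two quick votes for the enhanced image outweigh one slow vote against it, reflecting the intuition that immediate judgments signal a clearer quality difference.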

2. Image Enhancement to Improve Automatic Object Recognition

The second challenge will be an evaluation of classification improvement. The evaluation protocol allows participants to make use of some within-dataset training data, and as much out-of-dataset training data as they would like, for training/validation purposes. Participants will not be tasked with creating novel classification algorithms.

To establish good baselines for classification performance before and after the application of image enhancement and restoration algorithms, testing will use a selection of common deep learning approaches to recognize annotated objects and then measure the correct classification rate: namely, the Keras versions of the pre-trained networks VGG16, VGG19, Inception V3, and ResNet50.

Each candidate restoration or enhancement algorithm will be treated as an image pre-processing step to prepare sequestered test images to be submitted to all four networks, which serve as canonical classification references.
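The protocol above can be sketched as a small harness. The classifier callables here are stand-ins (in the real pipeline they would be the four Keras pre-trained models, e.g. `keras.applications.VGG16`); the function name and signatures are assumptions for illustration.

```python
def evaluate_enhancement(enhance, classifiers, images):
    """Run a candidate enhancement algorithm as a pre-processing step.

    enhance:     callable image -> image (the candidate restoration/
                 enhancement algorithm under evaluation)
    classifiers: dict mapping network name -> callable image -> list of
                 top-5 predicted synset ids (the canonical references)
    images:      iterable of sequestered test images

    Each image is enhanced once, then submitted to every reference
    network. Returns a dict of per-network top-5 prediction lists.
    """
    results = {name: [] for name in classifiers}
    for img in images:
        restored = enhance(img)  # enhancement as pre-processing
        for name, net in classifiers.items():
            results[name].append(net(restored))
    return results
```

The key design point is that the networks are frozen references: only `enhance` varies between submissions, so any change in classification rate is attributable to the pre-processing step.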

Classification Metrics

The networks used for the UG2 classification task return a list of ImageNet synsets along with the probability of the object belonging to each synset class. However, because in some cases it is impossible to provide a fine-grained label for the annotated objects, we defined 31 super-classes for the dataset, most of them composed of more than one ImageNet synset. For example, the Car super-class contains ImageNet synsets such as n02930766: cab, n03594945: jeep, and n04467665: trailer truck.

That is, each annotated image i has a single super-class label Li, which in turn is defined by a set of ImageNet synsets Li = {s1, s2, ..., sn}. For a complete list of the equivalences between UG2 super-classes and ImageNet synsets, see: UG2's supplemental material
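The super-class labeling can be represented as a simple mapping from super-class names to synset-id sets. Only the Car entry below is taken from the text; the full 31-class mapping lives in UG2's supplemental material, and the helper function is a hypothetical convenience.

```python
# One super-class from the text; the remaining 30 are in UG2's
# supplemental material.
SUPER_CLASSES = {
    "Car": {"n02930766", "n03594945", "n04467665"},  # cab, jeep, trailer truck
}

def matches_super_class(predicted_synset, super_class):
    """True if a predicted ImageNet synset id belongs to the
    ground-truth super-class's synset set."""
    return predicted_synset in SUPER_CLASSES[super_class]
```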

To measure accuracy, we observe the number of correctly identified synsets among the top 5 predictions made by each pre-trained network. A prediction is considered correct if its synset belongs to the set of synsets in the ground-truth super-class label. We use two metrics for this. The first measures the rate of detecting at least 1 correctly classified synset: for a super-class label Li = {s1, ..., sn}, the network detects 1 or more correct synsets in its top 5 predictions. The second measures the rate of detecting all possible correct synset classes in the super-class label's synset set: for example, for a super-class label Li = {s1, s2, s3}, the network detects all 3 correct synsets in its top 5 predictions.
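The two per-image criteria reduce to set operations on the top-5 prediction list and the super-class synset set. A minimal sketch (function names are assumptions; the challenge reports the rates of these events over the test set):

```python
def at_least_one_correct(top5_synsets, label_synsets):
    """Metric 1 criterion: at least one of the top-5 predicted synsets
    belongs to the ground-truth super-class's synset set."""
    return len(set(top5_synsets) & set(label_synsets)) >= 1

def all_synsets_detected(top5_synsets, label_synsets):
    """Metric 2 criterion: every synset in the super-class's set appears
    among the top-5 predictions (only meaningful when the super-class
    has at most 5 synsets)."""
    return set(label_synsets) <= set(top5_synsets)
```

For a Car-labeled image, predicting cab anywhere in the top 5 satisfies the first criterion, while the second requires cab, jeep, and trailer truck to all appear in the top 5.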