What makes an image memorable?
Phillip Isola, Jianxiong Xiao, Antonio Torralba, Aude Oliva (MIT)
Motivation Database
1) Simple image stats? Pixels
e.g. mean hue, brightness, number of objects
GIST
2) Global image features?
Prediction algorithm
What image content matters?
pixel histograms, GIST, SIFT, HOG, SSIM
SIFT
HOG
SSIM
3) Object segmentation statistics?
Stacked RGB histograms.
ρ = 0.16
ρ = 0.46
Database: 2222 photographs from the SUN database (Xiao et al. 2010), scored by 665 participants on Amazon’s Mechanical Turk. Vigilance repeats: 1-7 back.
“Aquarium”
4) Object and scene semantics?
[Figure: Memory Game image stream over time; memory repeats occur 91-109 images back]
[Plot: x-axis "Predicted rank N"; legend: Group 1, Group 2, Chance; ρ = 0.75]
Predicted memorable
Predicted average
Average
(computed per image)
Forgettable
mountain
tree
sky
building
Objects ranked according to object score (averaged across images)
[Plot x-axis: Image rank N, according to specified group]
Wide range of memorabilities and high inter-subject consistency
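Inter-subject consistency of this kind is usually measured by splitting participants into two groups, scoring every image within each group, and rank-correlating the two score lists. The sketch below is a minimal illustration of that idea using Spearman's rank correlation. The helper names and the toy scores are assumptions made for illustration, not data from the study.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy per-image memorability scores from two hypothetical participant groups.
group1 = [0.90, 0.75, 0.60, 0.55, 0.40]
group2 = [0.85, 0.80, 0.50, 0.60, 0.45]
rho = spearman(group1, group2)
print(f"split-half rank correlation = {rho:.2f}")
```

In practice the split would be repeated over many random halves and the correlations averaged; a single split is shown here to keep the sketch short.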
[Plot: average memorability for top N ranked images (%) versus predicted rank N; ρ = 0.43]
Errors -- Overpredicted
Errors -- Underpredicted
Conclusions − 0.15
Predicted forgettable
[Plot y-axis: Average % memorability, according to Group 1, of 25 images centered about rank N]
All Global Features: ρ = 0.46
Object score = (prediction when object included in image’s feature vector) - (prediction when object removed)
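The object score defined above is a difference of two predictions, which can be sketched in a few lines. Everything in this example is a hypothetical stand-in: the feature dictionary, the toy weights, and the linear `toy_predict` function replace the poster's actual predictor (SVM regression with non-linear kernels), and "removing" an object is modeled simply as dropping its entry from the feature dictionary.

```python
def object_score(predict, features, obj):
    """Prediction with the object present minus prediction with it removed."""
    without = {name: value for name, value in features.items() if name != obj}
    return predict(features) - predict(without)


# Hypothetical stand-in predictor: a linear model with made-up weights.
# The poster's predictor is a non-linear SVM regressor, not this function.
TOY_WEIGHTS = {"person": 0.25, "sky": -0.125, "floor": 0.0625}


def toy_predict(features):
    """Toy memorability prediction from per-object feature values."""
    return 0.5 + sum(TOY_WEIGHTS.get(n, 0.0) * v for n, v in features.items())


# Toy image containing a person and some sky (feature values are made up).
image = {"person": 1.0, "sky": 1.0}
person_score = object_score(toy_predict, image, "person")
sky_score = object_score(toy_predict, image, "sky")
print(person_score, sky_score)
```

A positive score means the object raises the predicted memorability of that image; a negative score means it lowers it. Averaging the per-image scores across all images containing the object gives a per-object ranking.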
Objects shaded according to object score
ρ = 0.43
Same construction again, but with HOG2x2 descriptors, each of which is a stack of 2x2 neighboring HOG descriptors.
What content makes an image memorable?
ρ = 0.41
Same construction as SIFT, but with SSIM descriptors.
5) HOG2x2
ρ = 0.38
Filter bank with 8 orientations, 4 scales, averaged over a 4x4 grid. RBF kernel.
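As a rough illustration of the numbers above: if each of the 8 x 4 filters contributes one averaged response per cell of the 4x4 grid, the descriptor has 512 dimensions. That one-response-per-filter-per-cell layout is an assumption, as is the `gamma` bandwidth parameter in the RBF kernel below, since the poster does not state either. The kernel itself is the standard Gaussian form.

```python
import math

ORIENTATIONS, SCALES, GRID = 8, 4, 4
# Assumed layout: one averaged filter response per filter per grid cell.
GIST_DIM = ORIENTATIONS * SCALES * GRID * GRID


def rbf_kernel(u, v, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


# Two toy descriptors of the assumed length (values are made up).
x = [0.0] * GIST_DIM
y = [0.0] * GIST_DIM
y[0] = 1.0

print(GIST_DIM)
print(rbf_kernel(x, x, gamma=0.5), rbf_kernel(x, y, gamma=0.5))
```

Identical descriptors give a kernel value of 1, and the value decays toward 0 as descriptors move apart, which is what lets a kernel regressor treat nearby images as similar.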
4) SSIM
ρ = 0.20
ρ = 0.50
ρ = 0.22
Dense descriptors quantized into visual words and summarized in spatial pyramid histogram.
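The two steps named above, quantizing descriptors into visual words and pooling them in a spatial pyramid, can be sketched as follows. This is a simplified illustration under stated assumptions: the codebook, descriptors, and positions are toy values, the function names are hypothetical, and the per-level weighting used in full spatial pyramid matching is omitted.

```python
def quantize(descriptor, codebook):
    """Index of the nearest codebook entry (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def spatial_pyramid_histogram(words, positions, vocab_size, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids.

    words: visual-word index for each descriptor.
    positions: (x, y) for each descriptor, normalized to [0, 1).
    """
    pyramid = []
    for level in range(levels):
        cells = 2 ** level
        hist = [0] * (cells * cells * vocab_size)
        for word, (px, py) in zip(words, positions):
            cx = min(int(px * cells), cells - 1)
            cy = min(int(py * cells), cells - 1)
            hist[(cy * cells + cx) * vocab_size + word] += 1
        pyramid.extend(hist)
    return pyramid


# Toy 2-D "descriptors" and a 3-word codebook (all values are made up).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.1, 0.9), (0.0, 0.8), (0.9, 0.2)]
positions = [(0.1, 0.1), (0.9, 0.1), (0.8, 0.9), (0.2, 0.7)]

words = [quantize(d, codebook) for d in descriptors]
feature = spatial_pyramid_histogram(words, positions, vocab_size=len(codebook))
print(words, len(feature))
```

The first `vocab_size` entries are the whole-image histogram; the remaining entries repeat the count separately for each cell of the 2x2 grid, which is how rough spatial layout enters the feature.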
Rich features necessary
Vigilance repeat
2) GIST
3) Dense SIFT
[Plot y-axis: Average memorability for top N ranked images (%)]
number, size, and rough position of each object class; scene category
Memorability = probability of correctly detecting a repeat after a single view of an image in a long stream.
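Under this definition, an image's memorability score can be estimated as the fraction of viewers who correctly detected its repeat. The snippet below is a minimal sketch of that estimate; the function name and the toy response counts are assumptions for illustration, not values from the study.

```python
def memorability_score(responses):
    """Fraction of repeat presentations that were correctly detected.

    `responses` holds one boolean per viewer who saw the image repeated:
    True for a correct detection (hit), False for a miss.
    """
    if not responses:
        raise ValueError("need at least one repeat presentation")
    return sum(responses) / len(responses)


# Toy data (made up): 80 viewers saw the repeat and 60 of them detected it.
toy_responses = [True] * 60 + [False] * 20
score = memorability_score(toy_responses)
print(f"memorability = {score:.2f}")
```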
Other humans: ρ = 0.75; Global features and annotations: ρ = 0.54; Chance
histograms of object segments counts, sizes, and rough positions
Memory Game
Memorable
1) Pixel histograms
Prediction algorithm: SVM Regression with non-linear kernels
Intrinsic memorability?
Automatic predictions
What classes of information predict memorability?
Scenes ranked according to their average memorability
+ 0.09
seats
floor
bedroom (76%)
bakery shop (81%)
person sitting person
... broadleaf forest (52%), botanical garden (52%), natural lake (52%)
campus (53%)
...
art studio (81%)
bathroom (84%)
Stable intrinsic memorability
Standardized database
People, close-ups, and human-scale objects predict memorability
Predictable from global features