ImageMemorability CVPR poster v4. Created 6/16/2011 8:04:49 PM.

What makes an image memorable? Phillip Isola, Jianxiong Xiao, Antonio Torralba, Aude Oliva, MIT

Motivation

What image content matters?

1) Simple image stats? e.g. mean hue, brightness, number of objects
2) Global image features? Pixel histograms (stacked RGB histograms), GIST, SIFT, HOG, SSIM
3) Object segmentation statistics?
4) Object and scene semantics?

[Panel values: ρ = 0.16, ρ = 0.46. Example image label: “Aquarium”.]

Database: 2222 photographs from the SUN database (Xiao et al. 2010). 665 participants on Amazon’s Mechanical Turk.

[Figure: memory game timeline (axis: time). Repeat spacings: 1-7 back; memory repeat, 91-109 back.]

Wide range of memorabilities and high inter-subject consistency

[Plot: Group 1, Group 2, and chance. x-axis: image rank N, according to specified group. y-axis: average % memorability, according to Group 1, of 25 images centered about rank N. ρ = 0.75.]

[Plots: x-axis: predicted rank N. y-axis: average memorability for top N ranked images (%). ρ = 0.43. All global features: ρ = 0.46.]

[Image panels: predicted memorable, predicted average, predicted forgettable; average, forgettable; errors, overpredicted; errors, underpredicted.]

[Figure: objects ranked according to object score (averaged across images). Labels include mountain, tree, sky, building; score values shown: −0.15, 0.]

Object score = (prediction when object included in image’s feature vector) - (prediction when object removed)
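A minimal sketch of the object score defined above. It assumes the predictor takes a mapping from object class to a feature value, and that "removing" an object means dropping its entry from that mapping. The names `object_score` and `toy_predict`, and the weights, are hypothetical; the poster's predictor is the trained regressor, not this toy function.

```python
from typing import Callable, Dict

Features = Dict[str, float]  # object class -> feature value (e.g. image area covered)


def object_score(predict: Callable[[Features], float], features: Features, obj: str) -> float:
    """Prediction with `obj` present minus prediction with `obj` removed."""
    without_obj = {name: value for name, value in features.items() if name != obj}
    return predict(features) - predict(without_obj)


def toy_predict(features: Features) -> float:
    """Hypothetical stand-in predictor: baseline plus made-up per-object weights."""
    weights = {"person": 0.10, "sky": -0.05}  # illustrative values, not from the poster
    return 0.5 + sum(weights.get(name, 0.0) * value for name, value in features.items())


image = {"person": 1.0, "sky": 1.0}
print(object_score(toy_predict, image, "person"))  # positive: "person" raises the prediction
print(object_score(toy_predict, image, "sky"))     # negative: "sky" lowers the prediction
```

Under this toy predictor, an object that is absent from the image gets a score of exactly zero, since removing it changes nothing.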

[Figure: objects shaded according to object score (computed per image).]

What content makes an image memorable?

Object segmentation statistics: histograms of object segment counts, sizes, and rough positions
Object and scene semantics: number, size, and rough position of each object class; scene category

Global image features

1) Pixel histograms: Stacked RGB histograms.
2) GIST: Filter bank with 8 orientations, 4 scales, averaged over a 4x4 grid. RBF kernel.
3) Dense SIFT: Dense descriptors quantized into visual words and summarized in a spatial pyramid histogram.
4) SSIM: Same construction as SIFT, but with SSIM descriptors.
5) HOG2x2: Same again, but with HOG2x2 descriptors, each of which is a stack of 2x2 neighboring HOG descriptors.

Rich features necessary

[Panel values: ρ = 0.43, 0.41, 0.38, 0.20, 0.50, 0.22. Image panel label: memorable.]

[Plot legend: other humans, ρ = 0.75; global features and annotations, ρ = 0.54; chance. y-axis: average memorability for top N ranked images (%).]

Memory Game

[Figure label: vigilance repeat.]

Memorability = probability of correctly detecting a repeat after a single view of an image in a long stream.
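As a sketch of memorability as defined on this poster (the probability of correctly detecting a repeat), the score for one image can be estimated as the fraction of repeat presentations that participants detected. The function name `memorability_score`, the data layout (one boolean per participant who saw the image's memory repeat), and the sample responses are all assumptions made for illustration.

```python
from typing import Dict, List


def memorability_score(detections: List[bool]) -> float:
    """Fraction of repeat presentations that were correctly detected (hits / trials)."""
    if not detections:
        raise ValueError("need at least one repeat trial")
    return sum(detections) / len(detections)


# Hypothetical responses: one entry per participant who saw the image's memory repeat.
responses: Dict[str, List[bool]] = {
    "image_a": [True, True, True, False],
    "image_b": [True, False, False, False],
}
scores = {name: memorability_score(trials) for name, trials in responses.items()}
print(scores)  # image_a scores higher than image_b
```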

Prediction algorithm: SVM Regression with non-linear kernels
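Predictions from a regressor like this can be compared with measured scores by rank. The sketch below assumes the ρ values on the poster are Spearman rank correlations, which the poster text does not spell out. `spearman_rho`, `average_ranks`, and the sample numbers are illustrative only.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


measured = [0.9, 0.7, 0.5, 0.3]       # hypothetical measured memorability scores
predicted = [0.70, 0.72, 0.60, 0.55]  # hypothetical regressor outputs
print(spearman_rho(measured, predicted))
```

Working on ranks rather than raw values means the comparison only rewards getting the ordering of images right, not the exact scores.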

Automatic predictions

What classes of information predict memorability?

Scenes ranked according to their average memorability: bathroom (84%), art studio (81%), bakery shop (81%), bedroom (76%), ..., campus (53%), broadleaf forest (52%), botanical garden (52%), natural lake (52%)

[Object score figure labels: person, person sitting, seats, floor; score value shown: +0.09.]

Intrinsic memorability?

Conclusions

Stable intrinsic memorability
Standardized database
People, close-ups, and human-scale objects predict memorability
Predictable from global features