Get a Heatmap from a CNN (Convolutional Neural Network), a.k.a. CAM

Seachaos · Published in tree.rocks · Jul 25, 2021


Convolutional Neural Networks (CNNs) are incredible. And if you want to know how a CNN sees the world (an image), there is a way to visualize it.
The idea is to take the weights from the last dense layer and multiply them with the output of the final convolutional layer. This needs a Global Average Pooling (GAP) layer to work.
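To make that weighting step concrete, here is a minimal NumPy sketch with made-up shapes (a 7×7 feature map with 2048 channels, as in ResNet50, and one hypothetical weight column for a single class):

import numpy as np

# Hypothetical shapes: the last conv block of ResNet50 outputs 7x7x2048,
# and the final dense layer has one 2048-long weight column per class.
feature_maps = np.random.rand(7, 7, 2048)   # stand-in for the last conv layer output
class_weights = np.random.rand(2048)        # stand-in for one class's dense weights

# CAM = weighted sum of the channels -> one 7x7 heatmap
heatmap = feature_maps @ class_weights
print(heatmap.shape)                        # (7, 7)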

Choose a model

In this tutorial, we are using Keras with TensorFlow and ResNet50.

Because ResNet50 already has a Global Average Pooling (GAP) layer (explained later), it's a good fit for our demonstration.

test image

Heatmap: how does it work?

A heatmap from a CNN is also known as a Class Activation Map (CAM). The idea is that we collect the output of each channel of the last convolution layer (as an image) and combine them in one shot. (We will show the code step by step below.)

the convolution layer output

So here is how Global Average Pooling (GAP) or Global Max Pooling works
(depending on which one you use; the idea is the same).

In some models, after feature extraction, a Flatten layer feeds a fully connected network that predicts the result. But that step throws away the spatial dimensions of the feature maps, and with them some information.

In contrast, Global Average Pooling (GAP) or Global Max Pooling (GMP) works well here. It keeps the per-channel spatial information and lets the network decide which CNN channel (feature map) is most important for the prediction, as the sketch below illustrates.
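To see the difference, here is a small Keras sketch (not ResNet50 itself, just an illustration with made-up sizes) of the two classifier heads. The Flatten head absorbs the spatial layout into one huge weight matrix, while the GAP head keeps one weight per channel, which is exactly what CAM relies on:

from tensorflow.keras import layers, models

# Illustrative feature-extractor output: 7x7 feature maps with 512 channels
inputs = layers.Input(shape=(7, 7, 512))

# Head 1: Flatten -> Dense (7*7*512 inputs to the dense layer)
flat_head = layers.Dense(10, activation='softmax')(layers.Flatten()(inputs))

# Head 2: GAP -> Dense (512 inputs, one per channel)
gap_head = layers.Dense(10, activation='softmax')(layers.GlobalAveragePooling2D()(inputs))

models.Model(inputs, flat_head).summary()
models.Model(inputs, gap_head).summary()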

Example

Let's start with ResNet50 in Keras.

from tensorflow.keras.applications import ResNet50
res_model = ResNet50()
res_model.summary()
ResNet-50 summary

As you can see in the figure above:

  • red: the layers we will use for transfer learning.
  • green: the Global Average Pooling (GAP) layer, which is crucial for this technique.
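If you want to confirm where the GAP layer sits, you can look it up by name. In the Keras version used here the layer is called "avg_pool" (the name may differ between versions):

# The GAP layer inside Keras' ResNet50 (named "avg_pool" in this version)
gap = res_model.get_layer("avg_pool")
print(gap.__class__.__name__)   # GlobalAveragePooling2D
print(gap.output.shape)         # (None, 2048) -> one value per channel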

Next, import the libraries and load the image for later use.

import cv2
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import zoom
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

img = cv2.imread('./test_cat.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224))  # ResNet50 expects 224x224 input
X = np.expand_dims(img, axis=0).astype(np.float32)
X = preprocess_input(X)

We use "from scipy.ndimage import zoom" to resize the heatmap, because the feature maps extracted by the CNN are much smaller than the original image.
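For example, a 7×7 map can be scaled up to 224×224 like this (the zoom factor is just the ratio of the two sizes):

import numpy as np
from scipy.ndimage import zoom

small = np.random.rand(7, 7)                 # stand-in for one 7x7 feature map
big = zoom(small, zoom=(224 / 7, 224 / 7))   # spline-interpolated up to 224x224
print(big.shape)                             # (224, 224)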

Transfer Learning

Now let's extract the layers we will use.
P.S.: You could train your own model from scratch, but it would take a long time, and the feature extraction may also need a lot of tuning.

from tensorflow.keras.models import Model

conv_output = res_model.get_layer("conv5_block3_out").output
pred_output = res_model.get_layer("predictions").output
model = Model(res_model.input, outputs=[conv_output, pred_output])

Here we have two outputs (as mentioned above, the red parts in the figure):

  • The first is the output of the last convolutional block.
  • The second is the predicted result.

Then run the prediction:

conv, pred = model.predict(X)
decode_predictions(pred)

The result looks like this. It's pretty good:

[[('n02123159', 'tiger_cat', 0.7185241),
('n02123045', 'tabby', 0.1784818),
('n02124075', 'Egyptian_cat', 0.034279127),
('n03958227', 'plastic_bag', 0.006443105),
('n03793489', 'mouse', 0.004671723)]]

The Output

Now, let’s see some CNN output.

scale = 224 / 7
plt.figure(figsize=(16, 16))
for i in range(36):
    plt.subplot(6, 6, i + 1)
    plt.imshow(img)
    plt.imshow(zoom(conv[0, :, :, i], zoom=(scale, scale)), cmap='jet', alpha=0.3)
The CNN output

We draw the ground (original) image first (plt.imshow(img)) so we can compare each feature map with it.
(If you skip that step, you will get a result like this:)

the same output without the ground image

Combine in one shot

Here is the crucial part. We use the index of the predicted class (target) to pick the corresponding weights from the dense layer,
and then multiply each feature map by those weights (a dot product).

target = np.argmax(pred, axis=1).squeeze()       # index of the top predicted class
w, b = model.get_layer("predictions").weights    # dense-layer weights and biases
weights = w[:, target].numpy()                   # the weight column for that class
heatmap = conv.squeeze() @ weights               # weighted sum of the 7x7 feature maps

Then show the heatmap over the ground image.

scale = 224 / 7
plt.figure(figsize=(12, 12))
plt.imshow(img)
plt.imshow(zoom(heatmap, zoom=(scale, scale)), cmap='jet', alpha=0.5)
the heat-map of CNN

That’s the result we want.
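If you want everything in one place, here is a rough helper that wraps the steps above into a single function. It assumes the two-output model built earlier and a 224×224 RGB image loaded as above:

def cam_heatmap(model, img):
    """Return a CAM heatmap, resized to the image size, for the top predicted class."""
    X = preprocess_input(np.expand_dims(img, axis=0).astype(np.float32))
    conv, pred = model.predict(X)

    target = np.argmax(pred, axis=1).squeeze()    # index of the top predicted class
    w, b = model.get_layer("predictions").weights
    weights = w[:, target].numpy()                # weight column for that class

    heatmap = conv.squeeze() @ weights            # 7x7 class activation map
    scale = img.shape[0] / heatmap.shape[0]
    return zoom(heatmap, zoom=(scale, scale))     # resized to match the image

plt.figure(figsize=(12, 12))
plt.imshow(img)
plt.imshow(cam_heatmap(model, img), cmap='jet', alpha=0.5)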
