Learn A-Z Deep Learning in 15 Days
Object detection is the task of localizing objects in an image and recognizing what they are.
Region-CNN or RCNN
RCNN takes the input image and:
- Proposes candidate bounding boxes using the Selective Search algorithm
- Warps each proposed region to a fixed size
- A CNN extracts features from each region, which an SVM then classifies
- A linear regressor generates a tighter bounding box for each detection
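The steps above can be sketched as a minimal NumPy illustration. Here `propose_regions` is a hypothetical stand-in for selective search and `classify` a stand-in for the trained CNN + SVM; both are toy stubs, not the real components:

```python
import numpy as np

def propose_regions(image, n=5, size=32, seed=0):
    """Stand-in for selective search: return n random (x, y, w, h) boxes."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    xs = rng.integers(0, w - size, n)
    ys = rng.integers(0, h - size, n)
    return [(int(x), int(y), size, size) for x, y in zip(xs, ys)]

def warp(image, box, out_size=8):
    """Crop a proposal and 'warp' it to a fixed size by nearest-neighbour sampling."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    idx = np.arange(out_size) * h // out_size
    jdx = np.arange(out_size) * w // out_size
    return crop[np.ix_(idx, jdx)]

def classify(patch):
    """Stand-in for the CNN features + SVM: score the patch by mean intensity."""
    return float(patch.mean())

image = np.random.default_rng(1).random((128, 128))
# One (box, score) pair per proposal; a linear regressor would then refine each box
detections = [(box, classify(warp(image, box))) for box in propose_regions(image)]
```

The key point the sketch shows is why R-CNN is slow: every proposal is warped and run through the network independently.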
Disadvantages
- Slow: a feature map (one CNN forward pass) has to be calculated for each region proposal.
- Hard to train: the R-CNN system has three separate parts (CNN, SVM, bounding-box regressor) that must each be trained separately, which makes training very difficult.
- Large memory requirement: the feature map of every region proposal has to be saved, which needs a lot of memory.
Fast RCNN
- The image is passed to a ConvNet, which generates a feature map; the regions of interest are projected onto this feature map.
- An RoI pooling layer is applied to all of these regions to reshape them to the fixed size the following layers expect. Each pooled region is then passed to a fully connected network.
- A softmax layer on top of the fully connected network outputs the classes. In parallel with the softmax layer, a linear regression layer outputs bounding-box coordinates for the predicted classes.
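RoI pooling, the step that makes arbitrary-sized regions usable by the fully connected layers, can be illustrated with a minimal NumPy sketch (max-pooling one region of a feature map down to a fixed grid; the function name and the 2x2 output size are my own choices for illustration):

```python
import numpy as np

def roi_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest (x1, y1, x2, y2) to a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    # Split the region into an out_h x out_w grid of roughly equal cells
    row_edges = np.linspace(0, h, out_h + 1).astype(int)
    col_edges = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            out[i, j] = cell.max()   # keep the strongest activation per cell
    return out

fmap = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 feature map
pooled = roi_pool(fmap, roi=(0, 0, 4, 4))         # 4x4 region -> 2x2 output
```

Whatever the size of the input region, the output is always `out_h` x `out_w`, which is exactly what lets one fully connected head serve every proposal.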
Disadvantage
- Region proposals still come from selective search, which is slow, so computation time remains high.
Faster RCNN
- Takes an input image
- Pass it to a ConvNet, which returns feature maps for the image
- Apply a Region Proposal Network (RPN) on these feature maps to get proposals (anchors)
- Apply an RoI pooling layer to bring all the proposals down to the same size
- Finally, pass these proposals to a fully connected layer to classify them and predict the bounding boxes for the image
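The anchors the RPN scores can be sketched like this: one anchor per scale/ratio combination at every feature-map cell, so 3 scales x 3 ratios = 9 anchors per location as in the Faster R-CNN paper. The stride and sizes below are illustrative defaults, not values from this repo:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (cx, cy, w, h) in image coordinates."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of this feature-map cell, mapped back to the image
            cx, cy = x * stride + stride / 2, y * stride + stride / 2
            for s in scales:
                for r in ratios:
                    # Keep the anchor area ~ s*s while varying the aspect ratio
                    w = s * np.sqrt(r)
                    h = s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

anchors = generate_anchors(4, 4)   # toy 4x4 feature map -> 4*4*9 = 144 anchors
```

The RPN then predicts, for each anchor, an objectness score and four box-delta corrections.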
Total_loss = rpn_class_loss + rpn_regressor_loss (for the RPN; training the full detector adds the classifier's class and box-regression losses on top)
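The two RPN loss terms can be written out as a small NumPy sketch: binary cross-entropy for the objectness (class) term and smooth L1 for the box-regression term, which is how Faster R-CNN defines them. The variable names here are mine, not the repo's:

```python
import numpy as np

def rpn_class_loss(p_pred, p_true, eps=1e-7):
    """Binary cross-entropy: is each anchor an object or background?"""
    p_pred = np.clip(p_pred, eps, 1 - eps)
    return float(-np.mean(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred)))

def rpn_regressor_loss(t_pred, t_true):
    """Smooth L1 loss on the box deltas (computed only for positive anchors)."""
    d = np.abs(t_pred - t_true)
    return float(np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

p_pred = np.array([0.9, 0.2, 0.8])    # predicted objectness per anchor
p_true = np.array([1.0, 0.0, 1.0])    # ground-truth labels
t_pred = np.array([0.1, -0.3, 2.0])   # predicted box deltas
t_true = np.array([0.0, 0.0, 0.0])    # target box deltas

total_loss = rpn_class_loss(p_pred, p_true) + rpn_regressor_loss(t_pred, t_true)
```

Smooth L1 behaves like L2 for small errors (stable gradients near zero) and like L1 for large ones (robust to outlier boxes), which is why it is preferred over plain L2 here.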
Disadvantage
- The stages still run one after the other, so each stage's performance depends on the previous one, and inference is still not fast enough for real-time applications.
Comparison of RCNN, Fast RCNN & Faster RCNN
Data Preprocessing with the Berkeley (BDD100K) dataset
Category frequencies
- car: 713211
- traffic sign: 239686
- traffic light: 186117
- person: 91349
- truck: 29971
- bus: 11672
- bike: 7210
- rider: 4517
- motor: 3002
- train: 136
- Total annotation count = 1286871
- Total images = 69863
Run the Code in Google Colab
import json
import pandas as pd

filepath = './bdd100k/labels/bdd100k_labels_images_train.json'
with open(filepath) as f:
    jsfile = json.load(f)

# Collect one row per bounding-box annotation, skipping the
# non-box categories ('lane' and 'drivable area')
rows = []
for js in jsfile:
    for label_point in js['labels']:
        if label_point['category'] in ('lane', 'drivable area'):
            continue
        rows.append({'label': label_point['category'], 'name': js['name'],
                     'xmin': label_point['box2d']['x1'], 'ymin': label_point['box2d']['y1'],
                     'xmax': label_point['box2d']['x2'], 'ymax': label_point['box2d']['y2']})

df = pd.DataFrame(rows)
df.to_csv('./bdd100k_train.csv', index=False)
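The category frequencies listed earlier can be reproduced from the resulting DataFrame with `value_counts`. A quick sketch on a tiny stand-in DataFrame (the real one comes from the csv written above):

```python
import pandas as pd

# A tiny stand-in for the 'label' column of bdd100k_train.csv
df = pd.DataFrame({'label': ['car', 'car', 'person', 'car', 'bus']})

# One count per category, sorted most frequent first
counts = df['label'].value_counts()
```

Running the same line on the full csv should give the car/traffic sign/traffic light/... counts shown above.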
# Replace labels ('bike' -> 'Bicycle', 'rider' -> 'cyclist/rider')
df.loc[df.label == 'bike', 'label'] = 'Bicycle'
df.loc[df.label == 'rider', 'label'] = 'cyclist/rider'

# Drop the 'train' label in place
x = df.loc[df['label'] == 'train']
df.drop(x.index, axis=0, inplace=True)

# Save the cleaned labels back to the csv
df.to_csv('./bdd100k_train.csv', index=False)
# Keep only images that contain at least one of the rarer classes:
# 'motor', 'cyclist/rider', 'Bicycle' or 'bus'
import pandas as pd

csvtrain = pd.read_csv('bdd100k_train.csv')
for img_name in set(csvtrain['name']):
    x = csvtrain.loc[csvtrain['name'] == img_name]
    if len(x.loc[(x['label'] == 'motor') | (x['label'] == 'cyclist/rider') |
                 (x['label'] == 'Bicycle') | (x['label'] == 'bus'), :]) == 0:
        csvtrain.drop(x.index, axis=0, inplace=True)
csvtrain.to_csv('./bdd100k_train.csv', index=False)
import pandas as pd

csvtrain = pd.read_csv('bdd100k_train.csv')

# Make the annotation txt file from the csv
# (keras-frcnn 'simple' format: filepath,x1,y1,x2,y2,class_name)
with open('bk100_train.txt', 'a') as f:
    for _, row in csvtrain.iterrows():
        f.write('/C:/Users/mayank singh/Desktop/berekely/' + row['name'] + ',' +
                str(round(row['xmin'])) + ',' + str(round(row['ymin'])) + ',' +
                str(round(row['xmax'])) + ',' + str(round(row['ymax'])) + ',' +
                row['label'] + '\n')
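Each line of 'bk100_train.txt' follows the keras-frcnn "simple" format, `filepath,x1,y1,x2,y2,class_name`. A quick parse sketch, splitting from the right so the five trailing fields are recovered even if the file path itself contains commas (the sample line below is made up for illustration):

```python
def parse_simple_annotation(line):
    """Parse 'filepath,x1,y1,x2,y2,class_name' into (filepath, (x1, y1, x2, y2), class_name)."""
    filepath, x1, y1, x2, y2, label = line.rstrip('\n').rsplit(',', 5)
    return filepath, tuple(int(v) for v in (x1, y1, x2, y2)), label

sample = '/C:/Users/mayank singh/Desktop/berekely/img.jpg,10,20,110,220,car\n'
path, box, label = parse_simple_annotation(sample)
```

Spot-checking a few lines this way before training helps catch swapped coordinates or unexpected class names early.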
Faster RCNN Implementation
Train RPN for Frcnn
!git clone https://github.com/kentaroy47/frcnn-from-scratch-with-keras.git
- Upload the 'bk100_train.txt' file to the 'frcnn-from-scratch-with-keras' directory
cd /content/frcnn-from-scratch-with-keras
Start the RPN Training
# elen is the total number of annotations
!python train_rpn.py --network resnet50 -o simple -p bk100_train.txt --num_epochs 50 --elen 1286871
Train Frcnn
- Use the RPN weights 'rpn.resnet50.weights.88-0.63.hdf5' stored in the 'models/rpn' directory
!python train_frcnn.py --network resnet50 -o simple -p bk100_train.txt --num_epochs 50 --elen 1286871 --rpn models/rpn/rpn.resnet50.weights.88-0.63.hdf5
Test Frcnn
!python test_frcnn.py --network resnet50 -p test_image/ --load models/resnet50/voc.hdf5
In the next blog, we will cover Object Detection with Retinanet.
https://sngurukuls247.blogspot.com/2020/05/deep-learning-6-object-detection-with.html
Feel free to contact me at:
Email - sn.gurukul24.7uk@gmail.com