【Python-Module】 Running Faster RCNN - OpenCV DNN Module

15 Aug 2020 in Artificial Intelligence

In the previous post we studied the theory behind Faster RCNN. Here we run Faster RCNN with the OpenCV DNN module.

Running Detection with the OpenCV DNN Module
See /DLCV/Detection/fast_rcnn/OpenCV_FasterRCNN_ObjectDetection.ipynb

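0. Summary of the OpenCV Module Workflow

Load the model with cv_net = cv2.dnn.readNetFrom<Framework>('inference weight file', 'config file'), e.g. readNetFromTensorflow
Build the input with blob = cv2.dnn.blobFromImage(image read with cv2, conversion option 1, conversion option 2) and pass it to cv_net.setInput(blob)
Run inference with cv_out = cv_net.forward()
Read the output by iterating with for detection in cv_out[0,0,:,:]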

1. Faster R-CNN with the OpenCV DNN Package

A model pretrained in Tensorflow is loaded in OpenCV to run Object Detection on images and video.

1-0 View the image used as input

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

img = cv2.imread('../../data/image/beatles01.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

print('image shape:', img.shape)
plt.figure(figsize=(12, 12))
# plt.imshow(img_rgb)

image shape: (633, 806, 3)
<Figure size 864x864 with 0 Axes>
<Figure size 864x864 with 0 Axes>

1-1. Download the Tensorflow pretrained inference model (frozen graph) and its config file, then build the inference model in OpenCV

Download URLs are listed at https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API
We use Faster-RCNN ResNet-50 2018_01_28.
Download the pretrained model from http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz and extract it.
Download the config file for the pretrained model from https://github.com/opencv/opencv_extra/blob/master/testdata/dnn/faster_rcnn_resnet50_coco_2018_01_28.pbtxt
Load the inference model in DNN, passing the downloaded model file and config file as arguments.

%cd /home/sb020518/DLCV/Detection/fast_rcnn
!mkdir pretrained 
%cd ./pretrained
#!wget http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz
#!wget https://raw.githubusercontent.com/opencv/opencv_extra/master/testdata/dnn/faster_rcnn_resnet50_coco_2018_01_28.pbtxt
# !tar -xvf faster*.tar.gz
!mv faster_rcnn_resnet50_coco_2018_01_28.pbtxt graph.pbtxt

%cd /home/sb020518/DLCV/Detection/fast_rcnn

drawing

Downloading the .pbtxt file: open the file on GitHub, click Raw, and pass the resulting address to wget (copying the entire raw file by hand also works).
.pbtxt file: the config file for OpenCV.
.pb file: the weight file for tensorflow inference.

1-2 Load the tensorflow inference model with readNetFromTensorflow() in dnn

cv_net = cv2.dnn.readNetFromTensorflow('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 
                                     './pretrained/graph.pbtxt')

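Note: the output for each object is a class index, not the object name itself.
COCO object IDs run up to 91, but 11 of them are unused in COCO2017, so there are 80 object names.
A class-ID-to-class-name mapping for the COCO dataset therefore has to be supplied.
Class IDs are used in several ranges, 0 ~ 90, 0 ~ 91, and 0 ~ 79, depending on the model and the framework, as shown below. The only way to find out which table applies is to check experimentally.
Here we use OpenCV with the tensorflow Faster RCNN model, so the 0 ~ 90 range applies.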
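
Because the right ID table depends on the model and framework, a defensive lookup avoids a KeyError when the model emits an ID that the chosen table does not contain. A minimal sketch, assuming a hypothetical helper `lookup_class_name` and a truncated sample table (neither is part of the original notebook):

```python
# Hypothetical helper; sample_labels is a truncated stand-in for the
# full 0 ~ 90 table used with the OpenCV tensorflow Faster RCNN model.
sample_labels = {0: 'person', 1: 'bicycle', 2: 'car'}

def lookup_class_name(class_id, labels):
    """Return the class name for class_id, or a placeholder if it is unmapped."""
    return labels.get(class_id, 'unknown({})'.format(class_id))

print(lookup_class_name(2, sample_labels))   # car
print(lookup_class_name(11, sample_labels))  # unknown(11)
```

Seeing many `unknown(...)` captions, or captions that are off by one class, is a hint that the wrong ID table was chosen.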

drawing

# For OpenCV Yolo
labels_to_names_seq = {0:'person',1:'bicycle',2:'car',3:'motorbike',4:'aeroplane',5:'bus',6:'train',7:'truck',8:'boat',9:'traffic light',10:'fire hydrant',
                        11:'stop sign',12:'parking meter',13:'bench',14:'bird',15:'cat',16:'dog',17:'horse',18:'sheep',19:'cow',20:'elephant',
                        21:'bear',22:'zebra',23:'giraffe',24:'backpack',25:'umbrella',26:'handbag',27:'tie',28:'suitcase',29:'frisbee',30:'skis',
                        31:'snowboard',32:'sports ball',33:'kite',34:'baseball bat',35:'baseball glove',36:'skateboard',37:'surfboard',38:'tennis racket',39:'bottle',40:'wine glass',
                        41:'cup',42:'fork',43:'knife',44:'spoon',45:'bowl',46:'banana',47:'apple',48:'sandwich',49:'orange',50:'broccoli',
                        51:'carrot',52:'hot dog',53:'pizza',54:'donut',55:'cake',56:'chair',57:'sofa',58:'pottedplant',59:'bed',60:'diningtable',
                        61:'toilet',62:'tvmonitor',63:'laptop',64:'mouse',65:'remote',66:'keyboard',67:'cell phone',68:'microwave',69:'oven',70:'toaster',
                        71:'sink',72:'refrigerator',73:'book',74:'clock',75:'vase',76:'scissors',77:'teddy bear',78:'hair drier',79:'toothbrush' }

labels_to_names_0 = {0:'person',1:'bicycle',2:'car',3:'motorcycle',4:'airplane',5:'bus',6:'train',7:'truck',8:'boat',9:'traffic light',
                    10:'fire hydrant',11:'street sign',12:'stop sign',13:'parking meter',14:'bench',15:'bird',16:'cat',17:'dog',18:'horse',19:'sheep',
                    20:'cow',21:'elephant',22:'bear',23:'zebra',24:'giraffe',25:'hat',26:'backpack',27:'umbrella',28:'shoe',29:'eye glasses',
                    30:'handbag',31:'tie',32:'suitcase',33:'frisbee',34:'skis',35:'snowboard',36:'sports ball',37:'kite',38:'baseball bat',39:'baseball glove',
                    40:'skateboard',41:'surfboard',42:'tennis racket',43:'bottle',44:'plate',45:'wine glass',46:'cup',47:'fork',48:'knife',49:'spoon',
                    50:'bowl',51:'banana',52:'apple',53:'sandwich',54:'orange',55:'broccoli',56:'carrot',57:'hot dog',58:'pizza',59:'donut',
                    60:'cake',61:'chair',62:'couch',63:'potted plant',64:'bed',65:'mirror',66:'dining table',67:'window',68:'desk',69:'toilet',
                    70:'door',71:'tv',72:'laptop',73:'mouse',74:'remote',75:'keyboard',76:'cell phone',77:'microwave',78:'oven',79:'toaster',
                    80:'sink',81:'refrigerator',82:'blender',83:'book',84:'clock',85:'vase',86:'scissors',87:'teddy bear',88:'hair drier',89:'toothbrush',
                    90:'hair brush'}

labels_to_names = {1:'person',2:'bicycle',3:'car',4:'motorcycle',5:'airplane',6:'bus',7:'train',8:'truck',9:'boat',10:'traffic light',
                    11:'fire hydrant',12:'street sign',13:'stop sign',14:'parking meter',15:'bench',16:'bird',17:'cat',18:'dog',19:'horse',20:'sheep',
                    21:'cow',22:'elephant',23:'bear',24:'zebra',25:'giraffe',26:'hat',27:'backpack',28:'umbrella',29:'shoe',30:'eye glasses',
                    31:'handbag',32:'tie',33:'suitcase',34:'frisbee',35:'skis',36:'snowboard',37:'sports ball',38:'kite',39:'baseball bat',40:'baseball glove',
                    41:'skateboard',42:'surfboard',43:'tennis racket',44:'bottle',45:'plate',46:'wine glass',47:'cup',48:'fork',49:'knife',50:'spoon',
                    51:'bowl',52:'banana',53:'apple',54:'sandwich',55:'orange',56:'broccoli',57:'carrot',58:'hot dog',59:'pizza',60:'donut',
                    61:'cake',62:'chair',63:'couch',64:'potted plant',65:'bed',66:'mirror',67:'dining table',68:'window',69:'desk',70:'toilet',
                    71:'door',72:'tv',73:'laptop',74:'mouse',75:'remote',76:'keyboard',77:'cell phone',78:'microwave',79:'oven',80:'toaster',
                    81:'sink',82:'refrigerator',83:'blender',84:'book',85:'clock',86:'vase',87:'scissors',88:'teddy bear',89:'hair drier',90:'toothbrush',
                    91:'hair brush'}

1-3 Preprocess the image, feed it to the Network, run Object Detection, and visualize the result on the image

# The original image is resized when it is fed into a Faster RCNN based network.
# Bounding box positions are predicted on the resized image, so the original image shape is needed to map them back.
img = cv2.imread('../../data/image/beatles01.jpg')
rows = img.shape[0]
cols = img.shape[1]
# cv2.rectangle() draws directly on the array it receives, so make a separate copy for drawing.
draw_img = img.copy()

# swapRB=True converts the original BGR array to RGB before it is fed to the network.
# The Tensorflow Faster RCNN model does not seem to need a fixed input size, so no size=(300, 300) style parameter is set.
# blobFromImage parameter details: https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/
# setInput() registers the blob as the image the network will run inference on.
cv_net.setInput(cv2.dnn.blobFromImage(img, swapRB=True, crop=False)) 

# Run Object Detection (inference on the image) and return the result as cv_out.
cv_out = cv_net.forward()
print('cvout type : ', type(cv_out))
print('cvout shape : ', cv_out.shape)
# In the cv_out shape (1, 1, 100, 7), 100 is the number of objects and 7 is the information for one object.
# The leading dimensions exist because inference can be batched: the output is 4-dimensional so that detections for several input images can be returned at once.

# Colors for the bounding box border and caption text, in BGR order.
green_color=(0, 255, 0)
red_color=(0, 0, 255)

# Iterate over the detected objects and extract their information.
# cv_out[0,0,:,:] is a (100 x 7) array; each detection is one row of it, i.e. 7 elements.
for detection in cv_out[0,0,:,:]:  
    score = float(detection[2])  # confidence 
    class_id = int(detection[1])
    # Keep only detections whose score is above 0.7.
    if score > 0.7:
        # Boxes are predicted in normalized coordinates, so scale them back to the original image size.
        # The values below are the top-left and bottom-right corner coordinates.
        left = detection[3] * cols  # detection[3], [4], [5], [6] are values in 0~1.
        top = detection[4] * rows
        right = detection[5] * cols
        bottom = detection[6] * rows
        # Convert class_id to a class name with the labels_to_names_0 dictionary.
        caption = "{}: {:.4f}".format(labels_to_names_0[class_id], score)
        print(caption)
        # cv2.rectangle() draws the box on draw_img and requires integer coordinates.
        # To draw with float coordinates, use matplotlib instead.
        # putText : https://www.geeksforgeeks.org/python-opencv-cv2-puttext-method/
        cv2.rectangle(draw_img, (int(left), int(top)), (int(right), int(bottom)), color=green_color, thickness=2) 
        cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, 1)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

cvout type :  <class 'numpy.ndarray'>
cvout shape :  (1, 1, 100, 7)
person: 0.9998
person: 0.9996
person: 0.9993
person: 0.9970
person: 0.8995
car: 0.8922
car: 0.7602
car: 0.7415

drawing
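
The decoding step can also be separated from the drawing step. The sketch below is a hedged illustration, not the notebook's code: `decode_detections` is a hypothetical helper, and the two rows are made-up values rather than real model output. It uses plain lists, so it runs without OpenCV.

```python
def decode_detections(rows_7, img_width, img_height, score_threshold=0.5):
    """Convert rows of [batch_id, class_id, score, left, top, right, bottom]
    (coordinates normalized to 0~1) into pixel-coordinate boxes."""
    results = []
    for det in rows_7:
        score = float(det[2])
        if score <= score_threshold:
            continue  # drop low-confidence detections
        left = int(det[3] * img_width)
        top = int(det[4] * img_height)
        right = int(det[5] * img_width)
        bottom = int(det[6] * img_height)
        results.append((int(det[1]), score, (left, top, right, bottom)))
    return results

# Two made-up detections for an 806x633 image (not real model output).
fake_rows = [
    [0, 0, 0.99, 0.10, 0.20, 0.50, 0.90],
    [0, 2, 0.30, 0.00, 0.00, 0.10, 0.10],
]
print(decode_detections(fake_rows, 806, 633, score_threshold=0.7))
```

With a real network, passing `cv_out[0, 0]` as `rows_7` should work the same way, since each row has the same 7-element layout.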

1-4 Turn the work above into a function

It will also be used for video later.

import time

def get_detected_img(cv_net, img_array, score_threshold, use_copied_array=True, is_print=True):
    
    rows = img_array.shape[0]
    cols = img_array.shape[1]
    
    draw_img = None
    if use_copied_array:
        draw_img = img_array.copy()
    else:
        draw_img = img_array
    
    cv_net.setInput(cv2.dnn.blobFromImage(img_array, swapRB=True, crop=False))
    
    start = time.time()
    cv_out = cv_net.forward()
    
    green_color=(0, 255, 0)
    red_color=(0, 0, 255)

    # Iterate over the detected objects and extract their information.
    for detection in cv_out[0,0,:,:]:
        score = float(detection[2])
        class_id = int(detection[1])
        # Keep only detections whose score is above the score_threshold argument.
        if score > score_threshold:
            # Boxes are predicted in normalized coordinates, so scale them back to the original image size.
            left = detection[3] * cols
            top = detection[4] * rows
            right = detection[5] * cols
            bottom = detection[6] * rows
            # labels_to_names_0 maps class_id directly; the 1-based labels_to_names table would need class_id + 1.
            caption = "{}: {:.4f}".format(labels_to_names_0[class_id], score)
            print(caption)
            # cv2.rectangle() draws the box on draw_img and requires integer coordinates.
            cv2.rectangle(draw_img, (int(left), int(top)), (int(right), int(bottom)), color=green_color, thickness=2)
            cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, 1)
    if is_print:
        print('Detection time:',round(time.time() - start, 2),"sec")

    return draw_img

## Run inference again with the function defined above.
# Load the image
img = cv2.imread('../../data/image/beatles01.jpg')
print('image shape:', img.shape)

# Load the tensorflow inference model
cv_net = cv2.dnn.readNetFromTensorflow('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 
                                     './pretrained/graph.pbtxt')
# Run Object Detection, then visualize
draw_img = get_detected_img(cv_net, img, score_threshold=0.6, use_copied_array=True, is_print=True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

drawing

# Load the image
img = cv2.imread('../../data/image/baseball01.jpg')
print('image shape:', img.shape)

# Load the tensorflow inference model
cv_net = cv2.dnn.readNetFromTensorflow('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 
                                     './pretrained/graph.pbtxt')
# Run Object Detection, then visualize
draw_img = get_detected_img(cv_net, img, score_threshold=0.7, use_copied_array=True, is_print=True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

drawing

2. Running Video Object Detection

2-1 View the original video

Skeleton code reference: https://junha1125.github.io/artificial-intelligence/2020-08-12-OpenCV/

# The Video display API is used with an mp4 file here (other formats may not play in the browser)
from IPython.display import clear_output, Image, display, Video, HTML
Video('../../data/video/John_Wick_small.mp4')

2-2 Set up VideoCapture and VideoWriter

Set up VideoCapture so the video can be captured frame by frame.
Read the frame size and FPS of the video from the VideoCapture properties.
Set the encoding codec for VideoWriter and configure video writing.

video_input_path = '../../data/video/John_Wick_small.mp4'
# With the XVID codec on linux, set the video output extension to avi.
video_output_path = '../../data/output/John_Wick_small_cv01.avi'

cap = cv2.VideoCapture(video_input_path)

codec = cv2.VideoWriter_fourcc(*'XVID')

vid_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))) 
vid_fps = cap.get(cv2.CAP_PROP_FPS)  # Frame rate: number of frames per second of video
    
vid_writer = cv2.VideoWriter(video_output_path, codec, vid_fps, vid_size) 

frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print('Total number of frames:', frame_cnt)

Total number of frames: 58

2-3 Iterate over every frame and run Object Detection

For each frame, run the single-image Object Detection from above and append the frame to vid_writer.
The function defined above is not used here.

# Colors for the bounding box border and caption text
green_color=(0, 255, 0)
red_color=(0, 0, 255)

while True:

    hasFrame, img_frame = cap.read()
    if not hasFrame:
        print('No more frames to process.')
        break

    rows = img_frame.shape[0]
    cols = img_frame.shape[1]
    # swapRB=True converts the original BGR array to RGB before it is fed to the network
    cv_net.setInput(cv2.dnn.blobFromImage(img_frame,  swapRB=True, crop=False))
    
    start= time.time()
    # Run Object Detection and return the result as cv_out
    cv_out = cv_net.forward()
    frame_index = 0
    # Iterate over the detected objects and extract their information
    for detection in cv_out[0,0,:,:]:
        score = float(detection[2])
        class_id = int(detection[1])
        # Keep only detections whose score is above 0.5
        if score > 0.5:
            # Boxes are predicted in normalized coordinates, so scale them back to the original frame size
            left = detection[3] * cols
            top = detection[4] * rows
            right = detection[5] * cols
            bottom = detection[6] * rows
            # Convert class_id to a class name with the labels_to_names_0 dictionary.
            caption = "{}: {:.4f}".format(labels_to_names_0[class_id], score)
            #print(class_id, caption)
            # cv2.rectangle() draws the box on img_frame and requires integer coordinates.
            cv2.rectangle(img_frame, (int(left), int(top)), (int(right), int(bottom)), color=green_color, thickness=2)
            cv2.putText(img_frame, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, red_color, 1)
    print('Detection time:', round(time.time()-start, 2),'sec')
    vid_writer.write(img_frame)
# end of while loop

vid_writer.release()
cap.release()   

Detection time: 5.24 sec
Detection time: 5.04 sec
Detection time: 5.02 sec
Detection time: 5.01 sec
Detection time: 5.03 sec
Detection time: 5.0 sec
No more frames to process.

drawing

# Upload the video to Google Cloud Platform Object Storage, then download it from Google Cloud
# or download it directly from the current Jupyter environment
!gsutil cp ../../data/output/John_Wick_small_cv01.avi gs://my_bucket_dlcv/data/output/John_Wick_small_cv01.avi

2-4 Build a video detection function on top of the function defined above

def do_detected_video(cv_net, input_path, output_path, score_threshold, is_print):
    
    cap = cv2.VideoCapture(input_path)

    codec = cv2.VideoWriter_fourcc(*'XVID')

    vid_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    vid_fps = cap.get(cv2.CAP_PROP_FPS)

    vid_writer = cv2.VideoWriter(output_path, codec, vid_fps, vid_size) 

    frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print('Total number of frames:', frame_cnt)

    green_color=(0, 255, 0)
    red_color=(0, 0, 255)
    while True:
        hasFrame, img_frame = cap.read()
        if not hasFrame:
            print('No more frames to process.')
            break
        
        img_frame = get_detected_img(cv_net, img_frame, score_threshold=score_threshold, use_copied_array=False, is_print=is_print)
        
        vid_writer.write(img_frame)
    # end of while loop

    vid_writer.release()
    cap.release()

do_detected_video(cv_net, '../../data/video/John_Wick_small.mp4', '../../data/output/John_Wick_small_02.avi', 0.2, True)

!gsutil cp ../../data/output/John_Wick_small_02.avi gs://my_bucket_dlcv/data/output/John_Wick_small_02.avi

【Paper】 Faster RCNN Concepts + OpenCV DNN Module Basics

15 Aug 2020 in Artificial Intelligence

I plan to revisit and organize the Faster RCNN concepts and restudy the parts I found confusing.

Faster RCNN concepts and OpenCV Detection implementation

1. Faster RCNN

Faster RCNN = RPN(Region Proposal Network) + Fast RCNN
Region Proposals are also found by a network.

drawing

ROI Pooling is the same as in Fast RCNN. (Mask RCNN uses ROI Align instead.)
The RPN also contains both Classification and Regression. Classification only judges whether an object is present, and Regression outputs, for each grid position, where an object is likely to be. The goal at this stage is a rough estimate of object presence and location.
The roughly selected Region Proposals then go through Classification and Regression again at the final stage.

1-1 RPN and Anchor Boxes

The RPN has to produce Region Proposals at least as good as Selective Search for the given input pixels. From the RPN's point of view, all we have is the data (Ground Truth Bounding Boxes) and the Feature Map produced by the Backbone Network. How can these two be used to perform Region Proposal?

Anchor Boxes are placed on the Feature Map, and for each one the network checks whether an Object is there and how the box should be shifted to locate the Object more precisely.
There are 9 Anchor Boxes per position: (128x128, 256x256, 512x512) x (1:1, 1:2, 2:1)
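
The 9 anchor shapes can be enumerated in a few lines. This is a rough sketch under stated assumptions: `make_anchor_shapes` is a hypothetical name, the ratio is taken as height / width, and each anchor keeps an area of roughly scale x scale, which is one common convention rather than something the text above specifies.

```python
import math

def make_anchor_shapes(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return (width, height) for every scale/ratio pair.
    ratio is height / width; each anchor keeps an area of about scale * scale."""
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            shapes.append((round(width), round(height)))
    return shapes

anchors = make_anchor_shapes()
print(len(anchors))   # 9
print(anchors[:3])    # [(128, 128), (181, 91), (91, 181)]
```

These 9 shapes are then centered on every Feature Map position, so an m x n Feature Map yields m x n x 9 anchors.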

drawing

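In the image above, one 1x1x512 pixel of the Feature Map on the right can be thought of as holding the features of a 16x16x3 pixel patch of the image on the left. (This is only approximate; the actual Receptive Field after passing through the Conv Layers would have to be worked out separately.)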

Structure of the RPN

drawing

Note that the Proposal Region x, y, w, h coming out of the RPN is only a rough Region Proposal.
The RPN is simple: applying (1 padding + 3x3 conv) + (1x1 conv) to the Feature Map yields the Classification and Box Regression outputs.
In the Softmax Classification branch, each 1 x 1 x 512 filter produces the result (object vs. no object) for one anchor. The "object" value (between 0 and 1) can be thought of as the RPN Confidence.
In the Bounding Box Regression branch, each 1 x 1 x 512 filter produces the result (a rough object Region Proposal) for one anchor.
Reference image (from my Faster RCNN study during the Alpha project)
m x n is the Resolution of the Feature Map after the Feature Extractor
1 padding + 3x3 conv -> 1x1 conv

1-2 RPN Details and Loss Function

Positive Anchor Box VS Negative Anchor Box
- Anchors are classified by the IOU between the Ground Truth and the Anchor Box.
- IOU 0.7 or higher : Positive (trained as Foreground)
- IOU 0.3 or lower : Negative (trained as Background)
- IOU between 0.3 and 0.7 : ambiguous, not used for training
- Only Negative and Positive anchors are used for training.
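
The labeling rule above can be sketched as follows. The helper names `iou` and `label_anchor` are hypothetical, boxes are assumed to be (x1, y1, x2, y2) tuples, and the sketch covers only the two IOU thresholds; the paper's extra rule that also marks the highest-IOU anchor for each Ground Truth as positive is left out.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored during training)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return 1
    if best <= neg_thresh:
        return 0
    return -1

gt = [(0, 0, 100, 100)]
print(label_anchor((0, 0, 100, 90), gt))      # 1  (IOU 0.9)
print(label_anchor((50, 50, 150, 150), gt))   # 0  (IOU about 0.14)
print(label_anchor((0, 0, 100, 50), gt))      # -1 (IOU 0.5)
```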
Training order
1. Place Anchor boxes on the actual image (as in the second picture from the top).
2. Decide whether each placed Anchor box is Positive or Negative with respect to the Ground Truth. The Box regression below is trained only for positive anchors.
3. Measure the Bounding Box Regression offset between the Ground Truth and the placed Anchor box, and train the network so that the RPN output becomes this relative offset. (The network learns how far to move and stretch the Anchor so that it approaches the Ground Truth Box and becomes more positive. The moved Anchor does not have to match the Ground Truth exactly; it only has to become more Positive.)
4. The values from steps 2 and 3 (A) are what the output of the RPN Layer (B), i.e. the rightmost result in the third picture, should match.
5. So the RPN Loss function can be thought of as follows.
  Loss Function = L1(A - B)
With that intuition for the Loss function, let's look at the exact formula.
pi, ti are B, and pi*, ti* are A. pi is the Softmax value and pi* is 1 or 0.
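
For reference, the loss as written in the Faster R-CNN paper, with $p_i, t_i$ the predictions (B) and $p_i^*, t_i^*$ the targets (A):

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

Here $L_{cls}$ is the log loss over object vs. no object and $L_{reg}$ is the smooth L1 loss. Because $p_i^*$ is 0 for negative anchors, the regression term only contributes for positive anchors, which matches step 2 of the training order.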

drawing

Because there are so many Negative Anchors, Positive Anchors are always mixed in at a suitable ratio during training. Training on the whole image as-is would be dominated by Negatives and would not train well. (Negative Mining)
If every Positive Anchor were treated as a proposed Region and sent to the Head detector, far too many Regions would have to be sent. So the following procedure is used.
- Compute the Objectness Score.
  Objectness Score = pi (softmax probability of being an Object) * IOU between the Ground Truth and the Bounding Box
- Sort in descending order of Objectness Score.
- Choose N, the number of sorted Region Proposals to pass on to the next Detection Layer. Ex 2000, 300, 50
- Only those go on to the second stage, classification and Box regression, and are used for training.
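
The sort-and-cut step can be sketched as below. `select_top_proposals` is a hypothetical helper and the scores are made-up numbers; the sketch also leaves out the non-maximum suppression that the paper applies to the proposals before keeping the top N.

```python
def select_top_proposals(proposals, top_n):
    """proposals: list of (score, box). Return the top_n highest-scoring ones."""
    ranked = sorted(proposals, key=lambda item: item[0], reverse=True)
    return ranked[:top_n]

# Made-up scores; the box entries are placeholders for real coordinates.
candidates = [(0.2, 'box_a'), (0.9, 'box_b'), (0.6, 'box_c'), (0.75, 'box_d')]
print(select_top_proposals(candidates, 2))  # [(0.9, 'box_b'), (0.75, 'box_d')]
```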

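1-3 Alternating Training

drawing

There are 2 Networks, and training both at once is not easy. So training follows the procedure in the picture above. Steps 1 and 2 are trained together with the Backbone, and steps 3 and 4 only fine-tune the 1x1 conv and the FC Layers, respectively.
Running Inference with a model trained this way started to make near-real-time detection possible, but it was still not fast enough.
- A 1 stage detector needed to be developed
- An End to End training model needed to be built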

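2. Object Detection with OpenCV: Concepts

2-1 OpenCV DNN Pros and Cons

Pros
1. Inference can be implemented easily without a deep learning framework
2. Deep Learning combines easily with the many Computer vision APIs OpenCV provides
3. Intel focuses on the mobile field, with CPU-based OpenCV
4. NVIDIA - a range of GPUs with good performance for training or inference is being developed
5. TPU - developed by Google.
Cons
1. Weak GPU support
2. Inference only
3. CPU-based Inference speed has improved, but it is still slower than using a GPU.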

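2-2 How to Use Each Deep Learning Framework

drawing

Yolo was built on the Darknet framework, so to use Yolo in OpenCV you must pass in the weight file and config file provided by Darknet.
The Neural Network is returned as cvNet, and that variable is used from then on.

2-3 Reference Documentation Sites

drawing

2-4 Inference Procedure with OpenCV DNN

This is the big picture of what the code will do from here on. It is used repeatedly, so it is worth knowing well.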

drawing

Note the formats of the weight and config files (.pb, .pbtxt).
1-1 Next, load the image. While loading it, the blobFromImage() function can be used.
Role of blobFromImage(): resizes the image for the Network, scales pixel values, swaps BGR to RGB, and can crop the image; all of these are options.
```
 img_bgr = cv2.imread('img path')
 cvNet.setInput(cv2.dnn.blobFromImage(img_bgr, size=(300, 300), swapRB=True, crop=False))
 # swapRB : convert BGR to RGB
 # size depends on the cvNet being used, so check what input size the chosen network expects.
```
1-2 Video Stream Capture
- With OpenCV's VideoCapture() API, Object Detection can easily be run on each Image captured frame by frame from a Video Stream.
- See the earlier post
- It is used everywhere, so it is worth knowing well.
Feed the image into cvNet.
Store the Detected output from forward in a variable.
Loop over the Detection result with for to inspect each Object.

2-5 Using the Code

See the next Blog Post

【Paper】 RCNN(Regions with CNN), SPP(Spatial Pyramid Pooling), Fast RCNN

14 Aug 2020 in Artificial Intelligence

These may seem like basics, but I plan to revisit and organize the RCNN concepts as I study.

A study of the RCNN(Regions with CNN) family

1. RCNN

Early approaches: the Sliding Window approach and the Selective Search based Region Proposal approach
RCNN uses Selective Search.

drawing

In the picture above, the values produced by Bounding Box Regression are actually the center coordinates and box size x, y, w, h.
Instead of Softmax, the picture above uses an SVM Classifier (which finds the line or plane that separates Classes). Using Softmax during Training and SVM at Inference reportedly improved performance.
To feed the regions from Selective Search into AlexNet they must be 227*227, so Crop and Warp are applied. (Similar in spirit to ROI Pooling)
Bounding Box Regression then finds a more accurate Bounding Box.

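1-1 RCNN Characteristics and Significance

The CNN pass is applied to all of roughly 2000 ROIs, so it takes a very long time.
It had high Detection accuracy for its time, but it is too slow.
Because of the complicated process, training and model code are very complex and hard to implement.
Selective Search, the Feature Extractor, the FC layer, Box Regression and so on are all separate pieces, which makes implementation and handling very hard.
Demonstrated the performance of deep learning based Object Detection
Demonstrated the performance of the Region Proposal based approach
Showed the need for research on unifying this complicated process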

1-2 Bounding Box Regression

Why apply Bounding Box Regression after Selective Search? Because Selective Search does not produce accurate coordinates.
Here p is the coordinates from Selective Search and g is the ground truth bounding box. The goal of Bounding Box Regression is to find a function d such that d = t.
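
The targets t that d is trained to match follow the standard R-CNN parameterization: center offsets scaled by the proposal size, and log ratios for width and height. A minimal sketch, assuming boxes are given as (center_x, center_y, width, height) and using hypothetical helper names and made-up numbers:

```python
import math

def regression_targets(p, g):
    """p, g: (center_x, center_y, width, height) of proposal and ground truth.
    Returns the targets (t_x, t_y, t_w, t_h) that d(p) is trained to predict."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_deltas(p, d):
    """Move and resize proposal p by predicted deltas d (inverse of the above)."""
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))

proposal = (100.0, 100.0, 50.0, 80.0)       # made-up proposal box
ground_truth = (110.0, 96.0, 100.0, 80.0)   # made-up ground truth box
t = regression_targets(proposal, ground_truth)
print(t)                          # t_x = 0.2, t_y = -0.05, t_w = log(2), t_h = 0.0
print(apply_deltas(proposal, t))  # recovers the ground truth box up to rounding
```

Scaling the offsets by the proposal size and using log ratios keeps the targets comparable across small and large boxes.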

drawing

Handwritten notes from [C:\Users\sb020\OneDrive\백업완료_2020.03\ML\알파프로젝트\Faster-Rcnn].

drawing

2. SPP(Spatial Pyramid Pooling) Net

It is a model that improved on RCNN before Fast RCNN.
Fast RCNN can be described as replacing Pyramid Pooling with ROI Pooling.
Instead of passing every region from Selective Search through the DNN, the image is passed through the DNN first, and the regions proposed by Selective Search are then looked up on the Feature map.

drawing

There is a problem here. Going from the Feature Map to the FC layer, the FC Input size is fixed, but the Features mapped from Selective Search regions have different sizes. Ex [13x13x256, 9x9x256, ...]
Spatial Pyramid Pooling sits between the Feature map and the FC layer. It pools Feature Maps of different sizes, like the examples above, into a Feature Map of one fixed size and feeds that to the FC layer, which solves the problem. (Almost the same as ROI Pooling)
Spatial Pyramid Pooling is borrowed from Spatial Pyramid Matching, a traditional computer vision technique, which in turn is based on the Bag of Visual words method. So the lineage can be drawn as follows. SPP is the simplest.
- Bag of Visual words -> Spatial Pyramid Matching -> Spatial Pyramid Pooling
This way, a Feature of any size can be represented by 21 vector values. Since this size is fixed, it is easy to feed into the FC layer. At the bottom of the picture below, whatever the size of the Feature Map, 21x256 values can easily be fed into the FC layer.
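
The 21 values come from max-pooling the region into 1x1, 2x2 and 4x4 grids (1 + 4 + 16 = 21 bins). The sketch below illustrates this for a single channel using plain nested lists; `spp_single_channel` and its bin-edge rounding are assumptions made for illustration, not the SPP Net reference implementation.

```python
def spp_single_channel(fmap, levels=(1, 2, 4)):
    """Max-pool a 2D list into 1x1, 2x2 and 4x4 grids and concatenate the results."""
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for gy in range(n):
            # Bin edges: floor for the start, ceil for the end, at least one row per bin
            y0 = (gy * height) // n
            y1 = max(((gy + 1) * height + n - 1) // n, y0 + 1)
            for gx in range(n):
                x0 = (gx * width) // n
                x1 = max(((gx + 1) * width + n - 1) // n, x0 + 1)
                pooled.append(max(fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)))
    return pooled

# Two made-up single-channel feature maps of different sizes
map_a = [[y * 13 + x for x in range(13)] for y in range(13)]
map_b = [[y * 9 + x for x in range(9)] for y in range(6)]
print(len(spp_single_channel(map_a)), len(spp_single_channel(map_b)))  # 21 21
```

Applying the same pooling to each of 256 channels gives the fixed 21x256 input for the FC layer, regardless of the region size.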

With SPP Net, accuracy is similar to RCNN, but Inference time is about 20 times shorter.

3. Fast RCNN

The Spatial Pyramid Pooling studied above is replaced with ROI Pooling. Rather than building Level0, Level1, Level2 as in SPP, the region is split into a single 7*7 grid and Max Pooled.
Training became End-to-End. The SVM was switched back to Softmax. Classification and Regression had been trained separately until then; a Multi-task Loss function now trains them together.

drawing

u = 0 means Background, and Bounding box regression is naturally not performed for background. λ is the balancing constant of the multi-task loss.
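
For reference, the multi-task loss as written in the Fast R-CNN paper, where p is the predicted class distribution, u the true class, $t^u$ the predicted box offsets for class u, and v the regression target:

```latex
L(p, u, t^u, v) = L_{cls}(p, u) + \lambda \, [u \ge 1] \, L_{loc}(t^u, v)
```

$L_{cls}(p, u) = -\log p_u$ is the log loss for the true class, $L_{loc}$ is the smooth L1 loss, and the bracket $[u \ge 1]$ is 1 for object classes and 0 for background, which is why no box regression loss is applied when u = 0.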
Fast RCNN is slightly more accurate than RCNN and its inference is faster. However, given that inference takes 2.3 seconds with Selective Search included and 0.32 seconds without it, the long time taken by Region Proposal clearly still had to be solved.

【Vision】 Detection and Segmentation Revisited 3 - Framework, Module, GPU

13 Aug 2020 in Artificial Intelligence

These may seem like basics, but there is still a lot to learn, so I plan to revisit and organize the concepts of Detection and Segmentation as I study.

Detection and Segmentation Revisited 3

1. Modules for Object Detection

Keras & Tensorflow, Pytorch
- Customizable.
- Each algorithm has to be implemented differently.
- Keras and Tensorflow are used together. Their source code depends on each other, so importing both together is recommended.
OpenCV DNN module
- Easy Object Detection Inference.
- Training is not possible.
- Runs mainly on CPU, with limited GPU support.
Tensorflow Object Detection API
- Many Detection algorithms available.
- Hard to handle, and the training procedure is overly complicated.
- Offers more Pretrained weights than other open-source packages, across various models and Backbones.
- No Yolo support, and weak Retinanet support
- Aims for real-time, low-spec operation by using MobileNet as the backbone.
- Conflicts between Tensorflow Version2 and Version1.
- Documentation and Tutorials are lacking.
- A Research-style module, so its stability is questionable
Detectron2
- Many Detection algorithms available
- Relatively easy to handle, with a simple procedure.

2. Useful Open-Source Packages Based on Keras and Tensorflow

Most of the code below uses Tensorflow & Keras.

Yolo - https://github.com/qqwweee/keras-yolo3
- Simple, with good performance
- Somewhat old and rarely updated.
Retinanet - https://github.com/fizyr/keras-retinanet
- Precise detection; detects even small objects well
Mask R-CNN - https://github.com/matterport/Mask_RCNN
- The center of Instance Segmentation.
- Used in many projects for its rich features and ease of use.
OpenCV - DNN Package Document
- A general-purpose library widely used for computer vision processing
- An open-source library focused on real-time image processing.
- Deep Learning based Computer Vision modules have also been ported.
- Deep learning frameworks supported by OpenCV for Inference:
  - Caffe, Tensorflow, Torch, Darknet
- Very convenient, because you only need to Load a model and run Inference
- CPU-oriented; using the GPU is difficult.

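3. Using the GPU

CUDA : a technology that lets parallel algorithms running on the GPU be written in industry-standard languages such as C. Think of it as running right above the GPU Driver Layer.
cuDNN : CUDA is the general-purpose GPU computing layer, and cuDNN is a library added on top of it for deep learning.

drawing

If everything above is installed correctly, the following commands should work:
$ nvidia-smi
$ watch -n 1 nvidia-smi
Take care not to run out of GPU memory during training.
It is therefore a good idea to stop other Processes that are using the GPU. (They are listed by nvidia-smi.)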

4. Object Detection Overview

drawing

Feature Extraction Network : the Backbone Network; a model trained for Classification is used. It extracts the key Features.
Object Detection Network
Region Proposal : Selective Search, RPN, etc.
Image Resolution, FPS, and Detection accuracy trade off against each other, as shown in the figure below.
This can be checked with the yoloV2 performance comparison.

drawing

【Vision】 Detection and Segmentation Revisited 2 - Datasets

12 Aug 2020 in Artificial Intelligence

These may seem like basics, but there is still a lot to learn, so I plan to revisit and organize the concepts of Detection and Segmentation as I study.

Detection and Segmentation Revisited 2

A closer look at Datasets

1. Pascal VOC Basics

Basics
- XML Format (20 Classes)
- The Annotations for one image are stored in one XML file.
- Link to the dataset homepage
- Competition types
  - Classification/Detection
  - Segmentation
  - Action Classification
  - Person Layout
- Annotation : for Pascal VOC, the Ground Truth information is given in Annotation files of the format shown below.
- After downloading the Data Set, you will see the following directory structure. segmented indicates whether segmentation information is available.

2. Exploring the PASCAL VOC 2012 Data

The comments in the code are important too, so be sure to read them
Notes from working through DLCV/Detection/preliminary/PASCAL_VOC_Dataset_탐색하기.ipynb

  !ls ~/DLCV/data/voc/VOCdevkit/VOC2012

  Annotations  ImageSets	JPEGImages  SegmentationClass  SegmentationObject

  !ls ~/DLCV/data/voc/VOCdevkit/VOC2012/JPEGImages | head -n 5

  2007_000027.jpg
  2007_000032.jpg
  2007_000033.jpg
  2007_000039.jpg
  2007_000042.jpg
  ls: write error: Broken pipe

View an arbitrary image in the JPEGImages directory

  import cv2
  import matplotlib.pyplot as plt
  %matplotlib inline

  img = cv2.imread('../../data/voc/VOCdevkit/VOC2012/JPEGImages/2007_000032.jpg')
  img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  print('img shape:', img.shape)

  plt.figure(figsize=(8, 8))
  plt.imshow(img_rgb)
  plt.show()

  img shape: (281, 500, 3)

View an arbitrary annotation file in the Annotations directory

  !cat ~/DLCV/data/voc/VOCdevkit/VOC2012/Annotations/2007_000032.xml

  <annotation>
    <folder>VOC2012</folder>
    <filename>2007_000032.jpg</filename>
    <source>
      <database>The VOC2007 Database</database>
      <annotation>PASCAL VOC2007</annotation>
      <image>flickr</image>
    </source>
    <size>
      <width>500</width>
      <height>281</height>
      <depth>3</depth>
    </size>
    <segmented>1</segmented>
    <object>
      <name>aeroplane</name>
      <pose>Frontal</pose>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <bndbox>
        <xmin>104</xmin>
        <ymin>78</ymin>
        <xmax>375</xmax>
        <ymax>183</ymax>
      </bndbox>
    </object>
    <object>
      <name>aeroplane</name>
      <pose>Left</pose>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <bndbox>
        <xmin>133</xmin>
        <ymin>88</ymin>
        <xmax>197</xmax>
        <ymax>123</ymax>
      </bndbox>
    </object>
    <object>
      <name>person</name>
      <pose>Rear</pose>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <bndbox>
        <xmin>195</xmin>
        <ymin>180</ymin>
        <xmax>213</xmax>
        <ymax>229</ymax>
      </bndbox>
    </object>
    <object>
      <name>person</name>
      <pose>Rear</pose>
      <truncated>0</truncated>
      <difficult>0</difficult>
      <bndbox>
        <xmin>26</xmin>
        <ymin>189</ymin>
        <xmax>44</xmax>
        <ymax>238</ymax>
      </bndbox>
    </object>
  </annotation>

View an arbitrary masking image in the SegmentationObject directory

  img = cv2.imread('../../data/voc/VOCdevkit/VOC2012/SegmentationObject/2007_000032.png')
  img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  print('img shape:', img.shape)

  plt.figure(figsize=(8, 8))
  plt.imshow(img_rgb)
  plt.show()

  img shape: (281, 500, 3)

Parse the elements of an Annotation xml file and access them
Collect the file names into a variable called xml_files

  # Collect the file names into a variable called xml_files
  import os
  import random

  VOC_ROOT_DIR ="../../data/voc/VOCdevkit/VOC2012/"
  ANNO_DIR = os.path.join(VOC_ROOT_DIR, "Annotations")
  IMAGE_DIR = os.path.join(VOC_ROOT_DIR, "JPEGImages")

  xml_files = os.listdir(ANNO_DIR)                       
  print(xml_files[:5]); print(len(xml_files))

  ['2012_001360.xml', '2008_003384.xml', '2008_007317.xml', '2009_000990.xml', '2009_003539.xml']
  17125

The module used to handle xml files is as follows.

  # !pip install lxml
  import os
  import xml.etree.ElementTree as ET

  xml_file = os.path.join(ANNO_DIR, '2007_000032.xml')

  # Parse the XML file to create the Element tree
  # After these two steps, the root variable holds the information we need.
  tree = ET.parse(xml_file)
  root = tree.getroot()

  # If root were a dictionary, root.keys() would list these; for an Element, iterate over its children instead
  print("root.keys = ", end='')
  for child in root:
      print(child.tag, end = ', ')
          

  # Image information is stored in children of root
  # Annotation information can be pulled out of root into dictionary form as follows.
  image_name = root.find('filename').text
  full_image_name = os.path.join(IMAGE_DIR, image_name)
  image_size = root.find('size')
  image_width = int(image_size.find('width').text)
  image_height = int(image_size.find('height').text)

  # Find every object Element in the file.
  objects_list = []
  for obj in root.findall('object'):
      # Find bndbox among the children of the object element.
      xmlbox = obj.find('bndbox')
      # Find xmin, ymin, xmax, ymax among the children of bndbox and extract their values (text)
      x1 = int(xmlbox.find('xmin').text)
      y1 = int(xmlbox.find('ymin').text)
      x2 = int(xmlbox.find('xmax').text)
      y2 = int(xmlbox.find('ymax').text)
      
      bndbox_pos = (x1, y1, x2, y2)
      class_name=obj.find('name').text
      object_dict={'class_name': class_name, 'bndbox_pos':bndbox_pos}
      objects_list.append(object_dict)

  print('\nfull_image_name:', full_image_name,'\n', 'image_size:', (image_width, image_height))

  for object_info in objects_list:
      print(object_info)

      

  root.keys = folder, filename, source, size, segmented, object, object, object, object, 
  full_image_name: ../../data/voc/VOCdevkit/VOC2012/JPEGImages/2007_000032.jpg 
  image_size: (500, 281)
  {'class_name': 'aeroplane', 'bndbox_pos': (104, 78, 375, 183)}
  {'class_name': 'aeroplane', 'bndbox_pos': (133, 88, 197, 123)}
  {'class_name': 'person', 'bndbox_pos': (195, 180, 213, 229)}
  {'class_name': 'person', 'bndbox_pos': (26, 189, 44, 238)}
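The parsing steps above can be wrapped in a small helper. The sketch below is a minimal version under stated assumptions: `parse_voc_annotation` is a hypothetical name, and the XML string is a hand-written sample that only mimics the VOC layout (filename, size, object/bndbox), not a real annotation file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that mimics the Pascal VOC annotation layout (not a real file).
SAMPLE_XML = """
<annotation>
  <filename>sample.jpg</filename>
  <size><width>500</width><height>281</height><depth>3</depth></size>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>104</xmin><ymin>78</ymin><xmax>375</xmax><ymax>183</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>195</xmin><ymin>180</ymin><xmax>213</xmax><ymax>229</ymax></bndbox>
  </object>
</annotation>
""".strip()


def parse_voc_annotation(xml_text):
    """Return (filename, (width, height), objects) parsed from VOC-style XML text."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    image_size = (int(size.find('width').text), int(size.find('height').text))
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        # Read the four corner values in a fixed order.
        coords = tuple(int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append({'class_name': obj.find('name').text, 'bndbox_pos': coords})
    return root.find('filename').text, image_size, objects


filename, image_size, objects = parse_voc_annotation(SAMPLE_XML)
print(filename, image_size)
for item in objects:
    print(item)
```

For a real file, replacing `ET.fromstring(xml_text)` with `ET.parse(path).getroot()` should give the same structure.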

Visualizing bounding boxes from the bounding box information of the objects in the annotation

  import cv2
  import os
  import xml.etree.ElementTree as ET

  xml_file = os.path.join(ANNO_DIR, '2007_000032.xml')

  tree = ET.parse(xml_file)
  root = tree.getroot()

  image_name = root.find('filename').text
  full_image_name = os.path.join(IMAGE_DIR, image_name)

  img = cv2.imread(full_image_name)
  # cv2.rectangle() draws directly on the array passed in, so do the drawing on a separate copy.
  draw_img = img.copy()
  # OpenCV uses BGR rather than RGB, so red is (0, 0, 255).
  green_color=(0, 255, 0)
  red_color=(0, 0, 255)

  # Find every object element in the file.
  objects_list = []
  for obj in root.findall('object'):
      xmlbox = obj.find('bndbox')
      
      left = int(xmlbox.find('xmin').text)
      top = int(xmlbox.find('ymin').text)
      right = int(xmlbox.find('xmax').text)
      bottom = int(xmlbox.find('ymax').text)
      
      class_name=obj.find('name').text
      
      # Draw a green box on draw_img from the top-left corner to the bottom-right corner.
      cv2.rectangle(draw_img, (left, top), (right, bottom), color=green_color, thickness=1)
      # Write the class name in red just above the top-left corner.
      cv2.putText(draw_img, class_name, (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, thickness=1)

  img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)
  plt.figure(figsize=(10, 10))
  plt.imshow(img_rgb)

drawing

3. MS COCO

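JSON format (80 classes: the paper lists 91 object categories, but only 80 of them are annotated in the released images, so take care with class IDs.)
About 300K images and 1.5M objects (roughly 5 objects per image).
A single JSON file holds the annotations for all images.
Well suited as a source of pretrained weights.
Go to the dataset homepage
- Usually the 2017 dataset, the most recent release, is used.
The dataset is easy to explore.
Dataset composition
Each image contains many objects from several classes, which makes the data harder than other datasets.
It was built so that models trained on it work well in real-world environments.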
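To make the single-JSON layout concrete, here is a minimal sketch. The dictionary is a tiny hand-made sample in the COCO annotation style (`images`, `annotations`, `categories`, with `bbox` stored as `[x, y, width, height]`); the file names and values are made up for illustration, and a real annotation file carries more fields.

```python
import json
from collections import defaultdict

# Tiny hand-made sample in COCO annotation style; values are illustrative only.
coco_text = json.dumps({
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480},
               {"id": 2, "file_name": "b.jpg", "width": 500, "height": 375}],
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 200]},
        {"id": 11, "image_id": 1, "category_id": 3, "bbox": [300, 150, 120, 80]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [50, 60, 40, 90]},
    ],
})

coco = json.loads(coco_text)
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

# Group annotations per image, converting [x, y, w, h] to (x1, y1, x2, y2).
boxes_per_image = defaultdict(list)
for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]
    corners = (x, y, x + w, y + h)
    boxes_per_image[ann["image_id"]].append((id_to_name[ann["category_id"]], corners))

for image in coco["images"]:
    print(image["file_name"], boxes_per_image[image["id"]])
```

Looking names up through the `categories` list, rather than assuming IDs run from 1 to 80, avoids the class-ID pitfall mentioned above.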

4. Google Open Image

CSV format (600 classes)
The dataset is also very large, so training can take a long time.

【Python-Module】 Using Vision Processing Libraries - OpenCV Skeleton Code

12 Aug 2020 in Artificial Intelligence

Let's take a brief look at several vision libraries and then focus on OpenCV. This post covers the skeleton code we will keep reusing, for video as well as still images.

See the file DLCV/Detection/preliminary/OpenCV이미지와 영상처리 개요

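1. Major Python image libraries

PIL (Python Imaging Library): covers only the main image-processing operations. Relatively slow. A module that is falling out of use.
Scikit-Image: built on SciPy and NumPy.
OpenCV
- The most popular computer vision library. It contributed greatly to generalizing and popularizing computer vision functionality.
- cv2.imread("file name") loads the image into a NumPy array in BGR order, not RGB!
- So the channel order has to be converted before displaying the image.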
```
  import cv2
  import matplotlib.pyplot as plt
  bgr_img_array = cv2.imread('file Name')
  rgb_img_array = cv2.cvtColor(bgr_img_array, cv2.COLOR_BGR2RGB)
  plt.imshow(rgb_img_array)
```
- Note, however, that imwrite expects a BGR array and handles the conversion itself when saving, so be careful which array you pass in.
```
  bgr_img_array = cv2.imread('file Name')
  rgb_img_array = cv2.cvtColor(bgr_img_array, cv2.COLOR_BGR2RGB)
  cv2.imwrite('output file name', bgr_img_array)  # so the BGR image must be passed here
```
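- OpenCV window interface: the functions below only work in a desktop environment with a GUI (Windows, Ubuntu), so calling them from a Jupyter notebook raises an error. Use matplotlib to visualize images in a notebook instead.
  1. cv2.imshow(): shows the image in a window frame (so nothing appears in Jupyter).
  2. cv2.waitKey(): waits indefinitely for a key press.
  3. cv2.destroyAllWindows(): closes every window frame on screen.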

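2. Image processing with OpenCV

Look briefly at several Python image libraries and understand how they differ from OpenCV
Understand how OpenCV handles a single image
Understand how OpenCV handles video

Understanding OpenCV image processing and comparing it with other packages

Loading an image with the PIL package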

import matplotlib.pyplot as plt
%matplotlib inline  
# https://korbillgates.tistory.com/85 - makes figures and tables render inline in Jupyter

from PIL import Image

# PIL reads an image file with open() and creates an ImageFile object.
pil_image = Image.open("../../data/image/beatles01.jpg")
print('image type:', type(pil_image))

plt.figure(figsize=(10, 10))
plt.imshow(pil_image)
#plt.show()

image type: <class 'PIL.JpegImagePlugin.JpegImageFile'>

drawing

Loading an image with skimage (scikit-image)

skimage's imread() returns the original RGB image as a NumPy array in RGB order.

from skimage import io

# skimage's imread() returns the image as a NumPy array.
sk_image = io.imread("../../data/image/beatles01.jpg")
print('sk_image type:', type(sk_image), ' sk_image shape:', sk_image.shape)

plt.figure(figsize=(10, 10))
plt.imshow(sk_image)
#plt.show()

sk_image type: <class 'numpy.ndarray'>  sk_image shape: (633, 806, 3)

drawing

Loading an image with OpenCV and matplotlib

OpenCV's imread() returns the original RGB image as a NumPy array in BGR order.
OpenCV's imwrite() converts a BGR array back to RGB when writing it to a file, so the user does not need to worry about the RGB->BGR->RGB round trip.

import cv2

cv2_image = cv2.imread("../../data/image/beatles01.jpg")  # cv2_image is BGR here
cv2.imwrite("../../data/output/beatles02_cv.jpg", cv2_image) # converted back to RGB automatically when written
print('cv_image type:', type(cv2_image), ' cv_image shape:', cv2_image.shape)  

plt.figure(figsize=(10, 10))
img = plt.imread("../../data/output/beatles02_cv.jpg")
plt.imshow(img)
#plt.show()

cv_image type: <class 'numpy.ndarray'>  cv_image shape: (633, 806, 3)

drawing

Visualizing the BGR NumPy array returned by OpenCV's imread() as-is

OpenCV's imread() returns the channels in BGR order, so matplotlib, which expects RGB, displays the wrong colors.

cv2_image = cv2.imread("../../data/image/beatles01.jpg")

plt.figure(figsize=(10, 10))
plt.imshow(cv2_image)
plt.show()

drawing

sk_image = io.imread("../../data/image/beatles01.jpg")
print(sk_image.shape)
sk_image[:, :, 0]

(633, 806, 3)





array([[ 18,  17,  18, ...,  46,  38,  63],
       [ 18,  18,  18, ...,  72,  41,  37],
       [ 18,  18,  18, ...,  84,  56,  42],
       ...,
       [225, 226, 228, ..., 231, 228, 229],
       [225, 225, 226, ..., 229, 229, 227],
       [225, 225, 224, ..., 227, 227, 227]], dtype=uint8)

cv2_image = cv2.imread("../../data/image/beatles01.jpg")
print(type(cv2_image))
print(cv2_image.shape)
cv2_image[:, :, 0]  # BGR: index 0 is the B channel. (633, 806, 3) -> (633, 806)

<class 'numpy.ndarray'>
(633, 806, 3)





array([[ 19,  19,  20, ...,  47,  39,  64],
       [ 20,  20,  20, ...,  71,  40,  36],
       [ 20,  20,  20, ...,  82,  54,  40],
       ...,
       [198, 199, 201, ..., 190, 189, 188],
       [198, 198, 199, ..., 188, 188, 186],
       [199, 199, 198, ..., 186, 186, 186]], dtype=uint8)

cv2_image[:, :, 2]  # BGR: index 2 is the R channel. (633, 806, 3) -> (633, 806)

array([[ 18,  18,  18, ...,  47,  39,  64],
       [ 19,  19,  18, ...,  72,  41,  37],
       [ 18,  18,  18, ...,  84,  56,  41],
       ...,
       [225, 226, 228, ..., 231, 230, 229],
       [225, 225, 226, ..., 229, 229, 227],
       [225, 225, 224, ..., 227, 227, 227]], dtype=uint8)

cv2_image = cv2.imread("../../data/image/beatles01.jpg")
draw_image = cv2.cvtColor(cv2_image, cv2.COLOR_BGR2RGB)  # Inexpensive: this only reorders the channels.
cv2.imwrite("../../data/output/beatles02_cv.jpg", draw_image)
# The file saved this way has scrambled colors. When saving with imwrite, always pass in the BGR image!

plt.figure(figsize=(10, 10))
plt.imshow(draw_image)
plt.show()

drawing
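The channel-order issue can be checked without any image file. The sketch below builds a made-up 1x2 "BGR" array with NumPy and shows that reversing the last axis gives RGB order, which is the same reordering `cv2.COLOR_BGR2RGB` performs on a 3-channel image. The pixel values are assumptions chosen for illustration.

```python
import numpy as np

# A made-up 1x2 image in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps B and R, giving RGB order.
rgb = bgr[:, :, ::-1]

print('channel 0 of bgr (B):', bgr[:, :, 0])
print('channel 0 of rgb (R):', rgb[:, :, 0])

# Applying the same reversal twice returns the original array.
assert np.array_equal(rgb[:, :, ::-1], bgr)
```

This is also why passing an RGB array to `cv2.imwrite` produces a file with red and blue swapped.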

3. Video processing with OpenCV

This is skeleton code. The same approach will be used later to save object detection results as a video, so it is worth knowing well.
OpenCV provides a convenient API for video processing.
The VideoCapture object captures a video stream frame by frame for processing.
The VideoWriter object writes the frames read by VideoCapture out as a video.

# For now, just display the video.
# It can be downloaded with: wget https://github.com/chulminkw/DLCV/blob/master/data/video/Night_Day_Chase.mp4?raw=true
from IPython.display import clear_output, Image, display, Video, HTML
Video('../../data/video/Night_Day_Chase.mp4') # includes audio

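From here on is how to work with video.
On Linux the video output extension has to be set to avi. (Writing to mp4 did play back in practice, though.)

Overview of OpenCV video processing

The VideoCapture class reads a video one frame at a time (VideoCapture.read()).
VideoWriter stacks the individual frames read by VideoCapture one after another and writes them out as a single video.
The get method retrieves the detailed properties of the video one by one.

VideoWriter(output path, codec, FPS, frame size as a (width, height) pair)

Many codec formats are available for encoding (DIVX, XVID, MJPG, X264, WMV1, WMV2).
In our virtual Linux environment, however, the XVID codec has to be used and the output must be written as avi, even when the input is mp4.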

# From here on is how to work with video.
import cv2

video_input_path = '../../data/video/Night_Day_Chase.mp4'
# On Linux the video output extension has to be avi.
video_output_path = '../../data/output/Night_Day_Chase_output.avi'

cap = cv2.VideoCapture(video_input_path) # <- this variable is used throughout
# Set the codec to *'XVID'.
codec = cv2.VideoWriter_fourcc(*'XVID')  # create the codec code

vid_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))) # (width, height)
vid_fps = cap.get(cv2.CAP_PROP_FPS )
frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))    
    
vid_writer = cv2.VideoWriter(video_output_path, codec, vid_fps, vid_size) # an empty video at first
print('Total frames:', frame_cnt, 'FPS:', round(vid_fps), 'Frame size:', vid_size)

Total frames: 1383 FPS: 28 Frame size: (1216, 516)

import time

green_color=(0, 255, 0)
red_color=(0, 0, 255)

start = time.time()
index=0
while True:
    hasFrame, img_frame = cap.read() 
    # cap.read() returns the frames one at a time, in order.
    if not hasFrame:
        print('No more frames to process.')
        break
    index += 1
    print('frame :', index, 'done')
    cv2.rectangle(img_frame, (300, 100, 800, 400), color=green_color, thickness=2)  # draw a box on every frame
    caption = "frame:{}".format(index)
    cv2.putText(img_frame, caption, (300, 95), cv2.FONT_HERSHEY_SIMPLEX, 0.7, red_color, 1)  # write the frame number on every frame
    
    vid_writer.write(img_frame) 
    # Append the frame to the output video.

print('write time:', round(time.time()-start,4))
vid_writer.release()  # the output video; similar to file.close()
cap.release()         # the input video; similar to file.close()

frame : 1 done
frame : 2 done
frame : 3 done
frame : 4 done
fr...
frame : 1381 done
frame : 1382 done
frame : 1383 done
No more frames to process.
write time: 12.3946

Video('../../data/output/Night_Day_Chase_output.avi') 

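An avi file cannot be played inside a Jupyter notebook, so the file just created cannot be viewed with the call above. Only mp4 files can be played this way.
To watch the avi file, download it and play it on your own computer. The simplest way is the download option in Jupyter Notebook; another is to copy it to object storage as below and download it from there.
The output video naturally has no audio, which detection does not need anyway.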

!gsutil cp ../../data/output/Night_Day_Chase_output.avi gs://my_bucket_dlcv/data/output/Night_Day_Chase_output.avi

【Tensorflow】 Running Faster RCNN Inference + GPU Resource Caveats

11 Aug 2020 in Pytorch / Docker / Git

Object detection with the Tensorflow 1.3 Faster RCNN API
See /DLCV/Detection/fast_rcnn/Tensorflow_FasterRCNN_ObjectDetection.ipynb

0. Tensorflow inference procedure

Read the image
Read only the .pb file - use tf.gfile.FastGFile and graph_def = tf.GraphDef()
Start a session - with tf.Session() as sess:
Import the graph inside the session - tf.import_graph_def(graph_def, name='')
Run the forward pass with sess.run and fetch the outputs you need. out = sess.run
Extract the information for each object and visualize it - for i in range(int(out[0][0])):

1. GPU resource caveats

drawing

Solutions

Check $ nvidia-smi regularly, and before starting training:
1. Jupyter - Running - shut down notebooks that are not in use
2. Notebook - Restart & Clear output
3. Among the processes listed by nvidia-smi, kill the ones using a lot of GPU with kill -9 <process ID>
4. Kill Jupyter Notebook from the terminal and start it again (~/start_nb.sh)

2. Object detection with tensorflow

1. Object detection on a single image

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

img = cv2.imread('../../data/image/john_wick01.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

print('image shape:', img.shape)
plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

image shape: (450, 814, 3)

labels_to_names = {1:'person',2:'bicycle',3:'car',4:'motorcycle',5:'airplane',6:'bus',7:'train',8:'truck',9:'boat',10:'traffic light',
                    11:'fire hydrant',12:'street sign',13:'stop sign',14:'parking meter',15:'bench',16:'bird',17:'cat',18:'dog',19:'horse',20:'sheep',
                    21:'cow',22:'elephant',23:'bear',24:'zebra',25:'giraffe',26:'hat',27:'backpack',28:'umbrella',29:'shoe',30:'eye glasses',
                    31:'handbag',32:'tie',33:'suitcase',34:'frisbee',35:'skis',36:'snowboard',37:'sports ball',38:'kite',39:'baseball bat',40:'baseball glove',
                    41:'skateboard',42:'surfboard',43:'tennis racket',44:'bottle',45:'plate',46:'wine glass',47:'cup',48:'fork',49:'knife',50:'spoon',
                    51:'bowl',52:'banana',53:'apple',54:'sandwich',55:'orange',56:'broccoli',57:'carrot',58:'hot dog',59:'pizza',60:'donut',
                    61:'cake',62:'chair',63:'couch',64:'potted plant',65:'bed',66:'mirror',67:'dining table',68:'window',69:'desk',70:'toilet',
                    71:'door',72:'tv',73:'laptop',74:'mouse',75:'remote',76:'keyboard',77:'cell phone',78:'microwave',79:'oven',80:'toaster',
                    81:'sink',82:'refrigerator',83:'blender',84:'book',85:'clock',86:'vase',87:'scissors',88:'teddy bear',89:'hair drier',90:'toothbrush',
                    91:'hair brush'}

# !mkdir pretrained; cd pretrained
# !wget http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz
# !wget https://raw.githubusercontent.com/opencv/opencv_extra/master/testdata/dnn/faster_rcnn_resnet50_coco_2018_01_28.pbtxt
# cd faster_rcnn_resnet50_coco_2018_01_28; mv faster_rcnn_resnet50_coco_2018_01_28.pbtxt graph.pbtxt

import numpy as np
import tensorflow as tf
import cv2
import time
import matplotlib.pyplot as plt
%matplotlib inline


# Read the inference graph.
with tf.gfile.FastGFile('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    
with tf.Session() as sess:
    # Start a session and load the inference graph.
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')
    # The graph now lives inside the session, so graph information can be fetched through sess.
    
    # Read the input image and convert BGR to RGB.
    img = cv2.imread('../../data/image/beatles01.jpg')
    draw_img = img.copy()
    rows = img.shape[0]
    cols = img.shape[1]
    input_img = img[:, :, [2, 1, 0]]   # BGR -> RGB
    
    start = time.time()
    # Run object detection.
    # run: list the tensors to fetch, in order [number of detections, scores, box positions, classes].
    out = sess.run([sess.graph.get_tensor_by_name('num_detections:0'),
                    sess.graph.get_tensor_by_name('detection_scores:0'),
                    sess.graph.get_tensor_by_name('detection_boxes:0'),
                    sess.graph.get_tensor_by_name('detection_classes:0')],
                   feed_dict={'image_tensor:0': input_img.reshape(1, input_img.shape[0], input_img.shape[1], 3) } ) # the leading 1 is the batch dimension
    print('type of out:', type(out), 'length of out:',len(out))  # list(4) = [number of detections, scores, box positions, classes]
    print(out)
    green_color=(0, 255, 0)
    red_color=(0, 0, 255)
    
    # Visualize the bounding boxes.
    num_detections = int(out[0][0])
    for i in range(num_detections):
        classId = int(out[3][0][i])
        score = float(out[1][0][i])
        bbox = [float(v) for v in out[2][0][i]]
        if score > 0.5:
            left = bbox[1] * cols
            top = bbox[0] * rows
            right = bbox[3] * cols
            bottom = bbox[2] * rows
            cv2.rectangle(draw_img, (int(left), int(top)), (int(right), int(bottom)), green_color, thickness=2)
            caption = "{}: {:.4f}".format(labels_to_names[classId], score)
            print(caption)
            cv2.putText(draw_img, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, 1)
    
    print('Detection 수행시간:',round(time.time() - start, 2),"초")
    
img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

# The NMS filtering seems slightly off... the result looks a bit suspicious.
        

type of out: <class 'list'> length of out: 4
Contents below: [number of detections, scores, box positions, classes]

[array([19.], dtype=float32), 

array([[0.99974984, 0.99930644, 0.9980475 , 0.9970795 , 0.9222008 ,
        0.8515703 , 0.8055376 , 0.7321974 , 0.7169089 , 0.6350252 ,
        0.6057731 , 0.5482028 , 0.51252437, 0.46408176, 0.43892667,
        0.41287616, 0.4075464 , 0.39610404, 0.3171757 , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        .
        ...
        .
        .

        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ]],
      dtype=float32), 
      
array([[[0.40270284, 0.2723695 , 0.8693631 , 0.46764165],
        [0.40439418, 0.06080557, 0.88185936, 0.24013077],
        [0.40899867, 0.68438506, 0.9282361 , 0.9033634 ],
        [0.42774147, 0.4751278 , 0.8887425 , 0.7367553 ],
        [0.3681334 , 0.5855469 , 0.41420895, 0.6274197 ],
        [0.36090973, 0.7612593 , 0.46531847, 0.78825235],
        [0.35362682, 0.5422665 , 0.3779468 , 0.56790847],
        [0.35872525, 0.47497243, 0.37832502, 0.4952262 ],
        [0.39067298, 0.17564818, 0.54261357, 0.31135702],
        [0.3596046 , 0.6206162 , 0.4659364 , 0.7180736 ],
        [0.36052787, 0.7542875 , 0.45949724, 0.7803741 ],
        [0.35740715, 0.55126834, 0.38326728, 0.57657194],
        [0.36718863, 0.5769864 , 0.40654665, 0.61239254],
        [0.35574582, 0.4798463 , 0.37322614, 0.4985193 ],
        [0.35036406, 0.5329462 , 0.3708444 , 0.5514975 ],
        [0.367587  , 0.39456058, 0.41583234, 0.43441534],
        [0.3562084 , 0.47724184, 0.37217227, 0.49240994],
        [0.36195153, 0.6252996 , 0.46575055, 0.72400415],
        [0.36365557, 0.5674811 , 0.39475283, 0.59136254],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ],
        ...
        .
        .
        ...
        .
        [0.        , 0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        , 0.        ]]], dtype=float32), 
        
        
    array([[1., 1., 1., 1., 3., 1., 3., 3., 3., 8., 1., 3., 3., 3., 3., 3.,
        3., 3., 3., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1.]], dtype=float32)]

person: 0.9997
person: 0.9993
person: 0.9980
person: 0.9971
car: 0.9222
person: 0.8516
car: 0.8055
car: 0.7322
car: 0.7169
truck: 0.6350
person: 0.6058
car: 0.5482
car: 0.5125

Detection 수행시간: 12.99 초
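The loop above converts each normalized box `[ymin, xmin, ymax, xmax]` into pixel coordinates. A minimal sketch of that one step in isolation, using a hypothetical helper `to_pixel_box` together with the first box and the image shape (633, 806, 3) from the output above:

```python
def to_pixel_box(bbox, rows, cols):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to integer (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = bbox
    # x values scale with the image width (cols), y values with the height (rows).
    return int(xmin * cols), int(ymin * rows), int(xmax * cols), int(ymax * rows)


# First detection box from the output above; beatles01.jpg has shape (633, 806, 3).
box = [0.40270284, 0.2723695, 0.8693631, 0.46764165]
pixel_box = to_pixel_box(box, rows=633, cols=806)
print(pixel_box)
```

Note the order: the model returns y before x, while `cv2.rectangle` expects (x, y) points, which is why the indices look swapped in the loop.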

drawing

2. Wrapping the steps above in a function and using it

def get_tensor_detected_image(sess, img_array, use_copied_array):
    
    rows = img_array.shape[0]
    cols = img_array.shape[1]
    if use_copied_array:
        draw_img_array = img_array.copy()
    else:
        draw_img_array = img_array
    
    input_img = img_array[:, :, [2, 1, 0]]  # BGR2RGB

    start = time.time()
    # Run object detection.
    out = sess.run([sess.graph.get_tensor_by_name('num_detections:0'),
                    sess.graph.get_tensor_by_name('detection_scores:0'),
                    sess.graph.get_tensor_by_name('detection_boxes:0'),
                    sess.graph.get_tensor_by_name('detection_classes:0')],
                   feed_dict={'image_tensor:0': input_img.reshape(1, input_img.shape[0], input_img.shape[1], 3)})
    
    green_color=(0, 255, 0)
    red_color=(0, 0, 255)
    
    # Visualize the bounding boxes.
    num_detections = int(out[0][0])
    for i in range(num_detections):
        classId = int(out[3][0][i])
        score = float(out[1][0][i])
        bbox = [float(v) for v in out[2][0][i]]
        if score > 0.5:
            left = bbox[1] * cols
            top = bbox[0] * rows
            right = bbox[3] * cols
            bottom = bbox[2] * rows
            cv2.rectangle(draw_img_array, (int(left), int(top)), (int(right), int(bottom)), green_color, thickness=2)
            caption = "{}: {:.4f}".format(labels_to_names[classId], score)
            cv2.putText(draw_img_array, caption, (int(left), int(top - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, 1)
            #print(caption)
    print('Detection 수행시간:',round(time.time() - start, 2),"초")
    return draw_img_array
# end of function

Running image object detection with the function defined above

with tf.gfile.FastGFile('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    
with tf.Session() as sess:
    # Start a session and load the inference graph.
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')
    
    # Read the input image; the BGR-to-RGB conversion happens inside the function.
    img = cv2.imread('../../data/image/john_wick01.jpg')
    draw_img = get_tensor_detected_image(sess, img, True)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)

Detection 수행시간: 15.58 초

drawing

3. Video object detection with the function defined above

video_input_path = '../../data/video/John_Wick_small.mp4'
# On Linux the video output extension has to be avi.
video_output_path = '../../data/output/John_Wick_small_tensor01.avi'

cap = cv2.VideoCapture(video_input_path)

codec = cv2.VideoWriter_fourcc(*'XVID')

vid_size = (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
vid_fps = cap.get(cv2.CAP_PROP_FPS)
    
vid_writer = cv2.VideoWriter(video_output_path, codec, vid_fps, vid_size) 

frame_cnt = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print('Total frames:', frame_cnt)

green_color=(0, 255, 0)
red_color=(0, 0, 255)

# Read the inference graph.
with tf.gfile.FastGFile('./pretrained/faster_rcnn_resnet50_coco_2018_01_28/frozen_inference_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    
with tf.Session() as sess:
    # Start a session and load the inference graph.
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')

    while True:
        hasFrame, img_frame = cap.read()
        if not hasFrame:
            print('No more frames to process.')
            break

        draw_img_frame = get_tensor_detected_image(sess, img_frame, False)
        vid_writer.write(draw_img_frame)
    # end of while loop

vid_writer.release()
cap.release()  

!gsutil cp ../../data/output/John_Wick_small_tensor01.avi gs://my_bucket_dlcv/data/output/John_Wick_small_tensor01.avi

drawing

【Vision】 Selective Search Python Module and Writing IOU Calculation Code

11 Aug 2020 in Artificial Intelligence

Let's try out the Selective Search Python module and apply IOU.

Trying the Selective Search Python module: see the file DLCV/Detection/preliminary/Selective_search와 IOU구하기

conda activate tf113
Start jupyter notebook - $ nohup jupyter notebook &

1. Selective Search hands-on

If import cv2 fails with ImportError: libGL.so.1: cannot open shared object file: No such file or directory,
run $ sudo apt-get install libgl1-mesa-glx.

#!pip install selectivesearch

import selectivesearch
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

### Load the Audrey Hepburn image with cv2 and visualize it with matplotlib
img_bgr = cv2.imread('../../data/image/audrey01.jpg')
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
print('img shape:', img_bgr.shape)

plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
plt.show()

img shape: (450, 375, 3)

selectivesearch.selective_search() returns the region proposals for an image.
The selectivesearch module is used as shown below.
The parameters are (image, scale = roughly how large the objects are, which tunes the algorithm, min_size = only propose bounding boxes whose area is larger than this).

_, regions = selectivesearch.selective_search(img_rgb, scale=100, min_size=2000)
print(type(regions), len(regions))
print(regions[0])
print(regions[1])
# (top-left x1, y1, width, height)  (bounding box size)  (one label means a standalone region; two or more mean a region merged from those labels)

<class 'list'> 41
{'rect': (0, 0, 107, 167), 'size': 11166, 'labels': [0.0]}
{'rect': (15, 0, 129, 110), 'size': 8771, 'labels': [1.0]}

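The returned regions variable is a list whose elements are dictionaries.
Meaning of each key in a dictionary
- rect holds the top-left x, y coordinates plus the width and height; this is the bounding box of a candidate detected object.
- size is the size of the region covered by the bounding box.
- labels holds the unique IDs of the objects inside the bounding box given by rect. Further down the list the boxes get wider and taller, and a single bounding box is a merge of several boxes, so it is likely to contain several objects.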
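Because `rect` is stored as (x, y, width, height) while drawing and IOU code usually want corner coordinates, a small conversion helper is handy. This is a minimal sketch: `rect_to_corners` is a hypothetical name, the two regions are copied by hand from the output above, and the size threshold of 10000 is only an example value.

```python
def rect_to_corners(rect):
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = rect
    return (x, y, x + w, y + h)


# The two regions printed above, copied by hand.
regions_sample = [
    {'rect': (0, 0, 107, 167), 'size': 11166, 'labels': [0.0]},
    {'rect': (15, 0, 129, 110), 'size': 8771, 'labels': [1.0]},
]

# Keep only regions larger than a size threshold, then convert them to corners.
large_boxes = [rect_to_corners(r['rect']) for r in regions_sample if r['size'] > 10000]
print(large_boxes)
```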

# Print only the rect information (top-left x1, y1, width, height)
cand_rects = [box['rect'] for box in regions]
print(cand_rects)

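Visualizing the bounding boxes

# Draw rectangles on the image with OpenCV's rectangle().
# rectangle() takes the image, the top-left corner, the bottom-right corner, the box color, the thickness, and so on, and draws the box on the image passed in.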

green_rgb = (125, 255, 51)
img_rgb_copy = img_rgb.copy()
for rect in cand_rects:

    left = rect[0]
    top = rect[1]
    # rect[2] and rect[3] are the width and height, so add them to the top-left corner to get the bottom-right corner.
    right = left + rect[2]
    bottom = top + rect[3]

    img_rgb_copy = cv2.rectangle(img_rgb_copy, (left, top), (right, bottom), color=green_rgb, thickness=2)
    # Rebind the variable to the image with the box drawn on it.

plt.figure(figsize=(8, 8))
plt.imshow(img_rgb_copy)
plt.show()

Extracting only the candidates with large bounding boxes
- Same code as just above, except that size is also taken into account

cand_rects = [cand['rect'] for cand in regions if cand['size'] > 10000]

green_rgb = (125, 255, 51)
img_rgb_copy = img_rgb.copy()
for rect in cand_rects:

    left = rect[0]
    top = rect[1]
    # rect[2] and rect[3] are the width and height, so add them to the top-left corner to get the bottom-right corner.
    right = left + rect[2]
    bottom = top + rect[3]

    img_rgb_copy = cv2.rectangle(img_rgb_copy, (left, top), (right, bottom), color=green_rgb, thickness=2)

plt.figure(figsize=(8, 8))
plt.imshow(img_rgb_copy)
plt.show()

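2. Applying IOU

Computing IOU

Create a function that takes a candidate box and the ground-truth box as arguments and computes the IOU

import numpy as np

# Each input holds (x1, y1, x2, y2): the top-left and bottom-right corners of the box.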
def compute_iou(cand_box, gt_box):

    # Calculate intersection areas
    x1 = np.maximum(cand_box[0], gt_box[0])
    y1 = np.maximum(cand_box[1], gt_box[1])
    x2 = np.minimum(cand_box[2], gt_box[2])
    y2 = np.minimum(cand_box[3], gt_box[3])

    intersection = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0) # clamp at 0 in case the boxes do not overlap

    cand_box_area = (cand_box[2] - cand_box[0]) * (cand_box[3] - cand_box[1])
    gt_box_area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = cand_box_area + gt_box_area - intersection

    iou = intersection / union
    return iou

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# Assume the ground-truth box has the coordinates below.
gt_box = [60, 15, 320, 420]

img = cv2.imread('../../data/image/audrey01.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

red = (255, 0 , 0)
img_rgb = cv2.rectangle(img_rgb, (gt_box[0], gt_box[1]), (gt_box[2], gt_box[3]), color=red, thickness=2)

plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
plt.show()

import selectivesearch

# selectivesearch.selective_search() returns the region proposals for the image
_, regions = selectivesearch.selective_search(img_rgb, scale=100, min_size=2000)

print(type(regions), len(regions))

<class 'list'> 53

cand_rects = [cand['rect'] for cand in regions if cand['size'] > 15000]
# Convert cand_box from (top-left x1, y1, width, height) to (top-left, bottom-right) coordinates as well.
for index, cand_box in enumerate(cand_rects):
    cand_box = list(cand_box)
    cand_box[2] += cand_box[0]
    cand_box[3] += cand_box[1]

    # Compute the IOU for each box
    iou = compute_iou(cand_box, gt_box)
    print('index:', index, "iou:", iou)

index: 0 iou: 0.5933903133903133
index: 1 iou: 0.20454890788224123
index: 2 iou: 0.5958024691358025
index: 3 iou: 0.5958024691358025
index: 4 iou: 0.1134453781512605
index: 5 iou: 0.354069104098905
index: 6 iou: 0.1134453781512605
index: 7 iou: 0.3278419532685744
index: 8 iou: 0.3837088388214905
index: 9 iou: 0.3956795484151107
index: 10 iou: 0.5008648690956052
index: 11 iou: 0.7389566501483806
index: 12 iou: 0.815085997397344
index: 13 iou: 0.6270619201314865
index: 14 iou: 0.6270619201314865
index: 15 iou: 0.6270619201314865

Using the code just above, let's draw the image together with the boxes and their IOU values.

img = cv2.imread('../../data/image/audrey01.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print('img shape:', img.shape)

green_rgb = (125, 255, 51)
cand_rects = [cand['rect'] for cand in regions if cand['size'] > 3000]
gt_box = [60, 15, 320, 420]
img_rgb = cv2.rectangle(img_rgb, (gt_box[0], gt_box[1]), (gt_box[2], gt_box[3]), color=red, thickness=2)

for index, cand_box in enumerate(cand_rects):

    cand_box = list(cand_box)
    cand_box[2] += cand_box[0]
    cand_box[3] += cand_box[1]

    iou = compute_iou(cand_box, gt_box)

    if iou > 0.7:
        print('index:', index, "iou:", iou, 'rectangle:',(cand_box[0], cand_box[1], cand_box[2], cand_box[3]) )
        cv2.rectangle(img_rgb, (cand_box[0], cand_box[1]), (cand_box[2], cand_box[3]), color=green_rgb, thickness=1)
        text = "{}: {:.2f}".format(index, iou)
        cv2.putText(img_rgb, text, (cand_box[0]+ 100, cand_box[1]+10), cv2.FONT_HERSHEY_SIMPLEX, 0.4, color=green_rgb, thickness=1)

plt.figure(figsize=(12, 12))
plt.imshow(img_rgb)
plt.show()

img shape: (450, 375, 3)
index: 3 iou: 0.9874899187876287 rectangle: (59, 14, 321, 421)
index: 4 iou: 0.9748907882241216 rectangle: (62, 17, 318, 418)
index: 43 iou: 0.7389566501483806 rectangle: (63, 0, 374, 449)
index: 44 iou: 0.815085997397344 rectangle: (16, 0, 318, 418)

【Vision】 Detection and Segmentation Revisited 1 - Lineage, Overview, and mAP

10 Aug 2020 in Artificial Intelligence

I tend to take these for granted, but there is still a lot to learn, so I plan to revisit and organize the concepts of Detection and Segmentation.

Detection and Segmentation Revisited 1

1. Lineage of object detection

Deep learning came to the fore with AlexNet (winner of the 2012 ImageNet challenge), at a time when detection work was built around the Pascal VOC data.
Strictly speaking, the tasks around detection can be divided as follows.
- Classification
- Localization
- Detection : Bounding Box Regression + Classification
- Segmentation : Bounding Box Regression + Pixel Level Classification
Lineage
- Traditional detection algorithms - VJ det, HOG det …
- Deep learning detection algorithms - 1 Stage Detection, 2 Stage Detection (Region proposal + Classification)
- SSD -> Yolo2, Yolo3
- Retina-Net : slightly slower than Yolo3 in real-time use, but more accurate.
Detection APIs are well established, but training throws many errors and is not easy.

2. Object Detection

Components
- Region Proposal
- Feature Extraction & Network Prediction (Deep Neural Network)
- IOU / NMS / mAP / Anchor box
Challenges
- Several objects in a single image have to be localized and classified
- Objects appear at multiple scales.
- Real-time speed and accuracy both have to be achieved.
- Objects may be occluded or only partly visible
- Shortage of training datasets (MS COCO, Google Open Image, and so on exist.)

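3. Overview of object localization

Consider localization within the overall flow shown below.
- Find the four values in the picture above: (x1, y1, width, height). The goal is to find the coordinates where an object is likely to be.
- Object localization prediction output
  - class number, confidence score, x1, y1, width, height
- Detecting two or more objects
  - Sliding window: slide an anchor (window) across the image and keep checking whether an object is present in that part. Windows of various sizes and aspect ratios are used.
    Alternatively, shrink the image step by step and slide a relatively large window over it. (This can be seen as the origin of FPN.)
  - Region proposal: instead of the approach above, use an algorithm to find locations likely to contain an object.
    1. Selective Search: faster than the window approach and gave fairly accurate proposals. It finds similar regions pixel by pixel according to {color, texture, shape}. It starts by proposing roughly 200 regions per image, then repeats a grouping step that merges similar regions to narrow them down to suitable ones. (It performs over-segmentation with a graph-based segmentation technique based on pixel intensity.)
    2. RPN (Region Proposal Network)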
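To get a feel for why the sliding-window approach is expensive, the sketch below enumerates window positions for a couple of window shapes. It is a minimal illustration only: `sliding_windows` is a hypothetical helper, and the image size, window sizes, and stride are arbitrary example values.

```python
def sliding_windows(image_w, image_h, window_w, window_h, stride):
    """Yield (x1, y1, x2, y2) for every window position that fits inside the image."""
    for top in range(0, image_h - window_h + 1, stride):
        for left in range(0, image_w - window_w + 1, stride):
            yield (left, top, left + window_w, top + window_h)


# Two window shapes slid over a 256x128 image with a stride of 64 pixels.
windows = []
for (w, h) in [(64, 64), (128, 64)]:
    windows.extend(sliding_windows(256, 128, w, h, stride=64))

print(len(windows))
print(windows[:3])
```

Even this tiny setup yields 14 candidate windows; realistic image sizes, strides, and aspect ratios multiply that quickly, which is the motivation for region proposal methods.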

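4. Essential components of object detection

IOU: for two bounding boxes, (intersection / union)
NMS (Non-Max Suppression): the detector emits every region that might contain an object, so several region proposals can cover nearly the same area. Among bounding boxes whose IOU exceeds a given value, every box except the one with the highest confidence score is suppressed.
1. Remove every box whose confidence score is 0.5 or lower
2. Sort the boxes by confidence score in descending order
3. Starting from the box with the highest confidence score, examine every other box that overlaps it and remove those whose IOU is at or above a threshold (for example an IOU threshold of 0.4; [note] the lower this value, the more boxes are removed.)
4. Keep only the remaining boxes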
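The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation any particular framework uses: `iou` and `nms` are hypothetical helpers, boxes are assumed to be (x1, y1, x2, y2), and the sample boxes and scores are made up.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.4):
    """Return the indices of the kept boxes, highest score first."""
    # Step 1: drop low-confidence boxes. Step 2: sort by score, highest first.
    order = sorted((i for i, s in enumerate(scores) if s > score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Step 3: remove boxes that overlap the current best box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    # Step 4: whatever survived is the result.
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IOU of about 0.68 and is suppressed, the third box does not overlap anything and is kept, and the fourth is dropped by the score threshold.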

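5. Essential Performance Metric mAP (mean Average Precision)

Inference time matters, but so does the AP score.
Precision (measured against the predictions) and Recall (measured against the ground truth) are related as follows: Precision = TP / (TP + FP), Recall = TP / (TP + FN).
For another explanation, see here (reference: previous post).
A model can be called good only when Precision and Recall are both reasonably high.
Pascal VOC - IOU: 0.5 // COCO challenge - IOU: 0.50 to 0.95 in steps of 0.05
How to classify TN, FN, FP, TP
If cancer is judged not to be cancer, or fraud is judged not to be fraud, serious problems follow, so ‘**Recall**, judging the real thing as real,’ is what matters. (FN causes the serious problem.) Conversely, if a mail that is not spam is judged to be spam, serious problems follow, so ‘**Precision**, what I judged as real actually being real,’ is what matters. (FP causes the serious problem.)
This trade-off can be tuned with the Confidence Threshold.
Ex. Lowering the Confidence Threshold raises Recall. (Nearly everything is judged Positive.)
Raising the Confidence Threshold raises Precision. (Only the really certain cases are predicted Positive.)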
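To make the effect of the Confidence Threshold concrete, here is a small sketch that computes Precision and Recall at two thresholds. The detections, their TP/FP labels, and the helper name `precision_recall` are made up for illustration; returning a precision of 1.0 when nothing is predicted is just a convention assumed here.

```python
def precision_recall(scores, is_true_positive, num_ground_truth, threshold):
    """Precision and recall when only detections with score >= threshold are kept."""
    kept = [tp for s, tp in zip(scores, is_true_positive) if s >= threshold]
    tp = sum(kept)            # detections that match a ground-truth object
    fp = len(kept) - tp       # detections that match nothing
    precision = tp / (tp + fp) if kept else 1.0  # assumed convention for "no predictions"
    recall = tp / num_ground_truth
    return precision, recall


# Hypothetical detections for one class; the image set contains 5 real objects.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
is_tp = [True, True, False, True, False, True]
num_gt = 5

p_high, r_high = precision_recall(scores, is_tp, num_gt, threshold=0.85)
p_low, r_low = precision_recall(scores, is_tp, num_gt, threshold=0.25)
print(p_high, r_high)
print(p_low, r_low)
```

With the high threshold only the two most confident detections remain, so Precision is 1.0 but Recall is only 0.4. With the low threshold all six remain, Recall rises to 0.8, and Precision falls to about 0.67.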
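AP is built on the graph of how Precision and Recall change with Confidence: as shown here (reference: previous post), sort the detections in descending order of Confidence, plot the (Recall, Precision) points one at a time, and take the area under that curve.
Because we start from the highest Confidence Threshold, the curve starts with high Precision and low Recall. Note that at each point the curve is drawn using the maximum Precision value found to its right, and AP is computed from that curve!
That is how AP is obtained. In other words, AP is a value computed for a single object class. (reference: previous post) Computing AP for every object class (bird, butterfly, car, person, and so on) and taking the mean gives mAP.
For the COCO dataset the IOU Threshold is varied (AP@[.50:.05:.95]), so keep in mind that mAP can come out low at the high IOU settings. (A high IOU Threshold means it is harder for a detection to count as TP rather than FP.)
COCO also measures mAP separately by object scale: large / medium / small.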
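The AP procedure described above (sort by confidence, trace the Recall/Precision points, take the maximum Precision to the right, measure the area) can be sketched as follows. This uses the all-point interpolation variant, which is an assumption: the post does not say which variant it means, and COCO's official evaluation samples the curve at fixed recall points instead. The function names and sample data are hypothetical, and each detection is assumed to be already labelled TP or FP at a fixed IOU threshold.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class from scored detections already labelled TP/FP."""
    # Visit detections from highest to lowest confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Replace each precision with the maximum precision found to its right.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """Mean of the per-class APs; per_class maps a class name to the AP inputs."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical detections for two classes.
per_class = {
    "person": ([0.95, 0.90, 0.80, 0.60, 0.40, 0.30],
               [True, True, False, True, False, True], 5),
    "car": ([0.9, 0.5], [True, False], 1),
}
ap_person = average_precision(*per_class["person"])
ap_car = average_precision(*per_class["car"])
print(ap_person, ap_car, mean_average_precision(per_class))
```

For a COCO-style number, this whole calculation would be repeated with the TP/FP labels recomputed at each IOU threshold from 0.50 to 0.95 and the results averaged.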

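【VScode】 Notes on Solving a Prettier Problem

06 Aug 2020 in Ubuntu / Language / Algorithm

A record of the process of solving a Prettier problem

Problem

Prettier does not work on files in the junha1125.github.io folder

Troubleshooting via Googling

With setting - format save turned on, the error shown in the picture above appears.
However, in VScode on Windows, Prettier works normally when working in other Windows folders. Both md and py files are formatted correctly.
Formatting fails with the error above only in the junha1125.github.io folder.
setting - Prettier : Prettier path was supposed to be filled in with the result of running npm -global~, so I did that. The error above no longer appears, but Prettier still does not work. (Advice from this site)
I deleted the junha1125.github.io folder completely and moved it from the C drive to the onedrive location inside ~/user/sb020. It still does not work.
In WSL, however, everything works. It works in any folder and with any file type.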

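Conclusion

Prettier works fine in any folder and on any file, on Windows and in WSL.
With exactly one exception! It does not work when the folder is opened via VScode - Open Folder - junha1125.github.io. (On Windows or WSL alike.)
So to get Prettier formatting while editing an md file, open just that single md file in VScode.
Alternatively, working from the _post folder also gives proper Prettier formatting.

Solution

To get Prettier formatting while editing an md file, open just that single md file in VScode.
The error still appears, though. (It works fine, so why does the same error show up??????)
Aha! After changing the Prettier Path as the linked site above suggests, the error is gone and it works well (although the Working Folder is a little unsettling…)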

Additional tips

How to run Powershell, WSL, or git Bash inside VS Code.
1. Click Default Shell -> select the shell you want
2. VScode - Terminal - New Terminal
3. The shell you picked earlier opens.
In practice, editing md files with Prettier enabled turned out to be too inconvenient, so I turned the Format Save setting off.

Problem solved.

