【Keras】Keras기반 Mask-RCNN - 오픈 소스 코드로 Inference

03 Sep 2020 in Pytorch / Docker / Git

Keras 기반 Mask-RCNN를 이용해 Inference 를 수행해보자

/DLCV/Segmentation/mask_rcnn/Matterport패키지를_이용한_Segmentation.ipynb 파일 참조
Keras 기반, Mask RNN위한 최고의 오픈소스 코드 사용
matterport는 3D비전 회사

1. matterport/Mask_RCNN의 특징

장점
- 최적화 된 Mask RCNN 알고리즘으로 보다 뛰어난 성능
- 전 세계에서 다양한 분야에서 사용 중
- 다양한 편의 기능을 제공 및 이해를 위한 여러 샘플 코드 및 Document 제공
- Mask Rcnn, Faster Rcnn을 이해하는데도 큰 도움을 주는 코드
- 쉬운 Config 설정
단점
- Coco Dataset 포맷을 기반의 Generator를 사용해서, 원하는 데이터를 COCO 형태로 변환 필요
- voc(xml)는 coco(json)로 바꾸면 된다.
- (너무 친절해서) 자체 시각화 기능이 Matplotlib으로 되어 있어서 영상 Segmentation시 costomization 필요
- COCO 데이터세트를 기반으로 하는 Evaluation을 한다. 따라서 자체적은 Evalution은 없고, COCO data의 PycocoTools Evaluation API을 사용한다.
여러가지 프로젝트에서 이 페키지를 사용한 다양한 응용 에플리케이션 등이 있다.
꾸준히 업그레이드가 되고 있다.
시간을 충분히 투자해서 공부해도 좋은 페키지이다.
제공되는 많은 sample 코드들을 공부하면 많은 도움이 된다.

2. matterport/Mask_RCNN으로 Inference 수행하기

Keras 2.2 version, Tensorflow 1.13 을 사용한다.
코랩 사용한다면, 코랩 전용 압축 파일 참고해서 matterport/Mask_RCNN패키지의 main.py파일 바꾸기

설치하기

  $ git clone https://github.com/matterport/Mask_RCNN.git
  $ cd Mask-RCNN
  $ pip install -r requirements.txt
  $ python setup.py install

Setup을 이렇게 하고 나면, ./mrcnn/model.py ./mrcnn/utils.py 를 import해서 사용할 수 있게 된다.

./mrcnn 내부의 파일 알아보기
- config.py : 환경설정의 Defalut 값을 가진다(모두 대문자)
- parallel_model.py : GPU 2개 이상 사용할 때
- model.py : 모델에 관련된 핵심적인 함수, 클래스들이 들어 있다.
- utils.py : coco dataset generater 를 사용한다.
- visualization.py : 시각화를 위한 편리한 기능.(너무 편리하게 해놔서 문제), matplotlib, skimage를 사용하기 때문에, 영상 처리 시각화를 위해서 customization이 필요하다.

3. sample을 이용해서 코드를 이해해보자.

/DLCV/Segmentation/mask_rcnn/Matterport패키지를_이용한_Segmentation.ipynb 파일 참조

1. Matterport 패키지 모듈 및 클래스 활용

pretrained coco 모델을 로딩 후 단일 이미지와 영상 Segmentation 수행.

import os
import sys
import random
import math
import numpy as np
import cv2

drawing

python setup.py install 함으로써 mrcnn이 우리 python의 모듈이 되었다. import numpy하는 것처럼, import mncnn을 해서 사용할 수 있게 되었다.
class.__dict__, object.__dict__, module.__dict__ 를 사용하면 내부의 맴버 함수나 클래스, 객체, 모듈에 대한 정보를 알 수 있다. 참고 stackoverflow

from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

Matterport로 pretrained된 coco weight 모델을 다운로드함(최초시)
utils.download_trained_weights에 있음

# URL from which to download the latest COCO trained weights
COCO_MODEL_URL = "https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5"
with urllib.request.urlopen(COCO_MODEL_URL) as resp, open(coco_model_path, 'wb') as out:
        shutil.copyfileobj(resp, out)

from mrcnn import utils

ROOT_DIR = os.path.abspath('.')

# 최초에는 coco pretrained 모델을 다운로드함. 
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "./pretrained/mask_rcnn_coco.h5")

if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

!ls ~/DLCV/Segmentation/mask_rcnn/pretrained

mask_rcnn_coco.h5
mask_rcnn_inception_v2_coco_2018_01_28
mask_rcnn_inception_v2_coco_2018_01_28.tar.gz

MASK RCNN 모델을 위한 Config 클래스의 객체 생성 설정
mrcnn/utile/config.py
config 객체 생성 1 방법

from mrcnn.config import Config

infer_config = Config()
infer_config.BATCH_SIZE=1
infer_config.display()   # config 정보들이 쭉 print된다.

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 2
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                13
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           None
NUM_CLASSES                    1
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001

config 객체 생성 2 방법

# Config 클래스를 상속받아서 사용
from mrcnn.config import Config

#환경 변수는 모두 대문자 
class InferenceConfig(Config):
    # inference시에는 batch size를 1로 설정. 그리고 IMAGES_PER_GPU도 1로 설정. 
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1 # BATCH_SIZE=1 과 같은 효과
    # NAME은 반드시 주어야 한다. 
    NAME='coco_infer'  # NAME을 꼭 주어야 한다.
    NUM_CLASSES=81     # Background 0번 + coco 80개 
    

infer_config = InferenceConfig()
# infer_config.display()

COCO ID와 클래스명 매핑

# matterport는 0을 Background로, 1부터 80까지 coco dataset 클래스 id/클래스 명 매핑. 
labels_to_names = {0:'BG',1: 'person',2: 'bicycle',3: 'car',4: 'motorbike',5: 'aeroplane',6: 'bus',7: 'train',8: 'truck',9: 'boat',10: 'traffic light',
                   11: 'fire hydrant',12: 'stop sign',13: 'parking meter',14: 'bench',15: 'bird',16: 'cat',17: 'dog',18: 'horse',19: 'sheep',20: 'cow',
                   21: 'elephant',22: 'bear',23: 'zebra',24: 'giraffe',25: 'backpack',26: 'umbrella',27: 'handbag',28: 'tie',29: 'suitcase',30: 'frisbee',
                   31: 'skis',32: 'snowboard',33: 'sports ball',34: 'kite',35: 'baseball bat',36: 'baseball glove',37: 'skateboard',38: 'surfboard',39: 'tennis racket',40: 'bottle',
                   41: 'wine glass',42: 'cup',43: 'fork',44: 'knife',45: 'spoon',46: 'bowl',47: 'banana',48: 'apple',49: 'sandwich',50: 'orange',
                   51: 'broccoli',52: 'carrot',53: 'hot dog',54: 'pizza',55: 'donut',56: 'cake',57: 'chair',58: 'sofa',59: 'pottedplant',60: 'bed',
                   61: 'diningtable',62: 'toilet',63: 'tvmonitor',64: 'laptop',65: 'mouse',66: 'remote', 67: 'keyboard',68: 'cell phone',69: 'microwave',70: 'oven',
                   71: 'toaster',72: 'sink',73: 'refrigerator',74: 'book',75: 'clock',76: 'vase',77: 'scissors',78: 'teddy bear',79: 'hair drier',80: 'toothbrush' }

2. Inferecne 모델 구축 및 pretrained 사용

아래 과정 정리
1. mrcnn.model.MaskRCNN Class를 사용해서 객체를 생성한다.
2. 기본적인 신경망 모델 객체가 생성됐다. weigts import 안함
  (3. InferenceConfig의 Name 설정안하면 에러 발생.)
3. model object 생성!
4. 이제 이 model을 가지고
  model.detect([cv2 read img])하면 inference 수행 끝!

# MS-COCO 기반으로 Pretrained 된 모델을 로딩
# snapshots 폴더 만들어 놓아야 함.
# model_dir는 보통 epoch마다 파일 저장하는 폴더 인데... inference랑 상관없지만..
import mrcnn.model as modellib

MODEL_DIR = os.path.join(ROOT_DIR,'snapshots') 
print(MODEL_DIR)
model = modellib.MaskRCNN(mode="inference",  model_dir=MODEL_DIR, config=infer_config) # snapshots에는 아무 지솓 안하니 걱정 ㄴ

model.load_weights(COCO_MODEL_PATH, by_name=True)

/home/sb020518/DLCV/Segmentation/mask_rcnn/snapshots

3. Inference 수행

import matplotlib.pyplot as plt
%matplotlib inline 

beatles_img = cv2.imread('../../data/image/beatles01.jpg')
# matterport는 내부적으로 image처리를 위해 skimage를 이용하므로 BGR2RGB처리함. 
beatles_img_rgb = cv2.cvtColor(beatles_img, cv2.COLOR_BGR2RGB)
 
results = model.detect([beatles_img_rgb], verbose=1)
# verbose를 하면 정보를 출력해준다.

Processing 1 images
image                    shape: (633, 806, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32

print(type(results))
print(len(results))
print(type(results[0]))
print(results[0].keys())
print(type(results[0]['class_ids']))
print(results[0]['class_ids'].shape) # 18개의 객체 발견
# 각각 Region of interest, class ID, Confidence, Mask 정보

<class 'list'>
1
<class 'dict'>
dict_keys(['rois', 'class_ids', 'scores', 'masks'])
<class 'numpy.ndarray'>
(18,)

# results[0]['masks']는 object 별로 mask가 전체 이미지에 대해서 layered된 image 배열을 가지고 있음.  
results[0]['rois'].shape, results[0]['scores'].shape, results[0]['class_ids'].shape, results[0]['masks'].shape

((18, 4), (18,), (18,), (633, 806, 18))

result 정보 분석하기
- (18, 4) : 18개의 bounding box 좌상단 우하단
- (18,) : 18개의 클래스 ID
- (18,) : 18개의 confidence
- (633, 806, 18) : input 이미지가 (633, 806, 3)였다. (633, 806)에 대한 false true 정보가 담겨 있다
이 정보를 visualize 함수로 쉽게 시각화 할 수 있다.
- visualize.display_instances 코드 확인

지금까지의 모델 중 가장 안정성과 성능이 뛰어난 것을 확인할 수 있다. 아주 최적화가 잘 된 패키지이다. 미친 성능이다.

from mrcnn import visualize

r = results[0]
class_names = [value for value in labels_to_names.values()]
visualize.display_instances(beatles_img_rgb, r['rois'], r['masks'], r['class_ids'], 
                            class_names, r['scores'])

drawing

4. Inference 수행 작업만 함수화

import time

wick_img = cv2.imread('../../data/image/john_wick01.jpg')
wick_img_rgb = cv2.cvtColor(wick_img, cv2.COLOR_BGR2RGB)

def get_segment_result(img_array_list, verbose):
    
    start_time = time.time()
    results = model.detect(img_array_list, verbose=1)
    
    if verbose==1:
        print('## inference time:{0:}'.format(time.time()-start_time))
    
    return results

r = get_segment_result([wick_img_rgb], verbose=1)[0]
visualize.display_instances(wick_img_rgb, r['rois'], r['masks'], r['class_ids'], 
                            class_names, r['scores'])

cpu로 24초 소요. P100으로 0.1ch thdy

drawing

5. 위의 함수로 Video에 Segmentation 활용

MaskRCNN 패키지는 visualize.display_instances() 함수내부에서 matplotlib를 이용하여 자체 시각화를 수행.
따라서 우리는 이미지를 결과로 얻고 싶은데 matplotlib를 그냥 해버리니 우리가 직접 이미지 결과 추출 함수 구현
visualize.display_instances() 코드 보기 - 처음부터 _, ax = plt.subplots(1, figsize=figsize) 객체를 선언해서 쭈욱 그려나가는 모습을 볼 수 있다.
이 코드의 대부분을 차용해서 아래의 get_segmented_image 함수를 정의한다.
visualize.display_instances()에서는 matplot.show으로 함수를 마무리 하지만, get_segmented_image는 최종으로 그려진 이미지를 return한다.
visualize.display_instances()에서는 from matplotlib.patches import Polygon 으로 함수를 사용 하지만, get_segmented_image는 cv2.polylines를 사용한다.

from mrcnn.visualize import *
import cv2

def get_segmented_image(img_array, boxes, masks, class_ids, class_names,
                      scores=None, show_mask=True, show_bbox=True, colors=None, captions=None):
   
    # Number of instances
    N = boxes.shape[0]
    if not N:
        print("\n*** No instances to display *** \n")
    else:
        assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
    

    # Generate random colors
    colors = colors or random_colors(N)

    # Show area outside image boundaries.
    height, width = img_array.shape[:2]

    masked_image = img_array.astype(np.uint32).copy()

    for i in range(N):
        color = np.array(colors[i])*255
        color = color.tolist()

        # Bounding box 그리기
        if not np.any(boxes[i]):
            # Skip this instance. Has no bbox. Likely lost in image cropping.
            continue
        y1, x1, y2, x2 = boxes[i]
        
        if show_bbox:
            cv2.rectangle(img_array, (x1, y1), (x2, y2), color, thickness=1 )

        # Label 표시하기
        if not captions:
            class_id = class_ids[i]
            score = scores[i] if scores is not None else None
            label = class_names[class_id]
            caption = "{} {:.3f}".format(label, score) if score else label
        else:
            caption = captions[i]
            
        cv2.putText(img_array, caption, (x1, y1+8), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), thickness=1)
        
        # Mask 정보 그려 넣기
        # 클래스별 mask 정보를 추출 
        mask = masks[:, :, i]
        if show_mask:
            # visualize 모듈의 apply_mask()를 적용하여 masking 수행. 
            # from mrcnn.visualize import * 모든 함수 import
            img_array = apply_mask(img_array, mask, color)
            
            # mask에 contour 적용. 
            padded_mask = np.zeros(
                            (mask.shape[0] + 2, mask.shape[1] + 2), dtype=np.uint8)
            padded_mask[1:-1, 1:-1] = mask
            contours = find_contours(padded_mask, 0.5)
            for verts in contours:
                # padding 제거. 아래에서 verts를 32bit integer로 변경해야 polylines()에서 오류 발생하지 않음. 
                verts = verts.astype(np.int32)
                #x, y 좌표 교체
                verts = np.fliplr(verts) - 1
                cv2.polylines(img_array, [verts], True, color, thickness=1)
    
    return img_array

단일 IMAGE에 적용

import time

wick_img = cv2.imread('../../data/image/john_wick01.jpg')
wick_img_rgb = cv2.cvtColor(wick_img, cv2.COLOR_BGR2RGB)

r = get_segment_result([wick_img_rgb], verbose=1)[0]
segmented_img = get_segmented_image(wick_img_rgb, r['rois'], r['masks'], r['class_ids'], 
                                    class_names, r['scores'])

plt.figure(figsize=(16, 16))
plt.imshow(segmented_img)

Video Segmentation 적용

import time

video_input_path = '../../data/video/John_Wick_small.mp4'
# video output 의 포맷은 avi 로 반드시 설정 필요. 
video_output_path = '../../data/output/John_Wick_small_matterport01.avi'

cap = cv2.VideoCapture(video_input_path)
codec = cv2.VideoWriter_fourcc(*'XVID')
fps = round(cap.get(cv2.CAP_PROP_FPS))

vid_writer = cv2.VideoWriter(video_output_path, codec, fps, (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))

total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print("총 Frame 개수: {0:}".format(total))

frame_index = 0
while True:
    
    hasFrame, image_frame = cap.read()
    if not hasFrame:
        print('End of frame')
        break
    
    frame_index += 1
    print("frame index:{0:}".format(frame_index), end=" ")
    r = get_segment_result([image_frame], verbose=1)[0]
    segmented_img = get_segmented_image(image_frame, r['rois'], r['masks'], r['class_ids'], 
                                    class_names, r['scores'])
    vid_writer.write(segmented_img)
    
vid_writer.release()
cap.release()       

!gsutil cp ../../data/output/John_Wick_small_matterport01.avi gs://my_bucket_dlcv/data/output/John_Wick_small_matterport01.avi

drawing

다른 동영상에 적용.

video_input_path = '../../data/video/London_Street.mp4'
# video output 의 포맷은 avi 로 반드시 설정 필요. 
video_output_path = '../../data/output/London_Street_matterport01.avi'

cap = cv2.VideoCapture(video_input_path)
codec = cv2.VideoWriter_fourcc(*'XVID')
fps = round(cap.get(cv2.CAP_PROP_FPS))
vid_writer = cv2.VideoWriter(video_output_path, codec, fps, (round(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))

total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print("총 Frame 개수: {0:}".format(total))

import time

frame_index = 0
while True:
    
    hasFrame, image_frame = cap.read()
    if not hasFrame:
        print('End of frame')
        break
    
    frame_index += 1
    print("frame index:{0:}".format(frame_index), end=" ")
    r = get_segment_result([image_frame], verbose=1)[0]
    segmented_img = get_segmented_image(image_frame, r['rois'], r['masks'], r['class_ids'], 
                                    class_names, r['scores'])
    vid_writer.write(segmented_img)
    
vid_writer.release()
cap.release()       

!gsutil cp ../../data/output/London_Street_matterport01.avi gs://my_bucket_dlcv/data/output/London_Street_matterport01.avi

drawing

【Keras】Keras기반 Mask-RCNN - 오픈 소스 코드로 Inference

1. matterport/Mask_RCNN의 특징

2. matterport/Mask_RCNN으로 Inference 수행하기

3. sample을 이용해서 코드를 이해해보자.

1. Matterport 패키지 모듈 및 클래스 활용

2. Inferecne 모델 구축 및 pretrained 사용

3. Inference 수행

지금까지의 모델 중 가장 안정성과 성능이 뛰어난 것을 확인할 수 있다. 아주 최적화가 잘 된 패키지이다. 미친 성능이다.

4. Inference 수행 작업만 함수화

5. 위의 함수로 Video에 Segmentation 활용

단일 IMAGE에 적용

Video Segmentation 적용

다른 동영상에 적용.

Just Do It And Then Some

Error

1. matterport/Mask_RCNN의 특징

2. matterport/Mask_RCNN으로 Inference 수행하기

3. sample을 이용해서 코드를 이해해보자.

1. Matterport 패키지 모듈 및 클래스 활용

2. Inferecne 모델 구축 및 pretrained 사용

3. Inference 수행

지금까지의 모델 중 가장 안정성과 성능이 뛰어난 것을 확인할 수 있다. 아주 최적화가 잘 된 패키지이다. 미친 성능이다.

4. Inference 수행 작업만 함수화

5. 위의 함수로 Video에 Segmentation 활용

단일 IMAGE에 적용

Video Segmentation 적용

다른 동영상에 적용.

Templates (for web app):

Error