Training a model takes a lot of data, which usually means turning to public datasets. For pose estimation there are plenty of datasets whose **annotations are human body keypoints**, and COCO is one of them.

Also, when you hunt for code on GitHub, the training data is rarely committed to the repository because the dataset is so large, so you have to track down and download it yourself.

Enough talk, let's get started.
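For the keypoint task you need two archives from the 2014 release: the person keypoint annotations and the matching training images. Below is a minimal download sketch, assuming the URLs currently listed on the official COCO download page (cocodataset.org); double-check them there, and note that `train2014.zip` is a very large download.

```python
import urllib.request
import zipfile

# Assumed URLs from the official COCO download page (2014 release) -- verify before running.
archives = {
    'annotations_trainval2014.zip': 'http://images.cocodataset.org/annotations/annotations_trainval2014.zip',
    'train2014.zip': 'http://images.cocodataset.org/zips/train2014.zip',  # very large download
}

for filename, url in archives.items():
    print('downloading', filename, '...')
    urllib.request.urlretrieve(url, filename)   # save the zip next to this script
    with zipfile.ZipFile(filename) as zf:
        zf.extractall('.')                      # unzip into the current directory
    print('extracted', filename)
```

The annotations archive typically extracts into an `annotations/` folder, so `person_keypoints_train2014.json` may end up at `annotations/person_keypoints_train2014.json`; adjust the paths used below accordingly.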
["annotations"]["image_id"] == ["images"]["id"]
``` ["annotations"]: "image_id", "category_id", "bbox", "num_keypoints", "keypoints" ["images"]: "file_name", "width", "height", "id" ``` ```python import os import json import tensorflow as tf annotations_file = 'person_keypoints_train2014.json' # 下載的 json 存放路徑,記得解壓縮 images_dir = 'train2014/' # 下載的對應圖片的資料夾路徑,記得解壓縮,斜線可加可不加 def print_human_info(image, annotation, images_dir): """ Args: image: dict annotation: dict images_dir: string. directory containing the image files. """ category_id = int(annotation['category_id']) if category_id != 1: # 如果類別不是人 print('\tcategory of ', image['file_name'], "isn't human.") return if annotation['num_keypoints'] <= 0: # 如果可見keypoints少於指定數量 print('\tnumber of keypoints of ', image['file_name'], 'is less than wanted.') return print('file information:') #print('\tfile name: ', image['file_name']) #print('\timage id: ', image['id']) #print('\timage height: ',image['height']) #print('\timage width: ',image['width']) #print('\timage path: ', os.path.join(images_dir, image['file_name'])) (x, y, width, height) = tuple(annotation['bbox']) #print('\tbounding box width: ', annotation['bbox'][2]) #print('\tbounding box height: ', annotation['bbox'][3]) # ["keypoints"]存放是[ x,y,v, x,y,v, x,y,v, ... ] keypoints_x = annotation['keypoints'][0::3] keypoints_y = annotation['keypoints'][1::3] keypoints_v = annotation['keypoints'][2::3] #print('\tkeypoints:') #print('\t\t', keypoints_x) #print('\t\t', keypoints_y) #print('\t\t', keypoints_v) with tf.gfile.GFile(annotations_file, 'r') as fid: groundtruth_data = json.load(fid) # 用["images"]["id"]去做index,取得["images"]的資訊 image_info = {} for image in groundtruth_data['images']: image_info[image['id']] = {'id': image['id'], 'height': image['height'], 'width': image['width'], 'file_name': image['file_name']} # 按照讀到的annotation順序,依序去讀取image資料 for idx, annotation in enumerate(groundtruth_data['annotations']): image = image_info[annotation['image_id']] if idx % 10000 == 0: # 資料集很大,取幾個示意 print(idx) print_human_info(image, annotation, images_dir) ``` ### 參考 [tensorflow/models/research/object_detection/dataset_tools/create_coco_tf_record.py](https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_coco_tf_record.py), by pkulzc, from github