Add at new repo again

commit 6e660ddb3c · 2025-01-28 21:48:35 +00:00
564 changed files with 75575 additions and 0 deletions

# Read the docs:
The latest documentation built from this directory is available at [detectron2.readthedocs.io](https://detectron2.readthedocs.io/).
Documents in this directory are not meant to be read on GitHub.

# Setup Builtin Datasets
Detectron2 has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable
`DETECTRON2_DATASETS`.
Under this directory, detectron2 expects to find datasets in the structure described below.
You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.
The [model zoo](https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md)
contains configs and models that use these builtin datasets.
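If you prefer to set this location from Python (e.g., in a notebook), a minimal sketch is to set the variable before detectron2's builtin dataset registrations are imported; the path below is a placeholder:
```python
import os

# Set before importing detectron2.data (the builtin registrations read this variable at import time).
os.environ["DETECTRON2_DATASETS"] = "/path/to/datasets"  # placeholder path
```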
## Expected dataset structure for COCO instance/keypoint detection:
```
coco/
annotations/
instances_{train,val}2017.json
person_keypoints_{train,val}2017.json
{train,val}2017/
# image files that are mentioned in the corresponding json
```
You can use the 2014 version of the dataset as well.
Some of the builtin tests (`dev/run_*_tests.sh`) use a tiny version of the COCO dataset,
which you can download with `./prepare_for_tests.sh`.
## Expected dataset structure for PanopticFPN:
```
coco/
annotations/
panoptic_{train,val}2017.json
panoptic_{train,val}2017/ # png annotations
panoptic_stuff_{train,val}2017/ # generated by the script mentioned below
```
Install panopticapi by:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```
Then run `python prepare_panoptic_fpn.py` to extract semantic annotations from panoptic annotations.
## Expected dataset structure for LVIS instance segmentation:
```
coco/
{train,val,test}2017/
lvis/
lvis_v0.5_{train,val}.json
lvis_v0.5_image_info_test.json
```
Install lvis-api by:
```
pip install git+https://github.com/lvis-dataset/lvis-api.git
```
Run `python prepare_cocofied_lvis.py` to prepare "cocofied" LVIS annotations for evaluation of models trained on the COCO dataset.
## Expected dataset structure for cityscapes:
```
cityscapes/
gtFine/
train/
aachen/
color.png, instanceIds.png, labelIds.png, polygons.json,
labelTrainIds.png
...
val/
test/
leftImg8bit/
train/
val/
test/
```
Install cityscapes scripts by:
```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```
Note: the labelTrainIds.png files are created using cityscapesScripts with:
```
CITYSCAPES_DATASET=$DETECTRON2_DATASETS/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
```
They are not needed for instance segmentation.
## Expected dataset structure for Pascal VOC:
```
VOC20{07,12}/
Annotations/
ImageSets/
Main/
trainval.txt
test.txt
# train.txt or val.txt, if you use these splits
JPEGImages/
```

# Configs
Detectron2 provides a key-value based config system that can be
used to obtain standard, common behaviors.
Detectron2's config system uses YAML and [yacs](https://github.com/rbgirshick/yacs).
In addition to the [basic operations](../modules/config.html#detectron2.config.CfgNode)
that access and update a config, we provide the following extra functionalities:
1. The config can have `_BASE_: base.yaml` field, which will load a base config first.
Values in the base config are overwritten by the sub-config whenever there is a conflict.
We provide several base configs for standard model architectures.
2. We provide config versioning, for backward compatibility.
If your config file is versioned with a config line like `VERSION: 2`,
detectron2 will still recognize it even if we change some keys in the future.
"Config" is a very limited abstraction.
We do not expect all features in detectron2 to be available through configs.
If you need something that's not available in the config space,
please write code using detectron2's API.
### Basic Usage
Some basic usage of the `CfgNode` object is shown here. See more in [documentation](../modules/config.html#detectron2.config.CfgNode).
```python
from detectron2.config import get_cfg
cfg = get_cfg() # obtain detectron2's default config
cfg.xxx = yyy # add new configs for your own custom components
cfg.merge_from_file("my_cfg.yaml") # load values from a file
cfg.merge_from_list(["MODEL.WEIGHTS", "weights.pth"]) # can also load values from a list of str
print(cfg.dump()) # print formatted configs
```
Many builtin tools in detectron2 accept command-line config overwrites:
Key-value pairs provided in the command line will overwrite the existing values in the config file.
For example, [demo.py](../../demo/demo.py) can be used with
```
./demo.py --config-file config.yaml [--other-options] \
--opts MODEL.WEIGHTS /path/to/weights INPUT.MIN_SIZE_TEST 1000
```
To see a list of available configs in detectron2 and what they mean,
check [Config References](../modules/config.html#config-references).
### Best Practice with Configs
1. Treat the configs you write as "code": avoid copying them or duplicating them; use `_BASE_`
to share common parts between configs.
2. Keep the configs you write simple: don't include keys that do not affect the experimental setting.
3. Keep a version number in your configs (or the base config), e.g., `VERSION: 2`,
for backward compatibility.
We print a warning when reading a config without a version number.
The official configs do not include a version number because they are meant to
always be up to date.

# Use Custom Dataloaders
## How the Existing Dataloader Works
Detectron2 contains a builtin data loading pipeline.
It's good to understand how it works, in case you need to write a custom one.
Detectron2 provides two functions
[build_detection_{train,test}_loader](../modules/data.html#detectron2.data.build_detection_train_loader)
that create a default data loader from a given config.
Here is how `build_detection_{train,test}_loader` works:
1. It takes the name of a registered dataset (e.g., "coco_2017_train") and loads a `list[dict]` representing the dataset items
in a lightweight, canonical format. These dataset items are not yet ready to be used by the model (e.g., images are
not loaded into memory, random augmentations have not been applied, etc.).
Details about the dataset format and dataset registration can be found in
[datasets](./datasets.md).
2. Each dict in this list is mapped by a function ("mapper"):
* Users can customize this mapping function by specifying the "mapper" argument in
`build_detection_{train,test}_loader`. The default mapper is [DatasetMapper](../modules/data.html#detectron2.data.DatasetMapper).
* The output format of such a function can be arbitrary, as long as it is accepted by the consumer of this data loader (usually the model).
The outputs of the default mapper, after batching, follow the default model input format documented in
[Use Models](./models.html#model-input-format).
* The role of the mapper is to transform the lightweight, canonical representation of a dataset item into a format
that is ready for the model to consume (e.g., reading images, performing random data augmentation, and converting to torch Tensors).
If you would like to perform custom transformations to data, you often want a custom mapper.
3. The outputs of the mapper are batched (simply into a list).
4. This batched data is the output of the data loader. Typically, it's also the input of
`model.forward()`.
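For reference, a minimal sketch of building and consuming the default train loader from a config (the config path is only an example):
```python
from detectron2.config import get_cfg
from detectron2.data import build_detection_train_loader

cfg = get_cfg()
cfg.merge_from_file("configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml")  # example config
data_loader = build_detection_train_loader(cfg)

for batch in data_loader:
    # `batch` is a list[dict]; each dict follows the model input format described in "Use Models"
    break
```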
## Write a Custom Dataloader
Using a different "mapper" with `build_detection_{train,test}_loader(mapper=)` works for most use cases
of custom data loading.
For example, if you want to resize all images to a fixed size for Mask R-CNN training, write this:
```python
import copy
import torch

from detectron2.data import build_detection_train_loader
from detectron2.data import transforms as T
from detectron2.data import detection_utils as utils

def mapper(dataset_dict):
    # Implement a mapper, similar to the default DatasetMapper, but with your own customizations
    dataset_dict = copy.deepcopy(dataset_dict)  # it will be modified by code below
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    image, transforms = T.apply_transform_gens([T.Resize((800, 800))], image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
        if obj.get("iscrowd", 0) == 0
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

data_loader = build_detection_train_loader(cfg, mapper=mapper)
# use this dataloader instead of the default
```
Refer to [API documentation of detectron2.data](../modules/data) for details.
If you want to change not only the mapper (e.g., to write different sampling or batching logic),
you can write your own data loader. The data loader is simply a
python iterator that produces [the format](./models.md) your model accepts.
You can implement it using any tools you like.
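For instance, a toy loader that ignores detectron2's samplers and multi-process loading could look like the sketch below, where `mapper` plays the same role as above:
```python
import random

def toy_train_loader(dataset_dicts, mapper, batch_size=2):
    # dataset_dicts: the list[dict] of a registered dataset; yields batches forever,
    # since detectron2's training loops expect an iterator that never runs out.
    while True:
        yield [mapper(random.choice(dataset_dicts)) for _ in range(batch_size)]
```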
## Use a Custom Dataloader
If you use [DefaultTrainer](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer),
you can overwrite its `build_{train,test}_loader` method to use your own dataloader.
See the [densepose dataloader](../../projects/DensePose/train_net.py)
for an example.
If you write your own training loop, you can plug in your data loader easily.

# Use Custom Datasets
Datasets that have builtin support in detectron2 are listed in [datasets](../../datasets).
If you want to use a custom dataset while also reusing detectron2's data loaders,
you will need to
1. __Register__ your dataset (i.e., tell detectron2 how to obtain your dataset).
2. Optionally, __register metadata__ for your dataset.
Next, we explain the above two concepts in detail.
The [Colab tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)
has a live example of how to register and train on a dataset of custom formats.
### Register a Dataset
To let detectron2 know how to obtain a dataset named "my_dataset", you will implement
a function that returns the items in your dataset and then tell detectron2 about this
function:
```python
from detectron2.data import DatasetCatalog

def my_dataset_function():
    ...  # read/construct your data here
    return data  # a list[dict] in the format described below

DatasetCatalog.register("my_dataset", my_dataset_function)
```
Here, the snippet associates a dataset "my_dataset" with a function that returns the data.
The registration stays effective until the process exits.
The function can process data from its original format into either one of the following:
1. Detectron2's standard dataset dict, described below. This will work with many other builtin
features in detectron2, so it's recommended to use it when it's sufficient for your task.
2. Your custom dataset dict. You can also return arbitrary dicts in your own format,
such as adding extra keys for new tasks.
Then you will need to handle them properly downstream as well.
See below for more details.
#### Standard Dataset Dicts
For standard tasks
(instance detection, instance/semantic/panoptic segmentation, keypoint detection),
we load the original dataset into `list[dict]` with a specification similar to COCO's json annotations.
This is our standard representation for a dataset.
Each dict contains information about one image.
The dict may have the following fields,
and the required fields vary based on what the dataloader or the task needs (see more below).
+ `file_name`: the full path to the image file. Rotation and flipping will be applied if the image has such EXIF information.
+ `height`, `width`: integer. The shape of the image.
+ `image_id` (str or int): a unique id that identifies this image. Used
during evaluation to identify the images, but a dataset may use it for different purposes.
+ `annotations` (list[dict]): each dict corresponds to annotations of one instance
in this image. Required by instance detection/segmentation or keypoint detection tasks.
Images with empty `annotations` will by default be removed from training,
but can be included using `DATALOADER.FILTER_EMPTY_ANNOTATIONS`.
Each dict contains the following keys, of which `bbox`, `bbox_mode`, and `category_id` are required:
+ `bbox` (list[float]): list of 4 numbers representing the bounding box of the instance.
+ `bbox_mode` (int): the format of bbox.
It must be a member of
[structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode).
Currently supports: `BoxMode.XYXY_ABS`, `BoxMode.XYWH_ABS`.
+ `category_id` (int): an integer in the range [0, num_categories) representing the category label.
The value num_categories is reserved to represent the "background" category, if applicable.
+ `segmentation` (list[list[float]] or dict): the segmentation mask of the instance.
+ If `list[list[float]]`, it represents a list of polygons, one for each connected component
of the object. Each `list[float]` is one simple polygon in the format of `[x1, y1, ..., xn, yn]`.
The Xs and Ys are either relative coordinates in [0, 1], or absolute coordinates,
depending on whether "bbox_mode" is relative.
+ If `dict`, it represents the per-pixel segmentation mask in COCO's RLE format. The dict should have
keys "size" and "counts". You can convert a uint8 segmentation mask of 0s and 1s into
RLE format by `pycocotools.mask.encode(np.asarray(mask, order="F"))`.
+ `keypoints` (list[float]): in the format of [x1, y1, v1,..., xn, yn, vn].
v[i] means the [visibility](http://cocodataset.org/#format-data) of this keypoint.
`n` must be equal to the number of keypoint categories.
The Xs and Ys are either relative coordinates in [0, 1], or absolute coordinates,
depending on whether "bbox_mode" is relative.
Note that the coordinate annotations in COCO format are integers in range [0, H-1 or W-1].
By default, detectron2 adds 0.5 to absolute keypoint coordinates to convert them from discrete
pixel indices to floating point coordinates.
+ `iscrowd`: 0 (default) or 1. Whether this instance is labeled as COCO's "crowd
region". Don't include this field if you don't know what it means.
+ `sem_seg_file_name`: the full path to the ground truth semantic segmentation file.
Required by semantic segmentation task.
It should be an image whose pixel values are integer labels.
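For concreteness, here is a hypothetical dict for a single image with one box annotation (the path and values are made up):
```python
from detectron2.structures import BoxMode

example_dict = {
    "file_name": "/path/to/images/0001.jpg",
    "height": 480,
    "width": 640,
    "image_id": 1,
    "annotations": [
        {
            "bbox": [100.0, 120.0, 300.0, 250.0],
            "bbox_mode": BoxMode.XYXY_ABS,
            "category_id": 0,
        }
    ],
}
```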
Fast R-CNN (with precomputed proposals) is rarely used today.
To train a Fast R-CNN, the following extra keys are needed:
+ `proposal_boxes` (array): 2D numpy array with shape (K, 4) representing K precomputed proposal boxes for this image.
+ `proposal_objectness_logits` (array): numpy array with shape (K, ), which corresponds to the objectness
logits of proposals in 'proposal_boxes'.
+ `proposal_bbox_mode` (int): the format of the precomputed proposal bbox.
It must be a member of
[structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode).
Default is `BoxMode.XYXY_ABS`.
#### Custom Dataset Dicts for New Tasks
In the `list[dict]` that your dataset function returns, the dictionary can also have arbitrary custom data.
This will be useful for a new task that needs extra information not supported
by the standard dataset dicts. In this case, you need to make sure the downstream code can handle your data
correctly. Usually this requires writing a new `mapper` for the dataloader (see [Use Custom Dataloaders](./data_loading.md)).
When designing a custom format, note that all dicts are stored in memory
(sometimes serialized and with multiple copies).
To save memory, each dict is meant to contain small but sufficient information
about each sample, such as file names and annotations.
Loading full samples typically happens in the data loader.
For attributes shared among the entire dataset, use `Metadata` (see below).
To avoid extra memory, do not save such information repeatedly for each sample.
### "Metadata" for Datasets
Each dataset is associated with some metadata, accessible through
`MetadataCatalog.get(dataset_name).some_metadata`.
Metadata is a key-value mapping that contains information that's shared among
the entire dataset, and usually is used to interpret what's in the dataset, e.g.,
names of classes, colors of classes, root of files, etc.
This information will be useful for augmentation, evaluation, visualization, logging, etc.
The structure of metadata depends on what is needed by the corresponding downstream code.
If you register a new dataset through `DatasetCatalog.register`,
you may also want to add its corresponding metadata through
`MetadataCatalog.get(dataset_name).some_key = some_value`, to enable any features that need the metadata.
You can do it like this (using the metadata key "thing_classes" as an example):
```python
from detectron2.data import MetadataCatalog
MetadataCatalog.get("my_dataset").thing_classes = ["person", "dog"]
```
Here is a list of metadata keys that are used by builtin features in detectron2.
If you add your own dataset without these metadata, some features may be
unavailable to you:
* `thing_classes` (list[str]): Used by all instance detection/segmentation tasks.
A list of names for each instance/thing category.
If you load a COCO format dataset, it will be automatically set by the function `load_coco_json`.
* `thing_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each thing category.
Used for visualization. If not given, random colors are used.
* `stuff_classes` (list[str]): Used by semantic and panoptic segmentation tasks.
A list of names for each stuff category.
* `stuff_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each stuff category.
Used for visualization. If not given, random colors are used.
* `keypoint_names` (list[str]): Used by keypoint localization. A list of names for each keypoint.
* `keypoint_flip_map` (list[tuple[str]]): Used by the keypoint localization task. A list of pairs of names,
where each pair consists of the two keypoints whose names should be swapped when the image is
flipped horizontally during augmentation.
* `keypoint_connection_rules`: list[tuple(str, str, (r, g, b))]. Each tuple specifies a pair of keypoints
that are connected and the color to use for the line between them when visualized.
Some additional metadata are specific to the evaluation of certain datasets (e.g. COCO):
* `thing_dataset_id_to_contiguous_id` (dict[int->int]): Used by all instance detection/segmentation tasks in the COCO format.
A mapping from instance class ids in the dataset to contiguous ids in range [0, #class).
Will be automatically set by the function `load_coco_json`.
* `stuff_dataset_id_to_contiguous_id` (dict[int->int]): Used when generating prediction json files for
semantic/panoptic segmentation.
A mapping from semantic segmentation class ids in the dataset
to contiguous ids in [0, num_categories). It is useful for evaluation only.
* `json_file`: The COCO annotation json file. Used by COCO evaluation for COCO-format datasets.
* `panoptic_root`, `panoptic_json`: Used by panoptic evaluation.
* `evaluator_type`: Used by the builtin main training script to select
evaluator. Don't use it in a new training script.
You can just provide the [DatasetEvaluator](../modules/evaluation.html#detectron2.evaluation.DatasetEvaluator)
for your dataset directly in your main script.
NOTE: For background on the concept of "thing" and "stuff", see
[On Seeing Stuff: The Perception of Materials by Humans and Machines](http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf).
In detectron2, the term "thing" is used for instance-level tasks,
and "stuff" is used for semantic segmentation tasks.
Both are used in panoptic segmentation.
### Register a COCO Format Dataset
If your dataset is already a json file in the COCO format,
the dataset and its associated metadata can be registered easily with:
```python
from detectron2.data.datasets import register_coco_instances
register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir")
```
If your dataset is in COCO format but with extra custom per-instance annotations,
the [load_coco_json](../modules/data.html#detectron2.data.datasets.load_coco_json)
function might be useful.
### Update the Config for New Datasets
Once you've registered the dataset, you can use the name of the dataset (e.g., "my_dataset" in
the example above) in `cfg.DATASETS.{TRAIN,TEST}`.
There are other configs you might want to change to train or evaluate on new datasets:
* `MODEL.ROI_HEADS.NUM_CLASSES` and `MODEL.RETINANET.NUM_CLASSES` are the number of thing classes
for R-CNN and RetinaNet models, respectively.
* `MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS` sets the number of keypoints for Keypoint R-CNN.
You'll also need to set [Keypoint OKS](http://cocodataset.org/#keypoints-eval)
with `TEST.KEYPOINT_OKS_SIGMAS` for evaluation.
* `MODEL.SEM_SEG_HEAD.NUM_CLASSES` sets the number of stuff classes for Semantic FPN & Panoptic FPN.
* If you're training Fast R-CNN (with precomputed proposals), `DATASETS.PROPOSAL_FILES_{TRAIN,TEST}`
need to match the datasets. The format of proposal files is documented
[here](../modules/data.html#detectron2.data.load_proposals_into_dataset).
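As a hedged sketch, updating the config for the "my_dataset" example above (with the two thing classes registered earlier) could look like this; the base config path is only an example:
```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file("configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml")  # example base config
cfg.DATASETS.TRAIN = ("my_dataset",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2  # e.g., ["person", "dog"] from the metadata example above
```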
New models
(e.g. [TensorMask](../../projects/TensorMask),
[PointRend](../../projects/PointRend))
often have similar configs of their own that need to be changed as well.

# Deployment
## Caffe2 Deployment
We currently support converting a detectron2 model to Caffe2 format through ONNX.
The converted Caffe2 model is able to run without detectron2 dependency in either Python or C++.
It has a runtime optimized for CPU & mobile inference, but not for GPU inference.
Caffe2 conversion requires PyTorch ≥ 1.4 and ONNX ≥ 1.6.
### Coverage
It supports the 3 most common meta architectures: `GeneralizedRCNN`, `RetinaNet`, and `PanopticFPN`,
as well as most official models under these 3 meta architectures.
Users' custom extensions under these architectures (added through registration) are supported
as long as they do not contain control flow or operators not available in Caffe2 (e.g. deformable convolution).
For example, custom backbones and heads are often supported out of the box.
### Usage
The conversion APIs are documented at [the API documentation](../modules/export).
We provide `caffe2_converter.py` as an example tool that uses
these APIs to convert a standard model.
To convert an official Mask R-CNN trained on COCO, first
[prepare the COCO dataset](../../datasets/), then pick the model from [Model Zoo](../../MODEL_ZOO.md), and run:
```
cd tools/deploy/ && ./caffe2_converter.py --config-file ../../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
--output ./caffe2_model --run-eval \
MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl \
MODEL.DEVICE cpu
```
Note that:
1. The conversion needs valid sample inputs & weights to trace the model. That's why the script requires the dataset.
You can modify the script to obtain sample inputs in other ways.
2. With the `--run-eval` flag, the script will evaluate the converted model to verify its accuracy.
The accuracy is typically slightly different (within 0.1 AP) from PyTorch due to
numerical precision differences between the implementations.
It's recommended to always verify the accuracy in case your custom model is not supported by the
conversion.
The converted model is saved in the specified `caffe2_model/` directory. Two files, `model.pb`
and `model_init.pb`, which contain the network structure and network parameters, are necessary for deployment.
These files can then be loaded in C++ or Python using Caffe2's APIs.
The script additionally generates a `model.svg` file that contains a visualization of the network.
You can also load `model.pb` into tools such as [netron](https://github.com/lutzroeder/netron) to visualize it.
### Use the model in C++/Python
The model can be loaded in C++. An example [caffe2_mask_rcnn.cpp](../../tools/deploy/) is given,
which performs CPU/GPU inference using `COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x`.
The C++ example needs to be built with:
* PyTorch with caffe2 inside
* gflags, glog, opencv
* protobuf headers that match the version of your caffe2
* MKL headers if caffe2 is built with MKL
The following can compile the example inside [official detectron2 docker](../../docker/):
```
sudo apt update && sudo apt install libgflags-dev libgoogle-glog-dev libopencv-dev
pip install mkl-include
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protobuf-cpp-3.6.1.tar.gz
tar xf protobuf-cpp-3.6.1.tar.gz
export CPATH=$(readlink -f ./protobuf-3.6.1/src/):$HOME/.local/include
export CMAKE_PREFIX_PATH=$HOME/.local/lib/python3.6/site-packages/torch/
mkdir build && cd build
cmake -DTORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST .. && make
# To run:
./caffe2_mask_rcnn --predict_net=./model.pb --init_net=./model_init.pb --input=input.jpg
```
Note that:
* All converted models (the .pb files) take two input tensors:
"data" is an NCHW image, and "im_info" is an Nx3 tensor consisting of (height, width, 1.0) for
each image (the shape of "data" might be larger than that in "im_info" due to padding).
* The converted models do not contain post-processing operations that
transform raw layer outputs into formatted predictions.
The example only produces raw outputs (28x28 masks) from the final
layers that are not post-processed, because in actual deployment, an application often needs
its own lightweight post-processing (e.g., full-image masks for every detected object are often not necessary).
We also provide a python wrapper around the converted model, in the
[Caffe2Model.\_\_call\_\_](../modules/export.html#detectron2.export.Caffe2Model.__call__) method.
This method has an interface identical to the [pytorch versions of models](./models.md),
and it internally applies pre/post-processing code to match the formats.
This code can serve as a reference for pre/post-processing in actual deployment.
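For reference, a minimal sketch of loading the converted model through this wrapper (assuming the `caffe2_model/` directory produced above):
```python
from detectron2.export import Caffe2Model

model = Caffe2Model.load_protobuf("./caffe2_model")
# `inputs` uses the same list[dict] format as the pytorch models documented in "Use Models"
outputs = model(inputs)
```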

# Evaluation
Evaluation is a process that takes a number of input/output pairs and aggregates them.
You can always [use the model](./models.md) directly and just parse its inputs/outputs manually to perform
evaluation.
Alternatively, evaluation is implemented in detectron2 using the [DatasetEvaluator](../modules/evaluation.html#detectron2.evaluation.DatasetEvaluator)
interface.
Detectron2 includes a few `DatasetEvaluator`s that compute metrics using standard dataset-specific
APIs (e.g., COCO, LVIS).
You can also implement your own `DatasetEvaluator` that performs some other jobs
using the input/output pairs.
For example, to count how many instances are detected on the validation set:
```python
from detectron2.evaluation import DatasetEvaluator

class Counter(DatasetEvaluator):
    def reset(self):
        self.count = 0

    def process(self, inputs, outputs):
        for output in outputs:
            self.count += len(output["instances"])

    def evaluate(self):
        # save self.count somewhere, or print it, or return it.
        return {"count": self.count}
```
Once you have some `DatasetEvaluator`, you can run it with
[inference_on_dataset](../modules/evaluation.html#detectron2.evaluation.inference_on_dataset).
For example,
```python
from detectron2.evaluation import COCOEvaluator, DatasetEvaluators, inference_on_dataset

val_results = inference_on_dataset(
    model,
    val_data_loader,
    DatasetEvaluators([COCOEvaluator(...), Counter()]),
)
```
Compared to running the evaluation manually using the model, the benefit of this function is that
you can merge evaluators together using [DatasetEvaluators](../modules/evaluation.html#detectron2.evaluation.DatasetEvaluators).
In this way you can run all evaluations without having to go through the dataset multiple times.
The `inference_on_dataset` function also provides accurate speed benchmarks for the
given model and dataset.
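A hedged end-to-end sketch, assuming a registered dataset name such as "coco_2017_val" and an already-built `cfg` and `model`:
```python
from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, DatasetEvaluators, inference_on_dataset

val_data_loader = build_detection_test_loader(cfg, "coco_2017_val")
evaluators = DatasetEvaluators([COCOEvaluator("coco_2017_val", cfg, False, output_dir="./output/"), Counter()])
print(inference_on_dataset(model, val_data_loader, evaluators))
```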

# Extend Detectron2's Defaults
__Research is about doing things in new ways__.
This brings a tension in how to create abstractions in code,
which is a challenge for any research engineering project of a significant size:
1. On one hand, it needs to have very thin abstractions to allow for the possibility of doing
everything in new ways. It should be reasonably easy to break existing
abstractions and replace them with new ones.
2. On the other hand, such a project also needs reasonably high-level
abstractions, so that users can easily do things in standard ways,
without worrying too much about the details that only certain researchers care about.
In detectron2, there are two main types of interfaces (plus an experimental third) that address this tension together:
1. Functions and classes that take a config (`cfg`) argument
(sometimes with only a few extra arguments).
Such functions and classes implement
the "standard default" behavior: it will read what it needs from the
config and do the "standard" thing.
Users only need to load a given config and pass it around, without having to worry about
which arguments are used and what they all mean.
2. Functions and classes that have well-defined explicit arguments.
Each of these is a small building block of the entire system.
They require users' expertise to understand what each argument should be,
and require more effort to stitch together into a larger system.
But they can be stitched together in more flexible ways.
When you need to implement something not supported by the "standard defaults"
included in detectron2, these well-defined components can be reused.
3. (experimental) A few classes are implemented with the
[@configurable](../../modules/config.html#detectron2.config.configurable)
decorator - they can be called with either a config, or with explicit arguments.
Their explicit argument interfaces are currently __experimental__ and subject to change.
If you only need the standard behavior, the [Beginner's Tutorial](./getting_started.md)
should suffice. If you need to extend detectron2 to your own needs,
see the following tutorials for more details:
* Detectron2 includes a few standard datasets. To use custom ones, see
[Use Custom Datasets](./datasets.md).
* Detectron2 contains the standard logic that creates a data loader for training/testing from a
dataset, but you can write your own as well. See [Use Custom Data Loaders](./data_loading.md).
* Detectron2 implements many standard detection models, and provides ways for you
to overwrite their behaviors. See [Use Models](./models.md) and [Write Models](./write-models.md).
* Detectron2 provides a default training loop that is good for common training tasks.
You can customize it with hooks, or write your own loop instead. See [training](./training.md).

## Getting Started with Detectron2
This document provides a brief intro of the usage of builtin command-line tools in detectron2.
For a tutorial that involves actual coding with the API,
see our [Colab Notebook](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)
which covers how to run inference with an
existing model, and how to train a builtin model on a custom dataset.
For more advanced tutorials, refer to our [documentation](https://detectron2.readthedocs.io/tutorials/extend.html).
### Inference Demo with Pre-trained Models
1. Pick a model and its config file from
[model zoo](MODEL_ZOO.md),
for example, `mask_rcnn_R_50_FPN_3x.yaml`.
2. We provide `demo.py` that is able to run builtin standard models. Run it with:
```
cd demo/
python demo.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
--input input1.jpg input2.jpg \
[--other-options]
--opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
```
The configs are made for training; therefore we need to point `MODEL.WEIGHTS` to a model from the model zoo for evaluation.
This command will run the inference and show visualizations in an OpenCV window.
For details of the command line arguments, see `demo.py -h` or look at its source code
to understand its behavior. Some common arguments are:
* To run __on your webcam__, replace `--input files` with `--webcam`.
* To run __on a video__, replace `--input files` with `--video-input video.mp4`.
* To run __on cpu__, add `MODEL.DEVICE cpu` after `--opts`.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
### Training & Evaluation in Command Line
We provide the scripts `tools/{,plain_}train_net.py`, which are made to train
all the configs provided in detectron2.
You may want to use it as a reference to write your own training script.
To train a model with "train_net.py", first
set up the corresponding datasets following
[datasets/README.md](./datasets/README.md),
then run:
```
cd tools/
./train_net.py --num-gpus 8 \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
```
The configs are made for 8-GPU training.
To train on 1 GPU, you may need to [change some parameters](https://arxiv.org/abs/1706.02677), e.g.:
```
./train_net.py \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
--num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025
```
For most models, CPU training is not supported.
To evaluate a model's performance, use
```
./train_net.py \
--config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
--eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```
For more options, see `./train_net.py -h`.
### Use Detectron2 APIs in Your Code
See our [Colab Notebook](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)
to learn how to use detectron2 APIs to:
1. run inference with an existing model
2. train a builtin model on a custom dataset
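For example, a minimal inference sketch with the high-level API (the model zoo config name is just an example) could look like:
```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input1.jpg"))  # outputs["instances"] holds the predictions
```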
See [detectron2/projects](https://github.com/facebookresearch/detectron2/tree/master/projects)
for more ways to build your project on detectron2.

Tutorials
======================================
.. toctree::
   :maxdepth: 2

   install
   getting_started
   builtin_datasets
   extend
   datasets
   data_loading
   models
   write-models
   training
   evaluation
   configs
   deployment

## Installation
Our [Colab Notebook](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)
has step-by-step instructions that install detectron2.
The [Dockerfile](docker)
also installs detectron2 with a few simple commands.
### Requirements
- Linux or macOS with Python ≥ 3.6
- PyTorch ≥ 1.4
- [torchvision](https://github.com/pytorch/vision/) that matches the PyTorch installation.
You can install them together at [pytorch.org](https://pytorch.org) to make sure of this.
- OpenCV, optional, needed by demo and visualization
- pycocotools: `pip install cython; pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'`
### Build Detectron2 from Source
gcc & g++ ≥ 5 are required. [ninja](https://ninja-build.org/) is recommended for faster build.
After having them, run:
```
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
# (add --user if you don't have permission)
# Or, to install it from a local clone:
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
# Or if you are on macOS
# CC=clang CXX=clang++ python -m pip install -e .
```
To __rebuild__ detectron2 that's built from a local clone, use `rm -rf build/ **/*.so` to clean the
old build first. You often need to rebuild detectron2 after reinstalling PyTorch.
### Install Pre-Built Detectron2 (Linux only)
```
# for CUDA 10.1:
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/index.html
```
You can replace cu101 with "cu{100,92}" or "cpu".
Note that:
1. Such an installation has to be used with a certain version of the official PyTorch release.
See [releases](https://github.com/facebookresearch/detectron2/releases) for requirements.
It will not work with a different version of PyTorch or a non-official build of PyTorch.
2. Such an installation is out of date w.r.t. the master branch of detectron2. It may not be
compatible with the master branch of a research project that uses detectron2 (e.g. those in
[projects](projects) or [meshrcnn](https://github.com/facebookresearch/meshrcnn/)).
### Common Installation Issues
If you run into issues using the pre-built detectron2, please uninstall it and try building it from source.
Click each issue for its solutions:
<details>
<summary>
Undefined torch/aten/caffe2 symbols, or segmentation fault immediately when running the library.
</summary>
<br/>
This usually happens when detectron2 or torchvision is not
compiled with the version of PyTorch you're running.
Pre-built torchvision or detectron2 has to work with the corresponding official release of pytorch.
If the error comes from a pre-built torchvision, uninstall torchvision and pytorch and reinstall them
following [pytorch.org](http://pytorch.org), so that the versions match.
If the error comes from a pre-built detectron2, check [release notes](https://github.com/facebookresearch/detectron2/releases)
to see the corresponding pytorch version required for each pre-built detectron2.
If the error comes from detectron2 or torchvision that you built manually from source,
remove files you built (`build/`, `**/*.so`) and rebuild it so it can pick up the version of pytorch currently in your environment.
If you cannot resolve this problem, please include the output of `gdb -ex "r" -ex "bt" -ex "quit" --args python -m detectron2.utils.collect_env`
in your issue.
</details>
<details>
<summary>
Undefined C++ symbols (e.g. `GLIBCXX`) or C++ symbols not found.
</summary>
<br/>
Usually it's because the library is compiled with a newer C++ compiler but run with an old C++ runtime.
This often happens with old anaconda.
Try `conda update libgcc`. Then rebuild detectron2.
The fundamental solution is to run the code with a proper C++ runtime.
One way is to use `LD_PRELOAD=/path/to/libstdc++.so`.
</details>
<details>
<summary>
"Not compiled with GPU support" or "Detectron2 CUDA Compiler: not available".
</summary>
<br/>
CUDA is not found when building detectron2.
You should make sure
```
python -c 'import torch; from torch.utils.cpp_extension import CUDA_HOME; print(torch.cuda.is_available(), CUDA_HOME)'
```
prints valid outputs at the time you build detectron2.
Most models can run inference (but not training) without GPU support. To use CPUs, set `MODEL.DEVICE='cpu'` in the config.
</details>
<details>
<summary>
"invalid device function" or "no kernel image is available for execution".
</summary>
<br/>
Two possibilities:
* You built detectron2 with one version of CUDA but run it with a different version.
To check whether it is the case,
use `python -m detectron2.utils.collect_env` to find out inconsistent CUDA versions.
In the output of this command, you should expect "Detectron2 CUDA Compiler", "CUDA_HOME", "PyTorch built with - CUDA"
to contain cuda libraries of the same version.
When they are inconsistent,
you need to either install a different build of PyTorch (or build by yourself)
to match your local CUDA installation, or install a different version of CUDA to match PyTorch.
* Detectron2 or PyTorch/torchvision is not built for the correct GPU architecture (compute capability).
The GPU architecture for PyTorch/detectron2/torchvision is available in the "architecture flags" in
`python -m detectron2.utils.collect_env`.
The GPU architecture flags of detectron2/torchvision by default matches the GPU model detected
during compilation. This means the compiled code may not work on a different GPU model.
To overwrite the GPU architecture for detectron2/torchvision, use `TORCH_CUDA_ARCH_LIST` environment variable during compilation.
For example, `export TORCH_CUDA_ARCH_LIST=6.0,7.0` makes it compile for both P100s and V100s.
Visit [developer.nvidia.com/cuda-gpus](https://developer.nvidia.com/cuda-gpus) to find out
the correct compute capability number for your device.
</details>
<details>
<summary>
Undefined CUDA symbols; cannot open libcudart.so; other nvcc failures.
</summary>
<br/>
The version of NVCC you use to build detectron2 or torchvision does
not match the version of CUDA you are running with.
This often happens when using anaconda's CUDA runtime.
Use `python -m detectron2.utils.collect_env` to find out inconsistent CUDA versions.
In the output of this command, you should expect "Detectron2 CUDA Compiler", "CUDA_HOME", "PyTorch built with - CUDA"
to contain cuda libraries of the same version.
When they are inconsistent,
you need to either install a different build of PyTorch (or build by yourself)
to match your local CUDA installation, or install a different version of CUDA to match PyTorch.
</details>
<details>
<summary>
"ImportError: cannot import name '_C'".
</summary>
<br/>
Please build and install detectron2 following the instructions above.
If you are running code from detectron2's root directory, `cd` to a different one.
Otherwise you may not import the code that you installed.
</details>
<details>
<summary>
ONNX conversion segfault after some "TraceWarning".
</summary>
<br/>
The ONNX package is compiled with a compiler that is too old.
Please build and install ONNX from its source code using a compiler
whose version is closer to what's used by PyTorch (available in `torch.__config__.show()`).
</details>

# Use Models
Models (and their sub-models) in detectron2 are built by
functions such as `build_model`, `build_backbone`, `build_roi_heads`:
```python
from detectron2.modeling import build_model
model = build_model(cfg) # returns a torch.nn.Module
```
`build_model` only builds the model structure and fills it with random parameters.
See below for how to load an existing checkpoint to the model,
and how to use the `model` object.
### Load/Save a Checkpoint
```python
from detectron2.checkpoint import DetectionCheckpointer
DetectionCheckpointer(model).load(file_path) # load a file to model
checkpointer = DetectionCheckpointer(model, save_dir="output")
checkpointer.save("model_999") # save to output/model_999.pth
```
Detectron2's checkpointer recognizes models in pytorch's `.pth` format, as well as the `.pkl` files
in our model zoo.
See [API doc](../modules/checkpoint.html#detectron2.checkpoint.DetectionCheckpointer)
for more details about its usage.
The model files can be arbitrarily manipulated using `torch.{load,save}` for `.pth` files or
`pickle.{dump,load}` for `.pkl` files.
### Use a Model
A model can be called by `outputs = model(inputs)`, where `inputs` is a `list[dict]`.
Each dict corresponds to one image and the required keys
depend on the type of model, and whether the model is in training or evaluation mode.
For example, in order to do inference,
all existing models expect the "image" key, and optionally "height" and "width".
The detailed format of inputs and outputs of existing models are explained below.
When in training mode, all models are required to be used under an `EventStorage`.
The training statistics will be put into the storage:
```python
from detectron2.utils.events import EventStorage
with EventStorage() as storage:
losses = model(inputs)
```
If you only want to do simple inference using an existing model,
[DefaultPredictor](../modules/engine.html#detectron2.engine.defaults.DefaultPredictor)
is a wrapper around the model that provides such basic functionality.
It includes default behavior such as model loading and preprocessing,
and operates on a single image rather than on batches.
### Model Input Format
Users can implement custom models that support any arbitrary input format.
Here we describe the standard input format that all builtin models support in detectron2.
They all take a `list[dict]` as the inputs. Each dict
corresponds to information about one image.
The dict may contain the following keys:
* "image": `Tensor` in (C, H, W) format. The meaning of channels are defined by `cfg.INPUT.FORMAT`.
Image normalization, if any, will be performed inside the model using
`cfg.MODEL.PIXEL_{MEAN,STD}`.
* "instances": an [Instances](../modules/structures.html#detectron2.structures.Instances)
object, with the following fields:
+ "gt_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each instance.
+ "gt_classes": `Tensor` of long type, a vector of N labels, in range [0, num_categories).
+ "gt_masks": a [PolygonMasks](../modules/structures.html#detectron2.structures.PolygonMasks)
or [BitMasks](../modules/structures.html#detectron2.structures.BitMasks) object storing N masks, one for each instance.
+ "gt_keypoints": a [Keypoints](../modules/structures.html#detectron2.structures.Keypoints)
object storing N keypoint sets, one for each instance.
* "proposals": an [Instances](../modules/structures.html#detectron2.structures.Instances)
object used only in Fast R-CNN style models, with the following fields:
+ "proposal_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing P proposal boxes.
+ "objectness_logits": `Tensor`, a vector of P scores, one for each proposal.
* "height", "width": the **desired** output height and width, which is not necessarily the same
as the height or width of the `image` input field.
For example, the `image` input field might be a resized image,
but you may want the outputs to be in **original** resolution.
If provided, the model will produce output in this resolution,
rather than in the resolution of the `image` as input into the model. This is more efficient and accurate.
* "sem_seg": `Tensor[int]` in (H, W) format. The semantic segmentation ground truth.
Values represent category labels starting from 0.
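As a concrete (hypothetical) illustration, an inference call with a single hand-built input dict might look like:
```python
import torch

model.eval()  # inference mode; `model` built as shown at the top of this page
height, width = 480, 640  # the desired output resolution
with torch.no_grad():
    outputs = model([{
        "image": torch.zeros(3, height, width),  # (C, H, W); channel meaning follows cfg.INPUT.FORMAT
        "height": height,
        "width": width,
    }])
```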
#### How it connects to data loader:
The output of the default [DatasetMapper](../modules/data.html#detectron2.data.DatasetMapper) is a dict
that follows the above format.
After the data loader performs batching, it becomes `list[dict]` which the builtin models support.
### Model Output Format
When in training mode, the builtin models output a `dict[str->ScalarTensor]` with all the losses.
When in inference mode, the builtin models output a `list[dict]`, one dict for each image.
Based on the tasks the model is doing, each dict may contain the following fields:
* "instances": [Instances](../modules/structures.html#detectron2.structures.Instances)
object with the following fields:
* "pred_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each detected instance.
* "scores": `Tensor`, a vector of N scores.
* "pred_classes": `Tensor`, a vector of N labels in range [0, num_categories).
+ "pred_masks": a `Tensor` of shape (N, H, W), masks for each detected instance.
+ "pred_keypoints": a `Tensor` of shape (N, num_keypoint, 3).
Each row in the last dimension is (x, y, score). Scores are larger than 0.
* "sem_seg": `Tensor` of (num_categories, H, W), the semantic segmentation prediction.
* "proposals": [Instances](../modules/structures.html#detectron2.structures.Instances)
object with the following fields:
* "proposal_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes)
object storing N boxes.
* "objectness_logits": a torch vector of N scores.
* "panoptic_seg": A tuple of `(Tensor, list[dict])`. The tensor has shape (H, W), where each element
represents the segment id of the pixel. Each dict describes one segment id and has the following fields:
* "id": the segment id
* "isthing": whether the segment is a thing or stuff
* "category_id": the category id of this segment. It represents the thing
class id when `isthing==True`, and the stuff class id otherwise.
### Partially execute a model:
Sometimes you may want to obtain an intermediate tensor inside a model.
Since there are typically hundreds of intermediate tensors, there isn't an API that provides you
the intermediate result you need.
You have the following options:
1. Write a (sub)model. Following the [tutorial](./write-models.md), you can
rewrite a model component (e.g. a head of a model), such that it
does the same thing as the existing component, but returns the output
you need.
2. Partially execute a model. You can create the model as usual,
but use custom code to execute it instead of its `forward()`. For example,
the following code obtains mask features before mask head.
```python
from detectron2.modeling import build_model
from detectron2.structures import ImageList

images = ImageList.from_tensors(...)  # preprocessed input tensor
model = build_model(cfg)
features = model.backbone(images.tensor)
proposals, _ = model.proposal_generator(images, features)
instances = model.roi_heads._forward_box(features, proposals)
mask_features = [features[f] for f in model.roi_heads.in_features]
mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
```
Note that both options require you to read the existing forward code to understand
how to write code to obtain the outputs you need.

# Training
From the previous tutorials, you may now have a custom model and data loader.
You are free to create your own optimizer, and write the training logic: it's
usually easy with PyTorch, and allows researchers to see the entire training
logic more clearly and have full control.
One such example is provided in [tools/plain_train_net.py](../../tools/plain_train_net.py).
We also provide a standardized "trainer" abstraction with a
[minimal hook system](../modules/engine.html#detectron2.engine.HookBase)
that helps simplify the standard types of training.
You can use
[SimpleTrainer().train()](../modules/engine.html#detectron2.engine.SimpleTrainer)
which provides minimal abstraction for single-cost single-optimizer single-data-source training.
The builtin `train_net.py` script uses
[DefaultTrainer().train()](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer),
which includes more standard default behaviors that one might want to opt in to,
including default configurations for learning rate schedule,
logging, evaluation, checkpointing etc.
This also means that it's less likely to support some non-standard behavior
you might want during research.
To customize the training loops, you can:
1. If your customization is similar to what `DefaultTrainer` is already doing,
you can change behavior of `DefaultTrainer` by overwriting [its methods](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer)
in a subclass, like what [tools/train_net.py](../../tools/train_net.py) does.
2. If you need something very novel, you can start from [tools/plain_train_net.py](../../tools/plain_train_net.py) to implement them yourself.
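As a sketch of option 1, overriding a single method of `DefaultTrainer` could look like the following; `my_mapper` is a hypothetical function such as the one from the data loading tutorial:
```python
from detectron2.data import build_detection_train_loader
from detectron2.engine import DefaultTrainer

class MyTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # plug in a custom mapper instead of the default DatasetMapper
        return build_detection_train_loader(cfg, mapper=my_mapper)

trainer = MyTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```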
### Logging of Metrics
During training, metrics are saved to a centralized [EventStorage](../modules/utils.html#detectron2.utils.events.EventStorage).
You can use the following code to access it and log metrics to it:
```python
from detectron2.utils.events import get_event_storage

# inside the model:
if self.training:
    value = ...  # compute the value from inputs
    storage = get_event_storage()
    storage.put_scalar("some_accuracy", value)
```
Refer to its documentation for more details.
Metrics are then saved to various destinations with [EventWriter](../modules/utils.html#module-detectron2.utils.events).
DefaultTrainer enables a few `EventWriter` with default configurations.
See above for how to customize them.

# Write Models
If you are trying to do something completely new, you may wish to implement
a model entirely from scratch within detectron2. However, in many situations you may
be interested in modifying or extending some components of an existing model.
Therefore, we also provide a registration mechanism that lets you override the
behavior of certain internal components of standard models.
For example, to add a new backbone, import this code in your code:
```python
import torch.nn as nn

from detectron2.modeling import BACKBONE_REGISTRY, Backbone, ShapeSpec

@BACKBONE_REGISTRY.register()
class ToyBackBone(Backbone):
    def __init__(self, cfg, input_shape):
        super().__init__()
        # create your own backbone
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=16, padding=3)

    def forward(self, image):
        return {"conv1": self.conv1(image)}

    def output_shape(self):
        return {"conv1": ShapeSpec(channels=64, stride=16)}
```
Then, you can use `cfg.MODEL.BACKBONE.NAME = 'ToyBackBone'` in your config object.
`build_model(cfg)` will then call your `ToyBackBone` instead.
As another example, to add new abilities to the ROI heads in the Generalized R-CNN meta-architecture,
you can implement a new
[ROIHeads](../modules/modeling.html#detectron2.modeling.ROIHeads) subclass and put it in the `ROI_HEADS_REGISTRY`.
See [densepose in detectron2](../../projects/DensePose)
and [meshrcnn](https://github.com/facebookresearch/meshrcnn)
for examples that implement new ROIHeads to perform new tasks.
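A hedged skeleton of such a registration, using `StandardROIHeads` as a convenient base class (the new behavior itself is left to you):
```python
from detectron2.modeling import ROI_HEADS_REGISTRY, StandardROIHeads

@ROI_HEADS_REGISTRY.register()
class MyROIHeads(StandardROIHeads):
    def forward(self, images, features, proposals, targets=None):
        # insert new behavior here, then fall back to the standard logic
        return super().forward(images, features, proposals, targets)
```
With this in place, setting `cfg.MODEL.ROI_HEADS.NAME = "MyROIHeads"` makes `build_model(cfg)` use the new class.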
And [projects/](../../projects/)
contains more examples that implement different architectures.
A complete list of registries can be found in [API documentation](../modules/modeling.html#model-registries).
You can register components in these registries to customize different parts of a model, or the
entire model.