For the past few months, the Level Up Development team has put a large research effort into the computer vision space. Computer vision plays a critical role for organizations as they try to understand their audiences better, automate more, and see whether marketing campaigns are truly working.
3 Ways to Train a Machine to Detect Objects
Along the way we researched a few of the most successful training techniques and thought it’d be helpful to share our insights. Specifically, we thought a simple list of approaches to training object localization models matched up with their pros and cons could help you understand what technique may be best for your project.
1) Training model from scratch
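Design your own convolutional neural network architecture, start from untrained weights (typically small random values rather than a constant such as all 1's, since identical weights would all receive identical updates), and train the model on manually created data sets.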
Pros
- More control over architecture
Cons
- Must design and test the model architecture (a big task; we have done this in the past)
- Must manually label training data
- Must train the model from scratch, which takes much longer than transfer learning and requires much more training data
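Part of the architecture design work mentioned above is checking that feature-map sizes line up from layer to layer. The sketch below is only an illustration, assuming the standard convolution/pooling size formula and square inputs; the layer stack in `hypothetical_layers` is made up for the example and is not an architecture we used.

```python
from typing import List, Tuple

# Each layer is described as (kernel_size, stride, padding).
LayerSpec = Tuple[int, int, int]


def output_size(input_size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after one conv or pooling layer (standard floor formula)."""
    return (input_size + 2 * padding - kernel) // stride + 1


def trace_sizes(input_size: int, layers: List[LayerSpec]) -> List[int]:
    """Return the feature-map size after each layer, failing early on bad designs."""
    sizes = []
    size = input_size
    for kernel, stride, padding in layers:
        size = output_size(size, kernel, stride, padding)
        if size < 1:
            raise ValueError("feature map shrank to nothing; revisit the layer stack")
        sizes.append(size)
    return sizes


# Hypothetical stack: 3x3 conv, 2x2 max-pool, 3x3 conv, 2x2 max-pool.
hypothetical_layers = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]
print(trace_sizes(224, hypothetical_layers))  # [224, 112, 112, 56]
```

A quick check like this catches shape mistakes before any training time is spent.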
2) Using pre-trained convolutional neural nets
Examples: faster_rcnn_nas, ssd_inception_v2_coco, ssdlite_mobilenet_v2_coco
Pros
- Ready to go, with no training done by the user. The weights are frozen in a pre-trained state by the team that built the model. The speed and accuracy of each model are known, so the appropriate model for the task can be selected.
- Parameters can be adjusted using config files
- Easy access to the latest vision models, which typically give much better results than a model built from scratch: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
- Swapping between models for comparison is fast and easy
- Easy to switch between local and cloud computing (TPU training is available for some models)
Cons
- Only recognizes the classes it was pre-trained on. The most common data set for pre-trained models is COCO (Common Objects in Context) http://cocodataset.org/#home, which covers 80 common object categories (the label map used with the TensorFlow models numbers them with IDs up to 90).
- Will not recognize new objects without transfer learning.
- May be suboptimal for tasks that only care about a few object classes: the extra classes add noise, and the model wastes time looking for unwanted objects
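One common workaround for the noise from unwanted classes is to post-filter the detections a pre-trained model returns. The sketch below is a minimal illustration, not any library's API: the detection dictionaries, the `filter_detections` helper, and the sample values are all hypothetical, and the class IDs assume the COCO label map as shipped with the TensorFlow models (1 = person, 3 = car). Note that this only hides unwanted results; the model still spends time computing them.

```python
from typing import Dict, List, Set, Union

# Hypothetical detection record: class ID, confidence score, and a box.
Detection = Dict[str, Union[int, float, List[float]]]


def filter_detections(
    detections: List[Detection],
    wanted_class_ids: Set[int],
    min_score: float = 0.5,
) -> List[Detection]:
    """Keep only detections of wanted classes at or above the score threshold."""
    return [
        d
        for d in detections
        if d["class_id"] in wanted_class_ids and d["score"] >= min_score
    ]


# Made-up output for one image; boxes are [ymin, xmin, ymax, xmax], normalized.
sample_detections = [
    {"class_id": 1, "score": 0.92, "box": [0.10, 0.20, 0.80, 0.45]},
    {"class_id": 3, "score": 0.88, "box": [0.50, 0.55, 0.90, 0.95]},
    {"class_id": 1, "score": 0.30, "box": [0.05, 0.70, 0.40, 0.85]},
]

people_only = filter_detections(sample_detections, wanted_class_ids={1})
print(len(people_only))  # 1
```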
3) Transfer learning
Takes a pre-trained model (with its current weight values) and continues training from there. Depending on the type of model, the label data will be more or less complicated to produce.
Pros
- All the benefits of pre-trained models, but you can re-train the final layers of the model to look only for the objects you are interested in.
- Needs much less data and trains much faster than training from scratch
Cons
- Must manually create training data. Depending on the model to be used (localization, attributes, segmentation, etc.), the labeling may be more complicated.
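To make the idea of re-training only the final layers concrete, here is a deliberately tiny sketch in plain Python. It is a toy under stated assumptions, not how a real detection model is fine-tuned: `frozen_features` stands in for a pre-trained backbone whose weights are never updated, and only a small logistic "head" is trained on new labels. All names, weights, and data points are hypothetical.

```python
import math
from typing import List, Tuple


def frozen_features(x: List[float], backbone: List[List[float]]) -> List[float]:
    """Stand-in for a pre-trained backbone: fixed weights, read but never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def predict(x: List[float], backbone: List[List[float]],
            head_w: List[float], head_b: float) -> float:
    """Probability of the positive class from the trainable head."""
    feats = frozen_features(x, backbone)
    z = sum(w * f for w, f in zip(head_w, feats)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples: List[List[float]], labels: List[int],
               backbone: List[List[float]], epochs: int = 200,
               lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only the head with gradient descent on the log loss."""
    head_w = [0.0] * len(backbone)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            feats = frozen_features(x, backbone)
            err = predict(x, backbone, head_w, head_b) - y
            # Gradients flow into the head only; the backbone is left untouched.
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


# Hypothetical "pre-trained" weights and a tiny labeled data set.
backbone_weights = [[1.0, 0.0], [0.0, 1.0]]
samples = [[2.0, 0.0], [-2.0, 0.0], [1.5, 0.3], [-1.5, -0.2]]
labels = [1, 0, 1, 0]

head_w, head_b = train_head(samples, labels, backbone_weights)
print([round(predict(x, backbone_weights, head_w, head_b)) for x in samples])
```

Because only the small head is updated, far fewer parameters have to be learned, which is the intuition behind the lower data and time requirements listed above.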