5 Hands-On Lab: Data Annotation

Goal

This hands-on lab session is designed to give participants practical experience in data annotation for deep learning. Participants will apply the methods, tools, and best practices discussed in the previous session, working directly with datasets to annotate data effectively.

Key Elements

Use of annotation methods and tools, direct dataset interaction

Choose your own adventure(s)

In this section, we’ll provide some links, basic information, and suggested starter activities for a variety of annotation tools available today. Have a look and get your hands dirty!

Note: You’ll need some images to annotate in each case. Feel free to use any relevant images you might already have, or just do a web search and find something interesting. Of course, when experimenting with the web-based annotation platforms, be sure not to upload anything personal, private, or otherwise sensitive.

Ideally, you’ll cover:

Simple bounding box annotation
Polygon, line, and point annotation
Interactive model-assisted segmentation
Inspecting annotation output files in various formats, including COCO JSON
One or more cloud (web-based) tools
(For the even more adventurous) One or more locally installed tools

5.1 Adventure: Make Sense

Web-based app, no setup or account required

MakeSense.ai is a simple, single-user, browser-based image annotation app
Supports annotation via bounding boxes, polygons, points, and lines
Upload one or more images, apply/edit annotations, then export annotations
Offers model-based semi-automated annotation with an accept/reject interface
If you prefer, you can also grab the source code and run it locally using npm or Docker

Things to try

Upload one or more images
Play around with manually creating various annotations of various classes. What is the experience?
Use Actions to edit label names, colors, etc
Use Actions to export annotations. What formats are offered?
Try exporting polygon annotations in both VGG and COCO formats. How do they compare?
Use Actions to run the COCO SSD model locally to suggest boxes. How well does it work?
When you’re done: Evaluate this tool with respect to the software considerations in Section 4.3.1

5.2 Adventure: Roboflow

Web-based app, requires (free) account signup

Roboflow offers a cloud-hosted, web-based platform for computer vision, including tooling for data annotation along with model training and deployment
They offer a limited free tier, which does not offer any privacy (project and images are automatically public)
Nice interface for doing annotations, managing artifacts, and managing the team

Things to try

Create an account and test project
Upload one or more images
Go to the Annotate interface and experiment with different annotation types. How easy is it to create, edit, and delete annotations?
Use the Smart Polygon tool to create polygons by clicking on an object, then refining by adding more clicks inside and outside the object. What is the experience like? Does this speed up your annotations?
Go back to the main Annotate menu and note how it is organized to support a coherent, team-based annotation workflow. Check out their collaboration documentation. Imagine how you might use this for a multi-person project.

5.3 Adventure: CVAT

Web-based app, requires (free) account signup

CVAT can be used as a desktop application that you install & run on your own local computer or server.
However, for today, consider creating your own (free) account for annotating using their hosted platform
The V7 CVAT guide might be helpful.

Things to try

Create a free account
Log in and create a test Project. At this stage, you’ll need to define at least one relevant label under the Constructor tab (you can edit these later)
Create a Task (i.e., a collection of images to annotate) under your Project, and upload one or more images.
Start an annotation Job within the task. What do you think of the interface? Is the documentation helpful?
Using the menu bar on the left, try creating box, polygon, line, and point annotations. Note: Click the Shape button to start each annotation. How is the experience?
Also, try creating a 3D cuboid annotation. Figure out how to resize and orient the cube. What do you think?
Lastly, try doing brush-based segmentations.
After doing some annotations, go to Jobs, use the 3-dots selector on your job to open the action menu, and export annotations in a couple of different formats. How do they compare?
As another Jobs action, you can click on View analytics and run a performance report.

5.4 Adventure: Zooniverse

Web-based app, no setup or account required

Zooniverse is a web-based community crowdsourcing platform for data annotation and digitization.

Things to try

Check out the Penguin Watch project.
- Visit the About, Talk, and Collect pages. Imagine how you might set up your own project to encourage and support a crowdsourced annotation community
- Visit the Classify page, go through the Tutorial, and then see how the Task works.
Check out the Arctic Bears image classification and interpretation project
Feel free to search the site for other projects

5.5 Adventure: IRIS (Intelligently Reinforced Image Segmentation)

Requires local installation (Python + JavaScript application).

See Installation instructions for setting up IRIS locally with Python/pip.

IRIS is a tool for doing semi-automated image segmentation of satellite imagery (or images in general), with a goal of accelerating the creation of ML training datasets for Earth Observation. The user interface provides configurable simultaneous views of the same image for multispectral imagery, along with interactive AI-assisted segmentation.

Unlike much of the ML we’ll encounter this week, the backend model in this case is a gradient boosted decision tree. The reason this works sufficiently well is that IRIS is geared toward segmenting multispectral imagery into a small number of classes, training from scratch on each image; the model is able to learn the correlation structure between features and labels by leveraging multiple features per pixel after the human-in-the-loop manually segments and labels pixels.

For more information, check out the YouTube video with the main creator, Alistair Francis.

Things to try

Run the IRIS demo that comes with the code
Using the onboard help widgets, figure out how to navigate the interface
Try iteratively labeling some pixels and running the AI. How does it do?
Experiment with changing the views. How does that improve your ability to manually distinguish and label features like clouds?

5.6 Adventure: Segment-Geospatial (samgeo)

Requires local installation (Python library), or can be run in a hosted notebook environment (JupyterLab, Google Collab, etc).

See Installation notes.

This is an open-source tool that you can either install locally or run in JupyterLab (or Google Colab).

First, check out the online Segment Anything Model (SAM) demo. SAM was developed by Meta AI. It is trained as a generalized segmentation model that is able to segment (but not label) arbitrary objects in an image. It is designed as a promptable tool, which means a user can provide initial point(s) or box(es) that roughly localize an object within an image, and SAM will try to fully segment that object. Alternatively, it can automatically segment an entire image, effectively by self-promtping with a complete grid of points, and then intelligently merging the corresponding segments.

Today, SAM is used by numerous image annotation tools to provide interactive, AI-assisted segmentation capabilities.

One such tool is the segment-geospatial Python package, which provides some base functionality for applying SAM to geospatial data, either programmatically or interactively.

Note that in addition to using segment-geospatial directly using Python in a notebook or other environment, you can also play with SAM-assisted segmentation in QGIS and ArcGIS.

Things to try

Run one or more examples
After running automated segmentation, what do you think about the results?
When doing interactive segmentation, how does it do, and what do you think about the results?

5.7 Adventure: Label Studio

Label Studio offers a Python application that you can install and run locally, or you can pay to use their cloud-based Enterprise offering.

See the Quick Start document with instructions for installing with pip and running locally in a web browser.

Multi-type data labeling and annotation tool with standardized output format
Works on various data types (text, image, audio)
Has both an open-source option and a paid cloud service
See the online playground

5.8 Other things to try

VGG Image Annotator (VIA)
- Try a local installation?