Active Learning

Run inference over a folder of images and upload predictions that meet your conditions back into a project for labeling review.

Workspace.active_learning() runs inference on every image in a directory and conditionally uploads the image (and its prediction) to a destination project. It's the SDK's built-in active-learning loop: bootstrap a labeling queue from raw footage by letting your existing model triage what's worth labeling.

The same pattern is best built as a Workflow for production use, but active_learning() is the fastest path from "I have a folder of frames" to "labeled data going into a project".

Basic usage

import roboflow

rf = roboflow.Roboflow(api_key="YOUR_API_KEY")
ws = rf.workspace()

ws.active_learning(
    raw_data_location="./frames",
    raw_data_extension=".jpg",
    inference_endpoint=["my-detector", 3],   # [project, version]
    upload_destination="my-detector",         # destination project
    conditionals={
        "required_class_variance_count": 1,           # at least 1 different class
        "minimum_size_requirement": 100,              # min pixels per detection
        "maximum_size_requirement": 4000000,
        "confidence_interval": [0, 60],               # only low-confidence predictions
    },
)
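Before starting a pass, it can help to check how many files the call is likely to see. The sketch below is a standalone pre-flight helper, not part of the SDK: count_candidate_images is a hypothetical name, and it assumes that only files directly inside the folder whose names end with the given extension are considered. Whether the SDK also matches upper-case extensions or nested folders is not stated here, so treat the count as an estimate.

```python
from pathlib import Path


def count_candidate_images(folder: str, extension: str) -> int:
    """Count files directly inside `folder` whose names end with `extension`.

    Hypothetical helper; it only approximates which files a pass would touch.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    # Assumption: case-sensitive suffix match, no recursion into subfolders.
    return sum(
        1 for p in root.iterdir() if p.is_file() and p.name.endswith(extension)
    )
```

Calling count_candidate_images("./frames", ".jpg") before ws.active_learning(...) gives a rough upper bound on the number of inference requests the pass may make.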

Parameters

  • raw_data_location (str) - directory of input images.

  • raw_data_extension (str) - image extension to match (e.g. .jpg, .png).

  • inference_endpoint (list, [project, version]) - the model to run as the triage step.

  • upload_destination (str) - ID of the project that receives qualifying images and their predictions. Often the same project as the model.

  • conditionals (dict) - rules that determine whether an image gets uploaded. Common keys:

    • confidence_interval - [min, max]; only images whose detections fall in this range are forwarded. The [0, 60] value in the example above suggests the bounds are on a 0-100 scale.

    • required_class_variance_count - minimum distinct classes required.

    • minimum_size_requirement / maximum_size_requirement - filter by detection area in pixels.

    • required_class_count - required total number of detections.

  • use_localhost (bool, default False) - when True, hit a self-hosted Roboflow Inference server instead of hosted inference.

  • local_server (str) - base URL for the local inference server. Defaults to http://localhost:9001/.
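As an illustration of the last two parameters, the sketch below wraps the call from Basic usage so that inference goes to a self-hosted server. run_local_pass is a hypothetical wrapper, not an SDK function; it forwards only the parameters listed above and assumes an inference server is already reachable at the given URL. The project name and version are the same placeholders used earlier.

```python
def run_local_pass(ws, folder, conditionals, server_url="http://localhost:9001/"):
    """Run one active-learning pass against a self-hosted inference server.

    `ws` is a workspace object (for example, the result of rf.workspace()).
    Hypothetical wrapper: it only forwards the documented parameters.
    """
    return ws.active_learning(
        raw_data_location=folder,
        raw_data_extension=".jpg",
        inference_endpoint=["my-detector", 3],  # [project, version] placeholders
        upload_destination="my-detector",       # placeholder destination project
        conditionals=conditionals,
        use_localhost=True,        # send inference to the local server
        local_server=server_url,   # base URL of that server
    )
```

Passing the workspace in as an argument keeps the wrapper easy to exercise with a stand-in object before pointing it at a real project.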

Why use it

The typical loop:

  1. Train a v1 model on a small labeled set.

  2. Point active_learning() at a folder of unlabeled production data.

  3. Forward only the images where the model is uncertain (low confidence) or sees rare classes.

  4. Label those in the Roboflow web app.

  5. Generate v2 with the new examples and retrain.

Tuning conditionals is what turns this from "upload everything" into a real triage policy.
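To make the idea of a triage policy concrete, here is a minimal, standalone sketch of how rules like these could be evaluated against one image's predictions. It is not the SDK's implementation: should_upload is a hypothetical function, the prediction dicts (with class, confidence, width, and height keys) are an assumed shape, confidence is assumed to be a 0 to 1 score compared on a 0 to 100 scale, and detection size is taken as width times height.

```python
def should_upload(predictions, conditionals):
    """Return True if an image's predictions satisfy every rule that is set.

    Illustrative only; the SDK's real checks may differ in detail.
    """
    lo, hi = conditionals.get("confidence_interval", [0, 100])
    min_area = conditionals.get("minimum_size_requirement", float("-inf"))
    max_area = conditionals.get("maximum_size_requirement", float("inf"))

    # Keep detections whose confidence (as a percentage) and box area are in range.
    kept = [
        p for p in predictions
        if lo <= p["confidence"] * 100 <= hi
        and min_area <= p["width"] * p["height"] <= max_area
    ]

    # Require at least one kept detection and enough distinct classes among them.
    needed_classes = conditionals.get("required_class_variance_count", 0)
    distinct = {p["class"] for p in kept}
    return bool(kept) and len(distinct) >= needed_classes


# Example rules and two single-detection images (made-up values).
rules = {
    "confidence_interval": [0, 60],
    "minimum_size_requirement": 100,
    "required_class_variance_count": 1,
}
uncertain = [{"class": "car", "confidence": 0.42, "width": 50, "height": 40}]
confident = [{"class": "car", "confidence": 0.95, "width": 50, "height": 40}]
```

Under these assumptions, should_upload(uncertain, rules) is True because the detection is low-confidence and large enough, while should_upload(confident, rules) is False because 95 falls outside the [0, 60] interval. Narrowing the interval or raising the class requirement shrinks the labeling queue accordingly.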

Bigger pipelines

If your input is a video stream, your model lives behind a Workflow, or you want batching and retries, build the equivalent as a Workflow. active_learning() is best for one-off bootstrapping passes from a static folder.
