# Track-On-R: Real-World Point Tracking with Verifier-Guided Pseudo-Labeling
Track-On-R is an online point tracking model that improves real-world performance through verifier-guided pseudo-label fine-tuning. It processes videos frame-by-frame using a compact transformer memory.
This model was introduced in the paper Real-World Point Tracking with Verifier-Guided Pseudo-Labeling.
- Project Page: kuis-ai.github.io/track_on_r
- Repository: github.com/gorkaydemir/track_on
## Model Description
Models for long-term point tracking are typically trained on synthetic datasets, and their performance often degrades in real-world videos. Track-On-R addresses this by introducing a verifier, a meta-model that learns to assess the reliability of tracker predictions and guide pseudo-label generation. By selecting the most trustworthy predictions from an ensemble, it enables data-efficient adaptation to unlabeled real-world videos.
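To make the selection step concrete, here is a minimal sketch of verifier-guided pseudo-label selection. This is not the paper's implementation: `toy_verifier` is a hypothetical stand-in for the learned meta-model, scoring each ensemble member's trajectory per point so the most reliable prediction can be kept as a pseudo-label.

```python
import torch

def toy_verifier(traj: torch.Tensor) -> torch.Tensor:
    # Hypothetical reliability score per ensemble member and point.
    # Here: penalize jittery trajectories (large frame-to-frame motion);
    # the actual verifier is a learned model, not this heuristic.
    motion = (traj[:, 1:] - traj[:, :-1]).norm(dim=-1)  # (E, T-1, N)
    return -motion.mean(dim=1)                          # (E, N)

E, T, N = 4, 16, 8                      # ensemble size, frames, points
ensemble = torch.randn(E, T, N, 2)      # per-member (x, y) trajectories
scores = toy_verifier(ensemble)         # (E, N) reliability scores
best = scores.argmax(dim=0)             # most reliable member per point
pseudo_labels = ensemble[best, :, torch.arange(N)]  # (N, T, 2)
print(pseudo_labels.shape)              # torch.Size([8, 16, 2])
```

The selected trajectories would then serve as supervision targets when fine-tuning the tracker on unlabeled real-world videos.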
## Sample Usage
You can track points on a video using the `Predictor` class from the official repository. Ensure the repository is cloned and its dependencies are installed.
### Minimal Example
```python
import torch

from model.trackon_predictor import Predictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize
model = Predictor(checkpoint_path="path/to/checkpoint.pth").to(device).eval()

# Inputs
# video:   (1, T, 3, H, W) in range 0-255
# queries: (1, N, 3) with rows = (t, x, y) in pixel coordinates,
#          or None to enable the model's uniform grid querying
video = ...    # e.g., torchvision.io.read_video -> (T, H, W, 3) -> (T, 3, H, W) -> add batch dim
queries = ...  # e.g., torch.tensor([[0, 190, 190], [0, 200, 190], ...]).unsqueeze(0).to(device)

# Inference
traj, vis = model(video, queries)

# Outputs
# traj: (1, T, N, 2) -> per-point (x, y) in pixels
# vis:  (1, T, N)    -> per-point visibility in {0, 1}
```
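The input preparation hinted at in the comments above can be sketched as follows. A dummy `uint8` clip stands in for the output of `torchvision.io.read_video`, which returns frames shaped `(T, H, W, 3)`; the query coordinates are illustrative values, not from the paper.

```python
import torch

# Dummy clip standing in for torchvision.io.read_video output: (T, H, W, 3), 0-255
frames = torch.randint(0, 256, (24, 256, 320, 3), dtype=torch.uint8)

# Rearrange to channels-first and add a batch dimension -> (1, T, 3, H, W)
video = frames.permute(0, 3, 1, 2).float().unsqueeze(0)

# Queries as (t, x, y) rows in pixel coordinates -> (1, N, 3)
queries = torch.tensor([[0, 190.0, 190.0],
                        [0, 200.0, 190.0]]).unsqueeze(0)

print(video.shape, queries.shape)  # torch.Size([1, 24, 3, 256, 320]) torch.Size([1, 2, 3])
```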
**Note:** Track-On checkpoints do not include the DINOv3 backbone weights due to licensing restrictions. You must request access to the official pretrained `dinov3-vits16plus` weights on Hugging Face. Once access is granted and you are logged in (`huggingface-cli login`), the weights are downloaded and cached locally on the first run.
## Citation
```bibtex
@inproceedings{aydemir2026trackonr,
  title     = {Real-World Point Tracking with Verifier-Guided Pseudo-Labeling},
  author    = {Aydemir, G\"orkay and G\"uney, Fatma and Xie, Weidi},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```