VisualSync: Multi-Camera Synchronization via Cross-View Object Motion

University of Illinois Urbana-Champaign
NeurIPS 2025

TL;DR: VisualSync synchronizes multi-view, in-the-wild videos. The synchronized outputs can benefit dynamic reconstruction, novel view synthesis, and multi-view data engines.

NBA and EFL synchronization results are shown below. Play/pause to view synchronized keyframes.

Abstract

Today, people can easily record memorable moments, from concerts and sports events to lectures, family gatherings, and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods rely on controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos with millisecond-level accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on four diverse, challenging datasets show that VisualSync outperforms baseline methods, achieving a median synchronization error below 50 ms.

Insight

When cameras are time-aligned, keypoint tracks align with epipolar lines; misalignment causes deviations. Minimizing these deviations across tracklets recovers the correct time offset.

[Teaser figure]

Green dots show a pair of corresponding points in the two videos. The red line is the epipolar line induced by the correspondence in the other view. The deviation of the green dot from the red line reveals the synchronization error.

Unsynchronized video pair (green dot not on the red line)

Synchronized video pair (green dot on the red line)
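
To make the cue concrete, here is a minimal sketch (not the authors' released code) of the point-to-epipolar-line distance that drives the alignment, assuming a known fundamental matrix F between the two views and matched pixel coordinates given as NumPy arrays:

# Minimal sketch: point-to-epipolar-line distance as a synchronization cue.
# F, x1, x2 are hypothetical inputs; F maps points in view 1 to epipolar
# lines in view 2 (x2^T F x1 = 0 when the pair truly corresponds in time).
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance (in pixels) of each x2 from the epipolar line F @ x1.

    F  : (3, 3) fundamental matrix from view 1 to view 2.
    x1 : (N, 2) pixel coordinates in view 1.
    x2 : (N, 2) pixel coordinates in view 2.
    """
    x1_h = np.hstack([x1, np.ones((len(x1), 1))])  # homogeneous coordinates
    x2_h = np.hstack([x2, np.ones((len(x2), 1))])
    lines = x1_h @ F.T                             # epipolar lines l = F x1
    num = np.abs(np.sum(x2_h * lines, axis=1))     # |x2^T F x1|
    den = np.linalg.norm(lines[:, :2], axis=1)     # normalize by line gradient
    return num / den

Summing this residual over many tracklet correspondences, as a function of the candidate time offset, yields the kind of objective VisualSync minimizes.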

Method

We extract cross-video temporal correspondences and estimate camera poses to compute epipolar lines. We then estimate pairwise video offsets by minimizing epipolar violations over the matched correspondences, and finally run a global optimization to align all videos; a minimal sketch of these two stages follows the overview figure.

[Method overview figure]
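
The sketch below illustrates the two stages under simplifying assumptions: a shared frame rate, a discrete grid of candidate offsets, and the epipolar_distance helper from the Insight section. The function names and data layout are hypothetical, not the paper's API.

import numpy as np

def estimate_pairwise_offset(tracks1, tracks2, F, offsets_ms, fps):
    """Sweep candidate time offsets (ms) and keep the one that minimizes
    the median epipolar error between matched tracks.

    tracks1, tracks2 : (T, N, 2) pixel tracks of N matched points over T
                       frames (hypothetical dense-tracking output).
    F                : (3, 3) fundamental matrix from view 1 to view 2.
    """
    best_offset, best_err = None, np.inf
    for dt in offsets_ms:
        shift = int(round(dt * fps / 1000.0))      # offset in frames
        a = tracks1[max(shift, 0):]                # shift one video in time
        b = tracks2[max(-shift, 0):]
        n = min(len(a), len(b))
        if n == 0:
            continue
        # Per-frame mean residual, robustly aggregated over time.
        # Uses epipolar_distance() from the sketch in the Insight section.
        err = np.median([epipolar_distance(F, a[t], b[t]).mean()
                         for t in range(n)])
        if err < best_err:
            best_err, best_offset = err, dt
    return best_offset

def globally_align(pairwise_offsets, n_videos):
    """Fuse pairwise offsets {(i, j): d_ij}, d_ij ≈ t_j - t_i, into
    per-video offsets t by linear least squares, fixing t_0 = 0."""
    rows, rhs = [], []
    for (i, j), d in pairwise_offsets.items():
        r = np.zeros(n_videos)
        r[i], r[j] = -1.0, 1.0
        rows.append(r)
        rhs.append(d)
    A, b = np.asarray(rows), np.asarray(rhs)
    t_rest, *_ = np.linalg.lstsq(A[:, 1:], b, rcond=None)  # drop t_0 column
    return np.concatenate([[0.0], t_rest])

In practice one would refine the discrete offset with sub-frame interpolation and weight correspondences by matching confidence, but the pairwise-then-global structure above mirrors the design described here.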

Synchronization Results

Example Application: Novel View Synthesis

We train K-Planes on CMU Panoptic multi-view videos for novel view synthesis. Unsynchronized inputs (first video below) produce blurry results, while our synchronized outputs (second) are sharp and comparable to the ground-truth synchronization (third). Our method thus enables high-quality synthesis from unsynchronized inputs.

Original Unsynchronized Video Inputs

Our Synchronized Video Inputs

Ground-truth Sync Video Inputs

BibTeX

@inproceedings{liu2025visualsync,
author    = {Liu, Shaowei and Yao, David Yifan and Gupta, Saurabh and Wang, Shenlong},
title     = {VisualSync: Multi-Camera Synchronization via Cross-View Object Motion},
booktitle = {NeurIPS},
year      = {2025},
}