Robbyant/lingbot-map

4.1k
+288/day
360
Python

A feed-forward 3D foundation model for reconstructing scenes from streaming data

From the README

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

Robbyant Team


πŸ—ΊοΈ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! πŸ—οΈπŸŒ

LingBot-Map focuses on:

  • Geometric Context Transformer: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
  • High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS at 518×378 resolution over long sequences exceeding 10,000 frames.
  • State-of-the-Art Reconstruction: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
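
To make the "paged KV cache" idea concrete, here is a minimal sketch of the bookkeeping such a cache typically needs: a fixed pool of equally sized pages, a page table for the stream, and eviction of the oldest page. This is a toy illustration only, not LingBot-Map's or FlashInfer's implementation; the class name, method names, and sizes are all hypothetical, and no key/value tensors are stored.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy page bookkeeping for one token stream (hypothetical, illustrative only)."""

    num_pages: int  # total pages in the preallocated pool
    page_size: int  # number of tokens each page can hold
    free_pages: list = field(init=False)
    page_table: list = field(default_factory=list, init=False)  # owned pages, oldest first
    last_page_len: int = field(default=0, init=False)  # tokens used in the newest page

    def __post_init__(self) -> None:
        # Every page starts out free.
        self.free_pages = list(range(self.num_pages))

    def append_tokens(self, n: int) -> None:
        """Reserve slots for n new tokens, taking a fresh page whenever the last one is full."""
        for _ in range(n):
            if not self.page_table or self.last_page_len == self.page_size:
                if not self.free_pages:
                    raise MemoryError("KV page pool exhausted")
                self.page_table.append(self.free_pages.pop())
                self.last_page_len = 0
            self.last_page_len += 1

    def evict_oldest_page(self) -> None:
        """Return the oldest page to the pool, e.g. when old frames are no longer needed."""
        if self.page_table:
            self.free_pages.append(self.page_table.pop(0))
            if not self.page_table:
                self.last_page_len = 0

    @property
    def num_tokens(self) -> int:
        """Tokens currently addressable through the page table."""
        if not self.page_table:
            return 0
        return (len(self.page_table) - 1) * self.page_size + self.last_page_len


cache = PagedKVCache(num_pages=8, page_size=16)
cache.append_tokens(40)  # fills two pages and half of a third
print(cache.num_tokens, len(cache.page_table))  # 40 3
cache.evict_oldest_page()  # drops one full page of 16 tokens
print(cache.num_tokens, len(cache.free_pages))  # 24 6
```

Because pages are recycled rather than reallocated, memory use stays bounded by the pool size however long the stream runs, which is the property that matters for sequences of thousands of frames.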

βš™οΈ Quick Start

Installation

1. Create conda environment

conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

2. Install PyTorch (CUDA 12.8)

pip install torch==2.9.1 torchvision==0.24.1 --index-url 

For other CUDA versions, see PyTorch Get Started.
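
After installing, it can help to confirm that the pinned versions are what actually ended up in the environment. The snippet below is a small sketch that uses only the standard library; the helper name `check_pins` is hypothetical, and the pins are copied from the install command above.

```python
from importlib import metadata

# Version pins copied from the install command above.
EXPECTED = {"torch": "2.9.1", "torchvision": "0.24.1"}


def check_pins(expected: dict) -> dict:
    """Return a status per package: 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Wheels from a CUDA-specific index often carry a local suffix such as "+cu128".
        base = installed.split("+", 1)[0]
        report[name] = "ok" if base == wanted else f"found {installed}"
    return report


if __name__ == "__main__":
    for name, status in check_pins(EXPECTED).items():
        print(f"{name}: {status}")
```

This only checks package metadata; it does not verify that a GPU is visible to PyTorch.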

3. Install lingbot-map

pip install -e .

4. Install FlashInfer (recommended)

FlashInfer provides paged KV cache attention for efficient streaming inference:

# CUDA 12.8 + PyTorch 2.9
pip install flashinfer-python -i 

For other CUDA/PyTorch combinations, see FlashInfer installation. If FlashInfer is not installed, the model falls back to SDPA (PyTorch's native scaled dot-product attention) via the --use_sdpa flag.
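
If you launch runs from a wrapper script, one way to decide whether to pass --use_sdpa is to check whether the `flashinfer` module is importable. This is a sketch under that assumption; `pick_attention_backend` is a hypothetical helper, not part of lingbot-map.

```python
import importlib.util


def pick_attention_backend(prefer_flashinfer: bool = True) -> str:
    """Return "flashinfer" if the package is importable, otherwise "sdpa"."""
    if prefer_flashinfer and importlib.util.find_spec("flashinfer") is not None:
        return "flashinfer"
    return "sdpa"


backend = pick_attention_backend()
# Hypothetical: extra CLI arguments for a script that accepts the --use_sdpa flag.
extra_args = ["--use_sdpa"] if backend == "sdpa" else []
print(backend, extra_args)
```

`find_spec` only confirms the package is present; it does not prove that the installed FlashInfer build matches your CUDA/PyTorch combination.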

5. Visualization dependencies (optional)

pip install -e ".[vis]"

📦 Model Download

| Model Name | Hugging Face Repository | ModelScope Repository | Description |
| :--- | :--- | :--- | :--- |
| lingbot-map | robbyant/lingbot-map | Robbyant/lingbot-map | Balanced and latest checkpoint: strong all-around performance across short and long sequences. |
| lingbot-map-long | robbyant/lingbot-map | Robbyant/lingbot-map | Better suited for long sequences. |
| lingbot-map-stage1 | robbyant/lingbot-map | Robbyant/lingbot-map | Stage-1 training checkpoint of lingbot-map; can be loaded into the VGGT model for bidirectional inference. |
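
As a sketch, the Hugging Face repository listed above can be fetched with `snapshot_download` from the `huggingface_hub` package. This assumes `huggingface_hub` is installed; the function name `download_checkpoints` and the `checkpoints` directory are hypothetical, and the checkpoint filenames inside the repository are not listed here, so the whole snapshot is downloaded.

```python
from pathlib import Path

REPO_ID = "robbyant/lingbot-map"  # Hugging Face repository from the table above


def download_checkpoints(local_dir: str = "checkpoints") -> Path:
    """Download the full repository snapshot and return its local path."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoints()` needs network access and enough disk space for every checkpoint in the repository.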

🚧 Coming soon: we're training a stronger model that supports longer sequences. Stay tuned.

🎬 Demo

Run demo.py for interactive 3D visualization via a browser-based viser viewer (default `).

Try the Example Scenes

We