Robbyant/lingbot-map
A feed-forward 3D foundation model for reconstructing scenes from streaming data
From the README
LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction
Robbyant Team
Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction!
LingBot-Map focuses on:
- Geometric Context Transformer: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
- High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS at 518×378 resolution over long sequences exceeding 10,000 frames.
- State-of-the-Art Reconstruction: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
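To make the "paged KV cache" idea in the list above concrete, here is a minimal, framework-free sketch of the general technique: cached key/value entries live in fixed-size pages, and a per-stream page table maps logical token positions to physical pages. It is a toy illustration only, not LingBot-Map's or FlashInfer's implementation; the class name `PagedKVCache`, its fields, and the page size are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy paged KV cache (hypothetical names; illustration only)."""

    page_size: int                                   # token slots per page (assumed value)
    pages: list = field(default_factory=list)        # physical pool: each page holds (key, value) pairs
    free_pages: list = field(default_factory=list)   # indices of recycled pages
    page_table: dict = field(default_factory=dict)   # stream id -> ordered list of page indices
    lengths: dict = field(default_factory=dict)      # stream id -> number of cached tokens

    def _alloc_page(self) -> int:
        # Reuse a recycled page if one exists, otherwise grow the pool.
        if self.free_pages:
            return self.free_pages.pop()
        self.pages.append([])
        return len(self.pages) - 1

    def append(self, seq_id, key, value) -> None:
        n = self.lengths.get(seq_id, 0)
        table = self.page_table.setdefault(seq_id, [])
        if n % self.page_size == 0:
            # No page yet, or the last page is full: map a new page.
            table.append(self._alloc_page())
        self.pages[table[-1]].append((key, value))
        self.lengths[seq_id] = n + 1

    def gather(self, seq_id) -> list:
        # Walk the page table to recover entries in insertion order.
        return [kv for p in self.page_table.get(seq_id, []) for kv in self.pages[p]]

    def release(self, seq_id) -> None:
        # Return all of a stream's pages to the free list.
        for p in self.page_table.pop(seq_id, []):
            self.pages[p].clear()
            self.free_pages.append(p)
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(page_size=4)
for t in range(10):
    cache.append("stream-0", key=f"k{t}", value=f"v{t}")
print(len(cache.page_table["stream-0"]))  # 3 pages cover 10 tokens
```

Because pages are allocated on demand and recycled on release, memory grows in page-sized steps rather than requiring one contiguous buffer per stream, which is the property that makes this layout attractive for very long sequences.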
Quick Start
Installation
1. Create conda environment
conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map
2. Install PyTorch (CUDA 12.8)
pip install torch==2.9.1 torchvision==0.24.1 --index-url
For other CUDA versions, see PyTorch Get Started.
3. Install lingbot-map
pip install -e .
4. Install FlashInfer (recommended)
FlashInfer provides paged KV cache attention for efficient streaming inference:
# CUDA 12.8 + PyTorch 2.9
pip install flashinfer-python -i
For other CUDA/PyTorch combinations, see FlashInfer installation. If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via `--use_sdpa`.
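Before running inference, it can help to check which attention path is available in the current environment. The sketch below only tests whether a module can be imported; it assumes the `flashinfer-python` package is imported as `flashinfer`, and the helper names `has_module` and `suggest_attention_flag` are hypothetical and not part of lingbot-map.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def suggest_attention_flag(flashinfer_available: bool) -> str:
    """Return the extra CLI flag to pass, based on FlashInfer availability."""
    # --use_sdpa is the flag named in the README for the SDPA fallback.
    return "" if flashinfer_available else "--use_sdpa"


# Assumption: the flashinfer-python package exposes the module name "flashinfer".
available = has_module("flashinfer")
flag = suggest_attention_flag(available)
print(f"FlashInfer available: {available}; extra flag: {flag!r}")
```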
5. Visualization dependencies (optional)
pip install -e ".[vis]"
Model Download

| Model Name | Hugging Face Repository | ModelScope Repository | Description |
| :--- | :--- | :--- | :--- |
| lingbot-map | robbyant/lingbot-map | Robbyant/lingbot-map | Balanced and latest checkpoint: strong all-around performance across short and long sequences. |
| lingbot-map-long | robbyant/lingbot-map | Robbyant/lingbot-map | Better suited for long sequences. |
| lingbot-map-stage1 | robbyant/lingbot-map | Robbyant/lingbot-map | Stage-1 training checkpoint of lingbot-map; can be loaded into the VGGT model for bidirectional inference. |
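One way to fetch the checkpoints listed above is through the `huggingface_hub` package. Since all three models point at the same repository and the individual checkpoint filenames are not given here, this sketch downloads the whole repository snapshot. The helper name `download_checkpoints` and the `checkpoints/lingbot-map` target directory are assumptions for illustration, and `huggingface_hub` must be installed separately.

```python
from pathlib import Path

REPO_ID = "robbyant/lingbot-map"  # Hugging Face repository from the table above


def download_checkpoints(local_dir: str = "checkpoints/lingbot-map") -> Path:
    """Download the full repository snapshot and return its local path.

    The default local_dir is a hypothetical location; pick any directory.
    """
    # Imported lazily so this file still loads when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_checkpoints()` needs network access and returns the directory containing the downloaded files; which file corresponds to which model name should be checked against the repository listing.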
Coming soon: we're training a stronger model that supports longer sequences. Stay tuned.
Demo
Run demo.py for interactive 3D visualization via a browser-based viser viewer (default `).
Try the Example Scenes
We