
27 papers found
When Do We Not Need Larger Vision Models?
Lecture Notes in Computer Science, 2024, 11 citations
Real-world humanoid locomotion with reinforcement learning
Science Robotics, 2024, 117 citations
Large Language Models are Visual Reasoning Coordinators
arXiv (Cornell University), 2023, 14 citations
QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 102 citations
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
arXiv (Cornell University), 2023, 23 citations
Hierarchical Open-vocabulary Universal Image Segmentation
arXiv (Cornell University), 2023, 9 citations
Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation
arXiv (Cornell University), 2023, 14 citations
Aligning Large Multimodal Models with Factually Augmented RLHF
arXiv (Cornell University), 2023, 12 citations
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
arXiv (Cornell University), 2023, 20 citations
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
arXiv (Cornell University), 2023, 33 citations